Christian Heilmann

Posts Tagged ‘opentables’

Adding a world globe and location information to your site with YQL

Sunday, August 1st, 2010

Whilst looking around the open tables in YQL I found a table with earthquake information released by the United States Geological Survey. One thing the RSS feeds returned from that service had was quite a cool picture of Earth with the location as a star:

Example of the globes rendered by the USGS web service

Looking at the source I realised that the image URL has a certain logic to it:

http://earthquake.usgs.gov/images/globes/50_40.jpg

The first number is the latitude, the second the longitude of the location. Each of them needs to be a multiple of 5 to result in an image. Try it out by changing the values.
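The URL pattern can be wrapped in a small helper. This is a minimal sketch, not code from the table itself: the names snapToFive and globeImageUrl are made up, and snapping arbitrary coordinates to the nearest multiple of 5 is an assumption about how you might pick an image. How the service names images for negative coordinates is not stated here either, so treat the output for those as a guess.

```javascript
// Hypothetical helper: snap a coordinate to the nearest multiple of 5,
// as the globe images only exist for those values.
function snapToFive(value) {
  return Math.round(value / 5) * 5;
}

// Build the image URL following the pattern latitude_longitude.jpg.
function globeImageUrl(latitude, longitude) {
  return 'http://earthquake.usgs.gov/images/globes/' +
    snapToFive(latitude) + '_' + snapToFive(longitude) + '.jpg';
}

console.log(globeImageUrl(50, 40));
// http://earthquake.usgs.gov/images/globes/50_40.jpg
```

Values that are already multiples of 5 pass through unchanged, so the example URL above is reproduced as is.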

Using this, I put together an open YQL table to render some HTML that shows the globe image and the information the Yahoo GeoPlanet web service has available about that location.

You can use the table with the following YQL statement:

select * from geo.globeimage where place="sfo" and type="data" and location="true"

Open this in the console here or see the results as XML.

The different parameters are:

place
The geographical location, like SFO for San Francisco Airport or London, UK for London, England
type
The type of the image. If you provide data as the parameter, the image gets returned as inline data. This renders the badge much faster, as the image doesn’t need to get loaded from the USGS server.
location
A Boolean that defines whether you want to show the list of location information or not

The above statement would render the following HTML:


sfo

  • Name: San Francisco International Airport

  • Placetype: Airport

  • Country: United States

  • Latitude: 37.614761

  • Longitude: -122.391876

  • WOEID: 12521721

In order to use this without going through YQL, I’ve put together a small JavaScript wrapper:

globebadge.init({
  element: 'ID or reference of element to add the badge to',
  location: 'the geographical location you want to show',
  showlist: true // or false; if set to true the script displays the place information as an HTML list
});

For example:

globebadge.init({
  element: 'badge',
  location: 'Batman',
  showlist: true
});

This will render in your browser like the following image:

globebadge

You can find the source of the badge script on GitHub.

Notice that I am testing for the browser. If we have IE6 I do not return the image as a data URI, otherwise I do.

If you want to see it in action and try it out with a few locations, check out the demo page for Geoglobes.

You can see the globeimage open table for YQL at the YQL table repository.

Another example of how you can find cool stuff and then turn it into a web service with YQL :)

Using YQL to read HTML from a document that requires POST data

Monday, November 16th, 2009

YQL is a very cool tool to extract data from HTML documents on the web. Let’s face facts: HTML is a terrible data format, as far too many documents out there are either broken, have the wrong encoding or simply are not structured the way they should be. Therefore it can be quite a mess to try to read an HTML document and then find what you were looking for using regular expressions or tools that expect XML-compatible HTML documents. Python fans will know about Beautiful Soup, for example, which does quite a good job of working around most of these issues.

Using YQL, however, you can use a simple web service to extract data from HTML documents. As an added bonus, the YQL engine will remove falsely encoded characters and run the data retrieved through HTML Tidy to get valid HTML back. For example, to get the body content of CNN.com all you need is:

select * from html where url="http://cnn.com"

The really cool thing about YQL is that it allows you to use XPath to filter down the data you want to extract. For example, to get all the links from cnn.com you can use:

select * from html where xpath="//a" and url="http://cnn.com"

If you only want to have the text content of the links you can do the following:

select content from html where xpath="//a" and url="http://cnn.com"

You could use this for example to translate links using the Google translation API:

select * from google.translate where q in (
  select content from html where url="http://cnn.com" and xpath="//a"
) and target="fr"
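If you assemble such statements in code, a small helper keeps the quoting in one place. The following is only a sketch under assumptions: htmlTableQuery is a made-up name, and refusing values that contain double quotes is a simplification of mine, not a YQL rule.

```javascript
// Hypothetical helper that assembles a statement against the html table.
function htmlTableQuery(url, xpath, fields) {
  [url, xpath].forEach(function (value) {
    // Simplification: refuse values that would break the quoted string.
    if (value.indexOf('"') !== -1) {
      throw new Error('Double quotes are not supported in this sketch');
    }
  });
  // Select everything unless specific fields (like "content") are given.
  return 'select ' + (fields || '*') + ' from html where xpath="' +
    xpath + '" and url="' + url + '"';
}

console.log(htmlTableQuery('http://cnn.com', '//a', 'content'));
// select content from html where xpath="//a" and url="http://cnn.com"
```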

Now, the other day my esteemed colleague Dirk Ginader came up with a bit of a brain teaser for me. His question was what to do when the HTML document you are trying to get needs POST data sent to it in order to render properly. You can append GET parameters to the URL, but not POST data, so the normal html table is not enough.

The good news is that YQL allows you to extend it in many ways. One of them is using an execute block in an open table to convert data with JavaScript on the server. The JavaScript has full E4X support and allows you to do any HTTP request. So the first step to solve Dirk’s dilemma was to write a demo page (the form was added to test it out):

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">
<html>
  <title>Test for HTML POST table</title>
 
<body>
  <p>Below this should be a "yay!" when 
    the right POST data was submitted.</p>
<?php if(isset($_POST['foo']) && isset($_POST['bar'])){
  echo "<p>yay!</p>";
}?>
<form action="index.php" method="post" accept-charset="utf-8">
  <input type="text" name="foo" value="is">
  <input type="text" name="bar" value="set">
  <input type="submit" value="Continue &rarr;">
</form>
  </body>
</html>

The next step was to write an open table for YQL that does the necessary request and transformations.

<?xml version="1.0" encoding="UTF-8"?>
<table xmlns="http://query.yahooapis.com/v1/schema/table.xsd">
  <meta>
  <author>Christian Heilmann</author>
  <description>HTML pages that need post data</description>
  <sampleQuery><![CDATA[
select * from {table} where
url='http://isithackday.com/hacks/htmlpost/index.php' 
and postdata="foo=foo&bar=bar" and xpath="//p"]]></sampleQuery>
  <documentationURL></documentationURL>
  </meta>
  <bindings>
    <select itemPath="" produces="XML">
    <urls>
      <url>{url}</url>
    </urls>
    <inputs>
      <key id="url" type="xs:string" required="true" paramType="variable"/>
      <key id="postdata" type="xs:string" required="true" paramType="variable"/>
      <key id="xpath" type="xs:string" required="true" paramType="variable"/>
    </inputs>
    <execute>
    <![CDATA[
      var myRequest = y.rest(url);  
      var data = myRequest.accept('text/html').
                 contentType("application/x-www-form-urlencoded").
                 post(postdata).response;
      var xdata = y.xpath(data,xpath);
      response.object = <postresult>{xdata}</postresult>;
    ]]>
    </execute>
  </select> 
  </bindings>
</table>

Using this, you can now send POST data to any HTML document (unless its robots.txt blocks the YQL server or it needs authentication) and get the HTML content back. To make it work, you define the table using the “use” command:

use "http://isithackday.com/hacks/htmlpost/htmlpost.xml" as htmlpost;
select * from htmlpost where
url='http://isithackday.com/hacks/htmlpost/index.php'
and postdata="foo=foo&bar=bar" and xpath="//p"

You can try this example in the console.
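To call the table from your own code rather than the console, the statement has to be URL-encoded and sent to the public YQL endpoint. Here is a rough sketch of that step. It assumes the endpoint http://query.yahooapis.com/v1/public/yql and an environment that provides the standard URLSearchParams class; the variable names are made up for illustration.

```javascript
// Encode the form fields the same way a browser form submission would.
var postdata = new URLSearchParams({ foo: 'foo', bar: 'bar' }).toString();

// The YQL statement from above, with the table loaded via "use".
var statement =
  'use "http://isithackday.com/hacks/htmlpost/htmlpost.xml" as htmlpost; ' +
  'select * from htmlpost where ' +
  "url='http://isithackday.com/hacks/htmlpost/index.php' " +
  'and postdata="' + postdata + '" and xpath="//p"';

// The statement goes URL-encoded into the q parameter of the endpoint.
var requestUrl = 'http://query.yahooapis.com/v1/public/yql?q=' +
  encodeURIComponent(statement) + '&format=xml';

console.log(postdata);
// foo=foo&bar=bar
```

Encoding the whole statement with encodeURIComponent matters here, as the ampersand inside postdata would otherwise be read as a separator of the request URL’s own parameters.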

I’ve also added the table to the open YQL tables repository on GitHub, so it should show up sooner or later in the console.

Here’s a quick explanation of what is going on:

<?xml version="1.0" encoding="UTF-8"?>
<table xmlns="http://query.yahooapis.com/v1/schema/table.xsd">
  <meta>
  <author>Christian Heilmann</author>
  <description>HTML pages that need post data</description>
  <sampleQuery><![CDATA[
select * from {table} where
url='http://isithackday.com/hacks/htmlpost/index.php' 
and postdata="foo=foo&bar=bar" and xpath="//p"]]></sampleQuery>
  <documentationURL></documentationURL>
  </meta>

You define the schema and add metadata like the author, a description and a sample query. The latter is really important, as it will show up in the YQL console when people click the table. You should normally also provide a documentation URL, but this post wasn’t written when I wrote the table so I kept it empty.

  <bindings>
    <select itemPath="" produces="XML">
    <urls>
      <url>{url}</url>
    </urls>

The bindings of the table describe the real API data endpoints the table points to. You have select, insert, update and delete – much like any other database. You provide an itemPath to cut down on the data returned and tell YQL if the data returned is XML or JSON.

    <inputs>
      <key id="url" type="xs:string" required="true" paramType="variable"/>
      <key id="postdata" type="xs:string" required="true" paramType="variable"/>
      <key id="xpath" type="xs:string" required="true" paramType="variable"/>
    </inputs>

The inputs section defines what variables are expected, if they are required and what their IDs are. These IDs will be available for you as variables in the embedded JavaScript block and are normally defined by the API your table points to.

    <execute>
    <![CDATA[
      var myRequest = y.rest(url);  
      var data = myRequest.accept('text/html').
                 contentType("application/x-www-form-urlencoded").
                 post(postdata).response;
      var xdata = y.xpath(data,xpath);
      response.object = <postresult>{xdata}</postresult>;
    ]]>
    </execute>

Here comes the JavaScript magic inside the execute block. The y.rest(url) command sends a REST query to the URL. In its simplest form this would just get the data back, but in our case we need to define a few more things. We expect HTML back, so we set the Accept header of the request to text/html. This also ensures that the result is run through HTML Tidy before it is returned. The content type has to be that of a form submission, and we need to send the string postdata as a POST request. The response then contains whatever our request brings back.
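For comparison, this is roughly what the same request looks like outside of YQL. It is a sketch and not part of the table: it assumes an environment with the standard fetch function, the names postRequestOptions and postForHtml are made up, and nothing here runs the result through HTML Tidy.

```javascript
// Hypothetical helper: the request settings the execute block asks for,
// expressed as options for the standard fetch() function.
function postRequestOptions(postdata) {
  return {
    method: 'POST',
    headers: {
      'Accept': 'text/html',
      'Content-Type': 'application/x-www-form-urlencoded'
    },
    body: postdata
  };
}

// Send the form data and resolve with the raw HTML text of the response.
function postForHtml(url, postdata) {
  return fetch(url, postRequestOptions(postdata)).then(function (response) {
    return response.text();
  });
}
```

Calling postForHtml('http://isithackday.com/hacks/htmlpost/index.php', 'foo=foo&bar=bar') would return a promise for the page markup, which you would then still have to filter yourself.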

As we want to have the handy functionality of the original HTML table, we also need to do an xpath transformation which is done with the method of the same name.

Any JavaScript in the execute block needs to define a response.object which will become the result of the YQL query. As you can see, the E4X support of YQL allows you to simply write XML blocks without any DOM pains and you can embed any JavaScript variables inside curly braces.

  </select> 
  </bindings>
</table>

And we’re done. Using YQL execute you can move a lot of JavaScript that does complex transformations to the Yahoo server farm without slowing down your end users’ computers. And you have a secure environment to boot, as there are no DOM vulnerabilities.

Getting a list of Flickr photos by location and/or search term with a YQL open table

Monday, November 2nd, 2009

Displaying photos from Flickr can be daunting. The API needs authentication and the RSS, JSON or LOLcode output is very limited. The way around is using YQL and its Flickr tables. That way it is pretty easy to search Flickr:

select * from flickr.photos.search where text="panda"

Try out the Panda search in the YQL console.

The output format has a lot of information in there, but sadly enough, not all. For example the real name of the owner or the description is missing. Therefore you need to go through yet another Flickr API to get the whole data set:

select * from flickr.photos.info where photo_id in (
  select id from flickr.photos.search where text="panda"
)

Guess what? You can also try this more detailed query in the console.

I’ve shown before how easy it is to display Flickr photos retrieved that way.


The issue with that, though, is that it uses JavaScript, and JavaScript may be turned off (think Blackberries). Of course you can do the same thing in PHP, but I’d wager that more people do JavaScript than PHP these days.

The main issue is that Flickr returns the photos in a pretty weird format and that you need a script like the one above to turn it into a simple HTML list.

The good news is that YQL with the execute command allows you to embed JavaScript in your open tables. That way you can write a table that does all the necessary transformations and returns the data as a simple list for immediate use:




<meta>
  <sampleQuery>select * from {table} where location="london,UK"</sampleQuery>
  <author>Christian Heilmann</author>
  <documentationURL>http://www.wait-till-i.com/2009/11/01/getting-a-list-of-flickr-photos-by-location-andor-search-term-with-a-yql-open-table</documentationURL>
  <description>Searches Flickr by location and/or search term and returns an HTML list that can be immediately used in a mashup.</description>
</meta>





You’ll notice that while the E4X support is very powerful, it can look a bit confusing at first sight. Once you get your head around it, though, it becomes much cleaner that way.

You can use this table like any other open table via the use command in YQL:

use "http://github.com/codepo8/yql-tables/raw/master/flickr/flickr.photolist.xml" as flickr;
select * from flickr where text="me" and location="uk" and amount=20

Try it in the console.

I’ve wrapped one more API in there – the Yahoo Geo API to determine a place from a name should you want to search by location. All in all you have three parameters in this open table – all of which are optional:

  • text – the search text
  • location – the geographical location
  • amount – the amount of photos to be returned
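As all three parameters are optional, the statement can be built from whichever ones you have. The following is a sketch only: photoListStatement is a made-up name, the values are inserted without any escaping, and whether the table accepts a statement with no conditions at all is something I have not verified.

```javascript
// Hypothetical helper: build the YQL statement for the photo list table
// from whichever of the three optional parameters are given.
function photoListStatement(options) {
  var conditions = [];
  if (options.text) {
    conditions.push('text="' + options.text + '"');
  }
  if (options.location) {
    conditions.push('location="' + options.location + '"');
  }
  // Only accept a positive number as the amount of photos.
  if (Number(options.amount) > 0) {
    conditions.push('amount=' + Number(options.amount));
  }
  var statement = 'use "http://github.com/codepo8/yql-tables/raw/master/' +
    'flickr/flickr.photolist.xml" as flickr; select * from flickr';
  if (conditions.length > 0) {
    statement += ' where ' + conditions.join(' and ');
  }
  return statement;
}

console.log(photoListStatement({ text: 'me', location: 'uk', amount: 20 }));
```

With all three parameters set as above, the helper produces the same statement as the one shown earlier.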

If you look at the table source, you can also see that I hard-wired the license of the photos to 4, which is CC-BY. So if you link the photos back to Flickr you satisfy both Flickr’s terms and the original photographer’s.

Now, the easiest way to use this output is by using YQL’s JSON-P-X output format. This is XML with a callback which returns a JSON object with the HTML as a string instead of a convoluted JSON object. See the JSON-P-X output here.

That way you can easily use it in JavaScript:
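A minimal sketch of how that could look follows. It is not the script from the demo page: the function names and the element ID photos are made up, and it assumes that the JSON-P-X response carries the markup as strings in a results array.

```javascript
// Build the YQL request URL. format=xml together with a callback
// parameter is what selects the JSON-P-X output.
function jsonpxUrl(statement, callbackName) {
  return 'http://query.yahooapis.com/v1/public/yql?q=' +
    encodeURIComponent(statement) +
    '&format=xml&diagnostics=false&callback=' + callbackName;
}

// Pull the HTML string out of the JSON-P-X response object.
// Assumption: the markup arrives as strings in a "results" array.
function photoListHtml(data) {
  if (data && data.results && data.results.length > 0) {
    return data.results.join('');
  }
  return '';
}

// In a browser, the callback would write the list into the page and a
// script element pointing at the URL would trigger the request:
//
// function showPhotos(data) {
//   document.getElementById('photos').innerHTML = photoListHtml(data);
// }
// var script = document.createElement('script');
// script.src = jsonpxUrl(statement, 'showPhotos');
// document.body.appendChild(script);
```

As the HTML arrives as a ready-made string, the callback needs no loops or DOM building; it only has to drop the string into an element.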


And also in PHP:


<?php
$url = 'http://query.yahooapis.com/v1/public/yql?q=use%20%22http://github.com/codepo8/yql-tables/raw/master/flickr/flickr.photolist.xml%22%20as%20flickr;%20select%20*%20from%20flickr%20where%20text=%22me%22%20and%20location=%22uk%22%20and%20amount=20&format=xml&diagnostics=false';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$output = curl_exec($ch);
curl_close($ch);
// Keep only the list markup (assuming the table returns a <ul> list)
$output = preg_replace('/.*<ul>/','<ul>',$output);
$output = preg_replace('/<\/ul>.*/','</ul>',$output);
$output = preg_replace('//','',$output);
$output = preg_replace('//','',$output);
echo $output;
?>

You can see both in action on the demo page.

So by using YQL open tables you can not only access complex APIs with YQL, but you could also write complete mashups in JavaScript and have them executed in a safe environment on Yahoo’s servers. Your end of the mashup is simply the display which could be a form that works with Ajax when JavaScript is available and renders a static page in PHP (or whatever other server-side language) when JavaScript is turned off. You only need to do one HTTP request – the rest is executed and cached on the YQL server farm – everybody wins.