Christian Heilmann

Posts Tagged ‘yql’

TTMMHTM: Scuba attacks in Norway,safer internet, mouse tracking and SMS API

Wednesday, February 10th, 2010

Things that made me happy this morning:

Explaining what YQL is to non-technical people

Monday, February 8th, 2010

A friend of mine was asked to produce some videos and screencasts to explain YQL to non-technical people and asked me to give her a definition. Here’s what I came up with:

What is YQL and what is it good for ?

The Internet as we see it (web sites, games, videos) is only a shop window to the thing that really drives it: data. All the things you see are based on information one provider offers another provider. Even you are a data provider – if you comment on a video on YouTube or you add a person to a photo on Facebook you create a data set. All this data is made available behind the scenes to different people – this could be developers in the same company or partners or – for example in the case of the Yahoo Developer Network – other developers.

The information comes in different formats. For example the photos on http://www.flickr.com/photos/tags/cat are available in “JSON” format at http://api.flickr.com/services/feeds/photos_public.gne?tags=cat&lang=en-us&format=json.

To you this might look like nothing much but developers have a lot of fun with this information. It gets especially interesting when you mix and match different information sources to get a new picture. You can for example map the news onto the world to give them geographical context.

All of this sharing of information follows certain conventions much like normal conversation does. You need to ask the right question to the right person in the right format to get the correct information back. In the case of data on the Internet this gets even more confusing as all the data providers speak different languages and give you information back in formats that you might not understand. Imagine going shopping for clothes and each shop assistant speaks a different language and each shop has different size charts for the same type of clothing.

In normal communication amongst people with different languages we have translators – sometimes even simultaneous ones. And this is what YQL is. YQL is a simple to understand language to allow you to get information from any source on the web and gives it back to you in a language you understand. Every resource on the web can get a simple name and developers can call this name up to get to the data it offers. So for example getting photos from flickr becomes

select * from flickr.photos.search where text=’cat’

Translating a text to French becomes

select * from google.translate where q=’hubcap’ and target=’fr’

You can also mix and match different sources, for example you can get the latest headlines from yahoo’s top stories and translate them to French:

select * from google.translate where target=’fr’ and q in (
select title from rss where url=’http://rss.news.yahoo.com/rss/topstories’
)

This language is very close to another language called SQL which is the standard for accessing information in databases. So, in essence, YQL turns the internet into a massive database where developers can access information and remix it to make it easier for end users to see relationships between different pieces of information or even develop interfaces that make it possible for users to get access to the information.

For example this great lecture on YouTube can be listened to by blind users using the EasyYouTube interface that uses YQL to get to the information of the video.

YQL can be accessed on a pure programmatic level but the easiest way to get a glimpse of its translation and access powers is to use the console.

YQL is a great communicator – it allows you to speak to all kind of data sources in the right language, ask the right questions for you, find the right internal phone numbers to get to even other resources needed to get to a certain piece of information and give back to you only what you asked for and not a whole mass of information you will never need. And it is amazingly fast in doing so as Yahoo built it to deal with exactly the same problem of data communication inside the company.

Adding map links and a small map to any text using JavaScript – addmap.js

Friday, January 29th, 2010

As part of a larger article, I am currently building some tools that use geographical information. The first one is a pure JavaScript solution to link locations in a text to Google Maps and to show a small map under the text. You can see it in action by clicking the screenshot below.

Analyse text and add a map with its locations in pure JavaScript by  you.

All you need to do to use the script is get your own Google Maps key and embed the script in the page you want to analyse, giving it the ID of the main text element as a parameter:


You can customise the look and feel by writing your own CSS for the generated HTML and setting the addmap.config.width and addmap.config.height properties to resize the map.

Under the hood here is what happens:

The code of addmap.js is available on GitHub.

How I build my data.gov.uk mashup – UK-House-Prices.com

Thursday, January 21st, 2010

UK-House-Prices.com is a web site to see how the prices in a certain area changed over the years using a data set released by the UK government as part of the data.gov.uk initiative.

Here’s a screencast showing the app:

The first step was to get the right data. I was lucky enough to be invited to the initial “hack day” and pre-release of the data and looked around for something to mash up. Initially I wanted to do something with environmental data but I found a lot of it to be very old. Therefore I just did a search for “2009” at data.gov.uk and found that the house prices data from 1996 to now in England and Wales is available. The plan was set. This was it:

  • I wanted to build an interface to show this information that was very fast, very portable and show a nice map of the area next to the numbers.
  • I wanted to build this as a web app and as an application for the Yahoo homepage (as I needed to build one as a demo anyways)
  • Traffic and speed was the most important issue – as this might get huge.

Cleaning and converting data

I got the spreadsheet and was confronted with my old nemesis: Excel. After saving the sheet as CSV and spending some fun time regular expressions and split() I had the data in a cleaner, and more usable version (JSON, specifically). One fun part is that when there was no data available for a certain area the field was either “..”, “n/a” or just empty. Something to work around. The numbers were also formatted like 100,312 which is nice on the eye but needs un-doing when you want to sort them outside Excel.

Once I had the list of locations and their numbers I wanted to turn them into geographical locations to display maps of the area. For this I used Yahoo Placemaker, especially the YQL table (see an example for Rugby in the YQL console). This is the script I ran over the list of locations:


$out = ‘’;
for($i=0;$i $select = preg_replace(‘/,.*/’,’‘,$lines[$i]);
$select = preg_replace(‘/ UA/’,’‘,$select);
$url = ‘http://query.yahooapis.com/v1/public/yql?q=select%20match.place.woeId%2Cmatch.place.centroid%20from%20geo.placemaker%20where%20documentContent%20%3D%20%22’.urlencode($select.’,uk’).’%22%20AND%20documentType%3D%22text%2Fplain%22%20and%20appid%20%3D%20%22%22%20limit%201&format=json&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys’;
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$output = curl_exec($ch);
curl_close($ch);
$data = json_decode($output);
echo ‘{“place”:”’.$select.’”,’;
echo ‘”w”:”’.$data->query->results->matches->match->place->woeId.’”,’;
echo ‘”lat”:”’.$data->query->results->matches->match->place->centroid->latitude.’”,’;
echo ‘”lon”:”’.$data->query->results->matches->match->place->centroid->longitude.’”’.”},n”;
;

}

That was that – I had a data set I can work with.

Adding more information

The next thing I wanted to add was some more information about the area which meant using maps. As both Yahoo and Google maps have static map versions but are rate limited I wondered if there is a free version of that. And there is. Openstreetmap was the answer, especially the somewhat unofficial API I found with Google. To play safe, I wrote a script that gets the images and I cache it on my server to avoid killing this API.

I also wanted to show currently available houses in the area in case you are looking to buy. For this the natural choice for me was to use Nestoria as they also have an open YQL table (see the Nestoria table in the YQL console). So I used YQL and sorted the results by date:

select * from nestoria.search where place_name="Rugby" | sort(field='updated_in_days')

Using this I can get offers in the area live:

$url = ‘http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20nestoria.search%20where%20place_name%3D%22’.urlencode($city).’%22%20|%20sort%28field%3D%27updated_in_days%27%29&format=json&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&diagnostics=false’;
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$output = curl_exec($ch);
curl_close($ch);
$data = json_decode($output);
if($data->query->results){
$i=0;
$results = array_slice($data->query->results->listings,0,5);
if(sizeof($results)>0){
echo ‘

Current property listings (powered by Nestoria)

    ‘;
    foreach($results as $r){
    echo ‘
  • lister_url).’”>‘.($r->title).’‘;
    echo ‘

    Price: ‘.($r->price_formatted).’, Type of property: ‘.ucfirst($r->property_type).’, Updated: ‘.($r->updated_in_days_formatted).’ (‘.($r->updated_in_days).’ days)

    ‘;
    echo ‘

    Listed at: ‘.($r->datasource_name).’ by ‘.($r->lister_name).’.

    ‘;
    echo ‘
  • ‘;
    }

    echo ‘

‘;
}

}

Finding a charting solution

Adding interactive charts was the next step. I had a few issues with that:

  • While Google charts are full of win, they are rate-limited and I didn’t want to pull images. As the app was also meant to become a Yahoo application every image would have to be run through Caja for safety reasons which slowed it down.
  • Canvas and Flash solutions like YUI charts or Raphael were also not possible because of the performance of the YAP app.

So I wrote my own pure CSS bar charts to work around that issue.

Building the API

I put all these solutions together and built a small API that will give me the search results with three parameters: the location as an id and the start and end of the time range.

http://uk-house-prices.com/graphs.php?loc=1&start=10&end=20

Building the interface

To build the interface, I went all-out YUI. I took the YUI grids builder to create the main layout, the AutoComplete demo, the dual slider demo and the button and put them all together. Add an Ajax call to the form, and you are done. OK, I admit, there was quite a bit of cleaning up to be done :)

Notice that I am using progressive enhancement all the way. Without JavaScript you get dropdowns:

UK House Prices - without JavaScript by  you.

That’s it

The next thing I had to do is move the app over to the Yahoo Application Platform which was easy as I based it on an API - but this is another blog post :)

Loading external content with Ajax using jQuery and YQL

Sunday, January 10th, 2010

Let’s solve the problem of loading external content (on other domains) with Ajax in jQuery. All the code you see here is available on GitHub and can be seen on this demo page so no need to copy and paste!

OK, Ajax with jQuery is very easy to do – like most solutions it is a few lines:

$(document).ready(function(){
  $('.ajaxtrigger').click(function(){
    $('#target').load('ajaxcontent.html');
  });
});

Check out this simple and obtrusive Ajax demo to see what it does.

This will turn all elements with the class of ajaxtrigger into triggers to load “ajaxcontent.html” and display its contents in the element with the ID target.

This is terrible, as it most of the time means that people will use pointless links like “click me” with # as the href, but this is not the problem for today. I am working on a larger article with all the goodies about Ajax usability and accessibility.

However, to make this more re-usable we could do the following:

$(document).ready(function(){
  $('.ajaxtrigger').click(function(){
    $('#target').load($(this).attr('href'));
    return false;
  });
});

You can then use load some content to load the content and you make the whole thing re-usable.

Check out this more reusable Ajax demo to see what it does.

The issue I wanted to find a nice solution for is the one that happens when you click on the second link in the demo: loading external files fails as Ajax doesn’t allow for cross-domain loading of content. This means that see my portfolio will fail to load the Ajax content and fail silently at that. You can click the link until you are blue in the face but nothing happens. A dirty hack to avoid this is just allowing the browser to load the document if somebody really tries to load an external link.

Check out this allowing external links to be followed to see what it does.

$(document).ready(function(){
  $('.ajaxtrigger').click(function(){
    var url = $(this).attr('href');
    if(url.match('^http')){
      return true;
    } else {
      $('#target').load(url);
      return false;
    }
  });
});

Proxying with PHP

If you look around the web you will find the solution in most of the cases to be PHP proxy scripts (or any other language). Something using cURL could be for example proxy.php:

<?php
$url = $_GET['url'];
$ch = curl_init(); 
curl_setopt($ch, CURLOPT_URL, $url); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
$output = curl_exec($ch); 
curl_close($ch);
echo $content;
?>

People then could use this with a slightly changed script (using a proxy):

$(document).ready(function(){
  $('.ajaxtrigger').click(function(){
    var url = $(this).attr('href');
    if(url.match('^http')){
      url = 'proxy.php?url=' + url;
    }
    $('#target').load(url);
    return false;
  });
});

It is also a spectacularly stupid idea to have a proxy script like that. The reason is that without filtering people can use this to load any document of your server and display it in the page (simply use firebug to rename the link to show anything on your server), they can use it to inject a mass-mailer script into your document or simply use this to redirect to any other web resource and make it look like your server was the one that sent it. It is spammer’s heaven.

Use a white-listing and filtering proxy!

So if you want to use a proxy, make sure to white-list the allowed URIs. Furthermore it is a good plan to get rid of everything but the body of the other HTML document. Another good idea is to filter out scripts. This prevents display glitches and scripts you don’t want executed on your site to get executed.

Something like this:

<?php
$url = $_GET['url'];
$allowedurls = array(
  'http://developer.yahoo.com',
  'http://icant.co.uk'
);
if(in_array($url,$allowedurls)){
  $ch = curl_init(); 
  curl_setopt($ch, CURLOPT_URL, $url); 
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
  $output = curl_exec($ch); 
  curl_close($ch);
  $content = preg_replace('/.*<body[^>]*>/msi','',$output);
  $content = preg_replace('/<\/body>.*/msi','',$content);
  $content = preg_replace('/<?\/body[^>]*>/msi','',$content);
  $content = preg_replace('/[\r|\n]+/msi','',$content);
  $content = preg_replace('/<--[\S\s]*?-->/msi','',$content);
  $content = preg_replace('/<noscript[^>]*>[\S\s]*?'.
                          '<\/noscript>/msi',
                          '',$content);
  $content = preg_replace('/<script[^>]*>[\S\s]*?<\/script>/msi',
                          '',$content);
  $content = preg_replace('/<script.*\/>/msi','',$content);
  echo $content;
} else {
  echo 'Error: URL not allowed to load here.';
}
?>

Pure JavaScript solution using YQL

But what if you have no server access or you want to stay in JavaScript? Not to worry – it can be done. YQL allows you to load any HTML document and get it back in JSON. As jQuery has a nice interface to load JSON, this can be used together to achieve what we want to.

Getting HTML from YQL is as easy as using:

select * from html where url="http://icant.co.uk"

YQL does a few things extra for us:

  • It loads the HTML document and sanitizes it
  • It runs the HTML document through HTML Tidy to remove things .NETnasty frameworks considered markup.
  • It caches the HTML for a while
  • It only returns the body content of the HTML - so no styling (other than inline styles) will get through.

As output formats you can choose XML or JSON. If you define a callback parameter for JSON you get JSON-P with all the HTML as a JavaScript Object – not fun to re-assemble:

foo({
  "query":{
  "count":"1",
  "created":"2010-01-10T07:51:43Z",
  "lang":"en-US",
  "updated":"2010-01-10T07:51:43Z",
  "uri":"http://query.yahoo[...whatever...]k%22",
  "results":{
    "body":{
      "div":{
        "id":"doc2",
        "div":[{"id":"hd",
          "h1":"icant.co.uk - everything Christian Heilmann"
        },
        {"id":"bd",
        "div":[
        {"div":[{"h2":"About this and me","
        [... and so on...]
}}}}}}}});

When you define a callback with the XML output you get a function call with the HTML data as string in an Array – much easier:

foo({
  "query":{
  "count":"1",
  "created":"2010-01-10T07:47:40Z",
  "lang":"en-US",
  "updated":"2010-01-10T07:47:40Z",
  "uri":"http://query.y[...who cares...]%22"},
  "results":[
    "<body>\n    <div id=\"doc2\">\n<div id=\"hd\">\n 
     <h1>icant.co.uk - \n
     everything Christian Heilmann<\/h1>\n 
      ... and so on ..."
  ]
});

Using jQuery’s getJSON() method and accessing the YQL endpoint this is easy to implement:

$.getJSON("http://query.yahooapis.com/v1/public/yql?"+
          "q=select%20*%20from%20html%20where%20url%3D%22"+
          encodeURIComponent(url)+
          "%22&format=xml'&callback=?",
  function(data){
    if(data.results[0]){
      var data = filterData(data.results[0]);
      container.html(data);
    } else {
      var errormsg = '<p>Error: can't load the page.</p>';
      container.html(errormsg);
    }
  }
);

Putting it all together you have a cross-domain Ajax solution with jQuery and YQL:

$(document).ready(function(){
  var container = $('#target');
  $('.ajaxtrigger').click(function(){
    doAjax($(this).attr('href'));
    return false;
  });
  function doAjax(url){
    // if it is an external URI
    if(url.match('^http')){
      // call YQL
      $.getJSON("http://query.yahooapis.com/v1/public/yql?"+
                "q=select%20*%20from%20html%20where%20url%3D%22"+
                encodeURIComponent(url)+
                "%22&format=xml'&callback=?",
        // this function gets the data from the successful 
        // JSON-P call
        function(data){
          // if there is data, filter it and render it out
          if(data.results[0]){
            var data = filterData(data.results[0]);
            container.html(data);
          // otherwise tell the world that something went wrong
          } else {
            var errormsg = '<p>Error: can't load the page.</p>';
            container.html(errormsg);
          }
        }
      );
    // if it is not an external URI, use Ajax load()
    } else {
      $('#target').load(url);
    }
  }
  // filter out some nasties
  function filterData(data){
    data = data.replace(/<?\/body[^>]*>/g,'');
    data = data.replace(/[\r|\n]+/g,'');
    data = data.replace(/<--[\S\s]*?-->/g,'');
    data = data.replace(/<noscript[^>]*>[\S\s]*?<\/noscript>/g,'');
    data = data.replace(/<script[^>]*>[\S\s]*?<\/script>/g,'');
    data = data.replace(/<script.*\/>/,'');
    return data;
  }
});

This is rough and ready of course. A real Ajax solution should also consider timeout and not found scenarios. Check out the full version with loading indicators, error handling and yellow fade for inspiration.