Christian Heilmann

Archive for November, 2009

Turning a web folder with data into an API using YQL execute

Monday, November 30th, 2009

Yesterday Johan Bouveng challenged me on Facebook with another brain teaser. Johan had found a nice resource for weather forecasts for airports. The data came back in TAF or METAR formats and he already had a parser for this (thank f*** – what a mess these formats). Now, what he wanted to have is an API to get the weather forecast data from the following resources:

Using YQL and YQL execute this was pretty easy. All I had to do was to write an open table that reads the correct file.

// check if taf or metar was requested or return an error
if(datatype == 'taf' || datatype == 'metar'){
  var returnobj,url;

  // get the correct url using the airport ID and format
  if(datatype == 'metar'){
    url = '' +
          'metar/stations/' + airportid + '.TXT';
  } else {
    url = '' +
          'taf/stations/' + airportid + '.TXT';
  }

  // do a REST call and get the response back
  var out = y.rest(url).get().response;

  // if there is no data returned, return an error.
  // (the E4X element names error and airport are assumed)
  if(out == ''){
    returnobj = <error>Airport {airportid} not found.</error>;

  // otherwise return the data in the TXT file
  } else {
    returnobj = <airport>{out}</airport>;
  }

} else {
  // error condition for wrong datatype
  returnobj = <error>Datatype must be either taf or metar.</error>;
}

// give back the data to YQL
response.object = returnobj;

Having done this you can now use it as a table in YQL:

use "" as aw;
select * from aw where airportid="AAXX" and datatype="taf";

As you can see, you don’t have to be a genius to build your own API :)

Want to work for Yahoo? We’re looking for web developers in the UK and the US.

Sunday, November 29th, 2009

A new year is coming, which always means new opportunities. If you think Yahoo is a good place to work and you want to help us build the biggest web sites on the internet, here’s your chance. Requirements are:

  • Hand-coded (X)HTML, CSS, and JavaScript
  • Solid knowledge of standards-based, accessible, cross-browser web development
  • Basic PHP or other front-end language (e.g. Python, Perl, Ruby) programming skills
  • Experience in developing web applications with rich client interfaces using AJAX, drag and drop, and other DOM Scripting techniques.
  • Experience with JavaScript libraries such as jQuery, Prototype, and especially the YUI
  • Experience developing functionality/applications by assembling existing code modules
  • User-level experience with *IX-style command line (BSD/Linux)
  • Experience using version control systems such as CVS & Subversion, including branching and merging
  • Experience with bug tracking software

The full job specification for the UK is available online and the US openings are also available.

The US jobs are located in Sunnyvale, California and the UK ones in Covent Garden, London, bang in the centre of theatreland and a stone’s throw from Soho.

You can also send me your CV or contact me on Twitter: @codepo8

Frontloaded and zipped up – the Full Frontal 2009 keynote

Saturday, November 21st, 2009

Here are the slides, the audio recording and my notes for the keynote of the Full Frontal conference held yesterday in Brighton, England. It was a blast, thank you Remy and Julie!

Slides on Slideshare

Audio recording

The recording of the talk is available online. It was recorded on my MacBook, so there are some volume fluctuations.

Talk description

The following was the description of the talk introducing the ideas to the attendees of Full Frontal.

Frontloaded and zipped up – do loose types sink ships?

JavaScript had a bumpy ride up to now, from its origins as a CGI-replacement, initiator of countless popups and annoying effects over the renaissance as Ajax enabler up to becoming wrapped up in libraries to work around the hell that is browser differences. With the ubiquity of JavaScript comes a new challenge. How do we keep JavaScript safe when browsers don’t really distinguish between different sources and give them all the same rights? Why do we still judge the usefulness of JavaScript by how badly browsers speak it? Learn about some environments you can use JavaScript in securely and marvel at the magic and annoyances that are technologies that try to put a lock on the issue of JavaScript security.

A quick trip down memory lane.

When I first encountered JavaScript it was mainly used to do simple calculators, window manipulation and simple form validation. The main interface used was the browser object model with window being the main object and form and element being the collections to manipulate. You added content either by changing the value of a form field or by using document.write() with the latter being different from browser to browser. The other thing you had was the images array and this is what we used extensively to create rollovers.

Event handling was done with on{event} inline handlers and the body always had an onload handler on it.

Bring on the bling!

That however did not stop us from already abusing JavaScript to create pointless bells and whistles. Status bar tickers, title changing scripts and moving popup windows were the first to annoy the end user and they were just the start.

More bling.

With browsers starting to allow you to manipulate more of the document (via document.all and document.layers) and new and bespoke CSS extensions we had even more options to do very annoying and pointless things. Animated menus, rainbow cycling scrollbars, the floating (and flickering) Geocities logo, mousetrails and other abominations were built to bling up our sites and subsequently the audience got sick of JavaScript and discarded it as a toy.

Ajax for the win!

This all changed when Ajax came around. If you wanted to have a cool web site, there was no way around loading content on demand in some way or another using XMLHttpRequest. And of course people used it wrongly.

Security scares.

People used JavaScript to load information that should not be visible to the world, and it is easy to intercept and see everything that happens in a browser in JavaScript. As a result, we have more and more security scares coming up.

Is JavaScript a security problem?

This raises the question of whether JavaScript in itself is a security problem and whether we should discard it altogether.

Security flaws start at the backend but JavaScript gets the blame.

Last week I came across an interesting survey by the security company Cenzic (available as a PDF). They looked at the state of the web and the main security problems in the first two quarters of 2009. The survey showed that the browser was responsible for only 8% of the overall security issues.

One thing that is interesting is that most security flaws start with a problem on the backend but get blamed on JavaScript. XSS is a backend problem, but it becomes a problem as JavaScript is designed to give scripts too many rights.

JavaScript implementation vs. JavaScript

The problem is not JavaScript itself – well, not exclusively – it is mostly the implementation of it in browsers. And funnily enough this is how we measure the quality of the language. It is like judging the quality of a book by its movie.

Browsers don’t care where JavaScript comes from.

To a browser, every JavaScript has the same rights to the content of the page and other things JavaScript can reach – and that includes cookies. When I can steal your cookies I can steal your users’ identities and this is a big security issue.

Browsers are full of security holes.

The other issue is that browsers are full of security faults. This can be interesting as people complain about IE6 and its flaws, but the survey actually ranked Firefox and Safari as the most vulnerable browsers. The reasons are plugins in the case of Firefox and – in Safari’s case – the iPhone. Interesting targets are always successful platforms.

Plugins have been, and still are, a main source of security issues. Especially in the case of IE, Flash and PDF display were always a problem. The reason is simple – plugins extend the reach of the browser into the file system and that is an interesting attack vector. So if you offer PDF documents and you want to keep your system secure it might be a good idea to loop them through a script that sets a header that forces a user download – this also allows you to add statistics to the PDF downloads.
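The download-forcing script can be sketched in a few lines. This is only a minimal sketch, assuming a JavaScript server environment such as Node.js; the helper name downloadHeaders and the file name are made up for illustration. The important part is the Content-Disposition: attachment header, which asks the browser to save the file rather than hand it to a plugin.

```javascript
// Hypothetical helper: builds response headers that ask the browser to
// download a PDF instead of rendering it inline with a plugin.
function downloadHeaders(filename) {
  // Replace anything that could break out of the quoted filename value.
  var safeName = String(filename).replace(/[^A-Za-z0-9._-]/g, '_');
  return {
    'Content-Type': 'application/pdf',
    'Content-Disposition': 'attachment; filename="' + safeName + '"',
    // Ask browsers not to second-guess the declared content type.
    'X-Content-Type-Options': 'nosniff'
  };
}

// A server would send these headers before streaming the file body.
var headers = downloadHeaders('annual report.pdf');
console.log(headers['Content-Disposition']);
```

Because every download passes through this one function, it is also a natural place to count requests for statistics.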

So we can’t use JavaScript, right?

This leads a lot of people to distrust JavaScript altogether. Plugins like NoScript are all the rage and the security-conscious are happy to call JavaScript the source of all evil.

It is about spreading the joy of JavaScript.

JavaScript is an amazingly useful part of the interfaces we give our end users. Totally turning it off or not using it means we give up on a lot of things that our users should get and expect from an interface in 2009. I like that I can write a message while an attachment uploads in the background.

Learning JavaScript

The first thing to remember is that this is not 1997. We don’t have to learn JavaScript by looking at other people’s source code. Opera’s web standards curriculum and The Yahoo video theatre are great resources to take your first steps into the JavaScript world.

What to use JavaScript for

The main thing is to remember what we should use JavaScript for:

  • slicker interfaces (autocomplete, asynchronous uploading)
  • warning users about flawed entries (password strength for example)
  • extending the interface options of HTML to become an application language (sliders, maps, comboboxes…)
  • Any visual effect that cannot be done safely with CSS (animation, menus…)

CSS has come a long way but unless you can control the animation and be sure it works cross-browser it is not a replacement. Menu systems using CSS only are a gimmick as they cannot be made keyboard accessible.

What not to use JavaScript for

  • Sensitive information (credit card numbers, any real user data)
  • Cookie handling containing session data
  • Trying to protect content (right-click scripts, email obfuscation)
  • Replacing your server / saving on server traffic without a fallback
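For the session cookie point, one common approach is to set the cookie on the server with the HttpOnly flag so that scripts running in the page cannot read it. The following is a minimal sketch under that assumption; sessionCookie is a hypothetical helper and the cookie name is made up.

```javascript
// Hypothetical helper: builds a Set-Cookie header value for a session ID.
// HttpOnly hides the cookie from document.cookie; Secure limits it to HTTPS.
function sessionCookie(sessionId) {
  return 'session=' + encodeURIComponent(sessionId) +
         '; Path=/; HttpOnly; Secure';
}

console.log(sessionCookie('abc 123'));
```

The session data itself stays on the server; the browser only ever holds the opaque ID.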

What if you need more?

All this becomes an issue when you get into developing large web products where you push the envelope of what can be done with the web and the technologies right now. The new Yahoo homepage is one of these examples – in it we wanted to allow third party developers to build their own applications and run them safely inside ours without endangering the privacy of our users.

You can limit yourself

One thing you can do is to limit yourself to the “safe” parts of a language. Douglas Crockford’s ADsafe takes this approach and is meant as a guideline for ad providers.

You can pre-process JavaScript

The other option is to enforce the limitation of the language by pre-processing JavaScript and converting it to a safer subset. The main tool for this nowadays is Caja, which was invented by Google and has since been made workable by Google and Yahoo for the OpenSocial platform. Caja converts JavaScript to a safe subset – either on the client or on the server.

Things Caja doesn’t allow you to do

To ensure the security of our applications, Caja stops you from using some things you might have gotten accustomed to using in the last few years.

Caja and HTML

Here are the things you cannot use in HTML:

  • name attributes
  • custom attributes
  • custom tags
  • unclosed tags
  • embed
  • iframe
  • link rel="..."
  • javascript:void(0)
  • radio buttons in IE
  • relative URLs

Caja and JavaScript

Things you need to keep out of your JavaScript:

  • eval()
  • new Function()
  • strings as event handlers (node.onclick = '...';)
  • names ending with double / triple underscores
  • the with statement (with (obj) { ... })
  • implicit global variables (specify var variable)
  • calling a method as a function
  • document.write
  • window.event
  • ajax requests returning JS
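Most of these restrictions have a straightforward alternative. The snippet below is a sketch of the kind of rewriting involved, not a guarantee that Caja accepts every line. A plain object stands in for a DOM node so that it runs outside a browser, and all names are made up for illustration.

```javascript
// Instead of eval() on a JSON string, parse it.
var data = JSON.parse('{"city": "London", "photos": 3}');

// Instead of a string event handler (node.onclick = '...'),
// assign a function. A plain object stands in for a DOM node here.
var node = { onclick: null };
var clicks = 0;
node.onclick = function () {
  clicks += 1;
  return false;
};

// Instead of with (obj) { ... }, use a short local alias.
var d = data;
var summary = d.city + ': ' + d.photos + ' photos';

// Instead of implicit globals, declare every variable with var.
var total = 0;
for (var i = 0; i < d.photos; i += 1) {
  total += i;
}

// Simulate one click on the stand-in node.
node.onclick();
console.log(summary, clicks, total);
```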

Caja and CSS

And last but not least things deemed dangerous in CSS are:

  • star hacks
  • underscore hacks
  • IE conditionals
  • Insert-after clear fix
  • expression()
  • @import

Caja ready code examples

You can find a good collection of Caja ready code examples in the Yahoo Application Platform documentation.

Caja problems and making it easier

Whilst Caja is a great idea to ensure the security of widgets, it is not without its problems. If you choose client-side conversion it means a massive dent in the performance of your application and even with server-side conversion it becomes harder to build new systems. For starters, Caja-converted code is very hard to read and therefore debug and in many cases it means that as a developer you need to change your ways.

Libraries and Caja compliance

Much like we fix browsers, we can also use libraries to make our Caja-compliant development easier. The first library to be fully Caja compliant is the Yahoo User Interface library and other libraries like jQuery have also shown interest in compliance.

Abstracting the issue with a dedicated language – YML

The other way to make it easier to write secure code is to abstract most of the changes to our normal development ways out into a dedicated markup language. Facebook has done this and in Yahoo’s case there is the Yahoo Markup Language, or YML for short. Using this language in a widget for the Yahoo homepage you can do Ajax requests and dig into the Yahoo social graph without having to write any JavaScript or server-side code.

Extending browsers

Another interesting way to make JavaScript development more interesting is to think about browser extensions. This starts with Greasemonkey, which allows Firefox users to extend any web site out there with new functionality using a few lines of DOM scripting – a great way, for example, to do quick prototyping. Google Gears, Yahoo Browser Plus and Mozilla Jetpack kick this idea up a notch and give you new APIs to extend the reach of the browser into local storage, allow for database access in JavaScript and give you worker threads to do heavy computations without slowing down the main interface. These extensions give browsers the power we would love to have to be able to deliver real applications inside browsers.

Moving out of the browser

The other thing you can do with JavaScript these days is to move outside the browser and take your HTML, CSS and JavaScript solutions to other platforms.

Widget frameworks

Widget frameworks have been around for a while with Konfabulator and Apple Dashboard widgets leading the way. Opera also allows you to run small applications outside the confines of a browser window. The interesting thing about widgets is that they always looked much prettier than most web solutions – mainly because PNG support was a given and not something you had to hack for MSIE.

W3C widgets

W3C widgets are a standard that allows you to zip up an HTML document with CSS, JavaScript and images and run it as a self-contained widget. Peter-Paul Koch has written a great introduction to W3C widgets and several mobile phone providers (first and foremost Vodafone) offer a way to run these widgets on handsets without the need to learn any mobile OS language or tools.

Adobe Air

Adobe Air has made it possible for web developers to write full-blown installable applications that run across several operating systems and have access to databases and the file system. Probably the most successful apps are Twitter clients and music apps like Spotify.

Command line JavaScript – Rhino

If you don’t like all the fancy visual stuff and you want to use JavaScript to do some heavy data conversion you can use JavaScript on the command line using Rhino which is a Java implementation of JavaScript. The really cool thing about writing JavaScript for the command line is that it supports all the features of the language and you are not at the mercy of a browser to do it right.

Turning JavaScript Mashups into web services.

One rather new opportunity for developers is that you can use YQL or Yahoo Query Language to easily mash-up and filter data from several data sources on the web. YQL allows you to:

  • mashup data with a SQL-style syntax
  • filter down to the absolutely necessary data
  • return as XML, JSON, JSON-P and JSON-P-X
  • use Yahoo as a high-speed proxy to retrieve data from various sources.
  • use Yahoo as a rate limiting and caching proxy when providing data.

Retrieving data from an HTML document and choosing the right output format

Using YQL it is dead easy for example to retrieve the headlines from an HTML document with the following statement.

select * from html where url="" and xpath="//h3"

YQL is a web service in itself and you can retrieve the data returned from this request in different formats.

  • XML returns the data as an XML file which is not that useful in a JavaScript environment.
  • JSON is natively supported and therefore much easier to parse.
  • JSON-P wraps the returned JSON object in a JavaScript function call and thereby makes it very easy to use in a script node (either hardcoded or created on the fly).
  • JSON-P-X wraps the returned JSON object in a JavaScript function call and returns the XML content (in this case the scraped HTML) as a string. This makes it very easy to use innerHTML to render the data in a browser without having to loop through the JSON object and re-assemble the string.
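Choosing between these formats comes down to two request parameters. The sketch below builds a request URL for a YQL statement. The endpoint address and the parameter names q, format and callback are stated here as assumptions about the public YQL service, and yqlUrl and the example page URL are made up for illustration.

```javascript
// Assumed endpoint of the public YQL web service.
var YQL_ENDPOINT = 'http://query.yahooapis.com/v1/public/yql';

// Builds a request URL for a YQL statement.
// format:   'xml' or 'json'
// callback: optional function name; with 'json' this gives JSON-P,
//           with 'xml' it gives JSON-P-X.
function yqlUrl(statement, format, callback) {
  var url = YQL_ENDPOINT +
            '?q=' + encodeURIComponent(statement) +
            '&format=' + encodeURIComponent(format);
  if (callback) {
    url += '&callback=' + encodeURIComponent(callback);
  }
  return url;
}

// The page URL is a placeholder.
var statement = 'select * from html where url="http://example.com" and xpath="//h3"';
console.log(yqlUrl(statement, 'xml', 'render'));
```

The resulting URL can be used as the src of a script node, at which point the function named in callback is called with the data.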

Retrieving photos for a certain geographical location

As a demo, try this out. In order to retrieve photos for a certain geographical location you can use the geo and Flickr APIs in a single YQL statement:

select farm,id,secret,owner.realname,server,title,urls.url.content
from flickr.photos.info where photo_id in(
select id from flickr.photos.search where woe_id in(
select woeid from geo.places where text="london"
)
)

Try it out in your browser to see the resulting data.

Using a few lines of DOM scripting you can turn this into a nice web site showing these photos.
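A sketch of that scripting step could look like the following. It assumes each photo record carries the fields selected in the statement (farm, server, id, secret, title and urls.url.content) and that Flickr image addresses follow the farm/server/id_secret pattern used here. The function names and the sample record are made up for illustration.

```javascript
// Escapes text for safe use in HTML content and attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Turns photo records into a UL of linked thumbnails.
// Assumed record shape: farm, server, id, secret, title and urls.url.content.
function photoList(photos) {
  var items = [];
  for (var i = 0; i < photos.length; i += 1) {
    var p = photos[i];
    // Assumed Flickr image URL pattern; "_s" requests the small square size.
    var src = 'http://farm' + p.farm + '.static.flickr.com/' +
              p.server + '/' + p.id + '_' + p.secret + '_s.jpg';
    items.push('<li><a href="' + escapeHtml(p.urls.url.content) + '">' +
               '<img src="' + escapeHtml(src) + '" alt="' +
               escapeHtml(p.title) + '"></a></li>');
  }
  return '<ul>' + items.join('') + '</ul>';
}

// Made-up sample record for illustration.
var sample = [{
  farm: 3, server: '2500', id: '123', secret: 'abc',
  title: 'Tower "Bridge"',
  urls: { url: { content: 'http://www.flickr.com/photos/someone/123/' } }
}];
console.log(photoList(sample));
// In a browser: document.getElementById('photos').innerHTML = photoList(photos);
```

Building a string and assigning it once is usually quicker to write than creating every element by hand, at the price of having to escape the data yourself.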

Moving JavaScript solutions into YQL to turn them into web services

The problem with the solution above is that you make yourself dependent on JavaScript to show these photos. If you want to still use JavaScript but allow users without it to see these photos you can use a YQL open table with embedded JavaScript to do the conversion. YQL uses Rhino to run and execute your JavaScript server-side and returns you the content you created inside an XML or JSON file. As JavaScript is executed on the server, you have full E4X support to make the use of XML painless and you can use advanced JavaScript like for each:

var amt = amount || 10;
var query = 'select farm,id,secret,owner.realname,server,title,' +
            'urls.url.content from flickr.photos.info where ' +
            'photo_id in (select id from flickr.photos.search(' +
            amt + ') where ';
query += 'woe_id in (select woeid from geo.places where text="' +
         location + '") and ';

query += ' text="' + text + '" and license=4)';
var x = y.query(query);
var out =

This, embedded in an open table means you can retrieve photos from Flickr as a UL now using the following YQL statement:

select * from flickr.photolist where text="me" and location="uk" and amount=20

You can then display the photos returned with PHP:

$url = ';%20select%20*%20from%20flickr%20where%20text=%22me%22%20and%20location=%22uk%22%20and%20amount=20&format=xml&diagnostics=false';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$output = curl_exec($ch);
$output = preg_replace(‘/.*
      $output = preg_replace(‘/
$output = preg_replace(‘//’,’‘,$output);
$output = preg_replace(‘//’,’‘,$output);
echo $output;

Or with a very simple JavaScript, thanks to the JSON-P-X output format:
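A minimal sketch of such a script follows. It assumes the JSON-P-X response is an object whose results property is an array of markup strings, and that the request named render as its callback; both the name and the element ID in the comment are made up for illustration.

```javascript
// Callback named in the request URL (callback=render, a hypothetical name).
// Assumed JSON-P-X response shape: { query: {...}, results: ["<ul>...</ul>"] }
function render(response) {
  var html = (response.results || []).join('');
  // In a browser you would write it into the page:
  // document.getElementById('photos').innerHTML = html;
  return html;
}

// Simulated response for illustration; the real one arrives via a script node.
var html = render({
  query: { count: 1 },
  results: ['<ul><li>photo one</li></ul>']
});
console.log(html);
```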

Another example – scraping HTML from web pages that need POST data

Another powerful example of what you can do with JavaScript when you embed it into a YQL table is the following:

<author>Christian Heilmann</author>
<description>HTML pages that need post data</description>
<sampleQuery><![CDATA[select * from {table} where
url="" and postdata="foo=foo&bar=bar" and xpath="//p"]]></sampleQuery>

As explained in detail in this blog post this JavaScript extends the HTML scraping option of YQL to allow for POST data to be sent to a document before retrieving the HTML:

select * from htmlpost where
url="" and postdata="foo=foo&bar=bar" and xpath="//p"

Notice that YQL execute gives you full REST and HTTP support and has the XPath conversion built in as its own function.

OAuth in JavaScript – the Netflix example

Another interesting example is the open table provided by Netflix, which shows you how you can use OAuth in JavaScript:

// Include the OAuth libraries from

// Collect all the parameters
var encodedurl = request.url;
var accessor = { consumerSecret: cks, tokenSecret: ""};
var message = { action: encodedurl, method: "GET", parameters: [["oauth_consumer_key",ck],["oauth_version","1.0"]]};

// Sign the request
OAuth.SignatureMethod.sign(message, accessor);

try {
// get the content from service along with the OAuth header, and return the result back out
response.object = request.contentType('application/xml').header("Authorization", OAuth.getAuthorizationHeader("", message.parameters)).get().response;
} catch(err) {
response.object = {'result':'failure', 'error': err};
}

Liberating our JavaScript

As you can see switching environments liberates our JavaScript solutions and gives us much tighter security. So open your minds and don’t judge JavaScript by its implementation. Instead have fun with it and use it wisely. With great power comes great responsibility.

Inviting UK companies to build an app for the Yahoo homepage. Come around next Thursday

Tuesday, November 17th, 2009

There is no rest for the wicked. Coming back from Japan and preparing for Friday’s Full Frontal conference I just spent an hour in one of the final meetings for the Yahoo Application Platform event next Thursday. As you might know, you can build applications that run on the Yahoo platform and can be installed by any of our users. We now run a competition for UK companies to build a cool branded app with the main prize being a guaranteed slot in the Yahoo UK homepage.

Sounds good? Come around next Thursday and see what it is all about. Here’s the official invite:

Technical Workshop for the Yahoo! Application Platform
The Yahoo! Application Platform allows developers to reach Yahoo!’s millions of users and improve the Yahoo! user experience by building and deploying sophisticated new applications for Yahoo! pages.
We’d like to invite you to a technical workshop about YAP led by Yahoo! developer evangelist Christian Heilmann.
It will give an overview of the technology, including some examples of applications that have already been developed, followed by a deep dive into the YAP platform. Please bring your laptop (and a 3g dongle if you have one just in case the venue wifi lets us down).
After some refreshments, there will also be an opportunity to get hands-on with YAP, as Chris steps you through creating a sample application.
The event will be held at Century (61-63 Shaftesbury Avenue, London, W1D 6LQ) on 26th November with registration starting at 6:30pm.
We’ll also be announcing a competition for the best apps, with the opportunity for a placement on the Yahoo! homepage (as well as a prize you can take home!) – more details on the night! We’re looking to you to help us build the next big thing on Yahoo!

If you want to be part of this and your company is located in the UK, send us an email and we’ll be in touch with you if there is still space.

Using YQL to read HTML from a document that requires POST data

Monday, November 16th, 2009

YQL is a very cool tool to extract data from HTML documents on the web. Let’s face facts: HTML is a terrible data format, as far too many documents out there are either broken, have a wrong encoding or simply are not structured the way they should be. Therefore it can be quite a mess to try to read an HTML document and then find what you were looking for using regular expressions or tools that expect XML compatible HTML documents. Python fans will know about Beautiful Soup, for example, which does quite a good job working around most of these issues.

Using YQL you can however use a simple web service to extract data from HTML documents. As an added bonus, the YQL engine will remove falsely encoded characters and run the data retrieved through HTML Tidy to get valid HTML back. For example, to get the body content of a page, all you need to do is:

select * from html where url=""

The really cool thing about YQL is that it allows you to use XPath to filter down the data you want to extract. For example, to get all the links from a page you can use:

select * from html where xpath="//a" and url=""

If you only want to have the text content of the links you can do the following:

select content from html where xpath="//a" and url=""

You could use this for example to translate links using the Google translation API:

select * from google.translate where q in (
  select content from html where url="" and xpath="//a"
) and target="fr"

Now, the other day my esteemed colleague Dirk Ginader came up with a bit of a brain teaser for me. His question was what to do when the HTML document you try to get needs POST data sent to it for it to render properly? You can append GET parameters to the URL, but not POST data, so the normal html table is not enough.

The good news is that YQL allows you to extend it in many ways. One of them is using an execute block in an open table to convert data with JavaScript on the server. The JavaScript has full E4X support and allows you to do any HTTP request. So the first step to solve Dirk’s dilemma was to write a demo page (the form was added to test it out):

  <title>Test for HTML POST table</title>
  <p>Below this should be a "yay!" when 
    the right POST data was submitted.</p>
<?php if(isset($_POST['foo']) && isset($_POST['bar'])){
  echo "<p>yay!</p>";
}?>
<form action="index.php" method="post" accept-charset="utf-8">
  <input type="text" name="foo" value="is">
  <input type="text" name="bar" value="set">
  <input type="submit" value="Continue &rarr;">
</form>

The next step was to write an open table for YQL that does the necessary request and transformations.

<?xml version="1.0" encoding="UTF-8"?>
<table xmlns="">
  <meta>
    <author>Christian Heilmann</author>
    <description>HTML pages that need post data</description>
    <sampleQuery><![CDATA[select * from {table} where
url="" and postdata="foo=foo&bar=bar" and xpath="//p"]]></sampleQuery>
  </meta>
  <bindings>
    <select itemPath="" produces="XML">
      <inputs>
        <key id="url" type="xs:string" required="true" paramType="variable"/>
        <key id="postdata" type="xs:string" required="true" paramType="variable"/>
        <key id="xpath" type="xs:string" required="true" paramType="variable"/>
      </inputs>
      <execute><![CDATA[
        var myRequest = y.rest(url);
        var data = myRequest.accept('text/html').
                             contentType('application/x-www-form-urlencoded').
                             post(postdata).response;
        var xdata = y.xpath(data,xpath);
        response.object = <postresult>{xdata}</postresult>;
      ]]></execute>
    </select>
  </bindings>
</table>

Using this, you can now send POST data to any HTML document (unless its robots.txt blocks the YQL server or it needs authentication) and get the HTML content back. To make it work, you define the table using the “use” command:

use "" as htmlpost;
select * from htmlpost where
url="" and postdata="foo=foo&bar=bar" and xpath="//p"

You can try this example in the console.

I’ve also added the table to the open YQL tables repository on github so it should show up sooner or later in the console.

Here’s a quick explanation of what is going on:

<?xml version="1.0" encoding="UTF-8"?>
<table xmlns="">
  <meta>
    <author>Christian Heilmann</author>
    <description>HTML pages that need post data</description>
    <sampleQuery><![CDATA[select * from {table} where
url="" and postdata="foo=foo&bar=bar" and xpath="//p"]]></sampleQuery>
  </meta>

You define the schema and add meta data like the author, a description and a sample query. The latter is really important as that will show up in the YQL console when people click the table. You should normally also provide a documentation URL, but this post wasn’t written when I wrote the table so I kept it empty.

    <select itemPath="" produces="XML">

The bindings of the table describe the real API data endpoints the table points to. You have select, insert, update and delete – much like any other database. You provide an itemPath to cut down on the data returned and tell YQL if the data returned is XML or JSON.

      <key id="url" type="xs:string" required="true" paramType="variable"/>
      <key id="postdata" type="xs:string" required="true" paramType="variable"/>
      <key id="xpath" type="xs:string" required="true" paramType="variable"/>

The inputs section defines what variables are expected, if they are required and what their IDs are. These IDs will be available for you as variables in the embedded JavaScript block and are normally defined by the API your table points to.

      var myRequest = y.rest(url);
      var data = myRequest.accept('text/html').
                           contentType('application/x-www-form-urlencoded').
                           post(postdata).response;
      var xdata = y.xpath(data,xpath);
      response.object = <postresult>{xdata}</postresult>;

Here comes the JavaScript magic inside the execute block. The y.rest() command sends a REST query to the URL. In the easiest form this would just mean to get the data back, but in our case we need to define a few more things. We expect HTML back so we set the request accept header to text/html. This also ensures that the result is run through HTML Tidy before it is returned. The content type has to be like a form submission and we need to send the string postdata as a POST request. The response then contains whatever our request brings back.
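The postdata string itself has to be in the key=value&key=value form that a form submission produces. If your values come from a script, a small helper can build it. This is a sketch with a made-up function name, and it encodes spaces as %20 rather than +.

```javascript
// Builds an application/x-www-form-urlencoded style string from a plain
// object, in the same key=value&key=value form as the postdata parameter.
function toPostData(fields) {
  var pairs = [];
  for (var name in fields) {
    if (Object.prototype.hasOwnProperty.call(fields, name)) {
      pairs.push(encodeURIComponent(name) + '=' +
                 encodeURIComponent(fields[name]));
    }
  }
  return pairs.join('&');
}

console.log(toPostData({ foo: 'foo', bar: 'bar' }));
console.log(toPostData({ q: 'a&b', note: 'yay!' }));
```

Encoding each name and value separately keeps an ampersand or equals sign inside a value from being read as a field separator.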

As we want to have the handy functionality of the original HTML table, we also need to do an xpath transformation which is done with the method of the same name.

Any JavaScript in the execute block needs to define a response.object which will become the result of the YQL query. As you can see, the E4X support of YQL allows you to simply write XML blocks without any DOM pains and you can embed any JavaScript variables inside curly braces.


And we’re done. Using YQL execute you can move a lot of JavaScript that does complex transformations to the Yahoo server farm without slowing down your end user’s computers. And you have a secure environment to boot as there are no DOM vulnerabilities.