yql | Christian Heilmann

Posts Tagged ‘yql’

Frontloaded and zipped up – the Full Frontal 2009 keynote

Saturday, November 21st, 2009

Here are the slides, the audio recording and my notes for the keynote of the full frontal conference held yesterday in Brighton, England. It was a blast, thank you Remy and Julie!

Slides on Slideshare

Frontloaded and zipped up – the full frontal keynote

Audio recording

You can get the recording of the talk over at archive.org – recorded on my macbook, so there are some volume fluctuations.

Talk description

The following was the description of the talk introducing the ideas to the attendees of full frontal.

Frontloaded and zipped up – do loose types sink ships?

JavaScript had a bumpy ride up to now, from its origins as a CGI-replacement, initiator of countless popups and annoying effects over the renaissance as Ajax enabler up to becoming wrapped up in libraries to work around the hell that is browser differences. With the ubiquity of JavaScript comes a new challenge. How do we keep JavaScript safe when browsers don’t really distinguish between different sources and give them all the same rights? Why do we still judge the usefulness of JavaScript by how badly browsers speak it? Learn about some environments you can use JavaScript in securely and marvel at the magic and annoyances that are technologies that try to put a lock on the issue of JavaScript security.

A quick trip down memory lane.

When I first encountered JavaScript it was mainly used to do simple calculators, window manipulation and simple form validation. The main interface used was the browser object model with window being the main object and form and element being the collections to manipulate. You added content either by changing the value of a form field or by using document.write() with the latter being different from browser to browser. The other thing you had was the images array and this is what we used extensively to create rollovers.

Event handling was done with on{event} inline handlers and the body always had an onload handler on it.

Bring on the bling!

That however did not stop us from already abusing JavaScript to create pointless bells and whistles. Status bar tickers, title changing scripts and moving popup windows were the first to annoy the end user and they were just the start.

More bling.

With browsers starting to allow you to manipulate more of the document (via document.all and document.layers) and new and bespoke CSS extensions we had even more options to do very annoying and pointless things. Animated menus, rainbow cycling scrollbars, the floating (and flickering) Geocities logo, mousetrails and other abominations were built to bling up our sites and subsequently the audience got sick of JavaScript and discarded it as a toy.

Ajax for the win!

This all changed when Ajax came around and there was no way not to have some way or another you load content on demand using XMLhttpRequest – if you wanted to have a cool web site that is. And of course people used it wrongly.

Security scares.

As people used JavaScript to load information that should not be visible to the world and it is easy to intercept and see everything that happens in a browser in JavaScript we have more and more security scares coming up.

Is JavaScript a security problem?

This bears the question if JavaScript in itself is a security problem and if we should discard it at all.

Security flaws start at the backend but JavaScript gets the blame.

Last week I came across an interesting survey by the security company Cenzic – get the PDF here. They looked at the state of the web and the main security problems in the first two quarters of 2009. The survey showed that the browser was responsible for only 8% of the overall security issues.

One thing that is interesting is that most security flaws start with a problem on the backend but get blamed on JavaScript. XSS is a backend problem, but it becomes a problem as JavaScript is designed to give scripts too many rights.

JavaScript implementation vs. JavaScript

The problem is not JavaScript itself – well, not exclusively – it is mostly the implementation of it in browsers. And funnily enough this is how we measure the quality of the language. It is like judging the quality of a book by its movie.

Browsers don’t care where JavaScript comes from.

To a browser, every JavaScript has the same rights to the content of the page and other things JavaScript can reach – and that includes cookies. When I can steal your cookies I can steal your users’ identities and this is a big security issue.

Browsers are full of security holes.

The other issue is that browsers are full of security faults. This can be interesting as people complain about IE6 and its flaws, but the survey actually ranked Firefox and Safari as the most vulnerable browsers. The reasons are plugins in the case of Firefox and – in Safari’s case – the iPhone. Interesting targets are always successful platforms.

Plugins have and still are a main source for security issues. Especially in the case of IE Flash and PDF display was always a problem. The reason is simple – plugins extend the reach of the browser into the file system and that is an interesting attack vector. So if you offer PDF documents and you want to keep your system secure it might be a good idea to loop them through a script that sets a header that forces user download – this also allows you to add statistics to the PDF downloads.

So we can’t use JavaScript, right?

Which brings a lot of people not to trust JavaScript at all and see it as the source of all evil. Plugins like NoScript are all the rage and the security-conscious are happy to call JavaScript the source of all evil.

It is about spreading the joy of JavaScript.

JavaScript is an amazingly useful part of the interfaces we give our end users. Totally turning it off or not using it means we give up on a lot of things that our users should get and expect from an interface in 2009. I like that I can write a message while an attachment uploads in the background.

Learning JavaScript

The first thing to remember is that this is not 1997. We don’t have to learn JavaScript by looking at other people’s source code. Opera’s web standards curriculum and The Yahoo video theatre are great resources to take your first steps into the JavaScript world.

What to use JavaScript for

The main thing is to remember what we should use JavaScript for:

slicker interfaces (autocomplete, asynchronous uploading)
warning users about flawed entries (password strength for example)
extending the interface options of HTML to become an application language (sliders, maps, comboboxes…)
Any visual effect that cannot be done safely with CSS (animation, menus…)

CSS has come a long way but unless you can control the animation and be sure it works cross-browser it is not a replacement. Menu systems using CSS only are a gimmick as they cannot be made keyboard accessible.

What not to use JavaScript for

Sensitive information (credit card numbers, any real user data)
Cookie handling containing session data
Trying to protect content (right-click scripts, email obfuscation)
Replacing your server / saving on server traffic without a fallback

What if you need more?

All this becomes an issue when you get into developing large web products where you push the envelope of what can be done with the web and the technologies right now. The new Yahoo homepage is one of these examples – in it we wanted to allow third party developers to build own applications and run them safely inside ours without endangering the privacy of our users.

You can limit yourself

One thing you can do is to limit yourself to the “safe” parts of a language. Douglas Crockford’s AdSafe takes this approach and is meant as a guideline for ad providers.

You can pre-process JavaScript

The other option is to enforce the limitation of the language by pre-processing JavaScript and converting it to a safer subset. The main tool for this nowadays is Caja which has been invented by Google and now made workable by Google and Yahoo for the Open Social platform. Caja converts JavaScript to a safe subset – either on the client or on the server.

Things Caja doesn’t allow you to do

To ensure the security of our applications, Caja stops you from using some things you might have gotten accustomed to using in the last few years.

Caja and HTML

Here are the things you cannot use in HTML:

name attributes
custom attributes
custom tags
unclosed tags
embed
iframe
link rel=”...”
javascript:void(0)
radio buttons in IE
relative URLs

Caja and JavaScript

Things you need to keep out of your JavaScript:

eval()
new Function()
strings as event handlers (node.onclick = ‘...’;)
names ending with double / triple underscores
with function (with (obj) { ... })
implicit global variables (specify var variable)
calling a method as a function
document.write
window.event
ajax requests returning JS

Caja and CSS

And last but not least things deemed dangerous in CSS are:

star hacks
underscore hacks
IE conditionals
Insert-after clear fix
expression()
*@import

Caja ready code examples

You can find a good collection of Caja ready code examples in the Yahoo Application Platform documentation.

Caja problems and making it easier

Whilst Caja is a great idea to ensure the security of widgets it is not without its problems. If you chose client-side conversion it means a massive dent in the performance of your application and even with server-side conversion it becomes harder to build new systems. For starters, Caja-converted code is very hard to read and therefore debug and in many cases it means that as a developer you need to change your ways.

Libraries and Caja compliance

Much like we fix browsers, we can also use libraries to make our Caja-compliant development easier. The first library to be fully Caja compliant is the Yahoo User Interface library and other libraries like jQuery have also shown interest in compliance.

Abstracting the issue with an own language – YML

The other way to make it easier to write secure code is to abstract most of th changes to our normal development ways out into an own markup language. Facebook had done this and in Yahoo’s case there is the Yahoo Markup language or short YML. Using this language in a widget for the Yahoo homepage you can do Ajax requests and dig into the Yahoo social graph without having to write any JavaScript or server-side code.

Extending browsers

Another interesting way to make JavaScript development more interesting is to think about browser extensions. This starts with GreaseMonkey which allows Firefox users to extend any web site out there with new functionality using a few lines of Dom Scripting – a great way for example to do quick prototyping. Google Gears, Yahoo Browser Plus and and Mozilla Jetpack kick this idea up a notch and give you new APIs to extend the reach of the browser into local storage, allow for database access in JavaScript and give you worker threads to do heavy computations without slowing down the main interface. These extensions give browsers the power we would love to have to be able to deliver real applications inside browsers.

Moving out of the browser

The other thing you can do with JavaScript these days is to move outside the browser and take your HTML, CSS and JavaScript solutions to other platforms.

Widget frameworks

Widget frameworks have been around for a while with Konfabulator and Apple Dashboard widgets leading the way. Opera also allows you to run small applications outside the confines of a browser window. The interesting thing about widgets is that they always looked much prettier than most web solutions – mainly because PNG support was a given and not something you had to hack for MSIE.

W3C widgets

W3C widgets are a standard that allows you to zip up an HTML document with CSS, JavaScript and images and run it as a self-contained widget. Peter-Paul Koch has written a great introduction to W3C widgets and several mobile phone providers (first and foremost Vodafone) offer a way to run these widgets on handsets without the need to learn any mobile OS language or tools.

Adobe Air

Adobe Air has made it possible for web developers to write full-blown installable applications that run across several operating systems and have access to databases and the file system. Probably the most successful apps are Twitter clients and music apps like Spotify.

Command line JavaScript – Rhino

If you don’t like all the fancy visual stuff and you want to use JavaScript to do some heavy data conversion you can use JavaScript on the command line using Rhino which is a Java implementation of JavaScript. The really cool thing about writing JavaScript for the command line is that it supports all the features of the language and you are not at the mercy of a browser to do it right.

Turning JavaScript Mashups into web services.

One rather new opportunity for developers is that you can use YQL or Yahoo Query Language to easily mash-up and filter data from several data sources on the web. YQL allows you to:

mashup data with a SQL-style syntax
filter down to the absolutely necessary data
return as XML, JSON, JSON-P and JSON-P-X
use Yahoo as a high-speed proxy to retrieve data from various sources.
use Yahoo as a rate limiting and caching proxy when providing data.

Retrieving data from an HTML document and choosing the right output format

Using YQL it is dead easy for example to retrieve the headlines from an HTML document with the following statement.

select * from html where url=”http://2009.fullfrontal.org” and xpath=”//h3”

YQL is a web service in itself and you can retrieve the data returned from this request in different formats.

XML returns the data as an XML file which is not that useful in a JavaScript environment.
JSON is natively supported and therefore much easier to parse.
JSON-P wraps the returned JSON object in a JavaScript function call and thereby makes it very easy to use in a script node (either hardcoded or created on the fly).
JSON-P-X wraps the returned JSON object in a JavaScript function call and returns the XML content (in this case the scraped HTML) as a string. This makes it very easy to use innerHTML to render the data in a browser without having to loop through the JSON object and re-assemble the string.

Retrieving photos for a certain geographical location

As a demo, try this out. In order to retrieve photos for a certain geographical location you can use the geo and Flickr APIs in a single YQL statement:

select farm,id,secret,owner.realname,server,title,urls.url.content
from flickr.photos.info where photo_id in(
select id from flickr.photos.search where woe_id in(
select woeid from geo.places where text=”london”
)

)

Try it out in your browser to see the resulting data.

Using a few lines of DOM scripting you can turn this into a nice web site showing these photos.

Moving JavaScript solutions into YQL to turn them into web services

The problem with the solution above is that you make yourself dependent on JavaScript to show these photos. If you want to still use JavaScript but allow users without it to see these photos you can use a YQL open table with embedded JavaScript to do the conversion. YQL uses Rhino to run and execute your JavaScript server-side and returns you the content you created inside an XML or JSON file. As JavaScript is executed on the server, you have full E4X support to make the use of XML painless and you can use advanced JavaScript like for each:

var amt = amount || 10;

var query = ‘select farm,id,secret,owner.realname,server,title,’+

‘urls.url.content from flickr.photos.info where ‘+

‘photo_id in (select id from flickr.photos.search(‘+

amount + ‘) where ‘;

if(location!==null){

query += ‘woe_id in (select woeid from geo.places where text=”’ +

location+’”) and ‘;

}
	query += ’ text=”’ + text + ‘” and license=4)’

var x = y.query(query);

var out = 
;

for each(var cur in x.results.photo){

var li = ;

var a = ;

a.@[“href”] = cur.urls.url;

var img = ;

var url = ‘http://farm’ + cur.@farm + ‘.static.flickr.com/’ +

cur.@server + ‘/’+cur.@id + ‘_’ + cur.@secret +

‘_s.jpg’;

img.@[“src”] = url;

img.@[“alt”] = cur.title;

a.img = img;

li.a = a;

out.li += li;

}
	response.object = out;

This, embedded in an open table means you can retrieve photos from Flickr as a UL now using the following YQL statement:

select * from flickr.photolist where text=”me” and location=”uk” and amount=20

You can then display the photos returned with PHP:


$url = ‘http://query.yahooapis.com/v1/public/yql?q=use%20%22http://github.com/codepo8/yql-tables/raw/master/flickr/flickr.photolist.xml%22%20as%20flickr;%20select%20*%20from%20flickr%20where%20text=%22me%22%20and%20location=%22uk%22%20and%20amount=20&format=xml&diagnostics=false’;

$ch = curl_init();

curl_setopt($ch, CURLOPT_URL, $url);

curl_setopt($ch, CURLOPT_RETRNTRANSFER, 1);

$output = curl_exec($ch);

curl_close($ch);

$output = preg_replace(‘/.*/’,’‘,$output);

$output = preg_replace(‘/
.*/’,’
‘,$output);

$output = preg_replace(‘//’,’‘,$output);

$output = preg_replace(‘//’,’‘,$output);

echo $output;

?>

Or with a very simple JavaScript, thanks to the JSON-P-X output format:

Another example – scraping HTML from web pages that need POST data

Another powerful example of what you can do with JavaScript when you embed it into a YQL table is the following:





Christian Heilmann
HTML pages that need post data

select * from {table} where

url=’http://isithackday.com/hacks/htmlpost/index.php’

and postdata=”foo=foo&bar=bar” and xpath=”//p”]]>
http://www.wait-till-i.com/2009/11/16/using-yql-to-read-html-from-a-document-that-requires-post-data/

As explained in detail in this blog post this JavaScript extends the HTML scraping option of YQL to allow for POST data to be sent to a document before retrieving the HTML:

select * from htmlpost where

url=’http://isithackday.com/hacks/htmlpost/index.php’

and postdata=”foo=foo&bar=bar” and xpath=”//p”

Notice that YQL execute gives you full REST and HTTP support and has the xpath conversion built-in as a on own function.

oAuth in JavaScript – the netflix example

Another interesting example is the open table provided by Netflix, which shows you how you can use oAuth in JavaScript:

// Include the OAuth libraries from oauth.net

y.include(“http://oauth.googlecode.com/svn/code/javascript/oauth.js”);

y.include(“http://oauth.googlecode.com/svn/code/javascript/sha1.js”);

	// Collect all the parameters

var encodedurl = request.url;

var accessor = { consumerSecret: cks, tokenSecret: “”};

var message = { action: encodedurl, method: “GET”, parameters: [[“oauth_consumer_key”,ck],[“oauth_version”,”1.0”]]};

OAuth.setTimestampAndNonce(message);

	// Sign the request

OAuth.SignatureMethod.sign(message, accessor);

	try {

// get the content from service along with the OAuth header, and return the result back out

response.object = request.contentType(‘application/xml’).header(“Authorization”, OAuth.getAuthorizationHeader(“netflix.com”, message.parameters)).get().response;

} catch(err) {

response.object = {‘result’:’failure’, ‘error’: err};

}

Liberating our JavaScript

As you can see switching environments liberates our JavaScript solutions and gives us much tighter security. So open your minds and don’t judge JavaScript by its implementation. Instead have fun with it and use it wisely. With great power comes great responsibility.

Tags: fullfrontal09, fullfrontalconf, javascript, keynote, rhino, slides, yql
Posted in General | 6 Comments »

Using YQL to read HTML from a document that requires POST data

Monday, November 16th, 2009

YQL is a very cool tool to extract data from HTML documents on the web. Let’s face facts: HTML is a terrible data format as far too many documents out there are either broken, have a wrong encoding or simply are not structured the way they should be. Therefore it can be quite a mess to try to read a HTML document and then find what you were looking for using regular expressions or tools that expect XML compatible HTML documents. Python fans will know about beautiful soup for example that does quite a good job working around most of these issues.

Using YQL you can however use a simple web service to extract data from HTML documents. As an added bonus, the YQL engine will remove falsely encoded characters and run the data retrieved through HTML Tidy to get valid HTML back. For example to get the body content of CNN.com all you’d need to do is a:

select * from HTML where url="http://cnn.com"

The really cool thing about YQL is that it allows you to XPATH to filter down the data you want to extract. For example to get all the links from cnn.com you can use:

select * from html where xpath="//a" and url="http://cnn.com"

If you only want to have the text content of the links you can do the following:

select content from html where xpath="//a" and url="http://cnn.com"

You could use this for example to translate links using the Google translation API:

select * from google.translate where q in (
  select content from html where url="http://cnn.com" and xpath="//a"
) and target="fr"

Now, the other day my esteemed colleague Dirk Ginader came up with a bit of a brain teaser for me. His question was what to do when the HTML document you try to get needs POST data sent to it for it to render properly? You can append GET parameters to the URL, but not POST so the normal HTML document is not enough.

The good news is that YQL allows you to extend it in many ways, one of them is using an execute block in an open table to convert data with JavaScript on the server. The JavaScript has full e4x support and allows you to do any HTTP request. So the first step to solve Dirk’s dilemma was to write a demo page (the form was added to test it out):

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">
<html>
  <title>Test for HTML POST table</title>
 
<body>
  <p>Below this should be a "yay!" when 
    the right POST data was submitted.</p>
<?php if(isset($_POST['foo']) && isset($_POST['bar'])){
  echo "<p>yay!</p>";
}?>
<form action="index.php" method="post" accept-charset="utf-8">
  <input type="text" name="foo" value="is">
  <input type="text" name="bar" value="set">
  <input type="submit" value="Continue &rarr;">
</form>
  </body>
</html>

The next step was to write an open table for YQL that does the necessary request and transformations.

<?xml version="1.0" encoding="UTF-8"?>
<table xmlns="http://query.yahooapis.com/v1/schema/table.xsd">
  <meta>
  <author>Christian Heilmann</author>
  <description>HTML pages that need post data</description>
  <sampleQuery><![CDATA[
select * from {table} where
url='http://isithackday.com/hacks/htmlpost/index.php' 
and postdata="foo=foo&bar=bar" and xpath="//p"]]></sampleQuery>
  <documentationURL></documentationURL>
  </meta>
  <bindings>
    <select itemPath="" produces="XML">
    <urls>
      <url>{url}</url>
    </urls>
    <inputs>
      <key id="url" type="xs:string" required="true" paramType="variable"/>
      <key id="postdata" type="xs:string" required="true" paramType="variable"/>
      <key id="xpath" type="xs:string" required="true" paramType="variable"/>
    </inputs>
    <execute>
    <![CDATA[
      var myRequest = y.rest(url);  
      var data = myRequest.accept('text/html').
                 contentType("application/x-www-form-urlencoded").
                 post(postdata).response;
      var xdata = y.xpath(data,xpath);
      response.object = <postresult>{xdata}</postresult>;
    ]]>
    </execute>
  </select> 
  </bindings>
</table>

Using this, you can now send POST data to any HTML document (unless its robots.txt blocks the YQL server or it needs authentication) and get the HTML content back. To make it work, you define the table using the “use” command:

use "http://isithackday.com/hacks/htmlpost/htmlpost.xml" as htmlpost;
select * from htmlpost where
url='http://isithackday.com/hacks/htmlpost/index.php'
and postdata="foo=foo&bar=bar" and xpath="//p"

You can try this example in the console.

I’ve also added the table to the open YQL tables repository on github so it should show up sooner or later in the console.

Here’s a quick explanation what is going on:

<?xml version="1.0" encoding="UTF-8"?>
<table xmlns="http://query.yahooapis.com/v1/schema/table.xsd">
  <meta>
  <author>Christian Heilmann</author>
  <description>HTML pages that need post data</description>
  <sampleQuery><![CDATA[
select * from {table} where
url='http://isithackday.com/hacks/htmlpost/index.php' 
and postdata="foo=foo&bar=bar" and xpath="//p"]]></sampleQuery>
  <documentationURL></documentationURL>
  </meta>

You define the schema and add meta data like the author, a description and a sample query. The latter is really important as that will show up in the YQL console when people click the table. You should normally also provide a documentation URL, but this post wasn’t written when I wrote the table so I kept it empty.

  <bindings>
    <select itemPath="" produces="XML">
    <urls>
      <url>{url}</url>
    </urls>

The bindings of the table describe the real API data endpoints the table points to. You have select, insert, update and delete – much like any other database. You provide an itemPath to cut down on the data returned and tell YQL if the data returned is XML or JSON.

    <inputs>
      <key id="url" type="xs:string" required="true" paramType="variable"/>
      <key id="postdata" type="xs:string" required="true" paramType="variable"/>
      <key id="xpath" type="xs:string" required="true" paramType="variable"/>
    </inputs>

The inputs section defines what variables are expected, if they are required and what their IDs are. These IDs will be available for you as variables in the embedded JavaScript block and are normally defined by the API your table points to.

    <execute>
    <![CDATA[
      var myRequest = y.rest(url);  
      var data = myRequest.accept('text/html').
                 contentType("application/x-www-form-urlencoded").
                 post(postdata).response;
      var xdata = y.xpath(data,xpath);
      response.object = <postresult>{xdata}</postresult>;
    ]]>
    </execute>

Here comes the JavaScript magic inside the execute block. The y.rest(url) command sends a REST query to the URL. in the easiest form this would just mean to get the data back but in our case we need to define a few more things. We expect html back so we set the request accept header to text/html. This also ensures that the result is run through HTML Tidy before it is returned. The content type has to be like a form submission and we need to send the string postdata as a post request. The response then contains whatever our request brings back.

As we want to have the handy functionality of the original HTML table, we also need to do an xpath transformation which is done with the method of the same name.

Any JavaScript in the execute block needs to define a response.object which will become the result of the YQL query. As you can see, the E4X support of YQL allows you to simply write XML blocks without any DOM pains and you can embed any JavaScript variables inside curly braces.

  </select> 
  </bindings>
</table>

And we’re done. Using YQL execute you can move a lot of JavaScript that does complex transformations to the Yahoo server farm without slowing down your end user’s computers. And you have a secure environment to boot as there are no DOM vulnerabilities.

Tags: executetables, HTML, opentables, scraping, serverside javascript, yql
Posted in General | 11 Comments »

A Speakerrate comments badge or another reason to love Twitter

Tuesday, November 10th, 2009

Another reason to love Twitter: You ask for APIs and you get them. As requested in my Tweet, speaker rate acted very quickly and build an API
for latest comments – and here is my thank you. Using a bit of YQL magic and some JavaScript I put together a badge to show off the latest speakerrate comments on your page:

Check out the demo page of the badge and get the source and read the docs on GitHub.

Usage:

Simply add a DIV with a link to your speaker page on speakerrate:



Rate me on Speakerrate

Then add the script and call the init() method:

You can also provide a few options to the init() method to change the look and features of the badge:

If you need different styles, just use styled:false or override the ones applied.

Tags: api, badge, javascript, speakerrate, yql
Posted in General | Comments Off on A Speakerrate comments badge or another reason to love Twitter

Of Hamsters, Feature Creatures and Missed Opportunities – my talk at Fronteers 2009

Thursday, November 5th, 2009

The outline

In this session Chris Heilmann will show how some of the traits we have as developers keep us from evolving and our market from maturing. Instead of picking and choosing from already amazing products we tend to want to build and maintain everything ourselves – and fail at keeping up the quality leaving both security holes and carwrecks on the information superhighway behind. There is another way and Chris will show just how much more fun we can have with the web if we use systems that are available for us for free, remix the web with simple meta-mixing systems and liberate our developer minds from the barriers technology throws at us. If you always thought that there has to be more to our jobs than fixing bugs, this session is for you.

Slides

Fronteers 2009 Of Hamsters, Feature Creatures and Missed Opportunities

The audio recording

You can download the MP3 of the talk at archive.org – 42MB.

Following are the notes which will resemble what I am talking about…

Remember, remember the 5th of November…

First of all, how cool is it to have a conference on Guy Fawkes? Whoever can recite the “V” alliteration later on flawlessly I will buy a beer for.

Hamstering

Here’s a thing that happened in my garden in London the other day. My friend keeps putting bird feed in the garden to, well, feed the birds. She also wonders why she keeps having weeds in her garden. Well, the reason is this: a squirrel keeps coming and takes the bird feed and buries it a meter away from the bird feeder (video of the theft). This is stupid as we keep replenishing the bird feed anyways. However in the mind of the squirrel it is a safe idea to keep a cache. Luckily we humans are cleverer, right?

No. One very human trait we have is that we want to collect things – especially when they are free. If you followed my exploits this year you’d have seen that I travelled – a lot. What I found myself doing is collecting both inflight sock and toothpaste packs and little soap/shampoo/body lotion bottles. I am amazed just how many of these things I accumulated without ever needing them.

This is exactly what we do with knowledge and code examples. I have about 20GB of e-books on all kind of technical subjects. All of these I haven’t looked at for years as most of the topics are outdated. However, we accumulate knowledge and think we do a great job once we found the solution to end all solutions. That doesn’t work any longer. We should not fill up our head with useless information about how some browser fails to do one thing or another. This secured our jobs during the first .com crash but it is time to let go.

Amazing resources

Nowadays we have amazing resources for beginners to learn what we are doing. The times of copy and paste and then try to understand someone else’s code is over. We should not put random stuff on the web. Both the Opera Web Standards Curriculum and the WaSP interact curriculum needs your input.

Evolving the web

Our job as developers is to evolve the web to something more than it is now. However, we are not really concentrating on that. There are a few things that are in our way.

Specialising without caring about the consequences

One thing is that we tend to become specialists in a certain area and forget about the others. Web development is a task of working together with other specialists to build amazing interfaces that make a change in people’s lives. If you are not passionate about a certain aspect of it, don’t do that job – instead partner with somebody who is passionate about it.

Fanboys.

The amount of energy wasted every day fighting each other on forums, mailing lists and other resources about which technology is best and what library to use pains and bores me. It is not helpful and it doesn’t make any sense to try to make the world fit your tool of choice.

Be a librarian

Good developers should be like librarians. You don’t need to know everything, but you need to know where to find the right information at the right time.

Use libraries

And for that, use libraries like Dojo, jQuery, YUI, mootools or whatever to work around the browser problems so that you can concentrate on what you want to build rather than how it fails.

Libraries fix browsers

Our main enemy to build cool products are browsers. The technologies and standards that drive the web are dead easy – we just don’t have time using them as we spend most of our time testing and wondering why browsers don’t implement them in a predictable manner. This is where libraries come in and my plea here is to use them. Using libraries you can write predictable applications built on a base that can be altered for all the browsers to come – if you build towards the browsers in use today you build for the past. We’ve done that before and paid dearly for it.

Use don’t abuse libraries

Libraries give you an amazing amount of power to do quick solutions. Be aware that when you use them though, you use them wisely. Do not mix HTML, CSS and JavaScript in the same script. This creates unmaintainable solutions and we had years and years of those.

The feature creature

The first time I heard about the feature creature was in 2004 here in Amsterdam at the PHP conference. I loved the image of every developer having a little creature on their shoulder that keeps telling us what we should be doing. Most of the time the feature creature tells us that whatever is out there is not nearly as good as the thing we can come up with. It also keeps telling us that whatever we do needs more features. This leads to something I call the feature loop.

The feature loop

As developers we are driven to writing simple, elegant and clever solutions for complex issues. Then we do the right thing and release those to the whole world. Then we do the wrong thing by listening to everybody who wants to have an extra feature and support for some very esoteric edge case. We add all of these extensions and build something complex again that another developer might find and simplify again.

Breaking the loop

The only way to break this vicious circle is by broadening our horizons and being cleverly lazy. I talked about this before in detail at the web expo in Prague. In essence what I am saying is that a lot of what we think is our job to do has been done for us already. And now I am going to show you some of these things.

Missed opportunities

The thing is that by building on tested and working solutions we can build faster, more secure and working solutions. While we think that we need to squeeze the last out of everything we do ourselves very successful web sites are being built in other countries where developers are very pragmatic of what has to be done. Questioning an out-of-the box solution is a good safety measure but you also need to draw the line at trying to replace them. Instead, why not add to the already existing system – all of them are open source!

The other thing we keep forgetting as web developers is that the technologies we are experts in right now reach much further than the browser world.

Our technologies are hot right now

Mobile development, W3C mobile widgets, iPhone development, TV widgets and applications running inside Facebook, Yahoo or iGoogle are all based on CSS, HTML and JavaScript and run in much more defined and easier to use environments. We can take our knowledge and finally work without the sword of damocles of bad browser support hanging over our head.

Changing the idea of what the web is

The web is not web sites, it is not even code. Deep down all of it is about data and information. The Yahoo homepage for example moved from a directory of bookmarks to a portal full of third party content to a starting point for users where they can mix and match content of the web to their liking. This goes as far as now allowing people to update Facebook from the Yahoo homepage or you to write an application that is shown on it.

Building with components

Interfaces like that are only possible when you move away from the old idea of having a single page but you think of small components in the interface that could hold whatever the end user pleases. This is were we run into issues with the web standards, but that is another talk.

Based on APIs

All the data to remix and show is available for us on the web already – offered as APIs and listed on sites like Programmable Web.

Kobayashi Maru

Looking at the sheer amount of APIs – over 1500 right now – and knowing full well that all of them were built by developers with the feature creature on their shoulder it seems impossible to use a lot of them at the same time as you need to read up on all the documentation, get developer keys, learn the authentication and so on. We have that same problem internally at Yahoo, which is why we tackled it.

YQL is there for you

YQL is a meta language for the web that over a single, simple REST service turns the whole web into a giant database for you to explore and take part in. Let’s see some examples:

Photos from Flickr?

select * from flickr.photos.search where text = “cat”

Headlines of the New York Times?

select * from nyt.article.search where query=”spontaneous combustion”

Latest commits on a GitHub repo?

select * from github.repo.commits where id=’yql’ and repo=’yql-tables’

Latest headlines of the Volkskrant?

select title from html where url=”http://www.volkskrant.nl/” and xpath=”//h2[@title]”

Latest “English” headlines of the Volkskrant?

select * from google.translate where q in (select title from html where url=”http://www.volkskrant.nl/” and xpath=”//h2[@title]”) and target=’en’

Updating Twitter?

use ‘http://www.yqlblog.net/samples/twitter.status.xml’;

insert into twitter.status

(status,username,password)

values (

“In your fronteers, blowing ur mindz”,

“codepo8”,

“didyoureallythinkIshowit?”

)

Updating WordPress?

insert into wordpress.post

(title, description, blogurl, username,

password)

values

(“Test Title”, “This is a test body”,

“http://ajaxian.com”, “codepo8”, “iedoesitright”)

Updating the internet?

delete from internet where user_agent=”MSIE” and version < 8;

– Not yet, sorry

Search one term in Google, Bing and Yahoo?

select * from query.multi where queries=’

select * from microsoft.bing where query=”Jimmy Hoffa” and source in (“web”);

select * from google.search where q = “Jimmy Hoffa”;

select * from search.web where query = “Jimmy Hoffa”

‘

You can do all of this in the REST API

https://query.yahooapis.com/v1/public/yql?q={uri-encoded-query}&

format={xml|json}&

diagnostics={true|false}&

callback={function}&

env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys

Things you can do in YQL

You can mix and match whatever you want.
You can sort and limit.
You can extend with your own data tables by writing a schema and putting it on GitHub.
You can store information on the YQL servers (the cloud, the cloud)

Where’s the catch?

10k hits an hour (20k with oAuth).
100k hits a day.
Data is cached for 15 minutes (soon customisable).
Execution limit on the server is 30 seconds (for now).

That’s all!

An example – loading photos for a location from Flickr

Find Amsterdam

select * from geo.places where text=”amsterdam”

Define Amsterdam and get photos taken there

select * from flickr.photos.search where woe_id in (

select woeid from geo.places where text=”amsterdam”

)

Get all the information about these photos

select * from flickr.photos.info where photo_id in (

select id from flickr.photos.search where woe_id in (

select woeid from geo.places where text=”amsterdam”

)
	)

Get only what we need

select
farm,id,secret,owner.realname,server,title,urls.url.content
from flickr.photos.info where photo_id in (
select id from flickr.photos.search where woe_id in (
select woeid from geo.places where text=”amsterdam”
)

)

Select format JSON, define a callback and copy and paste the URL.

http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20flickr.photos.info%20where%20photo_id%20in%20(select%20id%20from%20flickr.photos.search%20where%20woe_id%20in%20(select%20woeid%20from%20geo.places%20where%20text%3D%22amsterdam%22))&format=json&diagnostics=false&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=flickr

Copy into a script src and write some DOM scripting to display

And you have a mashup!

Well, to a degree. You can see that it pulls photos from Flickr and shows them on a page.

You also have FAIL

As much as we love it, JavaScript is not reliably available in browsers and cannot be trusted for providing very important content. Or can it?

Moving JavaScript solutions server-side with YQL execute

Open data tables providing YQL with data can contain JavaScript that will be executed on the server to convert information. That way you can take the script just shown and make it a server-side conversion. The benefit of this is that you are not bound by client security flaws and you have full language support, including E4X for XML. The earlier script will thus become the following:

var amt = amount || 10;

var query = ‘select farm,id,secret,owner.realname,server,title,’+

‘urls.url.content from flickr.photos.info where ‘+

‘photo_id in (select id from flickr.photos.search(‘+

amount + ‘) where ‘;

if(location!==null){

query += ‘woe_id in (select woeid from geo.places where text=”’ +

location+’”) and ‘;

}
	query += ’ text=”’ + text + ‘” and license=4)’

var x = y.query(query);

var out = 
;

for each(var cur in x.results.photo){

var li = ;

var a = ;

a.@[“href”] = cur.urls.url;

var img = ;

var url = ‘http://farm’ + cur.@farm + ‘.static.flickr.com/’ +

cur.@server + ‘/’+cur.@id + ‘_’ + cur.@secret +

‘_s.jpg’;

img.@[“src”] = url;

img.@[“alt”] = cur.title;

a.img = img;

li.a = a;

out.li += li;

}
	response.object = out;

This, embedded in an open table means you can retrieve photos from Flickr as a UL now using the following YQL statement:

use “http://github.com/codepo8/yql-tables/raw/master/flickr/flickr.photolist.xml” as flickr;

select * from flickr where text=”me” and location=”uk” and amount=20

You can then display the photos returned either with PHP:


$url = ‘http://query.yahooapis.com/v1/public/yql?q=use%20%22http://github.com/codepo8/yql-tables/raw/master/flickr/flickr.photolist.xml%22%20as%20flickr;%20select%20*%20from%20flickr%20where%20text=%22me%22%20and%20location=%22uk%22%20and%20amount=20&format=xml&diagnostics=false’;

$ch = curl_init();

curl_setopt($ch, CURLOPT_URL, $url);

curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

$output = curl_exec($ch);

curl_close($ch);

$output = preg_replace(‘/.*/’,’‘,$output);

$output = preg_replace(‘/
.*/’,’
‘,$output);

$output = preg_replace(‘//’,’‘,$output);

$output = preg_replace(‘//’,’‘,$output);

echo $output;

?>

Or you can of course use JavaScript:

Be part of the movement

If you have other practical solutions like that, please write them and add them to the GitHub repository so they can become part of the interface.

Read the YQL documentation

Other than that you can read the YQL docs, there’s a YQL Blog and a Forum available. See you there.

Be clever, not busy

I think it is time we find solutions like libraries and YQL for our problems rather than doing everything by hand. Spend your time on making the web fun for humans, not on pleasing machines. If you are clever, you are allowed to be lazy. Replacing the web would be tough, so use it well.

Tags: conference, fronteers, fronteers09, fronteers2009, talk, web of data, yql
Posted in General | 9 Comments »

Getting a list of Flickr photos by location and/or search term with a YQL open table

Monday, November 2nd, 2009

Displaying photos from Flickr can be daunting. The API needs authentication and the RSS, JSON or LOLcode output is very limited. The way around is using YQL and its Flickr tables. That way it is pretty easy to search Flickr:

select * from flickr.photos.search where text=”panda”

Try out the Panda search in the YQL console.

The output format has a lot of information in there, but sadly enough, not all. For example the real name of the owner or the description is missing. Therefore you need to go through yet another Flickr API to get the whole data set:

select * from flickr.photos.info where photo_id in(

select id from flickr.photos.search where text=”panda”

)

Guess what? You can also try this more detailed query in the console.

I’ve shown before how easy it is to display Flickr Photos retrieved that way:

The issue with that though is that it uses JavaScript and JavaScript may be turned off (think Blackberries). Of course you can do the same thing in PHP but I’d wager to say more people to JavaScript than PHP these days.

The main issue is that Flickr returns the photos in a pretty weird format and that you need a script like the one above to turn it into a simple HTML list.

The good news is that YQL with the execute command allows you to embed JavaScript in your open tables. That way you can write a table that does all the necessary transformations and returns the data as a simple list for immediate use:





select * from {table} where location=”london,UK”
Christian Heilmann
http://www.wait-till-i.com/2009/11/01/getting-a-list-of-flickr-photos-by-location-andor-search-term-with-a-yql-open-table
Searches Flickr by location and/or search term and returns an HTML list that can be immediately used in a mashup.

You’ll notice that while the E4X support is very powerful, it can be a bit confusing to look at on first sight. Once you got your head around though it becomes much cleaner that way.

You can use this table like any other open table via the use command in YQL:

use “http://github.com/codepo8/yql-tables/raw/master/flickr/flickr.photolist.xml” as flickr;

select * from flickr where text=”me” and location=”uk” and amount=20

try it in the console.

I’ve wrapped one more API in there – the Yahoo Geo API to determine a place from a name should you want to search by location. All in all you have three parameters in this open table – all of which are optional:

text – the search text
location – the geographical location
amount – the amount of photos to be returned

If you look at the table source, you can also see that I hard-wired the license of the photos to 4 which is CC-BY. So if you link the photos back to Flickr you both satisfied Flickr’s terms and the original photographer’s.

Now, the easiest way to use this output is by using YQL’s JSON-P-X output format. This is XML with a callback which returns a JSON object with the HTML as a string instead of a convoluted JSON object. See the JSON-P-X output here.

That way you can easily use it in JavaScript:

And also in PHP:


$url = ‘http://query.yahooapis.com/v1/public/yql?q=use%20%22http://github.com/codepo8/yql-tables/raw/master/flickr/flickr.photolist.xml%22%20as%20flickr;%20select%20*%20from%20flickr%20where%20text=%22me%22%20and%20location=%22uk%22%20and%20amount=20&format=xml&diagnostics=false’;

$ch = curl_init();

curl_setopt($ch, CURLOPT_URL, $url);

curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

$output = curl_exec($ch);

curl_close($ch);

$output = preg_replace(‘/.*/’,’‘,$output);

$output = preg_replace(‘/
.*/’,’
‘,$output);

$output = preg_replace(‘//’,’‘,$output);

$output = preg_replace(‘//’,’‘,$output);

echo $output;

?>

You can see both in action on the demo page.

So by using YQL open tables you can not only access complex APIs with YQL, but you could also write complete mashups in JavaScript and have them executed in a safe environment on Yahoo’s servers. Your end of the mashup is simply the display which could be a form that works with Ajax when JavaScript is available and renders a static page in PHP (or whatever other server-side language) when JavaScript is turned off. You only need to do one HTTP request – the rest is executed and cached on the YQL server farm – everybody wins.

Tags: ap, api, flickr, javascript, mashups, opentables, yql
Posted in General | Comments Off on Getting a list of Flickr photos by location and/or search term with a YQL open table