Christian Heilmann

Posts Tagged ‘JSON’

Does API rate limiting spell the end of progressive enhancement?

Sunday, January 25th, 2009

Building TweetEffect taught me a few lessons and also pointed out some annoyances when building with third party APIs. Above all, I had to re-think and violate some of the best practices I’ve been advocating for years now.

First of all, TweetEffect was meant to be a demo for a university hack day and I didn’t quite plan for it to be a big success. Therefore I cobbled it together rather than planning the whole thing. What I wanted to build was a small tool that shows me my latest Twitter updates and analyzes the changes in follower numbers. I then mapped those to the updates that happened before the change to show which ones might have been the cause.

The TweetEffect wishlist

I had a few things I wanted to avoid, and one thing I wanted to achieve:

  • Users shouldn’t have to give me their Twitter login data – this is just wrong, no matter how you put it
  • I didn’t want to cache any data on my server, for the same reason and to avoid my DB getting hammered (this blog runs on the same one :-))
  • I wanted end users to be able to use the site or simply get the results with a widget and subsequently with an API.

The PHP solution

Now, the normal way I would go about building a solution like TweetEffect is to build it in PHP and then enhance it with JavaScript. This means it will work for everybody – including me on my BlackBerry – and I have PHP at my disposal, which is much richer than JavaScript when it comes to XML conversion or even array handling.

The normal way of dealing with it would be something like this:


<?php
include('./api.php');
// the API sanitizes the user parameter, contacts the third party
// API and gives the data back in the right format, including the
// $user variable.
?>
<!-- the form asking for the Twitter user name goes here -->
<?php
if($user!==''){
  // handling code…
}
?>

The problem I encountered with this, even while developing, is that if you call a third-party API from within your own API, every request comes from your server, so you quickly run up against the third party’s limits and get blocked for an hour.

The only workaround is to cache the results locally, something I wanted to avoid for accuracy and for the sanity of my server. Other services (like Gnip) do the caching for you, but then you run into the issue of outdated data. During development it is a good idea to store a local flat data file and use that instead; this also cuts down on your development time, as you never have to wait for the third-party servers.
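One way to follow the development tip is to switch the data source with a flag, so the same code reads a local flat file while you work and the live API in production. A minimal sketch; the file name `data/sample-timeline.json`, the API URL pattern and the `isDevelopment` flag are all hypothetical:

```javascript
// Sketch: choose between a local flat data file (development) and the live API.
// Both URLs below are hypothetical placeholders, not real endpoints.
const LIVE_API = "http://api.example.com/statuses/user_timeline/";
const LOCAL_FIXTURE = "data/sample-timeline.json";

function timelineSource(user, isDevelopment) {
  if (isDevelopment) {
    // Same data every time, no waiting for third-party servers,
    // and no requests counted against the rate limit.
    return LOCAL_FIXTURE;
  }
  // Escape the user name so it cannot break the URL.
  return LIVE_API + encodeURIComponent(user) + ".json";
}
```

The local file only needs to hold a saved copy of one real API response.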

Crowdsourcing API calls to avoid reaching the limit

Normally, progressive enhancement in this case would mean overriding the form submit event to show a slicker interface and sorting the data once it has been loaded, without reloading the page. This would cut down on the number of times you access the third-party API.

However, if the API is more restrictive (like Twitter’s) but offers JSON output, you can work around the issue by not calling the API server-side and instead creating script nodes dynamically to get the data. That way you are not the one requesting it; your users’ computers do it for you. Each end user can only exceed the API limit individually, as their requests no longer add up against one shared limit. The obvious drawback is that users without JavaScript don’t get any results.
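This script-node technique is usually called JSON-P: the API wraps its JSON in a call to a function you name in a `callback` parameter, and the browser runs the response as a script. A minimal sketch, assuming the API honours such a parameter; the endpoint and callback name in the usage note are hypothetical, and the document object is passed in so the helper does not depend on a global:

```javascript
// Sketch of a JSON-P request: build the URL, then inject a script node.
// Assumes the API honours a "callback" query parameter.
function buildJsonpUrl(baseUrl, params, callbackName) {
  const pairs = Object.keys(params).map(
    (key) => encodeURIComponent(key) + "=" + encodeURIComponent(params[key])
  );
  pairs.push("callback=" + encodeURIComponent(callbackName));
  // Append to an existing query string if the base URL already has one.
  const separator = baseUrl.indexOf("?") === -1 ? "?" : "&";
  return baseUrl + separator + pairs.join("&");
}

function requestJsonp(doc, url) {
  // The request is made by the visitor's browser, not by your server,
  // so it counts against the visitor's limit rather than yours.
  const script = doc.createElement("script");
  script.src = url;
  doc.getElementsByTagName("head")[0].appendChild(script);
  return script;
}
```

In a page this would be called as `requestJsonp(document, buildJsonpUrl("http://api.example.com/timeline.json", { count: 20 }, "showTimeline"))`, with `showTimeline` defined as a global function that receives the data.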

When you use dynamic script nodes, the api.php file still sanitizes the user entry, but instead of contacting the third-party API and writing out the data directly, it writes out an HTML scaffolding and the necessary JavaScript files.


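<?php
include('./api.php');
// the API sanitizes user entries and prepares the HTML scaffolding
// ($HTMLscaffolding) and the script elements ($scripts) instead of
// contacting the third party API.
?>
<!-- the form asking for the Twitter user name goes here -->
<?php
if($user!==''){
  echo $HTMLscaffolding;
  echo $scripts;
}
?>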

This, however, is not progressive enhancement, as it does not test whether JavaScript is available; it simply expects it to work. We could work around that by adding a hidden form field that gets populated with JavaScript, or simply by giving the submit button a name attribute when JavaScript is available.
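A minimal sketch of the hidden-field variant, to be run once the page has loaded; the field name `js` is hypothetical and only has to match whatever api.php checks to set `$js`:

```javascript
// Sketch: tell the server that JavaScript is available by adding a hidden
// field to the form before it is submitted. The name "js" is hypothetical.
function flagJavaScript(doc, form) {
  const field = doc.createElement("input");
  field.type = "hidden";
  field.name = "js";
  field.value = "1";
  form.appendChild(field);
  return field;
}
```

In a page this would be `flagJavaScript(document, document.forms[0])`. Without JavaScript the field never exists, so the server sees an empty value and falls back to its own solution.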


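<?php
include('./api.php');
// the API sanitizes user entries, contacts the third party
// API and gives the data back in the right format.
?>
<!-- the form, with a field or button name added by JavaScript -->
<?php
if($user!==''){
  if($js!==''){
    // JavaScript is available: let the browser fetch the data
    echo $HTMLscaffolding;
    echo $scripts;
  } else {
    // handling code
  }
}
?>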

In any case, this will never be proper progressive enhancement, as you have to maintain two versions: one that builds the resulting interface in JavaScript and another that does it server-side. The server-side solution will most likely keel over (hit the rate limit) sooner or later, and you cannot offer a simple URL interface like app.php?user=user_name, as such a URL carries no JavaScript flag and will therefore always lead to the server-side solution instead of the JavaScript one.

Submission method switching

The way around that is to change the method of the form when JavaScript is available. Initially you set the form to POST and you change it to GET if JavaScript is turned on. You can then check in the API for POST or GET submission and react accordingly:

  • If there is a GET parameter use the JavaScript solution
  • If POST was used then the form was submitted without JavaScript and you offer the server-side solution.
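On the client, the switch is a one-liner on the form element, and the decision the API makes can be expressed as a small function. Both are sketches: the form lookup is left to you, and `submissionMode` is a hypothetical JavaScript transliteration of the check api.php performs, not the demo's code:

```javascript
// Sketch: the form is served with method="post"; JavaScript switches it to GET.
function enableGetSubmission(form) {
  form.method = "get";
  return form;
}

// Sketch of the server-side decision: GET means the JavaScript version
// (or a REST-style URL) made the request, POST means the form was
// submitted without JavaScript.
function submissionMode(requestMethod, user) {
  if (!user) {
    return "form"; // nothing submitted yet: just show the form
  }
  return requestMethod.toUpperCase() === "GET" ? "javascript" : "server";
}
```

Because a plain URL like app.php?user=user_name is a GET request, it now ends up in the JavaScript branch, which is what keeps the REST interface usable.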

This means that people without JavaScript cannot use the REST API of your application, but they can still enter the data in the form and submit it. You may still hit the rate limit this way sooner or later, but since most users have JavaScript available, it is a fairly safe bet that this will be a rare occasion.


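<?php
include('./api.php');
// api.php sanitizes user entries and sets $js to true for GET
// (JavaScript) submissions and to false for POST ones
?>
<!-- the form, method="post" by default, switched to "get" by JavaScript -->
<?php
if($user!==''){
  if($js){
    echo $HTMLscaffolding;
    echo $scripts;
  } else {
    // server side solution
  }
}
?>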

You can see the result in the demo and download the demo files as a zip. Try the demo (any user name works; this is a hard-coded API, not live Twitter data) with and without JavaScript to see the difference.

Summary

All in all, strict rate limiting is a real pain for web application developers (or hackers, for that matter). The reasons for it are obvious, of course, and this workaround does the job for now. It is, however, not quite right and does make things harder for users without JavaScript. The other issue is security: a generated script node executes whatever the third-party server sends back, so using JSON this way without validation can become a problem.

In the end it boils down to what your API is meant to be used for, and to maintaining good communication with your API users. If your product by definition is meant for short-term, high-traffic viral solutions, then the ball is in your court to keep it scalable.

TTMMHTM: Good job news, extending browsers, TweetEffect and converting Wikipedia to JSON

Monday, January 19th, 2009

Things that made me happy this morning (and I tell you about in the afternoon):

Adding transcripts to presentations embedded from SlideShare using YQL

Sunday, January 11th, 2009

I like SlideShare a lot (yeah, repetition, I know). It is a great way of spreading your presentations as it allows others to embed them into their blogs and web sites and it also allows people to download and re-use what you’ve done.

One thing I really like about SlideShare is that it creates HTML transcripts of your presentation displayed far down the page as shown in the following screenshot:

Screenshot of slideshare showing transcripts

Now, the annoying thing is that the SlideShare API, full of goodness as it is, does not provide a way to get to these transcripts in case you want to display them alongside your presentation. You also need to authenticate to get detailed information, which is necessary but also bad if you just want to offer a JavaScript solution, as your API credentials would end up in public client-side code.

Good thing there is now an easy way to retrieve information from any web site and get it back as JSON to use in JavaScript: YQL. Using the YQL console and some XPath I can, for example, extract the transcript information and get it back as XML (the SlideShare URL is http://www.slideshare.net/cheilmann/shifting-gears-presentation and the XPath is //ol/li):


http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22http%3A%2F%2Fwww.slideshare.net%2Fcheilmann%2Fshifting-gears-presentation%22%20and%20xpath%3D'%2F%2Fol%2Fli'&format=xml
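Rather than escaping such a query by hand, you can let `encodeURIComponent()` do it. A minimal sketch; `buildYqlUrl` is a hypothetical helper, and the endpoint is the public YQL one used throughout this post:

```javascript
// Sketch: build a YQL "html" table query URL for any page URL and XPath.
function buildYqlUrl(pageUrl, xpath, format, callback) {
  const query =
    'select * from html where url="' + pageUrl + "\" and xpath='" + xpath + "'";
  let url =
    "http://query.yahooapis.com/v1/public/yql?q=" +
    encodeURIComponent(query) +
    "&format=" + encodeURIComponent(format);
  if (callback) {
    // Only needed for JSON output consumed through a script node.
    url += "&callback=" + encodeURIComponent(callback);
  }
  return url;
}
```

Called with the SlideShare URL, `"//ol/li"` and `"xml"`, this produces exactly the URL above; `encodeURIComponent()` leaves the single quotes and the asterisk as they are.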

If you define the format as JSON and provide a callback parameter, you can easily use this to write a small script that injects transcripts into SlideShare embeds:

var slideshareTranscripts = function(){
  var div = document.getElementsByTagName('div');
  var containers = {};
  for(var i=0;div[i];i++){
    // SlideShare embed codes have a generated ID containing __ss
    if(div[i].id.indexOf('__ss')!==-1){
      var slideurl = div[i].getElementsByTagName('a')[0].href.split('?')[0];
      containers[slideurl]={c:div[i]};
      get(slideurl);
    }
  }
  // create a script node that asks YQL for the transcript as JSON-P
  function get(slideurl){
    var url = 'http://query.yahooapis.com/v1/public/yql?' +
              'format=json&callback=slideshareTranscripts.doit&q=' +
              'select%20strong,p%20from%20html%20where%20url%3D%22' +
              encodeURIComponent(slideurl) +
              '%22%20and%20xpath%3D%27%2F%2Fol%2Fli%27&';
    var s = document.createElement('script');
    s.src = url;
    s.type = 'text/javascript';
    document.getElementsByTagName('head')[0].appendChild(s);
  }
  // callback: build the transcript list and add it to the right embed
  function doit(o){
    if(!o.query.results){return;}
    var url = decodeURIComponent(o.query.uri).split('"');
    var out = document.createElement('ol');
    // a single match comes back as an object rather than an array
    var lis = [].concat(o.query.results.li);
    for(var i=0,j=lis.length;i<j;i++){
      var li = document.createElement('li');
      var strong = document.createElement('strong');
      var p = document.createElement('p');
      strong.appendChild(document.createTextNode(lis[i].strong));
      p.appendChild(document.createTextNode(lis[i].p));
      li.appendChild(strong);
      li.appendChild(p);
      out.appendChild(li);
    }
    containers[url[1]].c.appendChild(out);
  }
  return {doit:doit};
}();

You can see the script in action and download it for your own use. All you need to do is add it to the bottom of any document with one or several SlideShare embed codes in it. Here’s what the script does:

var slideshareTranscripts = function(){
  var div = document.getElementsByTagName('div');

We get all the DIV elements in the page (there is probably a faster way using CSSQuery these days, but let’s be bullet-proof).

In order to find out which of the DIVs is a SlideShare embed, we check for an ID that contains __ss, as the embed codes have a generated ID starting with this.

For each of these, we then find out the URL of the slideshow. We need it for two reasons: first, to retrieve the transcript, and second, to match the data returned from YQL with the right DIV container.

As generated script nodes are not guaranteed to load and call back in the order they were created, this makes sure that the right transcript gets added to the right slides.

So, what we do is check the ID of the DIV, get the first link, retrieve the URL from it (dropping any query string) and store the DIV in an object called containers, using the URL as the property name. The value of this property is a reference to the correct DIV. All we need to do then is get the transcript data via get():


  var containers = {};
  for(var i=0;div[i];i++){
    if(div[i].id.indexOf('__ss')!==-1){
      var slideurl = div[i].getElementsByTagName('a')[0].href.split('?')[0];
      containers[slideurl]={c:div[i]};
      get(slideurl);
    }
  }
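Keying containers by URL is what makes the order of the callbacks irrelevant. The idea in isolation, as a hypothetical `createRegistry` helper that is not part of the original script, with plain objects standing in for the DIVs:

```javascript
// Sketch: match asynchronous responses to their targets by key, so the
// order in which script nodes call back does not matter.
function createRegistry() {
  const pending = {};
  return {
    register(key, target) {
      pending[key] = target;
    },
    resolve(key, payload) {
      const target = pending[key];
      if (!target) {
        return false; // unknown key: ignore the response
      }
      target.content = payload;
      delete pending[key];
      return true;
    },
  };
}
```

Registering "a" and then "b", and resolving "b" before "a", still delivers each payload to its own target.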

The get() method is your garden variety script node generation function that calls the YQL API and defines slideshareTranscripts.doit() as the callback parameter. This means that the script will generate a script node, point it to YQL, YQL then gets the transcript information from SlideShare, turns it into JSON and calls doit() with the transcript information as a parameter.


  function get(slideurl){
    var url = 'http://query.yahooapis.com/v1/public/yql?' +
              'format=json&callback=slideshareTranscripts.doit&q=' +
              'select%20strong,p%20from%20html%20where%20url%3D%22' +
              encodeURIComponent(slideurl) +
              '%22%20and%20xpath%3D%27%2F%2Fol%2Fli%27&';
    var s = document.createElement('script');
    s.src = url;
    s.type = 'text/javascript';
    document.getElementsByTagName('head')[0].appendChild(s);
  }

The job of doit() is to generate the right markup and inject it into the embed codes. We create an ordered list and loop over all the li elements. We create the fitting HTML code for each of the list items and add the content as text nodes (innerHTML would try to render markup used in the slides – not a good idea).

We then retrieve the URL from the uri property and add the list as a new child node to the container DIV stored previously in the containers object.

  function doit(o){
    if(!o.query.results){return;}
    var out = document.createElement('ol');
    // a single match comes back as an object rather than an array
    var lis = [].concat(o.query.results.li);
    for(var i=0,j=lis.length;i<j;i++){
      var li = document.createElement('li');
      var strong = document.createElement('strong');
      var p = document.createElement('p');
      strong.appendChild(document.createTextNode(lis[i].strong));
      p.appendChild(document.createTextNode(lis[i].p));
      li.appendChild(strong);
      li.appendChild(p);
      out.appendChild(li);
    }

    var url = decodeURIComponent(o.query.uri).split('"');
    containers[url[1]].c.appendChild(out);
  }

  return {doit:doit};
}();
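Two details of doit() can be checked on their own: recovering the page URL from the decoded uri (like the script, this assumes the page URL is the first double-quoted string in the query that get() sends) and guarding against the case where only one element matches and YQL returns a single object instead of an array. A sketch with hypothetical helper names:

```javascript
// Sketch: pull the SlideShare URL out of YQL's query.uri. Like doit(),
// this assumes the page URL is the first double-quoted string.
function pageUrlFromUri(uri) {
  const parts = decodeURIComponent(uri).split('"');
  return parts.length > 2 ? parts[1] : null;
}

// Sketch: always get an array of list items, whether YQL returned
// several matches (an array), one match (an object) or none (null).
function toItemArray(results) {
  if (!results || !results.li) {
    return [];
  }
  return [].concat(results.li);
}
```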

That’s it. Of course you could simplify this by matching only the OL and using innerHTML to write out the list in one go, but this solution also allows you to alter the HTML output of the list, for example to make every http://example.com a link.
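As an example of altering the output, each transcript paragraph could be split into plain-text and URL pieces before the DOM nodes are built, so URLs become links while everything else stays a text node. A minimal sketch; the `splitLinks` name and the simple `http(s)://` pattern are assumptions, not a complete URL grammar:

```javascript
// Sketch: split text into plain-text and URL segments. A caller would turn
// "text" segments into text nodes and "link" segments into <a> elements,
// so no markup from the slides is ever interpreted as HTML.
function splitLinks(text) {
  const pattern = /https?:\/\/[^\s<>"]+/g;
  const segments = [];
  let last = 0;
  let match;
  while ((match = pattern.exec(text)) !== null) {
    if (match.index > last) {
      segments.push({ type: "text", value: text.slice(last, match.index) });
    }
    segments.push({ type: "link", value: match[0] });
    last = match.index + match[0].length;
  }
  if (last < text.length) {
    segments.push({ type: "text", value: text.slice(last) });
  }
  return segments;
}
```

Note that trailing punctuation directly after a URL would be treated as part of the link by this simple pattern.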

Happy transcribing!

Another amazingly useful web site: http://ismycomputeron.com

Friday, January 9th, 2009

Sometimes you come across web services that are so amazingly useful, you wonder why nobody has done it before. One of those is Is my computer on? sent to me this morning by Tomas Caspers.

While the usefulness of the service is indisputable, the lack of an RSS feed or API is actually annoying (let’s not discuss the HTML quality of the site; I am sure this is for performance reasons, as is the case with other big players). Like-minded web dwellers like Dion Almaer bemoaned the same fact, which is why I’ve taken matters into my own hands and used YQL to turn this service into a JSON API:


http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22http%3A%2F%2Fismycomputeron.com%22%20and%20xpath%3D'%2F%2Fcenter%2Ffont%2Ftext()'&format=json&callback=isiton

This version wraps the resulting data (in my case “yes”) in a JSON object and calls the isiton() method. You can try it out for yourself.

If you want to change this, simply rename isiton at the end of the URL to your function name of choice. If you use alert() as the function name you could even turn this into a useful bookmarklet.

Of course you should never forget to support the library followers if you have a system like that and Mattias Hising came quickly to the rescue and built the system as a jQuery plugin.

Detecting and displaying the information of a logged-in twitter user

Monday, January 5th, 2009

Wouldn’t it be cool (and somewhat creepy) to greet your visitors by their Twitter name, and maybe ask them to tweet a post? It is really easy to do.

Check it out yourself: Hello Twitter Demo
Update: this is not working any longer. Twitter have discontinued this functionality because of the phishing opportunities it posed.

This page should show you your avatar, name, location and latest update when you are logged into Twitter. If nothing shows up, you are either not logged in or have already exceeded your API limit for the hour (if you have twhirl running, like me, that can happen fast).

This is actually very easy to do, as a logged-in Twitter user can be detected with a simple API call in a script node:


http://twitter.com/statuses/user_timeline.json?count=1&callback=yourcallback

All you need to do is provide a callback function that receives the data provided by the API and gets the right information out. The demo does this by assembling a string.
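A minimal sketch of such a callback, assuming the v1 user_timeline JSON of the time: an array of statuses, each with `text` and a `user` object holding `name`, `screen_name`, `location` and `profile_image_url`. The function names, the markup and the HTML escaping are assumptions for this sketch:

```javascript
// Sketch: turn the first status of a user_timeline response into an HTML
// string. Escaping matters because the values come from a third party.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function greetingHtml(statuses) {
  if (!statuses || !statuses.length) {
    return ""; // nothing to show
  }
  const latest = statuses[0];
  const user = latest.user;
  return (
    '<img src="' + escapeHtml(user.profile_image_url) + '" alt="">' +
    "<p>Hello " + escapeHtml(user.name) +
    " (" + escapeHtml(user.screen_name) + ") from " +
    escapeHtml(user.location) + "!</p>" +
    "<p>Your latest update: " + escapeHtml(latest.text) + "</p>"
  );
}
```

The function named in the callback parameter would then write the result out, for example `function yourcallback(statuses) { target.innerHTML = greetingHtml(statuses); }`, where `target` is a hypothetical element of your choice.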





Trying to think of a cool use for this that is not spooky :)