Christian Heilmann

Posts Tagged ‘api’

TTMMHTM: Macs, IE6, hacker spaces, Pixar vs. DreamWorks and travel nightmares

Thursday, April 2nd, 2009

Things that made me happy this morning:

Let’s start with some filthy anti-propaganda: a cool comic about Macs:

Man gets electrocuted by his computer but fails to acknowledge it as it is a mac

News Mixer – my first attempt at using the Guardian’s open platform content API

Tuesday, March 10th, 2009

I am a very happy bunny at the moment: first, because there is more yummy data on the web to play with, as The Guardian just released a brand new API to access their archives, and second, because I was invited to play with it before it went public. The API was announced today, and I spent a few hours in my hotel room yesterday before checking out to build News Mixer.

News Mixer - web news and images enhanced by Guardian content

The API is simple enough to use: once you have your developer key you can search for content and request more detailed data using a content ID. The next problem to tackle was what to build.
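To give an idea of that two-step flow, here is a rough sketch in JavaScript. The endpoint, parameter names and response shape follow today's Guardian Content API and use an imaginary key; they are assumptions, as the original 2009 release may have differed:

var KEY = 'your-developer-key'; // placeholder, not a real key

// Step 1: search for content matching a term
fetch('https://content.guardianapis.com/search?q=heathrow&api-key=' + KEY)
  .then(function (response) { return response.json(); })
  .then(function (data) {
    // Step 2: request the detailed data for the first result by content ID
    var first = data.response.results[0];
    return fetch('https://content.guardianapis.com/' + first.id +
      '?api-key=' + KEY);
  })
  .then(function (response) { return response.json(); })
  .then(function (data) {
    console.log(data.response.content.webTitle);
  });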

Access of data and tags is easy

I love that we turned the web from yet another information channel into a read/write web, and that user generated content allows us to get information from everybody, not just from dedicated journalists. I also love that you can tag information and make it easier to find that way. Lastly, I love that with products like BOSS you can now get access to information from search engines and use it in your own sites.

Relevancy of tags?

The tagging bit has me a bit annoyed, though. A few years ago, when the idea was still fresh, people tagged like mad and with high quality keywords. This seems to be on the decline: as faster connections allow us to upload more and more data in bulk, people have stopped tagging sensibly and rely more on automated tags like geolocation or EXIF data in images.

Mixing user tags and professional categories

I wanted to build a news site that finds keywords matching your search term that actually make sense, and I used two different APIs for that. BOSS allows you to search for news items and images, and the BOSS web search also offers keyterms for certain web sites. These keyterms are to a degree user generated, as they are what people entered into Yahoo to find those sites. I then used the new Guardian Data API to pull relevant articles; as these are professionally tagged by journalists, this makes for more relevant keywords. Putting the two together means a good mix of professional and up-to-date information.
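Conceptually, putting the two together is just a merged, de-duplicated keyword list. A minimal sketch (the function and array names are mine, not from News Mixer):

// Mix user generated keyterms with professional tags, dropping duplicates.
function mixKeywords(bossKeyterms, guardianTags) {
  var seen = {};
  var mixed = [];
  bossKeyterms.concat(guardianTags).forEach(function (term) {
    var key = term.toLowerCase();
    if (!seen[key]) {
      seen[key] = true;
      mixed.push(term);
    }
  });
  return mixed;
}

mixKeywords(['Heathrow', 'terminal 5'], ['terminal 5', 'BAA']);
// -> ['Heathrow', 'terminal 5', 'BAA']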

The outcome is News Mixer and you can download the source code to play with it yourself.

It was amazingly straightforward to build; the only snags I hit were the following:

  • Whilst BOSS provides keyterms for web searches, it does not do so for news searches. Therefore I used YQL to get the keyterms for each of the URLs returned by the news search, in a nested loop. This is a bit hacky and I would love for that to change.
  • The Guardian API returns articles by relevancy and not by date. You can, however, specify that you want articles before or after a certain date, which is why all I had to do was get the current date and go back one month from it.
  • The content body of the Guardian API does not provide any paragraph or list information. This is very annoying, as it results in terrible display (a massive chunk of text). I’ve worked around the issue by splitting the content at full stops and injecting paragraphs after every third of them (see the sketch after this list), but that is guesswork, not the real structure of the text.
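The paragraph injection mentioned in the last point could look like this sketch; the function name and the three-sentence grouping are my own choices:

// Split the article body at full stops and wrap every three "sentences"
// in a paragraph - a guess at structure, not real markup from the API.
function addParagraphs(body) {
  var sentences = body.split('. ');
  var html = '';
  for (var i = 0; i < sentences.length; i += 3) {
    html += '<p>' + sentences.slice(i, i + 3).join('. ');
    if (i + 3 < sentences.length) {
      html += '.'; // re-add the full stop that split() swallowed
    }
    html += '</p>';
  }
  return html;
}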

In any case I am happy to have such a cool new archive of information to play with and we’re working on open table definitions for YQL to make it easy for you to get to the goodies the Guardian offers us.

Does API rate limiting spell the end of progressive enhancement?

Sunday, January 25th, 2009

Building TweetEffect taught me a few lessons and also pointed out some annoyances when building with third party APIs. Above all, I had to re-think and violate some of the best practices I’ve been advocating for years now.

First of all, TweetEffect was meant to be a demo for a university hack day and I didn’t quite plan for it to be a big success. Therefore I cobbled it together rather than planning the whole thing. What I wanted to build was a small tool that shows me my latest Twitter updates and analyzes the changes in follower numbers. I then mapped those changes to the updates that happened just before them, to show which ones might have been the cause.

The TweetEffect wishlist

I had a few things on my wishlist:

  • Users shouldn’t have to give me their Twitter login data – this is just wrong, no matter how you put it
  • I didn’t want to cache any data on my server, for the same reason and to avoid my DB getting hammered (this blog runs on the same one :-))
  • I wanted end users to be able to use the site or simply get the results with a widget and subsequently with an API.

The PHP solution

Now, the normal way I would go about building a solution like TweetEffect is to build it in PHP and then enhance it with JavaScript. This means it works for everybody – including me on my BlackBerry – and I have PHP at my disposal, which is much richer than JavaScript when it comes to XML conversion or even array handling.

The normal way of dealing with it would be something like this:


<?php
include('./api.php');
// the API sanitizes the user parameter, contacts the third party
// API and gives the data back in the right format, including the
// $user variable.
?>
<form method="get" id="userform">
  <label for="user">Twitter user name:</label>
  <input type="text" id="user" name="user">
  <input type="submit" value="Check">
</form>
<?php
if ($user !== '') {
  // handling code…
}
?>

The problem I encountered with this, even whilst developing, is that if you call a third party API from your own API you can quickly run up against its limits and get blocked for an hour.

The only workaround is to cache the results locally – something I wanted to avoid for accuracy and for the sanity of my server. Other services do the caching for you (like Gnip), but then you run into the issue of data being outdated. During development it is a good idea to have a local flat data file to work against – this also cuts down on your development time, as you never have to wait for the third party servers.
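One way to do that, sketched here with an assumed file name, flag and URL (not TweetEffect's actual setup), is to switch the data source during development:

// In development, read a saved API response from a local flat file instead
// of hitting the live service - no waiting, no rate limit.
var DEVELOPMENT = true;
var source = DEVELOPMENT
  ? 'fixtures/timeline.json'                 // canned copy of one API response
  : 'https://api.example.com/timeline.json'; // live third party API

fetch(source)
  .then(function (response) { return response.json(); })
  .then(function (data) {
    // the same rendering code runs against canned and live data
  });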

Crowdsourcing API calls to avoid reaching the limit

Normally, progressive enhancement could be used here to override the form’s submit event, show a slicker interface, and sort the data once it has been loaded without re-reading the page. This would cut down on the number of times you access the third party API.

However, if the API is more restrictive (like Twitter’s) but has a JSON output, you can work around the issue by not calling the API server-side, but instead creating script nodes dynamically to get the data. That way you are not the one requesting it – the computers of your users are doing it for you. The API limit can then only be exceeded by individual end users, not by all of them together. The obvious drawback is that users without JavaScript don’t get any results.
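A minimal sketch of that workaround (the URL and function names are placeholders, not TweetEffect's actual code):

// The user's browser, not your server, requests the data: generate a script
// node pointing at the JSON-with-callback output of the API.
function loadTimeline(user) {
  var script = document.createElement('script');
  script.src = 'https://api.example.com/timeline.json' +
    '?user=' + encodeURIComponent(user) +
    '&callback=handleTimeline';
  document.getElementsByTagName('head')[0].appendChild(script);
}

// The API wraps its JSON output in a call to this function.
function handleTimeline(data) {
  // build the interface from the data here
}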

In the case of dynamic script nodes, the api.php file still does the user entry sanitization, but instead of contacting the third party API and writing out the data directly, it writes out an HTML scaffolding and the necessary JavaScript files.


<?php
include('./api.php');
// the API sanitizes user entries, contacts the third party
// API and gives the data back in the right format.
?>
<form method="get" id="userform">
  <label for="user">Twitter user name:</label>
  <input type="text" id="user" name="user">
  <input type="submit" value="Check">
</form>
<?php
if ($user !== '') {
  echo $HTMLscaffolding;
  echo $scripts;
}
?>

This, however, is not progressive enhancement, as it does not test whether JavaScript is available – it simply expects it to work. We could work around that by adding a hidden form field that gets populated with JavaScript, or simply by giving the submit button a name attribute when JavaScript is available.
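A sketch of that detection; the form id and field name are assumptions:

// When JavaScript runs, add a hidden "js" field to the form so the server
// knows scripting is available on this visitor's machine.
window.onload = function () {
  var form = document.getElementById('userform'); // assumed form id
  var flag = document.createElement('input');
  flag.type = 'hidden';
  flag.name = 'js';
  flag.value = '1';
  form.appendChild(flag);
};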


<?php
include('./api.php');
// the API sanitizes user entries, contacts the third party
// API and gives the data back in the right format. $js contains
// the value of the hidden field that JavaScript populates.
?>
<form method="get" id="userform">
  <label for="user">Twitter user name:</label>
  <input type="text" id="user" name="user">
  <input type="submit" value="Check">
</form>
<?php
if ($user !== '') {
  if ($js !== '') {
    echo $HTMLscaffolding;
    echo $scripts;
  } else {
    // handling code
  }
}
?>

In any case, the solution will never be proper progressive enhancement, as you have to maintain two versions: one that builds the resulting interface in JavaScript, and another that does it server-side. The server-side solution will most likely keel over sooner or later, and you cannot offer a simple URL interface like app.php?user=user_name, as this will always lead to the server-side solution instead of the JavaScript one.

Submission method switching

The way around that is to change the method of the form when JavaScript is available. Initially you set the form to POST, and you change it to GET if JavaScript is turned on (see the sketch after this list). You can then check in the API for POST or GET submission and react accordingly:

  • If there is a GET parameter use the JavaScript solution
  • If POST was used then the form was submitted without JavaScript and you offer the server-side solution.
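A sketch of the switch itself, assuming the form has the id userform:

// The form starts out as POST in the markup; flip it to GET when JavaScript
// is available so the server can tell the two cases apart.
window.onload = function () {
  document.getElementById('userform').method = 'get';
};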

This means that people without JavaScript cannot use the REST API of your application, but they can still enter the data in the form and send it off. You will hit the rate limit in this case sooner or later, but seeing that most users have JavaScript available, it is quite a safe bet that this will be a rare occasion.


<?php
include('./api.php');
// api.php reads $user from POST or GET and sets $js to true when
// the submission came in via GET, i.e. JavaScript switched the method.
?>
<form method="post" id="userform">
  <label for="user">Twitter user name:</label>
  <input type="text" id="user" name="user">
  <input type="submit" value="Check">
</form>
<?php
if ($user !== '') {
  if ($js) {
    echo $htmlScaffolding;
    echo $scripts;
  }
  if (!$js) {
    // server side solution
  }
}
?>

You can see the result in the demo and download the demo files as a zip. Try the demo (any user name works, this is a hard-coded API, not live Twitter data) with and without JavaScript to see the difference.

Summary

All in all, strict rate limiting is a real pain for web application developers (or hackers, for that matter). The reasons for it are obvious, of course, and this workaround does the job for now. It is, however, not quite right, and it does make things harder for users without JavaScript. The other issue is security: consuming JSON through generated script nodes without validation means executing whatever the third party sends you, which can become a problem.

In the end it boils down to what your API is meant to be used for, and to maintaining good communication with your API users. If your product is by definition meant for short-term, high-traffic viral solutions, then the ball is in your court to keep it scalable.

Detecting and displaying the information of a logged-in twitter user

Monday, January 5th, 2009

Wouldn’t it be cool (and somewhat creepy) to greet your visitors by their twitter name, and maybe ask them to tweet a post? It can be done really easily.

Check it out yourself: Hello Twitter Demo
Update: this is not working any longer. Twitter have discontinued this functionality because of the phishing opportunities it posed.

This page should show you your avatar, name, location and latest update when you are logged into twitter. If nothing shows up, you are either not logged in or have already exceeded your API limit for the hour (if you have twhirl running, like me, that can happen fast).

This is actually very easy to do as a logged-in twitter user can be detected with a simple API call in a script node:


http://twitter.com/statuses/user_timeline.json?count=1&callback=yourcallback

All you need to do is provide a callback function that receives the data from the API and pulls the right information out. The demo does this by assembling a string.
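A minimal sketch of such a callback; the property names match the old Twitter JSON output, while the markup and element id are assumptions, not the original demo code:

function yourcallback(data) {
  // user_timeline returns an array of statuses; we asked for one
  var status = data[0];
  var user = status.user;
  var out = '<img src="' + user.profile_image_url + '" alt="">' +
    '<h2>' + user.name + ' (' + user.location + ')</h2>' +
    '<p>' + status.text + '</p>';
  document.getElementById('twitterbadge').innerHTML = out; // assumed id
}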
Trying to think of a cool use for this that is not spooky :)

Small change in the flickr API output breaking my bad code

Wednesday, December 31st, 2008

During my absence I got an email that the first version of my unobtrusive Flickr badge had stopped working. The reason was that Flickr changed the output of the API. In the old version, the JSON object contained an HTML description that was run through htmlspecialchars() first. I always considered this a bit of a nuisance and wondered about the reason for it. Now the Flickr feed no longer has the encoding, as you can see for example in the JSON output of my latest photos:


{
  "title": "Terminal 5 animated discs thingamabob *",
  "link": "http://www.flickr.com/photos/codepo8/3141566868/",
  "media": {"m": "…"},
  "date_taken": "2008-12-27T08:10:35-08:00",
  "description": "<p><a href=\"…\">codepo8</a> posted a video:</p> <p><a href=\"…\" title=\"Terminal 5 animated discs thingamabob *\"><img src=\"…\" alt=\"Terminal 5 animated discs thingamabob *\" /></a></p> <p>* technical term, really</p>",
  "published": "2008-12-27T16:10:35Z",
  "author": "nobody@flickr.com (codepo8)",
  "author_id": "11414938@N00",
  "tags": "london art heathrow awesome animation t5 ba britishairways terminal5"
},

Before the change this would have been:


{
  "title": "Terminal 5 animated discs thingamabob *",
  "link": "http://www.flickr.com/photos/codepo8/3141566868/",
  "media": {"m": "…"},
  "date_taken": "2008-12-27T08:10:35-08:00",
  "description": "&lt;p&gt;&lt;a href=&quot;…&quot;&gt;codepo8&lt;/a&gt; posted a video:&lt;/p&gt; &lt;p&gt;&lt;a href=&quot;…&quot; title=&quot;Terminal 5 animated discs thingamabob *&quot;&gt;&lt;img src=&quot;…&quot; alt=&quot;Terminal 5 animated discs thingamabob *&quot; /&gt;&lt;/a&gt;&lt;/p&gt; &lt;p&gt;* technical term, really&lt;/p&gt;",
  "published": "2008-12-27T16:10:35Z",
  "author": "nobody@flickr.com (codepo8)",
  "author_id": "11414938@N00",
  "tags": "london art heathrow awesome animation t5 ba britishairways terminal5"
},

This broke my code, as I was relying on regular expressions to get at the photos. It taught me a lesson – and as we have a media property now, there is no need for the regular expression any longer. The fix for the badge code was:


// buggy, bad
temp = obj.items[i].description.match(/src=&quot;([^&]*)&quot;/)[1];
// works :)
temp = obj.items[i].media.m;

Lesson learned: Don’t trust regular expressions and data scraping.

Thanks to Jennifer Forman Orth for flagging this up!