Christian Heilmann

Author Archive

News Mixer – my first attempt at using the Guardian’s open platform content API

Tuesday, March 10th, 2009

I am a very happy bunny at the moment. First of all, because there is more yummy data on the web to play with, as The Guardian just released a brand new API to access their archives. Secondly, because I was invited to play with it before it was public. The API was announced today, and I spent a few hours yesterday in my hotel room before checking out to build News Mixer.

News Mixer - web news and images enhanced by Guardian content

The API is simple enough to use: once you have your developer key you can search for content and request the more detailed data using a content ID. The next problem to tackle was what to build.

Access of data and tags is easy

I love that we turned the web from yet another information channel into a read/write web and that user generated content allows us to get information from everybody and not just from dedicated journalists. I also love that you can tag information and make it easier to find that way. Lastly, I love that with products like BOSS you can now get access to search engine data and use it in your own sites.

Relevancy of tags?

The tagging bit has me a bit annoyed though. A few years ago, when the idea was still fresh, people tagged like mad and with high-quality keywords, but this seems to be on the decline. As faster connections allow us to upload more and more data in bulk, people have stopped tagging sensibly and rely more on automated tags like geolocation or EXIF data in images.

Mixing user tags and professional categories

I wanted to build a news site that helps you find sensible keywords matching your search term, and I used two different APIs for that. BOSS allows you to search for news items and images, and the BOSS web search also offers keyterms for certain web sites. These keyterms are to a degree user generated, as they are what people entered into Yahoo to find the sites. I then used the new Guardian Data API to pull relevant articles; as these are professionally tagged by journalists, they make for more relevant keywords. Putting the two together means a good mix of professional and up-to-date information.

The outcome is News Mixer and you can download the source code to play with it yourself.

It was amazingly straightforward to build. The only snags I hit were the following:

  • Whilst BOSS provides keyterms for web searches, it does not do so for news searches. Therefore I used YQL to get the keyterms of each of the URLs returned by the news search in a nested loop. This is a bit hacky and I would love for that to change.
  • The Guardian API returns articles by relevancy and not by date. You can specify, though, that you want articles before or after a certain date, so all I had to do was get the current date and go back one month from that.
  • The content body of the Guardian API does not provide any paragraph or list information. This is very annoying as it results in terrible display (a massive chunk of text). I’ve worked around the issue by splitting the content at full stops and then injecting a paragraph break after every third one, but that is just guesswork and not the real structure of the text.
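
To make the last two workarounds a bit more concrete, here is a rough sketch in Python. It is not the News Mixer source, which may well do this differently; the function names `one_month_before` and `split_into_paragraphs` are made up for illustration, and the sketch assumes the article body arrives as one plain string without markup.

```python
import calendar
import re
from datetime import date


def one_month_before(day: date) -> date:
    """Return the same day of the previous month.

    The day is clamped to the length of that month, so
    31 March becomes 28 (or 29) February.
    """
    if day.month == 1:
        year, month = day.year - 1, 12
    else:
        year, month = day.year, day.month - 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def split_into_paragraphs(text: str, sentences_per_paragraph: int = 3) -> list[str]:
    """Split text at full stops and group the pieces into paragraphs.

    This is pure guesswork about structure: abbreviations such as
    "e.g." are treated as sentence ends too.
    """
    # Split on whitespace that follows a full stop, keeping the stop itself.
    sentences = [s for s in re.split(r"(?<=\.)\s+", text.strip()) if s]
    return [
        " ".join(sentences[i:i + sentences_per_paragraph])
        for i in range(0, len(sentences), sentences_per_paragraph)
    ]


if __name__ == "__main__":
    # Hypothetical usage: the cut-off date for the article search,
    # followed by a body chopped into three-sentence paragraphs.
    print(one_month_before(date.today()))
    body = "One. Two. Three. Four. Five."
    for paragraph in split_into_paragraphs(body):
        print(paragraph)
```

The paragraph helper returns a list rather than finished markup, so the caller can decide how to wrap each chunk for display.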

In any case I am happy to have such a cool new archive of information to play with and we’re working on open table definitions for YQL to make it easy for you to get to the goodies the Guardian offers us.

TTMMHTM: Dazzle audiences, love audiences, cool data from the guardian and Opera WSC

Friday, March 6th, 2009

Things that made me happy this morning

Dasher is an information-efficient text-entry interface, driven by natural continuous pointing gestures. Dasher is a competitive text-entry system wherever a full-size keyboard cannot be used – for example, when operating a computer one-handed, by joystick, touchscreen, trackball, or mouse; when operating a computer with zero hands (i.e., by head-mouse or by eyetracker); on a palmtop computer; on a wearable computer.
The eyetracking version of Dasher allows an experienced user to write text as fast as normal handwriting – 29 words per minute; using a mouse, experienced users can write at 39 words per minute.
Dasher can be used to write efficiently in any language.

TTMMHTM: IP to geo location, ARIA talks and urban camouflage

Thursday, March 5th, 2009

Things that made me happy this morning

Mozilla Bespin meetup in London next week

Wednesday, March 4th, 2009

To my shame I have to say that I haven’t had time to give Mozilla’s code editor in the cloud, Bespin, much of an, err, spin yet.

Bespin is built by fellow Ajaxians Dion Almaer and Ben Galbraith and uses all the fancy new JavaScript and Canvas tricks to make editing files in a browser environment run as smoothly as possible. Fans of TextMate should have a look.
Both will come to London on Tuesday, the 10th of March to give a detailed talk about Bespin and what else is brewing in the Mozilla Developer Tools Group. The venue is not quite clear yet, but I heard it’ll be in North London. To join the fun, just RSVP for the event at Upcoming.

I’ll be freshly back from the US so I will be tired while I am there, but it still should be good fun.

Introduction to hacking at Georgia Tech

Wednesday, March 4th, 2009

I’m right now in Atlanta, Georgia, at Georgia Tech for University Hack Day. Last night I kicked off the one-week event, as my colleagues from California were delayed because of the snow situation in the US.

The presentation covers the history of hack at Yahoo and what makes a good and interesting hack, and explains some of the technologies that people can use.

[slideshare id=1097476&doc=1097476]

Some people complained that the PDF is not readable on a PC, so here’s a PowerPoint version of the same talk.