Christian Heilmann

Posts Tagged ‘research’

A research interface for the social web – fork it now and find what people are talking about

Wednesday, September 22nd, 2010

Researching something on the web can be pretty annoying. Search engines get better every year, but there is a whole world of social sites that are not indexed. For example, if I search for a nice photo of a red panda, I use Google image search. If I want to use this photo later on, though, I am better off using Flickr or Picasa, where I can see what license the photo is under.

Yahoo’s researchers had the same problem, which is why they assembled all the social updates in one XML feed – the Yahoo! Firehose. In contrast to other Yahoo APIs, it also comes with commercial terms and conditions, and it is available through YQL. In terms of data, the Firehose aggregates a lot of different sources:

Yahoo! 360, AOL, Bebo, Blogger, Bloglines, Digg, Diigo, Goodreads, Google, Google Reader, Ma.gnolia, Movable Type, Netflix, Pandora, Picasa, Pownce, Seesmic, Slideshare, SmugMug, StumbleUpon, ThisNext, TravelPod, Tumblr, Twitter, TypePad, Vimeo, Vox, Webshots, Xanga, Yelp, YouTube, Zooomr, Yahoo! Avatars, Yahoo! Buzz, Yahoo! Profiles, Wisteria, Yahoo! Answers, Yahoo! Shopping, Yahoo! Autos, Bix for Yahoo!, Yahoo! Bookmarks, Yahoo! Briefcase, Yahoo! Calendar, Yahoo! Classifieds, Delicious, Yahoo! Family, Yahoo! Sports, Yahoo! Finance, Flickr, Yahoo! Food, Yahoo! Games, Yahoo! Geocities, Yahoo! Green, Yahoo! Greetings, Yahoo! Groups, Yahoo! Health, Yahoo! Hotjobs, Yahoo! Kids, Yahoo! Local, Yahoo! Movies, Yahoo! Music, MyBlogLog, Yahoo! News, OMG! from Yahoo!, Yahoo! Personals, Yahoo! Pets, Yahoo! Status Updates, Yahoo! Guestbook Comments, SearchMonkey from Yahoo!, Yahoo! Tech, Yahoo! Travel, Yahoo! TV, Yahoo! Video.

You can do the data junkie part and query it in the YQL console.

This can be annoying, though, especially as you cannot see the photos and videos. This is why I put together a research interface on top of the Yahoo Firehose.

You can see the research interface in action here, but more importantly, the source code of the interface is available on GitHub, which means that you can host it yourself – for example behind a firewall or as part of your intranet.

For a local install, you need to sign up for a developer key, edit the keys.php file and upload all the files to your PHP-enabled server. If you get stuck, you can get help on the YDN Forums.

Notice that I am keeping the state of your last search by storing it in local storage when your browser supports it – this can be useful for larger searches.

Analysing the history of Winter Olympics medals with YQL

Thursday, February 18th, 2010

I am a big fan of the Guardian Data Blog, which releases all kinds of cool datasets used in their research for people to mash up. One of the recent datasets was the statistics of Winter Olympics medals over the years.

I’ve taken the Excel sheet and exported it as a CSV. Then I created a YQL open table to make it easier to use and filter the information.

Using this table you can now get the statistics of the Winter Olympics in terms of medals from 1924 up to 2006 (the 2010 data is of course not in there yet). Try it out for yourself:

Get all medal information:

use "" as medals;
select * from medals

see it as XML or in the YQL console

UK Gold medals (small dataset):

use "" as medals;
select * from medals where country="gbr" and type="gold"

see it as XML or in the YQL console

All skating medals in the speed skating discipline won by men:

use "" as medals;
select * from medals where sport="skating"
and discipline="speed skating"
and gender="m"

see it as XML or in the YQL console

All US, Canadian and French medals in the Games before 2000:

use "" as medals;
select * from medals where country in
('usa','can','fra') and year="19"

see it as XML or in the YQL console
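
If you would rather build the request yourself than paste statements into the console, the following is a minimal Python sketch. It assumes the public YQL REST endpoint (`query.yahooapis.com/v1/public/yql` with `q` and `format` parameters) as it was documented at the time. The helper names `build_yql_url` and `medals_statement` are made up for this example, and `TABLE_URL` is only a placeholder for the open table definition URL, which is not reproduced here.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Assumption: the public YQL endpoint as documented when this post was written.
YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"


def medals_statement(table_url, where=""):
    """Compose a `use ... as medals; select ...` statement like the ones above.

    `table_url` stands in for the open table definition URL (a placeholder).
    `where` is an optional filter such as 'country="gbr" and type="gold"'.
    """
    statement = 'use "{0}" as medals; select * from medals'.format(table_url)
    if where:
        statement += " where " + where
    return statement


def build_yql_url(statement, fmt="xml"):
    """Return a request URL that carries the statement in the `q` parameter."""
    return YQL_ENDPOINT + "?" + urlencode({"q": statement, "format": fmt})


# Example: the "UK gold medals" query, with a placeholder table URL.
url = build_yql_url(medals_statement("TABLE_URL", 'country="gbr" and type="gold"'))
print(url)
```

`urlencode` takes care of escaping the quotes, spaces and the semicolon, which is the part that is easy to get wrong when assembling such a URL by hand.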

All the things you can filter with

You can get the medal information and filter it by the following criteria:

  • year – the year of the Olympics
  • city – the city it was held in, like “Lillehammer” or “Sarajevo”
  • sport – the sport
  • discipline – the sub-discipline
  • country – the country as an NOC code
  • event – the event, like “alpine combined” or “two-man”
  • gender – the athlete’s gender, X is for pair sports
  • type – the medal type (Gold, Silver, Bronze)

Any of these can also be used as a wildcard that matches anywhere in the value, so country="g" would find GBR, FRG, GDR, YUG and GER.
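
To make the wildcard behaviour concrete, here is a small Python sketch that models it locally as a case-insensitive substring match. This is an assumption based on the country="g" example, not the code of the open table itself. The function names `matches` and `filter_medals` are hypothetical, and the rows carry only NOC codes mentioned in this post rather than real medal records.

```python
def matches(row, **criteria):
    """True if every criterion is a case-insensitive substring of the row's field.

    A field that is missing from the row is treated as an empty string.
    """
    return all(
        str(wanted).lower() in str(row.get(field, "")).lower()
        for field, wanted in criteria.items()
    )


def filter_medals(rows, **criteria):
    """Return the rows that satisfy all of the given criteria."""
    return [row for row in rows if matches(row, **criteria)]


# Stand-in rows: just country codes, no real medal data.
codes = ("GBR", "FRG", "GDR", "YUG", "GER", "USA", "CAN", "FRA")
rows = [{"country": code} for code in codes]

found = [row["country"] for row in filter_medals(rows, country="g")]
print(found)  # ['GBR', 'FRG', 'GDR', 'YUG', 'GER']
```

Under the same assumption, year="19" in the query for the Games before 2000 matches every year of the form 19xx.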

Have fun!