Christian Heilmann

Analyzing the FIFA2010 worldcup with Guardian Data and YQL

Wednesday, July 14th, 2010 at 10:20 am

Breaking news: The Guardian once again involved in committing a data awesome! As before, the UK newspaper graced developers with a really cool piece of information published on the web: all the World Cup 2010 statistics as an Excel Sheet.

Now, the easiest way to play with this data is to use YQL, so I simply took a copy of the information and shared it as a CSV document on Google Docs. That way I can use it in YQL:

select * from csv where url=”http://spreadsheets.google.com/pub?
key=0AhphLklK1Ve4dEdrWC1YcjVKN0ZRbTlHQUhaWXBKdGc&single=true&gid=1&x=1&
output=csv” and columns=”surname,team,position,time,shots,passes,tackles,saves”

You can Try this out in the console and see the results here.

Using YQL to filter and sort this, you can do some interesting searches on that information. For example:

What were the German mid field Players?

select * from csv where url=”http://spreadsheets.google.com/pub?
key=0AhphLklK1Ve4dEdrWC1YcjVKN0ZRbTlHQUhaWXBKdGc&single=true&gid=1&x=1&
output=csv” and columns=”surname,team,position,time,shots,passes,tackles,saves”
and team=”Germany” and position=”Midfielder”

You can Try this out in the console and see the results here.

Using sort() and reverse() you can do rankings. For example:

Who was the goalkeeper with the most saves?

(Neuner of Germany, Kingson of Ghana and Enyeama of Nigeria in case you wonder)

select * from csv where url=”http://spreadsheets.google.com/pub?
key=0AhphLklK1Ve4dEdrWC1YcjVKN0ZRbTlHQUhaWXBKdGc&single=true&gid=1&x=1&
output=csv” and
columns=”surname,team,position,time,shots,passes,tackles,saves”
and position=”Goalkeeper” | sort(field=”saves”) | reverse()

You can Try this out in the console and see the results here.

Which was the player that spent most time on the pitch?

select * from csv where url=”http://spreadsheets.google.com/pub?
key=0AhphLklK1Ve4dEdrWC1YcjVKN0ZRbTlHQUhaWXBKdGc&single=true&gid=1&
output=csv” and
columns=”surname,team,position,time,shots,passes,tackles,saves”
| sort(field=”time”) | reverse()

You can Try this out in the console and see the results here.

Who were the players who were the least on the pitch in the German and Brazilian teams?

select * from csv where url=”http://spreadsheets.google.com/pub?
key=0AhphLklK1Ve4dEdrWC1YcjVKN0ZRbTlHQUhaWXBKdGc&single=true&gid=1&x=1&
output=csv” and columns=”surname,team,position,time,shots,passes,tackles,saves”
and team in (“Germany”,”Brazil”) | sort(field=”time”)

You can Try this out in the console and see the results here.

Using the CSV output and YQL you can do all kind of cool things with that data – as YQL also releases it as JSON it makes it easy to create interactive interfaces and visualizations, too – why don’t you have a go?

Tags: , , , , ,

Share on Twitter