opendata | Christian Heilmann

Posts Tagged ‘opendata’

Participating in the Web of Data with Open Standards

Wednesday, March 17th, 2010

These are the detailed notes for my talk at the MIX10 conference in Las Vegas. The description of my talk was the following:

Web development as we do it right now is on the way out. The future of the web is what its founders planned a long time ago: loosely joined pieces of information for you to pick and choose and put together in interfaces catered to your end users. In this session, see how to build a web portfolio that is always up-to-date, maintained by using the web rather than learning a bespoke interface, and high in performance, as the data is pulled and cached for you by a high-traffic server farm rather than your server. If you wondered how you can leave your footprint on the web without spending thousands on advertising and development, here are some answers.

The slides

Participating in the Web of Data


The detailed notes / transcript

Welcome to the web

When I started using the web I was working for a radio station as a newscaster and producer. I had always dabbled with computers and connecting them world-wide and I was simply and utterly amazed at how easy it was to find information. This was 1996 and I convinced my boss at the radio station that it was of the utmost importance that I get internet access to be the first in our town to get the news from Associated Press and other very important sources. In reality, I had just fallen in love with the web and its possibilities.

Working as a professional web developer

I quit my job soon after that and built my first web sites. Together with some friends I maintained a banner exchange, was admin for a few mailing lists and IRC channels, and built a lot of really bad web sites which – back then – were the bee’s knees.

I went through the first .com boom living in a hotel in Los Angeles with all expenses paid for writing HTML.

Joining the Enterprise crew

I then worked for an agency and delivered lots of products with enterprise level content management systems. Massive systems intended to replace the inadequate systems of the past.

This is when things went wrong

For a few years I released things that were incredibly expensive and meant that the people using them had to be sent on £3K+ training courses just to be able to do the things they already did before that – only much less effectively. Or did they?

Joining the corp environment (to a degree)

Again, this was a very interesting step as it meant I moved away from the large world of delivering for clients to concentrating on one single company – one of the companies that defined the world wide web as it is now and also a large player in a newly emerging world.

Hey Ho, Web 2.0 highfive

Web 2.0 came around and ran rampant in the media and in the minds of early adopters. User Generated Content sounded like an awesome scam to make millions of dollars with content that comes for free – all you need to do is set up the infrastructure and find the right people.

And it went wrong again.

The main purpose of the first round of web2.0 was to be as visible as possible – doesn’t matter if all the content added by your users is really just “first” and “you’re a fag” comments – as long as the user numbers were great you won.

And now?

Makes you wonder what comes now, doesn’t it?

The web as the platform, the Mobile Web and GeoLocation

This is where we are. We can use the web as our platform, we work with virtual servers, hosted services and are happy to mix and match different services to get our UGC fix. What is missing though is the glue.

Market changes leave a track

The thing we keep forgetting, though, is the users. All the changes we’ve gone through bred generations of users who are happy to use what they learnt to do in their job but are not happy having to re-learn basic chores over and over again.

Time to shift down a gear

We have the infrastructure in place, we can already make this work. I get a real feeling that we are not innovating but turning the web into a mainstream medium. Streaming TV without commenting and massive successes like Farmville and MafiaWars make me realize that the web is becoming ubiquitous but that it is also boring for me as someone who wants to move it further. In order to accelerate with a car you need to shift down a gear (or kick down the accelerator with an automatic). This is the time to do so.

Finding the common denominator

What is the common denominator that drove all the innovation and movement in the past? Data. Information becoming readily available and interesting by mixing it with other information sources to find the story in the data.

Tapping into the world of data

Data is around us, it is just not well structured. People cannot even be bothered to write semantic HTML and it is tough to teach editors to enter tags, alternative text for images and video descriptions. The reason is that we lack success stories and learning our CMS takes too long. Instead of teaching people how to use systems we should teach them how to structure information and simplify the indexing process, rather than empowering them to make things pretty and shiny – this is what other experts do better.

Why APIs work

APIs or Application Programming Interfaces are the best thing you can think of if you want to build something right now. If we remove the information from the interface – both in terms of data entry and when it comes to consumption – we can build a web that works for everybody. By making information searchable and allowing filtering out-of-the-box you can scale to any size or cut down to the bare minimum.

APIs made easy

The Yahoo Query Language, or YQL for short, is a very simple language that allows you to use the web like you would use a database. In its simplest form a YQL query looks like this:

select {what} from {where} where {conditions}

Say for example you want to find photos in Flickr with the keyword donkey. The YQL statement for this is the following:

select * from flickr.photos.search
where text="donkey" and license=4

The license=4 means that the photos you are retrieving and displaying are licensed with Creative Commons, which means that you are allowed to show them on your page. This is very important as Flickr users are happy to show photos but not all are happy for you to use their photos in your product. Play nice and we all get more good data.

YQL is not limited to Yahoo APIs and data – anything on the web can become accessible with it. If you want to find a flower pot in the Bay Area you can use for example Craigslist with YQL:

select * from craigslist.search where
location="sfbay" and type="sss" and query="flower pot"

If you want to get the latest news from Google about the topic of healthcare, you can use:

select * from google.news where q="healthcare"

If you want to collate different APIs and get one data set back, you can use the query.multi table. The following for example searches the New York Times archive, Microsoft Bing News and Google News for the term healthcare and returns one collated set of results:

select * from query.multi where queries in (
  'select * from nyt.article.search where query="healthcare"',
  'select * from microsoft.bing.news where query="healthcare"',
  'select * from google.news where q="healthcare"'
)
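If you build such collated queries in code, the statement is only string assembly. Here is a minimal sketch in Python (not a language this post uses elsewhere); the helper name compose_multi is made up for illustration, and it assumes the sub-queries use double quotes only, as the ones above do:

```python
def compose_multi(queries):
    """Wrap several YQL statements in a single query.multi statement.

    Assumption: sub-queries contain no single quotes, because each one is
    wrapped in single quotes here and no escaping is attempted.
    """
    for query in queries:
        if "'" in query:
            raise ValueError("sub-queries must not contain single quotes: " + query)
    quoted = ",\n  ".join("'" + query + "'" for query in queries)
    return "select * from query.multi where queries in (\n  " + quoted + "\n)"


statement = compose_multi([
    'select * from nyt.article.search where query="healthcare"',
    'select * from microsoft.bing.news where query="healthcare"',
    'select * from google.news where q="healthcare"',
])
print(statement)
```

The printed statement has the same shape as the hand-written one above.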

YQL even allows you to get information when there is no API available. For example in order to get the text of all the latest headlines from Fox News you can use YQL’s HTML table:

select content from html where
url="http://www.foxnews.com/" and xpath="//h2/a"

This goes to the foxnews.com home page, retrieves the HTML, runs it through the W3C HTML Tidy tool to clean up broken HTML and filters the information down to the text content of all the links inside second-level headings. The filtering is done via the XPath statement //h2/a, which means all links inside h2 elements, and selecting content instead of * gives you only the text of these links rather than everything.
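To see what the //h2/a filter does without YQL, here is a small local sketch in Python using the standard library's ElementTree, which supports a subset of XPath. The sample markup is invented and already well-formed, which stands in for the Tidy clean-up step; the function name headline_texts is also just for illustration.

```python
import xml.etree.ElementTree as ET

# Invented, already well-formed sample page (real pages need cleaning first).
SAMPLE_PAGE = """
<html><body>
  <h1><a href="/">Site name</a></h1>
  <h2><a href="/story-1">First headline</a></h2>
  <p><a href="/about">About us</a></p>
  <h2><a href="/story-2">Second headline</a></h2>
</body></html>
"""


def headline_texts(markup):
    """Return the text of every link that is a direct child of an h2."""
    root = ET.fromstring(markup.strip())
    # ".//h2/a" is the relative form of //h2/a in ElementTree's XPath subset.
    return [link.text for link in root.findall(".//h2/a")]


print(headline_texts(SAMPLE_PAGE))
```

The links inside the h1 and the paragraph are skipped; only the two h2 links come back.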

You can then do more with this content – for example using the Google translation API to translate the headlines into French:

select * from google.translate where q in (
  select content from html where url="http://www.foxnews.com/"
  and xpath="//h2/a"
) and target="fr"

As you can see it is pretty easy to mix and match different APIs to reach your goal – all in the same language.

YQL goes further than just reading data though. If you have an API that allows for writing to it, you can do insert, update and delete commands much like you can with a database. You can for example post a blog post on a WordPress blog with the following command:

insert into wordpress.post
(title, description, blogurl, username, password)
values ("Test Title", "This is a test body", "http://yqltest.wordpress.com", "yqltest", "password")

The insert, update and delete tables require you to use HTTPS to send the information, which means that although your name and password are perfectly readable here they won’t leak out to the public – unless you are using them inside JavaScript where they would be readable.

The YQL endpoint

As YQL is a web service, it needs an endpoint on the web. For tables that don’t need any authentication this endpoint is the following URL:

http://query.yahooapis.com/v1/public/yql?
q={query}
&format=xml|json
&callback={callbackfunction}

You can get the information back as XML or as JSON. If you choose JSON, you can use the information in JavaScript, and by providing a callback function (YQL then wraps the returned JSON in a call to that function) you can do so directly in the browser without any server-side code of your own.
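Putting a query into that URL mostly comes down to URL-encoding it. A minimal sketch in Python, using only the standard library; the function name build_yql_url and the callback name showPhotos are illustrative, and nothing is actually requested:

```python
from urllib.parse import urlencode

YQL_PUBLIC_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"


def build_yql_url(query, fmt="json", callback=None):
    """Return the public endpoint URL for a YQL statement."""
    if fmt not in ("xml", "json"):
        raise ValueError("format must be 'xml' or 'json'")
    params = {"q": query, "format": fmt}
    if callback:
        # Only needed when the JSON should be wrapped in a function call.
        params["callback"] = callback
    return YQL_PUBLIC_ENDPOINT + "?" + urlencode(params)


url = build_yql_url(
    'select * from flickr.photos.search where text="donkey" and license=4',
    callback="showPhotos",
)
print(url)
```

urlencode takes care of spaces, quotes and the asterisk, which is the part that is easy to get wrong when concatenating the URL by hand.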

Benefits of using YQL

YQL is a way to use the web as a database. Instead of using your time reading up on different APIs, requesting access and learning how they work you can simply access and mix the data and you get it back ready to use a few minutes later. YQL does all this for you:

No time wasted reading API docs – every YQL table comes with a desc command that tells you what parameters are expected and what data will come back for you to use.
Creating complex queries with the console – the YQL console allows you to play with YQL and quickly click together complex queries in an easy to use interface. It also previews the information directly to you so you can see what comes back and in which format.
Filter data before use – YQL allows you to select all the data in a certain query using the * or specifically define what you want – down to a very granular level. In very limited application environments like mobile devices this can be a real benefit. It also means that you don’t need to spend a lot of time converting information into something useful after you requested it: it is already in shape before you send it to the interface layer.
Fast pipes – YQL is hosted on a distributed server network that is very well connected to the internet. Chances are that the server is fast to reach and a few times faster than your own server when it comes to accessing API servers from all over the world.
Caching + converting – YQL by default gives out XML or JSON, which can be useful to convert any data that is on the web in another format to these highly versatile and open data formats. YQL also has a built-in caching system that only fetches new data when it is available and returns the cached data very fast if all you do is request the same thing another time.
Server-side JavaScript – if the out-of-the-box filtering, sorting and converting methods are not enough for you, you can use YQL execute tables to run the returned data through JavaScript before YQL gives it back to you. This allows for all the conversion and extension power JavaScript comes with in a safe and powerful environment as we use Rhino to run it server-side.

All in all YQL allows you to really use and access data without the hassle of resorting to XSLT, Regular Expressions, scraping and other tried-and-true but also complex ways of getting things web-ready.

Government as a trailblazer?

One thing that gets me very excited lately is governments throwing out information for us to use. Data that has been gathered with our tax money is now available for experts and laymen to play with, look at and see where the interesting parts are.

Conjuring APIs out of thin air

The problem is that the government data is not available for us as APIs – instead you will find it entered into Excel spreadsheets and in all kinds of other data formats coming out of – yes – Microsoft products. We could now bitch about this and claim this is old school or do something about it. So how can we do this?

A few weeks ago I built http://winterolympicsmedals.com – an interface to research the history of the medals won in the Winter Olympics from 1924 up to now. There is no API for that and this data is also not available anywhere.

What happened was that the UK newspaper the Guardian released a dataset of this information on their data blog (this is a free service the newspaper provides). The data was provided as an Excel sheet and all I did was upload it to Google Docs. I then selected “Share” and “export as CSV” which gives me the following URL:

http://spreadsheets.google.com/pub?key=tpWDkIZMZleQaREf493v1Jw&output=csv

Now, using YQL with this we have a web service:

select * from csv where url="http://spreadsheets.google.com/pub?key=tpWDkIZMZleQaREf493v1Jw&output=csv" and columns="Year,City,Sport,Discipline,Country,Event,Gender,Type"

By giving the CSV some columns we can now filter by them, for example to get all the Silver medals of the 1924 games we can use:

select * from csv where url="http://spreadsheets.google.com/pub?key=tpWDkIZMZleQaREf493v1Jw&output=csv" and columns="Year,City,Sport,Discipline,Country,Event,Gender,Type" and Year="1924" and Type="Silver"
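What the csv table does here can be mimicked locally: give a headerless CSV some column names, then filter on them. The following Python sketch is only an illustration of that idea, not how YQL implements it. The three rows are illustrative stand-ins for the real spreadsheet, and it assumes the medal is stored in the last column, Type.

```python
import csv
import io

COLUMNS = ["Year", "City", "Sport", "Discipline", "Country", "Event", "Gender", "Type"]

# Illustrative rows only; the real data lives in the Guardian spreadsheet.
SAMPLE_CSV = """\
1924,Chamonix,Skating,Figure skating,AUT,individual,M,Silver
1924,Chamonix,Skating,Figure skating,SWE,individual,M,Gold
1928,St. Moritz,Skating,Figure skating,AUT,individual,M,Silver
"""


def filter_rows(csv_text, columns, **conditions):
    """Name the columns of a headerless CSV and keep rows matching all conditions."""
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=columns)
    return [
        row for row in reader
        if all(row[name] == value for name, value in conditions.items())
    ]


silver_1924 = filter_rows(SAMPLE_CSV, COLUMNS, Year="1924", Type="Silver")
print(silver_1924)
```

Note that all values are compared as strings, which is also why the year is quoted in the YQL statement.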

Spreadsheet to web services made easy!

This makes it easy for us to turn any spreadsheet into a web service. With Google docs as the cloud storage and YQL as the interface and doing the caching and limiting for us it is not that hard to do.

In order to make the creation of a search form and results table easier, I built a PHP script that takes this job on:


<?php
include('csvtoservice.php');
// returns the search form and the results table for the given CSV file
$content = csvtoservice('http://winterolympicsmedals.com/medals.csv');
if($content){
  if($content['form']){
    echo '<h2>Filters</h2>';
    echo $content['form'];
  }
  if($content['table']){
    echo '<h2>Results</h2>';
    echo $content['table'];
  }
}
?>

Some examples

Let’s quickly go through some examples that show you the power of YQL and using already existing free, open systems to build a web interface.

GooHooBi is a search interface that allows you to search the web simultaneously using Google, Yahoo and Bing. You can see how it was done in a live screencast on Vimeo.
UK House Prices is a mashup I built for the release of the UK government open data site. It allows you to compare the house prices in England and Wales from 1996 up to now and see where it makes sense to buy a place and how stable the prices are in that area.
GeoMaker is a

In summary

We have the network and we have the technology.
We have people who work effectively with the tools they use.
We have a new generation coming who naturally use the internet and are happy with our web interfaces.
If we split our efforts 50/50 between building new things and building APIs and converters to get at the data of the old, the web will rock.

Homework

As homework I want people to have a look at the open tables for YQL on GitHub and see what is missing. I’d especially invite people to add their APIs to the tables using the open table documentation. We want your data to allow people to play with it.

Learn more

If you want to learn more about building sites like the ones I showed with this approach, there is a video of me talking you through the building of UK-House-Prices.com available on the YUI blog.

Tags: apis, lasvegas, mix10, opendata, webofdata, yql, YUI
Posted in General | 8 Comments »

TTMMHTM – BBC Web animals, two very cool APIs and there’s something about the LG logo

Tuesday, February 23rd, 2010

Things that made me happy this morning:

WinterOlympicsMedals.com is live – I was busy this weekend creating this search interface from the open data provided by the Guardian.
The LG logo has a hidden meaning that is full of win
The BBC have a great little test analyzing your web behaviour. Apparently I am a web ostrich, but I also buggered the game with the chocolate bars as I was tweeting instead of reading the instructions properly.
On March the First, there will be a funeral procession for IE6 in Mountain View, CA.
Missing kids map and Missing Adults Map are both helping to find people who have been reported missing. Both have an API - if you call read.php with the state name you get the information as XML - for example missing kids in California
Map Compare allows you to see different maps next to each other to compare their quality
ProForma is a paper to create 3D models rapidly from a video recording
Unicons is a bookmarklet to add UTF-8 characters depicting images into any text field.
If you are building a racing site there is an awesome free Formula One API available and its design puts the commercial ones to shame.

Tags: 3dmodelling, api, bbc, comparing, formula1, Google, guardian, ie6, ie6funeral, lg, logo, maps, medals, missingkids, msie, opendata, osm, pacman, search, socialmedia, survey, test, utf8, winterolympics
Posted in General | 3 Comments »

TTMMHTM:How web sites work, how many people see them, open data and lots of accessibility and “girly” stuff

Sunday, January 24th, 2010

I am right now sitting at Heathrow Terminal 5 in London on my way outbound to a two-week stint in Silicon Valley (Sunnyvale/Mountain View) to meet with the US team. And here are the things that made me happy this morning:

Ian Pouncey continues to put together good thought-pieces on accessibility with Web Accessibility Myths.
Robert O’Callahan talks Video, Freedom and Mozilla and debunks some H264 myths.
In case you feel like beeping, there is a cool online Morse Code translator
The back side of a Mozilla business card finally reveals the truth about how websites work – I knew that there were monkeys and gerbils involved!
The Royal Pingdom has a good roundup of The Internet 2009 in numbers – guess where most of the Internet users are from?
Vimeo has a new HTML5 video player and you can kick its tyres.
Liz Danzico wrote a wonderful piece called confidence for good on how making a leap of faith in your career is sometimes very necessary.
UK House Prices, my entry to the data.gov.uk application showcase went down really well on the day of the release and got about 5000 hits on the first day. Now to pester the UK government to do more YQL and expect less SPARQL.
Constructing a POUR web site has some very solid advice on how to build a good web site. I am tired of people shoe-horning new acronyms though. POSH was a bad idea, so was HIJAX. Not because of what they stand for but for giving a way out. Hearing sentences like “yeah the solution is bad now, but we will go for a POSH one soon” or “this will be redesigned with HIJAX in mind at a later stage” makes me want to scream – and I heard both several times.
We have a winner of most inappropriate prank in a job ad – hands-down.
ReadWriteWeb have a good open thread on Sexy Girls, Smart Women and Tech (and I am already looking forward to the referrers linking to this link)
How programmers of different languages view each other and How a common LISP programmer views users of other languages both have poster potential for the office.

Tags: 2009, accessibility, ad, coding, confidence, datagovuk, geeks, h264, html5, internet, job, lisp, ogg, opendata, pour, prank, sexygirls, smartwomen, statistics, ukhouseprices, video, wcag2
Posted in General | Comments Off on TTMMHTM:How web sites work, how many people see them, open data and lots of accessibility and “girly” stuff

How I build my data.gov.uk mashup – UK-House-Prices.com

Thursday, January 21st, 2010

UK-House-Prices.com is a web site to see how the prices in a certain area changed over the years using a data set released by the UK government as part of the data.gov.uk initiative.

Here’s a screencast showing the app:

The first step was to get the right data. I was lucky enough to be invited to the initial “hack day” and pre-release of the data and looked around for something to mash up. Initially I wanted to do something with environmental data but I found a lot of it to be very old. Therefore I just did a search for “2009” at data.gov.uk and found that the house prices data from 1996 to now in England and Wales is available. The plan was set. This was it:

I wanted to build an interface to show this information that was very fast, very portable and show a nice map of the area next to the numbers.
I wanted to build this as a web app and as an application for the Yahoo homepage (as I needed to build one as a demo anyways)
Traffic and speed were the most important issues – as this might get huge.

Cleaning and converting data

I got the spreadsheet and was confronted with my old nemesis: Excel. After saving the sheet as CSV and spending some fun time with regular expressions and split() I had the data in a cleaner and more usable version (JSON, specifically). One fun part is that when there was no data available for a certain area the field was either “..”, “n/a” or just empty. Something to work around. The numbers were also formatted like 100,312 which is nice on the eye but needs un-doing when you want to sort them outside Excel.
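The two clean-up rules described here (three ways of saying "no data" and thousands separators in the numbers) fit into one small function. This is a Python sketch of the idea, not the script used for the site; the name clean_price is made up, and it assumes every remaining cell is a whole number.

```python
# The three spellings of "no data" mentioned above.
MISSING_MARKERS = {"", "..", "n/a"}


def clean_price(cell):
    """Turn a spreadsheet cell like '100,312' into 100312, or None if empty."""
    value = cell.strip()
    if value.lower() in MISSING_MARKERS:
        return None
    # Drop the thousands separators so the value sorts as a number.
    return int(value.replace(",", ""))


print([clean_price(cell) for cell in ["100,312", "..", "n/a", "", "98,500"]])
```

Returning None for missing values keeps them distinguishable from a real price of zero when sorting or charting later.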

Once I had the list of locations and their numbers I wanted to turn them into geographical locations to display maps of the area. For this I used Yahoo Placemaker, especially the YQL table (see an example for Rugby in the YQL console). This is the script I ran over the list of locations:



$out = '';
// $lines holds the rows of the cleaned CSV, one location per row
for($i=0;$i<sizeof($lines);$i++){
  // keep only the place name: drop everything after the first comma
  $select = preg_replace('/,.*/','',$lines[$i]);
  // remove the " UA" suffix some locations carry
  $select = preg_replace('/ UA/','',$select);
  // ask Placemaker (via YQL) for the WOEID and centroid of the place
  $url = 'http://query.yahooapis.com/v1/public/yql?q=select%20match.place.woeId%2Cmatch.place.centroid%20from%20geo.placemaker%20where%20documentContent%20%3D%20%22'.urlencode($select.',uk').'%22%20AND%20documentType%3D%22text%2Fplain%22%20and%20appid%20%3D%20%22%22%20limit%201&format=json&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys';
  $ch = curl_init();
  curl_setopt($ch, CURLOPT_URL, $url);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
  $output = curl_exec($ch);
  curl_close($ch);
  $data = json_decode($output);
  // write one JSON object per location
  echo '{"place":"'.$select.'",';
  echo '"w":"'.$data->query->results->matches->match->place->woeId.'",';
  echo '"lat":"'.$data->query->results->matches->match->place->centroid->latitude.'",';
  echo '"lon":"'.$data->query->results->matches->match->place->centroid->longitude.'"'."},\n";
}

That was that – I had a data set I could work with.
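One thing worth guarding against in a lookup like this is a location that Placemaker cannot resolve, in which case the results part of the response is empty. The Python sketch below (not the PHP used above) pulls the same three fields out of an already decoded response and skips such cases; the response shape follows the property path used in the script, and the sample values are made-up placeholders.

```python
import json


def place_record(name, response):
    """Extract WOEID and centroid from a decoded Placemaker response.

    Returns None when the response carries no results for the place.
    """
    results = response.get("query", {}).get("results")
    if not results:
        return None
    place = results["matches"]["match"]["place"]
    return {
        "place": name,
        "w": place["woeId"],
        "lat": place["centroid"]["latitude"],
        "lon": place["centroid"]["longitude"],
    }


# Placeholder values, shaped like the path used in the PHP script.
SAMPLE_RESPONSE = {
    "query": {"results": {"matches": {"match": {"place": {
        "woeId": "12345",
        "centroid": {"latitude": "52.37", "longitude": "-1.26"},
    }}}}}
}

print(json.dumps(place_record("Rugby", SAMPLE_RESPONSE)))
print(place_record("Nowhere", {"query": {"results": None}}))
```

Building a dictionary and serialising it with json.dumps also avoids broken output when a place name contains a quote, which string concatenation would not catch.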

Adding more information

The next thing I wanted to add was some more information about the area, which meant using maps. As both Yahoo and Google maps have static map versions but are rate limited, I wondered if there was a free version of that. And there is. OpenStreetMap was the answer, especially the somewhat unofficial API I found with Google. To play it safe, I wrote a script that gets the images and caches them on my server to avoid killing this API.

I also wanted to show currently available houses in the area in case you are looking to buy. For this the natural choice for me was to use Nestoria as they also have an open YQL table (see the Nestoria table in the YQL console). So I used YQL and sorted the results by date:

select * from nestoria.search where place_name="Rugby" | sort(field='updated_in_days')

Using this I can get offers in the area live:

$url = 'http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20nestoria.search%20where%20place_name%3D%22'.urlencode($city).'%22%20|%20sort%28field%3D%27updated_in_days%27%29&format=json&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&diagnostics=false';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$output = curl_exec($ch);
curl_close($ch);
$data = json_decode($output);
if($data->query->results){
  $i=0;
  // only show the five most recently updated listings
  $results = array_slice($data->query->results->listings,0,5);
  if(sizeof($results)>0){
    echo '<h2>Current property listings (powered by Nestoria)</h2><ul>';
    foreach($results as $r){
      echo '<li><a href="'.($r->lister_url).'">'.($r->title).'</a>';
      echo '<p>Price: '.($r->price_formatted).', Type of property: '.ucfirst($r->property_type).', Updated: '.($r->updated_in_days_formatted).' ('.($r->updated_in_days).' days)</p>';
      echo '<p>Listed at: '.($r->datasource_name).' by '.($r->lister_name).'.</p>';
      echo '</li>';
    }
    echo '</ul>';
  }
}
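Since listing titles and URLs come from a third party, it is worth escaping them before they end up in the page. A minimal Python sketch of the same "first five listings" rendering with escaping added; the field names lister_url, title and price_formatted are the ones used above, while the list markup and the sample listings are assumptions for illustration.

```python
from html import escape


def render_listings(listings, limit=5):
    """Render the first few listings as an HTML list, escaping all values."""
    items = []
    for listing in listings[:limit]:
        items.append('<li><a href="{}">{}</a> Price: {}</li>'.format(
            escape(listing["lister_url"], quote=True),
            escape(listing["title"]),
            escape(listing["price_formatted"]),
        ))
    if not items:
        return ""
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


# Made-up listings; one title contains markup that must not reach the page as-is.
sample = [
    {"lister_url": "http://example.com/?id={}&ref=yql".format(n),
     "title": "House <b>{}</b>".format(n),
     "price_formatted": "100,000 GBP"}
    for n in range(7)
]
print(render_listings(sample))
```

The slice does the job of array_slice, and an empty result set simply renders nothing.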

Finding a charting solution

Adding interactive charts was the next step. I had a few issues with that:

While Google charts are full of win, they are rate-limited and I didn’t want to pull images. As the app was also meant to become a Yahoo application every image would have to be run through Caja for safety reasons which slowed it down.
Canvas and Flash solutions like YUI charts or Raphael were also not possible because of the performance of the YAP app.

So I wrote my own pure CSS bar charts to work around that issue.
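The core of a CSS-only bar chart is turning each value into a width relative to the largest one. The following Python sketch shows that calculation; it is not the markup used on the site, and the class name and the list structure are invented for the example.

```python
def bar_chart_html(values, max_width_percent=100):
    """Render numbers as list items whose widths are scaled to the largest value."""
    if not values:
        return ""
    peak = max(values)
    rows = []
    for value in values:
        # Scale to the peak; guard against a chart where every value is zero.
        width = round(value / peak * max_width_percent) if peak else 0
        rows.append('<li><span style="width:{}%">{}</span></li>'.format(width, value))
    return '<ul class="bars">\n' + "\n".join(rows) + "\n</ul>"


print(bar_chart_html([100000, 50000, 25000]))
```

For the widths to show, the span needs display:block (or inline-block) and a background colour in the stylesheet; no images, canvas or Flash are involved.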

Building the API

I put all these solutions together and built a small API that will give me the search results with three parameters: the location as an id and the start and end of the time range.

http://uk-house-prices.com/graphs.php?loc=1&start=10&end=20
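An endpoint like this should check its three parameters before doing any work. As a rough sketch in Python (the real endpoint is a PHP script whose internals are not shown here), assuming all three values are integers and that start must not be greater than end:

```python
from urllib.parse import parse_qs, urlsplit


def parse_graph_request(url):
    """Return loc, start and end as integers, or None if the request is invalid."""
    params = parse_qs(urlsplit(url).query)
    try:
        loc = int(params["loc"][0])
        start = int(params["start"][0])
        end = int(params["end"][0])
    except (KeyError, ValueError):
        # A parameter is missing or is not a number.
        return None
    if start > end:
        return None
    return {"loc": loc, "start": start, "end": end}


print(parse_graph_request("http://uk-house-prices.com/graphs.php?loc=1&start=10&end=20"))
print(parse_graph_request("http://uk-house-prices.com/graphs.php?loc=abc&start=10&end=20"))
```

Rejecting bad input early keeps the chart and listing code free of special cases.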

Building the interface

To build the interface, I went all-out YUI. I took the YUI grids builder to create the main layout, the AutoComplete demo, the dual slider demo and the button and put them all together. Add an Ajax call to the form, and you are done. OK, I admit, there was quite a bit of cleaning up to be done :)

Notice that I am using progressive enhancement all the way. Without JavaScript you get dropdowns:

That’s it

The next thing I had to do is move the app over to the Yahoo Application Platform which was easy as I based it on an API - but this is another blog post :)

Tags: datagov, datagovuk, england, government, houseprices, mashup, nestoria, opendata, openstreetmap, uk, ukhouseprices, yql, YUI
Posted in General | 6 Comments »