Yesterday I gave a talk at the Webmontag in Frankfurt, Germany about using APIs to build web sites from distributed data using YQL. Here are the slides of the talk followed by my notes.
Transcript / Notes
Fantastic voyage into the web of data
Web development has changed drastically in the last few years. Sadly, not all of the options that are open to us have become common practice.
Developing the web vs. developing for the web
The main issue is that instead of using the web to develop our products we still develop products for the web. Instead of embracing the fact that there is no “offline” when it comes to web sites, we still build products that keep all the content and media on one server. We write mediocre solutions for people to deal with images, video content or outbound links, whilst there are already great products in place that were built for exactly those use cases.
Instead of concentrating our energies on improving the content of the web – using proper textual structures, providing alternative content, adding semantic meaning and geospatial context and so on – we spend most of our days bickering on forums, mailing lists, blogs and really any other platform about the technologies that drive the web.
There are dozens of solutions for making rounded corners work in any old browser out there, and we keep inventing new ways to use custom fonts on web sites, yet documentation on proper localisation and real accessibility of web products, which benefit everybody, is rare.
Decentralised Data
The biggest mistake in web development to me is building a single point of entry for our users and then hoping that people will come. This is why we spend more time and money on SEO, link-building, newsletters and other ways of promoting our domain and brand instead of embracing the idea of the web.
The web is an interlinked structure of data – media, documents and URL endpoints. By spreading ourselves all over it we make our domain less important but we also weave ourselves into the fabric of the web.
There is a different approach to web development. About two years ago I wrote this book. In it I explained that you can build an easy to maintain and successful web site without needing to know much about programming. The lack of success of the book to me is related to the title which is far too complex. “Web Development Solutions: Ajax, APIs, Libraries, and Hosted Services Made Easy” originally was meant to be “No Bullshit Web Design”.
The trick that I explained in the book and want to re-iterate here is the following: instead of trying to bring the web to our site we are much better off bringing our site to the web.
The main core of the site should be a CMS and it doesn’t really matter which one. This could be as easy as a blogging system like WordPress or as complex as a full-blown enterprise level system like Vignette, Tridion or RedDot. The main feature should be that it is modular and that we can write our own extensions for it to retrieve data from the web.
The next step is to spread our content on the web:
The benefits of this approach are the following:
- The data is distributed over multiple servers – even if your own web site is offline (for example for maintenance) the data lives on
- You reach users and tap into communities that would never have ended up on your web site.
- You get tags and comments about your content from these sites. These can become keywords and guidelines for you to write very relevant copy on your main site in the future. You know what people want to hear about rather than guessing it.
- Comments on these sites also mean you start a channel of communication with users of the web that happens naturally instead of sending them to a complex contact form.
- You don’t need to worry about converting image or video materials into web formats – the sites that were built exactly for that purpose automatically do that for you.
- You allow other people to embed your content into their products and can thus piggy-back on their success and integrity.
If you want to know more about this approach, check out the Developer Evangelism Handbook where I cover this in detail in the “Using the (social) web” chapter.
APIs are the key
The key to this kind of development is Application Programming Interfaces, or APIs for short. Using APIs you get programmatic access to the content of the API provider. There are hundreds of APIs available for you and one site that lists them is ProgrammableWeb.
Using an API can be as easy as opening an address like http://search.twitter.com/trends/current.json in a browser. In this case this will get you the currently trending topics on Twitter in JSON format.
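Once the JSON has arrived, turning it into something a script can work with takes a single call. The following is a minimal sketch: the payload is a made-up sample, not the real response of the trends endpoint, whose structure is not shown in these notes.

```php
<?php
// Hypothetical sample payload; the real trends response may be
// structured differently.
$json = '{"trends": [{"name": "Webmontag"}, {"name": "YQL"}]}';

// json_decode() turns the JSON text into PHP objects
// (or into arrays when the second argument is true).
$data = json_decode($json);

$names = array();
if ($data !== null && isset($data->trends)) {
    foreach ($data->trends as $trend) {
        $names[] = $trend->name;
    }
}

echo implode(', ', $names), "\n"; // prints "Webmontag, YQL"
```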
API issues
Of course there are also problems with APIs. The main one is inconsistency. Each API has its own way of authenticating, needs different input parameters and has different output formats and structures. That means that you have to spend a lot of time reading API documentation or – if none exists, which happens a lot – on trial-and-error development. The other big problem with APIs is that a lot of providers underestimate the performance the API needs and the amount of traffic it will have to deal with. Therefore you will find APIs being unavailable or constantly changing to work around traffic issues.
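One way to soften the availability problem is to keep the last good response around and fall back to it when the API does not answer. The sketch below assumes a simple file cache; cached_fetch(), its parameters and the example URL are all made up for illustration, and the two closures merely stand in for a working and a broken API.

```php
<?php
// Hypothetical helper: $fetch is any callable that returns the response
// body as a string, or an empty value on failure.
function cached_fetch($url, $fetch, $cacheDir, $maxAge = 300) {
    $file = $cacheDir . '/' . md5($url) . '.cache';

    // Serve a fresh cached copy without touching the API at all.
    if (is_file($file) && (time() - filemtime($file)) < $maxAge) {
        return file_get_contents($file);
    }

    $data = call_user_func($fetch, $url);
    if (!empty($data)) {
        file_put_contents($file, $data);
        return $data;
    }

    // The API failed: fall back to a stale copy if one exists.
    return is_file($file) ? file_get_contents($file) : false;
}

$dir  = sys_get_temp_dir();
$url  = 'http://api.example.com/data-' . uniqid(); // made-up URL
$up   = function ($url) { return '{"status":"ok"}'; }; // API answers
$down = function ($url) { return ''; };                // API is down

// A max age of 0 forces a request every time.
$first  = cached_fetch($url, $up, $dir, 0);   // fetched and cached
$second = cached_fetch($url, $down, $dir, 0); // served from the stale copy
```

The stale copy may be out of date, but for content like a city description that is usually better than an error message.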
No need for rock stars
Whilst development using third party APIs used to be the exclusive skill set of experts, this is very much over. Newer products and meta APIs allow everyone to put together a product using several APIs. There is no longer any need to call in a “rock star developer”.
YQL - making it really easy
YQL is a meta API that allows you to mix, match, convert and filter API output in a very simple format:
select {what} from {where} where {condition(s)}
The easiest way to start playing with YQL is by using the console. This is a simulation of a call to the YQL web service and allows you to enter your query and define the output format (XML or JSON or JSON-P or JSON-PX). You then run the query and see the results either as raw data or as a tree to drill into. If you’re happy with the result you can copy and paste the URL to use either in a browser or in your script.
You have a list of your recent queries, some demo queries to get you going and a list of all the available data tables. Data tables are the definitions that point to the third party API and they come with demo queries and a description which tells you what parameters are expected to make the request work.
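The URL the console hands you is nothing more than the public YQL endpoint with the URL-encoded query and the output format attached. A minimal sketch of building it by hand follows; yql_url() is a made-up helper name, while the endpoint and the geo.places table are the ones used in the Frankfurt example.

```php
<?php
// Hypothetical helper: builds the request URL for the public YQL endpoint.
function yql_url($query, $format = 'json') {
    $root = 'http://query.yahooapis.com/v1/public/yql?q=';
    return $root . urlencode($query) . '&format=' . urlencode($format);
}

$query = 'select * from geo.places where text="Frankfurt"';
$url   = yql_url($query, 'xml');

echo $url, "\n";
// prints "http://query.yahooapis.com/v1/public/yql?q=select+%2A+from+geo.places+where+text%3D%22Frankfurt%22&format=xml"
```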
For example: Frankfurt
As an example, let’s build an interface that shows information about Frankfurt.
The main piece of code that you need is a function that uses cURL to get data from the web:
// Fetch a URL with cURL and return the response body as a string.
function getstuff($url){
  $curl_handle = curl_init();
  curl_setopt($curl_handle, CURLOPT_URL, $url);
  // Give up if no connection can be made within two seconds.
  curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 2);
  // Return the response instead of printing it.
  curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
  $buffer = curl_exec($curl_handle);
  curl_close($curl_handle);
  if (empty($buffer)){
    return 'Error retrieving data, please try later.';
  } else {
    return $buffer;
  }
}
Then you can get a description of Frankfurt from Wikipedia via YQL and its html table:
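$root = 'http://query.yahooapis.com/v1/public/yql?q=';
$city = 'Frankfurt';
$loc = 'Frankfurt';
$yql = 'select * from html where url = \'http://en.wikipedia.org/wiki/' . $city . '\' and xpath="//div[@id=\'bodyContent\']/p" limit 3';
$url = $root . urlencode($yql) . '&format=xml';
$info = getstuff($url);
// keep only what sits between <results> and </results>
$info = preg_replace('/.*<results>|<\/results>.*/', '', $info);
// remove the XML declaration
$info = preg_replace('/<\?xml version="1\.0" encoding="UTF-8"\?>/', '', $info);
// remove XML comments (pattern assumed, the original was lost)
$info = preg_replace('/<!--.*-->/', '', $info);
// turn Wikipedia's relative links into absolute ones
$info = preg_replace('/"\/wiki/', '"http://en.wikipedia.org/wiki', $info);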
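The clean-up step can be tried without a network connection. The sketch below runs the same kind of replacements over a made-up, heavily shortened stand-in for a YQL XML response; the real output carries more attributes and diagnostics, and the comment pattern is an assumption.

```php
<?php
// Made-up stand-in for a YQL XML response.
$info = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
      . '<query><results><p>Frankfurt is a city on the '
      . '<a href="/wiki/Main">Main</a>.</p></results></query>'
      . "\n<!-- total: 42 -->";

// Keep only what sits between <results> and </results>.
$info = preg_replace('/.*<results>|<\/results>.*/', '', $info);
// Remove the XML declaration left over on the first line.
$info = preg_replace('/<\?xml version="1\.0" encoding="UTF-8"\?>/', '', $info);
// Remove XML comments (assumed pattern).
$info = preg_replace('/<!--.*-->/', '', $info);
// Turn Wikipedia's relative links into absolute ones.
$info = preg_replace('/"\/wiki/', '"http://en.wikipedia.org/wiki', $info);
$info = trim($info);

echo $info, "\n";
// prints: <p>Frankfurt is a city on the <a href="http://en.wikipedia.org/wiki/Main">Main</a>.</p>
```

What is left is a plain HTML fragment that can be printed straight into the page.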
Then get the newest events from Upcoming:
$yql = 'select * from upcoming.events.bestinplace(4) where woeid in (select woeid from geo.places where text="' . $loc . '") | unique(field="description")';
$url = $root . urlencode($yql) . '&format=json';
$events = getstuff($url);
$events = json_decode($events);
foreach($events->query->results->event as $e){
$evHTML.=’