Christian Heilmann

Posts Tagged ‘apis’

TTMMHTM: Google delivers, Guardian delivers and some more developer goodies

Friday, May 21st, 2010

Damn, life is good as a developer right now.

** We finally have an open video format with WebM, a container based on Matroska with VP8 as the video codec and Vorbis for audio. It is backed by Opera, Mozilla and Google amongst other huge players. IE9 says it will support it via a plugin and Safari seems to support it via a QuickTime plugin.

** Google released a library to make it easier to embed web fonts and also hosts quite a bunch of fonts for easy embedding – preliminary reports by friends of mine state that they look amazing on Mac but very bad on PC – can you verify this?

** Google Latitude now has an API but it also shows some problems

** Google also released BigQuery which is a REST API to analyse data in Google Storage. One of the uses they showed is the Prediction API which does natural language analysis and could become a contender for Open Calais.

** AppEngine got some good updates including a Comet API

** Android got a major update – it now runs the superfast V8 engine, you get access to the accelerometer and camera via a JavaScript API, and you can send applications to the phone from your computer via the cloud. Makes me want to kill the guy who dropped my Nexus One and hope HTC will be quick in repairing it.

Participating in the Web of Data with Open Standards

Wednesday, March 17th, 2010

These are the detail notes for my talk at the mix10 conference in Las Vegas. The description of my talk was the following:

Web development as we do it right now is on the way out. The future of the web is what its founders planned a long time ago: loosely joined pieces of information for you to pick, choose and put together in interfaces catered to your end users. In this session, see how to build a web portfolio that is always up to date, maintained by using the web rather than learning a bespoke interface, and high in performance as the data is pulled and cached for you by a high-traffic server farm rather than your server. If you wondered how you can leave your footprint on the web without spending thousands on advertising and development, here are some answers.

The slides

The detailed notes / transcript

Welcome to the web

When I started using the web I was working for a radio station as a newscaster and producer. I had always dabbled with computers and connecting them world-wide, and I was utterly amazed at how easy it was to find information. This was 1996, and I convinced my boss at the radio station that it was of utmost importance that I get internet access so we would be the first in our town to get the news from Associated Press and other very important sources. In reality, I had just fallen in love with the web and its possibilities.

Working as a professional web developer

I quit my job soon after that and built my first web sites. Together with some friends I maintained a banner exchange, administered a few mailing lists and IRC channels, and built a lot of really bad web sites which – back then – were the bee’s knees.

I went through the first .com boom living in a hotel in Los Angeles with all expenses paid for writing HTML.

Joining the Enterprise crew

I then worked for an agency and delivered lots of products with enterprise level content management systems. Massive systems intended to replace the inadequate systems of the past.

This is when things went wrong

For a few years I released things that were incredibly expensive and meant that the people using them had to be sent on £3K+ training courses just to be able to do the things they had already done before – only much less effectively. Or did they?

Joining the corp environment (to a degree)

Again, this was a very interesting step as it meant I moved away from the large world of delivering for clients to concentrating on one single company – one of the companies that defined the world wide web as it is now and also a large player in a newly emerging world.

Hey Ho, Web 2.0 highfive

Web 2.0 came around and ran rampant in the media and in the minds of early adopters. User Generated Content sounded like an awesome scam to make millions of dollars with content that comes for free – all you need to do is set up the infrastructure and find the right people.

And it went wrong again.

The main purpose of the first round of web2.0 was to be as visible as possible – it didn’t matter if all the content added by your users was really just “first” posts and abusive comments – as long as the user numbers were great, you won.

And now?

Makes you wonder what comes now, doesn’t it?

The web as the platform, the Mobile Web and GeoLocation

This is where we are. We can use the web as our platform, we work with virtual servers, hosted services and are happy to mix and match different services to get our UGC fix. What is missing though is the glue.

Market changes leave a track

The thing we keep forgetting, though, is the users. All the changes we’ve gone through bred generations of users who are happy to use what they learnt to do in their job but are not happy having to re-learn basic chores over and over again.

Time to shift down a gear

We have the infrastructure in place; we can already make this work. I get a real feeling that we are not innovating but turning the web into a mainstream medium. Streaming TV without commenting and massive successes like Farmville and MafiaWars make me realize that the web is becoming ubiquitous but also boring for me as someone who wants to move it further. To accelerate in a car you need to shift down a gear (or kick down the accelerator with an automatic). This is the time to do so.

Finding the common denominator

What is the common denominator that drove all the innovation and movement in the past? Data. Information becomes readily available and interesting when you mix it with other information sources to find the story in the data.

Tapping into the world of data

Data is all around us, it is just not well structured. People cannot even be bothered to write semantic HTML, and it is tough to teach editors to enter tags, alternative text for images and video descriptions. The reason is that we lack success stories and that learning our CMSes takes too long. Instead of teaching people how to use systems, we should teach them how to structure information and simplify the indexing process, rather than empowering them to make things pretty and shiny – that is what other experts do better.

Why APIs work

APIs or Application Programming Interfaces are the best thing you can think of if you want to build something right now. If you remove the information from the interface – both in terms of data entry and consumption – we can build a web that works for everybody. By making information searchable and filterable out of the box, you can scale to any size or cut down to the bare minimum.

APIs made easy

The Yahoo Query Language, or YQL for short, is a very simple language that allows you to use the web like you would use a database. In its simplest form a YQL query looks like this:

select {what} from {where} where {conditions}

Say for example you want to find photos in Flickr with the keyword donkey. The YQL statement for this is the following:

select * from flickr.photos.search
where text="donkey" and license=4

The license=4 part means that the photos you retrieve and display carry a Creative Commons license, so you are allowed to show them on your page. This is very important, as Flickr users are happy to show their photos but not all of them are happy for you to use those photos in your product. Play nice and we all get more good data.

YQL is not limited to Yahoo APIs and data – anything on the web can become accessible with it. If you want to find a flower pot in the Bay Area you can, for example, query Craigslist with YQL:

select * from craigslist.search where
location="sfbay" and type="sss" and query="flower pot"

If you want to get the latest news from Google about the topic of healthcare, you can use:

select * from google.news where q="healthcare"

If you want to collate different APIs and get one data set back, you can use the query.multi table. The following for example searches the New York Times archive, Microsoft Bing News and Google News for the term healthcare and returns one collated set of results:

select * from query.multi where queries in (
'select * from nyt.article.search where query="healthcare"',
'select * from microsoft.bing.news where query="healthcare"',
'select * from google.news where q="healthcare"'
)

YQL even allows you to get information when there is no API available. For example in order to get the text of all the latest headlines from Fox News you can use YQL’s HTML table:

select content from html where
url="http://www.foxnews.com/" and xpath="//h2/a"

This goes to the foxnews.com home page, retrieves the HTML, runs it through the W3C HTML Tidy tool to clean up broken markup and filters the information down to the text content of all links inside second-level headings. The xpath statement //h2/a means “all links inside h2 elements”, and selecting content instead of * retrieves just the text rather than everything.

You can then do more with this content – for example using the Google translation API to translate the headlines into French:

select * from google.translate where q in (
select content from html where url="http://www.foxnews.com/"
and xpath="//h2/a"
) and target="fr"

As you can see it is pretty easy to mix and match different APIs to reach your goal – all in the same language.

YQL goes further than just reading data, though. If an API allows for writing, you can run insert, update and delete commands much like you can with a database. For example, you can publish a post on a WordPress blog with the following command:

insert into wordpress.post
(title, description, blogurl, username, password)
values ("Test Title", "This is a test body", "http://yqltest.wordpress.com", "yqltest", "password")

The insert, update and delete tables require you to use HTTPS to send the information, which means that although your name and password are perfectly readable here they won’t leak out to the public – unless you use them inside JavaScript, where they would be readable by anyone viewing the source.

The YQL endpoint

As YQL is a web service, it needs an endpoint on the web. For tables that don’t need any authentication this endpoint is the following URL:

http://query.yahooapis.com/v1/public/yql?
q={query}
&format=xml|json
&callback={callbackfunction}

You can get the information back in XML or JSON format. If you choose JSON you can use the information directly in JavaScript, and by providing a callback function you can do so even without any server interaction.
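As a minimal sketch, assuming PHP with allow_url_fopen enabled (cURL works just as well), consuming the JSON output could look like this, re-using the donkey query from above:

$query = 'select * from flickr.photos.search where text="donkey" and license=4';
$url = 'http://query.yahooapis.com/v1/public/yql?q=' . urlencode($query) . '&format=json';
// decode the JSON answer and loop over the photos in query->results
$data = json_decode(file_get_contents($url));
foreach($data->query->results->photo as $photo){
  echo $photo->title . "\n";
}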

Benefits of using YQL

YQL is a way to use the web as a database. Instead of spending your time reading up on different APIs, requesting access and learning how they work, you simply access and mix the data and get it back ready to use a few minutes later. YQL does all this for you:

  • No time wasted reading API docs – every YQL table comes with a desc command that tells you what parameters are expected and what data will come back for you to use (see the example after this list).
  • Creating complex queries with the console – the YQL console allows you to play with YQL and quickly click together complex queries in an easy-to-use interface. It also previews the results directly so you can see what comes back and in which format.
  • Filter data before use – YQL allows you to select all the data in a certain query using the * or specifically define what you want, down to a very granular level. In very limited application environments like mobile devices this can be a real benefit. It also means you don’t need to spend time converting information into something useful after you requested it – it arrives pre-filtered, before you send it to the interface layer.
  • Fast pipes – YQL is hosted on a distributed server network that is very well connected to the internet. Chances are the server is fast to reach and several times faster than your own server when it comes to accessing API servers from all over the world.
  • Caching + converting – YQL by default gives out XML or JSON, which makes it useful for converting any data on the web into these highly versatile and open formats. YQL also has a built-in caching system that only fetches new data when it is available and returns results very fast if all you do is request the same thing another time.
  • Server-side JavaScript – if the out-of-the-box filtering, sorting and converting methods are not enough for you, you can use YQL execute tables to run the returned data through JavaScript before YQL gives it back to you. This gives you all the conversion and extension power of JavaScript in a safe and powerful environment, as Rhino runs it server-side.
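For example, asking a table to describe itself is a single statement:

desc flickr.photos.search

This returns the expected keys and some sample queries. And instead of pulling everything with the *, you can name just the fields you need:

select id, title from flickr.photos.search where text="donkey" and license=4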

All in all YQL allows you to really use and access data without the hassle of resorting to XSLT, Regular Expressions, scraping and other tried-and-true but also complex ways of getting things web-ready.

Government as a trailblazer?

One thing that has got me very excited lately is governments releasing information for us to use. Data that was gathered with our tax money is now available for experts and laymen alike to play with, look at and find the interesting parts in.

Conjuring APIs out of thin air

The problem is that the government data is not available to us as APIs – instead you will find it entered into Excel spreadsheets and all kinds of other data formats coming out of – yes – Microsoft products. We could now bitch about this and call it old school, or we could do something about it. So how can we do that?

A few weeks ago I built http://winterolympicsmedals.com – an interface to research the history of the medals won at the Winter Olympics from 1924 up to now. There is no API for that, and this data was not available anywhere else.

What happened was that the UK newspaper the Guardian released a dataset with this information on their data blog (a free service the newspaper provides). The data was provided as an Excel sheet, so all I did was upload it to Google Docs. I then selected “Share” and “export as CSV”, which gave me the following URL:

http://spreadsheets.google.com/pub?key=tpWDkIZMZleQaREf493v1Jw&output=csv

Now, using YQL with this we have a web service:

select * from csv where url="http://spreadsheets.google.com/pub?key=tpWDkIZMZleQaREf493v1Jw&output=csv" and columns="Year,City,Sport,Discipline,Country,Event,Gender,Type"

By giving the CSV some columns we can now filter by them; for example, to get all the Silver medals of the 1924 games we can use:

select * from csv where url="http://spreadsheets.google.com/pub?key=tpWDkIZMZleQaREf493v1Jw&output=csv" and columns="Year,City,Sport,Discipline,Country,Event,Gender,Type" and Year="1924" and Type="Silver"

Spreadsheet to web services made easy!

This makes it easy to turn any spreadsheet into a web service. With Google Docs as the cloud storage and YQL as the interface, doing the caching and rate limiting for us, it is not that hard to do.

In order to make the creation of a search form and results table easier, I built a PHP script that takes this job on:


<?php
include('csvtoservice.php');
$content = csvtoservice('http://winterolympicsmedals.com/medals.csv');
if($content){
  // print a filter form when the script returns one
  if($content['form']){
    echo '<h2>Filters</h2>';
    echo $content['form'];
  }
  // print the results table when there is data to show
  if($content['table']){
    echo '<h2>Results</h2>';
    echo $content['table'];
  }
}
?>

Some examples

Let’s quickly go through some examples that show you the power of YQL and using already existing free, open systems to build a web interface.

  • GooHooBi is a search interface that allows you to search the web simultaneously using Google, Yahoo and Bing. You can see how it was done in a live screencast on Vimeo.
  • UK House Prices is a mashup I built for the release of the UK government open data site. It allows you to compare house prices in England and Wales from 1996 up to now and see where it makes sense to buy a place and how stable the prices are in that area.
  • GeoMaker is a tool that finds geographical locations in any given text or web page (using Yahoo’s Placemaker service) and turns them into maps or microformatted HTML for you to copy and paste.

In summary

  • We have the network and we have the technology.
  • We have people who work effectively with the tools they use.
  • We have a new generation coming who naturally use the internet and are happy with our web interfaces.
  • If we split our efforts 50/50 between building the new and building APIs and converters that free up the old data, the web will rock.

Homework

As homework, I want people to have a look at the open tables for YQL on GitHub and see what is missing. I’d especially like to invite people to add their own APIs to the tables using the open table documentation. We want your data out there to allow people to play with it.

Learn more

If you want to learn more about building sites like the ones I showed with this approach, there is a Video of me talking you through the building of UK-House-Prices.com available on the YUI blog.

TTMMHTM: Monitoring the web, Synth Britannia, charting the Beatles and HTC performance

Wednesday, January 20th, 2010

Fantastic voyage into the web of data – my talk at the Webmontag in Frankfurt, Germany

Tuesday, August 11th, 2009

Yesterday I gave a talk at the Webmontag in Frankfurt, Germany about using APIs to build web sites from distributed data using YQL. Here are the slides of the talk followed by my notes.

Transcript / Notes

Fantastic voyage into the web of data

Web development has changed drastically in the last few years. Sadly, not all of the options that are open to us have become common practice yet.

Developing the web vs. developing for the web

The main issue is that instead of using the web to develop our products we still develop products for the web. Instead of embracing the fact that there is no “offline” when it comes to web sites, we still build products that keep all the content and media on one server and write mediocre solutions for people to deal with images, video content or outbound links – whilst there are already great products in place that were built for exactly those use cases.

Instead of concentrating our energies on improving the content of the web – using proper textual structures, providing alternative content, adding semantic meaning and geospatial context and so on – we spend most of our days bickering on forums, mailing lists, blogs and really any other platform about the technologies that drive the web.

There are dozens of solutions for making rounded corners work in any old browser out there, and we keep re-inventing new ways to use custom fonts on web sites, yet documentation on proper localisation and real accessibility of web products that benefit everybody is rare.

Decentralised Data

The biggest mistake in web development, to me, is building a single point of entry for our users and then hoping that people will come. This is why we spend more time and money on SEO, link-building, newsletters and other ways of promoting our domain and brand instead of embracing the idea of the web.

The web is an interlinked structure of data – media, documents and URL endpoints. By spreading ourselves all over it we make our domain less important but we also weave ourselves into the fabric of the web.

There is a different approach to web development. About two years ago I wrote a book in which I explained that you can build an easy to maintain and successful web site without needing to know much about programming. The lack of success of the book is, to me, related to the title, which is far too complex: “Web Development Solutions: Ajax, APIs, Libraries, and Hosted Services Made Easy” was originally meant to be “No Bullshit Web Design”.

The trick that I explained in the book and want to re-iterate here is the following: instead of trying to bring the web to our site we are much better off bringing our site to the web.

The core of the site should be a CMS – and it doesn’t really matter which one. This could be as simple as a blogging system like WordPress or as complex as a full-blown enterprise-level system like Vignette, Tridion or RedDot. The main requirement is that it is modular and that we can write our own extensions for it to retrieve data from the web, as sketched below.
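As a rough sketch of such an extension, here is a hypothetical WordPress shortcode (the shortcode and function names are made up for illustration) that pulls Creative Commons photos from Flickr via YQL:

function yql_flickr_photos($atts){
  // build a YQL statement for CC-licensed Flickr photos matching the given text
  $query = 'select * from flickr.photos.search where text="' . esc_attr($atts['text']) . '" and license=4 limit 5';
  $url = 'http://query.yahooapis.com/v1/public/yql?q=' . urlencode($query) . '&format=json';
  $data = json_decode(file_get_contents($url));
  $html = '<ul>';
  foreach($data->query->results->photo as $p){
    // assemble the thumbnail URL from the photo data Flickr returns
    $src = "http://farm{$p->farm}.static.flickr.com/{$p->server}/{$p->id}_{$p->secret}_s.jpg";
    $html .= '<li><img src="' . $src . '" alt="' . esc_attr($p->title) . '"></li>';
  }
  return $html . '</ul>';
}
add_shortcode('flickrphotos', 'yql_flickr_photos');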

The next step is to spread our content on the web – photos, videos and documents go to the services that specialise in hosting exactly that kind of content.

The benefits of this approach are the following:

  • The data is distributed over multiple servers – even if your own web site is offline (for example for maintenance) the data lives on
  • You reach users and tap into communities that would never have ended up on your web site.
  • You get tags and comments about your content from these sites. These can become keywords and guidelines for you to write very relevant copy on your main site in the future. You know what people want to hear about rather than guessing it.
  • Comments on these sites also mean you start a channel of communication with users of the web that happens naturally instead of sending them to a complex contact form.
  • You don’t need to worry about converting image or video materials into web formats – the sites that were built exactly for that purpose automatically do that for you.
  • You allow other people to embed your content into their products and can thus piggy-back on their success and integrity.

If you want to know more about this approach, check out the Developer Evangelism Handbook, where I cover it in detail in the “Using the (social) web” chapter.

APIs are the key

The main key to this kind of development is Application Programming Interfaces, or APIs for short. Using APIs you get programmatic access to the content of the API provider. There are hundreds of APIs available, and one site that lists them is Programmable Web.

Using an API can be as easy as opening an address like http://search.twitter.com/trends/current.json in a browser. In this case you get the currently trending topics on Twitter in JSON format.
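As a minimal sketch (assuming the response structure this endpoint returned at the time), reading that feed in PHP could look like this:

$json = file_get_contents('http://search.twitter.com/trends/current.json');
$data = json_decode($json, true);
// the trends sit under a timestamp key inside the "trends" object
foreach($data['trends'] as $timestamp => $trends){
  foreach($trends as $trend){
    echo $trend['name'] . "\n";
  }
}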

API issues

Of course there are also problems with APIs. The main one is inconsistency: each API has its own way of authenticating, needs different input parameters and has different output formats and structures. That means you have to spend a lot of time reading API documentation or – if that doesn’t exist, which happens a lot – doing trial-and-error development. The other big problem is that a lot of providers underestimate the performance the API needs and the amount of traffic it will have to deal with, so you will find APIs being unavailable or constantly changing to work around traffic issues.

No need for rock stars

Whilst development using third-party APIs used to be the exclusive skill set of experts, this is very much over. Newer products and meta APIs allow everyone to simply put together a product using several APIs. There is no longer any need to call in a “rock star developer”.

YQL - making it really easy

YQL is a meta API that allows you to mix, match, convert and filter API output in a very simple format:

select {what} from {where} where {condition(s)}

The easiest way to start playing with YQL is the console. This is a simulation of a call to the YQL web service: you enter your query, define the output format (XML, JSON, JSON-P or JSON-PX), run the query and see the results either in the raw data format or as a tree to drill into. If you’re happy with the result you can copy and paste the URL to use it in a browser or in your script. You also get a list of your recent queries, some demo queries to get you going and a list of all the available data tables. Data tables are the definitions that point to the third-party APIs; they come with demo queries and a description that tells you which parameters are expected to make the request work.

For example: Frankfurt

As an example, let’s build an interface that shows information about Frankfurt.

The main piece of code that you need is a function that uses cURL to get data from the web:

function getstuff($url){
  $curl_handle = curl_init();
  curl_setopt($curl_handle, CURLOPT_URL, $url);
  curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 2);
  curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
  $buffer = curl_exec($curl_handle);
  curl_close($curl_handle);
  if (empty($buffer)){
    return 'Error retrieving data, please try later.';
  } else {
    return $buffer;
  }
}

Then you can get a description of Frankfurt from Wikipedia via YQL and the HTML table:

$root = 'http://query.yahooapis.com/v1/public/yql?q=';
$city = 'Frankfurt';
$loc = 'Frankfurt';

$yql = 'select * from html where url="http://en.wikipedia.org/wiki/' . $city . '" and xpath="//div[@id=\'bodyContent\']/p" limit 3';
$url = $root . urlencode($yql) . '&format=xml';
$info = getstuff($url);
// strip the XML declaration and the YQL result wrapper from the returned markup
$info = preg_replace('/.*<results[^>]*>|<\/results>.*/s', '', $info);
$info = preg_replace('/<\?xml version="1.0" encoding="UTF-8"\?>/', '', $info);
// rewrite relative Wikipedia links to absolute ones
$info = preg_replace('/"\/wiki/', '"http://en.wikipedia.org/wiki', $info);

Newest events from Upcoming:

$yql = 'select * from upcoming.events.bestinplace(4) where woeid in (select woeid from geo.places where text="' . $loc . '") | unique(field="description")';
$url = $root . urlencode($yql) . '&format=json';
$events = getstuff($url);
$events = json_decode($events);
foreach($events->query->results->event as $e){
  // collect each event name as a list item
  $evHTML .= '<li>' . $e->name . '</li>';
}

Photos of the city from Flickr:

$yql = 'select * from flickr.photos.info where photo_id in (select id from flickr.photos.search where woe_id in (select woeid from geo.places where text="' . $loc . '") and license=6) limit 16';
$url = $root . urlencode($yql) . '&format=json';
$photos = getstuff($url);
$photos = json_decode($photos);
foreach($photos->query->results->photo as $s){
  // assemble the square thumbnail URL from the photo data
  $src = "http://farm{$s->farm}.static.flickr.com/{$s->server}/{$s->id}_{$s->secret}_s.jpg";
  $phHTML .= '<li><img alt="' . $s->title . '" src="' . $src . '"></li>';
}

And the weather forecast from Yahoo Weather:

$yql = 'select description from rss where url="http://weather.yahooapis.com/forecastrss?p=GMXX0040&u=c"';
$url = $root . urlencode($yql) . '&format=json';
$weather = getstuff($url);
$weather = json_decode($weather);
$weHTML = $weather->query->results->item->description;
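To put the pieces together, you can then print the collected HTML chunks into your page template – a small sketch (the heading markup is an assumption):

echo '<h2>About ' . $city . '</h2>' . $info;
echo '<h2>Events</h2><ul>' . $evHTML . '</ul>';
echo '<h2>Photos</h2><ul>' . $phHTML . '</ul>';
echo '<h2>Weather</h2>' . $weHTML;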

Kobayashi Maru

Kobayashi Maru is a fictional test that cadets of Starfleet Academy in Star Trek have to pass in order to get their first commission. The interesting part about the test is that it cannot be beaten – its purpose is to confront people with the idea of failure and death and to see how they cope with it. The only person to ever successfully pass the test is James Tiberius Kirk, because a) he is the definition of awesome and b) he cheated by modifying the computer program.

YQL can be used the same way to create an API where none exists, for example by scraping the headlines of a newspaper web site using the HTML table and an XPath expression:

select * from html where url="http://faz.de" and xpath="//h2"

See it here or try it in the console.

You can then go even further and translate the headlines using Google’s translation API:

select * from google.translate where q in (select a from html where url="http://faz.de" and xpath="//h2") and target="en"

See it here or try it in the console.

You can also use an API to filter cleverly and get information that normally is not readily available, for example all Twitter updates from two different users, but only when they posted a link:

select title from twitter.user.timeline where title like "%@%" and id="codepo8" or id="ydn"

See it here or try it in the console.

Benefits of using YQL

YQL gives you a lot of flexibility when it comes to remixing the web and filtering the results. You can:

  • mix and match APIs
  • filter results
  • simplify authentication
  • use it in the console or from code
  • get going with minimal reading of documentation
  • get caching of results for free
  • have requests proxied through Yahoo’s servers

Join the web of data!

Using YQL you can not only read and mix API data – you can also make your own data available to the world. By defining a simple XML schema as an Open Table you give YQL access to your API endpoint. The really useful part is that YQL limits outside access to 100,000 hits a day and 1,000 hits an hour and caches your data for you. Thus you can have the world use your data without having to buy your own server farm.
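A minimal sketch of what such an Open Table definition could look like – the URL, parameter and author values are placeholders for your own API:

<table xmlns="http://query.yahooapis.com/v1/schema/table.xsd">
  <meta>
    <author>Your Name</author>
    <documentationURL>http://example.com/api/docs</documentationURL>
    <sampleQuery>select * from {table} where q="donkey"</sampleQuery>
  </meta>
  <bindings>
    <select itemPath="" produces="XML">
      <urls>
        <!-- {q} is replaced with the value from the where clause -->
        <url>http://example.com/api/search?q={q}</url>
      </urls>
      <inputs>
        <key id="q" type="xs:string" paramType="query" required="true"/>
      </inputs>
    </select>
  </bindings>
</table>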

Thanks

I hope I got you interested in YQL - now it is up to you to have a go using it!

Reaching those web folk – a talk about data distribution, APIs and social media at the NMM

Wednesday, April 29th, 2009

Yesterday evening I was very happy indeed to go to the National Maritime Museum in London to talk to representatives of several museums about data distribution, YQL and open tables.

The whole thing was initiated by Jim O’Donnell, who had spent quite some time working with YQL and the NMM’s data.

In my talk, Reaching those web folk [PDF, 6.2MB], I covered the switch from web sites as end points to open data as an opportunity to reach many more users and to turn any of your visitors from a receiver into a relay, broadcasting your information to their friends, contacts and distribution channels you are not even aware of.

You can download the audio recording of the talk, too: Reaching those web folk [MP3, 76.6MB]

Also thanks to Mia Ridge for taking notes, in case you want someone else’s view.