webofdata | Christian Heilmann

Posts Tagged ‘webofdata’

Participating in the Web of Data with Open Standards

Wednesday, March 17th, 2010

These are the detail notes for my talk at the mix10 conference in Las Vegas. The description of my talk was the following:

Web development as we do it right now is on the way out. The future of the web is what its founders have planned a long time ago: loosely joined pieces of information for you to pick and choose and put together in interfaces catered to your end users. In this session, see how to build a web portfolio that is always up-to-date, maintained by using the web rather than learning a bespoke interface and high in performance as the data is pulled and cached for you by a high traffic server farm rather than your server. If you wondered how you can leave your footprint on the web without spending thousands on advertising and development, here are some answers.

The slides

Participating in the Web of Data

View more presentations from Christian Heilmann.

The detailed notes / transcript

Welcome to the web

When I started using the web I was working for a radio station as a newscaster and producer. I’ve always dabbled with computers and connecting them world-wide and I was simply and utterly amazed at how easy it was to find information. This was 1996 and I convinced my boss at the radio station that it is of utmost importance that I get an internet access to be the first in our town to get the news from Associated Press and other very important sources. In reality, I just fell in love with the web and its possibilities.

Working as a professional web developer

I quit my job soon after that and built my first web sites. Together with some friends we maintained a banner exchange and I was admin for a few mailing lists, IRC and built a lot of really bad web sites which – back then – were the bee’s knees.

I went through the first .com boom living in a hotel in Los Angeles with all expenses paid for writing HTML.

Joining the Enterprise crew

I then worked for an agency and delivered lots of products with enterprise level content management systems. Massive systems intended to replace the inadequate systems of the past.

This is when things went wrong

For a few years I released things that were incredibly expensive and meant that the people using them had to be sent on £3K+ trainings just to be able to do the things they already did before that – only much less effective. Or did they?

Joining the corp environment (to a degree)

Again, this was a very interesting step as it meant I moved away from the large world of delivering for clients to concentrating on one single company – one of the companies that defined the world wide web as it is now and also a large player in a newly emerging world.

Hey Ho, Web 2.0 highfive

The web 2.0 came around and run rampant in the media and in the mind of early adopters. User Generated Content sounded like an awesome scam to make millions of dollars with content that comes for free – all you need to do is set up the infrastructure and find the right people.

And it went wrong again.

The main purpose of the first round of web2.0 was to be as visible as possible – doesn’t matter if all the content added by your users is really just “first” and “you’re a fag” comments – as long as the user numbers were great you won.

And now?

Makes you wonder what comes now, doesn’t it?

The web as the platform, the Mobile Web and GeoLocation

This is where we are. We can use the web as our platform, we work with virtual servers, hosted services and are happy to mix and match different services to get our UGC fix. What is missing though is the glue.

Market changes leave a track

The thing we keep forgetting though are the users. All the changes we’ve gone through bred generations of users that are happy to use what they learnt to do in their job but are not happy having to re-learn basic chores over and over again.

Time to shift down a gear

We have the infrastructure in place, we can already make this work. I get a real feeling that we are not innovating but moving the web to a mainstream media. Streaming TV without commenting, massive successes like Farmville and MafiaWars makes me realize that the web is becoming ubiquitous but that it is also boring for me as someone who wants to move it further. In order to accelerate with a car you need to shift down a gear (or kick down the accelerator with an automatic). This is the time to do so.

Finding the common denominator

What is the common denominator that drove all the innovation and movement in the past? Data. Information becoming readily available and interesting by mixing it with other information sources to find the story in the data.

Tapping into the world of data

Data is around us, it is just not well structured. People cannot even be bothered to write semantic HTML and it is tough to teach editors to enter tags, alternative text for images and video descriptions. The reason is that we lack success stories and learning our CMS takes too long. Instead teaching people how to use systems we should teach people how to structure information and simplify the indexing process rather than empowering to make things pretty and shiny – this is what other experts do better.

Why APIs work

APIs or Application Programming Interfaces are the best thing you can think of if you want to build something right now. If you remove the information from the interface – both in terms of data entry and when it comes to consumption we can build a web that works for everybody. By making information searchable and allowing filtering out-of-the-box you can scale to any size or cut down to the utmost necessary.

APIs made easy

The Yahoo Query language or short YQL is a very simple language that allows you to use the web like you would use a database. In its simplest form a YQL query looks like this:

select {what} from {where} where {conditions}

Say for example you want to find photos in Flickr with the keyword donkey. The YQL statement for this is the following:

select * from flickr.photos.search

where text=”donkey” and license=4

The license=4 means that the photos you are retrieving and displaying are licensed with Creative Commons, which means that you are allowed to show them on your page. This is very important as Flickr users are happy to show photos but not all are happy for you to use their photos in your product. Play nice and we all get more good data.

YQL is not limited to Yahoo APIs and data – anything on the web can become accessible with it. If you want to find a flower pot in the Bay Area you can use for example Craigslist with YQL:

select * from craigslist.search where

location=”sfbay” and type=”sss” and query=”flower pot”

If you want to get the latest news from Google about the topic of healthcare, you can use:

select * from google.news where q=”healthcare”

If you want to collate different APIs and get one data set back, you can use the query.multi table. The following for example searches the New York Times archive, Microsoft Bing News and Google News for the term healthcare and returns one collated set of results:

select * from query.multi where queries in (

‘select * from nyt.article.search where query=”healthcare”’,

‘select * from microsoft.bing.news where query=”healthcare”’,

‘select * from google.news where q=”healthcare”’

)

YQL even allows you to get information when there is no API available. For example in order to get the text of all the latest headlines from Fox News you can use YQL’s HTML table:

select content from html where

url=”http://www.foxnews.com/” and xpath=”//h2/a”

This goes to the foxnews.com home page, retrieves the HTML, runs it through the W3C HTML Tidy tool to clean up broken HTML and filters the information down to the text content of all the links inside headings of the second level. This is done via the xpath statement //h2/a which means all links inside h2 elements and you retrieve the content instead of everything with the *.

You can then do more with this content – for example using the Google translation API to translate the headlines into French:

select * from google.translate where q in (

select content from html where url=”http://www.foxnews.com/”

and xpath=”//h2/a”

) and target=”fr”

As you can see it is pretty easy to mix and match different APIs to reach your goal – all in the same language.

YQL goes further than just reading data though. If you have an API that allows for writing to it, you can do insert, update and delete commands much like you can with a database. You can for example post a blog post on a WordPress blog with the following command:

insert into wordpress.post

(title, description, blogurl, username, password)

values (“Test Title”, “This is a test body”, “http://yqltest.wordpress.com”, “yqltest”, “password”)

The insert, update and delete tables require you to use HTTPS to send the information, which means that although your name and password are perfectly readable here they won’t leak out to the public – unless you are using them inside JavaScript where they would be readable.

The YQL endpoint

As YQL is a web service, it needs an endpoint on the web. For tables that don’t need any authentication this end point is the following URL:

http://query.yahooapis.com/v1/public/yql?

q={query}

&format=xml|json

&callback={callbackfunction}

You can get the information back as XML or as JSON format. If you choose JSON then you can use the information in JavaScript and by providing a callback function even without any server interaction.

Benefits of using YQL

YQL is a way to use the web as a database. Instead of using your time reading up on different APIs, requesting access and learning how they work you can simply access and mix the data and you get it back ready to use a few minutes later. YQL does all this for you:

No time wasted reading API docs – every YQL table comes with a desc command that tells you what parameters are expected and what data will come back for you to use.
Creating complex queries with the console. – the YQL console allows you to play with YQL and quickly click together complex queries in an easy to use interface. It also previews the information directly to you so you can see what comes back and in which format.
Filter data before use – YQL allows you to select all the data in a certain query using the * or specifically define what you want – down to a very granular level. In very limited application environments like mobile devices this can be a real benefit. It also means that you don’t need to spend a lot of time converting information to something useful after you requested it but even before you send it to the interface layer.
Fast pipes – YQL is hosted on a distributed server network that is very well connected to the internet. Chances are that the server is fast to reach and a few times faster than your own server when it comes to accessing API servers from all over the world.
Caching + converting – YQL by default gives out XML or JSON which can be useful to convert any data that is on the web in another format to these highly versatile and open data formats. YQL also has an in-built caching system that only gives you new data when it is available and returns it very fast if all you need to do is request in another time.
Server-side JavaScript – if the out-of-the-box filtering, sorting and converting methods are not enough for you you can use YQL execute tables to run the returned data through JavaScript before YQL gives it back to you. This allows for all the conversion and extension power JavaScript comes with in a safe and powerful environment as we use Rhino to run it server-side.

All in all YQL allows you to really use and access data without the hassle of resorting to XSLT, Regular Expressions, scraping and other tried-and-true but also complex ways of getting things web-ready.

Government as a trailblazer?

One thing that gets me very excited lately is governments throwing out information for us to use. Data that has been gathered with our tax money is now available for experts and laymen to play with it, look at it and see where the interesting parts are.

Conjuring APIs out of thin air

The problem is that the government data is not available for us as APIs – instead you will find it entered into Excel spreadsheets and in all kind of other data formats coming out of – yes – Microsoft products. We could now bitch about this and claim this is old school or do something about it. So how can we do this?

A few weeks ago I build http://winterolympicsmedals.com – an interface to research the history of the Medals won in the Winter Olympics from 1924 up to now. There is no API for that and this data is also not available anywhere.

What happened was that the UK newspaper the Guardian released a dataset of this information on their data blog (this is a free service the newspaper provides). The data was provided as an Excel sheet and all I did was upload it to Google Docs. I then selected “Share” and “export as CSV” which gives me the following URL:

http://spreadsheets.google.com/pub?key=tpWDkIZMZleQaREf493v1Jw&output=csv

Now, using YQL with this we have a web service:

select * from csv where url=”http://spreadsheets.google.com/pub?key=tpWDkIZMZleQaREf493v1Jw&output=csv” and columns=”Year,City,Sport,Discipline,Country,Event, Gender,Type”

By giving the CSV some columns we can now filter by them, for example to get all the Silver medals of the 1924 games we can use:

select * from csv where url=”http://spreadsheets.google.com/pub?key=tpWDkIZMZleQaREf493v1Jw&output=csv” and columns=”Year,City,Sport,Discipline,Country,Event, Gender,Type” and Year=”1924” and Medals=”Silver”

Spreadsheet to web services made easy!

This makes it easy for us to turn any spreadsheet into a web service. With Google docs as the cloud storage and YQL as the interface and doing the caching and limiting for us it is not that hard to do.

In order to make the creation of a search form and results table easier, I built a PHP script that takes this job on:


include(‘csvtoservice.php’);

$content = csvtoservice(‘http://winterolympicsmedals.com/medals.csv’);

if($content){

	if($content[‘form’]){

echo ‘
Filters
‘;

echo $content[‘form’];

}

	if($content[‘table’]){

echo ‘
Results
‘;

echo $content[‘table’];

}

	}

?>

Some examples

Let’s quickly go through some examples that show you the power of YQL and using already existing free, open systems to build a web interface.

GooHooBi is a search interface that allows you to search the web simultaneously using Google, Yahoo and Bing. You can see how it was done in a live screencast on Vimeo.
UK House Prices is a mashup I build for the release of the UK government open data site. It allows you to compare the house prices in England and Wales from 1996 up to now and see where it makes sense to buy a place and how stable the prices are in that area.
GeoMaker is a

In summary

We have the network and we have the technology.
We have people who work effectively with the tools they use.
We have a new generation coming who naturally use the internet and are happy with our web interfaces.
If we use our efforts 50/50 on new and building APIs and converters to get the data of the old the web will rock.

Homework

As a homework I want people to have a look at the open tables for YQL on GitHub
and see what is missing. I’d especially invite people to add their APIs to the tables using the open table documentation. We want your data to allow people to play with it.

Learn more

If you want to learn more about building sites like the ones I showed with this approach, there is a Video of me talking you through the building of UK-House-Prices.com available on the YUI blog.

Tags: apis, lasvegas, mix10, opendata, webofdata, yql, YUI
Posted in General | 8 Comments »

On being cleverly lazy – my talk at the WebExpo in Prague

Monday, October 19th, 2009

I am currently in Prague, Czech Republic and gave a talk on re-use, professionalism and ease of development at the WebExpo.

Today I am going to have a longer chat with the people at the University about accessibility and creating a WAI-ARIA enabled framework for web applications.

For now, here are the slides of the WebExpo presentation and the audio recording of the talk. There was a video streaming, too, and it is up to Microsoft now to get this out as a recording.

Slides

Being Cleverly Lazy

Audio recording

You can download on being cleverly lazy as an MP3 - 36MB or see other audio options on archive.org

Notes / Transcript

Today I will talk about the fact that being cleverly lazy will make you a better developer.

First of all, I am Chris, a developer evangelist. This means I am a developer who evolved into a role where I can tell other developers how to have an easier life and tell my company what other things developers would want to make it even easier for them. I’ve written a developer evangelism handbook if you are interested in learning more about this role and why your company would benefit from having someone like me.

Cleverly lazy

Let me quickly define cleverly lazy here. Lazy is “I don’t want to do it” wheres cleverly lazy is “I don’t want to do that ever again so I do it right this time”. In development the difference between lazy and cleverly lazy is that lazy products do the job but are impossible to maintain whereas cleverly lazy products do the job, are easy to understand, extend and are built on a solid base.

Evolving the web

Our job as developers, designers, planners and organisers of web products is to evolve the web. We should have the chance to concentrate on building solutions that make people happy to use them rather than having to work out how one browser or another fails or how we convert data into a format the web technologies can display and understand.

In order to achieve this, we need to be free to do new things rather than worrying about the past.

The feature loop trap.

One problem is that as developers, we stand in our own way. The biggest trap that there is for developers is the feature loop:

As developers we love to take a complex problem and tackle it. We love solving problems, that is why we become developers.
We then normally and quite quickly find a simple solution to the complex problem.
We then release the solution to the world and get feedback.
This is where it goes pear-shaped. We get stuck in a feature adding and feedback loop until the elegant, easy and simple solution becomes a complex one again.
Other developers will find our former simple solution and will think “hey, this is a complex problem, let’s make that easy!”

This keeps us from evolving as web developers. We don’t develop the web, we fill it up with solutions to the same problem.

My solution

The next issue is that as developers we love to find solutions ourselves and not use other people’s information. We appreciate other people’s work but in the end we only trust ourselves to come up with the best solution ever.

Short attention span

The problem with finding and building own solutions is that we tend to lose interest in them really fast and then leave them behind as we already another complex problem to simplify. This leads to lots of half-finished solutions on the web that don’t get any love any longer.

Unmaintained code is a security problem

All the code that is not getting any love stays unmaintained on the web and becomes a wonderful attack vector for new security threats or vulnerability exploits. Nothing beats being up-to-date when you want to build secure systems.

Things web developers must know to do their job.

Looking at our job as web developers, here are the things you have to be aware of to build things that people can use.

The technologies involved
How browsers deal with these technologies and how they fail to support them
Security concerns and attack vectors
Usability and accessibility of the product
Internationalisation of our products
Performance concerns
Multiple platform support
Flexibility of the interface

Browsers suck

Browsers are the bane of our existence as they are unpredictable, unreliable, there are hundreds of them (in various configurations) and they are not being updated by people out there.

That is a lot to know and deal with. The good news is that you don’t need to know all of it.

Good developers are like librarians

Librarians are great people. Instead of knowing everything that is in the library they know exactly where to find the information to solve a problem. As a cleverly lazy developer you should do the same.

Build on a solid foundation

Web development libraries came around for one simple purpose: Make our life as developers easier and our work as web developers predictable. They make browsers suck less and work around differences in browsers. They allow us to use web standards and get results instead of bugs to fix.

Build with components

If you build web applications and interfaces take a look around for what has already been created for you. Almost every library comes with widgets that are tested, work for everybody regardless of ability and can be changed to your needs. If you build your own component you are very likely indeed to forget some very necessary functionality that a collaborative product already had put in.

Use a good debugging environment

You can’t write great products without being able to know what is going on. Luckily enough nowadays we have great debugging tools like Firebug, Yahoo Profiler, JSLint and validators of all kind.

Plan for extensions

If you build a solution, plan a way so that people can extend it without having to change the central code. The central code should only have to change when there is a new browser or platform to be supported or there’s a security fix. Other than that people should be allowed to extend by listening for events or using other API hooks into your product.

Read, use and write documentation

If you build something, document it. There is no such thing as code that explains itself, that is a myth and an arrogant one at that.

Use the web

One of the really cleverly lazy things to do these days is to use the web to build your product rather than building your product and putting it on the web. Distribute your content on specialised systems like Flickr, YouTube, Delicious and LinkedIn and then put it together in a centralised CMS. That way your site will be easy to maintain and not vulnerable to a single point of failure.

Use APIs

To achieve this you need to use APIs. APIs can take a lot of your time to read up on and understand which is why it is a good plan to use YQL as an in-between to avoid going crazy on authentication and understanding what parameters go in and what data comes out.

An example

As an example, let’s take a look at a web site built solely from data on the web, using YQL and PHP.

Thanks!

Thank you, remember if you use what is out there you have time to be creative. I want us to write code most of our time, not to fix bugs.

Tags: libraries, mashups, pragmatism, professionalism, reuse, webexpo, webofdata, yql, YUI
Posted in General | 8 Comments »