Christian Heilmann

Posts Tagged ‘scraping’

TTMHHTM: Uni Hack Day, accessibility wins, out with the Bush and Testpilot

Wednesday, January 21st, 2009

There were so many things in my daily feed that made me happy today that I had to categorise them:

Work and colleagues – education + accessibility

That new fella in the white house

  • Bush street in San Francisco renamed to Obama Street – I love this city and its people
  • Finding out that the new whitehouse.gov is licensed Creative Commons
    “Except where otherwise noted, third-party content on this site is licensed under a Creative Commons Attribution 3.0 License. Visitors to this website agree to grant a non-exclusive, irrevocable, royalty-free license to the rest of the world for their submissions to Whitehouse.gov under the Creative Commons Attribution 3.0 License.”

Geek stuff

  • The electronic playground is a web site dedicated to listing cameos of arcade machines and consoles in film and TV
  • YouTube Street Fighter – another YouTube annotations game/hack. Let’s hope this one stays up, unlike the Laserdisc Dragon’s Lair walkthrough.

Open source good news and things

  • Mozilla Labs going on with Test Pilot – a usability testing platform in Mozilla that will analyze users’ behaviour and publish the findings. No matter how this works out, the logo is full of win, and I had a great time at the last Mozilla Labs monthly meetup. If you are in the Valley, make sure to visit them!
  • Crowbar – a powerful screen scraping library based on Mozilla used for rendering output before converting it (and a cool bar in Tottenham Court Road in London)

Misc

YQL is so the bomb to get web data as XML or JSON

Friday, December 12th, 2008

Yesterday I wrote a blog post on YDN about opening the web, covering curl, pipes and YQL, and today I did a more detailed deep-dive on Ajaxian about how YQL can help you to convert the web to JSON.

Suffice to say, I like YQL a lot – it is the command line interface to the web (and a text version of Yahoo Pipes). Go and play with it yourself:

YQL console

As explained in the Ajaxian article, all the web services that don’t need authentication can be accessed through a public REST API. Simply append your URL-encoded YQL statement to http://query.yahooapis.com/v1/public/yql?q=, then add a format=json parameter and a callback parameter with the name of your callback function, and you are set.

This would, for example, allow you to search for rabbit images on the web and display them quick and dirty with a few lines of JavaScript:
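
A minimal sketch of that idea, not the original snippet. It builds the REST URL as described above (statement in q, plus format=json and callback) and turns the returned JSON into image tags. The table name search.images, the callback name showRabbits and the assumed response shape (query.results.result with url and title fields) are assumptions for illustration only; check them in the YQL console before relying on them.

```javascript
// Public YQL endpoint, as given in the post.
const YQL_ENDPOINT = 'http://query.yahooapis.com/v1/public/yql';

// Build the request URL: URL-encoded statement in q, plus format=json
// and the name of the JSONP callback function.
function buildYqlUrl(statement, callbackName) {
  return YQL_ENDPOINT +
    '?q=' + encodeURIComponent(statement) +
    '&format=json' +
    '&callback=' + encodeURIComponent(callbackName);
}

// Escape a value for safe use inside a double-quoted HTML attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;');
}

// ASSUMPTION: the response looks like
// { query: { results: { result: [ { url: '...', title: '...' } ] } } }.
// Returns a string of <img> tags, or '' when there are no results.
function renderImages(data) {
  const results = data && data.query && data.query.results;
  if (!results || !results.result) {
    return '';
  }
  // A single hit may arrive as an object instead of an array.
  const items = [].concat(results.result);
  return items.map(function (item) {
    return '<img src="' + escapeAttr(item.url) +
      '" alt="' + escapeAttr(item.title || '') + '">';
  }).join('');
}

// Browser-only wiring: register the callback and load the JSONP script.
if (typeof document !== 'undefined') {
  window.showRabbits = function (data) {
    document.body.insertAdjacentHTML('beforeend', renderImages(data));
  };
  const script = document.createElement('script');
  // ASSUMPTION: search.images is a hypothetical table name for this sketch.
  script.src = buildYqlUrl(
    "select * from search.images where query='rabbit'",
    'showRabbits'
  );
  document.head.appendChild(script);
}
```

Keeping the URL building and the rendering in plain functions means both can be tried out without a browser; only the last block touches the page.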





YQL allows you to access any freely available data service and even scrape HTML. How cool is that?