How SearchMonkey can lead to a cleaner, more data-rich webFriday, May 16th, 2008 at 7:44 pm
I normally don’t write about products of my company here, but I want to quickly talk about SearchMonkey, a new service by Yahoo available for developers right now.
I spent some time with journalists from several magazines yesterday explaining the technicalities of SearchMonkey following a presentation about the business and end user benefits explained by a colleague who knows more about that area. This opened my eyes as to what a massive impact SearchMonkey will have if we use it the right way.
What is SearchMonkey and what is the business or user benefit?
On the surface, SearchMonkey means that Yahoo! opens their search result pages to write plugins for them. You can individually style search results and provide “a much richer user experience”. In other words you can actually write “monkeys” that allow people to deep-dive into your site from the search results provided by Yahoo.
Say for example you are netflix or any other DVD rental company and you want people to rent the movie. You could write a monkey that styles search results by imdb.com or other film review sites to show the cover of the DVD, a synopsis and a link to netflix to rent the movie.
The benefits are that the end user does not need to load the IMDB page if she’s happy with the information in the synopsis and you as netflix get a lot of traffic you hadn’t had before.
Splendid, that gets the end users and business people happy, but how does it lead to a cleaner web?
How can you use this to make the web a cleaner place?
Well, this is a strong business case to show to people. Search result pages are boring and people tried to game them with SEO techniques (and abominations) for years – people love to be found. Yahoo! opening their SERPs (Search Engine Result Pages) to the world means first of all you can offer a monkey as a service if you are a developer or small agency.
Monkeys are built upon data available about the document. In its most basic form that is the information Yahoo is keeping in its index – titles, descriptions and other meta information. You can use this information to add more HTML to the SERP showing it, but the harvest is on the meager side of things.
To battle that problem, SearchMonkey allows you to offer “Custom Data Services” to get more information to display. These services could be an XSLT that grabs information from the document, an API that returns an XML in a certain DataRSS format or – and here is where it gets really interesting – RDF or microformats information in the document.
Scraping a document with XSLT always feels wrong to me as the HTML structure is impacted by so many factors – the other issue of course is that it is very memory and computation hungry.
Providing an own API returning XML is of course the optimal way of offering data, but you may not have the resources or drive to do so.
Which leaves microformats and RDF. Right now we support hAtom, hCalendar, hCard, hReview and XFN in terms of microformats and embedded RDF and that makes it a lot easier to retrieve information.
There is a leveraging the Data Web section in the SearchMonkey documentation that explains this in detail (and I am sorry someone considered “leveraging” good language).
So, all in all SearchMonkey is not only a massive step in opening the web for developers (you can style search results without paying for that – something that would have given marketing people seizures not too long ago), it also means that we finally have a channel to display the data we add in microformatic or RDF format that is not a Firefox extension or a specialized search engine. With information in the page developers (either in-house or external) can create customized search results in one step without having to rely on the structure on your document and without having to set up both a monkey app and a data service.
On the flipside, even the idea that someone can use SearchMonkey to enhance search results with data provided by you (and link back to you) if it is easy to write an XSLT to get to the data is a mighty argument to show to people when you want to promote clean HTML structure (no, I won’t call it POSH, as there is nothing plain and old about good, semantic HTML - it is not common practice, sadly enough).