Christian Heilmann

Author Archive

H4xx0r3d! – how I found out that I am running a spam blog

Wednesday, March 3rd, 2010

Yesterday, ten minutes before I had to leave for Kilburn to give my talk at Ignite, I had a shocking moment: in one of the sub-folders of my vast server I found a blog that offers cheap OEM software:

Phantom OEM blog on my server

All of these links sooner or later redirect to firemicrosoft.net, which is owned by someone in Russia and hosted by GoDaddy.

Don’t make folders writable to the world

What happened is that I had a very old guestbook script I had once written still running in this folder. The trick back then (advocated by a lot of PHP tutorials, as it is much easier that way) was to chmod a folder to 777 (read/write/execute permissions for everybody) to store flat files in it. That was good enough for me back then (around 2000) and guess what? It was good enough for the spammers to store their blog, too.
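For contrast, here is a minimal sketch of how a flat-file store can work without a world-writable folder – the paths and field names are made up, but the idea is to keep the data directory outside the web root and have the web server user own it:

<?php
// Minimal sketch of the safer alternative - all paths here are made up.
// Keep the flat-file store outside the web root and let the web server
// user own it, so the folder never needs to be world-writable (777).
$store = '/var/data/guestbook';

if (!is_dir($store)) {
    // 0700: only the owning user (the web server) can read, write or enter
    mkdir($store, 0700, true);
}

$message = isset($_POST['message']) ? strip_tags($_POST['message']) : '';
$entry   = date('c') . ' ' . $message . PHP_EOL;

// LOCK_EX stops two requests from garbling the file at the same time
file_put_contents($store . '/entries.txt', $entry, FILE_APPEND | LOCK_EX);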

Static page generation – in bulk

The blog was set up quite craftily in terms of SEO: search engines love static pages, so instead of accessing a database – which wasn’t compromised – they simply created static pages for all the search queries that came in. After all, this is about showing links and Google juice, not about delivering content. In the end, I found that I had 23,487 HTML files advertising spam. Thank God for SSH access, as deleting these over SFTP would have taken some time.
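If you ever have to do a similar cleanup, a few lines of PHP run on the server over SSH beat deleting files one by one – a rough sketch with a made-up path:

<?php
// Hypothetical cleanup script, run on the server over SSH (php cleanup.php).
// It walks the hijacked folder and deletes every generated HTML file -
// the path is made up, so adjust it before running anything like this.
$dir = '/home/mysite/guestbook';

$files = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($dir),
    RecursiveIteratorIterator::CHILD_FIRST
);

$deleted = 0;
foreach ($files as $file) {
    $ext = strtolower(pathinfo($file->getFilename(), PATHINFO_EXTENSION));
    if ($file->isFile() && $ext === 'html') {
        unlink($file->getPathname());
        $deleted++;
    }
}
echo 'Deleted ' . $deleted . " spam files\n";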

I investigated last night and I am happy to say that this is all that happened. Had I been the attacker and found a folder to store whatever I pleased in, I would also have tried to read other files – the wp-config.php, for example.

Google Reader as a whistle blower

The interesting part about this is how I came to find out about it: Google Reader. I have a Google Blog Search RSS feed in my reader that notifies me every time someone links to http://wait-till-i.com – I find this much more useful than trackbacks, which seem to be used only by spammers these days anyway:

In this feed I got a lot of posts from http://vancouverisawesome.com/:

lots of weird links back to my blog in Google Reader

At first I thought this was because of http://winterolympicmedals.com – after all, that site is timely right now. When I looked at the source code of the site, however, I found that just before the closing body tag spammers had injected links to different sites advertising OEM software:

[... lots of links interspersed with random HTML ...]

At first I sniggered at them linking to a folder on my site that I knew didn’t exist, but when I clicked the link and found the blog my smile vanished quickly.

See the whole lot on Pastebin – as you can see, all in all eight sites were attacked the same way.

What I find curious is that the links on vancouverisawesome are hidden and still seem to be indexed by Google – I remember almost being kicked out of AdSense once for absolutely positioning ads. Also, the links may appear at the top of the screen, but in the document they sit way down the tree, and vancouverisawesome is quite packed with links already.

I’ve cleaned up my server and I have contacted the maintainers of the other seven sites (and got a lot of “thank you” for it). I also contacted vancouverisawesome about the spam links at the bottom of their pages. This is a pretty common attack (we had it on Ajaxian.com, too), targeted at WordPress installs.
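If you want to check your own pages for this kind of injection, a quick look at what sits just before the closing body tag is usually enough. A throwaway sketch – the URL is an example:

<?php
// Quick injection check: fetch a page and look at what sits just before
// the closing body tag - injected spam links usually end up exactly there.
$html = file_get_contents('http://example.com/');
$pos  = strripos($html, '</body>');

if ($pos !== false) {
    // Print the last 500 characters of markup before </body>
    echo substr($html, max(0, $pos - 500), 500);
}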

How to avoid all this (and how to detect it)

So in order to make sure that this doesn’t happen to you:

  • Do not leave folders writable to the world – if a piece of software tells you that you need to do this, tell its makers to change it – it is inviting spammers like a dog turd invites flies.
  • Do monitor your incoming links – if I hadn’t had the blog search RSS feed running, I probably wouldn’t have found the blog until it showed up in my traffic stats. A rough sketch of such a monitor follows this list.
  • Always upgrade your WordPress install – this is automated now and takes a second – there is no excuse not to.
  • Redirect or – in the most extreme case – delete old things on your server that you don’t maintain any longer.
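The monitoring can be as simple as polling a blog search feed for links to your domain. A rough sketch – the Google Blog Search feed URL and parameters are from memory, so treat them as an assumption and verify them first:

<?php
// Poll a blog search RSS feed for inbound links - the feed URL format is
// an assumption, check Google Blog Search for the current one.
$feed = 'http://blogsearch.google.com/blogsearch_feeds?'
      . http_build_query(array(
            'q'      => 'link:wait-till-i.com',
            'output' => 'rss',
        ));

$rss = simplexml_load_file($feed);
foreach ($rss->channel->item as $item) {
    // Anything you don't recognise - especially links to folders that
    // shouldn't exist on your server - deserves a closer look.
    echo $item->title . ' => ' . $item->link . "\n";
}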

GeoPlanet Explorer – another showcase for quick development with YQL and YUI

Friday, February 26th, 2010

A few days ago Gary Gale pinged me on Messenger and subsequently carried a cup of coffee to my desk to pester me with another challenge. This time he talked about just how rich and cool the GeoPlanet data is, and how tough it is to show people this in a simple interface. Internally we have a few pretty cool tools for testing and analysing the data, but most of them are too loaded with information only understandable to the geo folk out there. So in essence, the benevolent overlord of geo technologies at Yahoo was asking me to build a simple interface to navigate the GeoPlanet data.

Well, this morning I got a chance to have a go at his request and here’s the GeoPlanet Explorer interface for you. Check the following screencast to see it in action:

Building the interface wasn’t magic – I used YQL to access the data, wrote a few lines of PHP to display it in a nested list and then added a few lines of YUI3 JavaScript to collapse and expand the location details.
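The YQL part boils down to a couple of lines of PHP. A stripped-down sketch – the geo table fields and the JSON result structure are from memory, so check them in the YQL console before relying on this:

<?php
// Fetch the children of a location (London here, as an example) via the
// public YQL endpoint and render them as items for the nested list.
$yql = 'select * from geo.places.children where parent_woeid in '
     . '(select woeid from geo.places where text="London")';

$url = 'http://query.yahooapis.com/v1/public/yql?'
     . http_build_query(array('q' => $yql, 'format' => 'json'));

$data = json_decode(file_get_contents($url));

// The result structure is an assumption - inspect the real response first
foreach ($data->query->results->place as $place) {
    echo '<li>' . htmlspecialchars($place->name) . "</li>\n";
}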

Notice that the interface uses progressive enhancement throughout. If you have no JavaScript at your disposal you get a static map and all the information on one single page. The lat/lon links open in Yahoo Maps, where you can see the location.

If you have JavaScript enabled, the interface collapses and the map is loaded via Ajax, refreshing every time you click a lat/lon link.

The source code of the GeoPlanet Explorer is available on GitHub, and it can give you a few pointers on how to use the GeoPlanet API with YQL for your own solutions.

TTMMHTM – BBC Web animals, two very cool APIs and there’s something about the LG logo

Tuesday, February 23rd, 2010

Things that made me happy this morning:

Analysing the history of Winter Olympics medals with YQL

Thursday, February 18th, 2010

I am a big fan of the Guardian Data Blog, which releases all kinds of cool datasets used in their research for people to mash up. One of the recent data sets was the statistics of Winter Olympics medals over the years.

I’ve taken the Excel sheet and exported it as a CSV. Then I created a YQL open table to make it easier to use and filter the information.

Using this table you can now get the Winter Olympics medal statistics from 1924 up to 2006 (the 2010 data is of course not in there yet). Try it out for yourself:

Get all medal information:

use "http://isithackday.com/wintermedals.xml" as medals;
select * from medals

see it as XML or in the YQL console

UK Gold medals (small dataset):

use "http://isithackday.com/wintermedals.xml" as medals;
select * from medals where country="gbr" and type="gold"

see it as XML or in the YQL console

All Skating medals in the Speed Skating discipline won by men

use "http://isithackday.com/wintermedals.xml" as medals;
select * from medals where sport="skating"
and discipline="speed skating"
and gender="m"

see it as XML or in the YQL console

All US, Canadian and French Medals in the Games before 2000

use "http://isithackday.com/wintermedals.xml" as medals;
select * from medals where country in
('usa','can','fra') and year="19"

see it as XML or in the YQL console

All the things you can filter with

Basically you get all the medal information and can filter it by the following criteria:

  • year – the year of the olympics
  • city – the city it was held in, like “Lillehammer” or “Sarajevo”
  • sport – the sport
  • discipline – the sub-discipline
  • country – the country as an NOC code
  • event – the event, like “alpine combined” or “two-man”
  • gender – the athlete’s gender, X is for pair sports
  • type – the medal type (Gold, Silver, Bronze)

Any of these can also be used as a partial match, so country="g" finds GBR, FRG, GDR, YUG and GER.
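To use the data outside the console, send the same statements to the public YQL web service and read back XML or JSON. A quick PHP sketch – the shape of the JSON result is an assumption, so inspect the real response in the console first:

<?php
// Fetch all UK gold medals from the open table via the YQL web service.
$yql = 'use "http://isithackday.com/wintermedals.xml" as medals; '
     . 'select * from medals where country="gbr" and type="gold"';

$url = 'http://query.yahooapis.com/v1/public/yql?'
     . http_build_query(array('q' => $yql, 'format' => 'json'));

$data = json_decode(file_get_contents($url));

// "row" as the result element is a guess - the console shows the real name
foreach ($data->query->results->row as $medal) {
    echo $medal->year . ': ' . $medal->event . "\n";
}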

Have fun!

Diving into the web of data – the YQL talk at boagworld live 200

Friday, February 12th, 2010

I just finished a quick demo for the 200th episode of the Boagworld podcast, streamed live on Ustream. I thought I had an hour, but it turned out to be half an hour. My topic was YQL, and I wanted to do something like what is shown in the video that was just released (click through to watch or download the video in English or German):

YQL and YUI video

The story of this is:

  • We spend a lot of time thinking about building the interface, using the right semantic markup and trying to make browsers work (or expecting certain browsers in certain settings, as we gave up on that idea).
  • What we should concentrate on more is the data that drives our web sites – it is boring to have to copy and paste text from Word documents or to get a CMS to generate something that is almost, but not quite, useful HTML.
  • You could say that once published as HTML the data is available, but for starters HTML4 is bad as a data format for storing information. Furthermore, too many pieces of software can access web sites, and the cleanest HTML you release somewhere can be messed up further down the line by somebody with a CMS or any other means of access. Sadly, most content editing software still produces HTML that is tied to its presentation rather than to the structure and meaning it should define.
  • Having worked on the datasets provided by the UK government at data.gov.uk, I’ve realised that as a market we are nowhere near providing re-usable and easily convertible data to each other. XML was meant to be that but got lost in complexities of dictionaries, taxonomies and other things you can spend days on to define English content, only to have to re-think them once you go multilingual. Most content – let’s face it – is maintained in Excel sheets and Word documents, which is OK, because people should not be forced to use a system they don’t like.
  • If you really think about the web as a platform and as a medium, then we should have simple ways to provide textual data or binary information (for videos and images) instead of getting bogged down in how to please the current generation of browsers.
  • If you really want to be accessible to any web user – and that is anyone who can get content over HTTP – you should think about making your content available as an API. This allows people to build the interfaces necessary for edge cases you didn’t even know existed.
  • YQL is a simple way to use the web as a database, to mix and match data, and also a very simple way to provide data in easy-to-digest formats – give it a go.

In any case, after the 2 o’clock podcast – in which most of my questions were eloquently answered by Jeremy Keith and the Skype connection died in the last five minutes – I spent the afternoon putting together some demos for this YQL talk, as YQL is easiest to explain with examples, and I wanted something for people on flaky connections to play with. So if you go to:

You can see what I talked about during the podcast. People in the chat asked if this would be open source. Yes it is – the passcode is pressing Cmd+U in Firefox, or whatever other way you choose to “view source” in your browser of choice.

Normally I would not do any of these calls purely in JavaScript (as explained in the video), but this was the quickest solution, and it can give you an insight into just how easy it is to use information you requested, filtered and converted with YQL.
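To illustrate what “not purely in JavaScript” would look like: a few lines of PHP on the server can do the YQL call and cache the result, so your visitors hit the API at most once an hour. A sketch with a made-up query, cache path and lifetime:

<?php
// Server-side YQL proxy with a simple file cache - the query, cache path
// and lifetime are all made up for this sketch.
$yql   = 'select title from rss where url="http://wait-till-i.com/feed/"';
$cache = '/tmp/yql-cache.json';

if (file_exists($cache) && filemtime($cache) > time() - 3600) {
    $json = file_get_contents($cache);   // still fresh, reuse it
} else {
    $url  = 'http://query.yahooapis.com/v1/public/yql?'
          . http_build_query(array('q' => $yql, 'format' => 'json'));
    $json = file_get_contents($url);
    file_put_contents($cache, $json);    // cache for the next hour
}

header('Content-Type: application/json');
echo $json;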