Christian Heilmann


Archive for the ‘General’ Category


Quick hack tutorial – how to build a Google Auto Suggest comparison

Monday, January 11th, 2010

This morning I stumbled upon a funny article on Predictably Irrational which shows the difference between men and women using Google Autosuggest:

What boyfriends and girlfriends search for on Google

Google Suggest has been the source of many a joke so far (there are even collections of those available) and I wondered how hard it would be to write a comparison tool.

Turns out it is dead easy, see it here:

Google Suggest comparisons – compare the results of two Google Suggest searches.

The first step was to find the right API. A Google search for “Google Suggest API” gave me an old blog post on Blogoscoped which told me about the endpoint for an XML output of Google Suggest:

http://google.com/complete/search?output=toolbar&q=chuck+norris

The result is very straightforward XML:

<toplevel>
  <CompleteSuggestion>
    <suggestion data="..."/>
  </CompleteSuggestion>
  [... repeated 10 times with other data ...]
</toplevel>

Now, to get both result sets I could have used two calls, but why do that when you have YQL?

select * from xml where url in (
‘http://google.com/complete/search?output=toolbar&q=chuck+norris’,
‘http://google.com/complete/search?output=toolbar&q=steven+seagal’
)
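Assembling that statement for any two terms in JavaScript could look like this (a hypothetical helper, not part of the final page):

```javascript
// Hypothetical helper: build the YQL statement for two search terms.
// Spaces become "+" as in the URLs above.
function buildYqlQuery(term1, term2) {
  var base = 'http://google.com/complete/search?output=toolbar&q=';
  var urls = [term1, term2].map(function (t) {
    return "'" + base + encodeURIComponent(t).replace(/%20/g, '+') + "'";
  });
  return 'select * from xml where url in (' + urls.join(',') + ')';
}

console.log(buildYqlQuery('chuck norris', 'steven seagal'));
```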

Using this gives me a dataset with both results:

<query>
  <results>
    <toplevel>
      <CompleteSuggestion>
        <suggestion data="..."/>
      </CompleteSuggestion>
      [... repeated with different data ...]
    </toplevel>
    <toplevel>
      <CompleteSuggestion>
        <suggestion data="..."/>
      </CompleteSuggestion>
      [... repeated with different data ...]
    </toplevel>
  </results>
</query>
Good, that’s the data – now for the interface. Using YUI grids and the grids builder it is easy to build this:

Google suggest comparisons

Written by Chris Heilmann, using YQL and the unofficial Google Autosuggest API.

All that is left is the PHP – something like the following (the form field names s1 and s2 are placeholders for whatever the search form sends):

<?php
// if both search terms were sent, get the data from YQL as JSON
// (the form field names s1 and s2 are assumed)
if(isset($_GET['s1']) && isset($_GET['s2']) &&
   $_GET['s1'] !== '' && $_GET['s2'] !== ''){
  $base = 'http://google.com/complete/search?output=toolbar&q=';
  $yql = "select * from xml where url in ('".
         $base.urlencode($_GET['s1'])."','".
         $base.urlencode($_GET['s2'])."')";
  $data = json_decode(file_get_contents(
    'http://query.yahooapis.com/v1/public/yql?q='.
    urlencode($yql).'&format=json'));
  $res1 = $data->query->results->toplevel[0]->CompleteSuggestion;
  // if data came through, assemble the list to be shown
  if(sizeof($res1)>0){
    $out1 = '<ul>';
    foreach($res1 as $r){
      $out1 .= '<li>'.$r->suggestion->data.'</li>';
    }
    $out1 .= '</ul>';
  // otherwise assemble an error message
  } else {
    $out1 = '<ul><li>Error: No results found</li></ul>';
  }
  // do the same for the second set
  $res2 = $data->query->results->toplevel[1]->CompleteSuggestion;
  if(sizeof($res2)>0){
    $out2 = '<ul>';
    foreach($res2 as $r){
      $out2 .= '<li>'.$r->suggestion->data.'</li>';
    }
    $out2 .= '</ul>';
  } else {
    $out2 = '<ul><li>Error: No results found</li></ul>';
  }
// if no data was sent, say so...
} else {
  $error = 'Please enter a search term for each box.';
}
?>

That’s it! Add a lick of CSS and we have the final product.

Posted in General | 3 Comments »

Loading external content with Ajax using jQuery and YQL

Sunday, January 10th, 2010

Let’s solve the problem of loading external content (on other domains) with Ajax in jQuery. All the code you see here is available on GitHub and can be seen on this demo page so no need to copy and paste!

OK, Ajax with jQuery is very easy to do – like most solutions it is a few lines:

$(document).ready(function(){
  $('.ajaxtrigger').click(function(){
    $('#target').load('ajaxcontent.html');
  });
});


Check out this simple and obtrusive Ajax demo to see what it does.

This will turn all elements with the class of ajaxtrigger into triggers to load “ajaxcontent.html” and display its contents in the element with the ID target.

This is terrible, as most of the time it means that people will use pointless links like “click me” with # as the href – but that is not the problem for today. I am working on a larger article with all the goodies about Ajax usability and accessibility.

However, to make this more re-usable we could do the following:

$(document).ready(function(){
  $('.ajaxtrigger').click(function(){
    $('#target').load($(this).attr('href'));
    return false;
  });
});


You can then use load some content to load the content, which makes the whole thing re-usable.

Check out this more reusable Ajax demo to see what it does.

The issue I wanted to find a nice solution for is the one that happens when you click on the second link in the demo: loading external files fails as Ajax doesn’t allow for cross-domain loading of content. This means that see my portfolio will fail to load the Ajax content and fail silently at that. You can click the link until you are blue in the face but nothing happens. A dirty hack to avoid this is just allowing the browser to load the document if somebody really tries to load an external link.

Check out this demo allowing external links to be followed to see what it does.

$(document).ready(function(){
  $('.ajaxtrigger').click(function(){
    var url = $(this).attr('href');
    if(url.match('^http')){
      return true;
    } else {
      $('#target').load(url);
      return false;
    }
  });
});


Proxying with PHP

If you look around the web you will find that the solution in most cases is a PHP proxy script (or one in any other language). Using cURL, proxy.php could, for example, be:

<?php
$url = $_GET['url'];
$ch = curl_init(); 
curl_setopt($ch, CURLOPT_URL, $url); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
$output = curl_exec($ch); 
curl_close($ch);
echo $output;
?>


People then could use this with a slightly changed script (using a proxy):

$(document).ready(function(){
  $('.ajaxtrigger').click(function(){
    var url = $(this).attr('href');
    if(url.match('^http')){
      url = 'proxy.php?url=' + url;
    }
    $('#target').load(url);
    return false;
  });
});


It is also a spectacularly stupid idea to have a proxy script like that. The reason: without filtering, people can use it to load any document off your server and display it in the page (simply use Firebug to rename the link to point at anything on your server), they can use it to inject a mass-mailer script into your document, or they can use it to redirect to any other web resource and make it look like your server sent it. It is a spammer’s heaven.

Use a white-listing and filtering proxy!

So if you want to use a proxy, make sure to white-list the allowed URIs. Furthermore, it is a good plan to get rid of everything but the body of the other HTML document, and to filter out scripts. This prevents display glitches and stops scripts you don’t want from being executed on your site.

Something like this:

<?php
$url = $_GET['url'];
$allowedurls = array(
  'http://developer.yahoo.com',
  'http://icant.co.uk'
);
if(in_array($url,$allowedurls)){
  $ch = curl_init(); 
  curl_setopt($ch, CURLOPT_URL, $url); 
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
  $output = curl_exec($ch); 
  curl_close($ch);
  $content = preg_replace('/.*<body[^>]*>/msi','',$output);
  $content = preg_replace('/<\/body>.*/msi','',$content);
  $content = preg_replace('/<\/?body[^>]*>/msi','',$content);
  $content = preg_replace('/[\r\n]+/msi','',$content);
  $content = preg_replace('/<!--[\S\s]*?-->/msi','',$content);
  $content = preg_replace('/<noscript[^>]*>[\S\s]*?'.
                          '<\/noscript>/msi',
                          '',$content);
  $content = preg_replace('/<script[^>]*>[\S\s]*?<\/script>/msi',
                          '',$content);
  $content = preg_replace('/<script[^>]*\/>/msi','',$content);
  echo $content;
} else {
  echo 'Error: URL not allowed to load here.';
}
?>


Pure JavaScript solution using YQL

But what if you have no server access or you want to stay in JavaScript? Not to worry – it can be done. YQL allows you to load any HTML document and get it back in JSON. As jQuery has a nice interface for loading JSON, the two can be used together to achieve what we want.

Getting HTML from YQL is as easy as using:

select * from html where url="http://icant.co.uk"


YQL does a few extra things for us:

  • It loads the HTML document and sanitizes it
  • It runs the HTML document through HTML Tidy to remove things nasty frameworks (.NET, for example) considered markup.
  • It caches the HTML for a while
  • It only returns the body content of the HTML - so no styling (other than inline styles) will get through.

As output formats you can choose XML or JSON. If you define a callback parameter for JSON you get JSON-P with all the HTML as a JavaScript Object – not fun to re-assemble:

foo({
  "query":{
  "count":"1",
  "created":"2010-01-10T07:51:43Z",
  "lang":"en-US",
  "updated":"2010-01-10T07:51:43Z",
  "uri":"http://query.yahoo[...whatever...]k%22",
  "results":{
    "body":{
      "div":{
        "id":"doc2",
        "div":[{"id":"hd",
          "h1":"icant.co.uk - everything Christian Heilmann"
        },
        {"id":"bd",
        "div":[
        {"div":[{"h2":"About this and me","
        [... and so on...]
}}}}}}}});

foo({ "query":{ "count":"1", "created":"2010-01-10T07:51:43Z", "lang":"en-US", "updated":"2010-01-10T07:51:43Z", "uri":"http://query.yahoo[...whatever...]k%22", "results":{ "body":{ "div":{ "id":"doc2", "div":[{"id":"hd", "h1":"icant.co.uk - everything Christian Heilmann" }, {"id":"bd", "div":[ {"div":[{"h2":"About this and me"," [... and so on...] }}}}}}}});

When you define a callback with the XML output you get a function call with the HTML data as string in an Array – much easier:

foo({
  "query":{
  "count":"1",
  "created":"2010-01-10T07:47:40Z",
  "lang":"en-US",
  "updated":"2010-01-10T07:47:40Z",
  "uri":"http://query.y[...who cares...]%22"},
  "results":[
    "<body>\n    <div id=\"doc2\">\n<div id=\"hd\">\n 
     <h1>icant.co.uk - \n
     everything Christian Heilmann<\/h1>\n 
      ... and so on ..."
  ]
});
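The encoded q parameter you see in the calls below is simply the YQL statement run through encodeURIComponent. As a sketch (the function name is mine, not from the post):

```javascript
// Assemble the public YQL endpoint URL for the html table query.
// The endpoint and parameters are the ones used in this post.
function yqlHtmlUrl(url) {
  return 'http://query.yahooapis.com/v1/public/yql?q=' +
    encodeURIComponent('select * from html where url="' + url + '"') +
    '&format=xml&callback=?';
}

console.log(yqlHtmlUrl('http://icant.co.uk'));
```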

foo({ "query":{ "count":"1", "created":"2010-01-10T07:47:40Z", "lang":"en-US", "updated":"2010-01-10T07:47:40Z", "uri":"http://query.y[...who cares...]%22"}, "results":[ "<body>\n <div id=\"doc2\">\n<div id=\"hd\">\n <h1>icant.co.uk - \n everything Christian Heilmann<\/h1>\n ... and so on ..." ] });

Using jQuery’s getJSON() method and accessing the YQL endpoint this is easy to implement:

$.getJSON("http://query.yahooapis.com/v1/public/yql?"+
          "q=select%20*%20from%20html%20where%20url%3D%22"+
          encodeURIComponent(url)+
          "%22&format=xml&callback=?",
  function(data){
    if(data.results[0]){
      var data = filterData(data.results[0]);
      container.html(data);
    } else {
      var errormsg = '<p>Error: can\'t load the page.</p>';
      container.html(errormsg);
    }
  }
);


Putting it all together you have a cross-domain Ajax solution with jQuery and YQL:

$(document).ready(function(){
  var container = $('#target');
  $('.ajaxtrigger').click(function(){
    doAjax($(this).attr('href'));
    return false;
  });
  function doAjax(url){
    // if it is an external URI
    if(url.match('^http')){
      // call YQL
      $.getJSON("http://query.yahooapis.com/v1/public/yql?"+
                "q=select%20*%20from%20html%20where%20url%3D%22"+
                encodeURIComponent(url)+
                "%22&format=xml'&callback=?",
        // this function gets the data from the successful 
        // JSON-P call
        function(data){
          // if there is data, filter it and render it out
          if(data.results[0]){
            var data = filterData(data.results[0]);
            container.html(data);
          // otherwise tell the world that something went wrong
          } else {
            var errormsg = '<p>Error: can\'t load the page.</p>';
            container.html(errormsg);
          }
        }
      );
    // if it is not an external URI, use Ajax load()
    } else {
      $('#target').load(url);
    }
  }
  // filter out some nasties
  function filterData(data){
    data = data.replace(/<\/?body[^>]*>/g,'');
    data = data.replace(/[\r\n]+/g,'');
    data = data.replace(/<!--[\S\s]*?-->/g,'');
    data = data.replace(/<noscript[^>]*>[\S\s]*?<\/noscript>/g,'');
    data = data.replace(/<script[^>]*>[\S\s]*?<\/script>/g,'');
    data = data.replace(/<script[^>]*\/>/g,'');
    return data;
  }
});


This is rough and ready of course. A real Ajax solution should also consider timeout and not found scenarios. Check out the full version with loading indicators, error handling and yellow fade for inspiration.
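The filtering step can also be exercised on its own. Here is a self-contained version of the same idea (the sample input is made up):

```javascript
// The filtering step in isolation: strip body tags, line breaks,
// comments, noscript blocks and scripts from an HTML string.
function filterData(data) {
  data = data.replace(/<\/?body[^>]*>/g, '');
  data = data.replace(/[\r\n]+/g, '');
  data = data.replace(/<!--[\S\s]*?-->/g, '');
  data = data.replace(/<noscript[^>]*>[\S\s]*?<\/noscript>/g, '');
  data = data.replace(/<script[^>]*>[\S\s]*?<\/script>/g, '');
  data = data.replace(/<script[^>]*\/>/g, '');
  return data;
}

var sample = '<body><!-- ads --><p>Hello</p><script>alert(1)</script></body>';
console.log(filterData(sample)); // → <p>Hello</p>
```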

Tags: ajax, crossdomain, javascript, jquery, php, proxy, yql
Posted in General | 29 Comments »

Cleaning up the “CSS only sexy bookmark” demo code

Friday, January 8th, 2010

Going through my Google Reader I stumbled upon an article today called Sexy bookmark like effect in pure CSS. Normally when I hear “pure CSS” I skip the article, as 99% of these solutions don’t work with a keyboard and are thus a bad idea to use on the web. However, this one intrigued me as I had no clue what a “sexy bookmark like effect” might be.

Turns out it was not a porn bookmark but one of those “share this with the social media” link-bars you have below blog posts to help the copy and paste challenged people out there:

Link menu with different social media options

OK, that can’t be that problematic. The trick of doing a list of links as a cool rollover using CSS only was covered in 2004 in the first CSS sprites article. Notice that the links there are in a list.

Now, the new article’s HTML is, in essence, the following:

<div class="sharing-cl">
<a href="" class="share-sprite sh-mail"></a>
<a href="" class="share-sprite sh-feed"></a>
<a href="" class="share-sprite sh-tweet"></a>
<a href="" class="share-sprite sh-su"></a>
<a href="" class="share-sprite sh-digg"></a>
<a href="" class="share-sprite sh-face"></a>
</div>

There are a few things wrong with this in my book:

  • There is no semantic structure to what is going on here. Line breaks do not mean anything to HTML, so in essence this is a list of links without any separation. Imagine sending three links to a friend in an email or putting them on a page. Would you do it like this: GoogleI can has cheezburgerb3taSpotify? Looks confusing to me…
  • There is no content in the links – when CSS is turned off you find nothing whatsoever.
  • There is quite a repetition of classes there. When every element inside another element has the same class, something fishy is going on. Unless this class is used as a handle for – let’s say – a microformat, you can get rid of it and use the cascade in CSS. So in this case you can style all the links with .sharing-cl a{} and get rid of the repeated classes.
  • A navigation is a structured thing, so instead of a div with links in it, how about using a list? This way when CSS is off, this still makes sense.

So here’s my replacement:

<ul class="sharing-cl">
  <li><a class="sh-mail" href="">email</a></li>
  <li><a class="sh-feed" href="">feed</a></li>
  <li><a class="sh-tweet" href="">twitter</a></li>
  <li><a class="sh-face" href="">facebook</a></li>
  <li><a class="sh-su" href="">stumbleupon</a></li>
  <li><a class="sh-digg" href="">digg</a></li>
</ul>

Of course you should replace the empty href attributes with the real links.

Normally I’d use IDs instead of classes, but as this bar might be used several times in the document, let’s leave it like it is.

The HTML is now 318 bytes instead of 294, which is a slight increase. But:

  • It makes sense without CSS
  • It is well-structured and makes sense even to screen readers
  • The links make sense as they say where they are pointing to.

Let’s check on the CSS:

.sharing-cl{
}
.sharing-cl a{
display:block;
width:75px;
height:30px;
float:left;
}
.sharing-cl .share-sprite{
background:url(http://webdeveloperjuice.com/demos/images/share-sprite.png) no-repeat}
.sharing-cl .sh-su{
margin-right:5px;
background-position:-210px -40px;
}
.sharing-cl .sh-feed{
margin-right:5px;
background-position:-70px -40px;
}
.sharing-cl .sh-tweet{
margin-right:5px;
background-position:-140px -40px;
}
.sharing-cl .sh-mail{
margin-right:5px;
background-position:0 -40px;
}
.sharing-cl .sh-digg{
margin-right:5px;
background-position:-280px -40px;
}
.sharing-cl .sh-face{
background-position:-350px -40px;
}
.sharing-cl .sh-mail:hover{
margin-right:5px;
background-position:0 1px;
}
.sharing-cl .sh-feed:hover{
margin-right:5px;
background-position:-70px 1px;
}
.sharing-cl .sh-tweet:hover{
margin-right:5px;
background-position:-140px 1px;
}
.sharing-cl .sh-su:hover{
margin-right:5px;
background-position:-210px 1px;
}
.sharing-cl .sh-digg:hover{
margin-right:5px;
background-position:-280px 1px;
}
.sharing-cl .sh-face:hover{
background-position:-350px 1px;
}

So here we have a lot of repetition. You also see what the share-sprite class is for: if you wanted to add a link to that section without an image background, you would just leave out the class. This, however, is exactly the wrong approach to CSS. We can assume that every link in this construct gets the background image, which is why it makes more sense to apply the image to the a element with .sharing-cl a{}. As every link has a class you can easily override this for the “odd one out” with, for example, .sharing-cl a.plain{}.

The same applies to the margin-right:5px. If it is needed on all the links but one, don’t define it on each of them individually. Instead, apply it to all links in a single rule and reset it on the odd one out – that saves a lot of code.

Final CSS:

.sharing-cl{
overflow:hidden;
margin:0;
padding:0;
list-style:none;
}
.sharing-cl a{
overflow:hidden;
width:75px;
height:30px;
float:left;
margin-right:5px;
text-indent:-300px;
}
.sharing-cl a{
background:url(http://webdeveloperjuice.com/demos/images/share-sprite.png) no-repeat;
}
a.sh-su{background-position:-210px -40px;}
a.sh-feed{background-position:-70px -40px;}
a.sh-tweet{background-position:-140px -40px;}
a.sh-mail{background-position:0 -40px;}
a.sh-digg{background-position:-280px -40px;}
a.sh-face{
margin-right:0;
background-position:-350px -40px;
}
a.sh-mail:hover{background-position:0 1px;}
a.sh-feed:hover{background-position:-70px 1px;}
a.sh-tweet:hover{background-position:-140px 1px;}
a.sh-su:hover{background-position:-210px 1px;}
a.sh-digg:hover{background-position:-280px 1px;}
a.sh-face:hover{
margin-right:0;
background-position:-350px 1px;
}

From 1028 bytes down to 880. Just by understanding how CSS works and how the cascade can be used to your advantage. I would have loved to get rid of the a selectors, too, but they are needed for specificity. Notice the overflow on the main selector – this fixes the issue of the floats not being cleared in the original CSS. By using negative text-indent we get rid of the text being displayed, too. Personally I think this is bad and you should try to show the text as you cannot expect end users to know all these icons.

For example:

#text{
margin-top:3em;
font-weight:bold;
font-family:helvetica,arial,sans-serif;
}
#text a{
text-indent:0;
height:auto;
text-align:center;
font-size:11px;
padding-top:35px;
color:#999;
text-decoration:none;
}

You can see the solution in action here:

Sharing bar – cleaned up.

To me, praising “CSS only solutions” is not enough – if you really love CSS and see it as a better solution than JavaScript then you should also show how people can use its features to create smart, short and flexible code.

Tags: accessibility, bestpractice, cascade, code, css, tutorial, usability
Posted in General | 13 Comments »

The Table of Contents script – my old nemesis

Wednesday, January 6th, 2010

One thing I like about – let me rephrase that – one of the amazingly few things that I like about Microsoft Word is that you can generate a Table of Contents from a document. Word would go through the headings and create a nested TOC from them for you:

Adding a TOC to a Word Document

Microsoft Word generated Table of Contents.

Now, I always like to do that for documents I write in HTML, too, but maintaining them by hand is a pain. I normally write my document outline first:

<h1 id="cute">Cute things on the Interwebs</h1>
<h2 id="rabbits">Rabbits</h2>
<h2 id="puppies">Puppies</h2>
<h3 id="labs">Labradors</h3>
<h3 id="alsatians">Alsatians</h3>
<h3 id="corgies">Corgies</h3>
<h3 id="retrievers">Retrievers</h3>
<h2 id="kittens">Kittens</h2>
<h2 id="gerbils">Gerbils</h2>
<h2 id="ducklings">Ducklings</h2>

<h1 id="cute">Cute things on the Interwebs</h1> <h2 id="rabbits">Rabbits</h2> <h2 id="puppies">Puppies</h2> <h3 id="labs">Labradors</h3> <h3 id="alsatians">Alsatians</h3> <h3 id="corgies">Corgies</h3> <h3 id="retrievers">Retrievers</h3> <h2 id="kittens">Kittens</h2> <h2 id="gerbils">Gerbils</h2> <h2 id="ducklings">Ducklings</h2>

I then collect those, copy and paste them, and use search and replace to turn all the hn into links and the IDs into fragment identifiers:

<li><a href="#cute">Cute things on the Interwebs</a></li>
<li><a href="#rabbits">Rabbits</a></li>
<li><a href="#puppies">Puppies</a></li>
<li><a href="#labs">Labradors</a></li>
<li><a href="#alsatians">Alsatians</a></li>
<li><a href="#corgies">Corgies</a></li>
<li><a href="#retrievers">Retrievers</a></li>
<li><a href="#kittens">Kittens</a></li>
<li><a href="#gerbils">Gerbils</a></li>
<li><a href="#ducklings">Ducklings</a></li>
 
<h1 id="cute">Cute things on the Interwebs</h1>
<h2 id="rabbits">Rabbits</h2>
<h2 id="puppies">Puppies</h2>
<h3 id="labs">Labradors</h3>
<h3 id="alsatians">Alsatians</h3>
<h3 id="corgies">Corgies</h3>
<h3 id="retrievers">Retrievers</h3>
<h2 id="kittens">Kittens</h2>
<h2 id="gerbils">Gerbils</h2>
<h2 id="ducklings">Ducklings</h2>

<li><a href="#cute">Cute things on the Interwebs</a></li> <li><a href="#rabbits">Rabbits</a></li> <li><a href="#puppies">Puppies</a></li> <li><a href="#labs">Labradors</a></li> <li><a href="#alsatians">Alsatians</a></li> <li><a href="#corgies">Corgies</a></li> <li><a href="#retrievers">Retrievers</a></li> <li><a href="#kittens">Kittens</a></li> <li><a href="#gerbils">Gerbils</a></li> <li><a href="#ducklings">Ducklings</a></li> <h1 id="cute">Cute things on the Interwebs</h1> <h2 id="rabbits">Rabbits</h2> <h2 id="puppies">Puppies</h2> <h3 id="labs">Labradors</h3> <h3 id="alsatians">Alsatians</h3> <h3 id="corgies">Corgies</h3> <h3 id="retrievers">Retrievers</h3> <h2 id="kittens">Kittens</h2> <h2 id="gerbils">Gerbils</h2> <h2 id="ducklings">Ducklings</h2>

Then I need to look at the weight and order of the headings and add the nesting of the TOC list accordingly.

<ul>
  <li><a href="#cute">Cute things on the Interwebs</a>
    <ul>
      <li><a href="#rabbits">Rabbits</a></li>
      <li><a href="#puppies">Puppies</a>
        <ul>
          <li><a href="#labs">Labradors</a></li>
          <li><a href="#alsatians">Alsatians</a></li>
          <li><a href="#corgies">Corgies</a></li>
          <li><a href="#retrievers">Retrievers</a></li>
        </ul>
      </li>
      <li><a href="#kittens">Kittens</a></li>
      <li><a href="#gerbils">Gerbils</a></li>
      <li><a href="#ducklings">Ducklings</a></li>
    </ul>
  </li>
</ul>
 
<h1 id="cute">Cute things on the Interwebs</h1>
<h2 id="rabbits">Rabbits</h2>
<h2 id="puppies">Puppies</h2>
<h3 id="labs">Labradors</h3>
<h3 id="alsatians">Alsatians</h3>
<h3 id="corgies">Corgies</h3>
<h3 id="retrievers">Retrievers</h3>
<h2 id="kittens">Kittens</h2>
<h2 id="gerbils">Gerbils</h2>
<h2 id="ducklings">Ducklings</h2>

<ul> <li><a href="#cute">Cute things on the Interwebs</a> <ul> <li><a href="#rabbits">Rabbits</a></li> <li><a href="#puppies">Puppies</a> <ul> <li><a href="#labs">Labradors</a></li> <li><a href="#alsatians">Alsatians</a></li> <li><a href="#corgies">Corgies</a></li> <li><a href="#retrievers">Retrievers</a></li> </ul> </li> <li><a href="#kittens">Kittens</a></li> <li><a href="#gerbils">Gerbils</a></li> <li><a href="#ducklings">Ducklings</a></li> </ul> </li> </ul> <h1 id="cute">Cute things on the Interwebs</h1> <h2 id="rabbits">Rabbits</h2> <h2 id="puppies">Puppies</h2> <h3 id="labs">Labradors</h3> <h3 id="alsatians">Alsatians</h3> <h3 id="corgies">Corgies</h3> <h3 id="retrievers">Retrievers</h3> <h2 id="kittens">Kittens</h2> <h2 id="gerbils">Gerbils</h2> <h2 id="ducklings">Ducklings</h2>

Now, wouldn’t it be nice to have that done automatically? The way to do that with JavaScript and the DOM is actually a much trickier problem than it looks at first sight (I always love to ask this as an interview question or in DOM scripting workshops).

Here are some of the issues to consider:

  • You can easily get elements with getElementsByTagName() but you can’t do a getElementsByTagName('h*') sadly enough.
  • Headings in XHTML and HTML 4 do not have the elements they apply to as child elements (XHTML2 proposed that, and HTML5 has it to a degree – Bruce Lawson wrote a nice post about this and there’s also a pretty nifty HTML5 outliner available).
  • You can do a getElementsByTagName() for each of the heading levels and then concatenate a collection of all of them. However, that does not give you their order in the source of the document.
  • To this end PPK wrote an infamous TOC script used on his site a long time ago, using his getElementsByTagNames() function, which relies on things not every browser supports. Therefore it doesn’t quite do the job either. He also “cheats” at the assembly of the TOC list as he adds classes to indent entries visually rather than really nesting lists.
  • It seems that the only way to achieve this for all the browsers using the DOM is painful: do a getElementsByTagName('*') and walk the whole DOM tree, comparing nodeName and getting the headings that way.
  • Another solution I thought of reads the innerHTML of the document body and then uses regular expressions to match the headings.
  • As you cannot assume that every heading has an ID we need to add one if needed.

So here are some solutions to that problem:

Using the DOM:

(function(){
  var headings = [];
  var herxp = /h\d/i;
  var count = 0;
  var elms = document.getElementsByTagName('*');
  for(var i=0,j=elms.length;i<j;i++){
    var cur = elms[i];
    var id = cur.id;
    if(herxp.test(cur.nodeName)){
      if(cur.id===''){
        id = 'head'+count;
        cur.id = id;
        count++;
      }
      headings.push(cur);
    }
  }
  var out = '<ul>';
  for(i=0,j=headings.length;i<j;i++){
    var weight = headings[i].nodeName.substr(1,1);
    if(weight > oldweight){
      out += '<ul>'; 
    }
    out += '<li><a href="#'+headings[i].id+'">'+
           headings[i].innerHTML+'</a>';
    if(headings[i+1]){
      var nextweight = headings[i+1].nodeName.substr(1,1);
      if(weight > nextweight){
        out+='</li></ul></li>'; 
      }
      if(weight == nextweight){
        out+='</li>'; 
      }
    }
    var oldweight = weight;
  }
  out += '</li></ul>';
  document.getElementById('toc').innerHTML = out;
})();


You can see the DOM solution in action here. The problem with it is that it can become very slow on large documents and in MSIE6.
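The nesting logic at the heart of the script can be isolated into a pure function for testing (the {level, id, text} input shape is my own, not from the post):

```javascript
// The list-nesting logic from the DOM solution as a pure function.
// headings is an array of {level, id, text} objects (a made-up shape).
function buildTOC(headings) {
  var out = '<ul>';
  var oldweight;
  for (var i = 0; i < headings.length; i++) {
    var weight = headings[i].level;
    // deeper than the previous heading: open a nested list
    if (weight > oldweight) { out += '<ul>'; }
    out += '<li><a href="#' + headings[i].id + '">' +
           headings[i].text + '</a>';
    if (headings[i + 1]) {
      var nextweight = headings[i + 1].level;
      // shallower next heading: close the nested list;
      // same level: just close the item
      if (weight > nextweight) { out += '</li></ul></li>'; }
      if (weight === nextweight) { out += '</li>'; }
    }
    oldweight = weight;
  }
  return out + '</li></ul>';
}
```

For an h2/h3/h2 sequence this produces a properly nested list; like the script above, it only closes one level at a time, so documents that end on a deeply nested heading still need extra handling.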

The regular expressions solution

To work around the need to traverse the whole DOM, I thought it might be a good idea to use regular expressions on the innerHTML of the DOM and write it back once I added the IDs and assembled the TOC:

(function(){
  var bd = document.body,
      x = bd.innerHTML,
      headings = x.match(/<h\d[^>]*>[\S\s]*?<\/h\d>$/mg),
      r1 = />/,
      r2 = /<(\/)?h(\d)/g,
      toc = document.createElement('div'),
      out = '<ul>',
      i = 0,
      j = headings.length,
      cur = '',
      weight = 0,
      nextweight = 0,
      oldweight = 2,
      container = bd;
  for(i=0;i<j;i++){
    if(headings[i].indexOf('id=')==-1){
      cur = headings[i].replace(r1,' id="h'+i+'">');
      x = x.replace(headings[i],cur);
    } else {
      cur = headings[i];
    }
    weight = cur.substr(2,1);
    if(i<j-1){
      nextweight = headings[i+1].substr(2,1);
    }
    var a = cur.replace(r2,'<$1a');
    a = a.replace('id="','href="#');
    if(weight>oldweight){ out+='<ul>'; }
    out+='<li>'+a;
    if(nextweight<weight){ out+='</li></ul></li>'; }
    if(nextweight==weight){ out+='</li>'; }
    oldweight = weight;
  }
  bd.innerHTML = x;
  toc.innerHTML = out +'</li></ul>';
  container = document.getElementById('toc') || bd;
  container.appendChild(toc);
})();


You can see the regular expressions solution in action here. The problem with it is that reading innerHTML and writing it back might be expensive (this needs testing), and if you have event handlers attached to elements it might leak memory, as my colleague Matt Jones pointed out (again, this needs testing). Ara Pehlivanian also mentioned that a mix of both approaches might be better: match the headings, but don't write back the innerHTML – instead use the DOM to add the IDs.
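As a quick illustration of that mix, the matching step can live in a small pure function that takes an HTML string and returns the headings with their levels – the IDs would then be added via DOM calls rather than by writing innerHTML back. This is a sketch of my own (the name extractHeadings and the backreference regex are mine, not from the demo code):

```javascript
// Sketch: the regex matching step as a pure function. In the hybrid
// approach you would feed document.body.innerHTML in here and then set
// the IDs with DOM methods instead of rewriting the whole body.
function extractHeadings(html){
  // \1 makes sure the closing tag matches the opening level
  var matches = html.match(/<h([1-6])[^>]*>[\s\S]*?<\/h\1>/g) || [];
  var headings = [];
  for(var i = 0; i < matches.length; i++){
    headings.push({
      level: parseInt(matches[i].charAt(2), 10), // the digit after '<h'
      html: matches[i]
    });
  }
  return headings;
}
```

Because it never touches the document, the function is also easy to test outside the browser.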

Libraries to the rescue – a YUI3 example

Talking to another colleague – Dav Glass – about the TOC problem, I learned that the YUI3 selector engine happily takes a group of selectors and returns the matching elements in document order. This makes things very easy:

<script type="text/javascript" src="http://yui.yahooapis.com/3.0.0/build/yui/yui-min.js"></script>
<script>
YUI({combine: true, timeout: 10000}).use("node", function(Y) {
  // Y.all() returns all matching headings in document order
  var nodes = Y.all('h1,h2,h3,h4,h5,h6');
  var out = '<ul>';
  var weight = 0, nextweight = 0, oldweight = 0;
  nodes.each(function(o,k){
    // add an ID to headings that don't have one yet
    var id = o.get('id');
    if(id === ''){
      id = 'head' + k;
      o.set('id',id);
    }
    // the heading level is the digit in the node name (H1...H6)
    weight = o.get('nodeName').substr(1,1);
    if(weight > oldweight){ out+='<ul>'; }
    out+='<li><a href="#'+id+'">'+o.get('innerHTML')+'</a>';
    if(nodes.item(k+1)){
      nextweight = nodes.item(k+1).get('nodeName').substr(1,1);
      if(weight > nextweight){ out+='</li></ul></li>'; }
      if(weight == nextweight){ out+='</li>'; }
    }
    oldweight = weight;
  });
  out+='</li></ul>';
  Y.one('#toc').set('innerHTML',out);
});</script>


There is probably a cleaner way to assemble the TOC list.
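One cleaner option – a sketch of my own, not part of YUI – is to split the nesting logic into a pure function that takes a flat array of {level, id, text} objects and recurses into sublists, so the open/close bookkeeping disappears:

```javascript
// Sketch: assemble a nested TOC list from a flat array of headings.
// Each heading is an object like {level: 1, id: 'anchor', text: 'label'}.
function assembleTOC(headings){
  var i = 0;
  function build(level){
    var out = '<ul>';
    while(i < headings.length && headings[i].level >= level){
      var h = headings[i];
      i += 1;
      out += '<li><a href="#' + h.id + '">' + h.text + '</a>';
      // a deeper next heading starts a nested sublist
      if(i < headings.length && headings[i].level > h.level){
        out += build(headings[i].level);
      }
      out += '</li>';
    }
    return out + '</ul>';
  }
  return headings.length ? build(headings[0].level) : '';
}
```

It assumes the document starts with its highest-level heading; a document that opens with, say, an h3 followed by an h1 gets the h1 flattened into a sibling entry rather than breaking the markup.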

Performance considerations

There is more to life than simply increasing its speed. – Gandhi

Some of the code above can be very slow. Whenever we talk about performance and JavaScript, though, it is important to consider the context of the implementation: a table of contents script would normally be used on a text-heavy but simple document. There is no point in measuring and judging these scripts by running them over Gmail or the Yahoo homepage. Faster and less memory-hungry is always better, of course, but I am always a bit sceptical about performance tests that cover edge cases rather than the one the solution was meant to be applied to.

Moving server side

The other thing I am getting more and more sceptical about is client-side solutions for things that also make sense on the server. Therefore I thought I would take the regular expressions approach above and move it server side.

The first version is a PHP script you can loop an HTML document through. You can see the outcome of tocit.php here:

<?php
// only allow simple file names, no paths
$file = $_GET['file'];
if(preg_match('/^[a-z0-9\-_\.]+$/i',$file)){
  $content = file_get_contents($file);
  // grab all headings and their levels
  preg_match_all("/<h([1-6])[^>]*>.*<\/h.>/Us",$content,$headlines);
  $out = '<ul>';
  foreach($headlines[0] as $k=>$h){
    // add an ID to headings that don't have one yet
    if(strstr($h,'id=')===false){
      $x = preg_replace('/>/',' id="head'.$k.'">',$h,1);
      $content = str_replace($h,$x,$content);
      $h = $x;
    }
    // turn the heading into a link pointing to its ID
    $link = preg_replace('/<(\/)?h\d/','<$1a',$h);
    $link = str_replace('id="','href="#',$link);
    // open a nested list when this heading is deeper than the last one
    if($k>0 && $headlines[1][$k-1]<$headlines[1][$k]){
      $out.='<ul>';
    }
    $out .= '<li>'.$link;
    // close lists when the level goes back up or stays the same
    if(isset($headlines[1][$k+1]) && $headlines[1][$k+1]<$headlines[1][$k]){
      $out.='</li></ul></li>';
    }
    if(isset($headlines[1][$k+1]) && $headlines[1][$k+1] == $headlines[1][$k]){
      $out.='</li>';
    }
  }
  $out.='</li></ul>';
  echo str_replace('<div id="toc"></div>',$out,$content);
} else {
  die('only files like text.html please!');
}
?>


This is nice, but instead of having another file to loop through, we can also use the output buffer of PHP:

<?php
function tocit($content){
  // grab all headings and their levels
  preg_match_all("/<h([1-6])[^>]*>.*<\/h.>/Us",$content,$headlines);
  $out = '<ul>';
  foreach($headlines[0] as $k=>$h){
    // add an ID to headings that don't have one yet
    if(strstr($h,'id=')===false){
      $x = preg_replace('/>/',' id="head'.$k.'">',$h,1);
      $content = str_replace($h,$x,$content);
      $h = $x;
    }
    // turn the heading into a link pointing to its ID
    $link = preg_replace('/<(\/)?h\d/','<$1a',$h);
    $link = str_replace('id="','href="#',$link);
    // open a nested list when this heading is deeper than the last one
    if($k>0 && $headlines[1][$k-1]<$headlines[1][$k]){
      $out.='<ul>';
    }
    $out .= '<li>'.$link;
    // close lists when the level goes back up or stays the same
    if(isset($headlines[1][$k+1]) && $headlines[1][$k+1]<$headlines[1][$k]){
      $out.='</li></ul></li>';
    }
    if(isset($headlines[1][$k+1]) && $headlines[1][$k+1] == $headlines[1][$k]){
      $out.='</li>';
    }
  }
  $out.='</li></ul>';
  return str_replace('<div id="toc"></div>',$out,$content);
}
ob_start("tocit");
ob_start("tocit");
?>
[... the document ...]
<?php ob_end_flush();?>


The server side solutions have a few benefits: they work regardless of the visitor's JavaScript support, and you can cache the result for a while if needed. I am sure the PHP can be sped up, though.
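The string-in, string-out idea is not tied to PHP either. As a sketch (my own port, not part of the demo code), the same filter in plain JavaScript could run in any server-side JavaScript environment; unlike the quick PHP version it also closes all still-open nested lists at the end:

```javascript
// Sketch: the PHP tocit() filter ported to plain JavaScript.
// Takes a full HTML document string and returns it with IDs added to
// the headings and the TOC injected where <div id="toc"></div> sits.
function tocit(content){
  // \1 makes sure opening and closing heading levels agree
  var headings = content.match(/<h([1-6])[^>]*>[\s\S]*?<\/h\1>/g) || [];
  var out = '', stack = [];
  for(var k = 0; k < headings.length; k++){
    var h = headings[k];
    // add an ID to headings that don't have one yet
    if(h.indexOf('id=') === -1){
      var withId = h.replace(/>/, ' id="head' + k + '">');
      content = content.replace(h, withId);
      h = withId;
    }
    var level = +h.charAt(2); // the digit after '<h'
    // turn a copy of the heading into a link pointing to its ID
    var link = h.replace(/<(\/)?h\d/g, '<$1a').replace('id="', 'href="#');
    if(!stack.length || level > stack[stack.length - 1]){
      out += '<ul>';                  // deeper heading: open a nested list
      stack.push(level);
    } else {
      while(stack.length > 1 && level < stack[stack.length - 1]){
        out += '</li></ul>';          // shallower heading: close nested lists
        stack.pop();
      }
      out += '</li>';
    }
    out += '<li>' + link;
  }
  // close everything that is still open
  while(stack.length){ out += '</li></ul>'; stack.pop(); }
  return content.replace('<div id="toc"></div>', out);
}
```

The whole transformation stays a pure function, so it is trivial to cache or to unit test.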

See all the solutions and get the source code

  • See all the solutions and get the source code on GitHub.

I showed you mine, now show me yours!

All of these solutions are pretty much rough and ready. How do you think they can be improved? How about doing a version for a different library? Go ahead, fork the project on GitHub and show me what you can do.

Tags: dom, generator, headings, HTML, javascript, outline, php, tableofcontents, toc, word, YUI3
Posted in General | 20 Comments »

16m Britons use the same password for every website – or do they?

Sunday, January 3rd, 2010

I am currently writing a primer on web security for a blog, and while doing my research on passwords I came across The Telegraph's article Almost 16 million use same password for every website, study finds. It is full of cool figures, and I was very tempted to use some quotes like:

The average internet user is asked for a password by 23 websites a month.
The research found 46 per cent of British internet users, 15.6 million, have the same password for most web-based accounts and five per cent, or 1.7 million, use the same password for every single website.

According to the Telegraph, the study was done by CPP:

This could lead to money being stolen from bank accounts, fraudulent purchases via online shops or identity theft, according to life assistance company CPP.

What puzzled me is that there is no link to be found on the CPP site. Their last press release is from November and a site search for password doesn’t yield any results.

The Telegraph does not list the source of the figures or where to see the original survey – if this were a Wikipedia article, that alone would get it deleted!

It gets really interesting when you do a Google search for the same survey. You then find an article based on data from chinaview.cn that reveals just how many people were asked in the survey:

More worrying was that of 1,661 Britons questioned, nearly 40 per cent of adults admitted that at least one other person knows their passwords, ranging from children, colleagues and friends. With phishing and smishing attacks, as well as malicious software attacks, on the rise, consumers and Internet users need to be more careful with their personal data.

I am all for scaling, but taking 1,661 people and multiplying that up to 16 million is a bit of a stretch of the imagination, don't you think? Seeing that the survey is from September also suggests there was a slow news day to fill. This is another annoyance, as you cannot research what other news sites said at the time – they delete content after 31 days. So much for "cool links never change".
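To be fair to the sample size itself: for a simple random sample, the usual 95 per cent margin of error on the 46 per cent figure is easy to check (a sketch using the standard normal approximation; whether the survey actually was a random sample is exactly what we cannot verify):

```javascript
// 95% margin of error for a sample proportion p from n respondents,
// using the normal approximation: 1.96 * sqrt(p * (1 - p) / n)
function marginOfError(p, n){
  return 1.96 * Math.sqrt(p * (1 - p) / n);
}

var moe = marginOfError(0.46, 1661);
// roughly 0.024, i.e. about +/- 2.4 percentage points
```

So the sampling error itself is small; the real leap of faith is assuming that 1,661 respondents are representative enough of all British internet users to justify a 15.6 million headline.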

That said, I am happy that mainstream media is at least covering the topic of bad passwords. We can do a lot in security, but if end users still consider "password" or "letmein" good choices for a password, we are doomed.

I would love to see the CPP survey, and I’d also love to have a way to comment on The Telegraph. Alas…

Update: As reported by marksteward on Twitter, the Telegraph already covered the survey in September – mentioning the 1,661 number – and there is a report on the CPP site talking about the survey in more detail. Thanks!

Tags: cpp, maths, passwords, statistics, survey, telegraph
Posted in General | Comments Off on 16m Britons use the same password for every website – or do they?
