Christian Heilmann

You are currently browsing the Christian Heilmann blog archives for January, 2010.

Archive for January, 2010

The Table of Contents script – my old nemesis

Wednesday, January 6th, 2010

One thing I like about – let me rephrase that – one of the amazingly few things that I like about Microsoft Word is that you can generate a Table of Contents from a document. Word would go through the headings and create a nested TOC from them for you:

Adding a TOC to a Word Document

Microsoft Word generated Table of Contents.

Now, I always like to do that for documents I write in HTML, too, but maintaining them by hand is a pain. I normally write my document outline first:

<h1 id="cute">Cute things on the Interwebs</h1>
<h2 id="rabbits">Rabbits</h2>
<h2 id="puppies">Puppies</h2>
<h3 id="labs">Labradors</h3>
<h3 id="alsatians">Alsatians</h3>
<h3 id="corgies">Corgies</h3>
<h3 id="retrievers">Retrievers</h3>
<h2 id="kittens">Kittens</h2>
<h2 id="gerbils">Gerbils</h2>
<h2 id="ducklings">Ducklings</h2>

I then collect those, copy and paste them and use search and replace to turn all the hn to links and the IDs to fragment identifiers:

<li><a href="#cute">Cute things on the Interwebs</a></li>
<li><a href="#rabbits">Rabbits</a></li>
<li><a href="#puppies">Puppies</a></li>
<li><a href="#labs">Labradors</a></li>
<li><a href="#alsatians">Alsatians</a></li>
<li><a href="#corgies">Corgies</a></li>
<li><a href="#retrievers">Retrievers</a></li>
<li><a href="#kittens">Kittens</a></li>
<li><a href="#gerbils">Gerbils</a></li>
<li><a href="#ducklings">Ducklings</a></li>
 
<h1 id="cute">Cute things on the Interwebs</h1>
<h2 id="rabbits">Rabbits</h2>
<h2 id="puppies">Puppies</h2>
<h3 id="labs">Labradors</h3>
<h3 id="alsatians">Alsatians</h3>
<h3 id="corgies">Corgies</h3>
<h3 id="retrievers">Retrievers</h3>
<h2 id="kittens">Kittens</h2>
<h2 id="gerbils">Gerbils</h2>
<h2 id="ducklings">Ducklings</h2>

Then I need to look at the weight and order of the headings and add the nesting of the TOC list accordingly.

<ul>
  <li><a href="#cute">Cute things on the Interwebs</a>
    <ul>
      <li><a href="#rabbits">Rabbits</a></li>
      <li><a href="#puppies">Puppies</a>
        <ul>
          <li><a href="#labs">Labradors</a></li>
          <li><a href="#alsatians">Alsatians</a></li>
          <li><a href="#corgies">Corgies</a></li>
          <li><a href="#retrievers">Retrievers</a></li>
        </ul>
      </li>
      <li><a href="#kittens">Kittens</a></li>
      <li><a href="#gerbils">Gerbils</a></li>
      <li><a href="#ducklings">Ducklings</a></li>
    </ul>
  </li>
</ul>
 
<h1 id="cute">Cute things on the Interwebs</h1>
<h2 id="rabbits">Rabbits</h2>
<h2 id="puppies">Puppies</h2>
<h3 id="labs">Labradors</h3>
<h3 id="alsatians">Alsatians</h3>
<h3 id="corgies">Corgies</h3>
<h3 id="retrievers">Retrievers</h3>
<h2 id="kittens">Kittens</h2>
<h2 id="gerbils">Gerbils</h2>
<h2 id="ducklings">Ducklings</h2>

Now, wouldn’t it be nice to have that done automatically for me? The way to do that in JavaScript and DOM is actually a much trickier problem than it looks like at first sight (I always love to ask this as an interview question or in DOM scripting workshops).

Here are some of the issues to consider:

  • You can easily get elements with getElementsByTagName() but you can’t do a getElementsByTagName('h*') sadly enough.
  • Headings in XHTML and HTML 4 do not have the elements they apply to as child elements (XHTML2 was proposing that and HTML5 has it to a degree – Bruce Lawson write a nice post about this and there’s also a pretty nifty HTML5 outliner available).
  • You can do a getElementsByTagName() for each of the heading levels and then concatenate a collection of all of them. However, that does not give you their order in the source of the document.
  • To this end PPK wrote an infamous TOC script used on his site a long time ago using his getElementsByTagNames() function which works with things not every browser supports. Therefore it doesn’t quite do the job either. He also “cheats” at the assembly of the TOC list as he adds classes to indent them visually rather than really nesting lists.
  • It seems that the only way to achieve this for all the browsers using the DOM is painful: do a getElementsByTagName('*') and walk the whole DOM tree, comparing nodeName and getting the headings that way.
  • Another solution I thought of reads the innerHTML of the document body and then uses regular expressions to match the headings.
  • As you cannot assume that every heading has an ID we need to add one if needed.

So here are some solutions to that problem:

Using the DOM:

(function(){
  var headings = [];
  var herxp = /h\d/i;
  var count = 0;
  var elms = document.getElementsByTagName('*');
  for(var i=0,j=elms.length;i<j;i++){
    var cur = elms[i];
    var id = cur.id;
    if(herxp.test(cur.nodeName)){
      if(cur.id===''){
        id = 'head'+count;
        cur.id = id;
        count++;
      }
      headings.push(cur);
    }
  }
  var out = '<ul>';
  for(i=0,j=headings.length;i<j;i++){
    var weight = headings[i].nodeName.substr(1,1);
    if(weight > oldweight){
      out += '<ul>'; 
    }
    out += '<li><a href="#'+headings[i].id+'">'+
           headings[i].innerHTML+'</a>';
    if(headings[i+1]){
      var nextweight = headings[i+1].nodeName.substr(1,1);
      if(weight > nextweight){
        out+='</li></ul></li>'; 
      }
      if(weight == nextweight){
        out+='</li>'; 
      }
    }
    var oldweight = weight;
  }
  out += '</li></ul>';
  document.getElementById('toc').innerHTML = out;
})();

You can see the DOM solution in action here. The problem with it is that it can become very slow on large documents and in MSIE6.

The regular expressions solution

To work around the need to traverse the whole DOM, I thought it might be a good idea to use regular expressions on the innerHTML of the DOM and write it back once I added the IDs and assembled the TOC:

(function(){
  var bd = document.body,
      x = bd.innerHTML,
      headings = x.match(/<h\d[^>]*>[\S\s]*?<\/h\d>$/mg),
      r1 = />/,
      r2 = /<(\/)?h(\d)/g,
      toc = document.createElement('div'),
      out = '<ul>',
      i = 0,
      j = headings.length,
      cur = '',
      weight = 0,
      nextweight = 0,
      oldweight = 2,
      container = bd;
  for(i=0;i<j;i++){
    if(headings[i].indexOf('id=')==-1){
      cur = headings[i].replace(r1,' id="h'+i+'">');
      x = x.replace(headings[i],cur);
    } else {
      cur = headings[i];
    }
    weight = cur.substr(2,1);
    if(i<j-1){
      nextweight = headings[i+1].substr(2,1);
    }
    var a = cur.replace(r2,'<$1a');
    a = a.replace('id="','href="#');
    if(weight>oldweight){ out+='<ul>'; }
    out+='<li>'+a;
    if(nextweight<weight){ out+='</li></ul></li>'; }
    if(nextweight==weight){ out+='</li>'; }
    oldweight = weight;
  }
  bd.innerHTML = x;
  toc.innerHTML = out +'</li></ul>';
  container = document.getElementById('toc') || bd;
  container.appendChild(toc);
})();

You can see the regular expressions solution in action here. The problem with it is that reading innerHTML and then writing it out might be expensive (this needs testing) and if you have event handling attached to elements it might leak memory as my colleage Matt Jones pointed out (again, this needs testing). Ara Pehlivavian also mentioned that a mix of both approaches might be better – match the headings but don’t write back the innerHTML – instead use DOM to add the IDs.

Libraries to the rescue – a YUI3 example

Talking to another colleague – Dav Glass – about the TOC problem he pointed out that the YUI3 selector engine happily takes a list of elements and returns them in the right order. This makes things very easy:

<script type="text/javascript" src="http://yui.yahooapis.com/3.0.0/build/yui/yui-min.js"></script>
<script>
YUI({combine: true, timeout: 10000}).use("node", function(Y) {
  var nodes = Y.all('h1,h2,h3,h4,h5,h6');
  var out = '<ul>';
  var weight = 0,nextweight = 0,oldweight;
  nodes.each(function(o,k){
    var id = o.get('id');
    if(id === ''){
      id = 'head' + k;
      o.set('id',id);
    };
    weight = o.get('nodeName').substr(1,1);
    if(weight > oldweight){ out+='<ul>'; }
    out+='<li><a href="#'+o.get('id')+'">'+o.get('innerHTML')+'</a>';
    if(nodes.item(k+1)){
      nextweight = nodes.item(k+1).get('nodeName').substr(1,1);
      if(weight > nextweight){ out+='</li></ul></li>'; }
      if(weight == nextweight){ out+='</li>'; }
    }
    oldweight = weight;
  });
  out+='</li></ul>';
  Y.one('#toc').set('innerHTML',out);
});</script>

There is probably a cleaner way to assemble the TOC list.

Performance considerations

There is more to life than simply increasing its speed. – Gandhi

Some of the code above can be very slow. That said, whenever we talk about performance and JavaScript, it is important to consider the context of the implementation: a table of contents script would normally be used on a text-heavy, but simple, document. There is no point in measuring and judging these scripts running them over gmail or the Yahoo homepage. That said, faster and less memory consuming is always better, but I am always a bit sceptic about performance tests that consider edge cases rather than the one the solution was meant to be applied to.

Moving server side.

The other thing I am getting more and more sceptic about are client side solutions for things that actually also make sense on the server. Therefore I thought I could use the regular expressions approach above and move it server side.

The first version is a PHP script you can loop an HTML document through. You can see the outcome of tocit.php here:

<?php
$file = $_GET['file'];
if(preg_match('/^[a-z0-9\-_\.]+$/i',$file)){
$content = file_get_contents($file);
preg_match_all("/<h([1-6])[^>]*>.*<\/h.>/Us",$content,$headlines);
$out = '<ul>';
foreach($headlines[0] as $k=>$h){
 if(strstr($h,'id')===false){
   $x = preg_replace('/>/',' id="head'.$k.'">',$h,1);
   $content = str_replace($h,$x,$content);
   $h = $x;
 };
 $link = preg_replace('/<(\/)?h\d/','<$1a',$h);
 $link = str_replace('id="','href="#',$link);
 if($k>0 && $headlines[1][$k-1]<$headlines[1][$k]){
   $out.='<ul>';
 }
 $out .= '<li>'.$link.'';
 if($headlines[1][$k+1] && $headlines[1][$k+1]<$headlines[1][$k]){
   $out.='</li></ul></li>';
 }
 if($headlines[1][$k+1] && $headlines[1][$k+1] == $headlines[1][$k]){
   $out.='</li>';
 }
}
$out.='</li></ul>';
echo str_replace('<div id="toc"></div>',$out,$content);
}else{
  die('only files like text.html please!');
}
?>

This is nice, but instead of having another file to loop through, we can also use the output buffer of PHP:

<?php
function tocit($content){
  preg_match_all("/<h([1-6])[^>]*>.*<\/h.>/Us",$content,$headlines);
  $out = '<ul>';
  foreach($headlines[0] as $k=>$h){
   if(strstr($h,'id')===false){
     $x = preg_replace('/>/',' id="head'.$k.'">',$h,1);
     $content = str_replace($h,$x,$content);
     $h = $x;
   };
   $link = preg_replace('/<(\/)?h\d/','<$1a',$h);
   $link = str_replace('id="','href="#',$link);
   if($k>0 && $headlines[1][$k-1]<$headlines[1][$k]){
     $out.='<ul>';
   }
   $out .= '<li>'.$link.'';
   if($headlines[1][$k+1] && $headlines[1][$k+1]<$headlines[1][$k]){
     $out.='</li></ul></li>';
   }
   if($headlines[1][$k+1] && $headlines[1][$k+1] == $headlines[1][$k]){
     $out.='</li>';
   }
  }
  $out.='</li></ul>';
  return str_replace('<div id="toc"></div>',$out,$content);
}
ob_start("tocit");
?>
[... the document ...]
<?php ob_end_flush();?>

The server side solutions have a few benefits: they always work, and you can also cache the result if needed for a while. I am sure the PHP can be sped up, though.

See all the solutions and get the source code

I showed you mine, now show me yours!

All of these solutions are pretty much rough and ready. What do you think how they can be improved? How about doing a version for different libraries? Go ahead, fork the project on GitHub and show me what you can do.

16m Britons use the same password for every website – or do they?

Sunday, January 3rd, 2010

I am right now writing a primer on web security for a blog and doing my research on passwords I came across The Telegraph’s article Almost 16 million use same password for every website, study finds is actually full of cool figures and I was very tempted to use some quotes like:

The average internet user is asked for a password by 23 websites a month.
The research found 46 per cent of British internet users, 15.6 million, have the same password for most web-based accounts and five per cent, or 1.7 million, use the same password for every single website.

According to the Telegraph, the study was done by CPP:

This could lead to money being stolen from bank accounts, fraudulent purchases via online shops or identity theft, according to life assistance company CPP.

What puzzled me is that there is no link to be found on the CPP site. Their last press release is from November and a site search for password doesn’t yield any results.

The Telegraph does not list the source of the figures or where to see the original survey – actually this would mean the article would get deleted from Wikipedia!

It gets really interesting when you do a Google search for the same survery. You then find an article on based on data of chinaview.cn that reveals just how many people were asked in the survey:

More worrying was that of 1,661 Britons questioned, nearly 40 per cent of adults admitted that at least one other person knows their passwords, ranging from children, colleagues and friends. With phishing and smishing attacks, as well as malicious software attacks, on the rise, consumers and Internet users need to be more careful with their personal data.

I am all for scaling, but using 1661 people and multiplying that up to 16 million is a bit of stretch of the imagination, don’t you think? Seeing that the survey is from September also gives me the idea that there was a slow news day to cover. This is another annoyance as you cannot research what other news sites have said at that time as they delete content after 31 days. So much for “cool links never change”.

That said, I am happy that mainstream media is at least covering the topic of bad passwords. We can do a lot in security, but if end users still consider “password” or “letmein” a good idea as a password we are doomed.

I would love to see the CPP survey, and I’d also love to have a way to comment on The Telegraph. Alas…

Update As reported by marksteward on Twitter the Telegraph already reported about the survey in September – mentioning the 1661 number and there is a report on the CPP site talking about the survey in more detail – thanks!

How to write an article or tutorial the fast way

Saturday, January 2nd, 2010

As you know if you come here often, I am a very prolific writer and churn out blog posts and articles very quickly. Some people asked me how I do that – especially as they want to take part in Project 52.

Well, here is how I approach writing a new post/article:

Step 1: Find a problem to solve

Your article should solve an issue – either one you encountered yourself and always wanted to find a solution to on the web (this is how I started this blog) or something people ask on mailing lists, forums or Twitter.

Step 2: Research or code (or both)

The first step is the research of the topic you want to cover. When you write, you don’t want to get side-tracked by looking up web sites. Do your surfing, copy and paste the quotes and URLs, take the screenshots and all that jazz. Put them in a folder on your hard drive.

If your article is a code tutorial, code the whole thing and save it in different steps (plain_html.html, styled.html, script.html, final.html,final_with_docs.html). Do this step well – you will copy and paste part of the code into your article and when you find mistakes then you need to maintain it in two spots again. Make sure this code can be used by others and does not need anything only you can provide (for more tips check the write excellent code examples chapter of the developer evangelism handbook).

Step 3: Build the article outline

The next thing I do is write the outline of the article as weighted headlines (HTML, eh?). This has a few benefits.

  • You know what you will cover and it allows you to limit yourself to what is really needed.
  • You will know what follows what you are writing and already know what you don’t need to mention. I myself tend to get excited and want to say everything in the first few lines. This is bad as it doesn’t get the readers on a journey but overloads them instead.
  • You can estimate the size of the overall article
  • You can write the different parts independent of another. If you get stuck with one sub-topic, jump to one you know inside-out and get this out of the way.

It would look something like this:

Turning a nested list into a tree navigation

See the demo, download the code

Considering the audience

How do tree navigations work?

Allowing for styling

Accessibility concerns

Start with the minimal markup

Add styling

The dynamic CSS class switch

Add the script

Event delegation vs. Event handling

Adding a configuration file

Other options to consider

See it in action

Contact and comment options

Step 4: Fill in keywords for each section

For each of the sections just put in a list of keywords or topics you want to cover. This will help you to write the full text.

Turning a nested list into a tree navigation

See the demo, download the code

working demo, code on github

Considering the audience

who needs tree navigations? where are they used?

How do tree navigations work?

How does a tree navigation work? What features are common? How to allow expanding a sub-branch and keep a link to a landing page?

Allowing for styling

keep look and feel away from the script, write a clean css with background images.

Accessibility concerns

Consider keyboard access. cursor keys, tabbing not from link to link but section to section and enter to expand.

Start with the minimal markup

clean HTML, simple CSS handles, not a class per item

Add styling

show the style, explain how to alter it – show a few options

The dynamic CSS class switch

the trick to add a class to a parent element. allows for styles for the dynamic and non-dynamic version. Also prevents the need for looping

Add the script

Performance tricks, safe checking for elements, structure of the script

Event delegation vs. Event handling

One event is enough. Explain why – the menu will change as it will be maintained elsewhere.

Adding a configuration file

Take all the strings, colours and parameters and add it to a configuration file – stops people from messing with your code.

Other options to consider

Dynamic loading of child branches.

See it in action

Show again where it is and if it was used in live sites

Contact and comment options

Tell me where and how to fix things

Step 5: Write the full text for each section.

As said before you can do that in succession or part by part. I find myself filling in different sections at different times. Mostly I get out the laptop on the train and fill in a quick section I know very well on a short ride. That means it is out of my way.

Step 6: Add fillers from section to section

I then add a sentence after each section that sums up what we achieved and what we will do next. This is not really needed but great for reading flow.

Step 7: Read the lot and delete what can be deleted

The last step is to read the whole text (probably printed out as you find more mistakes that way) and see how it flows. Alter as needed and remove all the things that seemed a great idea at the first time of writing but seem superfluous now. People are busy.

Step 8: Put it live and wait for the chanting groupies

Find a place to put the article, convert it to the right format, check all the links and images and you are ready to go.

More, please, more!

More tips on the style of the article itself are also listed in the Write great posts and articles chapter of the developer evangelism handbook.

Look back at 2009 and resolutions for 2010

Friday, January 1st, 2010

Well it is the beginning of 2010 so time to talk about some resolutions.

  • As I am planning to get a lighter laptop to lug around my main resolution will go down from 1400×900 to 1280×800 – unless Apple will change that for the new 13”.
  • My place is a mess – the reason is that I am never here. So my resolution for this year is:

** De-clutter my flat

** Throw away (or bring to the charity shop or give to @pekingspring to sell at the Amnesty International boot sales) stuff I don’t need – and that is a lot. I have unpacked moving crates from my flat back in Germany (1999).

** Use my flat more (I lived here for 4 years and yet have to use the oven!)

** Work more from home

  • Check my financial situation and what is working for me – I have money and I have a few insurances and private pension plans but I got no clue what they are any more. Also, I have quite a few shares and I am totally oblivious to what they are worth and what I can do with them.
  • Write another book
  • Organize (or help organize) another small conference
  • Concentrate more on the topics of security and performance.
  • Do something awesome for my parents as they need to get out of the house and have trouble traveling.
  • Pick my battles instead of trying to change things that should have changed years ago but are hindered by complacency.
  • I will go to Iceland this year – I have always wanted to and in 10 years in England I never managed to.
  • I will also try to get to New Zealand and South Africa this year.

Interestingly enough 2009 was a very cool year for me and if I had had any resolutions last year these were the ones I was able to fulfill:

  • I spoke a lot at conferences and released a lot of articles, blog posts, code and tutorials.
  • I’ve reached out to audiences I hadn’t tapped beforehand (domainer conference, design agencies and museum brownbags…)
  • I’ve tech-edited a book and a few chapters and I was judge at some competiions – I like doing that a lot.
  • I’ve lost a lot of weight by going to the gym a lot – there’s even the outline of a 4-pack visible (yes I know there is something wrong with this).
  • I’ve been to Japan – something I always wanted to do (shame I was sick when I was there). I’ve also been to Australia which is something I always wanted to.

I think this is about it – I am back to bed now.