Christian Heilmann

Posts Tagged ‘php’

Building an advent calendar for Mozilla in PHP/JS/CSS (part 1)

Tuesday, November 29th, 2011
Note: the server is defunct at the moment but you can get the source code of the calendar on GitHub: code and zip

Yesterday morning I got inspired to have a go at building an advent calendar for the Mozilla Developer Network where we are going to have a daily link for you. Here’s my work log on how I did it. Maybe it is inspirational for you, maybe not. Let’s just have a go.

Step 1: Distributing the 24 “doors” on a massive logo

This was simple. I put the logo as a background into an element with the right dimensions and positioned that one relative. I then wrote a wee script to position 24 list items with the correct links randomly on the “canvas”:

<?php 
echo '<ol>';
$width = 700;
$height = 800;
for ($i=0;$i<24;$i++) {
  $x = rand(10,$width-10);
  $y = rand(10,$width-10);
  echo '<li style="left:'.$x.'px;top:'.$y.'px">'.
          '<a href="index.php?day='.($i+1).'">'.($i+1).'</a>'.
       '</li>';
}
echo '</ol>';
?>

I gave the list items a size and positioned them absolutely and then looked at the results. I reloaded until not many overlapped, moved those by hand, grabbed the innerHTML of the list in Firebug and I had my awesome calendar layout. :)

See it here: http://isithackday.com/calendar-tutorial/random.php

Step 2: Build the API

The next step was to build the API to display the when you click on one of the links. This I build as a dummy, too. It is pretty simple, really.

<?php
  $data = '';
  $day = 0;
  $day = +$_GET['day'];
  if ($day > 24) { $day = 24; }
 
  if ($day) {
    $data .= '<h1><a href="#">Title '.$day.'</a></h1>'.
             '<p>Description</p>'.
             '<p>See it in action <a href="#">here</a></p>';
  }
?>

I cast the day URL parameter to an integer and make sure that it can’t be more than 24. If there was a valid number sent in, I assemble a string of HTML with the output data to show for the day. As $day is preset as 0 and will be 0 when the data sent in is not a valid number, $data will be empty.

In the page I then create an element and conditionally write the content out:

All I need to make this work as an advent calendar later on is add a date validation in there.

<?php 
if (+date('m') < 12 || $day > +date('d')) {
  $day = 0;
}
?>

For now, I commented that out, though.

See it in action here: http://isithackday.com/calendar-tutorial/dummyapi.php

Step 3: Ajaxify

This all works now, but to make it slicker, let’s load the content via Ajax when possible. The first step there was to define the article element as the output container and use event delegation to find out which list element was clicked:

var list = document.querySelector('ol'),
    output = document.querySelector('article');
 
list.addEventListener('click', function(ev) {
  var t = ev.target;
  if (t.tagName ==='A') {
    load(+t.innerHTML);
  }
  ev.preventDefault();
}, false);
 
function load(day) {
  alert(day);
}

I then moved the dummy api out to an own file (simpleapi) and added a check for another parameter:

<?php
  $data = '';
  $day = 0;
  $day = +$_GET['day'];
  if ($day > +date('d')) {
    $day = 0;
  }
  if ($day > 24) { $day = 24; }
 
  if ($day) {
    $data .= '<h1><a href="#">Title '.$day.'</a></h1>'.
             '<p>Description</p>'.
             '<p>See it in action <a href="#">here</a></p>';
  }
  if (isset($_GET['ajax'])) { echo $data; }
?>

That way I only had to do a classic Ajax call adding a new parameter to the url to trigger the echo in the PHP:

function load(day) {
  var httpstats = /200|304/;
  var request = new XMLHttpRequest();
  request.onreadystatechange = function() {
    if (request.readyState == 4) {
      if (request.status && httpstats.test(request.status)) {
        output.innerHTML = request.responseText;
      } else {
        document.location = 'index.php?day=' + day;
      }
    }
  };
  request.open('get', 'simpleapi.php?ajax=1&day='+day, true);
  request.setRequestHeader('If-Modified-Since',
                           'Wed, 05 Apr 2006 00:00:00 GMT');
  request.send(null);  
}

If you wonder about the date in the request header, yes I did indeed rip my own code from my 2006 Ajax book. :)

http://isithackday.com/calendar-tutorial/ajax.php

That’s all for today

That is the main functionality, really. In part 2 I will then show you how to pull the real data off the web and make this look much sexier, as shown here:

A research interface for the social web – fork it now and find what people are talking about

Wednesday, September 22nd, 2010

Researching something on the web can be pretty annoying. Search engines get better every year, but there is a whole world of social sites that are not indexed. For example if I search for a nice photo of a red panda I use Google image search. If I want to use this photo later on I am better off using Flickr or Picasa and see what license the photo is.

Yahoo’s researchers had the same problem which is why they assembled all the social updates in one XML feed – the Yahoo! Firehose. This, in contrast to other Yahoo APIs also comes with commercial terms and conditions and is available through YQL. In terms of data, the Firehose aggregates a lot of different sources:

Yahoo! 360, AOL, Bebo, Blogger, Bloglines, Digg, Diigo, Goodreads, Google, Google Reader, Last.fm, Ma.gnolia, Movable Type, Netflix, Pandora, Picasa, Pownce, Seesmic, Slideshare, SmugMug, StumbleUpon, ThisNext, TravelPod, Tumblr, Twitter, TypePad, Vimeo, Vox, Webshots, Xanga, Yelp, YouTube, Zooomr, Yahoo! Avatars, Yahoo! Buzz, Yahoo! Profiles, Wisteria, Yahoo! Answers, Yahoo! Shopping, Yahoo! Autos, Bix for Yahoo!, Yahoo! Bookmarks, Yahoo! Briefcase, Yahoo! Calendar, Yahoo! Classifieds, Delicious, Yahoo! Family, Yahoo! Sports, Yahoo! Finance, Flickr, Yahoo! Food, Yahoo! Games, Yahoo! Geocities, Yahoo! Green, Yahoo! Greetings, Yahoo! Groups, Yahoo! Health, Yahoo! Hotjobs, Yahoo! Kids, Yahoo! Local, Yahoo! Movies, Yahoo! Music, MyBlogLog, Yahoo! News, OMG! from Yahoo!, Yahoo! Personals, Yahoo! Pets, Yahoo! Status Updates, Yahoo! Guestbook Comments, SearchMonkey from Yahoo!, Yahoo! Shopping, Yahoo! Sports, Yahoo! Tech, Yahoo! Travel, Yahoo! TV, Yahoo! Video.

You can do the data junkie part and use it in the YQL console:

This can be annoying though, especially as you cannot see the photos and videos. This is why I put together a research interface on top of the Yahoo Firehose:

You can see the research interface in action here but more importantly, the source code of the interface is available on GitHub which means that you can host it yourself – for example behind a firewall or make it part of your Intranet.

For a local install you need to sign up for a developer key, edit the keys.php file, put all the files up on your PHP enabled server and you are done. If you get stuck you can get help on the YDN Forums.

Notice that I am keeping the state of your last search by storing it in local storage when your browser supports it – this can be useful for larger searches.

The Hackday Toolbox – getting you started faster

Thursday, July 29th, 2010

Just having spent a lot of time at the amazing open hack day in Bangalore, India I found that most of the questions about starting a hack using Yahoo technology revolved around a few issues:

  • How do I access data on the web/from web services?
  • How do I use YQL from JavaScript or PHP?
  • How do I display information I received from YQL with PHP or JavaScript?
  • How do I get the location of the user and how do I analyse content for geographical locations?
  • How do I access oAuth authenticated information of Yahoo?
  • How do I set up PHP and where can I see errors?

So, to avoid having to repeat myself again I put together The Hackday Toolbox which contains sample code that deals with all these issues.

The Hackday Toolbox

The hackday toolbox contains:

  • An introduction to installing and using PHP with MAMP/XAMPP and debugging it
  • YQLGeo for all your geo and location needs
  • Demos of querying YQL in JavaScript, YUI3 and PHP
  • Demos to display YQL data
  • Authenticated example to access the Yahoo Firehose
  • Rendering Yahoo Geoplanet data as a map

You can download the Hackday Toolbox on GitHub or try the examples.

The toolbox is BSD licensed, so if you want to add Java/Ruby/Python/Heskell/Pascal/Logo/Fortran/6502 Assembly code examples, please do so.

I put my hack in a box…

Loading external content with Ajax using jQuery and YQL

Sunday, January 10th, 2010

Let’s solve the problem of loading external content (on other domains) with Ajax in jQuery. All the code you see here is available on GitHub and can be seen on this demo page so no need to copy and paste!

OK, Ajax with jQuery is very easy to do – like most solutions it is a few lines:

$(document).ready(function(){
  $('.ajaxtrigger').click(function(){
    $('#target').load('ajaxcontent.html');
  });
});

Check out this simple and obtrusive Ajax demo to see what it does.

This will turn all elements with the class of ajaxtrigger into triggers to load “ajaxcontent.html” and display its contents in the element with the ID target.

This is terrible, as it most of the time means that people will use pointless links like “click me” with # as the href, but this is not the problem for today. I am working on a larger article with all the goodies about Ajax usability and accessibility.

However, to make this more re-usable we could do the following:

$(document).ready(function(){
  $('.ajaxtrigger').click(function(){
    $('#target').load($(this).attr('href'));
    return false;
  });
});

You can then use load some content to load the content and you make the whole thing re-usable.

Check out this more reusable Ajax demo to see what it does.

The issue I wanted to find a nice solution for is the one that happens when you click on the second link in the demo: loading external files fails as Ajax doesn’t allow for cross-domain loading of content. This means that see my portfolio will fail to load the Ajax content and fail silently at that. You can click the link until you are blue in the face but nothing happens. A dirty hack to avoid this is just allowing the browser to load the document if somebody really tries to load an external link.

Check out this allowing external links to be followed to see what it does.

$(document).ready(function(){
  $('.ajaxtrigger').click(function(){
    var url = $(this).attr('href');
    if(url.match('^http')){
      return true;
    } else {
      $('#target').load(url);
      return false;
    }
  });
});

Proxying with PHP

If you look around the web you will find the solution in most of the cases to be PHP proxy scripts (or any other language). Something using cURL could be for example proxy.php:

<?php
$url = $_GET['url'];
$ch = curl_init(); 
curl_setopt($ch, CURLOPT_URL, $url); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
$output = curl_exec($ch); 
curl_close($ch);
echo $content;
?>

People then could use this with a slightly changed script (using a proxy):

$(document).ready(function(){
  $('.ajaxtrigger').click(function(){
    var url = $(this).attr('href');
    if(url.match('^http')){
      url = 'proxy.php?url=' + url;
    }
    $('#target').load(url);
    return false;
  });
});

It is also a spectacularly stupid idea to have a proxy script like that. The reason is that without filtering people can use this to load any document of your server and display it in the page (simply use firebug to rename the link to show anything on your server), they can use it to inject a mass-mailer script into your document or simply use this to redirect to any other web resource and make it look like your server was the one that sent it. It is spammer’s heaven.

Use a white-listing and filtering proxy!

So if you want to use a proxy, make sure to white-list the allowed URIs. Furthermore it is a good plan to get rid of everything but the body of the other HTML document. Another good idea is to filter out scripts. This prevents display glitches and scripts you don’t want executed on your site to get executed.

Something like this:

<?php
$url = $_GET['url'];
$allowedurls = array(
  'http://developer.yahoo.com',
  'http://icant.co.uk'
);
if(in_array($url,$allowedurls)){
  $ch = curl_init(); 
  curl_setopt($ch, CURLOPT_URL, $url); 
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
  $output = curl_exec($ch); 
  curl_close($ch);
  $content = preg_replace('/.*<body[^>]*>/msi','',$output);
  $content = preg_replace('/<\/body>.*/msi','',$content);
  $content = preg_replace('/<?\/body[^>]*>/msi','',$content);
  $content = preg_replace('/[\r|\n]+/msi','',$content);
  $content = preg_replace('/<--[\S\s]*?-->/msi','',$content);
  $content = preg_replace('/<noscript[^>]*>[\S\s]*?'.
                          '<\/noscript>/msi',
                          '',$content);
  $content = preg_replace('/<script[^>]*>[\S\s]*?<\/script>/msi',
                          '',$content);
  $content = preg_replace('/<script.*\/>/msi','',$content);
  echo $content;
} else {
  echo 'Error: URL not allowed to load here.';
}
?>

Pure JavaScript solution using YQL

But what if you have no server access or you want to stay in JavaScript? Not to worry – it can be done. YQL allows you to load any HTML document and get it back in JSON. As jQuery has a nice interface to load JSON, this can be used together to achieve what we want to.

Getting HTML from YQL is as easy as using:

select * from html where url="http://icant.co.uk"

YQL does a few things extra for us:

  • It loads the HTML document and sanitizes it
  • It runs the HTML document through HTML Tidy to remove things .NETnasty frameworks considered markup.
  • It caches the HTML for a while
  • It only returns the body content of the HTML - so no styling (other than inline styles) will get through.

As output formats you can choose XML or JSON. If you define a callback parameter for JSON you get JSON-P with all the HTML as a JavaScript Object – not fun to re-assemble:

foo({
  "query":{
  "count":"1",
  "created":"2010-01-10T07:51:43Z",
  "lang":"en-US",
  "updated":"2010-01-10T07:51:43Z",
  "uri":"http://query.yahoo[...whatever...]k%22",
  "results":{
    "body":{
      "div":{
        "id":"doc2",
        "div":[{"id":"hd",
          "h1":"icant.co.uk - everything Christian Heilmann"
        },
        {"id":"bd",
        "div":[
        {"div":[{"h2":"About this and me","
        [... and so on...]
}}}}}}}});

When you define a callback with the XML output you get a function call with the HTML data as string in an Array – much easier:

foo({
  "query":{
  "count":"1",
  "created":"2010-01-10T07:47:40Z",
  "lang":"en-US",
  "updated":"2010-01-10T07:47:40Z",
  "uri":"http://query.y[...who cares...]%22"},
  "results":[
    "<body>\n    <div id=\"doc2\">\n<div id=\"hd\">\n 
     <h1>icant.co.uk - \n
     everything Christian Heilmann<\/h1>\n 
      ... and so on ..."
  ]
});

Using jQuery’s getJSON() method and accessing the YQL endpoint this is easy to implement:

$.getJSON("http://query.yahooapis.com/v1/public/yql?"+
          "q=select%20*%20from%20html%20where%20url%3D%22"+
          encodeURIComponent(url)+
          "%22&format=xml'&callback=?",
  function(data){
    if(data.results[0]){
      var data = filterData(data.results[0]);
      container.html(data);
    } else {
      var errormsg = '<p>Error: can't load the page.</p>';
      container.html(errormsg);
    }
  }
);

Putting it all together you have a cross-domain Ajax solution with jQuery and YQL:

$(document).ready(function(){
  var container = $('#target');
  $('.ajaxtrigger').click(function(){
    doAjax($(this).attr('href'));
    return false;
  });
  function doAjax(url){
    // if it is an external URI
    if(url.match('^http')){
      // call YQL
      $.getJSON("http://query.yahooapis.com/v1/public/yql?"+
                "q=select%20*%20from%20html%20where%20url%3D%22"+
                encodeURIComponent(url)+
                "%22&format=xml'&callback=?",
        // this function gets the data from the successful 
        // JSON-P call
        function(data){
          // if there is data, filter it and render it out
          if(data.results[0]){
            var data = filterData(data.results[0]);
            container.html(data);
          // otherwise tell the world that something went wrong
          } else {
            var errormsg = '<p>Error: can't load the page.</p>';
            container.html(errormsg);
          }
        }
      );
    // if it is not an external URI, use Ajax load()
    } else {
      $('#target').load(url);
    }
  }
  // filter out some nasties
  function filterData(data){
    data = data.replace(/<?\/body[^>]*>/g,'');
    data = data.replace(/[\r|\n]+/g,'');
    data = data.replace(/<--[\S\s]*?-->/g,'');
    data = data.replace(/<noscript[^>]*>[\S\s]*?<\/noscript>/g,'');
    data = data.replace(/<script[^>]*>[\S\s]*?<\/script>/g,'');
    data = data.replace(/<script.*\/>/,'');
    return data;
  }
});

This is rough and ready of course. A real Ajax solution should also consider timeout and not found scenarios. Check out the full version with loading indicators, error handling and yellow fade for inspiration.

The Table of Contents script – my old nemesis

Wednesday, January 6th, 2010

One thing I like about – let me rephrase that – one of the amazingly few things that I like about Microsoft Word is that you can generate a Table of Contents from a document. Word would go through the headings and create a nested TOC from them for you:

Adding a TOC to a Word Document

Microsoft Word generated Table of Contents.

Now, I always like to do that for documents I write in HTML, too, but maintaining them by hand is a pain. I normally write my document outline first:

<h1 id="cute">Cute things on the Interwebs</h1>
<h2 id="rabbits">Rabbits</h2>
<h2 id="puppies">Puppies</h2>
<h3 id="labs">Labradors</h3>
<h3 id="alsatians">Alsatians</h3>
<h3 id="corgies">Corgies</h3>
<h3 id="retrievers">Retrievers</h3>
<h2 id="kittens">Kittens</h2>
<h2 id="gerbils">Gerbils</h2>
<h2 id="ducklings">Ducklings</h2>

I then collect those, copy and paste them and use search and replace to turn all the hn to links and the IDs to fragment identifiers:

<li><a href="#cute">Cute things on the Interwebs</a></li>
<li><a href="#rabbits">Rabbits</a></li>
<li><a href="#puppies">Puppies</a></li>
<li><a href="#labs">Labradors</a></li>
<li><a href="#alsatians">Alsatians</a></li>
<li><a href="#corgies">Corgies</a></li>
<li><a href="#retrievers">Retrievers</a></li>
<li><a href="#kittens">Kittens</a></li>
<li><a href="#gerbils">Gerbils</a></li>
<li><a href="#ducklings">Ducklings</a></li>
 
<h1 id="cute">Cute things on the Interwebs</h1>
<h2 id="rabbits">Rabbits</h2>
<h2 id="puppies">Puppies</h2>
<h3 id="labs">Labradors</h3>
<h3 id="alsatians">Alsatians</h3>
<h3 id="corgies">Corgies</h3>
<h3 id="retrievers">Retrievers</h3>
<h2 id="kittens">Kittens</h2>
<h2 id="gerbils">Gerbils</h2>
<h2 id="ducklings">Ducklings</h2>

Then I need to look at the weight and order of the headings and add the nesting of the TOC list accordingly.

<ul>
  <li><a href="#cute">Cute things on the Interwebs</a>
    <ul>
      <li><a href="#rabbits">Rabbits</a></li>
      <li><a href="#puppies">Puppies</a>
        <ul>
          <li><a href="#labs">Labradors</a></li>
          <li><a href="#alsatians">Alsatians</a></li>
          <li><a href="#corgies">Corgies</a></li>
          <li><a href="#retrievers">Retrievers</a></li>
        </ul>
      </li>
      <li><a href="#kittens">Kittens</a></li>
      <li><a href="#gerbils">Gerbils</a></li>
      <li><a href="#ducklings">Ducklings</a></li>
    </ul>
  </li>
</ul>
 
<h1 id="cute">Cute things on the Interwebs</h1>
<h2 id="rabbits">Rabbits</h2>
<h2 id="puppies">Puppies</h2>
<h3 id="labs">Labradors</h3>
<h3 id="alsatians">Alsatians</h3>
<h3 id="corgies">Corgies</h3>
<h3 id="retrievers">Retrievers</h3>
<h2 id="kittens">Kittens</h2>
<h2 id="gerbils">Gerbils</h2>
<h2 id="ducklings">Ducklings</h2>

Now, wouldn’t it be nice to have that done automatically for me? The way to do that in JavaScript and DOM is actually a much trickier problem than it looks like at first sight (I always love to ask this as an interview question or in DOM scripting workshops).

Here are some of the issues to consider:

  • You can easily get elements with getElementsByTagName() but you can’t do a getElementsByTagName('h*') sadly enough.
  • Headings in XHTML and HTML 4 do not have the elements they apply to as child elements (XHTML2 was proposing that and HTML5 has it to a degree – Bruce Lawson write a nice post about this and there’s also a pretty nifty HTML5 outliner available).
  • You can do a getElementsByTagName() for each of the heading levels and then concatenate a collection of all of them. However, that does not give you their order in the source of the document.
  • To this end PPK wrote an infamous TOC script used on his site a long time ago using his getElementsByTagNames() function which works with things not every browser supports. Therefore it doesn’t quite do the job either. He also “cheats” at the assembly of the TOC list as he adds classes to indent them visually rather than really nesting lists.
  • It seems that the only way to achieve this for all the browsers using the DOM is painful: do a getElementsByTagName('*') and walk the whole DOM tree, comparing nodeName and getting the headings that way.
  • Another solution I thought of reads the innerHTML of the document body and then uses regular expressions to match the headings.
  • As you cannot assume that every heading has an ID we need to add one if needed.

So here are some solutions to that problem:

Using the DOM:

(function(){
  var headings = [];
  var herxp = /h\d/i;
  var count = 0;
  var elms = document.getElementsByTagName('*');
  for(var i=0,j=elms.length;i<j;i++){
    var cur = elms[i];
    var id = cur.id;
    if(herxp.test(cur.nodeName)){
      if(cur.id===''){
        id = 'head'+count;
        cur.id = id;
        count++;
      }
      headings.push(cur);
    }
  }
  var out = '<ul>';
  for(i=0,j=headings.length;i<j;i++){
    var weight = headings[i].nodeName.substr(1,1);
    if(weight > oldweight){
      out += '<ul>'; 
    }
    out += '<li><a href="#'+headings[i].id+'">'+
           headings[i].innerHTML+'</a>';
    if(headings[i+1]){
      var nextweight = headings[i+1].nodeName.substr(1,1);
      if(weight > nextweight){
        out+='</li></ul></li>'; 
      }
      if(weight == nextweight){
        out+='</li>'; 
      }
    }
    var oldweight = weight;
  }
  out += '</li></ul>';
  document.getElementById('toc').innerHTML = out;
})();

You can see the DOM solution in action here. The problem with it is that it can become very slow on large documents and in MSIE6.

The regular expressions solution

To work around the need to traverse the whole DOM, I thought it might be a good idea to use regular expressions on the innerHTML of the DOM and write it back once I added the IDs and assembled the TOC:

(function(){
  var bd = document.body,
      x = bd.innerHTML,
      headings = x.match(/<h\d[^>]*>[\S\s]*?<\/h\d>$/mg),
      r1 = />/,
      r2 = /<(\/)?h(\d)/g,
      toc = document.createElement('div'),
      out = '<ul>',
      i = 0,
      j = headings.length,
      cur = '',
      weight = 0,
      nextweight = 0,
      oldweight = 2,
      container = bd;
  for(i=0;i<j;i++){
    if(headings[i].indexOf('id=')==-1){
      cur = headings[i].replace(r1,' id="h'+i+'">');
      x = x.replace(headings[i],cur);
    } else {
      cur = headings[i];
    }
    weight = cur.substr(2,1);
    if(i<j-1){
      nextweight = headings[i+1].substr(2,1);
    }
    var a = cur.replace(r2,'<$1a');
    a = a.replace('id="','href="#');
    if(weight>oldweight){ out+='<ul>'; }
    out+='<li>'+a;
    if(nextweight<weight){ out+='</li></ul></li>'; }
    if(nextweight==weight){ out+='</li>'; }
    oldweight = weight;
  }
  bd.innerHTML = x;
  toc.innerHTML = out +'</li></ul>';
  container = document.getElementById('toc') || bd;
  container.appendChild(toc);
})();

You can see the regular expressions solution in action here. The problem with it is that reading innerHTML and then writing it out might be expensive (this needs testing) and if you have event handling attached to elements it might leak memory as my colleage Matt Jones pointed out (again, this needs testing). Ara Pehlivavian also mentioned that a mix of both approaches might be better – match the headings but don’t write back the innerHTML – instead use DOM to add the IDs.

Libraries to the rescue – a YUI3 example

Talking to another colleague – Dav Glass – about the TOC problem he pointed out that the YUI3 selector engine happily takes a list of elements and returns them in the right order. This makes things very easy:

<script type="text/javascript" src="http://yui.yahooapis.com/3.0.0/build/yui/yui-min.js"></script>
<script>
YUI({combine: true, timeout: 10000}).use("node", function(Y) {
  var nodes = Y.all('h1,h2,h3,h4,h5,h6');
  var out = '<ul>';
  var weight = 0,nextweight = 0,oldweight;
  nodes.each(function(o,k){
    var id = o.get('id');
    if(id === ''){
      id = 'head' + k;
      o.set('id',id);
    };
    weight = o.get('nodeName').substr(1,1);
    if(weight > oldweight){ out+='<ul>'; }
    out+='<li><a href="#'+o.get('id')+'">'+o.get('innerHTML')+'</a>';
    if(nodes.item(k+1)){
      nextweight = nodes.item(k+1).get('nodeName').substr(1,1);
      if(weight > nextweight){ out+='</li></ul></li>'; }
      if(weight == nextweight){ out+='</li>'; }
    }
    oldweight = weight;
  });
  out+='</li></ul>';
  Y.one('#toc').set('innerHTML',out);
});</script>

There is probably a cleaner way to assemble the TOC list.

Performance considerations

There is more to life than simply increasing its speed. – Gandhi

Some of the code above can be very slow. That said, whenever we talk about performance and JavaScript, it is important to consider the context of the implementation: a table of contents script would normally be used on a text-heavy, but simple, document. There is no point in measuring and judging these scripts running them over gmail or the Yahoo homepage. That said, faster and less memory consuming is always better, but I am always a bit sceptic about performance tests that consider edge cases rather than the one the solution was meant to be applied to.

Moving server side.

The other thing I am getting more and more sceptic about are client side solutions for things that actually also make sense on the server. Therefore I thought I could use the regular expressions approach above and move it server side.

The first version is a PHP script you can loop an HTML document through. You can see the outcome of tocit.php here:

<?php
$file = $_GET['file'];
if(preg_match('/^[a-z0-9\-_\.]+$/i',$file)){
$content = file_get_contents($file);
preg_match_all("/<h([1-6])[^>]*>.*<\/h.>/Us",$content,$headlines);
$out = '<ul>';
foreach($headlines[0] as $k=>$h){
 if(strstr($h,'id')===false){
   $x = preg_replace('/>/',' id="head'.$k.'">',$h,1);
   $content = str_replace($h,$x,$content);
   $h = $x;
 };
 $link = preg_replace('/<(\/)?h\d/','<$1a',$h);
 $link = str_replace('id="','href="#',$link);
 if($k>0 && $headlines[1][$k-1]<$headlines[1][$k]){
   $out.='<ul>';
 }
 $out .= '<li>'.$link.'';
 if($headlines[1][$k+1] && $headlines[1][$k+1]<$headlines[1][$k]){
   $out.='</li></ul></li>';
 }
 if($headlines[1][$k+1] && $headlines[1][$k+1] == $headlines[1][$k]){
   $out.='</li>';
 }
}
$out.='</li></ul>';
echo str_replace('<div id="toc"></div>',$out,$content);
}else{
  die('only files like text.html please!');
}
?>

This is nice, but instead of having another file to loop through, we can also use the output buffer of PHP:

<?php
function tocit($content){
  preg_match_all("/<h([1-6])[^>]*>.*<\/h.>/Us",$content,$headlines);
  $out = '<ul>';
  foreach($headlines[0] as $k=>$h){
   if(strstr($h,'id')===false){
     $x = preg_replace('/>/',' id="head'.$k.'">',$h,1);
     $content = str_replace($h,$x,$content);
     $h = $x;
   };
   $link = preg_replace('/<(\/)?h\d/','<$1a',$h);
   $link = str_replace('id="','href="#',$link);
   if($k>0 && $headlines[1][$k-1]<$headlines[1][$k]){
     $out.='<ul>';
   }
   $out .= '<li>'.$link.'';
   if($headlines[1][$k+1] && $headlines[1][$k+1]<$headlines[1][$k]){
     $out.='</li></ul></li>';
   }
   if($headlines[1][$k+1] && $headlines[1][$k+1] == $headlines[1][$k]){
     $out.='</li>';
   }
  }
  $out.='</li></ul>';
  return str_replace('<div id="toc"></div>',$out,$content);
}
ob_start("tocit");
?>
[... the document ...]
<?php ob_end_flush();?>

The server side solutions have a few benefits: they always work, and you can also cache the result if needed for a while. I am sure the PHP can be sped up, though.

See all the solutions and get the source code

I showed you mine, now show me yours!

All of these solutions are pretty much rough and ready. What do you think how they can be improved? How about doing a version for different libraries? Go ahead, fork the project on GitHub and show me what you can do.