Christian Heilmann

Lecturing at MIT about HTML5 Video – video, slides and (lots of) notes

Sunday, January 16th, 2011 at 12:56 am

I am currently in Boston and went to MIT to give a guest lecture on HTML5 multimedia as part of a lecture series by Mozilla leading up to a game competition. Here are the slides and some notes on what I talked about.

You can see “Multimedia on the web” on any HTML5 enabled device here, or embedded here:

See the slides on Slideshare

Here are the notes of what I talked about:

Quick history of Multimedia on the web

It is sometimes a good idea to think back about what we did before we continue to move on. That gives us a chance to avoid repeating mistakes and to remember solutions to issues we face once more.

In the beginning, we had images as the only multimedia elements in browsers. As the connection speeds were bad we had to be very careful in what formats to use:

  • JPEG images have millions of colours but are lossy when you make them smaller in file size. You can see those “artifacts” when people save text screenshots as JPEG - the text looks fuzzy and is hard to read.
  • GIF images had a fixed number of 256 colours, had transparent pixels and could be animated images. Saving photos as GIF resulted in dithering to simulate more colours and larger files.
  • PNG was the first open format and a good middle ground – no animation though and IE doesn’t get the transparency right
  • WBMP was a special 2 colour format for mobile devices in WML pages

As the connection speeds were bad we came up with some very clever image formats and browser attributes:

  • Progressive JPG shows a highly artifacted and small (in filesize) version of the image first and increases its quality during download
  • Interlaced GIF loaded the GIF line by line – first lines 1, 3, 5, 7 and so on, and then 2, 4, 6 and so on – that way you saw what the image was about whilst it was loading
  • Netscape had a lowsrc attribute that allowed you to define a smaller image to show first which gradually got covered up by the full fat image

We animated with GIFs and later on with JavaScript animations – the latter giving us control over the animation but also requiring JavaScript, which was not always available or turned on.

Audio on web sites was mostly MIDI background sounds – and let’s thank our lucky stars that this is over.

Next we used Java Applets for high fidelity animations and they gave a lot of web sites the charm of things found in the bargain bin at Hallmark.

Then RealPlayer came around. Although it is no longer on anyone’s “must have” list, RealPlayer was awesome for its time: SMIL support, great streaming ability, support on all platforms and very good compression. A shame that both the player and the encoders were closed and you had to pay for everything.

Other players that allowed video embedding were Quicktime, Windows Media Player, Shockwave and even Adobe Acrobat had some image conversion options. Other plugins like iPix allowed for zooming into images and VRML was the idea of having 3D worlds inside the browser.

All of them had the same problems though – they were plugins and thus alien to the browser. You needed to constantly upgrade them, they made the browser less stable and there was no interaction with the rest of the page.

Another issue is security – a lot of security exploits of browsers actually work by attacking plugins, as they have deeper access into the operating system than JavaScript itself.

After a while, Flash came out as the main plugin people would use for multimedia on the web. It came with great abilities, a good set of editors and libraries and allowed you to show and – more importantly – protect your videos from being downloaded. Right now DRM will be one of the main reasons to use Flash movies instead of going native with HTML5.

Annoyances with Flash

The first thing I found lately with Flash is bad performance. On my MacBook Air, watching a few YouTube videos triggers the processor fan – something HTML5 video doesn’t do. The reason is that Flash is a one-size-fits-all solution for audio, video, animation and lately also 3D animation.

Whilst AIR and Flex allow you to build movies with external editors, you mostly still use Flash Builder to create your files instead of the editor you use to build the rest of the page.

No matter how cool Flash is, it is still an alien, black box in the browser and not available to manipulate from the outside. This means first of all bad accessibility:

  • You can’t reach Flash videos with your keyboard in browsers other than IE – regardless of keyboard controls inside the video
  • Keyboard access in the movies needs to be added by the developers.
  • Whilst audio and video can be powerful tools to aid people with learning difficulties it is tough to make them available to them because of the two points above

The way to make Flash talk to the rest of the page is to provide APIs to your Flash controls. YouTube for example has an API that allows you to write your own player controls that are keyboard accessible. I’ve used this API in the past to build Easy YouTube. Some things I could not do though – like providing subtitles and captions in an accessible manner.

HTML5 audio and video

And this is where HTML5 comes to the rescue – it comes with native audio and video elements that are not in a black box but part of the browser, and as available for you to manipulate as a block of text, an element or an image is. There are many upsides to native multimedia controls:

  • Better accessibility
  • Better performance – the elements do one thing well rather than being a catch-all solution
  • Much simpler API
  • Allows for styling and overlays – covering Flash movies with other elements in Internet Explorer was always a pain
  • View-source “hackable” – changes can be done in plain HTML rather than needing changes in another editor and compilation

Take Oprah’s new web site for example – hovering over any of the big pictures reveals a video. This is HTML5 video and thus easy to maintain in a CMS without the whole page being a Flash movie. It is also much higher quality and smaller file size than animated GIFs.

Painful stuff – codecs and conversion

Of course not all is sunshine, bunnies and daisies in HTML5 video and audio land, and the reason is copyright and intellectual property. An open system using closed video and audio encodings cannot work. Video and audio gets converted to smaller, stream-able formats before it goes on the web. There is no such thing as an “AVI file” or a “MOV file” – these are container formats with audio and video streams, each using different codecs to make them small. The issue is that these codecs are all copyrighted and you need to pay money to create files in these formats. Which is why we needed open formats for HTML5.

We now have H.264 or MP4 files that are free to use for now but might require licence payments later, we have Ogg Theora files that are fully open, and Google released VP8 and the WebM container to give the web a new, high quality and small filesize format. Our job now is to convert the proprietary format videos created by our digital cameras into these open formats. There are a lot of tools for that:

  • Audacity is a full-fledged audio editor that allows you to save OGG audio files.
  • WebM tools allow you to create files in WebM format.
  • Evom is another encoder.
  • VLC is not only an awesome video player but also allows recording and encoding.

  • Ogg convert, Firefogg and TinyOgg are all OGG converters.

  • Miro Video Converter for Mac is free and open source and easily converts videos to WebM.
  • MPEG StreamClip converts videos to MP4 for Mac and Windows.
  • ffmpeg is a command line tool that most of the above solutions use in one way or another.

A very simple way to convert a video to all the necessary formats is to host it on the Internet Archive (archive.org). If your video is licensed with Creative Commons, archive.org not only offers you full-length hosting (YouTube only allows 15–30 minutes) but also converts the video to OGG and MP4 automatically for you.


Embedding HTML5 video into your document is incredibly easy:
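A minimal embed, using a placeholder file name, looks like this (the text inside the element is the fallback):

```html
<video src="multimedia.webm">
  Your browser does not support HTML5 video.
</video>
```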

Anything inside the audio or video element will be displayed when the browser doesn’t support it. Normally you’d use this to offer a link to download the audio or video file, or a Flash fallback (if you really must).

Right now, that doesn’t do anything though. If you want to display a video that the user can control (jumping to another time, changing the volume, playing or pausing the video) all you need to do is add a controls attribute:
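With a hypothetical file name, the markup with native controls is just:

```html
<video src="multimedia.webm" controls>
  Your browser does not support HTML5 video.
</video>
```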

With that you have a video that can be controlled via mouse or keyboard – it is that easy. The controls look different from browser to browser and some have features others don’t. Safari for example is the only browser that allows the video to be full screen. Other browser makers decided against that option to prevent people from using video for phishing purposes.

In order to support all the capable browsers you need to offer the video in at least two formats:

Use MP4 as the first format to ensure that iOS devices play it. Make sure to add a type attribute to each source as otherwise browsers will load a small part of each file to determine the type.
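A sketch with placeholder file names, MP4 first and a type attribute on each source:

```html
<video controls>
  <source src="multimedia.mp4" type="video/mp4">
  <source src="multimedia.webm" type="video/webm">
  Your browser does not support HTML5 video.
</video>
```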

Additionally you can use the media attribute to serve different quality movies to different devices. The following example sends a high quality video to devices that are wider than 800 pixels and a lesser quality version to others (when using MP4):
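A sketch of that idea, with made-up file names, using a media query on the first source:

```html
<video controls>
  <source src="high.mp4" type="video/mp4" media="(min-device-width: 800px)">
  <source src="low.mp4" type="video/mp4">
</video>
```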

Other attributes you can use in the video and audio tag are:

  • poster – define a picture to show before loading.
  • height/width – dimensions of the video
  • loop – automatically restart the video or audio
  • preload (auto/none/metadata) – when set to auto the browser preloads the video/audio when possible (most mobile browsers will not, though, to save bandwidth). When set to metadata, only the data necessary to show the length of the media and set the right dimensions gets loaded.
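Putting several of these attributes together (the file names are placeholders):

```html
<video src="multimedia.webm" controls loop
       poster="preview.jpg" width="640" height="360"
       preload="metadata">
  Your browser does not support HTML5 video.
</video>
```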


As there are differences in the controls across browsers you might want to create your own player controls. This is pretty easy. You have a few methods to use:

  • load() – load new media.
  • canPlayType(type) – returns “probably”, “maybe” or “” (really!)
  • play() – play the movie.
  • pause() – pause the movie.
  • addTrack(label, kind, language) – for adding subtitle tracks
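For example, canPlayType() can pick a playable format before setting src – a browser-only sketch with assumed file names (the codecs string for WebM is the commonly used one):

```html
<script>
var video = document.createElement('video');
// canPlayType() returns 'probably', 'maybe' or the empty string
if (video.canPlayType('video/webm; codecs="vp8, vorbis"')) {
  video.src = 'multimedia.webm';
} else if (video.canPlayType('video/mp4')) {
  video.src = 'multimedia.mp4';
}
document.body.appendChild(video);
</script>
```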

You also have a few properties to control and read the state of different parts of the video/audio file:

  • Video details
    • width
    • height
    • videoWidth
    • videoHeight
    • poster
  • Controls
    • controls
    • volume
    • muted
  • Tracks
    • tracks
  • Network state
    • src
    • currentSrc
    • networkState
    • preload
    • buffered
  • Ready state
    • readyState
    • seeking
  • Playback state
    • currentTime
    • startTime
    • duration
    • paused
    • defaultPlaybackRate
    • playbackRate
    • played
    • seekable
    • ended
    • autoplay
    • loop

This is a lot to play with. For example to create a simple play button for an audio file you might have a button element in the page with the class play. To make it functional all you need to do is this:

var audio = document.getElementsByTagName('audio')[0];
var play = document.getElementsByClassName('play')[0];
play.addEventListener('click', function (e) {
  var t = e.target;
  if (audio.paused) {
    audio.play();
    t.innerHTML = 'pause';
  } else {
    audio.pause();
    t.innerHTML = 'play';
  }
}, false);

You check if audio.paused is true and call play() when it is, otherwise you call pause(). Change the innerHTML of the button at the same time and you are done.

However, simply checking the state of an object is not safe. It makes much more sense to actually listen to events fired by the audio/video object. And there are a lot of interesting events to listen to:

  • loadstart
  • progress
  • suspend
  • abort
  • error
  • emptied
  • stalled
  • play
  • pause
  • loadedmetadata
  • loadeddata
  • waiting
  • playing
  • canplay
  • canplaythrough
  • seeking
  • seeked
  • timeupdate
  • ended
  • ratechange

Using these events, the player button code changes slightly:

video.addEventListener('play', playEvent, false);
video.addEventListener('pause', pausedEvent, false);
video.addEventListener('ended', function () {
  pausedEvent();
}, false);
function playEvent() {
  play.innerHTML = 'pause';
}
function pausedEvent() {
  play.innerHTML = 'play';
}
play.onclick = function () {
  if (video.ended) { video.currentTime = 0; }
  if (video.paused) { video.play(); } else { video.pause(); }
};

No magic there.

If you want to see all the events and properties in action (and more importantly see if a browser supports them or not) check out the Media events demo page at the W3C web site.

One very good trick for syncing video and other effects is using the currentTime property. One example of that is the Spirit of Indiana demo where we sync a Google Maps animation with a video. The main gotcha is that the timeupdate event fires a lot and not necessarily every second. Therefore you need to throttle the changes to full seconds by using parseInt(). You can see this in action at the video demo:

(function () {
  var stage = document.getElementById('stage');
  var log = document.getElementById('logger');
  var seqlog = document.getElementById('loggersequences');
  var v = document.getElementsByTagName('video')[0];
  var but = document.createElement('button');
  but.innerHTML = 'Click to see Lindbergh\'s flight';
  but.addEventListener('click', function (e) {
    v.play();
  }, false);
  var now = 0;
  v.addEventListener('timeupdate', function (e) {
    log.innerHTML = v.currentTime;
    var full = parseInt(v.currentTime);
    if (full >= now) {
      seqlog.innerHTML = now;
      now = now + 1;
    }
  }, false);
  stage.appendChild(but);
})();


The real power of HTML5 and native video becomes obvious when you see how easy it is to transform the video element in the page. Here are some demos:

  • Transforming Video shows how simple it is to create a player that allows you to zoom inside a video and rotate it using CSS transformations
  • Paul Rouget’s round video demo shows how to create a play/pause button in SVG, greyscale a video and rotate it using CSS transformations
  • Paul’s Video Mashup demo shows how to apply SVG filters like Gaussian Blur, masking and how to apply CSS transforms to a running video

Simply check the source codes on these demos to see how they are done – the power of an open web.

Realtime changes

Adding filters and transforming the video container with CSS is one thing. Using Canvas and copying frames from the video into it you can do realtime changes to the video itself, which is really nice.

  • Remy Sharp’s round video demo shows how to create a rotating, bouncing circle that contains a video
  • Paul Rouget’s Green demo shows how to simulate a green screen effect with open web technologies. You copy each frame over to a canvas and read out the RGB value of each pixel. When it is green, just set its opacity to allow the picture behind the video to shine through.
  • The dynamic content injection by Paul shows how to analyse a video and find white parts to use as corner points for an embedded canvas. You can then inject other videos, canvas animations or images into the other video.
  • The Tracker demo shows how to detect human shapes and movement in a video
  • The edge detection demo shows how to detect edges in a live video using JavaScript

Awesome audio stuff

As you can see, there are a lot of cool realtime things you can do with video. Audio is less sexy somehow, as you can play it and maybe play it backwards (which only works in Safari at the moment). Wouldn’t it be cool to read out the audio data or dynamically generate sounds, though?

There is a Firefox 4 API that allows you to do that. Read the introduction to the API or check the Audio API Wiki to get to know it.

One cool demo using this API is the HTML5 Guitar Tab generator and there is already a library out there called sfxr that allows you to create audio on the fly.

