Lynx would not be impressed – on semantics and HTML

Wednesday, November 16th, 2011 at 11:50 pm

unimpressed lynx Lately there has been a lot of discussion about markup, and especially the new HTML5 elements. There was a big hoo-hah when ~~Hixie~~the WHATWG wanted to remove the time element from the HTML spec, Divya stirred lots of emotions with her “Our Pointless Pursuit Of Semantic Value” and of course Jeremy posted his views on the subject, too in a counterpoint article “Pursuing semantic value“.

Maybe there wasn’t a counterpoint, maybe there was. Frankly, I was too busy to read the lot. It also doesn’t matter that much, as I get more and more the feeling that we really need to think about the web as it was and how it will be. The lack of understanding of the value of semantic markup to me is just a symptom of a change that is happening.

Tales of yesteryear

A lot of the debate about semantic value and using the correct HTML is kept alive by people who have been around for a long time and seen browsers fail in more ways we care to remember. Valid markup and sensible structure was our only chance to reach maintainability and make sense of the things around us. This was especially important in the long long ago. I remember using Lynx to surf the web.

Lynx showing twitter.com

I also remember to keep Lynx in my arsenal for a bit longer. I used it to “see” what search engines and assistive technologies see. The former was correct at the time (not any longer, Google does index Flash and JavaScript and actually follows ill conceived links using hashbangs).

The latter was even wrong back then. A lot of the debate around using proper HTML5 right now tries to back up with the argument that “assistive technology like screen readers need it”. Nah, not quite the case yet.

Build it quickly, make it work

I’ve mentioned it in a few talks that when people mention the good old days where markup mattered and people cared and such they are talking nonsense. These days never existed and when we started with web development we struggled to make things work. We used tables for layout, NBSP for whitespace, lots of BR elements for vertical whitespace and more evil things. We then used spacer gifs for padding and margin and just started to care when CSS got out and supported. The reason was not that we wanted to write cleaner HTML. The reason was that we wanted to make things work as all we got was a design to build, not a description how to structure the document or what to build. When you start from the look of a web product, semantics are already on the endangered list.

Write less, achieve more

This is the mantra of now. The big success of jQuery is based on it. JavaScript standards were too complex and too verbose to write code quickly and change it quickly. So the jQuery crowd analysed what people did the most – changes to the DOM and adding and removing classes (and later Ajax) and made it damn easy and short to do. No need to write code that doesn’t do much.

The same happens over and over again. less and sass make the prefix hell and repetition for different browsers in CSS easy to maintain and client-side templating languages and browser-internal templating and client-side MVC make HTML the outcome of computation and programming logic and not a starting point.

If you can’t see it, why do it?

A lot of what the fans of HTML and semantics are getting excited about is not visible. Whenever a new HTML element got support and had a visual representation in the browser it was a no-brainer. People used it immediately. In most cases they used it wrongly, but they used it (I’ve seen fieldset and legend used around images as it is pretty and of course indentation with BLOCKQUOTE).

A lot of the semantically rich elements don’t show up at all. Blockquotes’s cite attribute was meant to give a quote meaning by telling us where it is from. ACRONYM and ABBR were supposed to tell people what a TLA meant – heck we don’t even do that in meetings and press releases so why bother adding info that the browsers don’t show the users.

This is also a big issue for Microformats. If browsers made an address draggable to your address book or created voting buttons for VoteLinks, if a browser would automatically detect events and give you a simple interface to add to your calendar they’d be a no-brainer to use. As it is, we have a few success stories to tell, for a lot of work to do.

The big book of ARIA

It gets really frustrating when we are talking about accessibility. Making a web document available for people with various abilities should be easy when we stick to keeping things simple and follow human logic.

It should, but it isn’t. And by keeping things very simple we can reach more people but we could also deprive a large group of great interfaces. Whenever we do crazy things in the browser and the talk comes to making them accessibile the way out is the mythical ARIA.

If you dive into ARIA you will realise very quickly that it is a lot of work, hard to understand concepts and above all a lot of code to write. Instead of having accessibility as an integral part of HTML5, we have to deal with two parallel standards. One to achieve things quickly and move the web from documents to applications and another one to keep it available for everyone out there. This is not a good place to be in. Accessibility happens when you embrace it from the very start. There is no magic bullet layer at the end of the process that makes things work.

So what about HTML and semantics?

You know what? There is no solution for all. The reason is not that technology moved on or people don’t care about users or that our standards are holding us back or anything like that. The reason is that “write once, deploy anywhere” is simply bullshit. The one thing that made the web work so far and become an amazing market to work in is flexibility. We all enjoy that you can reach a seemingly similar experience for our end users in many different ways. So why are we banging on about one side of the development range or another?

How about this:

If you write a document by hand, use all the semantics you can add in. This is your handwriting, your code is your poetry and people learn from looking at what you did.
If you need to write a hard-core application and every byte is a prisoner try to play nice with the semantics but follow your end goal of delivering speed. Make sure to tell people though that your code is the end result of conversions and optimisations and not for humans to look at.
Regardless of what you build – when you can use new technology – use it.
Remember that the web is not your browser and computer – add fallbacks for other browsers when using bleeding edge technology. When the others catch up you won’t have to alter your code!
The main focus of markup and web code that is not optimised for edge case apps is to make it easy for people to maintain it. If people can see in the HTML what is going on – win. If what only works with JS is generated by JS - even better.
More markup is not a crime when it is markup that adds value. Arguments that STRONG is worse than B because it means more code and slower loading pages are irrelevant in times of gzipping on the server
We can only escape the chicken and egg problem of new HTML when we use it. Right now, if you ask for support in browsers for new elements the answer from most vendors is that nobody uses them so why bother. And when you ask people why they don’t use them they tell you because browsers don’t support them. One of us has to start changing that.

Bigger fish to fry

Personally I am concentrating more on the things that really worry me about the web these days, and it might be interesting to list them.

Death of longevity – I always loved the fact that I can find something on the web and go back to it. This is not the case any longer. A lot of my old bookmarks are dead, my tweets go into the data nirvana after a certain number and I cannot access them any longer, and code you write for companies will be totally different shortly after your departure. This is not the web I want. It is a great mix of entertainment and archive and the “real time web” really messes with this.
High fidelity web sites – I remember when Flash made our 500Mhz machines flare up. Nowadays almost every cool new site I try out does that to my dual core macbook. I can see in the very near future pages coming up telling me that my video card is not good enough to enjoy them. This is the reason I never played games on a PC. Let’s use cool new and flashy in a sensible manner instead
Identity – we are spreading ourselves thin on the web right now and leave a lot of outdated and erroneous profiles of ourselves. Who you are on the web is becoming a very strange concept and some of the work I do right now tries to bring that back into an easier to maintain fashion.
The open web – today the US debates if it is a good idea to censor the internet like China, Syria and other countries do. This scares me. I started on the web as it was less regulated and much less commercial than radio or TV. Let’s not give up that freedom
The maker web – the web is ubiquitous, we use it as a part of our day-to-day work and play. Lately I find though that the creative part of the web is dying and people are consuming it rather than using and enriching it. This, again, scares and annoys me. We should not become virtual couch potatoes.

Semantics are like wonderful prose. You use them to deliver an enjoyable product. People are not celebrated for writing books. They are celebrated for what they filled them with. If we keep putting things on the web that have structure and get better on more sophisticated display products we are building for the future. If we point fingers at others doing it wrong we waste our time.