Christian Heilmann

Let’s make Machine Learning on the web and in the client happen. We need your input!

Friday, December 14th, 2018 at 2:17 pm

Machine learning and deep learning are the hot breakthrough in computing and a well-hyped topic. How “intelligent” your systems are can be a significant factor in sales, and many a mobile device is sold as a “personal AI” for your convenience.

O hai, robot

There is no question that learning interfaces are superior to hard-wired ones. The best example for me is the virtual keyboard on my mobile. The more I use it, the better it gets. It not only fixes my typos but also correctly predicts the words that follow the current one. It doesn’t even matter whether I swipe in English or German; the system is clever enough to recognize my needs based on my earlier input.

Machine Learning is everywhere – it is just not accessible to most developers

I love the idea of learning machines, and I think it is an inevitable step in the evolution of human-machine interaction. What we saw years ago in Star Trek – a ubiquitous computer at our beck and call with a voice interface – is now common. And we mostly use it to ask about the weather. It makes no sense that we need to buy a certain device to reap the rewards of a new technology – I want this to be available on the web and on any device.

The problem I have with this new evolution in computing is that it is opaque and locked to a few vendors. I love the web for what it is and for how it broke down closed development environments and vendor lock-in when it comes to tooling. Anyone can start developing on the web, as it is the one and only truly open and distributed platform we have. What matters is the skill you have, not which device you have access to.

Now, when it comes to Machine Learning on the web, things look a lot less open and mature than I’d like them to be. To train models, or even to get insights from them, you need to use a third-party service or library. Out of the box, you can’t do any training, or even image or facial detection, in the browser.

Enter the WebML Community Group

I think this needs to change, which is why I am happy that there is a new W3C community group working on an API proposal for Machine Learning on the web. This work started with Intel and Microsoft, and I am happy we’ve come quite far, but now we need your help to make this a reality.

Let’s quickly recap why Machine Learning on the web and on device is a good idea:

  • Enhanced performance – results from a trained model are returned immediately without any network latency
  • Offline functionality – lookups running on device don’t need a connection to a cloud service
  • Enhanced privacy – it is great that many cloud services offer us pre-trained models to run our requests against, but what if I don’t want that image I just took to go to some server in some datacenter of some corporation?

As with every innovation, there are current limitations and things to consider. These are some of the ones we are currently working on in the discussion group:

  • File size – well-trained models tend to be on the large side, often hundreds of megabytes. Loading files of that size on the client causes I/O delays and heavy RAM usage in the browser or in your Node solution.
  • Limited performance – browsers are still hindered by a single-threaded JavaScript engine and lack access to the device’s multiple cores. Native code doesn’t have that issue, which is why we propose an API that gives access to the native ML code of different operating systems instead of imitating it.

The current state of affairs

Currently, you can use tensorflow.js and onnx.js to talk to models or to do your own training on the device. Whilst there are some impressive demos and ideas floating around that use them, it still feels like we’re not taking the notion seriously.

That said, it is pretty amazing what you can do with a few lines of code and the right idea. Charlie Gerard’s learning keyboard is a great example: it uses tensorflow.js and mobilenet to teach your computer to recognize head movements and then uses them to control a keyboard.

Why not just offer this as browser functionality?

One question we often hear is why browsers don’t have this functionality built in. Why can’t a browser create alternative text for an image or recognize faces in a video? Well, it could, but once again this would mean your data is in the hands of some company, and that company would control the functionality. That is no better than what native platforms offer in their SDKs. The web doesn’t work that way.

How can you help?

We’re currently in an experimentation and investigation phase. Rolling out a new W3C standard isn’t something to be taken lightly, so we want to make sure that we deliver the right thing. Therefore, we need real-life implementation examples where running ML on the device would make a massive difference to you.
So, please tell us what use cases aren’t covered in a satisfactory manner in the current web-talks-to-cloud-and-waits-for-data-to-come-back scenario.
We’re not looking for “I’d like to do facial recognition” but for scenarios that state “If I had face recognition in JavaScript, I could…”. I’d be very interested to hear from companies that need this functionality to improve their current products, and I am already working with a few.
You can reach me on Twitter, you can fill out this form, or you can mail me at with the subject “[WEBML scenario]”.

Thanks for your attention and all the work you do to keep the web rocking!
