Christian Heilmann

Posts Tagged ‘internationalization’

Translating or localising documentation?

Friday, December 19th, 2008

We just had an interesting meeting here discussing plans for how to provide translations of our documentation in different languages. I am a big fan of documentation and have given several presentations on the why and how of good docs. In essence, the “good code explains itself” talk is a big fat arrogant lie. You should never expect the people using your code to be on the same level as you – that would defeat the purpose of writing code.

Fact is that far too much software has no proper documentation at all, let alone translations into different languages. This is because there are a few issues with translating documentation:

  • Price – translation is expensive, which is why companies started crowdsourcing it.
  • Quality – crowdsourcing means you need to trust your community – a lot (looking at Facebook’s translation app shows a lot of translation agency spam in the comments – not a good start)
  • Updating issues – in essence, you are trying to translate a moving target. With books, courses and large enterprise level software the documentation is fixed for a certain amount of time. With smaller pieces of software and APIs the docs change very frequently and you need to find a way to alert the translators about the changes. If you crowdsourced your translation this is a request on top of another request!
  • Keeping in sync – different translations take a different amount of time. Translating English to German is pretty straightforward, whereas English to Hindi or Mandarin is a tad trickier. This means that translations could be half finished when the main docs change – which is terribly frustrating to those who spent time translating now outdated material.
  • Relevancy – your documentation normally spans your complete offer – however in different markets some parts of your product may not be available or make any sense. Translating these parts would be pointless, but it also means that tracking the overall progress of translation becomes quite daunting.

All of these are things to consider and not that easy to get around. Frankly, it is very easy to waste a lot of time, effort and money on translation. It makes me wonder whether there really is a need for translation, or whether it makes much more sense to invest in a localisation framework. What you’d actually need to make your English product available to people in other locations is:

  • Local experts – this could be someone working for you or a group of volunteers that see the value of your product for their markets.
  • Easy access collaboration tools – this could be a wiki, a mailing list or a forum. Anything that makes it easy for people to contribute in their language and with their view of the world. If these tools make collaboration easy without having to jump through hoops people will help you make your product more understandable in their region.
  • A local point of reference – this could be a blog or small presence of your company in the local language or a collaboration with an already established local partner or group.
  • Moles – most international developer communities have their favourite hangouts already. Instead of building one that might not be used (not every country is going “wahey” when a US company offers them a new channel), it makes sense to have people you trust who speak the language on these channels, listening to what people need and offering the right advice that points to your documentation.

What do you think? Do you have examples of where this is done well, and what worked and didn’t work? Here are two I found:

Making twitter multilingual with a hack of the Google Translation API

Monday, March 31st, 2008

After helping to fix the Yahoo search result pages with the correct language attributes to make them accessible for screen reader users, I was wondering how this could be done with user generated content. The easiest option of course would be to ask the user to provide the right language in their profile, but if you are bilingual like me you actually write in different languages. The other option would be to let the user pick the language every time they type something, which is annoying.
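To make the goal concrete: what screen readers need is a lang attribute on the element that holds the text, so they can switch to the right pronunciation rules. A tiny sketch (the element id and language code are made up for illustration):

// Hypothetical example: once we know a piece of user generated content
// is Spanish, we flag it for assistive technology via its lang attribute.
var item = document.getElementById('tweet-1'); // made-up element id
item.setAttribute('lang', 'es');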

I then stumbled across Google’s Ajax Translation API and thought it should be very easy to marry it with, for example, the JSON output of the twitter API to add the correct lang attributes on the fly.

Alas, this was not as easy as I thought. On the surface it is very easy to use Google’s API to tell me what language a certain text is likely to be:


var text = "¿Dónde está el baño?";
google.language.detect(text, function(result) {
  if (!result.error) {
    var language = 'unknown';
    // find the human-readable name for the returned language code
    for (var l in google.language.Languages) {
      if (google.language.Languages[l] == result.language) {
        language = l;
        break;
      }
    }
    var container = document.getElementById("detection");
    container.innerHTML = text + " is: " + language;
  }
});

However, if you want to use this in a loop you are out of luck. The google.language.detect method fires off an internal XHR call, and the result set only gives you an error code, the confidence level, an isReliable boolean and the language code. This is a lot, but there is no way to tell the function that gets the results which text was analyzed. It would be great if the API repeated the text or at least allowed you to set a unique ID for the current XHR request.

As Ajax requests return in random order, there is no way of telling which result works for which text, so I was stuck.
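To illustrate the dead end with a quick sketch (the tweets array here is made up):

// Naive approach: fire detect() in a loop. Each callback only receives
// the result object (error, language, isReliable, confidence), so there
// is nothing in it that says which of the texts it belongs to, and the
// responses come back in whatever order the requests finish.
var tweets = ['¿Dónde está el baño?', 'Where is the bathroom?'];
for (var i = 0; i < tweets.length; i++) {
  google.language.detect(tweets[i], function(result) {
    // result.language is correct, but for which tweet?
  });
}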

Enter Firebug. Analyzing the requests going through, I realized there is a REST URL being called by the internal methods of google.language. In the case of language detection, this is:


http://www.google.com/uds/GlangDetect?callback={CALLBACK_METHOD}&context={NUMBER}&q={URL_ENCODED_TEXT}&key=notsupplied&v=1.0

You can use the number and your own callback method to create SCRIPT nodes in the document that get these results back. The return call is:


CALLBACK_METHOD('NUMBER',{"language" : "es","isReliable" : true,"confidence" : 0.24716422},200,null,200)
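Here is a minimal sketch of how that could look in JavaScript (not from the original post; the tweets array is made up for illustration, the URL is the one shown above). The context number doubles as an index into your own array of texts, so the globally available callback can match each result back to the text that produced it:

// Texts to analyse; the array index doubles as the "context" parameter.
var tweets = ['¿Dónde está el baño?', 'Where is the bathroom?'];

// Global callback; the service calls it as CALLBACK_METHOD(context, result, ...).
function feedresult(context, result) {
  var index = parseInt(context, 10);
  document.getElementById('detection').innerHTML +=
    tweets[index] + ' is: ' + result.language + '<br>';
}

// One SCRIPT node per text, each carrying its index as the context.
for (var i = 0; i < tweets.length; i++) {
  var s = document.createElement('script');
  s.src = 'http://www.google.com/uds/GlangDetect?callback=feedresult' +
    '&context=' + i + '&q=' + encodeURIComponent(tweets[i]) +
    '&key=notsupplied&v=1.0';
  document.getElementsByTagName('head')[0].appendChild(s);
}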

However, as I am already using PHP to pull information from another service, I ended up using curl for the whole proof of concept to make twitter speak in natural language:


<?php
// curl the twitter feed
$url = 'http://twitter.com/statuses/public_timeline.rss';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$twitterdata = curl_exec($ch);
curl_close($ch);
// get all the descriptions from the RSS feed
preg_match_all("/<description>([^<]+)<\/description>/msi", $twitterdata, $descs);
foreach($descs[1] as $key => $d){
    // skip the main feed description (the first match)
    if($key == 0){
        continue;
    }
    // assemble REST call and curl the result
    $url = 'http://www.google.com/uds/GlangDetect?callback=' .
           'feedresult&context=' . $key . '&q=' . urlencode($d) .
           '&key=notsupplied&v=1.0';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $langcode = curl_exec($ch);
    curl_close($ch);
    // get the language code out of the JSON response
    preg_match('/"language":"([^"]+)"/', $langcode, $res);
    // write out the list item with the detected language as its lang attribute
    echo '<li lang="' . $res[1] . '" xml:lang="' . $res[1] . '">' . $d . '</li>';
}
?>

Check out the result: Public twitter feed with natural language support

I will do some pure JavaScript solutions soon, too. This could be a great chance to make UGC a lot more accessible.

Thanks to Mark Thomas and Tim Huegdon for bouncing off ideas about how to work around the XHR issue.