If you’re on Twitter, you might have come across tweets from users questioning what Google Translate is thinking when it translates a local language like Shona into English. If you’re in the camp that’s not at all familiar with how terrible Google Translate is at translating Shona/Ndebele, let me get you up to speed.
I’ll need you to be an active participant though. Go to Google Translate, translate the phrase “Where is my toast?” from English to Shona. What did you get?? I hope the shock of your life!
Anyway, why is that the case? How can it be so bad at translating local languages when it works much, much better with others? Well, there are a few different factors, so let’s talk about those.
Translate’s inner workings…
A good place to start from is understanding how Google Translate actually works. Originally, Google Translate worked by translating the text to English first and then to the intended language. In this process, Google Translate would reference “millions of documents taken from official United Nations and European Parliament transcripts”.
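To make the "English first" idea concrete, here is a minimal toy sketch of translating through a middle language. It is not Google's code: the two tiny word lists and the `pivot_translate` function are made up for illustration, and it uses French and Spanish only because the words are easy to check.

```python
# Toy lookup tables, invented for illustration only (not Google's data).
FRENCH_TO_ENGLISH = {"bon": "good", "avocat": "lawyer"}
ENGLISH_TO_SPANISH = {"good": "buen", "lawyer": "abogado"}


def pivot_translate(words, source_to_pivot, pivot_to_target):
    """Translate word by word through a pivot language (here, English)."""
    result = []
    for word in words:
        # Step 1: source language -> English (unknown words pass through unchanged).
        pivot_word = source_to_pivot.get(word, word)
        # Step 2: English -> target language (unknown words pass through unchanged).
        result.append(pivot_to_target.get(pivot_word, pivot_word))
    return result


print(pivot_translate(["bon", "avocat"], FRENCH_TO_ENGLISH, ENGLISH_TO_SPANISH))
```

The French word "avocat" can also mean "avocado". Once the first step settles on "lawyer", the second step has no way to recover the other meaning, so any mistake made on the way into English is carried straight through to the final language.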
Google didn’t hire language experts because they believe languages are “ever-evolving” and the efforts of experts would soon be rendered obsolete.
In 2016, Google announced a shift to a neural machine translation system (GNMT) to increase fluency and accuracy in Google Translate. GNMT was pitched as something that would improve translation by drawing from millions of examples and would allow for the translation of “entire sentences at a time, rather than just piece by piece”.
Why does this all matter? Well, it leads us to understand why local languages have so many absurd translations. Google isn’t drawing these translations from linguists but from a neural network which is based on documentation/web pages available in said languages.
Without enough Shona/Ndebele documents to use as the basis of translation, the system often lacks the context it needs to reach accurate or meaningful translations.
The lack of webpages in Shona/English or Ndebele/English results in a situation where Google simply doesn’t have much to draw from to improve its machine learning system to better understand these and other less popular languages.
The more documents and web pages there are in a language, the more training data Google Translate has to work with, and the more accurate its translations become.
Sometimes it’s your fault?
According to a linguist on Quora, it helps to make the phrase you’re translating as simple as possible:
“Someone with partial competence in a language can guide Google Translate through the translation, using its vocabulary database and other information to slowly create a translation much better than what the raw output would be for a whole page of translated text not specifically prepared for translation.”
Daniel Ross, Linguistics PhD student
He also mentions that typos are big deal-breakers for Google Translate.
It’s a combination of factors, but it ultimately comes down to the popularity of the language in question. The more popular a language is, the more text Google can tap into to improve its translations. This means Shona/Ndebele will continue to have inferior and questionable translations for the foreseeable future.
Anyway, here is a random tweet with pretty weird translations for you to ponder on your way out of our site!