With the advent of Neural Machine Translation (NMT), machine translation quality has improved considerably, and people use machine translators far more as a result. I often use one myself, not only when reading specialized material such as papers, but also in everyday life, for example when following foreign news or traveling.
But are the translations from a tool we use this often really accurate?
What even "Godgl" misses
Google contributes so much across so many fields, including artificial intelligence, that it has been nicknamed "Godgl" (a blend of "God" and "Google"). Even so, Google's translator has its limitations. As an example, I translated the following two sentences with Google Translate.
Example 1 > I want to drink an 'aa' even in winter.
☝ 'Aa' (아아) is short for 'Iced Americano', a word used often by people of all ages, not just one generation. Google Translate, however, simply rendered it by its sound, as 'aa'. 🥲
Example 2 > By design secure key derivation functions use salt (random number, which is different for each key derivation) + many iterations (to speed-down eventual password guessing process).
✌️ In this sentence about IT security, 'key derivation functions' is an established security term, but Google produced a word-for-word literal translation instead of the standard term.
As these examples show, even Google Translate often produces unnaturally literal or outright wrong translations for abbreviations, neologisms, and technical terms.
A hybrid translator that emerged from the limitations of machine translation
What is hybrid translation?
Hybrid translation combines two or more translation methods. It is also the biggest strength of the LETR translator: by combining neural machine translation (NMT) with rule-based machine translation (RBMT), it produces translations that are both natural and precise.
Because RBMT is based on linguistic structure, it plays an important role in dictionary- and grammar-driven translation.* Adding these RBMT strengths to the NMT method as pre- and post-processing steps can therefore improve translation quality.
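One common way to wrap rule-based pre- and post-processing around an NMT engine is placeholder masking: glossary terms are swapped for placeholders before translation and restored as the approved target terms afterwards. The sketch below illustrates only that general idea; it is not LETR's actual implementation. The function names, the `[[TERMn]]` placeholder format, and the `nmt` callable are all assumptions made for illustration, and the sketch assumes the engine passes placeholders through unchanged.

```python
import re
from typing import Callable, Dict, List, Tuple


def mask_terms(text: str, glossary: Dict[str, str]) -> Tuple[str, List[str]]:
    """Pre-processing: replace glossary source terms with numbered placeholders."""
    targets: List[str] = []
    if not glossary:
        return text, targets
    lookup = {source.lower(): target for source, target in glossary.items()}
    # Longest terms first, so a multi-word term wins over a word it contains.
    ordered = sorted(glossary, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(s) for s in ordered), re.IGNORECASE)

    def to_placeholder(match: "re.Match[str]") -> str:
        targets.append(lookup[match.group(0).lower()])
        return f"[[TERM{len(targets) - 1}]]"

    # A single pass over the text, so placeholders are never re-matched.
    return pattern.sub(to_placeholder, text), targets


def unmask_terms(text: str, targets: List[str]) -> str:
    """Post-processing: put the approved target terms back."""
    return re.sub(r"\[\[TERM(\d+)\]\]", lambda m: targets[int(m.group(1))], text)


def hybrid_translate(text: str, glossary: Dict[str, str],
                     nmt: Callable[[str], str]) -> str:
    """Mask glossary terms, run the (hypothetical) NMT engine, restore terms."""
    masked, targets = mask_terms(text, glossary)
    return unmask_terms(nmt(masked), targets)


if __name__ == "__main__":
    # The identity function stands in for a real NMT engine in this demo.
    demo_glossary = {
        "application layer gateway": "Application Layer Gateway",
        "application": "app",
    }
    print(hybrid_translate("An application layer gateway runs an application.",
                           demo_glossary, lambda s: s))
```

Matching the longest terms first matters here: without it, the single word "application" would be replaced before the full phrase "application layer gateway" had a chance to match.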
Generally, machine translation translates words literally or leaves English abbreviations as they are. However, the same abbreviation can mean different things in different fields, so translation customized for each field is necessary. In particular, a term made up of several words must be translated as the established technical term of its field, not word by word.
And for this, hybrid translation using a separate glossary is essential.
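To make the point about field-dependent abbreviations concrete, here is a minimal sketch of a per-field glossary lookup. The domain names, the `DOMAIN_GLOSSARIES` table, and the `expand_abbreviation` function are hypothetical and only illustrate the idea; "PCB" is used because it genuinely means different things in electronics and in chemistry.

```python
from typing import Dict, Optional

# Hypothetical per-field glossaries: the same abbreviation maps to a
# different term depending on the field.
DOMAIN_GLOSSARIES: Dict[str, Dict[str, str]] = {
    "electronics": {"PCB": "printed circuit board"},
    "chemistry": {"PCB": "polychlorinated biphenyl"},
}


def expand_abbreviation(
    term: str,
    domain: str,
    glossaries: Optional[Dict[str, Dict[str, str]]] = None,
) -> str:
    """Return the field-specific term, or the input unchanged if unknown."""
    table = DOMAIN_GLOSSARIES if glossaries is None else glossaries
    return table.get(domain, {}).get(term, term)


if __name__ == "__main__":
    for field in ("electronics", "chemistry", "finance"):
        print(field, "->", expand_abbreviation("PCB", field))
```

Falling back to the unchanged term keeps the lookup safe to apply everywhere: a field without an entry simply behaves like a translator with no glossary.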
Build a glossary
For hybrid translation, building a glossary is as important as the NMT model. We therefore select technical terms for each field, review them, and build a glossary. The 'Technical Terminology Methodology Research Paper' published by the National Research Institute** advises considering five characteristics when organizing terminology.
· Uniformity of terms: A term must refer to a single concept, and a concept is defined by a single name.
· Transparency and clarity of terms: A term should express the concept clearly and directly enough that the concept can be inferred from the term itself.
· Conciseness of terms: As long as the concept is explicitly revealed, unnecessary or excessive information should not be included in the name.
· Consistency of terms: Terms referring to concepts in the same category should have the same form as much as possible.
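Of these characteristics, uniformity is the easiest to check automatically. The sketch below flags glossary entries where one source term has several target terms, or several source terms share one target term. It assumes a glossary stored as (source, target) pairs; the function name and the sample entries are hypothetical, and the other characteristics still need human review.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def uniformity_violations(
    entries: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Return (sources with several targets, targets with several sources)."""
    by_source: Dict[str, Set[str]] = defaultdict(set)
    by_target: Dict[str, Set[str]] = defaultdict(set)
    for source, target in entries:
        # Normalize so that case and stray spaces do not hide duplicates.
        s, t = source.strip().lower(), target.strip().lower()
        by_source[s].add(t)
        by_target[t].add(s)
    ambiguous = {s: sorted(ts) for s, ts in by_source.items() if len(ts) > 1}
    shared = {t: sorted(ss) for t, ss in by_target.items() if len(ss) > 1}
    return ambiguous, shared


if __name__ == "__main__":
    # Placeholder entries; a real glossary would hold actual term pairs.
    sample = [
        ("src-a", "Term A"),
        ("src-a", "Term B"),  # one source term, two targets
        ("src-b", "Term C"),
        ("src-c", "Term C"),  # two source terms, one target
    ]
    ambiguous, shared = uniformity_violations(sample)
    print("ambiguous sources:", ambiguous)
    print("shared targets:", shared)
```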
So what happens when you run a hybrid translation with a glossary built this way?
I built a simple terminology dictionary and compared how the results of the LETR hybrid translator differ from those of a general machine translator (Google Translate).
Google machine translation vs. LETR hybrid translation
First, I built a glossary to use for hybrid translation, as follows. I'll call this dictionary the 'LETR-ICT dictionary'.
Let's retranslate the two sentences translated earlier with Google Translate, this time with the hybrid translator using the LETR-ICT dictionary.
Sentence 1 > I want to drink an 'aa' even in winter.
💡 You can see that 'aa' is now translated into English as "Iced Americano". You can also see that the LETR translator translates frequently used neologisms correctly even without a dedicated terminology dictionary. 😎
So if you have a neologism you use often, it would be a good idea to create and use a dictionary of neologisms, right? 🤘
Sentence 2 > By design secure key derivation functions use salt (random number, which is different for each key derivation) + many iterations (to speed-down eventual password guessing process).
💡 The translation of 'key derivation function', which was a problem with Google Translate earlier, now comes out as the standard security term rather than the literal rendering.
Now let's see how the other terms in the glossary are translated.
Sentence 3 > An application layer gateway is a software component that includes specific application protocols such as SIP and FTP.
💡 Google Translate translated 'application' on its own, word by word, whereas LETR used the 'Application layer gateway' entry saved in advance and translated the phrase as the more appropriate term.
Sentence 4 > Research results showing that applying metamaterials to antennas improves performance so that they are more suitable for use in wireless human body area communication have been reported recently.
💡 'Wireless human body area communication' refers to a communication technology designed so that information can be exchanged in or around the human body.*** The correct English term is 'Wireless Body Area Networks'. The LETR translator translated it accurately by referring to the glossary, whereas Google Translate translated each word literally (e.g. communication = Communication).
Sentence 5 > Trivial File Transport Protocol is a very simple file transfer protocol, with the advantages of a very basic form of FTP.
💡 For English proper nouns (or proper names) in which the first letter of each word is capitalized, most machine translators fail to translate them into the requested language and often output the original text as is. Indeed, for the sentence above, Google left 'Trivial File Transport Protocol' in English. I can't say that Google's translation is wrong, but is it really a proper translation?
Wouldn't these terms, which Google left untranslated, read much better if they were translated into the target language? 😎
Sentence 6 > Fig.**** 5 is a diagram showing the 5G communication network in Korea.
💡 Google Translate misinterpreted 'Figure 5' as just '5'. The LETR translator, on the other hand, translated it as 'FIG.5', correctly preserving the meaning of a drawing.
Wrapping up
I've shown through simple examples that hybrid translation can produce higher-quality translations than regular machine translation. This is because hybrid translation can correct the mistranslations and literal translations common in general machine translation. It even enables more sophisticated translations that use subject-specific terminology.
However, for hybrid translation to work well, certain conditions must be met: the terminology dictionary must be built properly, and so must the pre- and post-processing that translates with careful reference to those terms. Only then can it be called true hybrid translation.
****: A word commonly used in patent documents, meaning a picture, diagram, or drawing