Since the advent of neural machine translation (NMT), machine translation has grown steadily more accurate. Of course, it still falls short of the fluency of a professional translator, but the strange output that once invited jibes like “Did you just run this through a translator?” is much rarer now. Meanwhile, efforts toward better machine translation technology continue, and outstanding research results keep accumulating.
However, even now that machine translation has become part of everyday life, questions about its reliability remain. Its efficiency is undeniable, but unlike a human translator, it never invested time, money, and effort to earn a degree or pass an exam. So what would it look like to evaluate the performance of machine translation with a test, the way we test people? Let's take a look.
Evaluating machine translation performance
In fact, “to build a good translator, accurate quality assessment is important.”* This is because machine translation developers “compare and analyze performance whenever changes are made during development and reflect the results back into the development process.”*
Machine translation performance evaluation comes in two forms: objective evaluation using a program and subjective evaluation performed by human evaluators. “Objective evaluation is a method that automatically measures fluency and adequacy through a program, excluding the evaluator's subjective judgment and linguistic characteristics, while subjective evaluation assesses translation quality through the judgment of human evaluators.”**
In this article, we will look at automatic (objective) evaluation, which can “rule out the evaluator's subjective judgment and linguistic characteristics,”** rather than human (subjective) evaluation, which is “time-consuming, expensive, and cannot be reused.”
What is the BLEU score?
There are various methods for automatic evaluation by machines. Today, we will look at the most commonly used one: BLEU (Bilingual Evaluation Understudy).
“BLEU is a method for measuring translation performance by comparing how similar machine translation results are to translations produced by humans.”*** It has the advantage that “it can be used regardless of language, and it is fast to calculate.”*** In other words, the more similar the machine-translated sentences are to the human reference translations, the higher the evaluation score.
Measurement of BLEU scores
“BLEU calculates a score between 0 and 100 by measuring, through an n-gram**** comparison, how much the machine's translation overlaps with a human translation. The higher this score, the more similar the machine translation is judged to be to the human translation, and therefore the higher the quality of the machine translation.”*****
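For reference, the original BLEU paper [1] defines the score as the geometric mean of modified n-gram precisions (typically up to 4-grams, with uniform weights), multiplied by a brevity penalty that punishes candidates shorter than the reference:

```latex
\mathrm{BLEU} = \mathrm{BP} \cdot \exp\left( \sum_{n=1}^{N} w_n \log p_n \right),
\qquad
\mathrm{BP} =
\begin{cases}
1 & \text{if } c > r \\
e^{\,1 - r/c} & \text{if } c \le r
\end{cases}
```

Here p_n is the modified n-gram precision, w_n the weights (usually 1/N with N = 4), c the candidate length, and r the effective reference length; multiplying the result by 100 gives the 0-100 scale mentioned above.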
In other words, BLEU rests on the idea that the closer a translation is to a professional translator's work, the better the machine translation performance. However, it is debatable whether the quality of a translation is truly high just because many of its words match the reference. For this reason, a human evaluator is sometimes brought in as a supplementary check.
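As a minimal sketch of how such a score can be computed in practice (the sentences below are invented for illustration, and this is not Twigfarm's actual evaluation pipeline), the widely used NLTK library provides a sentence-level BLEU implementation:

```python
# A minimal BLEU sketch using NLTK (assumes: pip install nltk).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Hypothetical example: one human reference translation and one machine output.
reference = ["the", "cat", "is", "sitting", "on", "the", "mat"]
candidate = ["the", "cat", "sat", "on", "the", "mat"]

# BLEU compares 1-gram through 4-gram overlap (uniform weights here) and
# applies a brevity penalty because the candidate is shorter than the reference.
smoothing = SmoothingFunction().method1  # avoids zero scores on short sentences
score = sentence_bleu(
    [reference],                # a list of acceptable reference translations
    candidate,
    weights=(0.25, 0.25, 0.25, 0.25),
    smoothing_function=smoothing,
)

print(f"BLEU: {score * 100:.2f}")  # conventionally reported on a 0-100 scale
```

For corpus-level evaluation, tools such as sacreBLEU are often preferred because they standardize tokenization, but the underlying idea of n-gram overlap plus a brevity penalty is the same.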
An actual BLEU measurement example
Back in 2019, Twigfarm also underwent verification and validation (V&V) testing by the Software Testing & Certification Laboratory of the Telecommunications Technology Association (TTA) in order to receive an objective evaluation of its specialized translation performance and quality.
At the time, our translation quality was compared with Google Translate across five fields (law, finance, machinery, chemistry, and medicine), and we scored higher than Google Translate in four of them (law, finance, machinery, and medicine). In particular, the excellent BLEU score we recorded in the legal field earned recognition for our technology.
In closing
Machine translation is now used not only in specialized fields such as patents and law but across many areas, from public services to the study of classical texts. It is still hard to say it matches human translation, but machine translation has already entered our everyday lives, and words we did not know before have become as familiar as if we had looked them up in a dictionary. Given the rapid development of machine translation technology, the day when the very expression “language barrier” sounds dated may not be far off.
The Tower of Babel that humanity built in myth is said to have collapsed, leaving the world with the misunderstandings and disputes that come from speaking different languages. I hope that the new Tower of Babel being raised through machine translation technology will instead carry humanity to a better future, one without language barriers.
References
[1] BLEU: a Method for Automatic Evaluation of Machine Translation, https://aclanthology.org/P02-1040.pdf
[2] Understanding and using BLEU scores for translation quality control, https://www.gconstudio.com/post/20200729
[3] Neural network machine translation 'Twig Farm' surpasses Google... ahead in expressiveness and accuracy scores, https://www.donga.com/news/Economy/article/all/20190924/97561753/1
[4] What does “hybrid translation,” which is ahead of Google Translate, mean?, https://www.bloter.net/newsView/blt202006250040
[5] Twigfarm's “hybrid translator” outperforms Google Translate in 4 fields, http://newstime24.co.kr/news/article.html?no=22664
Good content to read together
Why does LETR, a language processing engine, focus on text languages?
[AI Story] How Machine Translation Meets Artificial Intelligence
[AI Story] Machine translation becoming human-like