A mysterious creature called the Babel fish appears in The Hitchhiker's Guide to the Galaxy, the science fiction work by the English writer Douglas Adams known for its witty black humor. It is a small yellow fish that, once placed in your ear, lets you instantly understand anything said in any language, and it turns out to be indispensable to the protagonist, an ordinary Earthling traveling through space.
But doesn't the name Babel Fish sound familiar? It was a free machine translation service once offered by the portal site Yahoo. At the time, I simply assumed that, just as Yahoo's well-known search advertisements featured a puppy, the translation service had chosen a fish as its mascot. As it turned out, it was named after the alien creature from this famous science fiction work.
In reality, however, something much like the Babel fish already exists: automatic translation, better known as machine translation. Instead of a fictional fish, it uses computers to translate between languages automatically.
With that in mind, today I would like to look back at the history of machine translation, a technology that, much like the Babel fish, is creating a world where people can communicate across language barriers. We will trace the deliberations and trial and error of many researchers, and how the technology developed until machine translation met artificial intelligence and achieved high performance on the strength of deep learning and vast amounts of language data.
The beginning of machine translation
Machine translation actually has a long history. The idea itself goes all the way back to the 17th-century philosopher Descartes, but the modern concept of translating languages with computers was proposed by the American mathematician Warren Weaver in 1949, and full-scale research began in the 1950s.
At the time, however, systems did little more than look words up in a dictionary, replace them with their target-language equivalents, and rearrange them according to grammar. Research soon hit a wall, and contrary to expectations the technology was slow to improve. This eventually gave rise to a new idea: that computers should first analyze and understand language.
Rule-based machine translation
Until the 1980s, machine translation relied mostly on rule-based technology, which translates by applying rules built from the structure and grammar of the actual language.
Rule-based machine translation improved translation accuracy with algorithms grounded in real linguistic grammar. However, systematizing translation rules required a deep understanding of linguistics, so linguists played a central role in the research, and development demanded a great deal of time and money. It also struggled to translate sentences that did not conform to the grammar we commonly use in everyday life.
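To make the approach concrete, here is a deliberately tiny, hypothetical Python sketch of the rule-based idea: a hand-written bilingual lexicon plus a single reordering rule. The words, the rule, and the language pair are made up for illustration; real systems relied on far richer dictionaries, grammars, and morphological analysis.

```python
# A toy rule-based translator: dictionary lookup plus one structural rule.
LEXICON = {"the": "le", "cat": "chat", "black": "noir"}

def reorder_adjectives(words):
    # Toy structural rule: in the target language the adjective follows
    # the noun (English "black cat" -> French-style "cat black").
    out = list(words)
    for i in range(len(out) - 1):
        if out[i] == "black" and out[i + 1] == "cat":
            out[i], out[i + 1] = out[i + 1], out[i]
    return out

def rule_based_translate(sentence):
    words = reorder_adjectives(sentence.lower().split())
    # Lexical rule: look each word up, leaving unknown words untouched.
    return " ".join(LEXICON.get(w, w) for w in words)

print(rule_based_translate("The black cat"))  # -> "le chat noir"
```

Even this toy example hints at the core problem: every new word and every new sentence pattern needs another hand-written entry or rule.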
Still, the basic research this era demanded in natural language processing, such as morphological, syntactic, and semantic analysis, had a significant impact on the later development of machine translation.
Statistics-based machine translation
In the early 1990s, IBM introduced statistical methods to machine translation, and revolutionary changes followed.
Statistics-based machine translation learns statistical models, such as the frequencies of words and phrases, from parallel corpora of source texts and their translations. Because reasonably accurate translation becomes possible once enough language data is available to compute these statistics, many companies, led by Google, moved into machine translation research and the field entered its heyday.
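As a minimal, hypothetical illustration of the statistical intuition, the Python sketch below counts word co-occurrences in a toy parallel corpus and turns them into translation probabilities. The example sentences are invented, and real statistical systems (for instance, the IBM alignment models) are far more sophisticated.

```python
from collections import Counter, defaultdict

# Toy English-German parallel corpus (invented example sentences).
parallel_corpus = [
    ("the house", "das haus"),
    ("the book", "das buch"),
    ("a book", "ein buch"),
]

# Count how often each source word co-occurs with each target word.
cooccurrence = defaultdict(Counter)
for src_sentence, tgt_sentence in parallel_corpus:
    for s in src_sentence.split():
        for t in tgt_sentence.split():
            cooccurrence[s][t] += 1

def translate_word(word):
    # Normalize the counts into probabilities and pick the most likely
    # target word -- the crudest possible "translation model".
    counts = cooccurrence[word]
    best, n = counts.most_common(1)[0]
    return best, n / sum(counts.values())

print(translate_word("book"))  # -> ('buch', 0.5)
print(translate_word("the"))   # -> ('das', 0.5)
```

The more parallel text such a model sees, the sharper its probability estimates become, which is exactly why data volume mattered so much in this era.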
However, statistics-based machine translation has limitations of its own: translation quality degrades when not enough data has been accumulated, and it is especially difficult to translate between languages with different word orders and grammatical structures.
Neural network-based machine translation
Statistics-based machine translation seemed to have become the mainstream, but in the 2010s, when it was combined with deep learning, innovation on an entirely different level followed.
Unlike earlier approaches, neural network-based machine translation loosely imitates the way humans think: the machine builds an internal representation of a sentence's meaning and generates the translation from it. As a result, it can capture the meaning of a sentence, and even subtle differences in nuance, and produce far more natural translations. Today the major automatic translation services, including those from Google, Microsoft, Naver, and Kakao, are built on this model and use it to address the shortcomings of earlier approaches.
Conventional statistics-based machine translation had a methodological limitation: it could not take the context of the entire sentence into account. Neural machine translation, a machine learning technology modeled on the learning process of the human brain, can capture differences in meaning across the whole context and produce natural translations sentence by sentence.
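For readers who want to see the shape of such a model, here is a minimal, hypothetical sketch of a sequence-to-sequence encoder-decoder written in Python with PyTorch. The vocabulary sizes, dimensions, and token ids are made up, and production systems add attention or use Transformer architectures trained on millions of sentence pairs.

```python
import torch
import torch.nn as nn

class TinySeq2Seq(nn.Module):
    """A bare-bones encoder-decoder: the core idea behind neural MT."""

    def __init__(self, src_vocab, tgt_vocab, emb_dim=32, hid_dim=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, src_ids, tgt_in_ids):
        # Encode the whole source sentence into a context vector.
        _, context = self.encoder(self.src_emb(src_ids))
        # Decode conditioned on that context (teacher forcing: the gold
        # previous target token is fed at each step).
        dec_out, _ = self.decoder(self.tgt_emb(tgt_in_ids), context)
        return self.out(dec_out)  # scores over the target vocabulary

# Toy usage with made-up token ids (batch of 1).
model = TinySeq2Seq(src_vocab=100, tgt_vocab=120)
src = torch.tensor([[5, 17, 42, 3]])
tgt = torch.tensor([[1, 9, 88, 7, 2]])      # 1 = <bos>, 2 = <eos>
logits = model(src, tgt[:, :-1])            # predict each next token
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 120), tgt[:, 1:].reshape(-1))
loss.backward()                             # gradients for end-to-end training
```

Because the whole sentence is encoded before any target word is produced, the model can take full-sentence context into account, which is precisely what the statistical approach struggled with.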
The future of machine translation
Efforts to improve neural network-based machine translation continue even now. As demand for translation grows worldwide, vast amounts of language data keep accumulating and the machine learning technology used to train the models keeps improving, so machine translation is advancing rapidly. It is also possible to provide customer-specific translation, as Twig Farm (https://twigfarm.net) does with its LETR technology, by turning a customer's own data and the technical terminology of a particular field or company into training data.
Through decades of research and development, machine translation has already reached a remarkable level; some even predict that human translators may soon be replaced. That is a tremendous pace of progress, considering that only a few years ago people were still laughing over examples of machine mistranslation.
Even so, it still seems difficult for a machine to render the cultural values and distinctive style embedded in a text the way a skilled human translator can, which suggests that machine translation has many challenges left to solve. Yet as new methods for translating between diverse languages continue to be tested in the latest neural network models alongside advances in artificial intelligence, a future without language barriers may not be far away.
Good content to read together
[AI Story] Machine translation becoming human-like
Teaching AI translators: 01. Why are corpora needed?
Why does LETR, a language processing engine, focus on text languages?