The new year 2022 has dawned. Before making a new start, we would like to look back on the year that has passed.
We have gathered the most visited content since we opened this space in July 2021, and we would like to introduce each piece once more through passages selected from the original text by the LETR team members who wrote and edited the articles. If there is something you missed over the past year, or an article you would like to read again, take this opportunity to check it out.
* Click on each title or image to go to the corresponding page!
NER's present and future: 01. From concepts to diverse approaches
'NER plays an important role throughout natural language processing (NLP). It is used in many areas, such as information retrieval and summarization, question answering, and knowledge base construction. In particular, it helps improve the quality of machine translation (MT) and provide customized translations to users.'
'For example, if "TWIGFARM" is translated literally, it comes out as "twig farm" instead of being recognized as a company name. This not only causes translation errors but can also create an uncomfortable experience for users. Conversely, if TWIGFARM is properly recognized as a company name, both translation quality and the user experience improve.'
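As a rough sketch of the idea in this excerpt (our own toy illustration, not LETR's actual pipeline), recognized entity spans could be masked before machine translation and restored afterwards so that a name like TWIGFARM is never translated literally. The entity list and the translate() stub below are assumptions made for the example.

```python
# Toy sketch: protect recognized named entities during machine translation.
# The entity spans and the translate() stub are illustrative assumptions,
# not LETR's actual NER or MT components.

def mask_entities(text, entities):
    """Replace each recognized entity with a placeholder token."""
    mapping = {}
    for i, (span, label) in enumerate(entities):
        placeholder = f"__ENT{i}__"
        mapping[placeholder] = span
        text = text.replace(span, placeholder)
    return text, mapping

def unmask_entities(text, mapping):
    """Put the original entity strings back after translation."""
    for placeholder, span in mapping.items():
        text = text.replace(placeholder, span)
    return text

def translate(text):
    # Stand-in for a real MT system; placeholders pass through untouched.
    return text

source = "TWIGFARM is developing the LETR engine."
entities = [("TWIGFARM", "ORG"), ("LETR", "ORG")]  # pretend NER output

masked, mapping = mask_entities(source, entities)
translated = translate(masked)
print(unmask_entities(translated, mapping))
```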
NER's present and future: 02. Model structure and data set status
'Currently, it is difficult to find an official NER library dedicated solely to Korean; instead, Korean is mostly covered by models trained on multiple languages.'
'Korean NER data is in short supply. Only three Korean NER data sets have been publicly released so far, and commercial use of all of them is restricted.'
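As a minimal sketch of how Korean NER is typically accessed today, the example below tags a Korean sentence with a multilingual model through the Hugging Face transformers pipeline. The model identifier is only an assumed example of a publicly available multilingual NER checkpoint, not one the article endorses.

```python
# Minimal sketch: tagging Korean text with a multilingual NER model.
# The model name below is an assumed example of a multilingual NER
# checkpoint; swap in whichever checkpoint you actually use.
from transformers import pipeline

ner = pipeline(
    "ner",
    model="Davlan/bert-base-multilingual-cased-ner-hrl",  # assumed example
    aggregation_strategy="simple",  # merge word pieces into entity spans
)

text = "트위그팜은 서울에 있는 회사입니다."  # "TWIGFARM is a company located in Seoul."
for entity in ner(text):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```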
Teaching AI translators: 01. Why are corpora needed?
'For that reason, the surest way to improve the performance of today's translators is to create good data. If there is high-quality data to serve as a textbook for training a translator, its performance will naturally improve. For example, the training data for a Korean-English translator is made up of sentence pairs, one sentence in Korean and one in English. Such a collection of sentence pairs is called a corpus in technical terms.'
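To make the idea concrete, here is a minimal sketch of what a Korean-English parallel corpus looks like as data; the sentence pairs are our own illustrative examples, not LETR's training data.

```python
# Minimal sketch of a Korean-English parallel corpus:
# each entry pairs a Korean sentence with its English translation.
# The sentences here are illustrative examples only.
parallel_corpus = [
    ("저는 번역가입니다.", "I am a translator."),
    ("오늘 날씨가 좋습니다.", "The weather is nice today."),
    ("이 문장 쌍을 말뭉치라고 부릅니다.", "These sentence pairs are called a corpus."),
]

# A translation model would be trained on (source, target) pairs like these.
for korean, english in parallel_corpus:
    print(f"{korean} -> {english}")
```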
NER's present and future: 03. Future development direction and goals
'Since NER plays such an important role in information retrieval, it is an active research area in natural language processing. In particular, because it can automatically detect names of people, organizations, places, and so on, it can not only improve translation quality by preventing translation errors but also greatly increase user satisfaction through translations customized to each domain.'
'Nevertheless, NER data sets specific to Korean are still insufficient. To overcome this limited amount of data, the LETR team built a Korean-centered data set and trained a higher-performance Korean NER model on it, enabling more accurate and natural translation.'
[AI Story] Machine translation becoming human-like
'Recent artificial intelligence technology has evolved by creating artificial neural networks to solve various problems and connecting them into huge, complex network structures. Neural machine translation can also be seen as one of these artificial neural networks. The fact that it has advanced at tremendous speed through several stages of development in such a short period is reminiscent of the evolutionary process of the human brain.'
'For now, though, it does not look easy for machine translation to surpass the level of human experts. Every language has developed complex and unique characteristics over thousands of years. To produce an accurate translation, one must understand the culture of the region where the language is spoken, and an advanced thought process is required: grasping the overall context and drawing inferences from appropriate background information.'
Why does LETR, a language processing engine, focus on written language?
'The most important output of work, then and now, is the document. That was true 20 years ago, and even 200 years ago, just as it is today. People write documents to communicate, record, and remember everything from planning to progress to completion.'
'That is why we focus on written language. We firmly believe that writing is a lasting value that will not disappear in the future. We create technologies and services for real-world translation and content management so that anyone can comfortably use content written in Korean as well as in other languages.'
[AI by our side] Does artificial intelligence dream of being an artist? (3)
'Spooner: Can a robot write a symphony? Can a robot turn a canvas into a beautiful masterpiece?
Sonny: Can you?'
'It has been less than 20 years since this movie was released, yet artificial intelligence that writes, composes, and draws has already appeared. Still, perhaps because I am human, I cannot shake off the question in one corner of my mind: "Artificial intelligence, should I recognize you as an artist?" I have always believed that art is something only humans can do, but technology has now reached a level where it is difficult to clearly distinguish works created by machines from those created by humans.'
[AI Story] AlexNet opened the era of humans vs. artificial intelligence (3): Deep learning
'Deep learning follows the principles by which the human brain learns. Hinton believed that, like the human brain, AI should learn knowledge on its own rather than having it programmed in. Of course, his conviction was also vindicated by dramatic improvements in computer performance and the growth of big data.'
'AlexNet was the beginning of a change that opened the heyday of deep learning. It proved that excellent visual perception is possible even for very complex images and video when supported by an appropriate algorithmic structure, learning from sufficient data, and enough computing power.'
Why does artificial intelligence find Korean more difficult?
'From that point of view, it is understandable that Korean is difficult, since natural language processing and machine translation have developed mainly around English. Just as Koreans can learn Japanese more easily than English, machines find it easier to learn French and Spanish, which are similar to English. Another disadvantage is that Korean data is still relatively scarce compared to other languages.'
BLEU score to evaluate machine translation performance
'BLEU measures translation performance by comparing how similar machine translation output is to translations produced by humans. Its advantages are that it can be used regardless of language and is fast to compute. In other words, the more similar the machine-translated sentences are to the human reference translations, the higher the score.'
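As a rough illustration of how this similarity is computed, here is a minimal sentence-level BLEU sketch using NLTK; the sentences are made-up examples, and real evaluations are usually run at corpus level with tools such as sacreBLEU.

```python
# Minimal sketch: sentence-level BLEU with NLTK.
# The reference and hypothesis sentences are made-up examples;
# corpus-level tools such as sacreBLEU are more common in practice.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["the", "cat", "is", "on", "the", "mat"]]   # human translation(s)
hypothesis = ["the", "cat", "sat", "on", "the", "mat"]   # machine translation

# Smoothing avoids a zero score when some n-gram orders have no matches.
score = sentence_bleu(
    reference,
    hypothesis,
    smoothing_function=SmoothingFunction().method1,
)
print(f"BLEU: {score:.3f}")  # closer to 1.0 means more similar to the reference
```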
Related content worth reading together
2021 LETR year-end review (2): Content we liked
2021 Top Artificial Intelligence and Natural Language Processing News (1)
2021 Top Artificial Intelligence and Natural Language Processing News (2)