Teaching AI translators: 01. Why is a corpus needed?
2024-07-04

Introduction

Artificial intelligence has a longer history than one might expect. Because it is such an advanced technology, it can seem like a recent arrival, but research* on AI actually began with the advent of computers in the late 1940s.

As a result, the idea of artificial intelligence is already familiar. People have long imagined it in many forms: on screen, highly advanced AI wages war to control humanity, or acts as a translator that lets us talk freely with unknown aliens. Yet compared with the expectations and fears born of these vague imaginings, AI has had little real impact on our everyday lives.

Then an event occurred that made AI suddenly feel close at hand: Lee Se-dol, a 9-dan professional (the highest rank in Go), lost a match to AlphaGo. Machines had already beaten humans at chess, but Go has far more possible positions, so it was widely believed that machines would struggle to surpass humans at it.
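
As a rough illustration of how much larger Go's search space is, a common back-of-the-envelope estimate of game-tree size is $b^d$, where $b$ is the average number of legal moves per turn (the branching factor) and $d$ is the typical game length in moves. The values used below, $b \approx 35, d \approx 80$ for chess and $b \approx 250, d \approx 150$ for Go, are commonly cited approximations, not exact counts:

$$
\text{chess: } 35^{80} = 10^{80 \log_{10} 35} \approx 10^{80 \times 1.544} \approx 10^{123}
$$

$$
\text{Go: } 250^{150} = 10^{150 \log_{10} 250} \approx 10^{150 \times 2.398} \approx 10^{360}
$$

Both figures are far beyond exhaustive search, but the estimate for Go exceeds the one for chess by more than 200 orders of magnitude.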

Deep learning overturned the prevailing assumptions about the limits of artificial intelligence, and it is what allowed AlphaGo to shine. The basic idea is to have a machine learn from a large amount of data so that it becomes more likely to solve a given problem correctly. AlphaGo started by learning from Go game records (gibo)* accumulated over many years, then greatly improved its performance through extensive self-play.

AI translation and corpus

AI has surpassed humans at Go, a game that demands complex strategic thinking. Why, then, do Google Translate and Papago still make so many translation errors instead of surpassing human translators?

Compared with Go, where the number of possible cases, however large, is finite, language is a far larger world. Expressions change with time, region, speaker, and situation. Even if humans define criteria for judging which expression is appropriate, there are so many variables that it is inherently difficult for a machine to make those judgments on its own.

Above all, there is not enough training data comparable to Go's game records. English translation in specialized fields, where terminology is limited and relatively large amounts of data exist, is in a better position. By contrast, data for languages other than English, and for the colloquial language of everyday life, is still lacking.

That is why the surest way to improve today's translators is to create good data. With high-quality data serving as their textbook, translators will naturally perform better. For example, the training data for a Korean-English translator consists of sentence pairs, each made up of a Korean sentence and its English translation. A collection of such pairs is called a corpus (more precisely, a parallel corpus).
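
To make the idea of sentence pairs concrete, here is a minimal Python sketch of how a Korean-English parallel corpus might be held in memory. The `SentencePair` type, the `parse_parallel_tsv` helper, and the tab-separated storage format are hypothetical choices for illustration, not LETR's actual data format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SentencePair:
    """One training example for a Korean-to-English translator."""
    ko: str  # source sentence (Korean)
    en: str  # target sentence (English translation)


def parse_parallel_tsv(text: str) -> list[SentencePair]:
    """Parse 'Korean<TAB>English' lines into sentence pairs.

    The tab-separated layout is a hypothetical storage format used only
    for illustration. Lines without exactly one tab are skipped.
    """
    pairs = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 2:
            continue  # malformed line: not a single source/target pair
        ko, en = (field.strip() for field in fields)
        pairs.append(SentencePair(ko=ko, en=en))
    return pairs


# A tiny hypothetical corpus with two aligned sentence pairs.
sample = "안녕하세요.\tHello.\n오늘 날씨가 좋네요.\tThe weather is nice today.\n"
corpus = parse_parallel_tsv(sample)
for pair in corpus:
    print(f"{pair.ko} -> {pair.en}")
```

Each pair keeps the source and target side together, which is the property a translator's training data relies on: the model learns from which English sentence corresponds to which Korean sentence.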

Of course, a strong model is a prerequisite, but building a good corpus matters just as much for improving machine translation. That is why LETR works hard to secure as much high-quality corpus data as possible.
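
What counts as "high quality" is not spelled out here, but a few simple checks are widely used when cleaning parallel corpora. The sketch below assumes three illustrative heuristics: neither side may be empty, exact duplicate pairs are dropped, and the character-length ratio between the two sentences is capped. The `filter_pairs` name and the 3.0 threshold are hypothetical and do not describe LETR's actual pipeline.

```python
from __future__ import annotations


def filter_pairs(pairs: list[tuple[str, str]], max_ratio: float = 3.0) -> list[tuple[str, str]]:
    """Keep sentence pairs that pass simple, illustrative quality checks.

    - both sides are non-empty after stripping whitespace
    - the exact (source, target) pair has not been seen before
    - the longer side is at most max_ratio times the shorter side
      (an arbitrary illustrative threshold, not a tuned value)
    """
    seen = set()
    kept = []
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt:
            continue  # missing translation or empty source
        if (src, tgt) in seen:
            continue  # exact duplicate adds no new information
        longer, shorter = max(len(src), len(tgt)), min(len(src), len(tgt))
        if longer / shorter > max_ratio:
            continue  # lengths too different: likely misaligned
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept


raw = [
    ("안녕하세요.", "Hello."),
    ("안녕하세요.", "Hello."),  # exact duplicate
    ("감사합니다.", ""),  # missing translation
    ("네.", "The meeting has been moved to Thursday afternoon."),  # misaligned
    ("오늘 날씨가 좋네요.", "The weather is nice today."),
]
print(filter_pairs(raw))
```

A length-ratio check is only a crude signal of misalignment. Korean text is often shorter in characters than its English translation, so any real threshold would need to be tuned on actual data.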

This concludes the first installment on corpora for training AI translators.

Next time, we will look at corpus generation: the actual process of building a corpus.

References
  • History of artificial intelligence: https://ko.wikipedia.org/wiki/인공지능#역사
  • Gibo (game record): a record of a game of Go or janggi (Korean chess). Source: Standard Korean Dictionary
  • Corpus: a set of language samples extracted for a specific purpose for natural language research. https://ko.wikipedia.org/wiki/말뭉치

ⓒ 2024 LETR WORKS. All rights reserved.