A Lively chat platform via Live translation
How to deal with nonstandard online colloquial language, Part 1
For machine translators to be used commercially, they must satisfy two requirements: translation quality and translation speed. In that respect, translating in an online chat window is a demanding task, because online chat requires stricter quality and higher speed than any other translation task. The need for “fast translation” is obvious once you consider the real-time nature of chat. The need for “accurate translation” stems from that same real-time nature.
Because a translated chat message is very likely to be shown to the other party without any editing, a high-quality machine translator is necessary. Imagine writing an email in a foreign language you are not fluent in. You would first write it in your native language, run it through machine translation again and again, and painstakingly edit the result to capture the nuances you want to express. Naturally, the more time and effort you spend, the better the translated text becomes. On a chat platform, however, you have to communicate with the other party right away. The output of the machine translator must be displayed as is, with no additional editing. If the translator performs poorly, communication may not go smoothly, and major misunderstandings may even occur.
“Accurate” and “fast.” These two requirements alone already give researchers a mountain of work. However, there is a finishing touch that raises the difficulty from intermediate to the highest level: chat is colloquial language used online. On online platforms where anonymity is guaranteed, many people do not care about spelling or spacing, and freely use emojis and other special characters to express their mood and state. Consider the following example, which means “I can't eat it because I don't have it.” The phrase “업서어 못 먹쥬” contains a typo of the kind easily made when tapping on a phone keyboard. It also contains intentional vowel changes that reflect the writer's own speech habits or mood. The machine translators of Company G, Company N, Company D, and Company O translate it as follows.
Original text | 업서어 못 먹쥬 |
Company G | I can't eat it. |
Company N | I can't do that. |
Company D | I can't eat upstairs. |
Company O | I can't eat it because I'm not hungry. |
Company N, Company D, and Company O all produced wrong translations, and Company G's translation dropped the clause meaning “because there is none.” This is certainly not because the machine translators on the market perform poorly. They simply have not learned the way people commonly write in chat. If you change the original text to the refined, clean sentence “없어서 못 먹지요.” (“I can't eat it because I don't have it.”), all four machine translators produce a translation that is faithful to the content of the original, even if they cannot convey the hidden nuance of “it is a food I like very much.”
Original text | 없어서 못 먹지요. |
Company G | I can't eat it because I don't have it. |
Company N | I can't eat it because I don't have it. |
Company D | I don't have it, so I can't eat it. |
Company O | I can't eat because it's not here. |
If you experiment with a variety of other sentences, you can quickly see that conventional machine translators are weak at online colloquial language.
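The contrast between the two sets of translations suggests one simple preprocessing idea: map known nonstandard tokens to their standard forms before the text reaches the translator. The following is a minimal Python sketch of that idea, not the method used by any of the companies above or by the author. The function name `normalize_chat_text` and the lookup table are hypothetical, and the only two entries are taken from the example pair in this article.

```python
from typing import Mapping

# Hypothetical lookup table: nonstandard chat token -> standard form.
# Both entries come from the article's own example pair; a real system
# would need far broader coverage than a hand-written table.
NORMALIZATION_TABLE = {
    "업서어": "없어서",
    "먹쥬": "먹지요",
}


def normalize_chat_text(text: str, table: Mapping[str, str] = NORMALIZATION_TABLE) -> str:
    """Replace each whitespace-separated token that has an entry in the table.

    Tokens without an entry are passed through unchanged.
    """
    tokens = text.split()
    return " ".join(table.get(token, token) for token in tokens)


# The normalized string is what would be handed to a machine translator.
print(normalize_chat_text("업서어 못 먹쥬"))
```

A hand-written table like this covers only the spellings someone has already seen, so it works as an illustration but not as a full solution.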
“Data” holds the key to the weaknesses of machine translators on the market.
A language model basically reproduces the flavor of what it has been fed. A language model fed Korean reproduces Korean, and one fed English reproduces English. The same applies to subtle differences in tone: models trained on a lot of colloquial language perform well on colloquial language, and models trained on a lot of written language perform well on written language. However, online colloquial language, which is our challenge, goes well beyond “subtle” differences, and its usage patterns change dramatically. It is no wonder that the machine translators on the market are perplexed. From a language model's perspective, it probably looks like Hangul mixed with a foreign language. The solution is actually simple. When in Rome, follow Roman law; for online colloquial language, learn from online colloquial language.
However, learning online colloquial language is not a trivial task. Even the simple word “dog” is written in various ways, such as “puppy mouse,” “rat,” “idiot,” and “dang.” The same goes for “I love you.” To convey this meaning, various nonstandard spellings such as “Sharanghae,” “Suranghae,” “Suranghae,” and “Suranghae” circulate on the internet, and on top of this the ending changes depending on the position and usage of the verb. This is the situation at the word level alone; at the sentence level, the range of variation is beyond imagination. To convey the meaning “I love dogs,” there are already 35 ways to write it just by combining the nonstandard words given as examples. And all 35 variations have only one meaning: “I love dogs.”
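The multiplication behind this growth can be sketched in a few lines of Python. The lists below are assumptions for illustration: they reuse the English renderings quoted above, add the standard forms “dog” and “saranghae,” and drop duplicates. They do not include the ending changes, so the sketch does not attempt to reproduce the figure of 35.

```python
from itertools import product

# Illustrative word-level variants (assumed lists, see the note above).
dog_forms = ["dog", "puppy mouse", "rat", "idiot", "dang"]
love_forms = ["saranghae", "Sharanghae", "Suranghae"]

# Every sentence-level variant pairs one "dog" form with one "love" form,
# so the number of variants is the product of the two list lengths.
variants = list(product(dog_forms, love_forms))

print(len(dog_forms), "x", len(love_forms), "=", len(variants))  # 5 x 3 = 15
```

Each additional word with its own set of spellings multiplies the total again, which is why sentence-level variation grows so much faster than word-level variation.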
There are two ways to deal with this nonstandard online colloquial language. The first is to convert the nonstandard data into refined data in bulk, and the second is... <Continued in Part 2>
Related content worth reading
🔗 How to respond to online colloquial language 2. Answers to “Wow Dang Dang Hae ❤️” selected by ChatGPT
Editor | Researcher Ko Won-hee
wonhee.go@twigfarm.net