A Lively Chat Platform via Live Translation
Getting started
For AI to handle online colloquialisms such as “boo dang boo hae ❤️”, there are two possible approaches. The first is to refine the anomalous data in batches. For example, the expression above could be replaced with a refined sentence such as “I love my dog.”
The advantage of this method is clear. Most existing models are trained on refined data, so if anomalous expressions can be converted into refined ones, existing models and data can be reused as they are. However, the disadvantage is just as clear: the more anomalous expressions there are, the harder it becomes to convert them all into refined expressions.
When particular words or memes spread widely online, the number of derived words tends to grow and the variations become more extreme. In some online spaces, “dang dang” is used to describe a cat and a dog combined, and “golden-dang” means a golden retriever. “Dang Dang” itself also appears in several other spellings. The question is whether every one of these variants can be mapped to “puppy”; in the worst case, the internet would have to be monitored continuously to catch new variants as they appear.
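To make the maintenance problem of the first approach concrete, here is a minimal sketch of dictionary-based normalization. The lexicon `SLANG_TO_STANDARD` and its romanized entries are hypothetical, modeled on the examples in the text rather than on any real normalization dictionary, and plain string replacement is a deliberate simplification. The point it shows: a known variant is refined correctly, but a new spelling passes through untouched until someone notices it and adds it by hand.

```python
# Hypothetical slang lexicon: romanized entries modeled on the examples in
# the text, not taken from any real normalization dictionary.
SLANG_TO_STANDARD: dict[str, str] = {
    "golden-dang": "golden retriever",
    "dang dang": "puppy",
}


def normalize(sentence: str, lexicon: dict[str, str]) -> str:
    """Rewrite every known slang expression in `sentence` into its standard form.

    Longer entries are applied first so that a long expression is not
    partially rewritten by a shorter entry it happens to contain.
    """
    result = sentence
    for slang in sorted(lexicon, key=len, reverse=True):
        result = result.replace(slang, lexicon[slang])
    return result


if __name__ == "__main__":
    # A variant that is in the lexicon is normalized.
    print(normalize("i love my dang dang", SLANG_TO_STANDARD))
    # A new spelling of the same word is not: it passes through unchanged
    # and has to be discovered and added to the lexicon manually.
    print(normalize("i love my dangdang-ie", SLANG_TO_STANDARD))
```

Every new variant means another lexicon entry, which is exactly the continuous monitoring cost described above.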
The second way to respond to anomalous online colloquialisms
The other option is to train a model on a huge amount of data, which is the approach ChatGPT takes. If a particular word such as “dang” appears in thousands or tens of thousands of examples, the model can infer its meaning from that data.
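The idea that a word's meaning can be inferred from the data it appears in rests on a simple intuition: words used in similar contexts tend to mean similar things. The toy sketch below illustrates only that intuition, not how ChatGPT works internally. The five-sentence `CORPUS` is made up, and “dang” is a stand-in slang word; each word is represented by counts of its neighbors, and cosine similarity compares those counts.

```python
import math
from collections import Counter, defaultdict

# Toy corpus (made up for illustration): "dang" appears in the same contexts
# as "puppy". Real models learn from vastly more text than this.
CORPUS = [
    "my puppy wags its tail",
    "my dang wags its tail",
    "the puppy barks at night",
    "the dang barks at night",
    "the cat sleeps all day",
]


def context_vectors(sentences: list[str], window: int = 2) -> dict[str, Counter]:
    """Count, for every word, the words occurring within `window` positions of it."""
    vectors: dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    vectors = context_vectors(CORPUS)
    print(round(cosine(vectors["dang"], vectors["puppy"]), 3))  # 1.0
    print(round(cosine(vectors["dang"], vectors["cat"]), 3))  # 0.236
```

Because “dang” shares every context with “puppy” and only one with “cat”, it lands next to “puppy”. With only a handful of sentences this is fragile, which is why the approach needs so much data.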
The advantage of this method is that it can respond flexibly to new anomalous expressions, and the model can recognize that text containing an anomalous expression differs from ordinary text. For example, if you give ChatGPT a sentence of online colloquial language and ask it to rewrite the sentence so that it is easy to read, it recognizes that the sentence is a heavily distorted form and recovers the original meaning. (Source: https://www.insight.co.kr/news/430720 )
The downside is that this method requires a huge amount of data and computing power, and that is exactly what makes online colloquialisms so hard to handle well.
However, where there is a mountain, someone will climb it. LETR set out to build a translator specialized in online colloquialisms, combining two techniques into a model that performs well even with little data. One of them is data augmentation, which creates varied training data by transforming an existing dataset in different ways. It is mainly used in computer vision (image processing): a computer treats even a slightly zoomed-in or rotated image as different from the original, so each modified copy serves as an additional training example. Common transformations include rotating, flipping, zooming, shifting, and changing the brightness or color of the image.
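A few of the image transformations listed above can be sketched in plain Python. This is a minimal illustration, assuming a grayscale image stored as a list of pixel rows (0 to 255); the function names are hypothetical, zooming is omitted for brevity, and real pipelines would normally use an image library instead.

```python
# A grayscale "image" is modeled as a list of rows of pixel values (0-255).
Image = list[list[int]]


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def rotate_90_clockwise(img: Image) -> Image:
    """Rotate the image a quarter turn clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def shift_right(img: Image, n: int, fill: int = 0) -> Image:
    """Shift every row n pixels to the right, padding the left edge with `fill`."""
    width = len(img[0]) if img else 0
    n = min(n, width)
    return [[fill] * n + row[: width - n] for row in img]


def adjust_brightness(img: Image, delta: int) -> Image:
    """Add `delta` to every pixel, clamping the result to the 0-255 range."""
    return [[min(255, max(0, p + delta)) for p in row] for row in img]


def augment(img: Image) -> list[Image]:
    """Produce several variants of one image; each keeps the original label."""
    return [
        flip_horizontal(img),
        rotate_90_clockwise(img),
        shift_right(img, 1),
        adjust_brightness(img, 40),
    ]


if __name__ == "__main__":
    original = [[10, 20], [30, 40]]
    for variant in augment(original):
        print(variant)
```

One labeled image becomes five training examples, and the label ("cat") stays valid for all of them.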
By comparison, data augmentation for language is limited. A cat is still a cat when its image is flipped, but a sentence written backwards is no longer the same sentence.
“Hello” vs. “Yose Ha Ning An” (the Korean greeting read backwards)
Four typical methods are used for language data: replacing specific words with synonyms (synonym replacement), randomly inserting or deleting words (random insertion / random deletion), swapping the positions of two random words (random swap), and back translation.
However, not all four suit Korean. Back translation was left out of consideration because it relies on translation itself, and experiments found that RD (Random Deletion) and RS (Random Swap) work well on ordinary Korean corpora. The remaining two, SR (Synonym Replacement) and RI (Random Insertion), must be used with special care. (Source: https://github.com/catSirup/KorEDA/tree/master/)
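Random deletion and random swap, the two operations reported to suit Korean, are simple enough to sketch. This is a minimal illustration in the spirit of EDA, not the KorEDA code itself: the function names are hypothetical, and splitting on spaces (into Korean eojeol units) is an assumed simplification of tokenization.

```python
import random


def random_deletion(words: list[str], p: float, rng: random.Random) -> list[str]:
    """Drop each word independently with probability p.

    At least one word is always kept so the sentence never becomes empty.
    """
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]


def random_swap(words: list[str], n: int, rng: random.Random) -> list[str]:
    """Swap the positions of two randomly chosen words, n times."""
    result = list(words)
    if len(result) < 2:
        return result
    for _ in range(n):
        i, j = rng.sample(range(len(result)), 2)  # two distinct positions
        result[i], result[j] = result[j], result[i]
    return result


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    # "I really love our puppy", split on spaces (assumed tokenization).
    sentence = "나는 우리 강아지를 정말 사랑해".split()
    for _ in range(3):
        print(" ".join(random_deletion(sentence, 0.2, rng)))
        print(" ".join(random_swap(sentence, 1, rng)))
```

Both operations only drop or reorder words that are already in the sentence, so they never introduce a word with the wrong meaning; SR and RI, by contrast, depend on choosing suitable replacement or inserted words.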
Here, LETR noted that although online colloquialisms are difficult to process mechanically, they readily produce synonyms. Because the same meaning is expressed in many different ways, the data can actually be enlarged considerably. Various types of noise were also added to the original text to increase the size of the data. By training a customized translator on this augmented data, together with a special technique unique to LETR, we confirmed dramatic results: both the Korean-Chinese and the Korean-Japanese translation models outperformed all three other translation services.
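The general idea of enlarging a parallel corpus with synonym variants and noise can be sketched as follows. This is a hypothetical illustration, not LETR's actual method, which the text does not disclose. `SYNONYM_GROUPS` (강아지 "puppy" alongside the slang or cutesy forms 댕댕이 and 멍멍이), the two noise types (dropped spaces and repeated letters), and all function names are assumptions. Only the source side is varied, so every variant keeps the same target translation.

```python
import random

# Hypothetical synonym groups: 강아지 ("puppy") alongside the slang/cutesy
# forms 댕댕이 and 멍멍이. Members of a group are treated as interchangeable.
SYNONYM_GROUPS: list[list[str]] = [
    ["강아지", "댕댕이", "멍멍이"],
]


def synonym_variants(sentence: str, groups: list[list[str]]) -> list[str]:
    """Return a new sentence for every other member of each group found in `sentence`."""
    variants: list[str] = []
    for group in groups:
        for expr in group:
            if expr in sentence:
                variants.extend(sentence.replace(expr, alt) for alt in group if alt != expr)
    return variants


def add_noise(sentence: str, rng: random.Random, p: float = 0.1) -> str:
    """Inject chat-style noise: drop each space, and repeat each letter, with probability p."""
    out: list[str] = []
    for ch in sentence:
        if ch == " ":
            if rng.random() >= p:  # keep the space with probability 1 - p
                out.append(ch)
            continue
        out.append(ch)
        if ch.isalpha() and rng.random() < p:  # e.g. "해" -> "해해"
            out.append(ch)
    return "".join(out)


def augment_pair(
    source: str,
    target: str,
    groups: list[list[str]],
    rng: random.Random,
    n_noisy: int = 2,
) -> list[tuple[str, str]]:
    """Expand one (source, target) translation pair into several training pairs.

    Every variant keeps the same target, so the model sees many surface
    forms that all map to one translation.
    """
    sources = [source] + synonym_variants(source, groups)
    pairs = [(s, target) for s in sources]
    for s in sources:
        for _ in range(n_noisy):
            pairs.append((add_noise(s, rng), target))
    return pairs


if __name__ == "__main__":
    rng = random.Random(7)  # fixed seed for a reproducible run
    # The English target is used here only for readability.
    for src, tgt in augment_pair("우리 강아지 사랑해", "I love my puppy.", SYNONYM_GROUPS, rng):
        print(src, "->", tgt)
```

With the default settings, one sentence pair becomes nine: the original, two synonym variants, and two noisy copies of each of those three sources.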
The specific way we built it will be covered in Episode 3...
Related reading
🔗 How to respond to online colloquial language, Part 1: Refining anomalous corpus data
Editor | Ko Won-hee
wonhee.go@twigfarm.net