The second phase of the SOL project, which ran for three months, recently came to a close. Two interns from Hankuk University of Foreign Studies took their first steps toward becoming natural language processing (NLP) experts with the LETR team. Just as in the first round, the participating interns had the chance to interact, collaborate, and gain hands-on project experience with LETR team researchers through mentoring.
The SOL Project records these precious experiences. Once again, we met with Lee Yun-jae and Lee Ho-jae, the interns who spent the past three months more intensely than anyone else. Here are the stories they shared about what they learned, felt, and thought along the way.
Everyone is curious about you. Please introduce yourselves briefly!
Yun-jae: Hello! I'm Lee Yun-jae, and I participated in the 'Translator Performance Evaluation' project in SOL Project 2. I majored in English Language Convergence Software at Hankuk University of Foreign Studies and am about to graduate. My main interests right now are combining language education with natural language processing and deriving insights through language data analysis.
Ho-jae: Hello! I'm Lee Ho-jae. I also studied natural language processing while majoring in English at Hankuk University of Foreign Studies, and I plan to continue my studies at Korea University's graduate school. Before this internship, I participated in a chatbot-related project during the 'Data Youth Campus' program run by the Korea Data Industry Development Institute.
Why did you become interested in the field of natural language processing (NLP)?
Yun-jae: I served in the military as an Air Force interpreter, and I became interested when I saw news about natural language processing technology at the time. I found it fascinating that linguistics and computer science could be combined to create services that directly help people struggling with language barriers.
Ho-jae: I was very interested in computational linguistics, which sits on the boundary between computer science and linguistics. I thought linguistics could contribute to the development of artificial intelligence, so I mainly studied linguistics in class and computers in my spare time. In the process, I naturally became more interested in the computing side and focused more on studying artificial intelligence and natural language processing.
Please tell us what led you to participate in the SOL project.
Yun-jae: I attended a lecture by the Twig Farm LETR team at the Data Youth Campus event held at my school in 2021. After that, I saw the recruitment announcement for SOL project participants. I applied because I thought it was a great opportunity to see what topics are actually being researched, how data is constructed in practice, and to experience natural language processing work firsthand.
Ho-jae: I was curious about natural language processing in industry. In fact, even before applying, I had been studying natural language processing through books and projects for a year and a half. But I wanted to know what kinds of projects are carried out in the field, what data is used, and how collaboration and decision-making actually happen, so I applied.
What did you work on in the SOL project?
Yun-jae: I recorded the progress of each part of the project in a notebook, submitted it as a weekly work report every Monday, and worked on the project while receiving feedback in a Wednesday meeting.
Specifically, I started by surveying previous papers on translator performance evaluation metrics, such as QE (quality estimation) and n-gram-based metrics. Next, human evaluation metrics such as HTER (human-targeted translation edit rate) and DA (direct assessment) were decided on, and then the parallel corpus data to be used in the project was built and refined. After that, I also took charge of part of the QE model training, checked the test results, and wrapped up by reflecting them in the final report.
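For context, here is a rough sketch of how the automatic metrics mentioned above can be computed, assuming the open-source sacrebleu library; the sentences are invented for illustration and this is not the project's actual evaluation pipeline.

```python
# A rough sketch, for illustration only, of n-gram and edit-rate metrics
# using the open-source sacrebleu library. The sentences are made up and
# this is not the project's actual evaluation pipeline.
import sacrebleu
from sacrebleu.metrics import TER

hypotheses = ["the cat sat on the mat"]            # machine translation output
references = [["the cat is sitting on the mat"]]   # reference translation(s)

# n-gram overlap between the hypothesis and the reference
bleu = sacrebleu.corpus_bleu(hypotheses, references)

# Translation edit rate; computed against a human post-edited reference,
# this is what HTER measures.
ter = TER().corpus_score(hypotheses, references)

print(f"BLEU: {bleu.score:.2f}")
print(f"TER:  {ter.score:.2f}")
```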
Ho-jae: I researched and developed a model that can automate translation quality assessment. Starting from questions about 'translation quality', I looked for ways to automate the evaluation using deep learning. In the process, I concluded that methods based on semantic similarity, such as BERTScore and Sentence-BERT, could be used. After that, we went from modeling to fine-tuning, and tested and observed how the model evaluates the quality of translated text in various situations.
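As a rough illustration of the semantic-similarity idea Ho-jae describes, the sketch below scores a candidate translation against a reference using the public bert-score and sentence-transformers libraries; the model names and sentences are assumptions for illustration, not the project's actual setup.

```python
# A minimal sketch of semantic-similarity scoring with BERTScore and
# Sentence-BERT. Model names and sentences are illustrative assumptions.
from bert_score import score
from sentence_transformers import SentenceTransformer, util

reference   = ["The meeting was postponed until next week."]
translation = ["The meeting has been delayed to next week."]

# BERTScore: token-level similarity between translation and reference
P, R, F1 = score(translation, reference, lang="en")
print(f"BERTScore F1: {F1.item():.3f}")

# Sentence-BERT: cosine similarity of whole-sentence embeddings
model = SentenceTransformer("all-MiniLM-L6-v2")
emb_ref = model.encode(reference, convert_to_tensor=True)
emb_hyp = model.encode(translation, convert_to_tensor=True)
print(f"Sentence-BERT cosine: {util.cos_sim(emb_hyp, emb_ref).item():.3f}")
```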
What was it like working on the project with the team members?
Yun-jae: I learned from experience how important it is to set a clear direction for a project and to divide tasks clearly, and I reflected on myself a lot in the process. I also got a real sense of the gap between the artificial intelligence and natural language processing I learned at school and the knowledge needed in actual work, which became an opportunity to set a new direction for what to study after the internship.
Ho-jae: Above all, I realized how important collaboration and communication are. When I shared and discussed my ideas instead of just grinding away on my own, things often moved in good directions I hadn't thought of before. So I tried to listen attentively to other people's opinions and to actively express my own.
Now that the project is over, what work was most memorable?
Yun-jae: I realized that, given the nature of parallel corpora, even pre-processed data requires careful refinement. There were many twists and turns, such as mistakes made during refinement and errors while using unfamiliar libraries such as PyTorch, but fortunately I was able to work through them little by little with help from the people around me. In the end, writing the final report and looking back over the entire project gave me a great sense of accomplishment.
Ho-jae: Our project is based on a multilingual model. Out of curiosity, I fed in sentences in various languages such as Korean, English, and Japanese, and it was interesting to see that sentences with similar meanings actually received high scores. I wondered how far it could go, so I contacted friends at school majoring in other languages, such as Chinese, Russian, and French, collected sentences in those languages, and put them into the model. I still remember how amazed we all were when we looked at the results together 😂
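The cross-language behavior Ho-jae describes can be reproduced in a few lines with a multilingual sentence-embedding model; the sketch below is an illustration under an assumed public model and made-up sentences, not the model the team actually trained.

```python
# A minimal sketch of the cross-lingual check described above: a multilingual
# sentence-embedding model gives high similarity to sentences that mean the
# same thing in different languages. Model name and sentences are assumptions.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

sentences = [
    "I am going to school today.",            # English
    "나는 오늘 학교에 갑니다.",                  # Korean, same meaning
    "私は今日学校に行きます。",                  # Japanese, same meaning
    "The weather has been terrible lately.",  # English, different meaning
]

embeddings = model.encode(sentences, convert_to_tensor=True)
similarity = util.cos_sim(embeddings, embeddings)

# The first three sentences should score much higher with one another
# than any of them does with the last one.
print(similarity)
```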
Is there anyone in particular you'd like to say something to now that the project is over?
Yun-jae: First, I'd like to express my apologies and gratitude to my teammate Ho-jae. The overall schedule was disrupted by a mistake during data refinement, and there were many parts I couldn't immediately understand once we entered the actual implementation phase. Every time I ran into difficulties, talking with Ho-jae helped me learn a lot and find the right direction.
I would also like to thank managers Kim Hyun-ah and Ko Won-hee, the LETR team researchers who mentored us. We were able to finish well thanks to their consideration in so many ways, including advice from various angles on dataset construction and the choice of evaluation metrics.
Finally, I still remember CEO Baek Seon-ho of Twig Farm encouraging me to 'dream big' during orientation. I will do my best to become a researcher with skills to match that big dream.
Ho-jae: Manager Ko Won-hee, I'm sorry for shattering the dream of a 0.71 correlation coefficient after just five days 😅😥 You helped me a lot as a mentor back at the Data Youth Campus, and I was able to finish well thanks to the core idea of the multilingual model at the start of the project. Thank you!
What fields would you like to tackle and what goals would you like to achieve in the future?
Yun-jae: Translator performance evaluation metrics are a field that has developed rapidly over the last 20 years, so the prior research is extensive. This time, too, I had to search through a large number of papers in a short period, and in doing so I learned about the latest AI architectures and natural language processing research trends. Thanks to this, I want to try more diverse natural language processing projects based on what I learned here. Going forward, I'd like to work on things like a Korean dialect translator, natural Korean text generation, style transfer, and speech synthesis.
My long-term goal is to build and provide solutions that help people who, for various reasons, experience difficulties in their everyday language life.
Ho-jae: I'm not sure yet, but first I want to go to graduate school and try research related to commonsense reasoning in artificial intelligence. In the longer term, I'd like to find ways linguistics can help the development of artificial intelligence and do research where the two fields grow together. Since I started out in linguistics, my ultimate goal is research that connects linguistics and artificial intelligence.
Finally, please leave a message to the juniors who will be participating in the project in the future.
Yun-jae: The most important thing is to keep a systematic record of your progress. Since each topic requires extensive research, record your findings promptly in a form the whole team can give feedback on. That way, you won't get lost along the way.
Time management is also very important. It's a good idea to set aside time each day to fully invest in the project.
Ho-jae: If you persevere, you will definitely get good results! Even if your training results are lost because the Google Colab runtime disconnects, even if you can't figure out what a paper is saying, even if you hit an overwhelming wall and wonder 'why won't this work?' 👀 ... you can persevere and solve it step by step!
Everyone, congratulations in advance on this great opportunity, and I hope you all finish the project well!
Related content worth reading together
Planting the seeds of tomorrow's AI developers, SOL Project Intern Interview 1
Planting the seeds of tomorrow's AI developers, SOL Project Intern Interview 2
Planting the seeds of tomorrow's AI developers, SOL Project Intern Interview 3