The SOL project is a program run jointly by Twig Farm's LETR team, NIA, and Multicampus to support the development of outstanding artificial intelligence talent. It gives trainees the opportunity to take part in practical business projects and to experience the LETR team's research and development culture up close.
In the first phase of the project, nine aspiring developers worked with the LETR team for over three months, starting in August. They were divided into three groups for team projects so they could experience the field more closely, and they also interacted and collaborated with LETR team researchers through project mentoring.
We heard the stories of the nine participants in the first round, which wrapped up a short while ago. Each of them seems to have taken away something different from the project, and they shared honest accounts of what they learned, thought, and felt through the experience.
In this second installment, we introduce the stories of three participants, Taekhyun, Guhong, and Jisoo, who worked on the 'Profanity and Hate Speech Classification' project. If you haven't read the previous 'Natural Korean' team interview, we recommend reading it as well.
Planting the seeds of tomorrow's AI developers, SOL Project Intern Interview 1
Because it's our first time with artificial intelligence (?)
Hello! Please introduce yourselves.
Taekhyun: Hello! I'm Kim Taek-hyun, and I participated in the 'Profanity and Hate Speech Classification' project. I majored in English Language and Literature at Kookmin University, where I was especially interested in pragmatics and syntactic theory in linguistics. When I came across natural language processing, which has a lot to do with linguistics, I started studying artificial intelligence as well.
Guhong: I graduated from the School of International Liberal Studies at Waseda University in Japan. I came back to Korea after living abroad for about 10 years, in the US and in Japan.
Jisoo: I majored in computer science. I'm interested in artificial intelligence, so I took the natural language processing course at Multicampus.
Why did you become interested in artificial intelligence, and natural language processing in particular?
Taekhyun: Originally my goal was a master's degree in linguistics, but I learned about natural language processing while taking a software-related elective. I was enjoying my syntax classes at the time, and I became curious about how artificial intelligence could tackle the meaning of sentences. I started natural language processing with questions like, "Can a machine hold a conversation at the level of talking to a real person?" and "Can it handle difficult sentences where the surface expression differs from the speaker's intention, such as sarcasm?"
Guhong: I've always been interested in the IT field, so naturally I became interested in artificial intelligence as well. I learned natural language processing through the Multicampus curriculum.
Jisoo: I happened to take part in an industry-academia collaboration project in artificial intelligence at school. I found the results fascinating, and I think I've been interested in artificial intelligence ever since. That project was in the vision field, and I wanted to experience more diverse areas, so I took the Multicampus natural language processing course.
After experiencing the SOL project
What led you to apply for the SOL project?
Taekhyun: Although high-quality datasets such as AI Hub and the Modu Corpus ('Everyone's Corpus') are publicly available, what individuals can do with them seemed limited. In industry, however, you can work directly with corporate data and get support with computing resources, so I hoped more diverse research would be possible. From that perspective, the SOL project seemed like a great opportunity to build research skills.
Guhong: I was curious about how projects are actually carried out in industry, and I applied because I wanted to experience it firsthand. I thought I could learn more by using company-level resources and working with mentors in the field.
Jisoo: I learned about it while taking the Multicampus natural language processing course. During the course I had the opportunity to talk with mentors from Twig Farm's LETR team. Their research area was interesting and the atmosphere seemed good, so I applied to the SOL project thinking it would be a great chance to experience practical work alongside mentors in the industry.
What was your experience with the SOL project?
Taekhyun: We set out to build a profanity and hate speech classifier that can respond to the hateful and offensive comments that often appear in online communities, on social media, and under news articles. We first analyzed relevant prior research and collected suitable data, then completed the classifier by fine-tuning a model, pre-trained on a total of 96 million pieces of data, on a Korean hate speech dataset.
We also uploaded our pre-trained model and the profanity and hate speech classification model to Hugging Face and released them as open source so that anyone interested can use them. On top of that, we built a demo site where users can enter their own sentences and test the detection of profanity and hate speech.
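For readers unfamiliar with what such a classifier looks like from the outside, here is a minimal, purely illustrative sketch in Python. The lexicon and example words are invented, and a keyword lookup stands in for the model: the team's actual system fine-tunes a pre-trained BERT model rather than matching words.

```python
# Toy illustration of a profanity/hate-speech classifier's interface.
# The words in TOXIC_LEXICON are invented examples; a real classifier
# would score the whole sentence with a fine-tuned BERT model.

TOXIC_LEXICON = {"idiot", "stupid", "trash"}

def classify(sentence: str) -> str:
    """Return 'toxic' if any lexicon word appears, else 'clean'."""
    tokens = {t.strip(".,!?").lower() for t in sentence.split()}
    return "toxic" if tokens & TOXIC_LEXICON else "clean"

print(classify("You are an idiot!"))  # toxic
print(classify("Have a nice day."))   # clean
```

The input/output contract is the same as the demo site described above: a sentence goes in, a label comes out.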
Guhong: We built a BERT-based pre-trained model and created the profanity and hate speech classifier by fine-tuning it. We benchmarked Soongsil-BERT, developed at Soongsil University, and adapted the approach to our project while referring to other examples. Many errors along the way set us back, but we resolved them one by one and produced the final result.
Jisoo: We found and analyzed prior studies and datasets to reference, and studied models in order to implement the pre-trained model. After EDA (exploratory data analysis) and preprocessing, we completed pre-training and fine-tuning.
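As a rough idea of what the EDA and preprocessing step can involve, here is a minimal sketch using only the Python standard library. The sample texts and labels are invented for illustration; the actual dataset and cleaning rules used by the team are not described in detail.

```python
import re
from collections import Counter

# Minimal sketch of preprocessing + EDA on a labeled text dataset:
# normalize each text, then inspect the label distribution.

raw_samples = [  # invented toy data
    ("Visit http://spam.example NOW!!!", "toxic"),
    ("what a   lovely day", "clean"),
    ("you are trash", "toxic"),
]

def preprocess(text: str) -> str:
    text = re.sub(r"https?://\S+", "", text)  # drop URLs
    text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace
    return text.lower()

cleaned = [(preprocess(t), label) for t, label in raw_samples]
label_counts = Counter(label for _, label in cleaned)

print(cleaned[0][0])  # visit now!!!
print(label_counts)   # Counter({'toxic': 2, 'clean': 1})
```

Checking the label balance like this, before any training, is what catches a skewed dataset early.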
Since we developed on a server, I got to use a variety of Linux commands, and the 'screen' command in particular stuck with me. Being able to keep a session alive without staying connected to the server was really useful. 👍 The model we pre-trained was BERT-based, and I worked in a GPU environment using PyTorch and TensorFlow, the leading deep learning frameworks. Ah! Our final pre-trained and fine-tuned models are up on Hugging Face. 😊
What did you feel while working on a project with your team members?
Taekhyun: I realized that nothing goes exactly according to plan. I usually plan things out and follow through reasonably well, but this time we had to spend a lot of time building the pre-trained model and chasing unexpected errors. I was quite flustered when errors first appeared, but I got used to them over time and learned to resolve issues by discussing them with teammates and mentors. In the process, I also learned how to search for error-related information and how to ask questions clearly.
Guhong: I felt I still had a lot to learn, but working with my teammates taught me a great deal. It was disappointing that I couldn't be much help when modifying the pre-training code to fit the project, but I learned a lot in the process. This project made me resolve to further improve my coding skills.
Jisoo: There are always things you miss when working on a project, but working with my teammates showed me what it means to fill in each other's gaps. I could share what I knew, learn what I didn't, and have the valuable experience of producing a good result together.
Wrapping up the project, what was most memorable?
Taekhyun: Everything, from the day I first met my teammates to the last moment, is a precious memory I won't forget. Since we were all around the same age, we could talk comfortably, work in a relaxed atmosphere, and build a real sense of closeness through many conversations. No one slacked off, and I think we got good results because we worked so well together. Being around passionate, driven people, I learned a lot and had fun doing it.
Guhong: It was great to present the results of the project myself. There were plenty of errors along the way, but I remember feeling really proud when, after fixing things with my teammates until dawn, everything finally worked.
Jisoo: What I remember most is having to change our working environment three times. We hit errors big and small while building the pre-trained model, and the biggest was OOM (Out of Memory): the memory in our server environment couldn't handle pre-training, so we moved to another server and looked for ways to make the most of our resources. In the end we applied for a high-performance computing server provided by AI Hub, which is how we ended up changing environments three times in total. 😂 It made me realize just how resource-hungry pre-training is. Thanks again to the LETR team for securing high-performance computing resources and helping us finish well!
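Beyond moving to a bigger server, one standard way to work around OOM during training is gradient accumulation: run several small micro-batches and average their gradients before a single parameter update, so only one micro-batch sits in memory at a time. The sketch below illustrates the arithmetic in pure Python; the data and the `fake_gradient` function are invented stand-ins for real backpropagation.

```python
# Sketch of gradient accumulation: simulate an effective batch of 8
# with micro-batches of 2, so only 2 samples are "in memory" at once.

def fake_gradient(batch):
    # Stand-in for backprop: here the "gradient" is just the batch mean.
    return sum(batch) / len(batch)

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
micro_batch_size = 2
accum_steps = 4  # effective batch = 2 * 4 = 8

accumulated = 0.0
for step in range(accum_steps):
    micro = data[step * micro_batch_size:(step + 1) * micro_batch_size]
    accumulated += fake_gradient(micro) / accum_steps

print(accumulated)  # 4.5, identical to the full-batch "gradient"
```

Because the accumulated value matches the full-batch result, the optimizer sees the same update while peak memory stays at the micro-batch size.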
The SOL Project and beyond
What fields do you want to challenge and what goals do you want to achieve in the future?
Taekhyun: I want to research dialogue systems that let people communicate smoothly with artificial intelligence. My goal is to develop a dialogue system built on AI that can sustain discourse at the level of talking to a real person, generate sentences with difficult expressions such as sarcasm, and interact based on common sense.
Guhong: The fields I want to pursue are machine learning and deep learning. While doing natural language processing, I realized it was genuinely hard and didn't suit me. I enjoyed learning machine learning and deep learning at Multicampus, so I'm planning to go to graduate school in that field.
Jisoo: Having experienced both image processing and natural language processing, I became interested in image captioning, which combines the two fields. Think automatic subtitle generation or plot summaries for video content! My dream is to research and develop services that help people through convenient technology.
Finally, please leave a message for the juniors who will experience the project in the future.
Taekhyun: Because the atmosphere is so free, individual effort really matters. Especially if you don't have a specific goal, you're bound to wander a lot at the start. It's important to communicate with your teammates, define the problem, and first sketch the big picture of what you're going to deliver.
I also recommend pestering your mentors a lot! They have far more research and development experience than we do, so don't hesitate to ask questions.
Guhong: You'll learn and feel a lot by bumping into problems and trying things out. If you work while being considerate of your teammates, you'll be able to produce good results.
Jisoo: I recommend recording your progress step by step. You'll run into all sorts of errors, and organizing your solutions in a tool like Notion will definitely help you later.
More content worth reading
Why does LETR, a language processing engine, focus on text languages?
Planting the seeds of tomorrow's AI developers, SOL Project Intern Interview 1