This post has been updated to reflect the latest trends as of 2023; please refer to the article below.
NER's Present and Future Ver. 2: Korean NER Data Set Summary
What is NER?
NER (Named Entity Recognition) is, literally, the recognition of named entities, that is, objects that have names.
According to the ICT Glossary provided by the Korea ICT Association, NER is defined as follows:
A technique for recognizing words (named entities) in a document that correspond to predefined categories such as person, company, place, time, and unit, and extracting and classifying them. Extracted entities are classified as person, location, organization, time, and so on. NER originated for the purpose of information extraction and is used in natural language processing, information retrieval, etc.
※ Example: Cheol-su [person] promised to meet Young-hee [person] at Seoul Station [location] at 10 o'clock [time].
Meanwhile, the paper 'A Survey on Deep Learning for Named Entity Recognition' [1] explains it as follows:
"NER is the process of locating and classifying named entities in text into predefined entity categories."
In other words, in practical terms, NER can be described as a multi-class classification task that takes a string as input and outputs the corresponding tag for each word.
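To make that framing concrete, here is a minimal sketch in Python. The "model" is just a hypothetical dictionary lookup standing in for a trained classifier, but the input/output shape is the same: one tag per token, with "O" meaning "not an entity."

```python
# Toy stand-in for a trained NER model: a fixed lookup table.
# A real classifier would predict these tags from learned features.
TAGS = {"Cheol-su": "PER", "Young-hee": "PER", "Seoul": "LOC"}

def toy_ner(tokens):
    """Return one tag per input token; 'O' marks non-entities."""
    return [TAGS.get(tok, "O") for tok in tokens]

sentence = "Cheol-su met Young-hee in Seoul".split()
print(list(zip(sentence, toy_ner(sentence))))
# [('Cheol-su', 'PER'), ('met', 'O'), ('Young-hee', 'PER'), ('in', 'O'), ('Seoul', 'LOC')]
```

However simple, this already shows why NER is multi-class classification: every token is assigned exactly one class out of a fixed tag set.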
Then what is an NE (Named Entity, hereafter NE)?
NE emerged as a term encompassing not only organization names, people, and places in text, but also currency, time, and percentage expressions. Its scope has since varied slightly from researcher to researcher, but it is generally understood to cover proper nouns such as personal names and place names.
There are two types of NE as defined above.
First, generic NEs: names such as those of people or places fall under this category.
Second, domain-specific NEs: terms from specialized fields.
At Twigfarm, for example, the first type is handled by a trained NER algorithm, and the second by a predefined glossary (Translation Memory*) to improve translation quality.
Why NER is needed
NER plays an important role throughout natural language processing (NLP). It is used in various fields such as information retrieval, summarization, question answering, and knowledge base construction. [2] In particular, it improves the quality of machine translation (Machine Translation, hereinafter MT) and helps provide customized translations to users.
For example, if "TWIGFARM" is translated literally, it comes out as "tree branch farm" rather than the company name. This not only causes translation errors but also makes for an unpleasant user experience. If, on the other hand, TWIGFARM is properly recognized as a company name, both translation quality and user experience improve.
It has been known since the early days of MT that its quality can be improved through NER. According to Babych and Hartley, "If a named entity is misinterpreted as a common noun, the sentence itself becomes difficult to understand, and it is costly to correct." [3] Ugawa et al. have also demonstrated that translation quality improves when NER is incorporated into neural machine translation (NMT). [4]
NER performance evaluation index
NER performance is evaluated using precision*, recall*, and F1 score*, measured in token* units rather than at the sentence level.
For example, given the sentence "I work at TWIGFARM.", the sentence is first tokenized* so that only the specific word "TWIGFARM" can be marked as a named entity. (There are several ways to tokenize, and results may vary depending on the tool used.)
The example sentence is divided into five tokens: 'I', 'work', 'at', 'TWIGFARM', and '.', and each of these five is evaluated.
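As a rough sketch of how such a token-level evaluation might be computed (this is a simplified illustration, not a standard library's scoring routine; real evaluations often score entity spans rather than single tokens):

```python
def token_prf(gold, pred):
    """Token-level precision/recall/F1 over entity tags, ignoring 'O'."""
    tp = sum(1 for g, p in zip(gold, pred) if g == p and g != "O")
    pred_pos = sum(1 for p in pred if p != "O")   # tokens predicted as entities
    gold_pos = sum(1 for g in gold if g != "O")   # tokens that truly are entities
    precision = tp / pred_pos if pred_pos else 0.0
    recall = tp / gold_pos if gold_pos else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# "I work at TWIGFARM ." -- only 'TWIGFARM' is an entity (ORG).
gold = ["O", "O", "O", "ORG", "O"]
pred = ["O", "O", "O", "ORG", "O"]
print(token_prf(gold, pred))  # perfect prediction: (1.0, 1.0, 1.0)
```

Each of the five tokens contributes to the counts, which is exactly what "evaluating in token units" means.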
NER tagging systems and labels
NER divides a sentence into tokens and tags* each token to distinguish whether or not it is part of a named entity. A single entity, however, is sometimes made up of multiple tokens rather than one, as with Western personal names: "Michael Jordan" is one entity, not two.
Tagging systems were introduced precisely to group such multiple tokens into a single entity. There are two main tagging systems, BIO and BIESO, and BIO is the one mainly used in practice.
First, the BIO system adds "B- (begin)" to the token where an entity starts, "I- (inside)" to tokens in the middle of an entity, and "O (outside)" to tokens that are not part of any entity.
The BIESO system likewise adds "B- (begin)" where an entity starts and "I- (inside)" for tokens in the middle, and additionally adds "E- (end)" to the token at the end of an entity. When a single token forms an entity by itself, "S- (singleton)" is added, and "O (outside)" marks tokens that are not entities.
Examples of sentences tagged according to the BIO system
Examples of sentences tagged according to the BIESO system
As shown above, tags other than 'O', that is, 'B-', 'I-', 'E-', and 'S-', are followed by a label identifying what kind of entity each token belongs to. For example, "PER" is appended for persons and "ORG" for organizations. There is no fixed standard for the types and names of labels; researchers choose them to suit the nature of the project.
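The two schemes can be sketched with a small helper that converts entity spans into tags. This is an illustrative implementation under the definitions above, not a reference tool; the span format `(start, end_exclusive, label)` is an assumption made for the example.

```python
def tag_spans(tokens, spans, scheme="BIO"):
    """Tag tokens given entity spans as (start, end_exclusive, label) triples."""
    tags = ["O"] * len(tokens)          # default: not an entity
    for start, end, label in spans:
        if scheme == "BIESO" and end - start == 1:
            tags[start] = f"S-{label}"  # single-token entity
            continue
        tags[start] = f"B-{label}"      # entity begins
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"      # inside the entity
        if scheme == "BIESO":
            tags[end - 1] = f"E-{label}"  # entity ends
    return tags

tokens = ["Michael", "Jordan", "visited", "Seoul"]
spans = [(0, 2, "PER"), (3, 4, "LOC")]
print(tag_spans(tokens, spans, "BIO"))    # ['B-PER', 'I-PER', 'O', 'B-LOC']
print(tag_spans(tokens, spans, "BIESO"))  # ['B-PER', 'E-PER', 'O', 'S-LOC']
```

Note how "Michael Jordan" becomes a single two-token entity under both schemes, while single-token "Seoul" is tagged 'B-LOC' in BIO but 'S-LOC' in BIESO.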
Various approaches to NER and introduction of deep learning
Even before deep learning* was introduced, there were various attempts to solve the NER problem. Although deep learning models dominate today, many models combined traditional approaches as well. Before deep learning, there were three typical types of approaches.
(1) Rule-based approach
: Applies a domain-specific dictionary (gazetteer*) or hand-written patterns.
: Shows low recall despite high precision. In particular, it does not carry over to other domains.
(2) Unsupervised learning* approach
: Learns by clustering* based on contextual similarity.
: Unsupervised systems have also been proposed to build gazetteers. Compared to supervised learning*, these rely on a glossary, corpus* statistics (such as idf or context vectors), or shallow syntactic* knowledge.
(3) Feature-based supervised learning approach
: With supervised learning, NER moves into the territory of multi-class classification* or sequence labeling*.
: Because the approach is feature-based, deciding what the features should be is a crucial issue.
: Hidden Markov Models (HMM)*, Decision Trees*, Maximum Entropy Models*, Support Vector Machines (SVM)*, Conditional Random Fields (CRF)*
: The SVM model does not consider neighboring words when predicting entity labels, whereas the CRF does.
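To illustrate what "feature-based" means here, below is a sketch of the kind of hand-crafted feature function such models consume. The feature names and the `<BOS>`/`<EOS>` boundary markers are conventions chosen for this example, not part of any specific library; note how the neighboring words appear explicitly as features, which is the context a CRF can exploit.

```python
def word_features(tokens, i):
    """Hand-crafted features for token i, including neighbor context."""
    tok = tokens[i]
    return {
        "word.lower": tok.lower(),
        "word.istitle": tok.istitle(),   # capitalized words often start names
        "word.isupper": tok.isupper(),   # all-caps words are often acronyms/orgs
        "prev.lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next.lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

tokens = ["I", "work", "at", "TWIGFARM", "."]
print(word_features(tokens, 3))
# {'word.lower': 'twigfarm', 'word.istitle': False, 'word.isupper': True,
#  'prev.lower': 'at', 'next.lower': '.'}
```

The feature "prev.lower = at" is a strong hint that the token is an organization or location, which is precisely the kind of signal the models listed above are built to use.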
Nowadays, however, most NER problems are solved with deep learning, which has more advantages: no separate feature engineering* is needed, and more complex and sophisticated characteristics can be learned than with linear models. In particular, deep learning makes end-to-end* models possible, where data goes in and results come out without a series of intermediate processing steps.
So far we have covered 'From Concepts to Diverse Approaches', the first topic in the 'NER's Present and Future' series. The series continues with the second topic, 'Model Structure and Data Set Status', and the third, 'Future Development Direction and Goals'.
References