Who is good at it_ (1) Text summary comparison

2024-07-04

‍

Getting started‍

Since humanity began using languages and characters, the amount of textual information has grown exponentially. The advent of the internet here was like pouring fuel on a burnt house. Thanks to this, humanity's knowledge has become vast, but the time and cost of extracting and using necessary information instead has also increased.

‍

Have you ever had trouble with thousands of pages of books or piles of documents in front of you? As an example “At McKinsey, a global consulting firm, employees spend an average of 1.8 hours (9.3 hours per week) each day searching for and collecting work-related information.”^*Let's say that. Collecting and summarizing texts is a time-consuming and labor-intensive task. As a result, I think people in the past also imagined that it would be nice if machines could summarize long articles on behalf of humans.

‍

But as you know, it actually happened. This is thanks to advances in artificial intelligence, especially natural language processing technology, such as ChatGPT and GPT-4, which have been a hot topic recently. More specifically, Automatic Text Summarization (ATS), which we'll look at this time, was an effective solution.

‍

Now let's take a look at ATS in general and compare the results of selecting and testing a few actual ones.

‍

The goal of ATS

· It compresses important ideas in a long post into a short one.

· Produce meaningful and compelling summaries while retaining important information.

· It helps users understand key content without looking at the original text.

‍

‍Use of ATS

· Book Summarization (Books or Novel Summarization) Since it's not easy to grasp context in a short document, ATS is more suitable for summarizing a relatively long document, such as a novel.

· SNS Summarization (Social Posts or Tweets Summarization) A huge amount of messages are generated every day through social media such as Facebook and Twitter. By using ATS, it is possible to identify useful information in such a large number of messages.

· Sentiment analysis (Sentiment Analysis) Sentiment analysis refers to analyzing people's perspectives, feelings, and judgments about an event or situation. ATS can help experts in various fields, such as market analysts, by summarizing the thoughts of many people.

· News Summarization (News Summarization) ATS can extract the most important points from long news stories.

· Email Summarization Most email content isn't well-structured. ATS can extract common noun phrases and generate summaries using linguistic methods and machine learning algorithms.

‍

In addition, it can also be used for various other tasks, such as summaries of documents in specialized fields such as the following.

· Legal Documents Summarization (Legal Documents Summarization)

· Medical Documents Summarization (Biomedical Documents Summarization)

· Summarization of Scientific Papers

‍

Classification of ATS

‍

Source: A Survey of Automatic Text SummarizedProgress, Process and Challenge

‍

Paper published by IEEE (Institute of Electrical and Electronics Engineers, Institute of Electrical and Electronic Engineers)^[1]According to, the summary method and the structure for classifying algorithms are as above. Broadly speaking, they can be classified in two ways: extractive (extractive) and generative (abstractive).

‍

Extractive Text Summarization

Construct a summary using the sentences in the input text. First, the text is cut into sentences, points are assigned to each sentence according to importance, and the summary work is carried out in the order of selecting the sentences with the highest scores. Since the summary results are created by combining sentences in the input text, the results are relatively stable.

‍

Abstractive Text Summarization

The model generates a new summary based on the content of the input text. First, it is conducted by analyzing the main content of the text and making a kind of paraphrase for summary purposes. Therefore, although the summary results contain the content and meaning of the original text, the summary itself consists of words and structures different from the original text.

‍

Compare summary results

‍

We compared the results obtained using several summary APIs, including ChatGPT, which has been attracting attention recently. (ChatGPT, NaverCloud, Microsoft Azure, ActionPower, a graph-based hybrid model of extractive and abstractive methods, etc.)

‍

Also, we selected conversations that were not normal documents as input text, and conversations using messengers from among them. I wanted to check if I could extract context from short texts while understanding expressions that are not grammatical. As a result, we were able to clearly see the difference between the extractive method and the abstractive method.

‍

Note that the conversation used in the analysis used the Korean conversation summary dataset published on AI Hub. Two different conversation contents were used together, and the actual data entered was as follows.

‍

P1: Ah, but I saw cake companies

P1: The central east side is delicious

P1: I think the design is just that

P2: Yes; I think so too

P3: Gojan-dong cake company also hates delivery

P3: Ansan red bean paste cake

P1: Ah, how long is that?

P1: If you don't have the right time, you won't be able to buy it again

P3: Coordinate the time while making a reservation there

P3: If you do it, they'll bring it to you

P2: I think this place will be fine

P1: Oh yeah, I think that's fine too

P4: I need to buy a toothbrush

P4: Should I buy it without payment?

P2: They even asked me what toothbrush to buy lol

P4: Agwang toothbrush

P4: Because I want to buy again

P4: I change my toothbrush every 3 months...

P3: If it's time to change, buy a good one, so change it

P3: I'm not yet

P3: More than dam

P4: Toothbrushes only last 3 months

‍

And the summary results for each model were as follows:

‍

ChatGPT

In a conversation between P1, P2, P3, and P4, P1 saw a cake company and said the taste was good but the design was plain. P2 said they had the same idea as P1, and P3 said that Gojan-dong's cake company would also deliver it, and recommended the red bean paste cake in Ansan. P4 said they should buy a toothbrush, and they wanted to change it to a good one when the time came.

‍

NaverCloud

I think so too P3: Gojan-dong cake company also delivers in Haejundae P3: Ansan Angeum Mochi Cake P1: Ah, how long is that? P1: If the time isn't right, you won't be able to buy it again P3: Adjust the time while making a reservation P3: If you do it, they'll bring it to you P2: I think this place is fine P1: Ah yes\nP2: I even asked what toothbrush to buy lol P4: Ah, that king toothbrush P4: I want to buy it again, so P4: I change toothbrushes every 3 months... P3: I change toothbrushes every 3 months... P3: I change the toothbrush every 3 months. Damn P4: Toothbrushes only last 3 months

‍

MS Azure

P1: Ah, but I saw cake companies, but P3: The cake company in Gojan-dong didn't do the delivery P1: If the time wasn't right, I wouldn't be able to buy it again, and P4: I change my toothbrush every 3 months...

‍

MS Azure (abstractive)

P1: There are a lot of cake companies redistricting.

‍

ActionPower (large)

If you say that the design is just like that, they say they think so too. \nI'm asking how long the time is. ”, “If the time isn't right, they say you won't be able to buy it again, and they say you have to adjust the time while making a reservation. \nI think it would be fine here just because Angan would bring it to me. \nI want to buy it as a shag, but I want to buy it as a shag. ' , “I want to buy a toothbrush, but I'm wondering if I should buy a royal toothbrush. \nWhen it's time to change, ask them to change it because it's a good idea. \nIt is said that even if you use more, it will become a dam.

‍

ActionPower (medium)

I saw cake companies, but the Jungang-dong one was delicious, and the design was just like that. \nI have to adjust the time while making a reservation, so I can't buy it if the time isn't right. ', 'I need to buy a toothbrush, but I want to buy one. \nIt is said that toothbrushes last for 3 months, but you can use them longer.

‍

ActionPower (small)

I looked at cake companies, and they said that the Jungang-dong one was delicious and the design was just like that. \nWhen it's time to change the toothbrush, ask them to buy a good one and change it.

‍

First of all, the summary results of ChatGPT are relatively natural and seem to grasp the context well. However, in the original text, P3 said, “If it's time to change the toothbrush, buy it as a good one,” but the summary results were not accurate.

Since the summary model supported by Naver Cloud and MS Azure is an extractive method, the conversation summary did not show satisfactory results.

MS Azure provides an abstractive method as a preview, and as shown above, the performance of the Korean summary is poor.

Meanwhile, I checked ActionPower's summary model by setting a different number of sentences in the summary results. As a result of the summary, I was able to confirm that when the number of sentences was set to large, awkward sentences were created, such as “I want to live in a hurry, but I want to live in a daze.” and “It is said that even if I write more, it becomes a dam.” Although such sentences disappeared in Medium and Small, there was a common limitation that it was difficult to distinguish between conversation subjects.

‍

Since the above is a summary of Korean conversation, I also compared the results of summarizing English conversation. English conversation data is the DailyDialog dataset^**It was taken from, and the actual input data is as follows:

‍

P1: Oh, my God! I've been cheated!

P2: What? What did you buy?

P1: It's a brick! I can't believe how stupid I was. Damn it!

P2: What is it? Why did you buy a brick?

P1: There was a guy on the sidewalk. He had three new boxes, Panasonic video camera boxes. He said he had to get rid of them quick.

P2: And?

P1: So he said he'd sell me one for twenty dollars. A four-hundred-dollar camera for twenty bucks.

P2: And it's a brick?

P1: Yes.

P2: I Can't Believe How Stupid You Are. Why didn't you open the box?

P1: Iwanted to. But he said, no, the box is still in its plastic wrap from the store. If it's opened, other people won't believe it's new.

P2: So he had the boxes wrapped in plastic.

P1: Yes, it was the kind of perfect plastic wrap that comes with new products.look, it even had a bar-code price sticker on it! It introduced brand new.

P2: But why would a guy be selling new video cameras for twenty dollars?

P1: He said he had to get rid of them. They were Stolen.

P2: Aha! Now I understand. You were trying to buy Stolen Goods on the Sidewalk. So, what I think is, you thought to get cheated!

P1: C'mon, don't blame me! I thought it was a new camera. We could use it. We were even thinking about buying a camera. I don't understand how he could have such a perfect looking box.

P2: That's no problem. He just has a friend who works in a shop that can do plastic wrap. That's easy. And despite it's the same shop he got the boxes from. Maybe his friend works in an electronics store. They sell video cameras thee.they had a couple video camera boxes lying around. So they get the idea of wrapping bricks to sell to fools like you. Twenty dollars a Brick.That's a good profit if they can sell enough bricks.

P1: You're Right, Sarah I was a fool. I can't believe it. How could I fall for such a trick?

‍

The short summary is that P1 bought a box with a $400 video camera on the street and met someone who sold it for $20, but when they got home and unpacked the box, there were bricks inside.

‍

And here's a summary of the results:

‍

ChatGPT

Person 1 was scammed into buying what they thought was a Panasonic video camera for $20 from a man on the sidewalk, but it turnedout to be a brick. Person 2 person 1 for not checking the contents of the box and supposing the man had the theft of the boxes from an electronics store, wrapped bricks in them, and was selling them for a profit. Person 1 realizes they were foolish to fall for the trick.

‍

MS Azure (abstractive)

P1 tells P2 he bought a brick from a guy on the sidewalk. P2 says he says to get cheated.

‍

ActionPower

There was a guy on the sidewalk. He had three newboxes, Panasonic video camera boxes. So he said he'd sell him one for twentydollars. a four-hundred-dollar camera for twenty bucks. Why didn't you open the box? But he said, no, the box is still in its plastic wrap from thestore. but why would a guy be selling new video cameras for twenty dollars? HE JUST HAS A FRIEND WHO WORKS IN A SHOP THAT THEM DO PLASTIC WRAP. SO THEY HAD A COUPLE VIDEO CAMERA BOXES FROM. THAT'S A GOOD PROFIT IF THEY SELL ENOUGH BRICKS.

‍

ChatGPT's summary results looked natural overall (similar to the Korean summary results), but there were also some factual differences.

MS Azure's abstractive summary results showed much better results than in Korean, but the content was so lost that it was difficult to grasp the context.

Meanwhile, ActionPower's model is a source published on GitHub (different from the Korean conversation summary) (https://github.com/seongminp/graph-dialogue-summary) was used. However, the public version was probably developed in the early days, and it seemed more unnatural than the summary of the Korean conversation.

Additionally, I made another request to ChatGPT with a limit on the amount of summary results. I requested a summary of the same conversation, and the prompts I entered were “Summarize the next conversation in three lines in Korean” and “Summarize this conversation to three discussions.”

‍

ChatGPT

1. P1 I saw a cake company that was delicious and had just the right design. 2. For P2 and P1, I think a cake company on the central east side would be fine. 3. P3 will also deliver to a cake company in Gojan-dong and bring it to you when the time is stable. I need to buy a P4 toothbrush, but I want to change it because it lasts 3 months.

‍

ChatGPT

Person 1 (P1) bought a product that was meant to be a video camera for $20 from a guy on the sidewalk. The product was in ABRAND NEW BOX AND WRAPPED IN PLASTIC WITH A BARCODE PRICE STICKER. Person 2 (P2) Sweets P1 for trying to buy Stolen Goods and Being Foolish for Notchecking the Contents of the Box.

‍

As above, you can see that it was tailored to the user's requirements as much as possible. However, we were also able to confirm some factual distortions or omissions.

‍

While finishing

‍

So far, we've compared summary results using several models, including ChatGPT. I was able to confirm that it provided plausible summary results even though it was an everyday conversation rather than in prose form. However, we were able to confirm that ChatGPT, which is currently considered one of the most advanced artificial intelligences, is not perfect yet, and the other models have their own unique strengths. As a result, it seems difficult to determine which model is the best, and an approach of selecting and applying the one that suits the usage situation is necessary.

‍

* Quote https://www.aitimes.com/news/articleView.html?idxno=140791

** http://yanran.li/dailydialog

‍