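Hands-On NLP Text Summarization - A Practical Guide with Generative Examples

Text summarization in NLP is the process of condensing the information in a large text so that it can be consumed faster. In this article, I will walk you through traditional extractive methods as well as advanced generative methods for implementing text summarization in Python.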

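Contents

  1. Introduction
  2. Types of Text Summarization
  3. Text Summarization using Gensim
  4. Text Summarization using sumy
    • LexRank
    • LSA (Latent Semantic Analysis)
    • Luhn
    • KL-Sum
  5. What is Generative Text Summarization
  6. Summarization with T5 Transformers
  7. Summarization with BART Transformers
  8. Summarization with GPT-2 Transformers
  9. Summarization with XLM Transformers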

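Introduction

When you open a news site, do you start reading every article? Probably not. We usually skim a short news summary and then read further if we are interested. Short, informative news summaries are everywhere now: magazines, news aggregator apps, research sites, and so on.

Well, these summaries can be created automatically as news arrives from sources around the world.

The method of extracting such summaries from a large original text without losing vital information is called text summarization. It is essential that the summary be fluent and coherent and that it convey what is important.

In fact, Google News, the Inshorts app and various other news aggregator apps make use of text summarization algorithms.

In this post, I discuss and use various traditional and advanced methods to implement automatic text summarization.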

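Types of Text Summarization

Text summarization methods fall into two broad categories: extractive and generative (also called abstractive).

  • Extractive text summarization

This is the traditional approach, developed first. The main goal is to identify the important sentences in the text and add them to the summary. Note that the resulting summary contains exact sentences from the original text.

  • Generative text summarization

This is a more advanced approach, and new advances appear frequently (I will cover some of the best here). The idea is to identify the important parts, interpret the context, and reproduce it in a new way. This ensures that the core information is conveyed in the shortest possible text. Note that the sentences in the summary are generated, not merely extracted from the original text.

In the following sections, I will discuss different extractive and generative methods. At the end, you can compare the results and see the strengths and limitations of each method.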

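Text Summarization using Gensim with TextRank

gensim is a very handy Python library for NLP tasks. Its text summarization is based on the TextRank algorithm.

What is the TextRank algorithm?

TextRank is an extractive summarization technique. It rests on the idea that words which occur frequently are significant, so sentences containing frequent words are important. (Internally, TextRank builds a graph that links similar sentences and ranks them with a PageRank-style algorithm.)

Based on this, the algorithm assigns a score to each sentence in the text. The top-ranked sentences make it into the summary.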
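To make the scoring idea concrete, here is a minimal sketch of frequency-based sentence scoring. It only illustrates the intuition described above and is not gensim's TextRank implementation; the function names, the regex-based sentence splitter and the sample text are my own assumptions.

```python
import re
from collections import Counter


def score_sentences(text):
    """Score each sentence by the average document frequency of its words."""
    # Naive sentence split after ., ! or ? (an assumption; real libraries do better).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Word frequencies over the whole text, lowercased.
    freq = Counter(re.findall(r"[a-z']+", text.lower()))
    scored = []
    for sentence in sentences:
        words = re.findall(r"[a-z']+", sentence.lower())
        # Use the average so that long sentences are not favoured by length alone.
        score = sum(freq[w] for w in words) / len(words) if words else 0.0
        scored.append((score, sentence))
    return scored


def top_sentences(text, n=1):
    """Return the n highest-scoring sentences."""
    ranked = sorted(score_sentences(text), key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in ranked[:n]]


sample = "Junk food harms health. Kids love food. Vegetables are green."
print(top_sentences(sample, n=1))
```

The sentence containing the most frequent word relative to its length wins here, which is exactly the bias that the graph-based ranking in real TextRank is meant to refine.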

Consider the article below about junk food.

original_text = 'Junk foods taste good that’s why it is mostly liked by everyone of any age group especially kids and school going children. They generally ask for the junk food daily because they have been trend so by their parents from the childhood. They never have been discussed by their parents about the harmful effects of junk foods over health. According to the research by scientists, it has been found that junk foods have negative effects on the health in many ways. They are generally fried food found in the market in the packets. They become high in calories, high in cholesterol, low in healthy nutrients, high in sodium mineral, high in sugar, starch, unhealthy fat, lack of protein and lack of dietary fibers. Processed and junk foods are the means of rapid and unhealthy weight gain and negatively impact the whole body throughout the life. It makes able a person to gain excessive weight which is called as obesity. Junk foods tastes good and looks good however do not fulfil the healthy calorie requirement of the body. Some of the foods like french fries, fried foods, pizza, burgers, candy, soft drinks, baked goods, ice cream, cookies, etc are the example of high-sugar and high-fat containing foods. It is found according to the Centres for Disease Control and Prevention that Kids and children eating junk food are more prone to the type-2 diabetes. In type-2 diabetes our body become unable to regulate blood sugar level. Risk of getting this disease is increasing as one become more obese or overweight. It increases the risk of kidney failure. Eating junk food daily lead us to the nutritional deficiencies in the body because it is lack of essential nutrients, vitamins, iron, minerals and dietary fibers. It increases risk of cardiovascular diseases because it is rich in saturated fat, sodium and bad cholesterol. High sodium and bad cholesterol diet increases blood pressure and overloads the heart functioning. 
One who like junk food develop more risk to put on extra weight and become fatter and unhealthier. Junk foods contain high level carbohydrate which spike blood sugar level and make person more lethargic, sleepy and less active and alert. Reflexes and senses of the people eating this food become dull day by day thus they live more sedentary life. Junk foods are the source of constipation and other disease like diabetes, heart ailments, clogged arteries, heart attack, strokes, etc because of being poor in nutrition. Junk food is the easiest way to gain unhealthy weight. The amount of fats and sugar in the food makes you gain weight rapidly. However, this is not a healthy weight. It is more of fats and cholesterol which will have a harmful impact on your health. Junk food is also one of the main reasons for the increase in obesity nowadays.This food only looks and tastes good, other than that, it has no positive points. The amount of calorie your body requires to stay fit is not fulfilled by this food. For instance, foods like French fries, burgers, candy, and cookies, all have high amounts of sugar and fats. Therefore, this can result in long-term illnesses like diabetes and high blood pressure. This may also result in kidney failure. Above all, you can get various nutritional deficiencies when you don’t consume the essential nutrients, vitamins, minerals and more. You become prone to cardiovascular diseases due to the consumption of bad cholesterol and fat plus sodium. In other words, all this interferes with the functioning of your heart. Furthermore, junk food contains a higher level of carbohydrates. It will instantly spike your blood sugar levels. This will result in lethargy, inactiveness, and sleepiness. A person reflex becomes dull overtime and they lead an inactive life. To make things worse, junk food also clogs your arteries and increases the risk of a heart attack. 
Therefore, it must be avoided at the first instance to save your life from becoming ruined.The main problem with junk food is that people don’t realize its ill effects now. When the time comes, it is too late. Most importantly, the issue is that it does not impact you instantly. It works on your overtime; you will face the consequences sooner or later. Thus, it is better to stop now.You can avoid junk food by encouraging your children from an early age to eat green vegetables. Their taste buds must be developed as such that they find healthy food tasty. Moreover, try to mix things up. Do not serve the same green vegetable daily in the same style. Incorporate different types of healthy food in their diet following different recipes. This will help them to try foods at home rather than being attracted to junk food.In short, do not deprive them completely of it as that will not help. Children will find one way or the other to have it. Make sure you give them junk food in limited quantities and at healthy periods of time. '

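The first step is to import summarize from gensim.summarization. It is a built-in function that implements TextRank. Note that the gensim.summarization module was removed in Gensim 4.0, so the code below requires a 3.x release of gensim.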

# Importing package and summarizer
import gensim
from gensim.summarization import summarize

Next, pass the text corpus as input to the summarize function.

# Passing the text corpus to summarizer
short_summary = summarize(original_text)
print(short_summary)

They become high in calories, high in cholesterol, low in healthy nutrients, high in sodium mineral, high in sugar, starch, unhealthy fat, lack of protein and lack of dietary fibers.
Processed and junk foods are the means of rapid and unhealthy weight gain and negatively impact the whole body throughout the life.
Junk foods tastes good and looks good however do not fulfil the healthy calorie requirement of the body.
It is found according to the Centres for Disease Control and Prevention that Kids and children eating junk food are more prone to the type-2 diabetes.
Eating junk food daily lead us to the nutritional deficiencies in the body because it is lack of essential nutrients, vitamins, iron, minerals and dietary fibers.
It increases risk of cardiovascular diseases because it is rich in saturated fat, sodium and bad cholesterol.
High sodium and bad cholesterol diet increases blood pressure and overloads the heart functioning.
One who like junk food develop more risk to put on extra weight and become fatter and unhealthier.
Junk foods contain high level carbohydrate which spike blood sugar level and make person more lethargic, sleepy and less active and alert.
For instance, foods like French fries, burgers, candy, and cookies, all have high amounts of sugar and fats.

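That looks too long, right?

Yes, but you can control how long the summary should be.

You can change the default parameters of the summarize function to suit your requirements.

The parameters include:

  1. ratio: takes a value between 0 and 1. It is the fraction of the original text's sentences to keep in the summary.

  2. word_count: determines the number of words in the summary.

Let me show you how to use these parameters on the example above.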

# Summarization by ratio
summary_by_ratio=summarize(original_text,ratio=0.1)
print(summary_by_ratio)

They become high in calories, high in cholesterol, low in healthy nutrients, high in sodium mineral, high in sugar, starch, unhealthy fat, lack of protein and lack of dietary fibers.
Processed and junk foods are the means of rapid and unhealthy weight gain and negatively impact the whole body throughout the life.
Eating junk food daily lead us to the nutritional deficiencies in the body because it is lack of essential nutrients, vitamins, iron, minerals and dietary fibers.
It increases risk of cardiovascular diseases because it is rich in saturated fat, sodium and bad cholesterol.
High sodium and bad cholesterol diet increases blood pressure and overloads the heart functioning.

In the output above, you can see that only 10% of the original text's sentences were kept as the summary.

Similarly, you can use word_count.

# Summarization by word count
summary_by_word_count = summarize(original_text, word_count=30)
print(summary_by_word_count)

They become high in calories, high in cholesterol, low in healthy nutrients, high in sodium mineral, high in sugar, starch, unhealthy fat, lack of protein and lack of dietary fibers.

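Similar to TextRank, there are various other algorithms that perform summarization. Let's look at them one by one.

Text Summarization using sumy

The sumy package provides several algorithms for text summarization. You just import the algorithm you need instead of coding it yourself.

In this section, I will show how to summarize with the following algorithms:

  1. LexRank
  2. Luhn
  3. Latent Semantic Analysis (LSA)
  4. KL-Sum

First, install the package with the command below.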

# Installing and importing sumy
!pip install sumy
import sumy

The different summarizers are available through the sumy.summarizers module.

sumy.summarizers

<module 'sumy.summarizers' from '/usr/local/lib/python3.6/dist-packages/sumy/summarizers/__init__.py'>

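LexRank

First, let me introduce LexRank.

How does LexRank work?

A sentence that is similar to many other sentences in the text is very likely to be important. In LexRank, a sentence is effectively recommended by the sentences similar to it, and so it ranks higher.

The higher the rank, the higher its priority for inclusion in the summary.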
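The recommendation idea can be sketched as a PageRank-style walk over a sentence similarity graph. This is a simplified illustration, not sumy's implementation: it uses plain bag-of-words cosine similarity, and the function names, damping factor and sample sentences are my own assumptions.

```python
import math
import re
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def lexrank_scores(sentences, damping=0.85, iterations=50):
    """Rank sentences by a PageRank-style walk over the similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    bags = [Counter(re.findall(r"[a-z']+", s.lower())) for s in sentences]
    # Pairwise similarities; a sentence does not recommend itself.
    sim = [[cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    row_sums = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Every sentence j passes its score to its neighbours in proportion to similarity.
        scores = [
            (1 - damping) / n
            + damping * sum(scores[j] * sim[j][i] / row_sums[j] for j in range(n) if row_sums[j])
            for i in range(n)
        ]
    return scores


sentences = [
    "Junk food causes obesity.",
    "Obesity is caused by junk food.",
    "Junk food raises diabetes risk.",
    "I like the colour blue.",
]
for score, sentence in sorted(zip(lexrank_scores(sentences), sentences), reverse=True):
    print(round(score, 3), sentence)
```

The unrelated last sentence shares no words with the others, so nothing recommends it and it ends up at the bottom of the ranking.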

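I will demonstrate step by step how to summarize the junk food article stored in original_text.

First, import PlaintextParser. Since the article is stored as a string, this is the parser to use; other parsers are available for sources such as web pages. Along with the parser, you also need to import Tokenizer to split the raw text into tokens.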

# Importing the parser and tokenizer
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer

The available summarizers live under sumy.summarizers. Here, I import LexRankSummarizer.

# Import the LexRank summarizer
from sumy.summarizers.lex_rank import LexRankSummarizer

Since the text source here is a string, you initialize the parser with the PlaintextParser.from_string() function. You can specify the language of the input through the Tokenizer.

Syntax: PlaintextParser.from_string(cls, string, tokenizer)

# Initializing the parser
my_parser = PlaintextParser.from_string(original_text,Tokenizer('english'))

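Next, create a LexRankSummarizer instance, lex_rank_summarizer, and apply it to your text. The syntax is: lex_rank_summarizer(document, sentences_count).

The sentences_count parameter sets the number of sentences you want in the summary.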

# Creating a summary of 3 sentences.
lex_rank_summarizer = LexRankSummarizer()
lexrank_summary = lex_rank_summarizer(my_parser.document, sentences_count=3)

# Printing the summary
for sentence in lexrank_summary:
    print(sentence)

It is found according to the Centres for Disease Control and Prevention that Kids and children eating junk food are more prone to the type-2 diabetes.
It is more of fats and cholesterol which will have a harmful impact on your health.
Children will find one way or the other to have it.

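Like LexRank, sumy supports several more summarizers. In the next section, let's try LSA.

LSA (Latent Semantic Analysis)

Latent Semantic Analysis is an unsupervised learning algorithm that can be used for extractive text summarization.

It extracts semantically significant sentences by applying singular value decomposition (SVD) to the term-document frequency matrix.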
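As a rough illustration of the SVD idea, the sketch below finds only the first (dominant) latent topic and reports how strongly each sentence loads on it. It is a deliberately reduced version: a full LSA summarizer such as sumy's computes the complete decomposition and combines several topics. Power iteration on the Gram matrix stands in for a library SVD call, and the function name and sample sentences are my own assumptions.

```python
import math
import re
from collections import Counter


def dominant_topic_weights(sentences, iterations=100):
    """Return each sentence's weight in the first (dominant) latent topic."""
    n = len(sentences)
    if n == 0:
        return []
    bags = [Counter(re.findall(r"[a-z']+", s.lower())) for s in sentences]
    # Gram matrix G = A^T A, where A is the term-by-sentence count matrix.
    gram = [[sum(bags[i][w] * bags[j][w] for w in bags[i]) for j in range(n)] for i in range(n)]
    # Power iteration converges to the leading eigenvector of G,
    # which equals the first right singular vector of A.
    vec = [1.0] * n
    for _ in range(iterations):
        nxt = [sum(gram[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            break
        vec = [x / norm for x in nxt]
    return vec


sentences = [
    "Junk food causes obesity.",
    "Obesity is caused by junk food.",
    "Junk food raises diabetes risk.",
    "I like the colour blue.",
]
weights = dominant_topic_weights(sentences)
best = max(range(len(sentences)), key=lambda i: weights[i])
print(sentences[best])
```

The three junk food sentences share one topic, so the off-topic sentence receives a weight close to zero. With raw counts, longer sentences tend to load more heavily, which is one reason real implementations weight and normalize the matrix.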

Let me demonstrate how to summarize with LSA. First, import the summarizer from sumy.

# Import the summarizer
from sumy.summarizers.lsa import LsaSummarizer

Import the parser and tokenizer so that the document can be tokenized.

# Parsing the text string using PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.parsers.plaintext import PlaintextParser
parser=PlaintextParser.from_string(original_text,Tokenizer('english'))

The parser is now created. It's time to initialize the summarizer and pass it your document together with the number of sentences you want.

# Creating the summarizer
lsa_summarizer = LsaSummarizer()
lsa_summary = lsa_summarizer(parser.document, 3)

# Printing the summary
for sentence in lsa_summary:
    print(sentence)

Junk foods taste good that’s why it is mostly liked by everyone of any age group especially kids and school going children.
To make things worse, junk food also clogs your arteries and increases the risk of a heart attack.
Therefore, it must be avoided at the first instance to save your life from becoming ruined.The main problem with junk food is that people don’t realize its ill effects now.

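Luhn

The Luhn summarization approach is based on word frequency, in the spirit of TF-IDF (term frequency-inverse document frequency). It treats both very rare words and very frequent words (stopwords) as insignificant, and it is useful when that assumption holds.

Based on this, sentences are scored, and the highest-ranking sentences make it into the summary.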
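Here is a minimal sketch of Luhn-style scoring under simplifying assumptions: a word counts as significant if it is not a stopword and occurs at least min_freq times, and a sentence is scored as the squared number of significant words divided by the length of the span they cover. The tiny stopword list, the threshold and the function name are hypothetical; sumy's LuhnSummarizer has its own rules.

```python
import re
from collections import Counter

# A tiny illustrative stopword list (an assumption, not sumy's list).
STOPWORDS = {"the", "a", "an", "is", "are", "of", "in", "and", "to", "it", "by", "i"}


def luhn_scores(sentences, min_freq=2):
    """Score sentences by the density of significant words."""
    tokenized = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    freq = Counter(w for words in tokenized for w in words if w not in STOPWORDS)
    # Significant words: not stopwords, and not too rare.
    significant = {w for w, count in freq.items() if count >= min_freq}
    scores = []
    for words in tokenized:
        positions = [i for i, w in enumerate(words) if w in significant]
        if not positions:
            scores.append(0.0)
            continue
        # Span from the first to the last significant word in the sentence.
        span = positions[-1] - positions[0] + 1
        scores.append(len(positions) ** 2 / span)
    return scores


sentences = [
    "Junk food causes obesity.",
    "Obesity is caused by junk food.",
    "Junk food raises diabetes risk.",
    "I like the colour blue.",
]
print(luhn_scores(sentences))  # [2.25, 1.5, 2.0, 0.0]
```

The first sentence scores highest because its three significant words (junk, food, obesity) sit close together, while the second sentence spreads the same words over a longer span.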

# Import the summarizer
from sumy.summarizers.luhn import LuhnSummarizer

As with the previous methods, initialize the parser with the code below.

# Creating the parser
from sumy.nlp.tokenizers import Tokenizer
from sumy.parsers.plaintext import PlaintextParser
parser=PlaintextParser.from_string(original_text,Tokenizer('english'))

Next, apply the summarizer to your text. The sentences_count parameter sets the number of sentences in the summary.

# Creating the summarizer
luhn_summarizer = LuhnSummarizer()
luhn_summary = luhn_summarizer(parser.document, sentences_count=3)

# Printing the summary
for sentence in luhn_summary:
    print(sentence)

They become high in calories, high in cholesterol, low in healthy nutrients, high in sodium mineral, high in sugar, starch, unhealthy fat, lack of protein and lack of dietary fibers.
It is found according to the Centres for Disease Control and Prevention that Kids and children eating junk food are more prone to the type-2 diabetes.
Eating junk food daily lead us to the nutritional deficiencies in the body because it is lack of essential nutrients, vitamins, iron, minerals and dietary fibers.

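KL-Sum

Another extractive method is the KL-Sum algorithm.

It selects sentences so that the word distribution of the summary is as similar as possible to that of the original text. The aim is to lower the KL divergence (Kullback-Leibler divergence) between the two. It uses a greedy optimization approach and keeps adding sentences as long as the KL divergence decreases.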
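The greedy selection can be sketched as follows. This is an illustrative simplification, not sumy's KLSummarizer: words missing from the candidate summary are smoothed with a small epsilon, and the loop stops at a fixed sentences_count instead of checking whether the divergence still decreases. All names and the sample sentences are my own assumptions.

```python
import math
import re
from collections import Counter


def word_dist(words):
    """Relative frequency of each word in a list of words."""
    total = len(words)
    return {w: count / total for w, count in Counter(words).items()}


def kl_divergence(p, q, epsilon=1e-6):
    """KL(p || q), smoothing words that are missing from q with epsilon."""
    return sum(pw * math.log(pw / q.get(w, epsilon)) for w, pw in p.items())


def kl_sum(sentences, sentences_count=2):
    """Greedily add the sentence that brings the summary closest to the document."""
    tokenized = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    doc_dist = word_dist([w for words in tokenized for w in words])
    chosen, summary_words = [], []
    while len(chosen) < min(sentences_count, len(sentences)):
        best_index, best_kl = None, float("inf")
        for i, words in enumerate(tokenized):
            if i in chosen or not words:
                continue
            # Divergence if this sentence were added to the summary so far.
            kl = kl_divergence(doc_dist, word_dist(summary_words + words))
            if kl < best_kl:
                best_index, best_kl = i, kl
        if best_index is None:
            break
        chosen.append(best_index)
        summary_words += tokenized[best_index]
    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(chosen)]


sentences = [
    "Junk food causes obesity.",
    "Obesity is caused by junk food.",
    "Junk food raises diabetes risk.",
    "I like the colour blue.",
]
print(kl_sum(sentences, sentences_count=2))
```

Because the objective rewards covering the vocabulary of the whole document, the second pick is the sentence that adds the most unseen words, even though it is off-topic. That behaviour is worth keeping in mind when comparing KL-Sum output with the other methods.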

Let me show you how it performs. First, import the summarizer from sumy.

from sumy.summarizers.kl import KLSummarizer

Next, create a parser that reads from the original text.

# Creating the parser
from sumy.nlp.tokenizers import Tokenizer
from sumy.parsers.plaintext import PlaintextParser
parser=PlaintextParser.from_string(original_text,Tokenizer('english'))

Instantiate the summarizer and pass it the text through the parser.document attribute.

# Instantiating the KLSummarizer
kl_summarizer = KLSummarizer()
kl_summary = kl_summarizer(parser.document, sentences_count=3)

# Printing the summary
for sentence in kl_summary:
    print(sentence)

It is found according to the Centres for Disease Control and Prevention that Kids and children eating junk food are more prone to the type-2 diabetes.
High sodium and bad cholesterol diet increases blood pressure and overloads the heart functioning.
Junk food is the easiest way to gain unhealthy weight.

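What is Generative Text Summarization?

Generative (abstractive) summarization is a newer approach that produces new sentences that best represent the whole text. This is often an advantage over extractive methods, where sentences are merely selected from the original text for the summary.

How can you implement generative summarization easily?

A simple and effective way is through the Hugging Face transformers package.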

!pip install transformers    

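The following packages are installed along with it: sacremoses, sentencepiece, tokenizers, transformers.

Hugging Face supports state-of-the-art models for tasks such as summarization and classification. Some commonly used models are GPT-2, BERT, OpenAI GPT and T5.

Another nice feature is that transformers provides models with pretrained weights, which can be instantiated easily through the from_pretrained() method.

You can check the list of currently available pretrained models in the Hugging Face documentation.

This section shows you how to summarize text with different models from the transformers library.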

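Summarization with T5 Transformers

T5 is an encoder-decoder model. It converts every language problem into a text-to-text format.

First, import the tokenizer and the corresponding model with the command below.

T5ForConditionalGeneration is the model class to use when both the input and the output are sequences.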

# Importing requirements
from transformers import T5Tokenizer, T5Config, T5ForConditionalGeneration

You can load the pretrained t5-small model through the from_pretrained() method. The syntax is as follows.

T5ForConditionalGeneration.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)

# Instantiating the model and tokenizer
my_model = T5ForConditionalGeneration.from_pretrained('t5-small')
tokenizer = T5Tokenizer.from_pretrained('t5-small')

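Next comes the most important step, one you should not forget: you must prepend the string "summarize: " to your article. T5 performs different tasks depending on a task-specific prefix added to the input text.

# Prepending the task prefix "summarize: " to the raw text
text = "summarize: " + original_text
text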

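As you may recall, T5 is an encoder-decoder model, so the input must be given as a sequence of IDs, the input_ids.

How do you convert the input text into input_ids?

This process is called encoding the text, and it is done with the encode() method.

# Encoding the input text (truncated to at most 512 tokens)
input_ids = tokenizer.encode(text, return_tensors='pt', max_length=512, truncation=True)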
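To make the notion of input IDs concrete, here is a toy sketch of encoding and decoding with a hypothetical whitespace vocabulary. The real T5Tokenizer works on SentencePiece subword units and adds special tokens, so this only illustrates the round trip between text and IDs; build_vocab, encode and decode are names made up for this example.

```python
def build_vocab(texts):
    """Map every distinct whitespace-separated token to an integer ID."""
    vocab = {"<unk>": 0}  # ID 0 is reserved for unknown tokens
    for text in texts:
        for token in text.split():
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(text, vocab):
    """Turn text into a list of IDs, using <unk> for unseen tokens."""
    return [vocab.get(token, vocab["<unk>"]) for token in text.split()]


def decode(ids, vocab):
    """Reverse of encode(): turn a list of IDs back into text."""
    id_to_token = {i: token for token, i in vocab.items()}
    return " ".join(id_to_token[i] for i in ids)


vocab = build_vocab(["summarize: junk food is unhealthy"])
ids = encode("summarize: junk food", vocab)
print(ids)                 # [1, 2, 3]
print(decode(ids, vocab))  # summarize: junk food
```

A model never sees the raw characters, only such ID sequences, which is why every transformer example in this article starts with an encoding step and ends with a decoding step.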

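Next, pass input_ids to the generate() function, which returns a sequence of IDs corresponding to the summary.

The syntax is: transformers.PreTrainedModel.generate(input_ids=None, max_length=None, min_length=None, num_beams=None)

Only input_ids is needed here; the other parameters are optional and can be set according to your summary requirements.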

# Generating summary ids
summary_ids = my_model.generate(input_ids)
summary_ids

tensor([[ 0, 11797, 4371, 33, 8, 1391, 13, 6900, 11537, 257,
11, 119, 6716, 114, 8363, 6, 842, 29939, 6, 842]])

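You can see that the model has returned a tensor containing a sequence of IDs. Now use the decode() function to turn these IDs into the summary text.

It simply does the reverse of encode().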

# Decoding the tensor and printing the summary.
t5_summary = tokenizer.decode(summary_ids[0])
print(t5_summary)

junk foods are the source of constipation and other diseases like diabetes, heart ailments, heart

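Looking at the summary, you can see that the sentence is newly framed: unlike the output of the extractive methods, it is not copied verbatim from the original text.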
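The T5 steps above (prefix, encode, generate, decode) can be wrapped into one helper. This is only a sketch: the function name summarize_with_t5 is my own, and the default values for max_length, min_length and num_beams are illustrative assumptions rather than recommended settings. The helper takes the model and tokenizer as arguments, so it does not load anything itself.

```python
def summarize_with_t5(text, model, tokenizer, max_length=60, min_length=10, num_beams=4):
    """Run the prefix, encode, generate and decode steps in one call."""
    # T5 selects the task from a text prefix.
    input_ids = tokenizer.encode(
        "summarize: " + text,
        return_tensors="pt",
        max_length=512,
        truncation=True,
    )
    # The optional generate() parameters control the length and the beam search width.
    summary_ids = model.generate(
        input_ids,
        max_length=max_length,
        min_length=min_length,
        num_beams=num_beams,
    )
    # Turn the generated IDs back into text and drop padding and end tokens.
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)
```

With the objects created in this section, it could be called as summarize_with_t5(original_text, my_model, tokenizer). Raising max_length gives the model room to finish its sentence instead of stopping after a handful of tokens.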

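Summarization with BART Transformers

The Hugging Face transformers library supports summarization with the BART model.

Import the model and tokenizer. For problems where you need to generate sequences, BartForConditionalGeneration is the model class to use.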

# Importing the model
from transformers import BartForConditionalGeneration, BartTokenizer, BartConfig

bart-large-cnn is a pretrained model fine-tuned for the summarization task. You can load it with the from_pretrained() method as shown below.

# Loading the model and tokenizer for bart-large-cnn

tokenizer=BartTokenizer.from_pretrained('facebook/bart-large-cnn')
model=BartForConditionalGeneration.from_pretrained('facebook/bart-large-cnn')

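You need to pass the input text as a sequence of IDs.

For this, use the tokenizer's batch_encode_plus() function. It returns a dictionary containing the encoded sequence, or sequence pair, along with other additional information.

Now, how do you limit the maximum length of the returned sequence?

Set the max_length parameter of batch_encode_plus().

Next, pass the input_ids to model.generate() to generate the IDs of the summary.

# Encoding the inputs (truncated to BART's 1024-token limit) and passing them to model.generate()
inputs = tokenizer.batch_encode_plus([original_text], return_tensors='pt', max_length=1024, truncation=True)
summary_ids = model.generate(inputs['input_ids'], early_stopping=True)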

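model.generate() has returned a sequence of IDs corresponding to the summary of the original text. You can convert this sequence of IDs to text with the decode() method.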

# Decoding and printing the summary
bart_summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print(bart_summary)

Junk foods taste good that’s why it is mostly liked by everyone of any age group especially kids and school going children. They generally ask for the junk food daily because they have been trend so by their parents from the childhood. According to the research by scientists, it has been found that junk foods have negative effects on the health in many ways.

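Summarization with GPT-2 Transformers

GPT-2, released by OpenAI, is another major player that can be used for text summarization. Thanks to transformers, the procedure is the same as for BART.

First, import the tokenizer and model. Make sure you import a model with an LM head, since that is necessary for generating sequences. Next, load the pretrained gpt2 model and tokenizer.

After loading the model, encode the input text and pass it as input to model.generate().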

# Importing model and tokenizer
from transformers import GPT2Tokenizer,GPT2LMHeadModel

# Instantiating the model and tokenizer with gpt-2
tokenizer=GPT2Tokenizer.from_pretrained('gpt2')
model=GPT2LMHeadModel.from_pretrained('gpt2')

# Encoding text to get input ids & pass them to model.generate()
inputs=tokenizer.batch_encode_plus([original_text],return_tensors='pt',max_length=512,truncation=True)
summary_ids=model.generate(inputs['input_ids'],early_stopping=True)

summary_ids contains the sequence of IDs corresponding to the text summary. You can decode it and print the summary.

# Decoding and printing summary

GPT_summary=tokenizer.decode(summary_ids[0],skip_special_tokens=True)
print(GPT_summary)

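Summarization with XLM Transformers

Another type of transformer that can be used for summarization is XLM.

You can import XLMWithLMHeadModel, which supports sequence generation. Load the pretrained xlm-mlm-en-2048 model and tokenizer with their weights using the from_pretrained() method.

The next steps are the same as in the previous three cases: the encoded input text is passed to generate(), which returns a sequence of IDs, and you decode it and print the summary.

The code below demonstrates this step by step.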

# Importing model and tokenizer
from transformers import XLMWithLMHeadModel, XLMTokenizer

# Instantiating the model and tokenizer
tokenizer=XLMTokenizer.from_pretrained('xlm-mlm-en-2048')
model=XLMWithLMHeadModel.from_pretrained('xlm-mlm-en-2048')

# Encoding text to get input ids & pass them to model.generate()
inputs=tokenizer.batch_encode_plus([original_text],return_tensors='pt',max_length=512,truncation=True)
summary_ids=model.generate(inputs['input_ids'],early_stopping=True)

# Decode and print the summary
XLM_summary=tokenizer.decode(summary_ids[0],skip_special_tokens=True)
print(XLM_summary)

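You can see that XLM_summary is not very good. This is because, even though the model supports summarization, it has not been optimized for this task.

We have used a range of methods for summarization, from TextRank to Transformers. You can analyze the summaries obtained at the end of each method and choose the one that works best.

Overall, generative summarization with Hugging Face Transformers is currently the most advanced of these approaches.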

Author: Shrivarsheni
Source: https://www.machinelearningplus.com/nlp/text-summarization-approaches-nlp-example/