A Python Tool for Effective Text Processing
Text processing relies largely on Natural Language Processing (NLP): processing text data so that a machine can work with human language inside an application or product. Using NLP, we can derive information such as sentiment and polarity from textual data, which is useful when building text-processing applications.
Python offers several open-source libraries and modules for this, many of them built on top of NLTK, that help with text processing through NLP functions. Each library has its own features for getting meaningful results out of data. One such library is Pattern.
Pattern is an open-source Python library that performs a range of NLP tasks and is mostly used for text processing. Beyond text processing, Pattern is also used for data mining, i.e., we can extract data from sources such as Twitter and Google using the data mining functions it provides.
In this article, we will try and cover the following points:
- NLP Functionalities of Pattern
- Data Mining Using Pattern
Implementation:
We will start by installing Pattern using pip install pattern.
- Importing required library
The different functionalities are defined under different functions, so we will import them as and when required as we move through this article. We will be working in English, so we will use ‘en’, the English module. Let us start with some basic functionalities of Pattern for NLP operations.
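Before importing anything, it can help to confirm that the install worked. The sketch below is one way to do that using only the standard library; the helper name pattern_available is hypothetical, and the imports shown in the comments are simply the pattern.en functions discussed in this article.

```python
import importlib.util


def pattern_available():
    """Return True if a top-level package named 'pattern' can be found."""
    return importlib.util.find_spec("pattern") is not None


# Once the check passes, the functions covered in this article can be
# imported from the English module as they are needed, for example:
#   from pattern.en import parse, pprint, ngrams, sentiment
#   from pattern.en import modality, suggest, quantify, Sentence
print("Pattern installed:", pattern_available())
```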
NLP Operations using Pattern
We will go through some of the most used and most important functionalities provided by Pattern, starting with parsing a sentence.
- Parsing
The output of the parse function labels the words in the sentence as a noun, verb, subject, object, and so on. We can also use the ‘pprint’ function defined in the Pattern library to display the parsed sentence more clearly. In addition, we can set parameters on parse such as lemmata, tokenize, and encoding. All of these are passed to the parse call itself, so we do not need a separate function for each property.
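A minimal sketch of parsing follows, assuming Pattern is installed. The helper names parse_sentence, split_tags and demo are hypothetical, and the example sentence is ours. The sketch relies on parse returning slash-delimited tags for each word (word, part-of-speech tag, chunk tags and so on); the exact fields depend on the parameters passed.

```python
def parse_sentence(text, lemmata=False):
    """Parse text with Pattern's English parser and return the tagged string."""
    # Imported lazily so this module still loads where Pattern is missing.
    from pattern.en import parse
    return parse(text, lemmata=lemmata)


def split_tags(tagged):
    """Split slash-delimited tagged text into one tuple of fields per token."""
    # str() is used because parse() is understood to return a str subclass
    # that overrides split(); a plain str splits on whitespace as expected.
    return [tuple(token.split("/")) for token in str(tagged).split()]


def demo():
    """Requires Pattern; prints the fields of one token per line."""
    tagged = parse_sentence("The cat sat on the mat.", lemmata=True)
    for fields in split_tags(tagged):
        print(fields)
```

Calling demo() in an environment where Pattern is installed prints each word together with its tags; passing the result of parse to ‘pprint’ instead gives the tabular view mentioned above.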
- N-Grams
The ngrams function finds all the n-grams, i.e., runs of n consecutive words, in a given text string.
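To make the idea concrete, the sketch below builds n-grams in plain Python and then shows the equivalent Pattern call. The word_ngrams helper is our own illustration, not Pattern's implementation; Pattern applies its own tokenization and punctuation handling, so its output may differ in detail. Both function names are hypothetical.

```python
def word_ngrams(words, n=2):
    """Return every run of n consecutive words as a tuple."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def pattern_ngrams(text, n=2):
    """The same idea using Pattern's own tokenization (requires Pattern)."""
    from pattern.en import ngrams
    return ngrams(text, n=n)


print(word_ngrams("the quick brown fox".split(), n=2))
# [('the', 'quick'), ('quick', 'brown'), ('brown', 'fox')]
```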
- Sentiment Analysis
The sentiment function tries to identify the opinion or view expressed by a text string. It returns both the polarity and the subjectivity of the given text. Polarity ranges from -1 (highly negative) to 1 (highly positive), and subjectivity ranges from 0 (objective) to 1 (subjective).
We can see that the sentiment analysis rates the sentence as negative with high subjectivity.
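The sketch below shows one way to read the two values, assuming sentiment returns a (polarity, subjectivity) pair as described above. The helper names label_sentiment and analyze are hypothetical, and the 0.5 midpoint used to call a text subjective is our assumption rather than anything defined by Pattern.

```python
def label_sentiment(polarity, subjectivity):
    """Turn a (polarity, subjectivity) pair into a short description."""
    if polarity > 0:
        tone = "positive"
    elif polarity < 0:
        tone = "negative"
    else:
        tone = "neutral"
    # Assumption: 0.5 is used as the midpoint between objective (0)
    # and subjective (1).
    stance = "subjective" if subjectivity > 0.5 else "objective"
    return f"{tone}, {stance}"


def analyze(text):
    """Score text with Pattern and describe the result (requires Pattern)."""
    from pattern.en import sentiment
    polarity, subjectivity = sentiment(text)
    return label_sentiment(polarity, subjectivity)
```

A sentence scored at, say, polarity -0.7 and subjectivity 0.9 would be described as "negative, subjective", which matches the kind of result discussed above.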
- Modality
Modality is one of the functions that sets Pattern apart from other Python NLP libraries. The modality function measures the degree of certainty in a sentence. Its value ranges from -1 to 1. As defined in the Pattern library, a sentence with a modality of 0.5 and above can be treated as a fact.
The modality comes out to be zero, which means that the sentence is neutral.
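A minimal sketch of a modality check follows, assuming Pattern is installed. As far as we know, modality expects a parsed Sentence rather than a raw string, so the text is parsed with lemmata first. The helper names certainty_of and looks_like_fact are hypothetical, and the threshold simply encodes the rule of thumb stated above.

```python
def certainty_of(text):
    """Return Pattern's modality score for text (requires Pattern)."""
    from pattern.en import Sentence, modality, parse
    # modality() works on a parsed Sentence; lemmata are enabled for the parse.
    return modality(Sentence(parse(text, lemmata=True)))


def looks_like_fact(score, threshold=0.5):
    """Apply the article's rule of thumb: 0.5 and above reads as a fact."""
    return score >= threshold
```

With these helpers, looks_like_fact(certainty_of(sentence)) gives a yes/no reading of how certain a sentence sounds; a score of zero, as in the example above, would not count as a fact.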
- Suggest
The suggest function is used for spelling correction, but it does more than check spelling: it also suggests what the correct word might be, with a probability for each candidate. This function also distinguishes Pattern from other libraries.
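The sketch below picks the most likely correction, assuming suggest returns a list of (word, probability) pairs as described above. The helper names best_suggestion and correct are hypothetical.

```python
def best_suggestion(candidates):
    """Pick the most probable word from (word, probability) pairs."""
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[1])[0]


def correct(word):
    """Return Pattern's most likely spelling for word (requires Pattern)."""
    from pattern.en import suggest
    return best_suggestion(suggest(word))
```

Keeping the full candidate list instead of only the top entry is useful when the probabilities are close and a human, or a later processing step, should make the final choice.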
- Quantify
The quantify function provides a word-count estimation of the given words, expressed as an approximate phrase rather than an exact number.
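A short sketch of feeding word counts to quantify follows. The tally helper is plain Python and hypothetical; describe_counts assumes that quantify accepts a {word: count} mapping, which is our understanding of the function rather than something stated in this article.

```python
from collections import Counter


def tally(words):
    """Count how often each word occurs, ignoring case."""
    return dict(Counter(word.lower() for word in words))


def describe_counts(words):
    """Describe the tallied words approximately (requires Pattern)."""
    from pattern.en import quantify
    # Assumption: quantify() accepts a {word: count} mapping.
    return quantify(tally(words))
```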
Data Mining using Pattern
One of the most important features of Pattern is that it can be used for data mining across platforms such as Google, Twitter, and Wikipedia. Let us explore the data mining operations of the Pattern library and extract some data with it. We will start by mining data from Google: we enter a keyword to search for and display the text and URL of each search result.
- Google Mining
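The keyword search described above can be sketched as follows, assuming the pattern.web module with its Google class and results that expose url and text attributes. The helper names collect_results and google_results and the license_key parameter name are hypothetical. Google's search API has generally required a license key and may have changed since this article was written, so treat this as an outline rather than a guaranteed working call.

```python
def collect_results(engine, query, count=5):
    """Return (url, text) pairs for the first `count` results of a search."""
    return [(result.url, result.text)
            for result in engine.search(query, count=count)]


def google_results(query, license_key=None, count=5):
    """Search Google through pattern.web (requires Pattern and network access)."""
    from pattern.web import Google
    return collect_results(Google(license=license_key), query, count=count)
```

Separating collect_results from the engine makes it possible to try the formatting logic against a stand-in object before spending any real API calls.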
- Twitter Mining
We can also use Twitter to mine the data we require. Let us explore it through an example.
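A minimal sketch of a Twitter search follows, assuming pattern.web provides a Twitter class whose results carry id, author and text attributes. The helper names unique_tweets and search_tweets are hypothetical, and Twitter's API access rules have changed over time, so the live call may need credentials or may no longer work as written.

```python
def unique_tweets(results):
    """Keep the first occurrence of each tweet id, preserving order."""
    seen = set()
    tweets = []
    for tweet in results:
        if tweet.id not in seen:
            seen.add(tweet.id)
            tweets.append({"author": tweet.author, "text": tweet.text})
    return tweets


def search_tweets(query, count=10):
    """Search Twitter through pattern.web (requires Pattern and network access)."""
    from pattern.web import Twitter
    return unique_tweets(Twitter().search(query, count=count))
```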
- Flickr Mining
Flickr is an American image and video hosting service, as well as an online community. Pattern can be used to extract data from Flickr.
Similarly, Pattern supports data mining on a large number of other online platforms, and we can use them as needed.
Author: Himanshu Sharma
Source: https://analyticsindiamag.com/hands-on-guide-to-pattern-a-python-tool-for-effective-text-processing-and-data-mining/