
A Python Tool for Effective Text Processing

Text processing relies mainly on Natural Language Processing (NLP): processing data so that a machine can understand human language within an application or product. Using NLP we can derive information from textual data, such as sentiment and polarity, which is useful when building text-processing applications.

Python provides a number of open-source libraries and modules for text processing with NLP functions, some of them built on top of NLTK. Different libraries offer different functionality for getting meaningful results from data. One such library is Pattern.

Pattern is an open-source Python library that performs a range of NLP tasks. It is mostly used for text processing because of the functionality it provides. Beyond text processing, Pattern is also used for data mining, i.e., we can extract data from sources such as Twitter and Google using the data mining functions it provides.

In this article, we will try and cover the following points:

  • NLP Functionalities of Pattern
  • Data Mining Using Pattern

Implementation:

We will start by installing Pattern using pip install pattern.

  1. Importing required library
    Different functionalities are defined under different functions, so we will import them as and when required as we move through this article. We will be working in English, so we will use the ‘en’ module. Let us start with some basic NLP functionality of Pattern.

  2. NLP Operations using Pattern
    We will go through some of the most used and most important functions that Pattern provides, starting with parsing a sentence.

  3. Parsing

    from pattern.en import parse
    parse('Hello Everyone and Welcome to Analytics India Magazine')
    

    Here the output of the parse function labels the words in the sentence, for example as a noun or a verb; when relations is set to True it also marks roles such as subject and object. We can also use the ‘pprint’ function defined in the Pattern library to display the parsed sentence more clearly. In addition, parse accepts parameters such as lemmata, tokenize, and encoding. All of these can be set in the single parse call, so we do not need a separate function for each property.

    from pattern.en import pprint
    pprint(parse('Hello Everyone and Welcome to Analytics India Magazine', relations=True, tokenize=True, lemmata=True))
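Without pprint, parse returns a single string in which each token carries its tags separated by slashes. The sketch below shows one way such a string could be split into fields. It assumes a field order of word, part-of-speech tag, chunk tag, and PNP tag; the sample string is illustrative and is not the actual output for the sentence above, and split_tagged is a hypothetical helper, not part of Pattern.

```python
# Hypothetical helper (not part of Pattern): split a slash-tagged string
# into one dict per token. Assumes tokens are separated by spaces and
# fields by "/", in the order given by `fields`.
def split_tagged(tagged, fields=("word", "pos", "chunk", "pnp")):
    rows = []
    for token in tagged.split():
        values = token.split("/")
        rows.append(dict(zip(fields, values)))
    return rows

# Illustrative string in the assumed format (not real parser output).
sample = "I/PRP/B-NP/O eat/VBP/B-VP/O pizza/NN/B-NP/O"
for row in split_tagged(sample):
    print(row["word"], row["pos"], row["chunk"])
```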
    
  4. N-Grams

    The ngrams function returns all the n-grams in a given text string.

    from pattern.en import ngrams
    print(ngrams("Hello Everyone and Welcome to Analytics India Magazine", n=3))
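To make the idea concrete, here is a minimal pure-Python sketch of what an n-gram is: every run of n consecutive words. It splits on whitespace only, so it is a simplification and not Pattern's implementation; ngrams_sketch is a hypothetical name.

```python
# Hypothetical sketch: build n-grams by sliding a window of n words
# across the whitespace-separated tokens of the text.
def ngrams_sketch(text, n=3):
    words = text.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

trigrams = ngrams_sketch(
    "Hello Everyone and Welcome to Analytics India Magazine", n=3
)
for gram in trigrams:
    print(gram)
```

A text with fewer than n words simply yields an empty list.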
    
  5. Sentiment Analysis

The sentiment function tries to identify the opinion or view expressed by a text string. It returns both the polarity and the subjectivity of the given text. Polarity ranges from -1 (highly negative) to 1 (highly positive), and subjectivity ranges from 0 (objective) to 1 (subjective).

from pattern.en import sentiment
print(sentiment("He is a good boy but sometimes he behaves miserably"))

The sentiment analysis reports that the sentence is negative, with high subjectivity.
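One common way to arrive at a (polarity, subjectivity) pair is to average per-word scores from a lexicon. The sketch below illustrates that idea only; the two lexicon entries and their scores are invented for this example, and sentiment_sketch is a hypothetical function, not Pattern's implementation.

```python
# Hypothetical mini-lexicon: word -> (polarity, subjectivity).
# The scores are invented for illustration only.
LEXICON = {
    "good": (0.7, 0.6),
    "miserably": (-1.0, 1.0),
}

def sentiment_sketch(text, lexicon=LEXICON):
    # Collect scores for the words that appear in the lexicon.
    scores = [lexicon[w] for w in text.lower().split() if w in lexicon]
    if not scores:
        return (0.0, 0.0)  # no known words: neutral and objective
    polarity = sum(p for p, _ in scores) / len(scores)
    subjectivity = sum(s for _, s in scores) / len(scores)
    return (polarity, subjectivity)

print(sentiment_sketch("He is a good boy but sometimes he behaves miserably"))
```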

  6. Modality
    Modality is one of the functions that sets Pattern apart from other Python NLP libraries. The modality function measures the degree of certainty in a sentence. Its value ranges from -1 to 1. As defined in the Pattern library, a sentence with a modality of 0.5 or above can be treated as a fact.
from pattern.en import parse, Sentence, modality

# modality() works on a parsed Sentence, so parse the raw string first
text = parse('He is a good boy but sometimes he behaves miserably')
text = Sentence(text)
print(modality(text))

The modality comes out to be zero, which means the sentence is neutral with respect to certainty.
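A small sketch of how a modality score could be turned into a label, using the 0.5 threshold described above. The function name and the label strings are hypothetical and chosen for this example only.

```python
# Hypothetical helper: label a modality score using the article's
# threshold (0.5 or above is treated as a fact).
def describe_modality(score):
    if not -1.0 <= score <= 1.0:
        raise ValueError("modality is expected to lie between -1 and 1")
    if score >= 0.5:
        return "fact"
    return "uncertain"

print(describe_modality(0.0))
print(describe_modality(0.75))
```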

  7. Suggest

The suggest function is used for spelling correction, but it does more than check spelling: it also returns suggestions for what the correct word might be, along with their probabilities. This function also distinguishes Pattern from other libraries.

from pattern.en import suggest
print(suggest("Heroi"))
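To illustrate how a speller can attach probabilities to its suggestions, here is a minimal edit-distance sketch: generate every string one edit away from the input, keep those found in a word-frequency table, and normalise their counts. The word counts are invented, and suggest_sketch and edits1 are hypothetical names; this shows the general technique rather than Pattern's own code.

```python
import string

# Hypothetical word counts standing in for a real frequency dictionary.
WORD_COUNTS = {"hero": 5, "heroic": 3, "heron": 2, "zebra": 4}

def edits1(word):
    # All strings one delete, transpose, replace, or insert away from word.
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def suggest_sketch(word, counts=WORD_COUNTS):
    word = word.lower()
    if word in counts:
        return [(word, 1.0)]  # already a known word
    candidates = [w for w in edits1(word) if w in counts]
    total = sum(counts[w] for w in candidates)
    ranked = sorted(candidates, key=lambda w: counts[w], reverse=True)
    # Each probability is the candidate's share of the matching counts.
    return [(w, counts[w] / total) for w in ranked]

print(suggest_sketch("Heroi"))
```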
  8. Quantify
    The quantify function returns an approximate word count for the given words.
from pattern.en import quantify
a = quantify(['Pencil', 'Pencil', 'Eraser', 'Sharpener', 'Sharpener', 'Sharpener', 'Scale', 'Compass'])
print(a)
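The idea of an approximate count can be sketched with the standard library: count each word, then map the count to a rough quantifier. The quantifier labels and cut-offs below are assumptions made for this example, and quantify_sketch is a hypothetical name, not Pattern's implementation.

```python
from collections import Counter

# Hypothetical sketch: replace exact counts with rough quantifiers.
# The labels and cut-offs are assumptions for illustration.
def quantify_sketch(words):
    def label(count):
        if count == 1:
            return "a"
        if count == 2:
            return "a pair of"
        return "several"

    counts = Counter(words)
    # most_common() orders by count, highest first.
    return [(label(c), w) for w, c in counts.most_common()]

items = ['Pencil', 'Pencil', 'Eraser', 'Sharpener', 'Sharpener',
         'Sharpener', 'Scale', 'Compass']
print(quantify_sketch(items))
```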
  1. Data Mining using Pattern
    One of the most important features of Pattern is that it can be used for data mining across platforms such as Google, Twitter, and Wikipedia. Let us explore the data mining operations of the Pattern library and extract some data with it. We will start with Google: we enter a keyword to search for and display the text and URL of each search result.

  2. Google Mining

    from pattern.web import Google
    google = Google()
    for result in google.search('Analytics India Magazine'):
        print(result.url)
        print(result.text)
    
  3. Twitter Mining

    We can also use Twitter to mine the data we require. Let us explore it through an example.

    from pattern.web import Twitter
    twitter = Twitter()
    for result in twitter.search('Analytics India Magazine'):
        print(result.url)
        print(result.text)
    
  4. Flickr Mining
    Flickr is an American image and video hosting service, as well as an online community. Pattern can be used to extract data from Flickr.

    from pattern.web import Flickr
    flickr = Flickr(license=None)
    for result in flickr.search('Analytics India Magazine'):
      print(result.url)
      print(result.text)
    

    Similarly, Pattern supports data mining from a large number of other online platforms, and we can use them as needed.


Author: Himanshu Sharma
Source: https://analyticsindiamag.com/hands-on-guide-to-pattern-a-python-tool-for-effective-text-processing-and-data-mining/