PySpacy – Named Entity Extractor using Spacy

Hello friends, today we are going to learn about Natural Language Processing (NLP) and one of the libraries we use for it. Natural Language Processing is one of the most interesting sub-fields of data science, and data scientists are increasingly expected to build solutions that make use of unstructured text data.

Our agenda for this post is to build a Named Entity Extractor using the spaCy library. Let’s learn about spaCy first.

What is spaCy?

spaCy is a relatively new package for “Industrial strength NLP in Python” developed by Matt Honnibal at Explosion AI. It is designed with the applied data scientist in mind: it does not weigh the user down with decisions over which esoteric algorithms to use for common tasks, and it is fast, largely because it is implemented in Cython. If you are familiar with the Python data science stack, spaCy is reasonably low-level, but very intuitive and performant.

What is spaCy used for?

spaCy supports many tasks commonly needed in NLP projects, including:

  • Tokenisation
  • Lemmatisation
  • Part-of-speech tagging
  • Entity recognition
  • Dependency parsing
  • Sentence recognition
  • Word-to-vector transformations
  • Many convenience methods for cleaning and normalising text

Our goal in this post is to use spaCy’s entity recognition. Entity recognition is the process of finding named entities in a text and classifying them into pre-defined categories such as persons, places, organizations, and dates.
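To make the idea concrete, here is a minimal sketch of what entity recognition output looks like in spaCy. It assumes the spaCy v3 API, and it swaps in a rule-based entity_ruler on a blank pipeline so that it runs without downloading a model; the project in this post uses the trained en_core_web_sm model instead. The sample sentence and the three patterns are made up for illustration.

```python
import spacy

# Blank English pipeline: tokenizer only, no trained model required (spaCy v3 API).
nlp = spacy.blank("en")

# Hypothetical hand-written rules standing in for a trained NER model.
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "PERSON", "pattern": "Alice"},
    {"label": "ORG", "pattern": "Google"},
    {"label": "DATE", "pattern": [{"LOWER": "january"}, {"SHAPE": "dddd"}]},
])

# Made-up sample sentence.
doc = nlp("Alice joined Google in January 2006.")

# Each entity span exposes its text and its category label.
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)
```

With a trained model the labels come from statistical predictions rather than fixed rules, but the entities are read from doc.ents in the same way.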

We have created a project with Flask and spaCy that extracts named entities from a provided text. First, let’s set up the environment and start the project.

# create virtual environment
python3 -m venv env

# activate virtual environment
source env/bin/activate

# install dependencies
pip install -r requirements.txt

Here, we created a project folder and set up a virtual environment. Installing from the provided requirements.txt file pulls in all the dependencies.
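If you want to confirm that the installation worked before running the app, a small check like the following may help. It is only a sketch: the helper name missing_packages and the REQUIRED list are assumptions based on the imports used in this post, not something taken from the project’s requirements.txt.

```python
from importlib.util import find_spec

# Packages the application in this post needs at runtime (assumed list).
REQUIRED = ["flask", "spacy", "en_core_web_sm"]


def missing_packages(names):
    """Return the names that cannot be found in the active environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

Run it inside the activated virtual environment; anything it reports as missing needs to be installed before the app will start.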

# imports
from flask import Flask, render_template, request
import spacy

# Load the small English spaCy model
nlp = spacy.load('en_core_web_sm')

Once all the required imports are in place, we load the spaCy model as well.

app = Flask(__name__)


@app.route('/')
def index():
    # Serve the page containing the text input form
    return render_template("index.html")


@app.route('/process', methods=["POST"])
def process():
    # This route only accepts POST, so no method check is needed
    rawtext = request.form['rawtext']

    # Run the spaCy pipeline on the text to get a Doc
    doc = nlp(rawtext)
    results = []

    # Build one "LABEL : text : description" line per entity
    # ent.label_ is the entity type; spacy.explain() may return None
    for ent in doc.ents:
        description = spacy.explain(ent.label_) or ""
        results.append(ent.label_ + "   :   " + ent.text +
                       "   :   " + description)

    num_of_results = len(results)
    # Render the results in the HTML template
    return render_template("index.html", results=results, num_of_results=num_of_results)


# App entry point
if __name__ == '__main__':
    app.run(debug=True)

The project uses Flask and has two routes. The first, /, simply serves the HTML page in response to a GET request. The second, /process, accepts a POST request carrying the text submitted from the form, processes it with spaCy, and returns the entities found in the text along with a description of each.
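The two-route flow can be tried out without a browser by using Flask’s built-in test client. The sketch below is an illustration under several assumptions: PAGE is a hypothetical stand-in for index.html (which this post does not show), create_app and fake_extract are made-up names, and the fake extractor replaces the spaCy call so the example runs without a model.

```python
from flask import Flask, render_template_string, request

# Hypothetical stand-in for index.html, which the post does not show.
PAGE = """
<form method="POST" action="/process">
  <textarea name="rawtext"></textarea>
  <button type="submit">Extract</button>
</form>
{% for line in results %}<p>{{ line }}</p>{% endfor %}
"""


def create_app(extract):
    """Build the two-route app; `extract` maps text to a list of result lines."""
    app = Flask(__name__)

    @app.route("/")
    def index():
        return render_template_string(PAGE, results=[])

    @app.route("/process", methods=["POST"])
    def process():
        results = extract(request.form["rawtext"])
        return render_template_string(PAGE, results=results)

    return app


def fake_extract(text):
    """Fake extractor so the sketch runs without a spaCy model."""
    return ["ORG   :   " + word for word in text.split() if word == "Google"]


client = create_app(fake_extract).test_client()
get_response = client.get("/")
post_response = client.post("/process", data={"rawtext": "I work at Google"})
print(get_response.status_code, post_response.status_code)
```

Passing the extractor in as a function keeps the routing separate from the NLP step, so the same routes could be exercised with the real spaCy-based extraction once the model is installed.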

Basically, calling nlp() on the text runs spaCy’s pipeline and converts the text into a spaCy document (Doc). We can then read information such as the entities from that document.
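As a small illustration of what the document object holds, the sketch below calls a pipeline on a string and inspects the result. It assumes a blank English pipeline (tokenizer only) so that it runs without a model download; with en_core_web_sm loaded, as in the project, doc.ents would also be populated.

```python
import spacy

# A blank English pipeline only tokenizes; the post loads 'en_core_web_sm' instead.
nlp = spacy.blank("en")

# Calling the pipeline on a string returns a Doc object.
doc = nlp("Google was founded in 1998.")

# A Doc is a sequence of tokens.
tokens = [token.text for token in doc]
print(type(doc).__name__)
print(tokens)

# Empty here, because a blank pipeline has no entity recognizer.
print(doc.ents)
```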

You can see the output of the application below.

Output

As the output shows, our project extracted multiple entities from the text we provided: for example, it found Google as ORG (organization) and January,2006 as DATE. Interesting, right? You can explore spaCy further and get amazing results out of text.

You can download the complete code from GitHub.
