Hello friends, today we are going to learn about Natural Language Processing (NLP) and one of the libraries we use for it. Natural Language Processing is one of the most interesting sub-fields of data science, and data scientists are increasingly expected to be able to whip up solutions that make use of unstructured text data.
Our agenda for this post is to build a named entity extractor using the spaCy library. Let’s learn about spaCy first.
What is spaCy?
spaCy is a relatively new package for “Industrial strength NLP in Python” developed by Matthew Honnibal at Explosion AI. It is designed with the applied data scientist in mind, meaning it does not weigh the user down with decisions over what esoteric algorithms to use for common tasks, and it’s fast. Incredibly fast (it’s implemented in Cython). If you are familiar with the Python data science stack, spaCy is reasonably low-level, but very intuitive and performant.
What is spaCy used for?
spaCy supports many tasks commonly needed in an NLP project, including:
- Tokenisation
- Lemmatisation
- Part-of-speech tagging
- Entity recognition
- Dependency parsing
- Sentence recognition
- Word-to-vector transformations
- Many convenience methods for cleaning and normalising text
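To make a couple of the items above concrete, here is a minimal sketch of tokenisation and sentence recognition. It assumes spaCy v3 is installed and uses spacy.blank("en") with the rule-based "sentencizer" component, so no pretrained model has to be downloaded; the sample sentence is made up for illustration.

```python
import spacy

# Assumption: spaCy v3. spacy.blank("en") builds an empty English pipeline
# (tokenizer only), so this sketch does not need a downloaded model.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")  # rule-based sentence boundary detection

doc = nlp("Google was founded in 1998. It is based in California.")

# Tokenisation: every Token exposes its text
tokens = [token.text for token in doc]

# Sentence recognition: doc.sents yields one Span per sentence
sentences = [sent.text for sent in doc.sents]

print(tokens)
print(sentences)
```

Tasks such as lemmatisation, part-of-speech tagging and dependency parsing need a trained pipeline like en_core_web_sm, which is why they are left out of this sketch.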
Our goal in this post is to use spaCy's entity recognition. Entity recognition is the process of finding named entities in a text and classifying them into pre-defined categories, such as persons, places, organizations and dates.
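As a rough illustration of what "entities with categories" looks like in code, the sketch below uses a rule-based stand-in: a blank English pipeline with spaCy's "entity_ruler" component and two hand-written patterns. This is an assumption made so the snippet runs without downloading anything; it is not the statistical en_core_web_sm model used in the project. The sentence and the patterns are invented for the example, and spaCy v3 is assumed.

```python
import spacy

# Assumption: spaCy v3. A blank pipeline plus an entity ruler stands in for
# the trained model, purely to show the shape of the output.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "Google"},      # hypothetical pattern
    {"label": "GPE", "pattern": "California"},  # hypothetical pattern
])

doc = nlp("Google has its headquarters in California.")

# Each entity is a Span with the matched text and a category label
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)
```

A trained model produces the same kind of (text, label) pairs, but it predicts them from context instead of matching fixed patterns.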
We have built a small project with Flask and spaCy that extracts named entities from user-supplied text. First, let’s set up the environment and start the project.
# create a virtual environment
python3 -m venv env
# activate the virtual environment
source env/bin/activate
# install the dependencies listed in requirements.txt
pip install -r requirements.txt
Here we created a project folder and set up a virtual environment inside it. Installing from the provided requirements.txt file pulls in all of the project's dependencies.
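If you want to confirm that the install worked, a small check like the one below can help. It is only a sketch: the helper name installed_versions is made up for this example, and the package names "flask" and "spacy" are assumed from the imports used in the project, not read from the actual requirements.txt.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Assumption: these are the two key dependencies of the project
report = installed_versions(["flask", "spacy"])
for name, version in report.items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Run it inside the activated virtual environment; a "NOT INSTALLED" line means the pip step did not complete for that package.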
# imports
from flask import Flask, render_template, url_for, request
import re
import pandas as pd
import spacy
import en_core_web_sm

# Load spacy model
nlp = spacy.load('en_core_web_sm')
With all the required imports in place, the last line loads the spaCy model.
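Loading fails with an OSError when the model package is not installed, which is a common stumbling block. Here is a minimal sketch of a friendlier loader; the function name load_model and the wording of the message are my own, and the deliberately bogus model name exists only to trigger the error path.

```python
import spacy


def load_model(name="en_core_web_sm"):
    """Load a spaCy pipeline, turning a missing model into a clear hint."""
    try:
        return spacy.load(name)
    except OSError as exc:
        # spacy.load raises OSError when the named package cannot be found
        raise RuntimeError(
            f"spaCy model '{name}' is not installed; "
            f"try: python -m spacy download {name}"
        ) from exc


# Demonstrate the error path with a name that is assumed not to exist
message = None
try:
    load_model("model_that_is_not_installed")
except RuntimeError as err:
    message = str(err)
    print(message)
```

In the app you would call load_model() once at startup, just like the spacy.load call above.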
app = Flask(__name__)


@app.route('/')
def index():
    return render_template("index.html")


@app.route('/process', methods=["POST"])
def process():
    if request.method == 'POST':
        rawtext = request.form['rawtext']
        # get doc out of text using spacy function
        doc = nlp(rawtext)
        d = []
        # loop over and append it in list
        # ent.label_ provides Entity
        for ent in doc.ents:
            text = ent.label_ + " : " + ent.text + \
                " : " + spacy.explain(ent.label_)
            d.append(text)
        results = d
        num_of_results = len(results)
    # render output into html file
    return render_template("index.html", results=results, num_of_results=num_of_results)


# app start point
if __name__ == '__main__':
    app.run(debug=True)
We used Flask for this project. The app has two routes: / simply serves the HTML page in response to a GET request, while /process accepts a POST request carrying the text submitted from the form, runs it through spaCy, and returns the entities it found along with a short description of each label.
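To see how a POST-only route behaves, you can exercise it with Flask's built-in test client. The sketch below is a simplified variant, not the project's code: it returns JSON instead of rendering index.html, and it takes a stand-in extractor function (fake_extract, hypothetical) so it runs without spaCy or templates. The create_app helper is also invented for this example.

```python
from flask import Flask, jsonify, request


def create_app(extract):
    """Build a tiny app; `extract` maps raw text to a list of result strings."""
    app = Flask(__name__)

    @app.route("/process", methods=["POST"])
    def process():
        rawtext = request.form.get("rawtext", "")
        results = extract(rawtext)
        return jsonify(results=results, num_of_results=len(results))

    return app


def fake_extract(text):
    # Hypothetical stand-in for the spaCy-based extraction
    return ["ORG : Google"] if "Google" in text else []


app = create_app(fake_extract)
client = app.test_client()

# A POST with form data reaches the view function
response = client.post("/process", data={"rawtext": "Google was founded in 1998."})
payload = response.get_json()
print(response.status_code, payload)

# A GET is rejected because the route only allows POST
get_response = client.get("/process")
print(get_response.status_code)
```

Passing the extractor in as a function keeps the route logic testable on its own; in the real app that function would wrap the nlp() call.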
Basically, calling nlp() on a string runs the text through the loaded pipeline and returns a spaCy Doc object. We can then read information from that document; for example, doc.ents holds the recognized entities.
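Here is a small sketch of what that Doc object looks like, plus a safer way to look up label descriptions. It assumes spaCy v3 and again uses a blank pipeline so nothing needs to be downloaded (which also means doc.ents is empty here). The describe helper and its fallback text are made up for this example; the point is that spacy.explain returns None for labels it does not know, so concatenating its result directly can fail.

```python
import spacy
from spacy.tokens import Doc

# Assumption: spaCy v3; a blank pipeline only tokenizes the text
nlp = spacy.blank("en")

text = "Google was founded in 1998."
doc = nlp(text)

print(type(doc))   # the Doc class from spacy.tokens
print(doc.text)    # the original string is preserved
print(len(doc))    # number of tokens


def describe(label):
    """Return spaCy's glossary text for a label, or a fallback string."""
    return spacy.explain(label) or "no description available"


print(describe("ORG"))
print(describe("NOT_A_LABEL"))
```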
You can see the output below.
As the output shows, our project extracted several entities from the text we provided; for example, it found Google as ORG (organization) and January, 2006 as DATE. Interesting, right? You can explore spaCy further and get amazing results out of your text.
You can download the complete code from GitHub.