Article scraping & curation using the newspaper Python library

Newspaper is an amazing Python library for extracting and curating articles.

Newspaper is a Python module for extracting and parsing newspaper articles and other web content. It combines web scraping with text-extraction algorithms to pull the useful text out of a given page. It works really well and returns useful information about an article, such as its title, body text, main image, summary and keywords.

Newspaper uses NLTK internally, so we first need to set up the NLTK library and download its punkt module (a sentence tokenizer).

pip install nltk

Once NLTK is installed, we use it in our code as shown below to download punkt:

import nltk
nltk.download('punkt')
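
The download call above runs every time the script starts. A minimal sketch, assuming you would rather skip it when punkt is already on disk, is to ask NLTK to look for the resource first. The helper name ensure_punkt is hypothetical; nltk.data.find and nltk.download are the NLTK calls it relies on, and tokenizers/punkt is assumed to be the resource path under which punkt is stored.

```python
def ensure_punkt():
    """Download the punkt tokenizer only if NLTK cannot already find it.

    Returns True if a download was triggered, False if punkt was present.
    ensure_punkt is a hypothetical helper name, not part of NLTK.
    """
    import nltk  # imported lazily so this file loads even without NLTK

    try:
        # nltk.data.find raises LookupError when the resource is missing
        nltk.data.find("tokenizers/punkt")
        return False
    except LookupError:
        nltk.download("punkt")
        return True
```

Calling ensure_punkt() once at the top of a script should have the same effect as the two lines above on a fresh machine, and do nothing on later runs.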

This code downloads the required zip file into our NLTK data folder, where future projects and scripts can reuse it. Next we install the newspaper library with pip. On Python 3 the package is published on PyPI as newspaper3k:

pip install newspaper3k

Now that the newspaper library is installed, we are all set to write code that pulls information from articles. Let's start.

# import the Article class (the only name this script needs)
from newspaper import Article

# you can change any URL here
url = 'https://medium.com/analytics-vidhya/6-useful-programming-languages-for-data-science-you-should-learn-that-are-not-r-and-python-2d5bc79873a2'

# Create Article object with URL
toi = Article(url)

# download article from URL
toi.download()

# parse the article
toi.parse()

# apply nlp on article
toi.nlp()

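# print the title extracted by parse()
print("Article's Title : ")
print(toi.title)
print("\n")

# print the full body text
print("Article's Text : ")
print(toi.text)
print('\n')

# print the URL of the article's main image
print("Article's image : ")
print(toi.top_image)
print('\n')

# print the summary generated by nlp()
print("Article's Summary : ")
print(toi.summary)
print('\n')

# print the keywords generated by nlp()
print("Article's keywords : ")
print(toi.keywords)
print('\n')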
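
The script above prints each field as it goes. If you want to reuse the values instead, one possible sketch is to wrap the same download, parse and nlp calls in a function that returns a dictionary. The function name extract_fields and the dictionary keys are assumptions made for illustration; the attributes read from the article are the same ones used above.

```python
def extract_fields(article):
    """Run download, parse and nlp on an Article-like object.

    `article` is assumed to expose the same methods and attributes as the
    Article object used above. extract_fields is a hypothetical helper,
    not part of the newspaper library.
    """
    article.download()  # fetch the page
    article.parse()     # fills in title, text and top_image
    article.nlp()       # fills in summary and keywords

    return {
        "title": article.title,
        "text": article.text,
        "top_image": article.top_image,
        "summary": article.summary,
        "keywords": list(article.keywords),
    }
```

Assuming newspaper is installed, calling extract_fields(Article(url)) should give back the same title, text, image URL, summary and keywords that the script prints.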

Output :

Article’s Title :
6 Useful Programming Languages for Data Science You Should Learn (that are not R and Python)

Article’s Text :
6 Useful Programming Languages for Data Science You Should Learn (that are not R and Python) Creative Kaksha Follow Jun 24 · 9 min read
Overview
Which programming language should you pick for data science? Here’s a list of 6 powerful ones that are not Python or R……..

Article’s image : 
https://miro.medium.com/max/1200/0*k_SfHuLLKbsCqdwT.jpg

Article’s Summary : 
6 Useful Programming Languages for Data Science You Should Learn (that are not R and Python) Creative Kaksha Follow Jun 24 · 9 min readOverviewWhich programming language should you pick for data science?
We will cover 6 powerful and useful programming languages for data science that I feel every data scientist should learn (or at least be aware of).
Spark can perform various data science and data engineering tasks, such as:Exploratory data analysisFeature extractionSupervised learningModel evaluationBuilding and debugging Spark applications, etc.
Github link: Learn more about Spark NLPEnd NotesDon’t you love how vast the field is for data science languages?
But my aim here was to bring out other languages that we can use to perform data science tasks.

Article’s keywords :
['learn', 'python', 'link', 'science', 'languages', 'library', 'learning', 'programming', 'github', 'useful', 'language', 'machine', 'data', 'r']

Reference : https://newspaper.readthedocs.io/en/latest/