Newspaper is an amazing Python library for extracting and curating articles.
Newspaper is a Python module for extracting and parsing newspaper articles and other web content. It combines web scraping with text-processing algorithms to pull the useful text out of a given web page. It works really well and returns useful information about an article, such as its title, text, top image, summary, and keywords.
Newspaper uses the NLTK library internally, so we first need to set up NLTK and download its punkt module (a sentence tokenizer).
pip install nltk
Once NLTK is installed, we use it in our code as shown below to download punkt:
import nltk
nltk.download('punkt')
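Because the download only needs to happen once, a script that runs repeatedly can check for the tokenizer first. The following is a minimal sketch, not part of the newspaper or NLTK documentation: the helper name `ensure_nltk_resource` and its `nltk_module` parameter are hypothetical, and it assumes the tokenizer is stored under the resource path `tokenizers/punkt`. It relies on `nltk.data.find`, which raises `LookupError` when a resource is missing.

```python
def ensure_nltk_resource(resource_path="tokenizers/punkt", package="punkt",
                         nltk_module=None):
    """Download an NLTK package only if it is not already installed.

    Returns True if a download was triggered, False otherwise.
    `nltk_module` is a hypothetical hook so the function can be
    exercised with a stand-in object; by default the real nltk is used.
    """
    if nltk_module is None:
        # Imported lazily so that defining this helper does not require nltk.
        import nltk as nltk_module
    try:
        # nltk.data.find raises LookupError when the resource is absent.
        nltk_module.data.find(resource_path)
        return False
    except LookupError:
        nltk_module.download(package)
        return True
```

Calling `ensure_nltk_resource()` at the top of a script should then be a no-op on every run after the first, assuming the NLTK data folder is left in place.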
The download places the required zip file in our NLTK data folder, where future projects and scripts can reuse it. Next, we install the newspaper library with pip. On Python 3 the package is published as newspaper3k (the plain newspaper package targets Python 2):
pip install newspaper3k
Now the newspaper library is installed and we are all set to write code that gets information from articles. Let's start.
# import required libraries
import newspaper
from newspaper import Article

# you can change any URL here
url = 'https://medium.com/analytics-vidhya/6-useful-programming-languages-for-data-science-you-should-learn-that-are-not-r-and-python-2d5bc79873a2'

# Create Article object with URL
toi = Article(url)

# download article from URL
toi.download()

# parse the article
toi.parse()

# apply nlp on article
toi.nlp()

# Extract Title from article
print("Article's Title : ")
print(toi.title)
print("\n")

print("Article's Text : ")
print(toi.text)
print('\n')

print("Article's image")
print(toi.top_image)
print('\n')

print("Article's Summary")
print(toi.summary)
print('\n')

print("Article's keywords")
print(toi.keywords)
print('\n')
Output :
Article’s Title :
6 Useful Programming Languages for Data Science You Should Learn (that are not R and Python)
Article’s Text :
6 Useful Programming Languages for Data Science You Should Learn (that are not R and Python) Creative Kaksha Follow Jun 24 · 9 min read
Overview
Which programming language should you pick for data science? Here’s a list of 6 powerful ones that are not Python or R……..
Article’s image :
https://miro.medium.com/max/1200/0*k_SfHuLLKbsCqdwT.jpg
Article’s Summary :
6 Useful Programming Languages for Data Science You Should Learn (that are not R and Python) Creative Kaksha Follow Jun 24 · 9 min readOverviewWhich programming language should you pick for data science?
We will cover 6 powerful and useful programming languages for data science that I feel every data scientist should learn (or at least be aware of).
Spark can perform various data science and data engineering tasks, such as:Exploratory data analysisFeature extractionSupervised learningModel evaluationBuilding and debugging Spark applications, etc.
Github link: Learn more about Spark NLPEnd NotesDon’t you love how vast the field is for data science languages?
But my aim here was to bring out other languages that we can use to perform data science tasks.
Article’s keywords :
[‘learn’, ‘python’, ‘link’, ‘science’, ‘languages’, ‘library’, ‘learning’, ‘programming’, ‘github’, ‘useful’, ‘language’, ‘machine’, ‘data’, ‘r’
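The script above prints each field as it goes. If the results are needed later in a program, it may be handier to collect them in a dictionary. The sketch below is one possible way to do that; the function names `article_to_dict` and `fetch_article` are hypothetical and not part of the newspaper API. It only uses the attributes already shown above (`title`, `text`, `top_image`, `summary`, `keywords`) and the same `download()`, `parse()`, `nlp()` sequence.

```python
def article_to_dict(article):
    """Collect the fields printed above from an already processed article.

    Works with any object that exposes the same attribute names,
    which makes it easy to try out without a network connection.
    """
    return {
        "title": article.title,
        "text": article.text,
        "top_image": article.top_image,
        "summary": article.summary,
        "keywords": list(article.keywords),
    }


def fetch_article(url):
    """Download, parse and analyse one article, then return its fields."""
    # Imported lazily so that article_to_dict can be used without newspaper.
    from newspaper import Article

    article = Article(url)
    article.download()   # fetch the HTML
    article.parse()      # extract title, text, top image
    article.nlp()        # compute summary and keywords (needs punkt)
    return article_to_dict(article)
```

With the library installed and a network connection available, `fetch_article(url)["keywords"]` should give the same list that `toi.keywords` printed above.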
Reference : https://newspaper.readthedocs.io/en/latest/