Useful NLP Techniques
The NLP techniques that are useful for extracting information from text are as follows:
1. Named Entity Recognition
The most basic and useful technique in NLP is extracting the entities in the text. It highlights the
fundamental concepts and references in the text. Named entity recognition (NER) identifies
entities such as people, locations, organizations, and dates in the text. NER is generally based
on grammar rules and supervised models. However, there are NER platforms such as Apache OpenNLP
that provide pre-trained, built-in NER models.
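To make the rule-based side of NER concrete, here is a minimal sketch in Python. It is only a toy: the
name lists (PEOPLE, ORGANIZATIONS, LOCATIONS) and the date pattern are hypothetical stand-ins invented
for illustration, and a real system would rely on much larger resources or a trained model.

```python
import re

# Hypothetical mini-gazetteers (assumptions for illustration only).
PEOPLE = {"Ada Lovelace", "Alan Turing"}
ORGANIZATIONS = {"Acme Corp", "United Nations"}
LOCATIONS = {"London", "Paris"}

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
# Matches dates written like "12 March 2021" or "March 12, 2021".
DATE_PATTERN = re.compile(
    rf"\b(?:\d{{1,2}} (?:{MONTHS}) \d{{4}}|(?:{MONTHS}) \d{{1,2}}, \d{{4}})\b"
)

def recognize_entities(text):
    """Return (start offset, entity text, label) tuples sorted by position."""
    found = []
    gazetteers = (("PERSON", PEOPLE), ("ORG", ORGANIZATIONS), ("LOC", LOCATIONS))
    for label, names in gazetteers:
        for name in names:
            # Whole-word lookup of each known name in the text.
            for match in re.finditer(rf"\b{re.escape(name)}\b", text):
                found.append((match.start(), match.group(), label))
    for match in DATE_PATTERN.finditer(text):
        found.append((match.start(), match.group(), "DATE"))
    return sorted(found)

sample = "Alan Turing joined Acme Corp in London on 12 March 2021."
entities = [(text, label) for _, text, label in recognize_entities(sample)]
print(entities)
```

Lookup lists and patterns like these only find what they already know about, which is why supervised
models are commonly used alongside rules.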
2. Sentiment Analysis
One of the most widely used techniques in NLP is sentiment analysis. Sentiment analysis is most useful
in cases such as customer surveys, reviews and social media comments where people express
their opinions and feedback. The simplest output of sentiment analysis is a 3-point scale:
positive/negative/neutral. In more complex cases the output can be a numeric score that can be
bucketed into as many categories as required.
Sentiment analysis can be done using supervised as well as unsupervised techniques. A popular
supervised model for sentiment analysis is naive Bayes. It requires a training corpus with
sentiment labels; a model is trained on this corpus and then used to identify the sentiment of
new text. Naive Bayes is not the only option: other machine learning techniques such as random
forest or gradient boosting can also be used.
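The supervised workflow described above can be sketched as follows. This is a minimal multinomial
naive Bayes with add-one smoothing, written from scratch in Python; the class name, the whitespace
tokenizer, and the four training sentences are all assumptions made up for illustration, not a
reference implementation.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Simplistic whitespace tokenizer (an assumption; real pipelines do more).
    return text.lower().split()

class NaiveBayesSentiment:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter(labels)      # label -> number of documents
        self.vocab = set()
        for text, label in zip(texts, labels):
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # words never seen in training are ignored
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Tiny hypothetical labeled corpus.
train_texts = [
    "great product works well",
    "really love this phone",
    "terrible quality broke quickly",
    "awful service very disappointed",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayesSentiment().fit(train_texts, train_labels)
print(model.predict("love this great phone"))
print(model.predict("terrible and awful"))
```

With so little training data the model simply echoes the words it has seen; a realistic corpus
would contain thousands of labeled examples.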
The unsupervised techniques, also known as lexicon-based methods, require a lexicon of words
with their associated sentiment polarity. The sentiment score of a sentence is calculated
from the polarities of the words in it.
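A lexicon-based scorer can be sketched in a few lines of Python. The LEXICON values and the
threshold below are hypothetical numbers chosen for illustration; the score is simply the sum of
the word polarities, mapped onto the 3-point positive/negative/neutral scale.

```python
# Hypothetical polarity lexicon: word -> polarity between -1 and 1.
LEXICON = {
    "good": 1.0, "great": 1.0, "happy": 0.8,
    "bad": -1.0, "poor": -0.7, "slow": -0.5,
}

def sentence_score(sentence):
    """Sum the polarities of the known words in the sentence."""
    score = 0.0
    for token in sentence.lower().split():
        word = token.strip(".,!?;:")
        score += LEXICON.get(word, 0.0)  # unknown words contribute nothing
    return score

def classify(sentence, threshold=0.1):
    """Bucket the numeric score into a 3-point scale."""
    score = sentence_score(sentence)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"

print(classify("The screen is great but the battery is poor."))
print(classify("The delivery was slow and the packaging was bad."))
print(classify("The parcel arrived on Tuesday."))
```

This sketch ignores negation ("not good") and intensifiers ("very good"), which practical
lexicon-based methods usually have to handle.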
3. Text Summarization
As the name suggests, there are techniques in NLP that help summarize large chunks of
text. Text summarization is mainly used in cases such as news articles and research articles.
Two broad approaches to text summarization are extraction and abstraction. Extraction methods
create a summary by extracting parts from the text. Abstraction methods create a summary by
generating fresh text that conveys the crux of the original text. There are various algorithms that
can be used for text summarization like LexRank, TextRank, and Latent Semantic Analysis. To
take the example of LexRank, this algorithm ranks the sentences using similarity between them.
A sentence is ranked higher when it is similar to more sentences, and these sentences are in turn
similar to other sentences.
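The ranking idea can be sketched in Python as a similarity graph scored by a PageRank-style
iteration. This is a simplification in the spirit of LexRank rather than the published algorithm:
it uses plain word-count cosine similarity instead of tf-idf weighting, and the damping factor,
iteration count, and sample sentences are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score each sentence by how similar it is to other well-connected sentences."""
    vectors = [Counter(re.findall(r"[a-z]+", s.lower())) for s in sentences]
    n = len(sentences)
    sim = [[cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    row_sums = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbor j passes on its score in proportion to sim[j][i].
            incoming = sum(scores[j] * sim[j][i] / row_sums[j]
                           for j in range(n) if row_sums[j] > 0)
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores

def summarize(sentences, k=2):
    """Return the k highest-scoring sentences in their original order."""
    scores = rank_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]

sentences = [
    "The cat sat on the mat.",
    "The cat chased the mouse across the mat.",
    "A mouse ran across the floor.",
    "Stock prices fell sharply yesterday.",
]
scores = rank_sentences(sentences)
print(summarize(sentences, k=2))
```

The last sentence shares no words with the others, so it receives only the baseline score and is
left out of the summary.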
4. Aspect Mining
Aspect mining identifies the different aspects discussed in the text. When used in conjunction
with sentiment analysis, it captures not only the opinion expressed but also what that opinion
is about. One of the easiest methods of aspect mining is using part-of-speech tagging.
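As a rough illustration of the part-of-speech approach, the Python sketch below treats nouns as
candidate aspects and pairs each one with the adjectives in the same clause. The POS_TAGS and
POLARITY tables are hypothetical, hand-written stand-ins for a real tagger and sentiment lexicon,
and the clause splitting is deliberately naive.

```python
import re

# Hypothetical lookup tables standing in for a real POS tagger and lexicon.
POS_TAGS = {
    "battery": "NOUN", "screen": "NOUN", "camera": "NOUN",
    "great": "ADJ", "dim": "ADJ", "excellent": "ADJ",
}
POLARITY = {"great": "positive", "excellent": "positive", "dim": "negative"}

def mine_aspects(text):
    """Return (aspect, opinion word, sentiment) triples found in the text."""
    results = []
    # Naive clause split on punctuation and the conjunctions "but" / "and".
    for clause in re.split(r"[,.;]|\bbut\b|\band\b", text.lower()):
        tokens = re.findall(r"[a-z]+", clause)
        nouns = [t for t in tokens if POS_TAGS.get(t) == "NOUN"]
        adjectives = [t for t in tokens if POS_TAGS.get(t) == "ADJ"]
        # Pair every noun with every adjective in the same clause.
        for noun in nouns:
            for adjective in adjectives:
                results.append((noun, adjective, POLARITY.get(adjective, "neutral")))
    return results

review = "The battery is great, but the screen is dim."
aspects = mine_aspects(review)
print(aspects)
```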
When aspect mining along with sentiment analysis is used on the sample text, the output conveys
the complete intent of the text:
5. Topic Modeling
Topic modeling is one of the more complicated methods to identify natural topics in the text. A
prime advantage of topic modeling is that it is an unsupervised technique, so a labeled training
dataset is not required.
There are quite a few algorithms for topic modeling:
Latent Semantic Analysis (LSA)
Probabilistic Latent Semantic Analysis (PLSA)
Latent Dirichlet Allocation (LDA)
Correlated Topic Model (CTM).
One of the most popular methods is latent Dirichlet allocation. The premise of LDA is that each
text document comprises several topics and each topic comprises several words. The input
required by LDA is merely the text documents and the expected number of topics.
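To show how little input LDA needs, here is a minimal sketch in Python that fits the model with
collapsed Gibbs sampling, one common inference method. The function name, the hyperparameters
alpha and beta, the iteration count, and the toy documents are assumptions for illustration;
production work would normally use an established library instead.

```python
import random
from collections import Counter

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA on tokenized docs; return (doc-topic proportions, topic-word counts)."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # tokens per topic, per doc
    topic_word = [Counter() for _ in range(num_topics)]   # word counts per topic
    topic_total = [0] * num_topics                        # tokens per topic overall
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic given all other assignments.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    theta = [
        [(count + alpha) / (len(doc) + num_topics * alpha) for count in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    return theta, topic_word

# Toy corpus: the only inputs are the documents and the number of topics.
docs = [
    "cat dog pet cat vet".split(),
    "dog pet leash dog cat".split(),
    "stock market trade stock price".split(),
    "market price trade bank stock".split(),
]
theta, topic_word = lda_gibbs(docs, num_topics=2)
for k, counts in enumerate(topic_word):
    print(k, [word for word, _ in counts.most_common(3)])
```

Each row of theta gives the estimated mix of topics in one document, and the most frequent words
in each topic suggest what the topic is about.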