Fundamentals of Natural Language Processing: A Beginner's Guide
RAHO AMBITIOUS
INTRODUCTION
CONCLUSION
The concept of Natural Language Processing (NLP) has been a hot topic lately, along with Artificial Intelligence (AI) and Machine Learning (ML), all of which aim to create smart machines that can understand human language and interact with humans with a certain degree of intelligence. But do you know what NLP actually is?
In layman's terms, natural language processing is the science that aims to bridge the gap between human communication (natural language) and a machine's understanding of it. When humans communicate with each other in their native language, we can hardly explain how our brains so easily turn so much unstructured data into meaningful information that makes sense to us. Human speech and language, however, are very difficult for a computer to understand and interpret, since at the lowest level it works with only one language: binary. That's where natural language processing comes in.
Put simply, natural language processing is a branch of AI that aims to design machines that can understand, interpret, and manipulate human language. NLP emerged from the collaboration of three disciplines: Computer Science, Artificial Intelligence, and Computational Linguistics.
Although NLP has recently garnered increasing attention in Computer Science and AI, it is not exactly a “new field”. Deeply rooted in Linguistics, natural language processing originated in the mid-twentieth century, when new advances were being made in statistical analysis and machine translation.
The real progress and experimentation in NLP began shortly after World War II. As tension mounted between the US and the Soviet Union and the fear of nuclear war was in the air, natural language processing became a valuable tool by helping translate Russian into English. In 1950, Alan Turing published his acclaimed article “Computing Machinery and Intelligence”, which laid the foundation for the Turing Test, a widely cited criterion for judging whether a machine exhibits intelligent behavior.
In 1954, the Georgetown-IBM demonstration showed the automatic translation of more than sixty Russian sentences into English, albeit in a very rudimentary form. The year 1957 marked a revolution in Linguistics with Noam Chomsky's “Syntactic Structures,” which proposed a rule-based system for describing syntactic structures. Then, in 1969, came Roger Schank's Conceptual Dependency Theory for natural language interpretation and understanding.
Until the early 1980s, natural language processing was driven largely by complicated, hand-written rules. It was only in the late 1980s that ML algorithms made their way into NLP, and the integration of statistics, probability, and ML algorithms began to shift the field toward a statistical approach. SHRDLU and Jabberwacky are two of the earliest and most notable systems in the history of NLP.
Fast forward to today: we have smart personal assistants such as Siri, Alexa, and Cortana that can interact with us in a remarkably human-like way!
Though NLP progressed slowly in its early years, the market for natural language processing is now growing at an impressive rate, thanks to the massive upsurge in the adoption of smart devices and the IoT, the rapid growth of AI and machine-to-machine technologies, and the ever-increasing demand for better customer experience. According to the report “Natural Language Processing Market by Type”, the NLP market is estimated to grow from 7.63 billion USD to 16.07 billion USD by 2021. Another report, by Infoholic Research, suggests that between 2017 and 2023 the global Natural Language Processing market will grow at a Compound Annual Growth Rate (CAGR) of 18.78%.
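CAGR is the constant yearly growth rate that would take a value from its starting level to its ending level. As a rough illustration, the sketch below computes the rate implied by the first report's figures. Because the base year of the 7.63 billion USD figure is not stated here, it assumes a five-year window (2016 to 2021); the function names are our own.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: float) -> float:
    """Value after compounding `rate` once a year for `years` years."""
    return start_value * (1 + rate) ** years


# Assumption: a five-year window (2016 to 2021) for the 7.63 -> 16.07 billion USD estimate.
implied_rate = cagr(7.63, 16.07, 5)
print(f"Implied CAGR: {implied_rate:.2%}")  # roughly 16%

# An 18.78% CAGR sustained for six years (2017 to 2023) multiplies market size by about 2.8x.
print(f"Growth factor: {project(1.0, 0.1878, 6):.2f}x")
```

Note that the two reports cover different periods, so their rates are not directly comparable.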
Globally, the market for natural language processing is concentrated in Europe and North America, as these regions have a strong concentration of NLP vendors and also generate the most revenue. However, the Asia Pacific region and the Middle East and Africa (MEA) are not far behind, and they are expected to achieve the fastest growth rates in NLP by the end of the forecast period.
[Graph: segmented analysis (hardware, software, services) and size of the global NLP market, 2016 to 2025. (Source)]
Between 2016 and 2021, the professional services segment is estimated to have the highest CAGR, primarily because organizations across the globe are increasingly demanding innovative and efficient NLP solutions. At present, the NLP market is dominated by giants such as Apple Inc., Google, Microsoft Corporation, IBM, Hewlett-Packard, SAS Institute Inc., Netbase Solutions, 3M Company, Dolbey Systems, and Verint Systems.
As the market for NLP continues to boom, the choice of tools and resources for the field is also evolving. Previously, programmers and researchers working on NLP had to rely largely on an assortment of utility programs to accomplish basic NLP tasks. These programs were often poorly written and lacked the stability and efficiency that NLP work requires. Today, however, the scenario has changed for the better: we have access to a wide range of open-source software libraries that provide scalable and efficient NLP functionality. The Natural Language Toolkit for Python (NLTK) is a great example (you can also use spaCy). With NLTK, researchers and programmers can spend their time developing application logic rather than debugging low-level data-handling code, such as a hand-written method for sentence segmentation. NLTK also provides the basic classes required for representing NLP data.
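For instance, NLTK's Punkt sentence tokenizer and Treebank word tokenizer replace the kind of hand-rolled segmentation code described above. The minimal sketch below uses an untrained `PunktSentenceTokenizer` so that no model download is needed; the pretrained English model (used by `nltk.sent_tokenize` after downloading the Punkt data) handles abbreviations such as “Mr.” far better. The sample text is our own.

```python
# Requires: pip install nltk
from nltk.tokenize import TreebankWordTokenizer
from nltk.tokenize.punkt import PunktSentenceTokenizer

text = "Frodo left the Shire. He carried the ring! Did he destroy it?"

# An untrained Punkt tokenizer splits on sentence-final punctuation.
# It knows no abbreviations, so it is only suitable for simple text like this.
sentence_tokenizer = PunktSentenceTokenizer()
sentences = sentence_tokenizer.tokenize(text)

# The Treebank tokenizer splits each sentence into words and punctuation marks.
word_tokenizer = TreebankWordTokenizer()
tokens = [word_tokenizer.tokenize(sentence) for sentence in sentences]

for sentence, words in zip(sentences, tokens):
    print(sentence, "->", words)
```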
Now that we’ve discussed the fundamentals of NLP, it’s time to look
at how it is helping transform the world around us. The rapid evolution
of natural language processing over the years has given birth to
numerous useful real-world applications that are being used by
businesses and organizations across the globe. Let’s have a look at
some of the most crucial applications of NLP in the present day.
1. Sentiment Analysis
Tech giants like Intel, IBM, and Twitter use sentiment analysis to identify and understand issues concerning their employees. For instance, Intel has used software from Kanjoya, which applies ML algorithms to infer emotions from text, while IBM performs sentiment analysis on employee posts to analyze the emotions behind them.
2. Information Extraction
In the financial sector, companies are increasingly shifting from human-based administration and control toward algorithmic trading. Since most financial news and information is published as text in English or other major languages, NLP is used to process these textual announcements and extract meaningful information from them to inform algorithm-driven trading decisions. For instance, the announcement of a merger between two business giants could have a major impact on trading decisions. A trading algorithm can use the extracted information to estimate how the merger will affect market prices, the other players in the market, profit margins, and much more.
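As a toy illustration of the extraction step (not how production trading systems work), the sketch below pulls an acquirer and a target out of a headline with a single hand-written pattern. The pattern, the `MergerEvent` type, and the sample headlines are hypothetical; real systems typically combine named-entity recognition with trained relation extractors.

```python
import re
from typing import NamedTuple, Optional


class MergerEvent(NamedTuple):
    acquirer: str
    target: str


# Hypothetical pattern: "<Capitalized Name> to acquire | agrees to buy | will merge with <Capitalized Name>".
_COMPANY = r"[A-Z][\w&.]*(?:\s+[A-Z][\w&.]*)*"
_MERGER_PATTERN = re.compile(
    rf"(?P<acquirer>{_COMPANY})\s+(?:to acquire|agrees to buy|will merge with)\s+(?P<target>{_COMPANY})"
)


def extract_merger(headline: str) -> Optional[MergerEvent]:
    """Return the companies involved in a merger headline, or None if none is found."""
    match = _MERGER_PATTERN.search(headline)
    if match is None:
        return None
    # Strip a trailing period picked up from abbreviations such as "Inc." at the end of a headline.
    return MergerEvent(match.group("acquirer"), match.group("target").rstrip("."))


print(extract_merger("Acme Corp to acquire Globex Inc for $5 billion"))
print(extract_merger("Markets were quiet today."))
```

A structured event like this is what a downstream trading algorithm would consume.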
3. Content Categorization/Classification
Content categorization or classification refers to the process of assigning each document or file in a given set to one or more classes, which may themselves be nested within broader classes. It combines text mining and natural language processing and can be used for purposes such as the hierarchical categorization of web pages and document indexing and organization.
Content classification has been a boon for the media industry. Whereas professionals previously had to tag content, including news pieces, blog posts, and articles, by hand, they can now tag it with a single API call to services such as Google's Cloud Natural Language API. Today, content classification is leveraged not only by the news and media industry but also by governments, NGOs, e-commerce platforms, law practitioners, and social researchers, to name a few.
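A classic baseline for this kind of classification is multinomial Naive Bayes. The sketch below is a from-scratch version with add-one smoothing and a tiny, made-up training set; it only shows the idea, and a real system would be trained on many labelled documents (or use a hosted API as described above).

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    return text.lower().split()


class NaiveBayesClassifier:
    """Multinomial Naive Bayes text classifier with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # number of documents per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocab = set()

    def fit(self, documents, labels):
        for doc, label in zip(documents, labels):
            self.class_counts[label] += 1
            for word in tokenize(doc):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, document):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior P(class)
            n_words = sum(self.word_counts[label].values())
            for word in tokenize(document):
                # Smoothed log P(word | class): unseen words get a small non-zero probability.
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (n_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training data: two documents per class.
train_docs = [
    "stock market shares fall as investors react",
    "central bank raises interest rates",
    "team wins the championship final",
    "striker scores twice in the derby match",
]
train_labels = ["business", "business", "sports", "sports"]

classifier = NaiveBayesClassifier().fit(train_docs, train_labels)
print(classifier.predict("investors watch interest rates"))  # business
print(classifier.predict("the final match"))                 # sports
```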
4. Text Summarization
“Text summarization is the process of distilling the most important information from a source (or sources) to produce an abridged version for a particular user (or users) and task (or tasks).”
Also known as automatic text summarization, it offers a host of benefits, such as reduced reading time, more effective document indexing, and an easier, more convenient document-selection process, among others.
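The simplest family of methods is extractive summarization, which copies the most informative sentences rather than writing new ones. The sketch below scores each sentence by the average frequency, across the whole text, of its non-stop-words, a well-known frequency heuristic. The stop-word list and sample text are our own assumptions, and modern systems often use abstractive neural models instead.

```python
import re
from collections import Counter

# A small stop-word list (an assumption; libraries ship much larger ones).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "was", "for", "on", "with"}


def content_words(text):
    """Lower-cased words with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, num_sentences=1):
    """Extractive summary: keep the sentences whose words are most frequent in the text."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])  # restore the original sentence order
    return " ".join(sentences[i] for i in keep)


text = "The ring is dangerous. Frodo must carry the ring to Mordor. The weather was pleasant."
print(summarize(text))     # The ring is dangerous.
print(summarize(text, 2))  # The ring is dangerous. Frodo must carry the ring to Mordor.
```

Dividing by sentence length keeps long sentences from winning simply because they contain more words.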
5. Question Answering
While it is true that we can find information about almost anything on the Internet, there are times when search engines are at a loss for certain specific queries. This is precisely the problem the NLP technique of Question Answering (QA) seeks to solve. QA combines two fields, NLP and information retrieval, to automatically find answers to questions asked by humans in natural language. It generally relies on a program that provides answers drawn from a knowledge base.
Google has invested heavily in NLP to improve its search results by identifying and understanding the natural language queries people pose, and the result is plain to see: Google's search results are now much more relevant than before.
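A very small retrieval-based QA sketch: it returns the knowledge-base passage that shares the most keywords with the question. The knowledge base, the stop-word list, and the overlap scoring are illustrative assumptions; practical systems rank passages with measures such as TF-IDF or neural embeddings and then extract a short answer span.

```python
import re

# Hypothetical mini knowledge base built from facts mentioned in this guide.
KNOWLEDGE_BASE = [
    "Alan Turing published Computing Machinery and Intelligence in 1950.",
    "The Georgetown-IBM experiment translated Russian sentences into English in 1954.",
    "Noam Chomsky published Syntactic Structures in 1957.",
]
# Question words and very common words carry little information for matching.
STOPWORDS = {"who", "what", "when", "where", "which", "did", "was", "is",
             "the", "a", "an", "in", "of", "and", "into"}


def keywords(text):
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def answer(question, knowledge_base=KNOWLEDGE_BASE):
    """Return the passage that shares the most keywords with the question."""
    query = keywords(question)
    return max(knowledge_base, key=lambda passage: len(query & keywords(passage)))


print(answer("Who published Syntactic Structures?"))
print(answer("When was the Georgetown-IBM experiment?"))
```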
Now that we’ve taken you through the various real-world applications of Natural Language Processing, it’s time to look at its main components.
Considering text as the main input, we can divide NLP into the following components:
Entity extraction
Entity extraction identifies and extracts the entities mentioned in a text, such as people, places, and organizations. For instance, assume an entity called Sam Dunn. In any written piece, there could be a number of variations of this entity: ‘Sam’, ‘Mr. Dunn’, ‘Mr. Sam Dunn’, etc. The algorithm should be able to identify all of these variations efficiently.
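A minimal way to handle such variations is a hand-built gazetteer that maps each known variation to one canonical entity. The sketch below, with a made-up gazetteer and sample text, also assigns each entity a type and counts its mentions, using each entity's share of mentions as a crude stand-in for the salience value discussed in this section. Production systems typically use statistical models rather than hand-built lists.

```python
import re
from collections import Counter

# Hypothetical gazetteer: canonical entity -> (entity type, surface variations).
GAZETTEER = {
    "Sam Dunn": ("person", ["Mr. Sam Dunn", "Sam Dunn", "Mr. Dunn", "Sam"]),
    "Sauron": ("person", ["Sauron"]),
    "Mordor": ("place", ["Mordor"]),
}


def extract_entities(text):
    """Map every known variation back to its canonical entity and count mentions."""
    counts = Counter()
    for canonical, (_, aliases) in GAZETTEER.items():
        remaining = text
        # Try longer variations first so "Mr. Sam Dunn" is not also counted as "Sam".
        for alias in sorted(aliases, key=len, reverse=True):
            pattern = r"\b" + re.escape(alias) + r"\b"
            counts[canonical] += len(re.findall(pattern, remaining))
            remaining = re.sub(pattern, " ", remaining)  # blank out already-matched text
    total = sum(counts.values())
    return {
        name: {
            "type": GAZETTEER[name][0],
            "mentions": n,
            # Crude salience: this entity's share of all entity mentions.
            "salience": n / total,
        }
        for name, n in counts.items()
        if n > 0
    }


text = "Sam Dunn visited Mordor. Mr. Dunn said Sauron was absent, so Sam left Mordor."
for name, info in extract_entities(text).items():
    print(name, info)
```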
Consider the following sentence:
“The Fellowship of the Ring is the first part of the Lord of the Rings series, and it begins with a summary of the history of the ring of power, the one made by the Dark Lord Sauron in Mordor.”
The entities that can be extracted from the above sentence might be:
• Lord of the Rings
• Dark Lord
• Sauron
• Mordor
Each of these entities will be given a type; for instance, Sauron will be given the entity type “person”, Mordor a “place”, and so on. Further, each of these entities will also have a salience value depending on the number of times it occurs in the entire text.
Syntactic analysis
Once we’ve extracted the entities, next comes syntactic analysis. It involves parsing a sentence to understand its grammar and relate the words used in that sentence to one another. This step involves a "context-free grammar check", which is independent of the overall context of the text. Necessary as it is, this step alone isn't enough to understand the complete meaning of the sentence. Let’s look at the syntactic analysis of the following sentence: “Frodo embarked upon a perilous journey to find the ring and destroy it!”
Syntactic analysis will parse this sentence and break it into its various “parts of speech” according to the general grammar rules of the language. This stage also interconnects the various parts of the sentence. So, the result might be something like:
• Frodo: noun (subject)
• Embarked: verb
• Upon: preposition
• a: determiner
• perilous: modifier
• journey: object
• to find: verb
• the ring: noun
• and: conjunction
• destroy it: verb
• !: punctuation
Semantic analysis
Once a sentence has been parsed into entities and its syntax understood, semantic analysis explains the meaning of the sentence in a context-free form, that is, as an independent sentence. The inferred meaning may not match the writer's actual intent. For example, from the sentence “Gollum had the ring”, the computer concludes that “had” means “owns”. Hence, it may interpret “Gollum had a mango” as “Gollum owned a mango”, and not “Gollum ate a mango”. What this essentially means is that grammar rules alone can confuse the computer; it requires a certain knowledge and understanding of the world to arrive at the correct meaning of the sentence.
As you saw in syntactic analysis, the system interconnects the words. For instance, the computer will identify the root verb and link it to the nouns present in the sentence. Semantic analysis, on the other hand, deals with lexical semantics, which determines the relationships between words and deduces the meaning of the whole sentence.
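To make the syntactic and semantic steps concrete, here is a deliberately tiny sketch: a lookup-based part-of-speech tagger, a rule that links the root verb to the nouns around it (syntactic analysis), and a hypothetical world-knowledge table that picks a sense for “had” (semantic analysis). The lexicon, noun categories, and sense tables are all made up for illustration; real parsers and word-sense disambiguation use trained models.

```python
import re

# Hypothetical toy lexicon: word -> part of speech.
LEXICON = {
    "gollum": "noun", "ring": "noun", "mango": "noun",
    "had": "verb", "the": "determiner", "a": "determiner",
}
# Context-free default reading of each verb (what a grammar-only system would assume).
DEFAULT_SENSE = {"had": "owned"}
# Hypothetical world knowledge: what kind of thing each noun is ...
NOUN_CATEGORY = {"ring": "possession", "mango": "food"}
# ... and which reading of "had" fits that kind of object.
HAD_SENSE = {"possession": "owned", "food": "ate"}


def pos_tag(sentence):
    """Syntactic step 1: tag each word with its part of speech."""
    words = re.findall(r"[A-Za-z]+", sentence)
    return [(word, LEXICON.get(word.lower(), "unknown")) for word in words]


def link_root_verb(tags):
    """Syntactic step 2: link the root verb to its subject and object nouns."""
    verb_index = next(i for i, (_, tag) in enumerate(tags) if tag == "verb")
    subject = next(w for w, tag in reversed(tags[:verb_index]) if tag == "noun")
    obj = next(w for w, tag in tags[verb_index + 1:] if tag == "noun")
    return subject, tags[verb_index][0], obj


def interpret(sentence, use_world_knowledge=True):
    """Semantic step: choose a meaning for the verb, optionally using world knowledge."""
    subject, verb, obj = link_root_verb(pos_tag(sentence))
    sense = DEFAULT_SENSE.get(verb.lower(), verb)
    if use_world_knowledge and verb.lower() == "had":
        sense = HAD_SENSE.get(NOUN_CATEGORY.get(obj.lower()), sense)
    return subject, sense, obj


print(interpret("Gollum had a mango", use_world_knowledge=False))  # ('Gollum', 'owned', 'mango')
print(interpret("Gollum had a mango"))                             # ('Gollum', 'ate', 'mango')
```

With world knowledge switched off, the sketch makes exactly the “owned a mango” mistake described above.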
Sentiment Analysis
Once we’re done with syntactic and semantic understanding, we look to understand the
sentiment behind each sentence. There are two important terms that you should be
familiar with to better understand sentiment analysis:
Polarity: Polarity ranges from -1 to +1 and indicates whether the emotion in a sentence is negative or positive. Different APIs use different logic, but most mainly consider negations and the sentiment of root verbs to determine a sentence's polarity.
Magnitude: Magnitude ranges from 0 upward, with no fixed upper limit, and defines the weight (strength) of the assigned polarity.
For instance, consider the sentence “I saw the new Nolan movie, but I didn’t enjoy it that much”. The polarity of this sentence will be negative (because it expresses a negative sentiment). The magnitude, on the other hand, will depend on the entire context of the statement, and will define the weight of whatever polarity has been assigned to the sentence.
The combination of both will give us an idea of the sentiment and the intensity of this
sentiment in the overall text piece.
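The two terms can be illustrated with a small lexicon-based sketch. It is an assumption-laden toy, not the logic of any particular API: the word scores are made up, polarity is the average score of the sentiment-bearing words (flipped after a negation word), and magnitude is the sum of their absolute scores, so mixed feelings yield a polarity near 0 but a high magnitude.

```python
import re

# Hypothetical sentiment lexicon: word -> score between -1 (negative) and +1 (positive).
LEXICON = {"enjoy": 0.6, "love": 0.9, "great": 0.8, "boring": -0.6, "bad": -0.7, "terrible": -0.9}
NEGATIONS = {"not", "never", "no", "didn't", "don't", "isn't"}


def polarity_and_magnitude(text):
    """Return (polarity, magnitude) for a piece of text.

    polarity  = mean score of the sentiment words, so it stays within [-1, +1]
    magnitude = sum of their absolute scores, so it grows with the amount of emotion
    """
    scores = []
    negated = False
    normalized = text.lower().replace("\u2019", "'")  # treat curly apostrophes as straight ones
    for word in re.findall(r"[a-z']+", normalized):
        if word in NEGATIONS:
            negated = True  # flip the next sentiment word
        elif word in LEXICON:
            scores.append(-LEXICON[word] if negated else LEXICON[word])
            negated = False
    if not scores:
        return 0.0, 0.0
    return sum(scores) / len(scores), sum(abs(s) for s in scores)


print(polarity_and_magnitude("I saw the new Nolan movie, but I didn't enjoy it that much"))  # (-0.6, 0.6)
print(polarity_and_magnitude("Great plot, terrible ending"))
```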
Pragmatic analysis
If you go to any content writer and ask them to restructure/rephrase one line, their first
question to you will be “What is the context?”
Most of the time, due to the flexibility of language, complexities can arise while deducing the meaning of an isolated sentence. Pragmatic analysis aims to analyze a statement in relation to the statements that precede or follow it, or even the overall paragraph of which the sentence is a part.
For instance, consider the sentences: “I made a sandwich today. However, I forgot to bring it.”
In this case, the pronoun ‘it’ refers to ‘sandwich’. For a computer to understand what the narrator actually forgot, it must correctly interpret the previous statement. Oftentimes, even more knowledge about the situation may be required to understand the exact intent of the writer.
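A crude version of the pronoun resolution in this example can be written as a recency heuristic: each “it” is linked to the most recently mentioned candidate noun, even across sentence boundaries. The candidate-noun list is a hypothetical stand-in for a part-of-speech tagger, and real coreference resolution uses trained models that weigh far more context.

```python
import re

# Hypothetical list of nouns that "it" may refer to. A real system would find
# candidate nouns with a part-of-speech tagger and use a trained coreference model.
CANDIDATE_NOUNS = {"sandwich", "ring", "movie", "mango", "journey"}


def resolve_it(text):
    """Return the antecedent chosen for each occurrence of 'it' in the text.

    Heuristic: 'it' refers to the most recently mentioned candidate noun,
    even if that noun appeared in an earlier sentence.
    """
    antecedents = []
    last_noun = None
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in CANDIDATE_NOUNS:
            last_noun = word
        elif word == "it":
            antecedents.append(last_noun)  # None if no candidate has been seen yet
    return antecedents


print(resolve_it("I made a sandwich today. However, I forgot to bring it."))  # ['sandwich']
```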
Sentiment analysis is often used to understand the impact of a particular business or political decision. NLP is applied to a large body of content published in the media, and the sentiments are tracked and averaged.
[Graph: average sentiment (from -1 to +1) of media articles, 12 Aug to 19 Aug]
For example, have a look at the graph above. It represents the sentiment of media articles about a company, ABC, after the launch of its product X, using all the data from the press release. -1 and +1 represent the extreme polarities. The average sentiments have been calculated using the two terms (polarity and magnitude) we talked about earlier. (Source)
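The source does not say exactly how polarity and magnitude were combined. One plausible approach, assumed here purely for illustration, is to average each day's article polarities weighted by their magnitudes, so strongly worded articles count more than lukewarm ones. The article scores below are invented.

```python
from collections import defaultdict


def daily_average_sentiment(articles):
    """Magnitude-weighted average polarity per day.

    articles: iterable of (date, polarity, magnitude) tuples.
    Returns {date: average polarity}, which stays within [-1, +1].
    """
    weighted_sum = defaultdict(float)
    total_weight = defaultdict(float)
    for date, polarity, magnitude in articles:
        weighted_sum[date] += polarity * magnitude
        total_weight[date] += magnitude
    return {
        date: weighted_sum[date] / total_weight[date] if total_weight[date] else 0.0
        for date in weighted_sum
    }


# Invented article scores for two days.
articles = [
    ("12 Aug", 0.8, 2.0),
    ("12 Aug", -0.4, 1.0),
    ("13 Aug", -0.5, 3.0),
]
print(daily_average_sentiment(articles))
```

Plotting these daily values over time produces a chart like the one described above.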
If you feel this is the field for you, we recommend you get your
hands dirty and jump from theoretical knowledge to practical applications.
Dig around, research a few tools, and get started with them. A practical
knowledge of what we discussed will help you understand things
better, and who knows, maybe you’ll be the one to propose the next
breakthrough in the field of Natural Language Processing!
Get a PG Certification in
Machine Learning & Natural Language Processing
from IIIT Bangalore
Have Questions?
Please feel free to drop us a line at info@upgrad.com
and we will be there to help you.
COPYRIGHT © UPGRAD EDUCATION PRIVATE LIMITED