
A Beginner's Guide to
Fundamentals of
NATURAL LANGUAGE PROCESSING

INTRODUCTION

EVOLUTION OF NATURAL LANGUAGE PROCESSING

CURRENT MARKET SCENARIO

PYTHON FOR NATURAL LANGUAGE PROCESSING

APPLICATIONS OF NATURAL LANGUAGE PROCESSING

COMPONENTS OF NATURAL LANGUAGE PROCESSING

CONCLUSION
INTRODUCTION

The concept of Natural Language Processing (NLP) has been the talk of the town lately, along with Artificial Intelligence (AI) and Machine Learning (ML), all of which aim to create smart machines that can understand human language and interact with humans with a certain degree of intelligence. However, do you know what the concept of NLP actually is?

In layman's terms, natural language processing is the science that aims to bridge the gap between human communication (natural language) and the machine's understanding of it. When humans communicate with each other in their native language, we are hardly able to understand how our brains process so much unstructured data so easily into meaningful information that makes sense to us. Human speech and language, however, are very difficult for a computer to understand and interpret, since it can only understand one language - binary. That's where natural language processing comes in.

Intriguing as it is, natural language processing is a branch of AI that aims to create and design machines that can understand, interpret, and manipulate human languages. Natural language processing emerged from the collaboration of three disciplines: Computer Science, Artificial Intelligence, and Computational Linguistics.

EVOLUTION OF NATURAL LANGUAGE PROCESSING

Although the concept of NLP has recently garnered an increasing amount of attention in the domains of Computer Science and AI, it is not exactly a "new field". Deeply rooted in Linguistics, the inception of natural language processing happened somewhere along the mid-twentieth century, when new and innovative advances were being made in statistical analysis and machine translation.

The real progress and experimentation in the field of NLP began shortly after World War II. As tension continued to mount between the US and Soviet Russia and the fear of a nuclear war was in the air, natural language processing came to be the most helpful weapon of that time by helping translate the Russian language into English. It was during the 1950s that a successful breakthrough in NLP was made - Alan Turing published the acclaimed article "Computing Machinery & Intelligence", which laid the foundation of the universal criterion for judging a machine's intelligence, the Turing Test.

In 1954, the IBM-Georgetown Demonstration exhibited the automatic translation of over sixty Russian sentences into English, albeit in a very rudimentary form. The year 1957 marked a revolution in Linguistics with Chomsky's "Syntactic Structures," a universal rulebook of syntactic structures. Then again, in 1969, came Roger Schank's Conceptual Dependency Theory for natural language interpretation and understanding.

Basically, until the early 1980s, natural language processing was driven by complicated, hand-written rules. It was only in the late 1980s that ML algorithms made their way into the world of NLP. The integration of statistics, probability, and ML algorithms began to transform NLP into a purely statistical domain. SHRDLU and Jabberwacky are two of the earliest and most successful steps in the field of NLP. Fast forward to today, and we have smart personal assistants such as Siri, Alexa, and Cortana that can interact with us just like another human being!

CURRENT MARKET SCENARIO

Though the initial years of the evolution of NLP witnessed progress at a slow pace, at present the market for natural language processing is growing steadily at an impressive rate, thanks to the massive upsurge in the adoption of smart devices, the IoT, the rapid growth in AI and machine-to-machine technologies, and the ever-increasing demand for enhanced customer experience. According to the report Natural Language Processing Market by Type, it is estimated that the NLP market will leap from 7.63 billion USD to 16.07 billion USD by 2021. Another report, by Infoholic Research, suggests that between 2017 and 2023 the global Natural Language Processing market will grow at a Compound Annual Growth Rate (CAGR) of 18.78%.

In the global scenario, the market for natural language processing is predominantly concentrated in Europe and North America, as these regions have a strong concentration of NLP vendors and also generate the maximum revenue. However, the Asia Pacific region and the Middle East and Africa (MEA) are not far behind and are expected to achieve the fastest growth rates in NLP by the end of the forecast period.

[Graph: segmented analysis and size of the global NLP market from 2016-2025, split into hardware, software, and services. (Source)]

Between 2016 and 2021, the professional services segment is estimated to have the highest CAGR, primarily because more and more organizations across the globe are demanding innovative and efficient NLP solutions. As of now, the NLP market is dominated by giants like Apple Inc., Google, Microsoft Corporation, IBM, Hewlett-Packard, SAS Institute Inc., Netbase Solutions, 3M Company, Dolbey Systems, and Verint Systems.

PYTHON FOR NATURAL LANGUAGE PROCESSING

As the market for NLP continues to boom, the choice of tools and resources for the field is also evolving. Previously, programmers and researchers working on NLP had to rely largely on an assortment of utility programs to accomplish basic NLP tasks and functions. Needless to say, these programs were poorly drafted and lacked the stability and efficiency required for NLP.

Today, however, the scenario has changed for the better, as we now have access to a wide range of open-source software libraries that facilitate scalable and efficient NLP functionalities. The Natural Language Toolkit (NLTK) for Python is a great example (you can also use spaCy). With NLTK, researchers and programmers can dedicate their time to developing the application logic rather than spending most of their energy and resources on low-level plumbing - for instance, writing a conventional method for sentence segmentation from scratch. When it comes to natural language processing, NLTK offers the basic classes required for representing data.
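To make this concrete, here is a minimal sketch of sentence segmentation and word tokenization with NLTK. The sample text is made up, and the tokenizer model downloads are illustrative assumptions rather than anything prescribed by this guide.

```python
# Minimal NLTK sketch: split raw text into sentences, then into word tokens.
# Assumes `pip install nltk`; tokenizer models are fetched on first run.
import nltk

for pkg in ("punkt", "punkt_tab"):   # newer NLTK versions use "punkt_tab"
    nltk.download(pkg, quiet=True)

text = ("Natural language processing bridges human communication and machines. "
        "Libraries such as NLTK handle the low-level plumbing for us.")

sentences = nltk.sent_tokenize(text)                  # sentence segmentation
tokens = [nltk.word_tokenize(s) for s in sentences]   # word tokenization

print(sentences)
print(tokens)
```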

Just as in the general programming domain, Python has also come to be a popular choice for natural language processing. This ever-increasing popularity of Python is primarily due to the fact that it is a fairly easy language to learn and understand, and that it offers a host of open-source software libraries such as NumPy, SciPy, Scikit-learn, Pandas, Matplotlib, and NetworkX that are excellent for data analysis and scientific computing. Most importantly, Python is the best choice for natural language processing since its syntax and semantics are highly transparent. It allows you to build intuitive and appealing frameworks from the right set of building blocks, which is immensely helpful in building a strong foundational knowledge of the basics of NLP.

APPLICATIONS OF NATURAL LANGUAGE PROCESSING

Now that we've discussed the fundamentals of NLP, it's time to look at how it is helping transform the world around us. The drastic evolution of natural language processing over the years has given birth to numerous useful real-world applications that are being used by businesses and organizations across the globe. Let's have a look at some of the most crucial applications of NLP in the present day.

1. Sentiment Analysis

Sentiment analysis is a pivotal aspect of digital marketing. It is a method adopted by marketers to gather the opinions of customers or the target audience and understand how they are interacting with their products or services. By leveraging relevant algorithms, sentiment analysis is used to monitor conversations on social media platforms, analyzing and interpreting the language and voice inflections of individuals on those platforms when they opine about a product, service, or any topic relevant to the business. Thus, it is also often known as opinion mining. It helps marketers understand the attitudes and emotions (both positive and negative) of customers. Accordingly, companies can make adjustments to their product designs, marketing strategies, and so on.
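As a quick illustration of the idea (not a method prescribed by this guide), the snippet below scores a couple of invented customer comments with NLTK's VADER sentiment analyzer, one of several open-source options:

```python
# Scoring short texts with NLTK's VADER sentiment analyzer.
# Assumes `pip install nltk`; the VADER lexicon is downloaded on first run.
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)

sia = SentimentIntensityAnalyzer()
comments = [
    "I love the new interface - it is fast and intuitive!",
    "The update keeps crashing and support never replies.",
]
for comment in comments:
    scores = sia.polarity_scores(comment)           # neg/neu/pos plus a compound score
    print(f"{scores['compound']:+.2f}  {comment}")  # compound ranges from -1 to +1
```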

Tech giants like Intel, IBM, and Twitter use sentiment analysis to identify and delineate matters concerning employees. For instance, Intel uses Kanjoya software that applies ML algorithms to infer emotions from text, while IBM performs sentiment analysis on employee posts to analyze the emotions behind them.

2. Information Extraction

Information Extraction (IE) refers to the technique of automatically extracting valuable and structured information from semi-structured or unstructured data through NLP. Usually, most of the important news and pieces of information on the Internet are written in natural human languages, including English, Spanish, Russian, and the like. However, for a machine to extract resourceful information from such content written in natural languages, one has to rely on text mining and NLP techniques. A complete Information Extraction process using NLP has five basic steps:

Entity Extraction (company names, dollar amounts, core strategies, etc.)
Content Categorization (performing sentiment analysis to categorize content by industry, by function, by intention, and other relevant categories)
Content Clustering (identifying the fundamental topics of discourse and discovering new topics)
Fact Extraction (feeding structured information into databases for analysis and visualization)
Relationship Extraction (exploring real-world relationships through graph databases)
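The first of these steps can be prototyped in a few lines. The sketch below uses spaCy (mentioned earlier as an alternative to NLTK); the example sentence and the small English model are assumptions for illustration only.

```python
# Entity extraction sketch with spaCy. Assumes `pip install spacy` and
# `python -m spacy download en_core_web_sm` have been run.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Acme Corp. agreed to acquire Globex for 2 billion dollars on Monday.")

for ent in doc.ents:
    # ent.label_ is the predicted entity type, e.g. ORG, MONEY, DATE
    print(ent.text, "->", ent.label_)
```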

In the financial sector, companies are increasingly shifting from human-based administration and control towards algorithmic trading. Since a majority of financial news and information is in English or other dominant languages, NLP comes into the picture to parse these textual announcements and extract meaningful information from them to inform algorithm-driven trading decisions. For instance, the declaration of a merger between two business giants could have a major impact on trading decisions in the market. Using a trading algorithm, we can estimate how the merger will affect market prices, the other players in the market, profit margins, and much more.

3. Content Categorization/Classification

Content categorization or classification refers to the process of sorting a given set of documents or files into one or more classes, which may themselves be nested within broader classes. It combines both text mining and natural language processing and can be used for various purposes such as the hierarchical categorization of web pages, document indexing and organization, and so on.

Content classification has been a boon for the media industry. While previously professionals had to manually tag content, including news pieces, blog posts, and articles, they can now tag all of this content with a single API call, thanks to tools such as the Natural Language API. Today, content classification is being leveraged not only by the news and media industry but also by governments, NGOs, e-commerce platforms, law practitioners, and social researchers, to name a few.
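For a sense of how a simple classifier might be trained, here is a toy sketch with scikit-learn (listed earlier among Python's data libraries); the documents and labels are invented purely for illustration.

```python
# Toy content classification: TF-IDF features plus a Naive Bayes classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

docs = [
    "Stocks rallied after the central bank held interest rates steady.",
    "The striker scored twice in the final minutes of the match.",
    "Quarterly earnings beat analyst expectations across the sector.",
    "The home team clinched the championship with a last-second goal.",
]
labels = ["finance", "sports", "finance", "sports"]

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(docs, labels)  # learn which word patterns belong to which class

print(model.predict(["Bond yields fell as investors weighed the rate decision."]))
```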

4. Text Summarization

In the modern world dominated by technology, data is growing at an unprecedented pace. The increasing reliance on data to drive innovation and streamline business functions has made it increasingly important to analyze this rapidly escalating data and make sense of it. However, it is practically impossible to manually read and process such vast amounts of data, and this is where text summarization comes in. Text summarization denotes the process of condensing crucial data and documents into a concise form while retaining all the salient facts and points.

"Text summarization is the process of distilling the most important information from a source (or sources) to produce an abridged version for a particular user (or users) and task (or tasks)."

Also often known as automatic text summarization, it brings with it a host of benefits such as reduced reading time, improved efficacy of document indexing, and a better and more convenient selection process, among others.

Travel companies like Expedia are leveraging the text summarization technique to enhance the overall customer experience. Usually, travel websites have a vast pool of user-generated data (reviews, images, comments, etc.), and users rely on this data to gain knowledge about travel destinations, assess the credibility of accommodation facilities, find the top sightseeing spots, and so on. However, browsing through tons of reviews can be not only exhausting but also overwhelming. Summarizing such content on travel platforms makes it easier for a user to promptly find what they are looking for.

Reference: Advances in Automatic Text Summarization (Page 1)
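A very rough way to see the idea in action is a frequency-based extractive summarizer: score each sentence by how many frequent (non-stopword) words it contains and keep the top-scoring sentences. This is only a sketch of the principle, not a production approach, and the helper below is hypothetical.

```python
# Naive extractive summarization: rank sentences by a word-frequency score.
# Assumes `pip install nltk`; tokenizer and stopword data download on first run.
from collections import Counter
import nltk

for pkg in ("punkt", "punkt_tab", "stopwords"):
    nltk.download(pkg, quiet=True)
from nltk.corpus import stopwords

def summarize(text, num_sentences=2):
    sentences = nltk.sent_tokenize(text)
    stop = set(stopwords.words("english"))
    words = [w.lower() for w in nltk.word_tokenize(text) if w.isalpha()]
    freq = Counter(w for w in words if w not in stop)

    def score(sentence):
        return sum(freq[w.lower()] for w in nltk.word_tokenize(sentence))

    top = sorted(sentences, key=score, reverse=True)[:num_sentences]
    # keep the chosen sentences in their original order
    return " ".join(s for s in sentences if s in top)

# usage (illustrative): print(summarize(some_long_review_text))
```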

5. Question Answering

While it is true that we can access information about almost anything on the Internet, there are times when search engines are at a loss for suggestions for certain specific queries. This is precisely what the NLP technique of Question Answering (QA) seeks to solve. QA amalgamates two fields, NLP and information retrieval, to automatically find the answers to questions asked by humans in their natural languages. This technique generally uses a program to provide answers generated from a knowledge base.
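One convenient way to experiment with extractive question answering today is the Hugging Face transformers library; it is not mentioned in this guide and appears here only as an illustrative assumption.

```python
# Extractive question answering with a pretrained model via Hugging Face transformers.
# Assumes `pip install transformers`; a default QA model is downloaded on first use.
from transformers import pipeline

qa = pipeline("question-answering")

context = ("Natural language processing emerged from the collaboration of computer "
           "science, artificial intelligence, and computational linguistics.")
result = qa(question="Which disciplines did NLP emerge from?", context=context)

print(result["answer"], round(result["score"], 3))  # answer span plus a confidence score
```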

Google has been making sincere attempts in the field of NLP to enhance its search results by identifying and understanding natural language queries posed by humans, and the result is there for everyone to see - search results on Google are now much better optimized than before.

Now that we’ve taken you through the various real-world applications of Natural Language
Processing, it’s time to have a look at the two main components of Natural Language Processing.

COMPONENTS OF NATURAL LANGUAGE PROCESSING

Considering text as the main input, we can divide NLP into the following components:

Entity extraction

Entity extraction refers to segmenting a sentence in order to identify and extract entities. These entities could be a person, organization, location, time, event, and so on. NLP APIs use data from sources like Wikipedia to match these entities. One of the major challenges in entity extraction is matching different variations of an entity and clustering them as the same entity.

For instance, assume an entity called Sam Dunn. In any written piece, there could be a number of variations of this entity - 'Sam', 'Mr. Dunn', 'Mr. Sam Dunn', etc. The algorithm should be able to efficiently identify all of these variations.

Entity extraction has two main components:

Entity type: This can be a person, a place, an organization, a task, etc.
Salience: This refers to the importance of the entity, on a scale of 0 to 1.

To understand the various stages of NLP, consider the following paragraph:

"The Fellowship of the Ring is the first part of the Lord of the Rings series, and it begins with a summary of the history of the ring of power - the one made by the Dark Lord Sauron, in Mordor."

The entities that can be extracted from the above sentence might be:
• Lord of the Rings
• Dark Lord
• Sauron
• Mordor

Each of these entities will be given a type - for instance, Sauron will be given the entity type "person", Mordor a "place", and so on. Further, each of these entities will also have a salience value depending on the number of times it occurs in the entire text.

Syntactic analysis

Once we've extracted the entities, next comes syntactic analysis. It involves parsing a sentence to understand the grammar and co-relate the words used in that sentence. This step involves a "context-free grammar check" which is independent of the overall context of the text. However necessary, this isn't enough to understand the complete meaning of the sentence. Let's look at the syntactic analysis of the following sentence: "Frodo embarked upon a perilous journey to find the ring and destroy it!"

Syntactic analysis will parse this sentence and break it into the various "parts of speech" depending on the general grammar rules of the language. Further, this stage also interconnects the various parts of the sentence. So, the result might be something like:
• Frodo: noun (subject)
• embarked: verb
• upon: preposition
• a: determinant
• perilous: modifier
• journey: object
• to find: verb
• the ring: noun
• and: conjunction
• destroy it: verb
• !: punctuation

Semantic analysis

Once a sentence has been parsed into entities and the syntax is understood, semantic analysis explains the meaning of the sentence in a context-free form - as an independent sentence. The inferred meaning may not be the actual intent of the implied meaning.

For example, from the sentence "Gollum had the ring", the computer concludes that "had" means "owns". Hence, it may perceive "Gollum had a mango" as "Gollum owned a mango", and not "Gollum ate a mango". What this essentially means is that the computer can get confused by the grammar rules alone. It requires a certain knowledge and understanding of the world to arrive at the correct meaning of the sentence.

As you saw in syntactic analysis, the system inter-connects the words - for instance, the computer will identify the root verb and link it to the nouns present in the sentence. Semantic analysis, on the other hand, deals with lexical semantics, which determines the connections between words and deduces the meaning of the whole sentence.
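To connect these stages to code, here is a rough sketch that prints each token's part of speech and its grammatical relation to its head word for the Frodo sentence. spaCy and its small English model are assumptions, not tools prescribed by this guide.

```python
# Rough syntactic-analysis sketch: part-of-speech tags and dependency relations.
# Assumes `pip install spacy` and `python -m spacy download en_core_web_sm`.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Frodo embarked upon a perilous journey to find the ring and destroy it!")

for token in doc:
    # token.pos_ = part of speech, token.dep_ = relation to its head word
    print(f"{token.text:10} {token.pos_:6} {token.dep_:10} head={token.head.text}")
```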

Sentiment Analysis

Once we’re done with syntactic and semantic understanding, we look to understand the
sentiment behind each sentence. There are two important terms that you should be
familiar with to better understand sentiment analysis:

Polarity: Polarity ranges from -1 to +1 and is used to capture the negative or positive emotion in a sentence. Different APIs use different logic, but most mainly consider the negations and the sentiment of the root verbs to determine sentence polarity.
Magnitude: Magnitude can range from 0 to infinity and defines the weight of the assigned polarity.

For instance, consider the sentence “I saw the new Nolan movie, but I didn’t enjoy it that
much”. The polarity of this sentence will be -1 (because it denotes a negative sentiment).
The magnitude, on the other hand, will depend on the entire context of the statement,
and will go on to define the weight of whatever polarity has been assigned to the sentence.
The combination of both will give us an idea of the sentiment and the intensity of this
sentiment in the overall text piece.
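These two numbers map directly onto what, for example, Google's Cloud Natural Language API returns as score and magnitude for a document. A minimal sketch, assuming the google-cloud-language client is installed and credentials are already configured:

```python
# Document-level sentiment via Google's Cloud Natural Language API.
# Assumes `pip install google-cloud-language` and configured credentials.
from google.cloud import language_v1

client = language_v1.LanguageServiceClient()
document = language_v1.Document(
    content="I saw the new Nolan movie, but I didn't enjoy it that much.",
    type_=language_v1.Document.Type.PLAIN_TEXT,
)
sentiment = client.analyze_sentiment(request={"document": document}).document_sentiment

print("polarity (score):", sentiment.score)   # roughly -1 to +1
print("magnitude:", sentiment.magnitude)      # 0 to infinity
```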

Pragmatic analysis

If you go to any content writer and ask them to restructure/rephrase one line, their first
question to you will be “What is the context?”

Most of the time, due to the flexibility of language, complexities can arise while deducing the meaning of an isolated sentence. Pragmatic analysis is aimed at analyzing the statement in relation to the statements that precede or succeed it, or even the overall paragraph of which the sentence is a part.

For instance, consider the sentences: "I made a sandwich today. However, I forgot to bring it."

In this case, the pronoun 'it' refers to 'sandwich'. For a computer to understand what the narrator actually forgot, it is important for it to precisely understand the previous statement. Oftentimes, even more knowledge about the situation may be required to understand the exact intent of the writer.

Sentiment analysis is often used to understand the impact of a certain business or political decision. NLP is applied to a large set of content published in the media, and the sentiments are tracked and averaged over time.

[Graph: "Average Sentiment", plotted from 12 Aug to 19 Aug on a scale from -1 to +1.]

For example, have a look at the graph above. It represents the sentiments in media articles about a company ABC after the launch of product X, based on all the data from the press releases. -1 and +1 represent the extreme polarities. The average sentiments have been calculated using the two terms (polarity and magnitude) we talked about earlier. (Source)

CONCLUSION

With that, we come to the end of our guide on Natural Language Processing. All in all, natural language processing offers us innumerable benefits. As the underlying technology gets more sophisticated, we'll see an increasing number of companies leveraging NLP to improve the efficiency and accuracy of their documentation processes and to identify the most essential information in large datasets.

If you feel this is the field for you, we recommend you get your
hands dirty and jump from theoretical knowledge to practical applications.
Dig around, research a few tools, and get started with them. A practical
knowledge of what we discussed will help you understand things
better, and who knows, maybe you’ll be the one to propose the next
breakthrough in the field of Natural Language Processing!

RAHO AMBITIOUS

Get a PG Certification in
Machine Learning & Natural Language Processing
from IIIT Bangalore

WHY LEARN FROM UPGRAD AND IIIT BANGALORE?

Learn on-the-go
Experience immersive learning with 200 hours of dedicated lectures, from anywhere, anytime

Cutting-edge Curriculum
Designed in collaboration with leading industry experts and academia to create experts in Machine Learning

Active Student Mentorship
Get unparalleled guidance with 1-on-1 networking from industry experts and academia

Dedicated Student Support
Our team of in-house student advisors assist the learners on all fronts - we make sure you excel


Have questions?
Please feel free to drop us a line at info@upgrad.com and we will be there to help you.

COPYRIGHT @ UPGRAD EDUCATION PRIVATE LIMITED
