
ESTD : 2001

SURYA VIHAR, BERHAMPUR -761008, GANJAM, ODISHA


(Affiliated to BPUT, Rourkela, Odisha and Accredited by AICTE, New Delhi)
(ISO 9001:2008 Certified)

Seminar Report on Natural Language Processing


B.TECH 4th YEAR

Name:- SANDEEP KUMAR DASH

REG.NO:- 2101204064

BRANCH:- COMPUTER SCIENCE ENGINEERING


ABSTRACT

Natural language processing, or NLP, is a branch of
artificial intelligence that deals with analyzing,
understanding, and generating natural human
languages, so that computers can process written and
spoken human language without requiring artificial,
computer-oriented languages. Natural language processing,
sometimes also called "computational linguistics,"
uses both semantics and syntax to help computers
understand how humans talk or write and how to
derive meaning from what they say. The field
combines artificial intelligence and computer
programming so powerfully that programs can even
translate one language into another with reasonable
accuracy. It also includes voice recognition: the
ability of a computer to understand what you say well
enough to respond appropriately.
INDEX

SR.NO  TOPICS

1  INTRODUCTION

2  OBJECTIVES

3  BRIEF HISTORY

4  NLP APPLICATIONS

5  NLP GOALS

6  NLP STRUCTURE

7  FUTURE SCOPE

8  CONCLUSION
INTRODUCTION

Natural language processing (NLP) is the intersection
of computer science, linguistics, and machine learning.
The field focuses on communication between computers
and humans in natural language; NLP is all about making
computers understand and generate human language.

Natural language processing studies interactions
between humans and computers to find ways for
computers to process written and spoken words
similar to how humans do. The field blends
computer science, linguistics, and machine learning.

Natural language processing has benefited heavily
from recent advances in machine learning, especially
from deep learning techniques. The field is divided
into three parts:

 Speech recognition: the translation of spoken language into text.
 Natural language understanding: a computer's ability to understand language.
 Natural language generation: the generation of natural language by a computer.

Human language is special for several reasons. It is
specifically constructed to convey the speaker's or writer's
meaning. It is a complex system, although little children can
learn it pretty quickly.

Another remarkable thing about human language is that it is
all about symbols. According to Chris Manning, a machine
learning professor at Stanford, it is a discrete, symbolic,
categorical signaling system. This means we can convey the
same meaning in different ways (i.e., speech, gesture, signs,
etc.). The encoding by the human brain is a continuous pattern
of activation by which the symbols are transmitted via
continuous signals of sound and vision.

Understanding human language is considered a difficult task
due to its complexity. For example, there are an infinite
number of different ways to arrange words in a sentence.
Also, words can have several meanings, and contextual
information is necessary to correctly interpret sentences.
Every language is more or less unique and ambiguous. Just
take a look at the following newspaper headline: "The Pope's
baby steps on gays." This sentence clearly has two very
different interpretations, which is a pretty good example of
the challenges in natural language processing.

TEXT SEGMENTATION

Text segmentation in natural language processing is the
process of transforming text into meaningful units: words,
sentences, different topics, the underlying intent, and more.
Most often, the text is segmented into its component words,
which can be a difficult task depending on the language. This
is again due to the complexity of human language. For example,
separating words by spaces works relatively well in English,
except for words like "icebox" that belong together but are
separated by a space. The problem is that people sometimes
also write it as "ice-box."
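
As a small illustration, here is a minimal sketch of word and sentence segmentation using Python's NLTK toolkit (which this report mentions again later); the example sentence is invented:

```python
# A minimal sketch of text segmentation with NLTK: split text into
# sentences, then split each sentence into word tokens.
import nltk

nltk.download("punkt", quiet=True)  # one-time tokenizer download
                                    # (newer NLTK may need "punkt_tab")

text = "The ice-box is empty. We should buy a new icebox today!"

for sentence in nltk.sent_tokenize(text):
    print(nltk.word_tokenize(sentence))
# ['The', 'ice-box', 'is', 'empty', '.']
# ['We', 'should', 'buy', 'a', 'new', 'icebox', 'today', '!']
```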

Relationship Extraction

Relationship extraction takes the named entities identified by
named entity recognition (NER) and tries to identify the
semantic relationships between them. This could mean, for
example, finding out who is married to whom, that a person
works for a specific company, and so on. This problem can also
be transformed into a classification problem, and a machine
learning model can be trained for every relationship type.
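
As a hedged illustration of the NER step that relationship extraction builds on, here is a small sketch using the spaCy library; the sentence and the company name "Acme Corp." are invented, and the small English model is assumed to be installed:

```python
# A sketch of the NER step that relationship extraction builds on,
# using spaCy. Assumes: pip install spacy
# and: python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Marie works for Acme Corp. and is married to John.")

# These entities are what a relation classifier would pair up,
# e.g. (Marie, Acme Corp., works-for).
for ent in doc.ents:
    print(ent.text, ent.label_)
```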

Sentiment Analysis

With sentiment analysis, we want to determine the attitude (i.e.,
the sentiment) of a speaker or writer with respect to a
document, interaction, or event. It is therefore a natural
language processing problem where text needs to be understood
in order to predict the underlying intent. The sentiment is
mostly categorized into positive, negative, and neutral
categories. With the use of sentiment analysis, for example,
we may want to predict a customer's opinion and attitude about
a product based on a review they wrote. Sentiment analysis is
widely applied to reviews, surveys, documents, and much more.
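
As a small, hedged example, the sketch below scores an invented review with NLTK's VADER sentiment analyzer, one common off-the-shelf way to categorize text as positive, negative, or neutral:

```python
# A minimal sketch of sentiment scoring with NLTK's VADER analyzer;
# the review text is invented for illustration.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download

analyzer = SentimentIntensityAnalyzer()
review = "The product arrived quickly and works great!"

# polarity_scores returns neg/neu/pos components plus a 'compound'
# score in [-1, 1]; above roughly 0.05 is usually read as positive.
print(analyzer.polarity_scores(review))
```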

If you're interested in using some of these techniques with
Python, take a look at the Jupyter Notebook about Python's
natural language toolkit (NLTK) that I created. You can also
check out my blog post about building neural networks with
Keras, where I train a neural network to perform sentiment
analysis.

Deep Learning and Natural Language Processing

Central to deep learning and natural language is "word
meaning," where a word, and especially its meaning, is
represented as a vector of real numbers. With these vectors
that represent words, we are placing words in a high-
dimensional space. The interesting thing about this is that
the words, represented by vectors, form a semantic space. This
simply means that words that are similar and have a similar
meaning tend to cluster together in this high-dimensional
vector space.
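
As a toy illustration of this idea, the sketch below compares hand-made 3-dimensional "word vectors" with cosine similarity; real systems learn vectors with hundreds of dimensions (e.g., word2vec or GloVe), so the numbers here are invented:

```python
# A toy sketch of "word meaning as vectors": cosine similarity between
# hand-made 3-D vectors. Real systems learn vectors with hundreds of
# dimensions; these numbers are invented for illustration.
import numpy as np

vectors = {
    "king":  np.array([0.9, 0.80, 0.1]),
    "queen": np.array([0.9, 0.75, 0.2]),
    "apple": np.array([0.1, 0.20, 0.9]),
}

def cosine(a, b):
    # 1.0 means same direction (similar meaning); near 0 means unrelated.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(vectors["king"], vectors["queen"]))  # high: cluster together
print(cosine(vectors["king"], vectors["apple"]))  # low: far apart
```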
OBJECTIVES

Spellcheck and search are so mainstream that we often take
them for granted, especially at work, where Natural Language
Processing provides several productivity benefits.

At work, if you want information about your leave balance, you
can save the time of asking questions to your Human Resource
Manager: many companies have chatbot-based search tools to
which you can put a question and get answers about any company
policy. Integrated search tools in companies can make
customer-service and accounting calls up to 10x shorter.

In addition, NLP helps recruiters sort job profiles, attract
varied candidates, and select more qualified employees. NLP
also helps in spam detection and keeps unwanted emails out of
your mailbox. Gmail and Outlook use NLP to label messages from
specific senders into folders you create.
BRIEF HISTORY

In the early 1900s, a Swiss linguistics professor named Ferdinand de
Saussure died, and in the process almost deprived the world of the
concept of "Language as a Science." From 1906 to 1911, Professor
Saussure offered three courses at the University of Geneva, where he
developed an approach describing languages as "systems." Within the
language, a sound represents a concept, a concept that shifts
meaning as the context changes.

He argued that meaning is created inside language, in the relations
and differences between its parts. Saussure proposed that "meaning" is
created within a language's relationships and contrasts. A shared
language system makes communication possible. Saussure viewed
society as a system of "shared" social norms that provides conditions
for reasonable, "extended" thinking, resulting in decisions and actions
by individuals. (The same view can be applied to modern computer
languages.)

Saussure died in 1913, but two of his colleagues, Albert Sechehaye
and Charles Bally, recognized the importance of his concepts.
(Imagine the two, days after Saussure's death, in Bally's office,
drinking coffee and wondering how to keep his discoveries from being
lost forever.) The two took the unusual step of collecting "his notes
for a manuscript" and his students' notes from the courses. From
these, they wrote the Cours de Linguistique Générale, published in
1916. The book laid the foundation for what has come to be called the
structuralist approach, starting with linguistics and later expanding
to other fields, including computers.
In 1950, Alan Turing wrote a paper describing a test for a "thinking"
machine. He stated that if a machine could be part of a conversation
through the use of a teleprinter, and it imitated a human so
completely that there were no noticeable differences, then the machine
could be considered capable of thinking. Shortly after this, in 1952,
the Hodgkin-Huxley model showed how the brain uses neurons to form an
electrical network. These events helped inspire the idea of Artificial
Intelligence (AI), Natural Language Processing (NLP), and the
evolution of computers.

Natural Language Processing

Natural Language Processing (NLP) is an aspect of Artificial
Intelligence that helps computers understand, interpret, and utilize
human languages. NLP allows computers to communicate with people
using a human language. Natural Language Processing also provides
computers with the ability to read text, hear speech, and interpret
it. NLP draws from several disciplines, including computational
linguistics and computer science, as it attempts to close the gap
between human and computer communications.

Generally speaking, NLP breaks down language into shorter, more basic
pieces, called tokens (words, periods, etc.), and attempts to understand the
relationships of the tokens. This process often uses higher-level NLP features,
such as:

 Content Categorization: A linguistic document summary that includes content alerts,
duplication detection, search, and indexing.
 Topic Discovery and Modeling: Captures the themes and meanings of text
collections, and applies advanced analytics to the text.
 Contextual Extraction: Automatically pulls structured data from text-based
sources.
 Sentiment Analysis: Identifies the general mood, or subjective opinions, stored in
large amounts of text. Useful for opinion mining.
 Text-to-Speech and Speech-to-Text Conversion: Transforms voice commands into
written text, and vice versa.
NLP Begins and Stops

Noam Chomsky published his book, Syntactic Structures, in 1957. In it, he
revolutionized previous linguistic concepts, concluding that for a computer to
understand a language, the sentence structure would have to be changed.
With this as his goal, Chomsky created a style of grammar called Phrase-
Structure Grammar, which methodically translated natural language
sentences into a format that is usable by computers. (The overall goal was to
create a computer capable of imitating the human brain, in terms of thinking
and communicating, or AI.)

In 1958, the programming language LISP (LISt Processor), a computer
language still in use today, was released by John McCarthy. In 1964, ELIZA,
a "typewritten" comment-and-response program designed to imitate a
psychiatrist using reflection techniques, was developed. (It did this by
rearranging sentences and following relatively simple grammar rules; there
was no understanding on the computer's part.) Also in 1964, the U.S.
National Research Council (NRC) created the Automatic Language Processing
Advisory Committee, or ALPAC for short. This committee was tasked with
evaluating the progress of Natural Language Processing research.

In 1966, the NRC and ALPAC initiated the first AI and NLP stoppage, by
halting the funding of research on Natural Language Processing and machine
translation. After twelve years of research, and $20 million, machine
translations were still more expensive than manual human translations, and
there were still no computers that came anywhere near being able to carry
on a basic conversation. In 1966, Artificial Intelligence and Natural Language
Processing (NLP) research was considered a dead end by many (though not
all).
Return of the NLP

It took nearly fourteen years (until 1980) for Natural Language Processing and
Artificial Intelligence research to recover from the broken expectations
created by extreme enthusiasts. In some ways, the AI stoppage had initiated
a new phase of fresh ideas, with earlier concepts of machine translation
being abandoned, and new ideas promoting new research, including expert
systems. The mixing of linguistics and statistics, which had been popular in
early NLP research, was replaced with a theme of pure statistics. The 1980s
initiated a fundamental reorientation, with simple approximations replacing
deep analysis, and the evaluation process becoming more rigorous.

Until the 1980s, the majority of NLP systems used complex, “handwritten”
rules. But in the late 1980s, a revolution in NLP came about. This was the
result of both the steady increase of computational power, and the shift to
Machine Learning algorithms. While some of the early Machine Learning
algorithms (decision trees provide a good example) produced systems similar
to the old school handwritten rules, research has increasingly focused on
statistical models. These statistical models are capable of making soft,
probabilistic decisions. Throughout the 1980s, IBM was responsible for the
development of several successful, complicated statistical models.

In the 1990s, the popularity of statistical models for Natural Language
Processing analyses rose dramatically. Pure-statistics NLP methods have
become remarkably valuable in keeping pace with the tremendous flow of
online text. N-grams have become useful, recognizing and tracking clumps
of linguistic data numerically. In 1997, LSTM recurrent neural network (RNN)
models were introduced, and found their niche in 2007 for voice and text
processing. Currently, neural network models are considered the cutting edge
of research and development in NLP's understanding of text and speech
generation.
After the Year 2000

In 2001, Yoshua Bengio and his team proposed the first neural "language"
model, using a feed-forward neural network. A feed-forward neural network
is an artificial neural network whose connections do not form a cycle. In
this type of network, the data moves in only one direction: from the input
nodes, through any hidden nodes, and then on to the output nodes. The
feed-forward neural network has no cycles or loops, and is quite different
from recurrent neural networks.
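
As a minimal sketch of such a feed-forward network, using the Keras library mentioned earlier in this report (the layer sizes are arbitrary and chosen only for illustration):

```python
# A minimal sketch of a feed-forward network in Keras: data flows in one
# direction, input -> hidden -> output, with no cycles or loops.
# Layer sizes are arbitrary, chosen only for illustration.
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input(shape=(100,)),            # e.g. a 100-dim text vector
    keras.layers.Dense(32, activation="relu"),   # hidden layer
    keras.layers.Dense(1, activation="sigmoid"), # output node
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```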

In the year 2011, Apple’s Siri became known as one of the world’s first
successful NLP/AI assistants to be used by general consumers. Within Siri,
the Automated Speech Recognition module translates the owner’s words into
digitally interpreted concepts. The Voice-Command system then matches
those concepts to predefined commands, initiating specific actions. For
example, if Siri asks, “Do you want to hear your balance?” it would
understand a “Yes” or “No” response, and act accordingly.

By using Machine Learning techniques, the owner's speaking pattern doesn't
have to match exactly with predefined expressions. The sounds just have to
be reasonably close for an NLP system to translate the meaning correctly. By
using a feedback loop, NLP engines can significantly improve the accuracy of
their translations and increase the system's vocabulary. A well-trained
system would understand the words "Where can I get help with Big Data?",
"Where can I find an expert in Big Data?", or "I need help with Big Data,"
and provide the appropriate response.

The combination of a dialog manager with NLP makes it possible to develop a
system capable of holding a conversation and sounding human-like, with
back-and-forth questions, prompts, and answers. Our modern AIs, however,
are still not able to pass Alan Turing's test, and currently do not sound like
real human beings. (Not yet, anyway.)
NLP APPLICATIONS

1. Email filtering
Email is a part of our everyday life. Whether it is related to work, studies, or
many other things, we find ourselves plunged into a pile of emails. We
receive all kinds of emails from various sources; some are work-related or
from our dream school or university, while others are spam or promotional
emails. Here Natural Language Processing comes to work: it identifies and
filters incoming emails as "important" or "spam" and places them in their
respective folders.
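
As a toy illustration of how such a filter can be trained, the sketch below fits a bag-of-words Naive Bayes classifier with scikit-learn; the four training emails and their labels are invented:

```python
# A toy sketch of NLP-based email filtering: a bag-of-words Naive Bayes
# classifier with scikit-learn. The training emails are invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = [
    "Win a free prize now, click here",    # spam
    "Cheap loans, limited time offer",     # spam
    "Meeting notes from Monday attached",  # important
    "Your exam schedule for next week",    # important
]
labels = ["spam", "spam", "important", "important"]

# Turn each email into word counts, then fit the classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, labels)

print(model.predict(["Click here for a free offer"]))  # expected: ['spam']
```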

2. Language translation
There are as many languages in this world as there are cultures, but not
everyone understands all of them. As our world has become a global village
owing to the dawn of technology, we need to communicate with people who
speak languages that are foreign to us. Natural Language Processing helps
us by translating between languages while preserving their sentiments.

3. Smart assistants
In today’s world, every new day brings in a new smart device, making this
world smarter and smarter by the day. And this advancement is not just
limited to machines. We have advanced enough technology to have smart
assistants, such as Siri, Alexa, and Cortana. We can talk to them like we talk
to normal human beings, and they even respond to us in the same way.

All of this is possible because of Natural Language Processing. It helps the
computer system understand our language by breaking it into parts of
speech, root stems, and other linguistic features. It not only helps them
understand the language but also helps in processing its meaning and
sentiments and in answering back the same way humans do.

4. Document analysis
Another one of NLP's applications is document analysis. Companies, colleges,
schools, and other such places are always filled to the brim with data, which
needs to be properly sorted, maintained, and searched. All this can be done
using NLP. It not only searches for a keyword but also categorizes it
according to the instructions, saving us the long and hectic work of
searching for a single person's information in a pile of files. It is not limited
to this: it also helps its users make informed decisions on claims and risk
management.
5. Online searches
In this world full of challenges and puzzles, we must constantly find our way
by getting the required information from available sources. One of the most
extensive information sources is the internet. We type what we want to
search and, checkmate! We have got what we wanted. But have you ever
thought about how you get these results even when you do not know the
exact keywords you need to search for the needed information? Well, the
answer is obvious.

It is again Natural Language Processing. It helps search engines understand
what is asked of them by comprehending the literal meaning of words and
the intent behind writing those words, hence giving us the results we want.

6. Predictive text
A similar application to online searches is predictive text. It is something we
use whenever we type anything on our smartphones. Whenever we type a
few letters on the screen, the keyboard gives us suggestions about what that
word might be; and once we have written a few words, it starts suggesting
what the next word could be. These predictions might be a little off in the
beginning.

Still, as time passes, the system gets trained on our texts and starts to
suggest the next word correctly, even when we have not written a single
letter of it. All this is done using NLP, by making our smartphones
intelligent enough to suggest words and learn from our texting habits.
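
As a toy sketch of the idea, the snippet below builds bigram counts over an invented typing history and suggests the most likely next word; real keyboards use far more sophisticated models:

```python
# A toy sketch of predictive text: bigram counts over a user's past
# messages suggest the most likely next word. The history is invented.
from collections import Counter, defaultdict

history = "i am on my way . i am at work . i am on the bus ."
counts = defaultdict(Counter)

# Count which word follows which in the typing history.
words = history.split()
for current, nxt in zip(words, words[1:]):
    counts[current][nxt] += 1

def suggest(word):
    # Return the most frequent follower of `word`, if any.
    followers = counts[word]
    return followers.most_common(1)[0][0] if followers else None

print(suggest("am"))  # 'on' (seen twice after "am", vs. "at" once)
```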

7. Automatic summarization
With increasing inventions and innovations, data has also increased, and this
increase in data has expanded the scope of data processing. Still, manual
data processing is time-consuming and prone to error. NLP has a solution for
that too: it can not only summarize the meaning of information, but also
understand the emotional meaning hidden in it, making the summarization
process quick and impeccable.
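
As a toy sketch of extractive summarization, the snippet below scores invented sentences by word frequency and keeps the highest-scoring one; production summarizers are far more elaborate:

```python
# A toy sketch of extractive summarization: score each sentence by the
# total frequency of its words and keep the top one. The text is
# invented for illustration.
import re
from collections import Counter

text = ("NLP systems process text. NLP systems can summarize text "
        "quickly. Manual processing of text is slow and error-prone.")

sentences = re.split(r"(?<=\.)\s+", text)
freq = Counter(re.findall(r"\w+", text.lower()))

def score(sentence):
    # Sum the document-wide frequencies of the sentence's words.
    return sum(freq[w] for w in re.findall(r"\w+", sentence.lower()))

print(max(sentences, key=score))  # the highest-scoring sentence
```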

8. Sentiment analysis
Daily conversations, posted content and comments, and book, restaurant,
and product reviews: almost all conversations and texts are full of emotions.
Understanding these emotions is as important as understanding the
word-to-word meaning. We as humans can interpret emotional sentiments in
writings and conversations; with the help of natural language processing,
computer systems can also understand the sentiments of a text along with
its literal meaning.
9. Chatbots
With the increase in technology, everything has been digitalized, from
studying to shopping, booking tickets, and customer service. Instead of
making customers wait a long time for short, instant answers, a chatbot
replies instantly and accurately. NLP gives these chatbots conversational
capabilities, which help them respond appropriately to the customer's needs
instead of giving just bare-bones replies.

Chatbots also help in places where human staff is scarce or not available
round the clock. Chatbots operating on NLP also have emotional intelligence,
which helps them understand the customer's emotional sentiments and
respond to them effectively.

10. Social media monitoring


Nowadays, every other person has a social media account where they share
their thoughts, likes, dislikes, experiences, and more, which tells a lot about
them as individuals. We find information not only about individuals but also
about products and services. The relevant companies can process this data
to get information about their products and services and improve or amend
them. NLP comes into play here.
NLP GOALS

The goal of natural language processing is to specify a theory of language
comprehension and production at such a level of detail that a person is
able to write a computer program which can understand and produce natural
language. The basic goal of NLP is to accomplish human-like language
processing. The choice of the word "processing" is very deliberate and
should not be replaced with "understanding": although the field of NLP was
originally referred to as Natural Language Understanding (NLU), that goal
has not yet been accomplished. A full NLU system would be able to:

 Paraphrase an input text.

 Translate the text into another language.

 Answer questions about the contents of the text.

 Draw inferences from the text.

While NLP has made serious inroads into accomplishing the first three
goals, the fact that NLP systems cannot, of themselves, draw inferences
from text means that NLU still remains the goal of NLP. There are also
practical applications of NLP. An NLP-based information retrieval (IR)
system has the goal of providing more precise, complete information in
response to a user's real information need. The goal of the NLP system is
to represent the true meaning and intent of the user's query, which can be
expressed as naturally as in everyday language.
NLP STRUCTURE

NLP tools transform text into something a machine can understand; then
machine learning algorithms are fed training data and expected outputs
(tags) to train machines to make associations between a particular input and
its corresponding output. Machines then use statistical analysis methods to
build their own "knowledge bank" and discern which features best represent
the texts, before making predictions for unseen data (new texts).

Ultimately, the more data these NLP algorithms are fed, the more accurate
the text analysis models will be.

Sentiment analysis is one of the most popular NLP tasks, where machine
learning models are trained to classify text by polarity of opinion (positive,
negative, neutral, and everywhere in between).
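
As a minimal sketch of this pipeline (texts plus tags in, a statistical model out), the example below trains a small sentiment classifier with scikit-learn; the six training examples are invented:

```python
# A minimal sketch of the pipeline described above: texts plus tags go
# in, and a statistical model learns which features predict which tag.
# The six training examples are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "I love this phone", "Great battery and screen", "Works perfectly",
    "Terrible, broke in a week", "Awful customer service", "Waste of money",
]
tags = ["positive", "positive", "positive",
        "negative", "negative", "negative"]

# TF-IDF turns text into numeric features; logistic regression makes
# soft, probabilistic decisions over those features.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, tags)

print(model.predict(["The battery is great"]))  # likely ['positive']
```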
FUTURE SCOPE

The future of Natural Language Processing (NLP) is a little
unpredictable, but it is clear that it will be a part of our daily
lives in the next few years. NLP is the process of understanding
natural human language; in other words, it is the ability of
machines and computers to understand human language. The
first phrase that comes to mind when we hear about NLP is
"Siri," the personal assistant for the iPhone. Siri can
understand what you are saying, but it can't understand what
you mean. The future of NLP is to have machines that have a
general understanding of human language. This would allow us
to interact with machines in ways that we do with other humans.

Natural Language Processing is a term that has been around for
decades and has become an everyday part of our lives.
From the moment we wake up to the moment we go to sleep, we
interact with NLP. Whether we know it or not, Natural
Language Processing is the technology that powers many of the
everyday things we do. It is the backbone of chatbots, Siri,
Alexa, Google and other voice-activated devices. The
development of Natural Language Processing was a relatively
slow process, but in recent years it has made massive strides.
In the last couple of years, NLP has become part of the public
consciousness due to its rapid development and the increasing
number of applications. NLP has also been getting a lot of
attention because of its potential to improve the way we do
things. This is why NLP has been a trending topic in the last
few years.
The implications are quite huge: computers will be able to
understand what we say and what we mean by the words we use.
This means that we'll be able to create machines that can not
only understand what we want, but also predict what we're
going to want. Machines will be able to read our minds! Well, not
really. But they'll be able to help us in ways we can't even
imagine right now.

As a field of study, NLP often focuses on the statistical and
mathematical underpinnings of language processing and
sometimes includes the more theoretical side of the field, such as
work on natural language semantics. NLP is also seen as an
information interface between humans and computers. Natural
language processing has been the topic of research for more
than 50 years and has many successful real-world applications
today. For example, NLP is frequently used in information
retrieval, text mining, question answering, machine translation,
and speech recognition.

NLP has only been around since the early 1960s, when
researchers first started trying to teach computers how to
understand human languages. As with most new technologies,
the first applications weren't always perfect. For example, the
first spell-checkers were just dictionaries with words in
alphabetical order: no grammar checking, no sentence
structure, just a list of words. In fact, an early spell-checker is
said to have been created in the 1950s by a Harvard student
named Ward Farnsworth. His system would print out a list of
words and their most likely misspellings in a box underneath
the text.
However, this was a pretty big improvement over just guessing
words at random, which was the only other option available.

Since those early days, NLP has grown in leaps and bounds. In the
1970s, IBM created a software program called STREPS
(Syntactic Transformation, Evaluation, and Production System).
This was a pretty big deal at the time because it was the first
program able to take a sentence in one language and translate
it into another language. While the output wasn't always
perfect, it was a huge step forward.

Natural language processing (NLP) has been around for a while
now, but it's only recently that it's been making huge leaps and
bounds in terms of improvements. From search engines like
Google and Bing to chatbots, NLP is everywhere. Some things
that NLP can be used for include text-based learning, search,
social media analytics, web search, document management,
content analysis, and data analytics and visualization. Most
people don't realize just how much NLP is improving and some
of the insane changes that are happening, but it's set to
completely change the gaming industry, search engines, and
even how we communicate with each other.
CONCLUSION

NLP supposedly makes the job easier, but it still demands human
intervention. People and the industry fear NLP would start a trend
of job snatching, which is true to a certain extent, but it certainly
cannot function the way it does without human input. The will to
work on and cater to the loopholes or bugs in a machine is the task
of the human who is handling it. Notwithstanding, the advantages
of NLP may cause anger in the arena of jobs, but right now it is the
knight in shining armor of the industry.
