NLP UNIT 1 Part 1

Natural Language Processing (NLP) is a technology enabling machines to understand and manipulate human language, with applications in translation, summarization, and speech recognition. The history of NLP spans from the 1940s with machine translation to modern advancements utilizing machine learning and the internet. Key components include Natural Language Understanding (NLU) and Natural Language Generation (NLG), with various applications such as sentiment analysis, chatbots, and information extraction.

Uploaded by

pavani20891

NATURAL LANGUAGE PROCESSING

UNIT 1 - PART 1

What is NLP?

NLP stands for Natural Language Processing, a field that draws on computer
science, human language (linguistics) and artificial intelligence. It is the
technology that machines use to understand, analyse, manipulate and interpret
human language. It helps developers organise knowledge for performing tasks
such as translation, automatic summarization, Named Entity Recognition (NER),
speech recognition, relationship extraction and topic segmentation.

History of NLP

(1940 - 1960) - Focused on Machine Translation (MT)

● Work on Natural Language Processing started in the 1940s.
● 1948 - The first recognizable NLP application was introduced at
Birkbeck College, London.
● 1950s - There were conflicting views between linguistics and computer
science. Chomsky published his first book, Syntactic Structures, and
claimed that language is generative in nature.
● 1957 - Chomsky also introduced the idea of generative grammar, which
is a rule-based description of syntactic structures.

(1960 - 1980) - Flavoured with Artificial Intelligence (AI)

From 1960 to 1980, the key developments were:

● Augmented Transition Networks (ATN): An ATN extends a finite state
machine with recursion and registers, which lets it recognise
languages beyond the regular languages.
● Case Grammar: This was developed by the linguist Charles J. Fillmore
in 1968. Case Grammar describes how languages such as English
express the relationship between nouns and verbs by using
prepositions. In Case Grammar, case roles can be defined to link
certain kinds of verbs and objects. For example: “Neha broke the
mirror with the hammer”. In this example, case grammar identifies
Neha as the agent, the mirror as the theme, and the hammer as the
instrument.
● SHRDLU: SHRDLU is a program written by Terry Winograd in 1968 -
1970. It lets users communicate with the computer and move
objects. It can handle instructions such as “pick up the green ball”
and answer questions like “what is inside the black box?”. The
main importance of SHRDLU is that it shows that syntax, semantics
and reasoning about the world can be combined to produce a
system that understands a natural language.
● LUNAR: LUNAR is the classic example of a natural language database
interface system. It used ATNs and Woods' procedural semantics. It
was capable of translating elaborate natural language expressions
into database queries and handled 78% of requests without errors.

(1980 - Current)

Until 1980, NLP systems were based on complex sets of hand-written
rules. After 1980, NLP introduced machine learning algorithms for
language processing. At the beginning of the 1990s, NLP started growing
faster and achieved good processing accuracy, especially for English grammar.
In the 1990s, large amounts of electronic text also became available, which
provided a good resource for training and evaluating natural language
programs. Other factors include the availability of computers with fast
CPUs and more memory. The major factor behind the advancement of NLP was
the internet. Modern NLP now covers applications such as speech recognition,
machine translation and machine text reading. When we combine all these
applications, artificial intelligence can gain knowledge of the
world. Consider the example of Amazon Alexa: you can ask this voice
assistant a question and it will reply to you.

Advantages of NLP:

1. NLP helps users to ask questions about any subject and get a direct
response within seconds.
2. NLP offers exact answers to questions, meaning it does not offer
unnecessary or unwanted information.
3. NLP helps computers to communicate with humans in their
languages.
4. It is very time efficient.
5. Many companies use NLP to improve the efficiency and accuracy of
their documentation processes and to identify information in
large databases.

Disadvantages of NLP:

1. NLP may not show context.
2. NLP is unpredictable.
3. NLP may require more keystrokes.
4. An NLP system often cannot adapt to a new domain and has a limited
function, which is why it is usually built for a single, specific task.

Components of NLP:

NLP has the following two components:

1. Natural Language Understanding (NLU):

It helps the machine to understand and analyse human language by
extracting metadata from content, such as concepts, entities,
keywords, emotions, relations and semantic roles. NLU is mainly used
in business applications to understand the customer's problem in
both spoken and written language. NLU involves the following tasks:

● It is used to map the given input into a useful representation.
● It is used to analyse different aspects of the language.

2. Natural Language Generation (NLG):

NLG acts as a translator that converts computerised data into a
natural language representation. It mainly involves text planning,
sentence planning and text realisation.

NOTE: NLU is more difficult than NLG.

Difference between NLU and NLG:

NLU | NLG
NLU is the process of reading and interpreting language. | NLG is the process of writing or generating language.
It produces non-linguistic outputs from natural language inputs. | It produces natural language outputs from non-linguistic inputs.

Applications of NLP:

The following are applications of NLP:

1. Question Answering: It focuses on building systems that
automatically answer questions asked by humans in a natural
language.
2. Spam Detection: Spam detection is used to detect unwanted e-mails
before they reach a user's inbox.
3. Sentiment Analysis: It is also known as opinion mining. It is used on
the web to analyse the attitude, behaviour and emotional state of the
sender. This application is implemented through a combination of
NLP and statistics by assigning values to the text (positive,
negative or neutral) to identify the mood of the context (happy, sad,
angry etc.)
4. Machine Translation: It is used to translate text or speech from one
natural language to another natural language. Ex: Google Translate
5. Spelling Correction: Microsoft Corporation provides spelling
correction in software such as MS Word and PowerPoint.
6. Speech Recognition: It is used for converting spoken words into text.
It is used in applications such as mobile, home automation, video
recovery, dictating to Microsoft Word, voice biometrics, voice user
interfaces and so on.
7. Chatbot: Implementing chatbots is one of the important
applications of NLP. They are used by many companies to provide
customer chat services.
8. Information Extraction: It is one of the most important applications of
NLP. It is used for extracting structured information from unstructured
or semi-structured machine-readable documents.
9. Natural Language Understanding (NLU): It converts large sets of text
into more formal representations, such as first-order logic structures,
that are easier for computer programs to manipulate.

Difference between Natural Language and Computer Language:

NATURAL LANGUAGE (NL) | COMPUTER LANGUAGE (CL)
NL has a very large vocabulary. | CL has a very limited vocabulary.
NL is easily understood by humans. | CL is easily understood by machines.
NL is ambiguous in nature. | CL is unambiguous.

How to build an NLP pipeline:

The following steps are used to build an NLP pipeline:

Step 1: Sentence Segmentation

It is the first step in building the NLP pipeline. It breaks the paragraph into
separate sentences. Ex: Consider the following paragraph: “Independence
Day is one of the important festivals for every Indian citizen. It is celebrated
on the 15th of August each year ever since India got Independence from
British rule. The day celebrates independence in the true sense.”
Sentence segmentation produces the following result:
1. “Independence Day is one of the important festivals for every Indian
citizen.”
2. “It is celebrated on the 15th of August each year ever since India got
Independence from British rule.”
3. “The day celebrates independence in the true sense.”
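
As a rough illustration of this step, the sketch below splits a paragraph at sentence-ending punctuation with a regular expression. The function name `split_sentences` and the splitting rule are assumptions made for this example, not the method of any particular NLP library; real segmenters also have to handle abbreviations, decimals and quotations.

```python
import re


def split_sentences(paragraph: str) -> list[str]:
    """Naively split a paragraph into sentences."""
    # Split at whitespace that follows '.', '!' or '?' and precedes a capital letter.
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", paragraph.strip())
    return [part for part in parts if part]


paragraph = (
    "Independence Day is one of the important festivals for every Indian Citizen. "
    "It is celebrated on the 15th of August each year ever since India got "
    "Independence from British rule. "
    "The day celebrates independence in the true sense."
)

for number, sentence in enumerate(split_sentences(paragraph), start=1):
    print(number, sentence)
```

The capital-letter lookahead is a deliberate simplification: it avoids splitting inside "3.5 percent" but would still split wrongly after an abbreviation such as "Dr. Rao".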

Step 2: Word Tokenization

A word tokenizer is used to break the sentence into separate words or
tokens.
Ex: JavaTpoint offers corporate training, summer training, online training
and winter training.
The word tokenizer generates the following tokens: “JavaTpoint”, “offers”,
“corporate”, “training”, “,”, “summer”, “training”, “,”, “online”, “training”,
“and”, “winter”, “training”, “.”
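
A minimal sketch of word tokenization, assuming a simple rule in which every run of word characters is one token and every punctuation mark is a token of its own. The function name `tokenize` is hypothetical; library tokenizers apply many more rules (contractions, URLs, numbers).

```python
import re


def tokenize(sentence: str) -> list[str]:
    """Split a sentence into word tokens and single-character punctuation tokens."""
    # \w+ matches a run of letters, digits or underscores;
    # [^\w\s] matches one character that is neither a word character nor whitespace.
    return re.findall(r"\w+|[^\w\s]", sentence)


sentence = ("JavaTpoint offers corporate training, summer training, "
            "online training and winter training.")
print(tokenize(sentence))
```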

Step 3: Stemming
Stemming is used to normalise words into their base or root form. For
example, “celebrates”, “celebrated” and “celebrating” all originate
from the single root word “celebrate”. The big problem with stemming is that
it sometimes produces a root word that has no meaning.
Intelligence, Intelligent and Intelligently -> root “Intelligen”
In English, the word “Intelligen” does not have any meaning.
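
The idea can be sketched with a deliberately naive suffix stripper. The suffix list and the name `naive_stem` are assumptions for illustration only; this is not the Porter algorithm or any library stemmer, and those produce different stems. It does show how chopping suffixes can leave a stem that is not a real word.

```python
# Hypothetical, very small suffix list, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


for word in ("celebrates", "celebrated", "celebrating"):
    # All three map to the same stem, which is not an English word.
    print(word, "->", naive_stem(word))
```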

Step 4: Lemmatization
It is quite similar to stemming. It is used to group the different inflected
forms of a word under a common base form, called the lemma. The main
difference between stemming and lemmatization is that lemmatization
produces a root word that has a meaning.
For example, in lemmatization, the words intelligence, intelligent and
intelligently have the root word “intelligent”, which has a meaning.
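
A minimal sketch of the lookup idea behind lemmatization, assuming a tiny hand-written table. `LEMMAS` and `lemmatize` are hypothetical names; real lemmatizers rely on a full dictionary and usually on the part of speech as well.

```python
# Hypothetical lemma table mapping inflected forms to their dictionary form.
LEMMAS = {
    "celebrates": "celebrate",
    "celebrated": "celebrate",
    "celebrating": "celebrate",
    "running": "run",
    "ran": "run",
}


def lemmatize(word: str) -> str:
    """Return the dictionary form if known, otherwise the lowercased word."""
    key = word.lower()
    return LEMMAS.get(key, key)


for word in ("celebrates", "Celebrating", "ran", "boy"):
    print(word, "->", lemmatize(word))
```

Unlike the output of a stemmer, every value in the table is a real word, which is the property the text above describes.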

Step 5: Identifying Stop Words

In English, there are a lot of words that appear very frequently, like “is”, “and”,
“the” and “a”. NLP pipelines will flag these words as stop words. Stop words
might be filtered out before doing any statistical analysis. For example, He
is a good boy.
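
A short sketch of stop-word filtering, assuming the stop-word list contains only the four words named above. The names `STOP_WORDS` and `remove_stop_words` are illustrative; practical lists are much longer.

```python
# Stop words taken from the text above; real lists contain far more entries.
STOP_WORDS = {"is", "and", "the", "a"}


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Keep only the tokens that are not stop words (case-insensitive)."""
    return [token for token in tokens if token.lower() not in STOP_WORDS]


print(remove_stop_words(["He", "is", "a", "good", "boy"]))  # ['He', 'good', 'boy']
```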

Step 6: Dependency Parsing

It is used to find out how all the words in the sentence are related to each
other.

Step 7: POS Tags

POS stands for part of speech, which includes nouns, verbs, adverbs and
adjectives. It indicates how a word functions, both in meaning and
grammatically, within the sentence. A word can have one or more parts of
speech depending on the context in which it is used. Ex: “Google” something on
the Internet. In the above example, Google is used as a verb, although it is a
proper noun.

Step 8: Named Entity Recognition (NER)

It is the process of detecting named entities such as a person name, movie
name, organisation name or location. Ex: Steve Jobs introduced the
iPhone at the Macworld conference in San Francisco, California.

Step 9: Chunking
It is used to collect individual pieces of information and group them into
larger units (chunks) of a sentence.

Phases of NLP:
The following are the five phases of NLP:

1. Lexical Analysis:
This phase scans the source text as a stream of characters and
converts it into meaningful lexemes. It divides the whole text into
paragraphs, sentences and words.
2. Syntactic Analysis (Parsing):
It is used to check the grammar and word arrangement, and it shows the
relationships among the words. Ex: Agra goes to the Poonam. In the
real world, “Agra goes to the Poonam” does not make any sense, so
the sentence is rejected by the syntactic analyzer.
3. Semantic Analysis:
It is concerned with the representation of meaning. It mainly focuses
on the literal meaning of words, phrases and sentences.
4. Discourse Integration:
The meaning of a sentence depends upon the sentences that precede it
and can also affect the meaning of the sentences that follow it.
5. Pragmatic Analysis:
It is the fifth and last phase of NLP. It helps you to discover the
intended effect by applying a set of rules that characterise cooperative
dialogues.
Ex: “Open the door” is interpreted as a request instead of an order.

1. Lexical and Morphological Analysis

1. Tokenization: Splits text into smaller units such as sentences or words.

o Example: "She likes cats." → ["She", "likes", "cats"].
2. Lemmatization: Converts words to their base or dictionary form while retaining
meaning.
o Example: "running" → "run".
3. Stopword Removal: Filters out common words that add little meaning (e.g., "is", "and").
4. Correcting Misspelled Words: Fixes spelling errors to ensure accuracy.
5. Morphological Analysis: Analyzes the structure of words by breaking them into
morphemes (smallest meaningful units).
o Example: "unhappily" → "un" (prefix) + "happy" (root) + "ly" (suffix).

2. Syntactic Analysis (Parsing)

1. Parsing: Analyzes the grammatical structure of sentences to ensure they follow
language rules.
o Example: "She eats apples." (Correct syntax).
2. Part-of-Speech (POS) Tagging: Assigns grammatical roles (noun, verb, adjective, etc.)
to words in a sentence.
o Example: "The cat sleeps." → ["The" (Determiner), "cat" (Noun), "sleeps" (Verb)].
3. Avoiding Ambiguity: Resolves ambiguities in grammar to clarify sentence meaning.
o Example: "book" (noun: a reading item or verb: to reserve).

3. Semantic Analysis

1. Semantic Analysis: Extracts the meaning of words and sentences, focusing on both
literal and contextual meanings.
2. Named Entity Recognition (NER): Identifies specific entities such as names, places, or
organizations in the text.
o Example: "Google is based in California." → Google (Organization), California
(Location).
3. Word Sense Disambiguation (WSD): Resolves the meaning of ambiguous words
based on context.
o Example: "bank" → riverbank or financial institution.
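
One simple way to approach word sense disambiguation is to compare the words around the ambiguous word with a short description (gloss) of each sense and pick the sense with the largest overlap. The sketch below follows that idea in the spirit of the simplified Lesk approach. The glosses, the stop-word list and the function names are all assumptions written for this example, not data from a real lexicon.

```python
# Hypothetical glosses for two senses of "bank".
SENSES = {
    "financial institution": "an organisation that keeps money and offers loans and accounts",
    "riverbank": "sloping land beside a river or stream of water",
}

# Small, hand-picked list of function words to ignore when counting overlap.
STOP = {"a", "an", "and", "of", "or", "that", "the"}


def content_words(text: str) -> set[str]:
    """Lowercase the text and drop the function words."""
    return {word for word in text.lower().split() if word not in STOP}


def disambiguate(sentence: str) -> str:
    """Return the sense whose gloss shares the most words with the sentence."""
    context = content_words(sentence)
    return max(SENSES, key=lambda sense: len(context & content_words(SENSES[sense])))


print(disambiguate("I sat on the bank of the river"))
print(disambiguate("She deposited money in the bank"))
```

When no gloss word appears in the sentence, the overlap is zero for every sense and the function simply returns the first sense, so a realistic system needs richer context than this.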

4. Discourse Integration

1. Discourse Integration: Analyzes how sentences relate to one another in context to
derive meaning.
o Example: "She dropped her bag. It broke." → "It" refers to the bag.
2. Contextual Understanding: Examines text relationships to ensure coherent
interpretation.

5. Pragmatic Analysis

1. Pragmatic Analysis: Interprets the implied meaning behind text, considering context,
tone, and speaker intent.
o Example: "I'm starving!" (Implies hunger, not literal starvation).
2. Context and Tone: Analyzes nuances to understand emotions or implied meanings.

Why is NLP difficult?

NLP is difficult because ambiguity and uncertainty exist in the language.

Ambiguity
There are three types of ambiguity:
1. Lexical Ambiguity: It exists when a single word has two or more
possible meanings.
Ex: Manya is looking for a match.
In the above example, the word match means either that
Manya is looking for a partner or that Manya is looking for a
match (cricket or another match).

2. Syntactic Ambiguity: It exists when the sentence as a whole has two
or more possible meanings.
Ex: I saw the girl with the binoculars.
In the above example, did I have the binoculars or did the girl
have the binoculars?

3. Referential Ambiguity: It exists when you refer to something
using a pronoun.
Ex: Kiran went to Sunita. She said, “I am hungry.”
In the above example, you do not know who is hungry, either
Kiran or Sunita.

How to implement NLP?

The following is a method to implement NLP:
NLP applications (APIs)
NLP APIs allow developers to integrate human-to-machine communication
and complete several useful tasks such as speech recognition, chatbots,
spelling correction, sentiment analysis etc.

Origins and challenges of NLP:-

Natural Language Processing (NLP) is
➔ A field of computer science, artificial intelligence (including
machine learning) and linguistics.
➔ concerned with the interactions between computers and human
(natural) languages.
➔ specifically, the process of a computer extracting meaningful
information from natural language input and/or producing natural
language output.
➔ NLP, sometimes mistakenly termed Natural Language Understanding,
originated from machine translation research. While NLU involves
only the interpretation of language, NLP includes both interpretation
and production (generation). It also includes speech processing.

For humans, learning in early childhood occurs in a consistent way:
children interact with unstructured data and process that data into
information. After amassing this information, we begin to analyse
it in an attempt to understand its implications in a given situation.
At a certain point, we have a learned understanding of
our life and environment. Only after its implications are understood can
information be used to solve a set of problems or life situations. Humans
iterate through multiple scenarios to consciously or unconsciously simulate
whether a solution will be a success or a failure. With practice, this
follows the progression:
unstructured data -> information -> knowledge -> wisdom.
Machines learn by a similar method:
● Initially, the machine translates unstructured textual data into
meaningful terms.
● Then it identifies connections between these terms.
● Finally, it comprehends the context.

Challenges of NLP:-
● Breaking the sentence
● Building the appropriate vocabulary
● Linking different components of vocabulary
● Setting the context
● Extracting the semantic meaning
● Extracting the named entities
● Use Case: Transforming unstructured to structured form.

Differentiate between the rationalist and empiricist approaches to NLP.

Rationalists believe that reason can explain the working of the world;
empiricists believe that evidence gathered through experimentation can
explain reality. (Write a real-time example.)

1. Breaking the sentence:

Formally referred to as “sentence boundary disambiguation”, this breaking
process is no longer difficult to achieve, but it is nevertheless a critical
process, especially in the case of highly unstructured data that includes
structured information. A breaking application should be intelligent enough
to separate paragraphs into their appropriate sentence units. Highly
complex data might not always be available in easily recognisable
sentence forms. This data may exist in the form of tables, graphics,
robotics, page breaks etc., which need to be appropriately processed for the
machine to derive meaning in the same way a human would approach
interpreting text.

Solution: Tagging the parts of speech (POS) and generating dependency
graphs.

NLP applications employ a set of POS tagging tools that assign a POS tag
to each word or symbol in a given text. Subsequently, the position of each
word in a sentence is determined by a dependency graph, generated in the
same procedure. Those POS tags can be further processed to create
meaningful single or compound vocabulary terms.

2. Building the appropriate vocabulary:

Using these POS tags and dependency graphs, a powerful vocabulary can
be generated and subsequently interpreted by the machine in a way
comparable to human understanding. Ex: “All employees are responsible
for the management of risk, with the ultimate accountability residing with
the board”
Such sentences are generally simple enough to be parsed by an NLP program.
But to be of real value, an algorithm should be able to generate, at a
minimum, the following vocabulary terms:
Employees; Management of Risk; Ultimate Accountability; Board

3. Linking different components of vocabulary

Recently, new approaches have been developed that can extract
the linkage between any two vocabulary terms generated from
the document.
Word2Vec, a vector-space based model, assigns a vector to each word in a
document; those vectors ultimately capture each word's relationship to a
closely occurring word or set of words.

In the above example, “Board” and “Management of Risk” are two vocabulary
terms that are connected, with the board having ultimate accountability.
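
Once each word has a vector, the strength of the link between two terms is commonly measured with cosine similarity. The sketch below shows only that comparison step. The three-dimensional vectors are made-up numbers chosen for illustration and were not produced by Word2Vec, which learns vectors of much higher dimension from a corpus.

```python
import math

# Hypothetical toy vectors; a trained model would supply real ones.
VECTORS = {
    "board": [0.9, 0.1, 0.3],
    "accountability": [0.8, 0.2, 0.4],
    "river": [0.1, 0.9, 0.2],
}


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


print(cosine_similarity(VECTORS["board"], VECTORS["accountability"]))
print(cosine_similarity(VECTORS["board"], VECTORS["river"]))
```

With these invented numbers, "board" scores higher against "accountability" than against "river", which is the kind of relationship the vectors are meant to capture.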

4. Setting the Context:

One of the most challenging tasks in the entire NLP process is to train a
machine to derive context from a discussion within a document.
Consider the following two sentences:
“I enjoy working in a bank”
“I enjoy working near a river bank”
The context of these sentences is quite different; graphs are used to train a
machine to understand the differences between the sentences.

In the above example, “enjoy working in a bank” suggests “work”, “job” or
“profession”, while “enjoy working near a river bank” is just any type of work
or activity that can be performed near a river bank.

5. Extracting Semantic Meaning:

Linguistic analysis of vocabulary terms might not be enough for a machine
to correctly apply learned knowledge. To successfully apply learning, a
machine must further understand the semantics of every vocabulary term
within the context of the documents.
Ex: “Under US GAAP, gains and losses from AFS assets are included in net
income”
“Under IFRS, gains and losses from AFS assets are included in
comprehensive income”
Both sentences mention gains and losses in proximity to
some form of income, but the information that needs to be understood is
entirely different between these sentences due to differing semantics. It is
a combination of both linguistic and semantic methodologies
that would allow the machine to truly understand the meanings within a
selected text.

6. Extracting named entities (often referred to as Named Entity Recognition
(NER))

The big challenge is successfully executing NER, which is essential when
training a machine to distinguish between simple vocabulary and named
entities. In many instances these entities are surrounded by dollar
amounts, places, locations, numbers, time etc.

7. Use Case: Transforming unstructured data into Structured Format

This means putting the unstructured data into a format that can be reused
for analysis.

Language and Grammar:-

“Grammar in NLP is a set of rules for constructing sentences in a language,
used to understand and analyse the structure in text data”
“Language is a means of communication to share knowledge and
expression”
Automatic processing of language requires the rules and exceptions of a
language to be explained to the computer. Grammar defines language. It
consists of a set of rules that allows us to parse and generate sentences in
a language. Thus, it provides the means to specify natural language. These
rules relate information to coding devices at the language level, not at the
world-knowledge level.
The main hurdle in language specification comes from the constantly
changing nature of natural languages and the presence of a large number
of hard-to-specify exceptions.
Several efforts have been made to provide such specifications, which
have led to the development of a number of grammars.
The main ones are transformational grammar, lexical functional grammar,
government and binding, generalised phrase structure grammar,
dependency grammar, Paninian grammar and tree-adjoining grammar.
Some of these grammars focus on derivation (eg: phrase structure
grammar) while others focus on relationships (eg: dependency grammar,
lexical functional grammar, Paninian grammar and link grammar).
The term generative grammar is often used to refer to the general
framework introduced by Chomsky.
Generative grammar basically refers to any grammar that uses a set
of rules to specify or generate all and only the grammatical (well-formed)
sentences in a language. Chomsky argued that phrase structure
grammars are not adequate to specify natural language. He proposed a
complex system of transformational grammar. He suggested that each
sentence in a language has two levels of representation, namely a deep
structure and a surface structure. The mapping from deep structure to
surface structure is carried out by transformations.

The deep structure of a sentence represents its meaning, while the surface
structure is the form in which the sentence is actually expressed. The
deep structure can be transformed in a number of ways to yield many
different surface-level representations. Sentences with different
surface-level representations that have the same meaning share a common
deep-level representation.
For example, the sentences
Pooja plays Veena.
Veena is played by Pooja.
have the same meaning, despite having different surface structures. Both
sentences are generated from the same deep structure, in which the
deep subject is Pooja and the deep object is the Veena.

Transformational grammar has three components:

1. Phrase structure grammar
2. Transformation rules
3. Morphophonemic rules
Morphophonemic rules match each sentence representation to a string of
phonemes. Each of these components consists of a set of rules.

Phrase Structure Grammar:

It consists of rules that generate natural language sentences and assign a
structural description to them.

Sentences that can be generated using these rules are termed
grammatical. The structure assigned by the grammar is a constituent
structure analysis of the sentence.
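
To make "sentences that can be generated using these rules" concrete, the sketch below enumerates every sentence of a very small phrase structure grammar and treats a word sequence as grammatical if it is among them. The grammar, its rule names and the helper functions are assumptions invented for this example. Brute-force enumeration only works because this toy grammar has no recursion.

```python
import itertools

# Hypothetical toy grammar: each non-terminal maps to its alternative expansions.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Pooja"], ["Veena"]],
    "VP": [["V", "NP"]],
    "V": [["plays"]],
}


def generate(symbol: str):
    """Yield every word sequence that can be derived from the given symbol."""
    if symbol not in GRAMMAR:  # a terminal symbol is an actual word
        yield [symbol]
        return
    for expansion in GRAMMAR[symbol]:
        # Combine each derivation of every child symbol, left to right.
        children = [list(generate(child)) for child in expansion]
        for parts in itertools.product(*children):
            yield [word for part in parts for word in part]


def is_grammatical(sentence: str) -> bool:
    """A sentence is grammatical here if the grammar can generate it."""
    return sentence.split() in list(generate("S"))


for words in generate("S"):
    print(" ".join(words))
print(is_grammatical("Pooja plays Veena"), is_grammatical("plays Pooja Veena"))
```

Note that the grammar also generates "Veena plays Pooja": phrase structure rules check form only, not meaning.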

Transformational Grammar/Rules:
Transformational grammar is a set of transformational rules, which
transform one phrase marker (underlying) into another phrase marker
(derived). These rules are applied to the terminal string generated by the
phrase structure rules. Unlike phrase structure rules, transformational
rules are heterogeneous and may have more than one symbol on their
left-hand side. These rules are used to transform one surface representation
into another, for example an active sentence into a passive sentence.
The rule relating active and passive sentences is:

NP1 - Aux - V - NP2 -> NP2 - Aux + be + en - V - by + NP1

This rule says that an underlying input having the structure

NP - Aux - V - NP

can be transformed into

NP - Aux + be + en - V - by + NP

This transformation involves the addition of the strings ‘be’ and ‘en’ and
certain rearrangements of the constituents of a sentence.

Consider the active sentence “The police will catch the snatcher”.
The passive transformation rule will convert the sentence into
The + snatcher + will + be + en + catch + by + the + police

Morphophonemic rule:-
Morphophonemic rules do not express conceptual categories. Rather, they
simply specify the pronunciation (the “shapes”) of morphemes in context,
once a morphological rule has already been applied. They can also be thought
of as an interface between phonology and morphology.
In the passive example above, another transformational rule will then
reorder ‘en + catch’ to ‘catch + en’, and subsequently a morphophonemic
rule will convert ‘catch + en’ to ‘caught’.
Eg: The vowel changes in “sleep” and “slept”, “bind” and “bound”, “vain”
and “vanity”.
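
The passive derivation described above can be traced step by step in code. This is a sketch under stated assumptions: it takes the passive rule to be NP1 - Aux - V - NP2 -> NP2 - Aux + be + en - V - by + NP1, it expects the sentence to be supplied already split into its constituents, and the participle table and all function names are hypothetical.

```python
# Hypothetical lookup of irregular past participles.
IRREGULAR_PARTICIPLES = {"catch": "caught"}


def passive_transform(np1, aux, verb, np2):
    """NP1 - Aux - V - NP2  ->  NP2 - Aux + be + en - V - by + NP1."""
    return np2 + [aux, "be", "en", verb, "by"] + np1


def reorder_affix(symbols):
    """Reorder the first 'en + V' pair to 'V + en'."""
    out = list(symbols)
    for i in range(len(out) - 1):
        if out[i] == "en":
            out[i], out[i + 1] = out[i + 1], out[i]
            break
    return out


def spell_out(symbols):
    """Morphophonemic step: realise each 'V + en' pair as a past participle."""
    words = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and symbols[i + 1] == "en":
            verb = symbols[i]
            # Fall back to a regular '-ed' form when the verb is not in the table.
            words.append(IRREGULAR_PARTICIPLES.get(verb, verb + "ed"))
            i += 2
        else:
            words.append(symbols[i])
            i += 1
    return " ".join(words)


underlying = passive_transform(["the", "police"], "will", "catch", ["the", "snatcher"])
print(" + ".join(underlying))
print(spell_out(reorder_affix(underlying)))
```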

Processing Indian Languages:-

There are a number of differences between Indian languages and English.
These introduce differences in their processing. Some of these differences
are listed here:
❖ Unlike English, Indian scripts have a non-linear structure.
❖ Unlike English, Indian languages have SOV (Subject-Object-Verb)
as the default sentence structure.
❖ Indian languages have a free word order, i.e., words can be moved
freely within a sentence without changing the meaning of the
sentence.
❖ Spelling standardisation is more subtle in Hindi than in English.
❖ Indian languages have a relatively rich set of morphological variants.
❖ Indian languages make extensive and productive use of complex
predicates (CPs).
❖ Indian languages use post-position case markers instead of
prepositions.
❖ Indian languages use verb complexes consisting of sequences
of verbs.

Except for the direction in which its script is written, Urdu is closely
related to Hindi. Both share similar phonology, morphology and
syntax. Both are free-word-order languages and use post-positions.
They also share a large amount of their vocabulary. Differences in the
vocabulary arise mainly because a significant portion of Urdu
vocabulary comes from Persian and Arabic, while Hindi borrows
much of its vocabulary from Sanskrit.

NLP Applications:- (mentioned earlier)

Information Retrieval:-
Information Retrieval (IR) may be defined as a software program that
deals with the organisation, storage, retrieval and evaluation of
information from document repositories, particularly textual
information. The system assists users in finding the information they
require, but it does not explicitly return answers to their questions. It
informs the user of the existence and location of documents that might
contain the required information. The documents that satisfy the user's
requirements are called relevant documents; a perfect IR system will
retrieve only relevant documents.

As the above diagram shows, a user who needs information
has to formulate a request in the form of a query in natural
language. The IR system then responds by retrieving the relevant
output, in the form of documents, about the required information.

Major issues in Information retrieval:

The main issues in Information Retrieval (IR) are Document and
Query Indexing, Query Evaluation and System Evaluation.

1. Document and Query Indexing:

The aim is to find important meanings and create an internal
representation. The factors to be considered are accuracy in
representing semantics, exhaustiveness, and the ease with which
a computer can manipulate the representation.

Indexing is the most vital part of any IR system. It is a process in
which the documents required by the users are transformed
into searchable data structures. In the process, first,
document surrogates are created to represent each document.
Secondly, it requires analysis of the original documents, covering
both simple data (author, title, subject etc.) and complex data
(linguistic analysis of content).
Indexes are the data structures that are used to make the
search faster.

2. Query Evaluation:
The retrieval model determines how a document is represented with
the selected keywords and how document and query
representations are compared to calculate a score. IR deals with
issues like uncertainty and vagueness in the information
system.
Uncertainty: The available representation does not typically
reflect the true semantics of objects such as images, videos etc.
Vagueness: The information that the user requires lacks clarity and
is only vaguely expressed in a query, feedback or user action.

3. System Evaluation: It concerns determining
the impact of the information given on user achievement. Here, we
assess the efficiency of the particular system in terms of time and
space.
Evaluation: It is the process of systematically determining a
subject's merit, worth and significance by using certain criteria
that are governed by a set of standards.
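
The indexing and query evaluation issues above can be illustrated with a minimal inverted index. This sketch assumes whitespace tokenization and a score equal to the number of query terms a document contains. The sample documents and the function names `build_index` and `search` are hypothetical, and real IR systems use weighted scores such as TF-IDF.

```python
from collections import defaultdict

# Hypothetical document collection, keyed by document id.
DOCUMENTS = {
    1: "natural language processing",
    2: "information retrieval systems",
    3: "language models for information retrieval",
}


def build_index(documents: dict[int, str]) -> dict[str, set[int]]:
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> list[int]:
    """Rank documents by how many query terms they contain (ties: lower id first)."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


index = build_index(DOCUMENTS)
print(search(index, "language retrieval"))  # [3, 1, 2]
```

The index is what makes the search fast: a query only touches the posting sets of its own terms instead of scanning every document.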

NLP Terminology:
Phonology - the study of how sounds are organised systematically
Morphology - the study of the formation and internal structure of words
Morpheme - the primitive unit of meaning in a language
Syntax - the study of the formation and internal structure of sentences
Semantics - the study of the meaning of sentences
Pragmatics - it deals with using and understanding sentences in different
situations and how the interpretation of the sentence is affected.
World Knowledge - it includes general knowledge about the world
Discourse - it deals with how the immediately preceding sentence can
affect the interpretation of the next sentence.

