
NLP - UNIT 1

WHAT IS NLP
NLP stands for Natural Language Processing, a field at the intersection of Computer
Science, Linguistics, and Artificial Intelligence.

It is the technology used by machines to understand, analyse, manipulate, and
interpret human language.

It helps developers organize knowledge for performing tasks such as translation,
automatic summarization, Named Entity Recognition (NER), speech recognition,
relationship extraction, and topic segmentation.
HISTORY OF NLP
(1940-1960) - Focused on Machine Translation (MT)

Research in Natural Language Processing began in the 1940s.

1948 - The first recognisable NLP application was introduced at Birkbeck College,
London.

1950s - There was a conflicting view between linguistics and computer science. Chomsky
published his first book, Syntactic Structures, and claimed that language is generative
in nature.

1957 - Chomsky also introduced the idea of Generative Grammar, which consists of
rule-based descriptions of syntactic structures.
HISTORY OF NLP
(1960-1980) - Flavored with Artificial Intelligence (AI)

From 1960 to 1980, the key developments were:

Augmented Transition Networks (ATN)

An Augmented Transition Network is an extension of the finite-state machine formalism
that can recognise more complex sentence structures than plain finite-state automata.

Case Grammar

Case Grammar was developed by the linguist Charles J. Fillmore in 1968.

Case Grammar uses languages such as English to express the relationship between nouns
and verbs, relationships that are often signalled by prepositions.

In Case Grammar, case roles can be defined to link certain kinds of verbs and objects.

For example: "Neha broke the mirror with the hammer". In this example, Case Grammar
identifies Neha as the agent, the mirror as the theme, and the hammer as the instrument.
HISTORY OF NLP
From 1960 to 1980, key systems were:
SHRDLU
SHRDLU is a program written by Terry Winograd in 1968-70. It allowed users to converse
with the computer in natural language and to move objects around a simulated blocks world.
LUNAR
LUNAR is the classic example of a natural language database interface system; it used
ATNs and Woods' Procedural Semantics.
It was capable of translating elaborate natural language expressions into database queries
and handled 78% of requests without errors.
HISTORY OF NLP
1980 - Current

Until 1980, natural language processing systems were based on complex sets of
hand-written rules. After 1980, NLP began to adopt machine learning algorithms for
language processing.

In the early 1990s, NLP grew rapidly and achieved good accuracy, especially for English
grammar. Around 1990, large electronic text collections were also introduced, which
provided a good resource for training and evaluating natural language programs.

Modern NLP consists of various applications such as speech recognition, machine
translation, and machine reading of text. Combining these applications allows an
artificial intelligence to gain knowledge of the world. Consider the example of Amazon
Alexa: you can ask Alexa a question, and it will reply to you.
ADVANTAGES OF NLP
○ NLP helps us to analyse data from both structured and unstructured sources.
○ NLP is very fast and time efficient.
○ NLP offers end-to-end, exact answers to a question, saving the time that would otherwise
be spent on unnecessary and unwanted information.
○ NLP allows users to ask questions about any subject and get a direct response within
milliseconds.
○ Many companies use NLP to improve the efficiency and accuracy of documentation
processes and to identify information in large databases.


DISADVANTAGES OF NLP
○ Training an NLP model requires a lot of data and computation.
○ Many issues arise for NLP when dealing with informal expressions, idioms, and cultural jargon.
○ NLP results are sometimes inaccurate, and accuracy is directly proportional to the quality of
the training data.
○ NLP systems are typically designed for a single, narrow job; they cannot easily adapt to new
domains and have limited functionality.
○ NLP behaviour can be unpredictable.
○ NLP may require more keystrokes.
IDIOMS

A particular type of idiom, called a phrasal verb, consists of a verb followed by an adverb or
preposition (or sometimes both).

Ex: make over, make out, and make up.

Notice how the meanings have nothing to do with the usual meanings of over, out, and up.

JARGON

Jargon is occupation-specific language used by people in a given profession, the "shorthand"
that people in the same profession use to communicate with each other.

For example, plumbers might use terms such as elbow, ABS, sweating the pipes, reducer,
flapper, snake, and rough-in.
APPLICATIONS OF NLP
● Text and speech processing, e.g. voice assistants such as Alexa and Siri
● Text classification, e.g. Grammarly, Microsoft Word, and Google Docs
● Information extraction, e.g. search engines such as DuckDuckGo and Google
● Chatbots and question answering, e.g. website bots
● Language translation, e.g. Google Translate
● Text summarization
Why NLP is difficult?
NLP is difficult because ambiguity and uncertainty exist in language.
Ambiguity
There are three main types of ambiguity:

○ Lexical Ambiguity
○ Syntactic Ambiguity
○ Referential Ambiguity


○ Lexical Ambiguity

Lexical Ambiguity exists when a single word in a sentence has two or more possible
meanings.

Example:

Manya is looking for a match.

In the above example, the word match could mean that Manya is looking for a partner,
or that Manya is looking for a match in the sense of a game (a cricket match, for
instance).
○ Syntactic Ambiguity

Syntactic Ambiguity exists when the structure of a sentence allows two or more possible
meanings.

Example:

I saw the girl with the binoculars.

In the above example, did I have the binoculars, or did the girl have the binoculars?
○ Referential Ambiguity

Referential Ambiguity exists when it is unclear what a pronoun refers to.

Example: Kavya went to Sunita. She said, "I am hungry."

In the above sentence, you do not know who is hungry, Kavya or Sunita.
Phases of Natural Language Processing
1. Lexical and Morphological Analysis
The first phase of NLP is lexical analysis. This
phase scans the source text as a stream of
characters and converts it into meaningful lexemes.
It divides the whole text into paragraphs, sentences,
and words.
2. Syntactic Analysis (Parsing)
Syntactic analysis is used to check grammar and word
arrangement, and shows the relationships among the
words.
Example: Agra goes to the Poonam
In the real world, "Agra goes to the Poonam" does not make any sense, so
this sentence is rejected by the syntactic analyzer.
3. Semantic Analysis
Semantic analysis is concerned with meaning
representation. It mainly focuses on the literal
meaning of words, phrases, and sentences.
The semantic analyzer disregards sentences such as
"hot ice-cream".
4. Discourse Integration
Discourse integration means that the meaning of a sentence depends upon the
sentences that precede it, and it may also affect the meaning of the
sentences that follow it.
5. Pragmatic Analysis
Pragmatic analysis is the fifth and last phase of NLP. It helps
you to discover the intended effect by applying a set
of rules that characterize cooperative dialogues.
For example: "Open the door" is interpreted as a
request instead of an order.
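The first two phases above can be illustrated with NLTK. This is a minimal sketch, assuming NLTK is installed and the required resources (e.g. punkt and the averaged perceptron tagger) have been downloaded; the sample text is only illustrative.

import nltk
from nltk.tokenize import sent_tokenize, word_tokenize

# One-time setup (assumption: these resource names match your NLTK version):
# nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')

text = "Agra goes to the Poonam. Open the door."

# Lexical analysis: split the text into sentences and then into words
sentences = sent_tokenize(text)
tokens = [word_tokenize(s) for s in sentences]

# A first step towards syntactic analysis: part-of-speech tags for each token
tagged = [nltk.pos_tag(t) for t in tokens]
print(tagged)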
WORDS AND THEIR COMPONENTS

1. TOKENS

2. LEXEMES

3. MORPHEMES

4. TYPOLOGY
TOKENS

Tokens are typically words or sub-words in the context of natural language
processing.
Tokenization is the process of dividing a text, sentence, or phrase into smaller
units known as tokens.
These tokens can encompass words, dates, punctuation marks, or even
fragments of words.
TYPES OF TOKENIZATION

Tokenization can be classified into several types based on how the text is segmented. Here
are some types of tokenization:

Word Tokenization:

Word tokenization divides the text into individual words. Many NLP tasks use this
approach, in which words are treated as the basic units of meaning.
Example:
Input: "Tokenization is an important NLP task."
Output: ["Tokenization", "is", "an", "important", "NLP", "task",
"."]
Sentence Tokenization:

The text is segmented into sentences during sentence tokenization.
This is useful for tasks requiring individual sentence analysis or
processing.
Example:
Input: "Tokenization is an important NLP task. It helps break
down text into smaller units."
Output: ["Tokenization is an important NLP task.", "It helps
break down text into smaller units."]
Subword Tokenization:

Subword tokenization entails breaking down words into smaller units,
which can be especially useful when dealing with morphologically rich
languages or rare words.
Example:
Input: "tokenization"
Output: ["token", "ization"]
Character Tokenization:

This process divides the text into individual characters. This can be
useful for modelling character-level language.
Example:
Input: "Tokenization"
Output: ["T", "o", "k", "e", "n", "i", "z", "a",
"t", "i", "o", "n"]
Implementation of Tokenization using Python 3
Sentence Tokenization using sent_tokenize

from nltk.tokenize import sent_tokenize

# Requires the punkt sentence tokenizer models: nltk.download('punkt')
text = "Hello everyone. Welcome to GeeksforGeeks. You are studying NLP article."
sent_tokenize(text)

Output:
['Hello everyone.',
 'Welcome to GeeksforGeeks.',
 'You are studying NLP article.']

Word Tokenization using word_tokenize

from nltk.tokenize import word_tokenize

text = "Hello everyone. Welcome to GeeksforGeeks."
word_tokenize(text)

Output:
['Hello', 'everyone', '.', 'Welcome', 'to', 'GeeksforGeeks', '.']
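Character tokenization needs no external library, and a subword split can be mimicked for illustration. This is a minimal sketch; the subword split shown is hand-crafted for the example and is not the output of a trained subword tokenizer.

text = "Tokenization"

# Character tokenization: every character becomes a token
char_tokens = list(text)
print(char_tokens)

# Purely illustrative subword split into a stem and a suffix
subword_tokens = [text[:5].lower(), text[5:]]   # ['token', 'ization']
print(subword_tokens)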
Issues and Challenges
● Irregularity: word forms are not described by a prototypical
linguistic model.

● Ambiguity: word forms can be understood in multiple ways out of
the context of their discourse.

● Productivity: is the inventory of words in a language finite, or is
it unlimited?
The 10 Biggest Issues for NLP

1. Language differences

In the United States, most people speak English, but if you’re
thinking of reaching an international and/or multicultural audience,
you’ll need to provide support for multiple languages.
2. Training data

● The best AI must also spend a significant amount of time
reading, listening to, and utilizing a language.

● The abilities of an NLP system depend on the training data
provided to it. If you feed the system bad or questionable data,
it’s going to learn the wrong things, or learn in an inefficient
way.
4. Phrasing ambiguities

● Sometimes it’s hard even for another human being to parse out what
someone means when they say something ambiguous. There may not
be a clear concise meaning to be found in a strict analysis of their
words. In order to resolve this, an NLP system must be able to seek
context to help it understand the phrasing. It may also need to ask the
user for clarity.
5. Misspellings

Misspellings are a simple problem for human beings. We can easily
associate a misspelled word with its properly spelled counterpart, and
seamlessly understand the rest of the sentence in which it’s used. But
for a machine, misspellings can be harder to identify. You’ll need to
use an NLP tool with capabilities to recognize common misspellings
of words, and move beyond them.
6. Innate biases

In some cases, NLP tools can carry the biases of their programmers,
as well as biases within the data sets used to train them. Depending
on the application, an NLP could exploit and/or reinforce certain
societal biases, or may provide a better experience to certain types
of users over others. It’s challenging to make a system that works
equally well in all situations, with all people.
7. Words with multiple meanings

No language is perfect, and most languages have words that
have multiple meanings. For example, a user who asks, “how
are you” has a totally different goal than a user who asks
something like “how do I add a new credit card?” Good NLP
tools should be able to differentiate between these phrases with
the help of context.
8. Phrases with multiple intentions

Some phrases and questions actually have multiple intentions, so
your NLP system can’t oversimplify the situation by interpreting only
one of those intentions.

For example, a user may prompt your chatbot with something like, “I
need to cancel my previous order and update my card on file.” Your
AI needs to be able to distinguish these intentions separately.
9. False positives and uncertainty

A false positive occurs when an NLP system flags a phrase that should be
understandable and/or addressable, but that it cannot sufficiently answer.

The solution here is to develop an NLP system that can recognize its
own limitations, and use questions or prompts to clear up the ambiguity.
10. Keeping a conversation moving

Many modern NLP applications are built on dialogue between a
human and a machine.

Accordingly, your NLP AI needs to be able to keep the conversation
moving, providing additional questions to collect more information and
always pointing toward a solution.
Morphology
∙ Morphology is the domain of linguistics that
analyses the internal structure of words.
∙ Morphological analysis – exploring the structure of words
∙ Words are built up of minimal meaningful elements called morphemes:
played = play-ed
cats = cat-s
unfriendly = un-friend-ly
∙ Two types of morphemes:
i. Stems: play, cat, friend
ii. Affixes: -ed, -s, un-, -ly
∙ Two main types of affixes:
i. Prefixes precede the stem: un-
ii. Suffixes follow the stem: -ed, -s, -ly
∙ Stemming = finding the stem by stripping off affixes:
play = play
replayed = re-play-ed
computerized = comput-er-ize-d
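Stemming can be tried quickly with NLTK's Porter stemmer. This is a minimal sketch; the exact stems produced depend on the Porter algorithm's heuristic rules, and the word list is only illustrative.

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for word in ["played", "replayed", "cats", "computerized", "unfriendly"]:
    # Strip affixes heuristically to obtain an approximate stem
    print(word, "->", stemmer.stem(word))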
Problems in morphological processing
∙ Inflectional morphology: inflected forms are constructed from base forms and
inflectional affixes.
∙ Inflection relates different forms of the same word, for example:

Lemma    Singular    Plural
cat      cat         cats
dog      dog         dogs
knife    knife       knives
sheep    sheep       sheep
mouse    mouse       mice
∙ Derivational morphology: words are constructed from roots (or stems)
and derivational affixes:
inter + national = international
international + ize = internationalize
internationalize + ation = internationalization
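Inflection can be undone with a lemmatizer. This is a minimal sketch using NLTK's WordNet lemmatizer, assuming the wordnet corpus has been downloaded; the word list mirrors the table above.

from nltk.stem import WordNetLemmatizer

# One-time setup: nltk.download('wordnet')
lemmatizer = WordNetLemmatizer()
for word in ["cats", "dogs", "knives", "sheep", "mice"]:
    # pos="n" tells the lemmatizer to treat the word as a noun
    print(word, "->", lemmatizer.lemmatize(word, pos="n"))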
Morphological Models

1. Dictionary Lookup

2. Finite-State Morphology

3. Unification-Based Morphology

4. Functional Morphology
Dictionary Lookup
∙ Morphological parsing is a process by which word forms of a language
are associated with corresponding linguistic descriptions.
∙ Morphological systems that specify these associations by merely enumerating
them case by case (that is, listing every word form and its description one
after another) do not offer any means of generalization.
∙ These approaches do not allow the development of reusable morphological
rules.
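A dictionary-lookup morphological analyser can be sketched as a plain Python mapping. This is only an illustrative sketch; the word list and tag strings are invented for the example.

# Every word form is enumerated explicitly; nothing generalizes to unseen words
LEXICON = {
    "cat":   "cat +N +SG",
    "cats":  "cat +N +PL",
    "goose": "goose +N +SG",
    "geese": "goose +N +PL",
}

def lookup(word):
    # Unknown forms simply fail, which is the main weakness of pure lookup
    return LEXICON.get(word.lower(), "unknown")

print(lookup("Cats"))   # cat +N +PL
print(lookup("dogs"))   # unknown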
Finite-State Morphology
∙ By finite-state morphological models, we mean those in which the specifications
written by human programmers are directly compiled into finite-state
transducers.
∙ The two most popular tools supporting this approach are XFST (Xerox
Finite-State Tool) and LexTools.
∙ Finite-state transducers are computational devices extending the power of
finite-state automata.
∙ A theoretical limitation of finite-state models of morphology is the problem of
capturing reduplication of words or their elements (e.g., to express plurality)
found in several human languages.

Input      Morphologically parsed output
Cats       cat +N +PL
Cat        cat +N +SG
Cities     city +N +PL
Geese      goose +N +PL
Goose      (goose +N +SG) or (goose +V)
Gooses     goose +V +3SG
Merging    merge +V +PRES-PART
Caught     (catch +V +PAST-PART) or (catch +V +PAST)
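The rule-based flavour of this approach can be hinted at with a toy suffix-stripping analyser in Python. This is only a sketch of ordered rewrite rules, not a real finite-state transducer such as one compiled with XFST; the rules and tags are invented for the example and irregular forms like "geese" would be analysed incorrectly.

# Ordered (suffix, replacement, tags) rules, tried in sequence
RULES = [
    ("ies", "y", "+N +PL"),   # cities -> city
    ("s",   "",  "+N +PL"),   # cats   -> cat
    ("",    "",  "+N +SG"),   # default: unchanged singular
]

def analyse(word):
    w = word.lower()
    for suffix, replacement, tags in RULES:
        if w.endswith(suffix):
            return w[:len(w) - len(suffix)] + replacement + " " + tags
    return w

print(analyse("Cities"))  # city +N +PL
print(analyse("Cats"))    # cat +N +PL
print(analyse("Cat"))     # cat +N +SG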
Unification-Based Morphology

∙ The concepts and methods of these formalisms are often closely connected to those of logic
programming.

∙ In finite-state morphological models, both surface and lexical forms are by themselves
unstructured strings of atomic symbols.

∙ In higher-level approaches, linguistic information is expressed by more appropriate data
structures that can include complex values or can be recursively nested if needed.

∙ Advantages of this approach include better abstraction possibilities for developing a
morphological grammar as well as elimination of redundant information from it.

∙ Unification-based models have been implemented for Russian, Czech, and Slovene.
Functional Morphology
∙ Functional morphology defines its models using principles of functional
programming and type theory.

∙ It treats morphological operations and processes as pure mathematical
functions and organizes the linguistic as well as abstract elements of a
model into distinct types of values and type classes.

∙ Functional morphology implementations are intended to be reused as
programming libraries capable of handling the complete morphology of
a language and to be incorporated into various kinds of applications.
1.b. Finding Structure of Documents

● In human language, words and sentences do not appear randomly
but have structure.
● Automatic extraction of the structure of documents helps subsequent
NLP tasks.
● For example, parsing, machine translation, and semantic role labelling
use sentences as the basic processing unit.
Finding Structure of Documents

There are several approaches to finding the structure of documents in
NLP, including:

1. Rule-based methods

2. Machine learning methods

3. Hybrid methods
Finding Structure of Documents
Some of the specific techniques and tools used in finding the structure
of documents in NLP include:

1. Named entity recognition

2. Part-of-speech tagging

3. Dependency parsing

4. Topic modeling
Finding Structure of Documents
1. Named entity recognition: This technique identifies and extracts specific entities,
such as people, places, and organizations, from the document, which can help in
identifying the different sections and topics.

2. Part-of-speech tagging: This technique assigns a part-of-speech tag to each word
in the document, which can help in identifying the syntactic and semantic structure of
the text.

3. Dependency parsing: This technique analyzes the relationships between the
words in a sentence, and can be used to identify the different clauses and phrases in
the text (see the sketch after this list).

4. Topic modeling: This technique uses unsupervised learning algorithms to identify
the different topics and themes in the document, which can be used to organize the
content into different sections.
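Named entity recognition, part-of-speech tagging, and dependency parsing can all be tried with spaCy. This is a minimal sketch, assuming spaCy and its small English model en_core_web_sm are installed; the sample sentence is only illustrative.

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Terry Winograd wrote SHRDLU at MIT in 1970.")

# Named entity recognition: extracted entities and their labels
print([(ent.text, ent.label_) for ent in doc.ents])

# Part-of-speech tags and dependency relations for each token
for token in doc:
    print(token.text, token.pos_, token.dep_, token.head.text)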
Finding Structure of Documents
1. Sentence Boundary Detection
Sentence boundary detection is a subtask of finding the structure of documents
in NLP that involves identifying the boundaries between sentences in a document.
This is an important task, as it is a fundamental step in many NLP
applications, such as machine translation, text summarization, and
information retrieval.

2. Topic Boundary Detection

Topic boundary detection is another important subtask of finding the structure of
documents in NLP. It involves identifying the points in a document where the topic
or theme of the text shifts. This task is particularly useful for organizing and
summarizing large amounts of text, as it allows for the identification of different
topics or subtopics within a document.
1. Sentence Boundary Detection
Some of the specific techniques and tools used in sentence boundary
detection include:

1. Regular expressions: These are patterns that can be used to match specific
character sequences in a text, such as periods followed by whitespace
characters, and can be used to identify the end of a sentence (see the sketch
after this list).

2. Hidden Markov Models (HMMs): These are statistical models that can be
used to identify the most likely sequence of sentence boundaries in a text,
based on the probabilities of different sentence boundary markers.

3. Deep learning models: These are neural network models that can learn
complex patterns and features of sentence boundaries from a large corpus of
text, and can be used to achieve state-of-the-art performance in sentence
boundary detection.
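A very rough regular-expression sentence splitter is sketched below; real systems must handle abbreviations, decimal numbers, and quotations, which this sketch deliberately ignores.

import re

text = "Dr. Smith arrived. He sat down. Then he spoke!"

# Split after ., ! or ? when followed by whitespace
sentences = re.split(r'(?<=[.!?])\s+', text)
print(sentences)
# Note: "Dr." is wrongly treated as a sentence end, illustrating the
# limitation of purely pattern-based approaches.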
2. Topic Boundary Detection
1. Lexical cohesion: This method looks at the patterns of words and phrases that appear
in a text, and identifies changes in the frequency or distribution of these patterns as
potential topic boundaries. For example, if the frequency of a particular keyword or
phrase drops off sharply after a certain point in the text, this could indicate a shift in
topic.

2. Discourse markers: This method looks at the use of discourse markers, such as
"however", "in contrast", and "furthermore", which are often used to signal a change in
topic or subtopic. By identifying these markers in a text, it is possible to locate potential
topic boundaries.

3. Machine learning: This method involves training a machine learning model to identify
patterns and features in a text that are associated with topic boundaries. This can
involve using a variety of linguistic and contextual features, such as sentence length,
word frequency, and part-of-speech tags, to identify potential topic boundaries.
2. Topic Boundary Detection
Some of the specific techniques and tools used in topic boundary detection include:

1. Latent Dirichlet Allocation (LDA): This is a probabilistic topic modeling technique that
can be used to identify topics within a corpus of text. By analyzing the distribution of
words within a text, LDA can identify the most likely topics and subtopics within the text,
and can be used to locate topic boundaries (see the sketch after this list).

2. TextTiling: This is a technique that involves breaking a text into smaller segments, or
"tiles", based on the frequency and distribution of key words and phrases. By comparing
the tiles to each other, it is possible to identify shifts in topic or subtopic, and locate
potential topic boundaries.

3. Coh-Metrix: This is a text analysis tool that uses a range of linguistic and
discourse-based features to identify different aspects of text complexity, including topic
boundaries. By analyzing the patterns of words, syntax, and discourse in a text,
Coh-Metrix can identify potential topic boundaries, as well as provide insights into the
overall structure and organization of the text.
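Topic modeling with LDA can be sketched with scikit-learn. This is a minimal sketch; the tiny corpus and the choice of two topics are purely illustrative.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the match was won by the cricket team",
    "the batsman scored a century in the match",
    "the bank approved the loan application",
    "interest rates at the bank increased again",
]

# Bag-of-words counts for each document
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

# Fit a two-topic LDA model
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

# Show the most probable words per topic
words = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [words[j] for j in topic.argsort()[-5:]]
    print(f"Topic {i}: {top}")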
2. Methods used in NLP

There are several methods and techniques used in NLP to find the structure of
documents, which include:

1. Sentence boundary detection

2. Part-of-speech tagging

3. Named entity recognition

4. Coreference resolution

5. Topic boundary detection

6. Parsing

7. Sentiment analysis
2.1 Generative Sequence Classification Methods
Generative sequence classification methods are a type of NLP method used to find
the structure of documents. These methods involve using probabilistic models to
classify sequences of words into predefined categories or labels.

One popular generative sequence classification method is the Hidden Markov
Model (HMM).

HMMs are statistical models that can be used to classify sequences of words by
modeling the probability distribution of the observed words given a set of hidden
states.

The hidden states in an HMM can represent different linguistic features, such as
part-of-speech tags or named entities, and the model can be trained using labeled
data to learn the most likely sequence of hidden states for a given sequence of
words.
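An HMM sequence labeller can be trained with NLTK. This is a minimal sketch using the Penn Treebank sample as labeled data, assuming the treebank corpus has been downloaded; the size of the training slice and the test sentence are only illustrative.

from nltk.corpus import treebank
from nltk.tag import hmm

# One-time setup: nltk.download('treebank')
# Labeled sequences: sentences of (word, POS-tag) pairs
train_sents = treebank.tagged_sents()[:3000]

# Estimate HMM transition and emission probabilities from the labeled data
trainer = hmm.HiddenMarkovModelTrainer()
tagger = trainer.train_supervised(train_sents)

# Decode the most likely hidden state (tag) sequence for new words
print(tagger.tag("The cat sat on the mat".split()))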
2.1 Generative Sequence Classification Methods

Another widely used sequence classification method is Conditional Random
Fields (CRFs); strictly speaking, CRFs are discriminative rather than
generative models.

CRFs resemble HMMs in that they label whole sequences, but instead of
modeling the joint probability of words and labels they directly model the
conditional probability of a sequence of labels given a sequence of words,
and they are more flexible in that they can take into account more complex
features and dependencies between labels.
2.2 Discriminative Sequence Classification Methods:
Discriminative local classification methods are another type of NLP method used to
find the structure of documents. These methods involve training a model to classify
each individual word or token in a document based on its features and the context in
which it appears (a small per-token feature-extraction sketch appears at the end of
section 2.2).

One popular example of such a method is Conditional Random Fields (CRFs).

CRFs are discriminative models: they model the conditional probability of a sequence
of labels given a sequence of features, without making assumptions about the
underlying distribution of the data.

CRFs have been used for tasks such as named entity recognition, part-of-speech
tagging, and chunking.
2.2 Discriminative Sequence Classification Methods:

Another example of a discriminative local classification method is Maximum
Entropy Markov Models (MEMMs), which are similar to CRFs but use
maximum entropy modeling to make predictions about the next label in a
sequence given the current label and features.

MEMMs have been used for tasks such as speech recognition, named entity
recognition, and machine translation.

Other discriminative local classification methods include support vector
machines (SVMs), decision trees, and neural networks. These methods
have also been used for tasks such as sentiment analysis, topic
classification, and document categorization.
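The per-token classification idea can be sketched with scikit-learn: each word is turned into a feature dictionary and a logistic regression classifier predicts its label. This is only an illustrative sketch; the toy training data, labels, and features are invented for the example.

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def token_features(tokens, i):
    # Features for the i-th token: the word itself, its shape, and its neighbours
    word = tokens[i]
    return {
        "word": word.lower(),
        "is_capitalized": word[0].isupper(),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }

# Tiny toy task: label each token as part of a PERSON name or not (O)
sentence = ["Neha", "broke", "the", "mirror", "with", "the", "hammer"]
labels   = ["PERSON", "O", "O", "O", "O", "O", "O"]

X = [token_features(sentence, i) for i in range(len(sentence))]
vec = DictVectorizer()
clf = LogisticRegression(max_iter=1000)
clf.fit(vec.fit_transform(X), labels)

# Classify each token of a new sentence independently
test = ["Kavya", "broke", "the", "window"]
print(clf.predict(vec.transform([token_features(test, i) for i in range(len(test))])))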
2.3 Hybrid Approaches:

Hybrid approaches to finding the structure of documents in NLP
combine multiple methods to achieve better results than any one
method alone.

For example, a hybrid approach might combine generative and
discriminative models, or combine different types of models with
different types of features.
3.Complexity of the Approaches:

Finding the structure of documents in natural language processing (NLP) can be a complex task, and
there are several approaches with varying degrees of complexity.

Here are a few examples:

1. Rule-based approaches: These approaches use a set of predefined rules to identify the structure of
a document. For instance, they might identify headings based on font size and style or look for bullet
points or numbered lists. While these approaches can be effective in some cases, they are often
limited in their ability to handle complex or ambiguous structures.

2. Statistical approaches: These approaches use machine learning algorithms to identify the structure
of a document based on patterns in the data. For instance, they might use a classifier to predict whether
a given sentence is a heading or a body paragraph. These approaches can be quite effective, but
they require large amounts of labeled data to train the model.

3. Deep learning approaches: These approaches use deep neural networks to learn the structure of a
document. For instance, they might use a hierarchical attention network to identify headings and
subheadings, or a sequence-to-sequence model to summarize the document. These approaches can
be very powerful, but they require even larger amounts of labeled data and significant
computational resources to train.
4.Performances of the Approaches:

The performance of different approaches for finding the structure of documents in natural
language processing (NLP) can vary depending on the specific task and the complexity of
the document. Here are some general trends:

1. Rule-based approaches: These approaches can be effective when the document
structure is relatively simple and the rules are well-defined. However, they can struggle with
more complex or ambiguous structures, and require a lot of manual effort to define the
rules.

2. Statistical approaches: These approaches can be quite effective when there is a large
amount of labeled data available for training, and the document structure is relatively
consistent across examples. However, they may struggle with identifying new or unusual
structures that are not well-represented in the training data.

3. Deep learning approaches: These approaches can be very effective in identifying
complex and ambiguous document structures, and can even discover new structures that
were not present in the training data. However, they require large amounts of labeled data
and significant computational resources to train, and can be difficult to interpret.
