Unit I Introduction


Natural Language Processing (NLP)

 NLP is among the hottest topics in the field of data science.


 Companies are putting tons of money into research in this field.
 Everyone is trying to understand NLP and its applications to build a career around it.
 Every business out there wants to integrate NLP into its operations somehow.
Are you using NLP these days?
Search Autocorrect and Autocomplete – Language Translator
Social media monitoring
 More people these days have started using social media to post their thoughts about a particular product, policy, or matter.
 These posts can contain useful information about an individual's likes and dislikes.
 Analyzing this unstructured data can help generate valuable insights; NLP comes to the rescue here too.
 Various NLP techniques are used by companies to analyze social media posts and learn what customers think about their products.
 Companies also use social media monitoring to understand the issues and problems that their customers face while using their products.
Chatbots

Modern conversational agents can
• answer questions,
• book flights,
• find restaurants,
functions for which they rely on a much more sophisticated understanding of the user's intent.
Survey Analysis

 Surveys are an important way of evaluating a company's performance and of getting customers' feedback on various products.
 They are useful in understanding flaws and help companies improve their products.
 NLP is used to analyze surveys and generate insights from them, such as determining user sentiment and analyzing product reviews to understand the pros and cons.
Targeted Advertising – Hiring and Recruitment

 Targeted advertising is a type of online advertising where ads are shown to users based on their online activity.
 It saves companies a lot of money because relevant ads are shown only to potential customers.
Voice Assistants
Conventional vs. NLP-based search
What is NLP?

 Natural language processing is a sub-field of linguistics, computer science, and AI concerned with the interactions between computers and human language.
 NLP enables computers to understand complex language structure and retrieve meaningful pieces of information from it.
 Modern challenges in NLP involve
 speech recognition,
 natural language understanding, and
 natural language generation.
Why study NLP?

 Text is the largest repository of human knowledge:
 news articles, web pages, scientific articles, patents, emails, government documents…
 tweets, Facebook posts, comments, Quora posts, etc.
 What are the top ten languages on the internet in terms of millions of users?
 Goals of NLP
 Fundamental and scientific goal – a deep understanding of broad language.
 Engineering goal – design, implement, and test systems that process natural languages for practical applications.
Applications of NLP

 Text Classification
 Language Modelling
 Information Extraction
 Information Retrieval
 Conversational Agents
 Text Summarization
 Question Answering
 Machine Translation
 Topic Modelling
 Speech Recognition
Origins of NLP

 Alan Turing’s Turing Test (1950)


 1950s – 1960s : Early Developments
 Georgetown – IBM Experiment (1954)
 Chomsky’s Transformational Generative Grammar (1957)
 1960s – 1970s : Rule-based approaches
 1970s – 1980s : Rise of statistical methods
 1980s – 1990s : Corpus Linguistics and Machine Learning
 2000s – present : Deep Learning and Neural networks.
Challenges of NLP
Why is NLP Hard? Lexical Ambiguity
Ambiguity is pervasive
Activity
 Find at least 5 meanings of this sentence:

I made her duck

 Syntactic category
 Duck can be a noun or a verb
 Her can be a possessive or a dative pronoun
 Word meaning
 Make can mean create or cook
Why is NLP Hard? Ambiguities
Ambiguity is Pervasive
Ambiguity is Explosive

Why is language ambiguous?
Natural Language vs. Computer Languages

 Ambiguity is the primary difference between natural languages and programming languages.
 The goal in the production and comprehension of natural language is efficient communication.
 Allowing resolvable ambiguity
 permits shorter linguistic expressions and
 avoids language becoming overly complex.
 Natural language relies on people's ability to use their knowledge and inference abilities to properly resolve ambiguities.
 Programming languages, in contrast, are designed to be unambiguous: they are defined by a grammar that produces a unique parse for each sentence in the language.
Why else is NLP hard?

 Non-standard use of English in social media
 See you, I will text you later.
 Segmentation issues
 The New York-New Haven Road
 Idioms
 Dark horse
 Ball in the court
 Burn the midnight oil
 Neologisms
 Unfriend
 Retweet
 Google / Skype (used as verbs)
 New senses of existing words
 That's sick, dude
 Giants – multinationals, manufacturers
 Tricky entity names
 Where is A Bug's Life playing…
 Let It Be was recorded…
Empirical Laws

 Function Words vs. Content Words
 Function words have little lexical meaning but serve as important elements of the structure of sentences.
 Function words are closed-class words:
 prepositions, pronouns, auxiliary verbs, conjunctions, grammatical articles, particles, etc.
 E.g.: a, an, the
 In a frequency list of English words, most of the top entries are function words; the list is dominated by the little words of English, which play important grammatical roles.
Empirical Laws
Type vs. Token
 Type
 A concept; a unique word.
 Token
 An instance of a concept; a running word.
 The type-token distinction separates a concept from the objects which are particular instances of the concept.
 Type-Token Ratio (TTR)
 The ratio of the number of different words (types) to the number of running words (tokens) in a given text or corpus.
 The index indicates how often, on average, a new word form appears in the text or corpus.

                 Tom Sawyer (Mark Twain)    Shakespeare's complete works
Word tokens      71,370                     884,647
Word types       8,018                      29,066
TTR              0.112                      0.032
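A minimal sketch of computing the type-token ratio for a piece of text; the helper name and example sentence are illustrative.

def type_token_ratio(text):
    tokens = text.lower().split()    # running words (tokens)
    types = set(tokens)              # distinct words (types)
    return len(types) / len(tokens)

print(type_token_ratio("the cat sat on the mat and the dog sat too"))
# 8 types / 11 tokens ≈ 0.727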
Empirical Laws
Observation on various texts
 Consider various texts from conversation, academic prose, news, and fiction. Which one will have the highest TTR and which one will have the lowest TTR?
 High TTR – a tendency to use new words.
 Low TTR – the same words are used repeatedly.
Word distribution from Tom Sawyer
Empirical Laws
Zipf’s Law
 Count the frequency of each word type in a large corpus.
 List the word types in decreasing order of their frequency.
 Zipf's law: a word's frequency is inversely proportional to its rank in this list, i.e., frequency × rank ≈ constant.
 For example, the 50th most common word should occur with 3 times the frequency of the 150th most common word.
Empirical evaluation from Tom Sawyer
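A minimal sketch of checking Zipf's law empirically: for each sampled rank, the product rank × frequency should stay roughly constant. The corpus file name is hypothetical; any large plain-text file will do.

from collections import Counter

text = open("tom_sawyer.txt", encoding="utf-8").read()   # hypothetical file name
counts = Counter(text.lower().split())
ranked = counts.most_common()                            # word types in decreasing frequency

for rank in (1, 10, 50, 100, 500):
    word, freq = ranked[rank - 1]
    print(rank, word, freq, rank * freq)   # the last column should be roughly constant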
Empirical Laws
Zipf’s Other laws
Empirical Laws
Heap’s Law
Words – What counts as a word?

 A corpus (plural corpora) is a computer-readable collection of text or speech.
 For example, the Brown corpus is a million-word collection of samples from 500 written English texts from different genres (newspaper, fiction, non-fiction, academic, etc.).

How many words are in the following Brown sentence?
Sentence: He stepped out into the hall, was delighted to encounter a water brother.
 This sentence has 13 words if we don't count punctuation marks as words, 15 if we count punctuation.
 Are capitalized tokens like They and uncapitalized tokens like they the same word?
 How about inflected forms like cats versus cat?
 These two words have the same lemma cat but are different wordforms.
 A lemma is a set of lexical forms having the same stem, the same major part-of-speech, and the
same word sense.
 The wordform is the full inflected or derived form of the word.
Notion of Corpus:
Words – Types and Tokens

 Types are the number of distinct words in a corpus; if the set of words in the vocabulary is V, the number of types is the vocabulary size |V|.
 Tokens are the total number N of running words.
 Ignoring punctuation, find the number of tokens and types in the following sentence:

They picnicked by the pool, then lay back on the grass and looked at the stars

16 tokens
14 types
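A minimal sketch that reproduces the count above, ignoring punctuation and case.

import re

sentence = "They picnicked by the pool, then lay back on the grass and looked at the stars"
tokens = re.findall(r"\w+", sentence.lower())   # drop punctuation, lowercase
print(len(tokens))        # 16 tokens
print(len(set(tokens)))   # 14 types ('the' occurs three times)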
Notion of Corpus:
Corpora

 Any particular piece of text that we study is produced by


 one or more specific speakers or writers,
 in a specific dialect of a specific language,
 at a specific time,
 in a specific place,
 for a specific function.
 The most important dimension of variation is the language.
 NLP algorithms are most useful when they apply across many languages. The world has 7097
languages.
 It is important to test algorithms on more than one language, and particularly on languages with different properties; in contrast, there is an unfortunate current tendency for NLP algorithms to be developed or tested only on English.
 Code switching: the phenomenon of using multiple languages within a single communicative act.
 Other dimensions of variation include genre, demographic characteristics of the writer, and time.
Text-processing Basics
Tokenization
 Tokenization is the process of segmenting a string of characters into
words.
 What is sentence segmentation? –
 The problem of deciding where the sentences begin and end.
 Depending on the application at hand, you might have to perform sentence segmentation as well.
 What are the challenges in sentence segmentation?
 ! and ? are quite unambiguous; the period (.) is quite ambiguous.
 What are the strategies to build a sentence segmenter?
 Hand-written rules, regular expressions, machine learning
Text-processing Basics
Tokenization (continued)

 Tokenization is the process of segmenting a string of characters into words.

Example sentence: I have a can opener; but I can't open these cans

 Word token
 An occurrence of a word.
 The sentence above has 11 word tokens.
 Word type
 A distinct realization of a word.
 The sentence above has 10 word types.
 Practice
 NLTK toolkit, Stanford CoreNLP, Unix commands

 Issues in tokenization
 Finland's
 What're, I'm, shouldn't
 San Francisco
 m.p.h.
 Handling hyphenation
 End-of-line hyphens
 Lexical hyphens
 Sententially determined hyphens
 Language-specific issues
 French and German
 Chinese and Japanese
 Sanskrit
Using Python’s split() function
Tokenization using Regular Expressions
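A minimal sketch of tokenization with Python's re module; the patterns shown are illustrative, not a complete tokenizer.

import re

sentence = "I have a can opener; but I can't open these cans."
# \w+ matches runs of word characters: punctuation is dropped,
# but "can't" is split into 'can' and 't'.
print(re.findall(r"\w+", sentence))
# Keeping internal apostrophes so "can't" stays a single token:
print(re.findall(r"\w+(?:'\w+)?", sentence))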
Tokenization using NLTK
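A minimal sketch using NLTK's sentence and word tokenizers, assuming NLTK is installed and its 'punkt' models have been downloaded.

import nltk
from nltk.tokenize import sent_tokenize, word_tokenize

nltk.download("punkt")   # one-time download of the tokenizer models

text = "He stepped out into the hall. He was delighted to encounter a water brother."
print(sent_tokenize(text))   # sentence segmentation
print(word_tokenize(text))   # word tokenization; punctuation becomes separate tokens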
Word Normalization, Stemming and Lemmatization
 Used to prepare text, words, and documents for further processing.
 Reduce inflected or variant forms to a base form:
 am, are, is – be
 car, car's, cars, cars' – car
 Finds the correct dictionary headword form.
 Morphemes are divided into two categories:
 Stems – the core meaning-bearing units
 Affixes – prefixes (un-, anti-, etc.) and suffixes (-ity, -ation, etc.)
 Stemming and lemmatization help us obtain the root forms of inflected words.
Stemming
• Helps us obtain the root forms of inflected words.
• The stem (root) is the part of the word to which you add inflectional (changing/deriving) affixes such as -ed, -ize, -s, de-, mis-.
• Crude chopping of affixes: stemming a word or sentence may result in words that are not actual words. Stems are created by removing the suffixes or prefixes used with a word.
• A computer program that stems words is called a stemming program, or stemmer.
• PorterStemmer is a stemming algorithm available in NLTK which uses suffix stripping.
• It does not rely on linguistic analysis; instead it applies a set of 5 rule groups for different cases, in phases, to generate stems.
Exercise: create a function which takes a sentence and returns the stemmed sentence (a sketch follows below).
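A minimal sketch of the exercise above, assuming NLTK is installed: tokenize the sentence, stem every token with PorterStemmer, and join the stems back together.

from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

def stem_sentence(sentence):
    stemmer = PorterStemmer()
    tokens = word_tokenize(sentence)                         # split into word tokens
    return " ".join(stemmer.stem(token) for token in tokens)

print(stem_sentence("The children are playing happily in the gardens"))
# e.g. 'the children are play happili in the garden'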
Lemmatization

• Lemmatization reduces inflected words properly, ensuring that the root word belongs to the language. In lemmatization the root word is called the lemma.
• For example, runs, running, and ran are all forms of the word run; therefore run is the lemma of all these words.
• As lemmatization returns an actual word of the language, it is used where it is necessary to get valid words.
• Python NLTK provides WordNetLemmatizer, which uses the WordNet database to look up the lemmas of words.
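A minimal sketch using NLTK's WordNetLemmatizer, assuming NLTK and its WordNet data are installed; the example words are illustrative.

import nltk
from nltk.stem import WordNetLemmatizer

nltk.download("wordnet")   # one-time download of the WordNet database

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("running", pos="v"))   # 'run' (pos='v' treats it as a verb)
print(lemmatizer.lemmatize("ran", pos="v"))       # 'run'
print(lemmatizer.lemmatize("cats"))               # 'cat' (default part of speech is noun)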
Standardization of Data
The common operations performed to standardize the data are listed below; a short sketch of a few of them follows the list.

 Removal of duplicate whitespaces and punctuation.
 Accent removal.
 Capital letter removal.
 Removal or substitution of special characters/emojis (e.g.: remove hashtags).
 Substitution of contractions (very common in English; e.g.: 'I'm' → 'I am').
 Transformation of word numerals into numbers (e.g.: 'twenty three' → '23').
 Substitution of values for their type (e.g.: '$50' → 'MONEY').
 Acronym normalization (e.g.: 'US' → 'United States'/'U.S.A') and abbreviation normalization (e.g.: 'btw' → 'by the way').
 Normalization of date formats, social security numbers, etc.
 Spell correction — very important if you're dealing with open user inputs, such as tweets, IMs and emails.
 Removal of gender/time/grade variation with stemming or lemmatization.
 Substitution of rare words with more common synonyms.
 Stop word removal (more a dimensionality reduction technique than a normalization technique).
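A minimal sketch of a few of these operations (lowercasing, hashtag removal, contraction expansion, whitespace cleanup); the contraction table is a tiny illustrative subset, not a complete resource.

import re

CONTRACTIONS = {"i'm": "i am", "can't": "cannot", "won't": "will not", "btw": "by the way"}

def standardize(text):
    text = text.lower()                          # capital letter removal
    text = re.sub(r"#\w+", "", text)             # remove hashtags
    for short, full in CONTRACTIONS.items():     # expand contractions / abbreviations
        text = text.replace(short, full)
    text = re.sub(r"[^\w\s']", " ", text)        # drop remaining special characters
    text = re.sub(r"\s+", " ", text).strip()     # collapse duplicate whitespace
    return text

print(standardize("BTW I'm loving this #NLP course!!!"))
# 'by the way i am loving this course'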
Spelling Correction – Edit Distance

 Isolated word error correction
 Pick the word that is closest to 'behaf'.
 How do we define 'closest'?
 We need a distance metric.
 The simplest metric is edit distance.
 Edit Distance
 The minimum edit distance between two strings is defined as the minimum number of editing operations needed to transform one string into the other:
 Insertion
 Deletion
 Substitution
 Levenshtein distance – each substitution has cost 1.
 Alternate version – each substitution has cost 2 (counted as a deletion plus an insertion).
Defining minimum edit distance matrix
Edit Distance calculation
Algorithm using Dynamic Programming
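A minimal sketch of the dynamic-programming algorithm with Levenshtein costs (insertion, deletion, and substitution all cost 1); the function name is illustrative.

def min_edit_distance(source, target):
    n, m = len(source), len(target)
    # D[i][j] = edit distance between source[:i] and target[:j]
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i                                # delete all of source[:i]
    for j in range(1, m + 1):
        D[0][j] = j                                # insert all of target[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if source[i - 1] == target[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,         # deletion
                          D[i][j - 1] + 1,         # insertion
                          D[i - 1][j - 1] + sub)   # substitution (or match)
    return D[n][m]

print(min_edit_distance("intention", "execution"))   # 5 with these costs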
Tracing Edit Distance
Computing Alignments

 Computing the edit distance may not be sufficient for some applications – we often need to align the characters of the two strings to each other.
 We do this by keeping a backtrace.
 Every time we enter a cell, remember where we came from.
 When we reach the end, trace back the path from the upper right corner to read off the alignment.
 Performance
 Time – O(nm)
 Space – O(nm)
 Backtrace – O(n+m)
Language models

 A language model is a computational model or algorithm designed to understand, generate, and predict human language.
 Language models are a fundamental part of natural language processing (NLP) and machine learning applications that involve dealing with textual data.
 The primary goals of a language model include:
 Understanding language
 Generating text
 Predicting sequences
 There are different types of language models, and they can be broadly categorized into:
 Statistical Language Models (SLM)
 Grammar-based Language Models
 Neural Language Models


Grammar-based Language Models

 Grammar-based language models rely on predefined rules and structures to generate sentences. These rules are often based on formal grammatical frameworks, such as context-free grammars.
 The model uses syntactic rules to define the permissible arrangements of words in a sentence.
 Example: in a grammar-based LM, you might have rules specifying that a sentence must consist of a noun phrase followed by a verb phrase (see the sketch below).
 Challenge – these models may struggle with handling natural language variations and may not capture the full complexity of language.
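A minimal sketch of this idea using a toy context-free grammar in NLTK, assuming NLTK is installed; the grammar and sentence are illustrative only.

import nltk

grammar = nltk.CFG.fromstring("""
  S  -> NP VP
  NP -> Det N
  VP -> V NP
  Det -> 'the' | 'a'
  N  -> 'dog' | 'ball'
  V  -> 'chased' | 'caught'
""")

parser = nltk.ChartParser(grammar)
sentence = "the dog chased a ball".split()
for tree in parser.parse(sentence):   # yields only parses permitted by the rules
    print(tree)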
Statistical Language Model

 SLMs are based on statistical patterns observed in a given dataset. They estimate the probability of a sequence of words occurring based on the frequencies of these sequences in the training data.
 N-gram models: SLMs often use n-gram models, where the probability of a word is conditioned on the previous n-1 words. Commonly used n-grams include bigrams (n=2) and trigrams (n=3).
 Example: in an SLM, the probability of the word "rain" might be higher if the preceding words are "the" and "it" compared to other combinations (see the bigram sketch below).
 Challenge – data sparsity issues.
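A minimal sketch of a bigram SLM estimated by maximum likelihood from a tiny toy corpus; real models need far more data and smoothing to cope with data sparsity.

from collections import Counter

corpus = [
    "i think it will rain today",
    "it will rain in the evening",
    "i think the evening will be cold",
]

unigrams = Counter()
bigrams = Counter()
for sentence in corpus:
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def bigram_prob(prev, word):
    # P(word | prev) = count(prev, word) / count(prev)
    return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0

print(bigram_prob("will", "rain"))   # 2/3 in this toy corpus
print(bigram_prob("will", "be"))     # 1/3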
