
NLP

Unit-1

Introduction
Components of NLP
Natural Language Understanding (NLU)
NLU is the process of enabling machines to comprehend and interpret human language. It involves the analysis
of input text or speech to extract meaning, context, and intent.

Tokenization: Breaking down the input into individual words or tokens.


Part-of-Speech (POS) Tagging: Assigning grammatical categories (nouns, verbs, etc.) to each token.
Named Entity Recognition (NER): Identifying and classifying entities such as names of people,
organizations, locations, etc.
Syntax and Semantics Analysis: Understanding the grammatical structure and meaning of
sentences.
Sentiment Analysis: Determining the emotional tone expressed in the text.
Intent Recognition: Identifying the purpose or goal behind a given user input.
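As an illustration of the first few NLU steps, here is a minimal sketch using the spaCy library (it assumes the small English model en_core_web_sm is installed; the example sentence is made up):

```python
# A minimal NLU sketch with spaCy: tokenization, POS tagging and NER.
# Assumes `pip install spacy` and `python -m spacy download en_core_web_sm`.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is opening a new office in Bengaluru next year.")

for token in doc:          # tokenization + part-of-speech tagging
    print(token.text, token.pos_)

for ent in doc.ents:       # named entity recognition
    print(ent.text, ent.label_)
```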
Natural Language Generation (NLG)
NLG is the process of generating human-like language or text based on underlying data or information. It
involves transforming structured data into coherent and contextually relevant natural language output.

Text Planning: Deciding what information to include and how to structure it.
Sentence Generation: Creating grammatically correct and contextually appropriate
sentences.
Lexical Choice: Selecting appropriate words and vocabulary for the generated text.
Referring Expression Generation: Deciding how to refer to entities mentioned in
the text.
Coherence and Cohesion: Ensuring that the generated text flows logically and is
cohesive.

NLG is used in various applications such as automatic summarization, report generation, chatbots, and content
creation.
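A very small template-based NLG sketch is shown below; the weather record and templates are illustrative assumptions, not a real system, but they show text planning, lexical choice, and sentence generation on structured data:

```python
# A minimal template-based NLG sketch: structured data in, sentences out.
# The record fields and templates are hypothetical examples.
record = {"city": "Pune", "temp_max": 34, "temp_min": 22, "condition": "partly cloudy"}

def generate_weather_report(r):
    # Text planning: decide which facts to verbalize and in what order.
    facts = [f"Today in {r['city']} the weather will be {r['condition']}."]
    # Lexical choice: pick wording based on the data.
    spread = "a wide" if r["temp_max"] - r["temp_min"] > 10 else "a narrow"
    facts.append(
        f"Temperatures will stay in {spread} range between "
        f"{r['temp_min']} and {r['temp_max']} degrees Celsius."
    )
    # Coherence and cohesion: join the sentences into one paragraph.
    return " ".join(facts)

print(generate_weather_report(record))
```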
Approaches and Models for Applying Natural Language Processing
Classical approach to NLP
• Rule-Based Systems:
• Syntax and Grammar Rules:
• Semantic Analysis:
• Named Entity Recognition (NER):
• Shallow Natural Language Processing:
• Information Retrieval Techniques:
• Machine Translation with Rules:
• Expert Systems:
Rule-based approaches in NLP
• Syntax and Grammatical Rules:
• Named Entity Recognition (NER):
• Semantic Rules:
• Sentiment Analysis Rules:
• Question Answering Rules:
• Dialogue Management Rules:
• Template-based NLG:
• Hybrid Approaches:

They are particularly suitable for well-defined and rule-bound tasks, but their effectiveness may be limited in more complex and dynamic language understanding scenarios.
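A toy rule-based sketch is given below (the sentiment lexicon and the date pattern are illustrative assumptions) to show how hand-written rules can drive sentiment analysis and simple entity extraction:

```python
# A minimal rule-based sketch: keyword-lexicon sentiment and a regex entity rule.
import re

POSITIVE = {"good", "great", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "terrible", "unhappy"}

def rule_based_sentiment(text):
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

# A simple pattern rule for a date-like entity
DATE_RULE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")

text = "The support was excellent, ticket raised on 12/01/2024 was resolved quickly."
print(rule_based_sentiment(text))   # positive
print(DATE_RULE.findall(text))      # ['12/01/2024']
```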
Traditional approaches
Tokenization
involves breaking text into a sequence of tokens, roughly corresponding to words.
Part-of-speech tagging
identifying the part of speech each word belongs to
Chunking
grouping individual tokens into larger, meaningful phrases (shallow parsing) that can then be reasoned about as units.
Named-entity recognition
locating and classifying named entities
Co-reference resolution
identifying all the expressions that refer to the very same entity in a text.
Semantic role labeling
assigning roles to the constituents or phrases in sentences.
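A short sketch of some of these traditional steps with NLTK is given below: tokenization, POS tagging, and chunking noun phrases with a hand-written grammar (it assumes the required NLTK tokenizer and tagger data have already been downloaded; the grammar and sentence are illustrative):

```python
# Shallow NLP with NLTK: tokenize, POS-tag, then chunk noun phrases.
# Assumes the NLTK tokenizer/tagger data are available (e.g. via nltk.download()).
import nltk

sentence = "The quick brown fox jumped over the lazy dog."
tokens = nltk.word_tokenize(sentence)   # tokenization
tagged = nltk.pos_tag(tokens)           # part-of-speech tagging

# Chunking: a simple grammar that groups determiner + adjectives + noun into an NP
grammar = "NP: {<DT>?<JJ>*<NN.*>+}"
chunker = nltk.RegexpParser(grammar)
tree = chunker.parse(tagged)
print(tree)
```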
Statistical approaches in NLP
Statistical approaches in Natural Language Processing (NLP) involve the use of statistical models and machine
learning algorithms to automatically learn patterns, relationships, and structures from large amounts of
linguistic data.
• Corpus-based Learning
• Probabilistic Models
• N-gram Models
• Hidden Markov Models (HMMs)
• Maximum Likelihood Estimation (MLE)
• Conditional Random Fields (CRFs)
• Machine Learning Algorithms
– Support Vector Machines (SVMs)
– Decision trees
– Neural networks
• Word Embedding
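A tiny illustration of the statistical idea is sketched below: a bigram model whose probabilities are maximum likelihood estimates computed from corpus counts (the toy corpus is made up):

```python
# A toy bigram language model estimated with maximum likelihood (MLE) counts.
from collections import Counter, defaultdict

corpus = [
    ["<s>", "i", "like", "nlp", "</s>"],
    ["<s>", "i", "like", "deep", "learning", "</s>"],
    ["<s>", "nlp", "is", "fun", "</s>"],
]

unigram_counts = Counter()
bigram_counts = defaultdict(Counter)
for sent in corpus:
    unigram_counts.update(sent)
    for w1, w2 in zip(sent, sent[1:]):
        bigram_counts[w1][w2] += 1

def p(w2, w1):
    """MLE estimate P(w2 | w1) = count(w1, w2) / count(w1)."""
    return bigram_counts[w1][w2] / unigram_counts[w1]

print(p("like", "i"))    # 2/2 = 1.0
print(p("nlp", "like"))  # 1/2 = 0.5
```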
Adaptive models
Employ adaptive or self-learning models (such as neural networks) that help to improve predictions on ever-changing data.
Examples
long short-term memory networks (LSTMs)
Used to classify, process, and predict time-series data based on time lags of unknown size and duration
between important events.
Generative adversarial networks (GANs)
belong to unsupervised machine learning and comprise two neural networks, one of which generates candidates while the other evaluates them.
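Below is a minimal, untrained sketch of an LSTM-based text classifier in PyTorch; the layer sizes and the dummy batch are illustrative assumptions, and a real model would of course be trained on labeled sequences:

```python
# A minimal LSTM text-classifier sketch in PyTorch (untrained, illustrative sizes).
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=64, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)   # (batch, seq_len, embed_dim)
        _, (hidden, _) = self.lstm(embedded)   # hidden: (1, batch, hidden_dim)
        return self.fc(hidden[-1])             # (batch, num_classes)

model = LSTMClassifier()
dummy_batch = torch.randint(0, 1000, (4, 12))  # 4 sequences of 12 token ids
print(model(dummy_batch).shape)                # torch.Size([4, 2])
```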
Understanding linguistics

https://www.uni-due.de/SHE/REV_Levels_Chart.htm

Source: https://towardsdatascience.com/linguistic-knowledge-in-natural-language-processing-332630f43ce1
Morphology
At this stage we care about the words that make up the sentence, how they are formed, and how they change depending on their context. Some examples of these include:
• Prefixes/suffixes
• Singularization/pluralization
• Gender detection
• Word inflection (modification of a word to express different grammatical categories such as tense, case, voice, etc.). Other forms of inflection include conjugation (inflection of verbs) and declension (inflection of nouns, adjectives, adverbs, etc.).
• Lemmatization (the base form of the word, or the reverse of inflection)
• Spell checking

Source: https://towardsdatascience.com/linguistic-knowledge-in-natural-language-processing-332630f43ce1
Syntax (Parsing)
In this stage, we focus more on the relationship of the words within a sentence, i.e. how a sentence is constructed.

Syntactic analysis is usually done at sentence level, whereas for morphology the analysis is done at word level.

When we are building dependency trees or processing parts of speech, we are basically analyzing the syntax of the sentence.

Source: https://towardsdatascience.com/linguistic-knowledge-in-natural-language-processing-332630f43ce1
Semantics
Once we’ve understood the syntactic structures, we are more prepared to get into the “meaning” of
the sentence (for a fun read on what meaning can actually mean in NLP — head over here to dive
into a Twitter discussion on the subject ).
Some example of tasks performed at this stage include:
• Named Entity Recognition (NER)
• Relationship Extraction

Source: https://towardsdatascience.com/linguistic-knowledge-in-natural-language-processing-332630f43ce1
Pragmatics
At this level, we try to understand the text as a whole. Popular problems that we’re
trying to solve at this stage are:
• Topic modelling
• Co-reference
• Summarization
• Question & Answering

Source: https://towardsdatascience.com/linguistic-knowledge-in-natural-language-processing-332630f43ce1
Text processing
Theory and practice of automating the creation or manipulation of electronic text.

Representation of data:
• Text.
• Images.
• Audio.
• Videos.

Analyzing the data, which may be structured or unstructured, to obtain structured information:

• Text extraction.
• Text classification.
Extracting individual, small bits of information from large text data is called text extraction.
Assigning values or categories to text data depending upon its content is called text classification.
Text analysis vs. Text mining vs. Text analytics
• Text mining is used to obtain data by statistical pattern learning.
• Both text analysis and text mining are qualitative processes.
• Text analytics is a quantitative process.

• Example:
– Banking service: Customer satisfaction.
– Text analysis: Individual performance of the customer support executive; words used in the feedback like "good", "bad".
– Text analytics:
• Overall performance of all the support executives.
• Graph for visualizing the performance of the entire support team.
– Text analytics for overall count of issues resolved.
Text processing tools
• Statistical methods
• Text classification methods
• Text extraction methods
Tools and methodologies: Statistical methods
• Statistical methods:
– Word frequency: Identify the most regularly used expressions or words that are present in a specific text.
– Collocation: Method for identifying common words that appear together.
– Concordance: Shows each occurrence of a word or phrase together with its surrounding context.
– TF-IDF: Identifies the importance of words in a document.

Figure: Statistical methods


Source: http://grjenkin.com/articles/category/data-science/106322/big-data-data-science-and-machine-learning-explained
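A small sketch of two of these statistical methods, word frequency and collocation, using NLTK (the sample text is illustrative):

```python
# Word frequency and collocation discovery with NLTK on a toy text.
import nltk
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

text = ("natural language processing makes machines understand natural language "
        "and natural language generation produces language").split()

# Word frequency: the most common words in the text
freq = nltk.FreqDist(text)
print(freq.most_common(3))

# Collocation: word pairs that co-occur more often than chance
bigram_measures = BigramAssocMeasures()
finder = BigramCollocationFinder.from_words(text)
print(finder.nbest(bigram_measures.likelihood_ratio, 3))
```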
Tools and methodologies: Text classification
• Text classification:
– Content is analyzed and classified into multiple predefined groups based upon the analysis.

Figure: Text classification


Source: https://hackernoon.com/text-classification-simplified-with-facebooks-fasttext-b9d3022ac9cb
Tools and methodologies: Text classification
• Topic analysis: Identify and interpret large collections of text according to the individual topics assigned.
• Sentiment analysis: Understanding the emotional tone represented in a textual message.

Figure: Language classification


Source: https://www.kdnuggets.com/2018/03/5-things-sentiment-analysis-classification.html
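A compact sketch of text classification with scikit-learn, combining TF-IDF features with a linear classifier; the tiny training set and its labels are illustrative assumptions:

```python
# Text classification sketch: TF-IDF features + logistic regression.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = ["great product, very happy", "terrible service, very bad",
               "good support and fast reply", "bad quality, poor experience"]
train_labels = ["positive", "negative", "positive", "negative"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(train_texts, train_labels)

print(clf.predict(["the support team was good"]))  # likely ['positive']
```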
Tools and methodologies: Text extraction
• Text extraction: Process of gathering valuable pieces of information present within the text
data.

Figure: Text extraction


Source: https://www.upgrad.com/blog/what-is-text-mining-techniques-and-applications/

• Keyword extraction: Identifying and detecting the most relevant words inside a text.

• Entity extraction: Useful for gathering information on specific relevant elements while discarding all other irrelevant elements.
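A small keyword-extraction sketch: rank the words of a document by their TF-IDF weight using scikit-learn (the documents are illustrative):

```python
# Keyword extraction sketch: pick the top TF-IDF terms of a document.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the battery life of this phone is excellent",
    "the camera quality of this phone is poor",
    "battery replacement was quick and cheap",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(docs)

# Top keywords for the first document
terms = vectorizer.get_feature_names_out()
scores = tfidf[0].toarray().ravel()
print(sorted(zip(terms, scores), key=lambda x: -x[1])[:3])
```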
Scope of text analysis/processing
• Large documents:
– Referred to for context.
– Cross-examine multiple documents.

• Individual sentences:
– Gathering specific information.
– Identify the emotional or intentional activities.

• Parts of the sentences:
– Sentiments of the words can be analyzed.
– Better understanding of the natural language.
– Provided for the machine to analyze and understand.
Importance of text analysis
• Business growth:
– Extraction of information to identify the customer.

• Real-time analysis:
– Urgent requirements or complaints are handled on a real-time basis.
– Categorized as priority.
– May require multiple analyses.

• Checking for consistency:
– Detect the latest models.
– Analyzing.
– Understanding.
– Sharing the available data accurately.
Working principles of text analysis
• Data gathering.

• Data preparation.

• Data analysis.
Data gathering
• Text analysis: Gathering the required data that need to be analyzed.

• Internal data:
– Email.
– Chat messages.
– CRM tools.
– Databases.
– Surveys.
– Spreadsheets.
– Product analysis report.

Figure: Text analysis


Source: https://voziq.com/customer-retention/improving-customer-retention-strategies-with-unstructured-customer-data/attachment/common-sources-of-unstructured-data/
Data gathering
• External data: The external data do not belong to the organization and are available freely
through other sources.

• Web scraping tools.


• Open data.

Figure: Web scraping tools


Source: https://strikedeck.com/top-10-customer-data-sources/
Data preparation
• Before text is analyzed by any machine learning algorithm, it needs to be prepared.

• Tokenization:
– Identify and recognize the units of text.
– Process of breaking up text characters into meaningful elements (tokens).
– Analyze the meaningful parts of the text and discard the meaningless sections.
– Often followed by removal of the very frequent stop words found in a sentence.

• Stemming:
– Used to reduce a word to its root while still conveying meaning.
– Removes unnecessary characters such as prefixes and suffixes.

• Lemmatization:
– Uses the part of speech of a word to remove inflection and return its dictionary form (lemma).
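A short preparation sketch with NLTK covering tokenization, stop-word removal, stemming, and lemmatization (it assumes the punkt, stopwords, and wordnet data have already been downloaded; the sentence is illustrative):

```python
# Text preparation sketch: tokenize, drop stop words, stem, lemmatize.
# Assumes the required NLTK data (punkt, stopwords, wordnet) are available.
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

text = "The children were playing happily in the gardens."

tokens = nltk.word_tokenize(text.lower())            # tokenization
content = [t for t in tokens if t.isalpha()
           and t not in stopwords.words("english")]  # drop frequent stop words
print(content)  # ['children', 'playing', 'happily', 'gardens']

stemmer = PorterStemmer()
print([stemmer.stem(t) for t in content])            # crude roots, e.g. 'happili'

lemmatizer = WordNetLemmatizer()
print([lemmatizer.lemmatize(t, pos="v") for t in content])  # dictionary forms, e.g. 'play'
```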
Dependency parsing
Constituency parsing
– Uses syntactic structures: abstract nodes associated with words and abstract categories.

Figure: Constituency parsing


Source: http://www.cs.cornell.edu/courses/cs5740/2017sp/lectures/13-parsing-const.pdf
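A minimal dependency-parsing sketch with spaCy, printing each token, its dependency label, and its syntactic head (it assumes en_core_web_sm is installed; the sentence is made up):

```python
# Dependency parsing with spaCy: every token points to its head with a labeled relation.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The cat chased the small mouse.")

for token in doc:
    print(f"{token.text:10} --{token.dep_:8}--> {token.head.text}")
```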
Lexical analysis, Syntactic analysis, Semantic
analysis, Discourse integration, Pragmatic
analysis
Lexical analysis
• Lexical analysis is the process of converting a sequence of characters into a
sequence of tokens. A lexer is generally combined with a parser, which together
analyzes the syntax of programming languages, web pages, and so forth.
• Lexers and parsers are most often used for compilers but can be used for other computer language tools, such as pretty printers or linters.
• Lexical analysis is also an important analysis during the early stage of natural
language processing, where text or sound waves are segmented into words and
other units

How can it be done?

• Classical way: lookup in a dictionary.
• Traditional way: tokenization by ML methods.
• Tokenization.
• Tags or recognition.
Syntactic analysis
• Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages, or data structures, conforming to the rules of a formal grammar.
• It is used in the analysis of computer languages, referring to the syntactic
analysis of the input code into its component parts to facilitate the writing of
compilers and interpreters.
• Grammatical rules are applied to categories and groups of words, not individual
words. Syntactic analysis is a very important part of NLP that helps in
understanding the grammatical meaning of any sentence.
Syntactic analysis
Semantic analysis
• Semantic Analysis attempts to understand the meaning of Natural Language.
• Semantic Analysis of Natural Language captures the meaning of the given text
while considering context, logical structuring of sentences, and grammar roles.
• Semantic analysis can begin with the relationship between individual words.
Discourse integration
The analysis and identification of the larger context for any smaller part of natural
language structure

For example, in "Ram met Mohan and he smiled", "he" can be Ram or Mohan; the surrounding discourse is needed to resolve it.


Pragmatic analysis
• Refers to the study of how language is used in context to convey meaning.
• Pragmatic analysis deals with outside world knowledge, which means knowledge that is external to the documents and/or queries.
Corpus
What Is a Corpus?
A corpus is a collection of examples of language in use that are selected
and compiled in a principled way.
• The term corpus refers to the intention for it to be a representative
body of evidence for the study of language and language use. In the
most general terms, the purpose of a corpus is to document a
language. Hence a corpus is often an essential component of
language documentation or language archives.
• The construction of a corpus starts with decisions on design criteria.
Corpus design criteria are mainly driven by the purpose of the corpus,
but may also be affected by meta-theoretical concerns such as
evaluation methods, reusability, and interoperability.
Corpus creation
• A basic reference resource that records the features of the natural language.
• Created based upon the grammar and context.
• Used for linguistic analysis and hypothesis testing.

Figure: Corpus creation


Source: https://devopedia.org/text-corpus-for-nlp
Types of corpora
Single Language (or monolingual) corpus
Balanced (General purpose) Corpus
• Example: English language
Specialized Corpus
• Example: Engineering, medical in English language
Synchronic Corpus
• contains language data that are produced in roughly the same time period
Diachronic Corpus
• contains data from different time periods
Spoken Corpus
Written Corpus
Mixed Corpus

Multilanguage (or multilingual) corpus
Parallel Corpus
• contains texts of the same content in different languages (e.g., an original text and its translation(s) in one or more other languages)
Comparable Corpus
• Multilanguage corpus containing a collection of texts from two or more languages collected under the same set of criteria
• Language for general purposes corpora:
– Economic corpora.
– Legal corpora.
– Medical corpora.

• Multilingual parallel corpora:
– L1: L2 bidirectional.
– L1 translation L2.
Usage areas of corpora
• Translation:
– Parallel corpora.
– Native corpora.

• Education:
– Data driven learning.
– Concordance usage.
– Generalization extraction from data.

• General usages:
– Native speaker intuition.
– Frequency of occurrence.
– Relationship as per usage.
Traits of a good text corpus
• Depth:
– Example: Wordlist: Top 50000 words and not just top 5000 words.

• Recent:
– Example: outdated texts use "courting" vs. current-age texts use "dating".

• Metadata:
– Example: Source, Genre, Type.

• Genre:
– Example: Text from newspapers, journals, etc.

• Size:
– Example: half million English words.

• Clean:
– Example: Noun – Flower, Fruit; Verb – Eat, Smell.
Annotation and Storage of Corpus
• Modern corpora are digitally stored.
• A popular approach is to use the Extensible Markup Language (XML) for encoding and storing corpus files, containing two types of elements: content (for keeping the linguistic data collected for the corpus) and markup (for keeping annotations).
• Annotation enriches a corpus and facilitates discovery of generalizations and knowledge which is difficult, if not impossible, to obtain from a raw corpus. Example: POS tagging.
• All currently available balanced corpora are POS-tagged.
• A syntactically annotated corpus, however, is conventionally called a treebank.
• The standard way is manual annotation by annotators.
• Modern automated annotation predicts labels using either statistical or heuristic rules.
• Another way is crowdsourcing.
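As a sketch of how an annotated corpus fragment could be stored as XML, with content in the element text and POS annotations as markup attributes, the snippet below also reads the fragment back with Python's ElementTree (the tag names and attributes are illustrative, not a standard corpus schema):

```python
# Illustrative XML storage of a POS-annotated sentence, parsed with ElementTree.
import xml.etree.ElementTree as ET

xml_fragment = """
<sentence id="s1">
  <token pos="PRP">They</token>
  <token pos="VBD">picnicked</token>
  <token pos="IN">by</token>
  <token pos="DT">the</token>
  <token pos="NN">pool</token>
</sentence>
"""

root = ET.fromstring(xml_fragment)
for tok in root.findall("token"):
    print(tok.text, tok.get("pos"))   # content plus its markup annotation
```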
Annotations in text corpus (1 of 2)

Figure: Inline annotations


Source: https://www.researchgate.net/figure/A-piece-of-biology-text-annotated-with-multiple-ontologies-Different-color-highlights_fig1_261028765
Corpus-Words
They picnicked by the pool, then lay back on the grass and looked at the stars.

• Types are the number of distinct words in a corpus.
• Tokens are the total number N of running words.

If we ignore punctuation, the above sentence has 16 tokens and 14 types.
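A quick way to verify these counts in Python (ignoring punctuation, as above):

```python
# Count tokens (running words) and types (distinct words) in the example sentence.
import re

sentence = "They picnicked by the pool, then lay back on the grass and looked at the stars."
tokens = re.findall(r"[A-Za-z]+", sentence)        # ignore punctuation

print(len(tokens))                                  # 16 tokens
print(len(set(t.lower() for t in tokens)))          # 14 types ('the' repeats)
```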
Datasheet or Data statement
The best way is for the corpus creator to build a datasheet or data statement
A datasheet specifies properties of a dataset like:
Motivation: Why was the corpus collected, by whom, and who funded it?
Situation: When and in what situation was the text written/spoken? For example, was there a task?
Was the language originally spoken conversation, edited text, social media communication,
monologue vs. dialogue?
Language variety: What language (including dialect/region) was the corpus in?
Speaker demographics: What was, e.g., age or gender of the authors of the text?
Collection process: How big is the data? If it is a subsample how was it sampled? Was the data
collected with consent? How was the data pre-processed, and what metadata is available?
Annotation process: What are the annotations, what are the demographics of the annotators, how
were they trained, how was the data annotated?
Distribution: Are there copyright or other intellectual property restrictions?
NLP Libraries
• Scikit-learn: It provides a wide range of algorithms for building machine
learning models in Python.
• Natural Language Toolkit (NLTK): NLTK is a complete toolkit for all NLP techniques.
• Pattern: It is a web mining module for NLP and machine learning.
• TextBlob: It provides an easy interface for basic NLP tasks like sentiment analysis, noun phrase extraction, or POS tagging.
• Quepy: Quepy is used to transform natural language questions into queries
in a database query language.
• SpaCy: SpaCy is an open-source NLP library which is used for Data
Extraction, Data Analysis, Sentiment Analysis, and Text Summarization.
• Gensim: Gensim works with large datasets and processes data streams.
