NLP
Stages in a Comprehensive NLP System
Tokenization
Morphological Analysis
Syntactic Analysis
Semantic Analysis (lexical and compositional)
Pragmatics and Discourse Analysis
Knowledge-Based Reasoning
Text Generation
• NLP works at different levels, which means that machines process and understand
natural language at different levels.
• These levels are (a short code sketch after this list illustrates the first three):
• Morphological level: This level deals with understanding word structure and word
formation.
• Lexical level: This level deals with understanding the part of speech of the word.
• Syntactic level: This level deals with understanding the syntactic structure of a sentence,
that is, parsing the sentence.
• Semantic level: This level deals with understanding the actual meaning of a sentence.
• Discourse level: This level deals with understanding the meaning of a sentence beyond
the sentence level, that is, considering the context.
• Pragmatic level: This level deals with using real-world knowledge to understand the
sentence.
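• A minimal sketch of the first three levels using spaCy, assuming the library and its
small English model en_core_web_sm are installed (the example sentence is illustrative):

    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("The children are playing in the garden.")
    for token in doc:
        # token.lemma_ -> morphological level (base form of the word)
        # token.pos_   -> lexical level (part of speech)
        # token.dep_   -> syntactic level (grammatical relation to its head)
        print(token.text, token.lemma_, token.pos_, token.dep_, token.head.text)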
History of NLP
• NLP is a field that has emerged from various other fields such as AI, linguistics,
and data science.
• As stated above, the idea emerged from the need for machine translation in
the 1940s.
• The original language pair was English and Russian.
• Other languages, such as Chinese, also came into use in the early 1960s.
• A bleak era came for MT/NLP in 1966 with the ALPAC report, which concluded
that research in the area was progressing too slowly; as a result, MT/NLP
almost died out.
• The situation improved again in the 1980s, when MT/NLP products started
providing useful results to customers.
• After nearly dying out in the 1960s, NLP/MT got a new life when the idea of and
need for Artificial Intelligence emerged. LUNAR, developed in the early 1970s by
W. A. Woods, could analyze, compare, and evaluate the chemical data on lunar
rock and soil composition that was accumulating as a result of the Apollo moon
missions, and could answer related questions.
• In the 1980s, the area of computational grammar became a very active field of
research, linked with the science of reasoning about meaning and with modeling
the user's beliefs and intentions.
• In the 1990s, the pace of growth of NLP/MT increased. Grammars, tools, parsers,
and practical resources related to NLP/MT became available.
• Research on core and forward-looking topics such as word sense disambiguation
and statistically oriented NLP gave new direction to work on the lexicon.
• This quest for the emergence of NLP was joined by other essential topics such as
statistical language processing, information extraction, and automatic
summarization.
• The discussion of the history of NLP cannot be considered complete without
mentioning ELIZA, a chatbot program developed from 1964 to 1966 at the MIT
Artificial Intelligence Laboratory.
• It was created by Joseph Weizenbaum.
• The program was based on a script named DOCTOR, which simulated a Rogerian
psychotherapist and used rules to respond to users' statements.
• It was one of the few chatbots of its time capable of attempting the Turing test.
• Previously, traditional rule-based systems were used for these computations, in which
you had to explicitly write hardcoded rules.
• Today, computations on natural language are done using ML and DL
techniques.
• Let's say we have to extract the names of politicians from a set of political
news articles. If we want to apply a rule-based grammar, we must manually craft
certain rules based on human understanding of language, as sketched below.
• As we can see, using a rule-based system like this would not yield very accurate
results.
• One major disadvantage is that the same rule is not applicable in all cases,
given the complex and nuanced nature of most language.
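• A minimal sketch of such a hand-crafted rule (the titles and name pattern are
illustrative assumptions, not a real system):

    import re

    # Hypothetical rule: a politician's name is one or more capitalized
    # words immediately following a title such as "Senator" or "President".
    PATTERN = re.compile(
        r"\b(?:Senator|President|Minister|Governor)\s+"
        r"([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)"
    )

    text = ("President Macron met Senator Jane Smith in Paris. "
            "Later, Obama gave a short speech.")

    print(PATTERN.findall(text))  # ['Macron', 'Jane Smith']
    # 'Obama' is missed because he appears without a title, which is
    # exactly why fixed rules like this are brittle.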
Basic Concepts
Text Corpus or Corpora
• The language data that all NLP tasks depend upon is called the text corpus, or
simply corpus.
• A corpus is a large set of text data in a particular language, such as English,
French, and so on.
• The corpus can consist of a single document or a collection of documents.
• The source of a text corpus can be social network sites like Twitter, blog sites,
open discussion forums like Stack Overflow, books, and several others.
• Some tasks, like machine translation, require a multilingual corpus.
• For example, we might need both the English and French translations of the same
document content to develop a machine translation model.
• For speech tasks, we would also need human voice recordings and the
corresponding transcribed corpus.
• For many NLP tasks, the corpus is split into chunks for further analysis.
• These chunks can be at the paragraph, sentence, or word level, as illustrated in
the sketch below.
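• A minimal sketch of this chunking using NLTK, assuming the library is installed and
the "punkt" tokenizer models have been downloaded via nltk.download("punkt"):

    import nltk

    corpus = ("NLP is fun. It has many applications.\n\n"
              "A corpus can come from tweets, blogs, or books.")

    paragraphs = corpus.split("\n\n")         # paragraph-level chunks
    sentences = nltk.sent_tokenize(corpus)    # sentence-level chunks
    words = nltk.word_tokenize(sentences[0])  # word-level chunks

    print(len(paragraphs), len(sentences), words)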
Paragraph
Sentences
Phrases and words
• A phrase is a group of consecutive words within a sentence that conveys a
specific meaning.
• For example, in the sentence "Tomorrow is going to be a rainy day", the part
"going to be a rainy day" expresses a specific thought.
• Some NLP tasks extract key phrases from sentences for search and retrieval
applications (see the sketch after this list).
• The next smallest unit of text is the word.
• Common tokenizers split sentences into words based on whitespace and
punctuation such as commas.
• One of the problems in NLP is the ambiguity in meaning of the same word used
in different contexts.
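• A minimal sketch of key-phrase extraction via spaCy's noun chunks, assuming the
library and its en_core_web_sm model are installed (noun chunks are only one
simple notion of "key phrase"):

    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Tomorrow is going to be a rainy day")
    for chunk in doc.noun_chunks:  # base noun phrases
        print(chunk.text)          # e.g. "Tomorrow", "a rainy day"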
N-gram
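• An n-gram is a sequence of n consecutive tokens from a text; a minimal,
dependency-free sketch (the helper function is illustrative):

    def ngrams(tokens, n):
        # all windows of n consecutive tokens
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    tokens = "the quick brown fox".split()
    print(ngrams(tokens, 2))
    # [('the', 'quick'), ('quick', 'brown'), ('brown', 'fox')]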
Bag-of-words
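• A bag-of-words representation turns each document into a vector of word counts,
ignoring word order; a minimal sketch using scikit-learn's CountVectorizer,
assuming scikit-learn is installed:

    from sklearn.feature_extraction.text import CountVectorizer

    docs = ["the cat sat on the mat", "the dog sat"]
    vectorizer = CountVectorizer()
    counts = vectorizer.fit_transform(docs)    # sparse document-term matrix

    print(vectorizer.get_feature_names_out())  # learned vocabulary
    print(counts.toarray())                    # per-document word counts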
Applications
• Analyzing sentiment
• Recognizing named entities
• Linking entities
• Translating text
• Natural language interfaces
• Semantic role labeling
• Relation extraction
• SQL query generation, or semantic parsing
• Machine comprehension
• Textual entailment
• Coreference resolution
• Searching
• Question answering and chatbots
• Converting text to voice
• Converting voice to text
• Speaker identification
• Spoken dialog systems
• Other applications