AI4youngster - 6 - Topic NLP

The document provides an overview of natural language processing (NLP) including its achievements, methods, and trends. It discusses how NLP is used in industry applications like virtual assistants and machine translation. It describes the levels of linguistic knowledge involved in NLP like morphology, syntax, and semantics. Popular NLP methods are explained such as word embeddings, recurrent neural networks, encoder-decoder frameworks, attention mechanisms, and transformer models. Emerging trends in NLP involve pre-trained language models, transfer learning, and combining supervised and unsupervised learning techniques.


TABLE OF CONTENTS

1. Some achievements of NLP

2. Overview of NLP

1. Linguistic levels of description

2. Why is NLP difficult?

3. Methods in NLP

4. Trends in NLP

5. Conclusion
1. Some achievements of NLP
NLP in Industry
Communication With Machines
Virtual Assistant

● Conversational agents contain:


○ Speech recognition
○ Language analysis
○ Dialogue processing
○ Information retrieval
○ Text to speech
● Google Now, Alexa, Siri, Cortana, VAV…
Google Translate & Vietgle Translate
Named Entity Recognition

Example of NLP task: Named Entity Recognition (NER):
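Since the example figure is not reproduced here, below is a minimal NER sketch using the spaCy library; the model name en_core_web_sm and the example sentence are illustrative assumptions, not from the slides:

```python
# Minimal NER sketch with spaCy (assumes: pip install spacy
# and python -m spacy download en_core_web_sm).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is opening a new office in Ha Noi in 2021.")

for ent in doc.ents:
    # ent.text is the entity span, ent.label_ its type (ORG, GPE, DATE, ...)
    print(ent.text, ent.label_)
```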


2. Overview of NLP
What is Natural Language Processing?

● Natural language processing (NLP) is a subfield of artificial intelligence and computational linguistics. It studies the problems of automated generation and understanding of natural human languages.
● Natural-language-generation systems convert information from computer databases
into normal-sounding human language. Natural-language-understanding systems
convert samples of human language into more formal representations that are easier
for computer programs to manipulate.
What is Natural Language Processing?

Computers use natural language as input and/or output:

● Natural language understanding (NLU): natural language → computer-internal representation
● Natural language generation (NLG): computer-internal representation → natural language
Natural language processing and computational linguistics

● Natural language processing (NLP) develops methods for solving practical problems
involving language:
○ Automatic speech recognition
○ Machine Translation
○ Sentiment Analysis
○ Information extraction from documents
● Computational linguistics (CL) focuses on using computational methods to study scientific questions about language:
○ How do we understand language?
○ How do we produce language?
○ How do we learn language?
Levels of Linguistic Knowledge
Morphology

● Morphology studies the structure of words


● Morphological derivation exhibits hierarchical structure
Example: re + vital + ize + ation (revitalization)
● The suffix usually determines the syntactic
category of the derived word
Syntax

● Syntax studies the ways words combine to form phrases and sentences

● Syntactic parsing helps identify who did what to whom, a key step in understanding a
sentence
Semantics and pragmatics

● Semantics studies the meaning of words, phrases and sentences


Ex: I have dinner in/for an hour (the choice of preposition changes the meaning)
● Pragmatics studies how we use language to do things in the world
Ex: Quy Nhơn thật là đẹp (Vietnamese: "Quy Nhơn is really beautiful")
Natural Language Processing

Applications:

● Machine Translation
● Information Retrieval
● Question Answering
● Dialogue Systems
● Information Extraction
● Summarization
● Sentiment Analysis

Core Technologies (NLP sub-problems):

● Language modeling
● Part-of-speech tagging
● Syntactic parsing
● Named-entity recognition
● Word sense disambiguation
● Semantic role labeling
● …

NLP lies at the intersection of computational linguistics and machine learning.


Why is NLP difficult?

● Ambiguity
● Sparsity
● Abstractly, most NLP applications can be viewed as prediction problems
⇒ Should be able to solve them with Machine Learning
● The label set is often the set of all possible sentences
○ Infinite (or at least astronomically large)
● Training data for supervised learning is often not available
⇒ Unsupervised/semi-supervised techniques for training from available data
● Algorithmic challenges
○ Vocabulary can be large (e.g., 50K words)
○ Data sets are often large (GB or TB)
Ambiguity

● “At last, a computer that understands you like your mother”

This slogan has (at least) three readings:

● It understands you as well as your mother understands you
● It understands (that) you like your mother
● It understands you as well as it understands your mother
Sparsity

Order words by frequency. What is the frequency of the n-th ranked word? (Zipf's law: frequency falls roughly as 1/rank.)


Sparsity

● Regardless of how large our corpus is, there will be a lot of infrequent words

● This means we need to find clever ways to estimate probabilities for things we have rarely or never seen
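To see this long tail concretely, the sketch below counts word frequencies in a plain-text corpus and reports how much of the vocabulary is rare; the file name corpus.txt is a placeholder assumption:

```python
# Sketch: word-frequency ranks illustrate sparsity (Zipf's law).
# "corpus.txt" is a placeholder for any large plain-text file.
from collections import Counter

with open("corpus.txt", encoding="utf-8") as f:
    words = f.read().lower().split()

counts = Counter(words)

# Head of the distribution: a few very frequent words.
for rank, (word, freq) in enumerate(counts.most_common(10), start=1):
    print(rank, word, freq)

# Tail of the distribution: most word types occur only once or twice.
hapaxes = sum(1 for c in counts.values() if c == 1)
print(f"{hapaxes} of {len(counts)} word types occur exactly once")
```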
Feature Engineering and Deep Learning

● Up until 2014 most state-of-the-art NLP systems were based on feature engineering +
shallow machine learning models (e.g., SVMs, CRFs)
● Designing the features of a winning NLP system requires a lot of domain-specific
knowledge.
● Deep Learning systems on the other hand rely on neural networks to automatically
learn good representations.
Feature Engineering and Deep Learning

● Deep Learning yields state-of-the-art results in most NLP tasks.


● Large amounts of training data and faster multicore GPU machines are key in the
success of deep learning.
● Neural networks and word embeddings play a key role in modern NLP models.
Deep Learning and Linguistic Concepts

● If deep learning models can learn representations automatically, are linguistic concepts (e.g., syntax, morphology) still useful?

● Some proponents of deep learning argue that such manually designed linguistic properties are not needed, and that the neural network will learn these intermediate representations (or equivalent, or better ones) on its own [Goldberg, 2016]
● Goldberg believes many of these linguistic concepts can indeed be inferred by the
network on its own if given enough data.
● However, for many other cases we do not have enough training data available for the
task we care about, and in these cases providing the network with the more explicit
general concepts can be very valuable.
History
3. Methods in NLP
NLP representation and architecture
Word2vec
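As a concrete sketch of how word embeddings are trained in practice, here is a minimal word2vec example using the gensim library; the toy corpus and hyperparameters are illustrative assumptions, not from the slides:

```python
# Minimal word2vec sketch with gensim (pip install gensim).
# The tiny corpus below is purely illustrative.
from gensim.models import Word2Vec

sentences = [
    ["nlp", "studies", "natural", "language"],
    ["machine", "translation", "is", "an", "nlp", "task"],
    ["word", "embeddings", "represent", "words", "as", "vectors"],
]

# vector_size: embedding dimension; window: context size;
# min_count=1 keeps every word (only sensible for a toy corpus).
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=50)

print(model.wv["nlp"][:5])           # first 5 dimensions of the "nlp" vector
print(model.wv.most_similar("nlp"))  # nearest neighbours in embedding space
```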
Recurrent Neural Networks

● Neural networks that exploit the temporal nature of language!


● Also allow variable-length inputs
Sequence processing is particularly useful for some tasks!

● To capture this phenomenon computationally, we can use recurrent neural networks to perform sequence processing (see the sketch after this list):
● Syntactic parsing
● Part of speech tagging
● Language modeling
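A minimal sketch of the recurrent interface in PyTorch (sizes are illustrative assumptions): the network reads the sequence one step at a time and carries a hidden state forward, so inputs of any length are handled naturally.

```python
# Sketch: an RNN consumes a sequence step by step, carrying a hidden state.
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)

x = torch.randn(1, 5, 8)  # batch of 1 sequence, 5 time steps, 8-dim inputs
outputs, h_n = rnn(x)     # outputs: hidden state at every step, (1, 5, 16)
                          # h_n: final hidden state, (1, 1, 16)
print(outputs.shape, h_n.shape)
```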
Issue with RNN

● RNNs struggle to carry context over long distances.


● There’s also the issue of “vanishing gradients”
○ When small derivatives are repeatedly multiplied together, the products can become extremely small
Long Short-Term Memory Networks (LSTMs)

● Remove information no longer needed from the context, and add information likely to
be needed later
● Do this by:
○ Adding an explicit context layer to the architecture
○ This layer controls the flow of information into and out of network layers using specialized neural units called gates
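A minimal sketch of the LSTM interface in PyTorch (sizes are illustrative assumptions); note the extra cell state c_n, the explicit context that the gates write to and erase from:

```python
# Sketch: an LSTM carries an explicit gated cell (context) state
# alongside the ordinary hidden state.
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

x = torch.randn(1, 5, 8)       # batch of 1, 5 time steps, 8-dim inputs
outputs, (h_n, c_n) = lstm(x)  # c_n: cell state managed by the gates
print(outputs.shape, h_n.shape, c_n.shape)
```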
Encoder-decoder Framework

● The encoder compresses the source sentence into a K-dimensional context vector
● Each word generated in the translation is conditioned on this vector

"Sequence to Sequence Learning with Neural Networks" (Sutskever et al., 2014)


Encoder-decoder Framework
● The entire input is summarized in this one single vector
● In the seq2seq model, the decoder's state depends only on the previous state and the previous output
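A minimal sketch of this bottleneck (PyTorch; sizes are illustrative assumptions): the decoder is initialized from the encoder's final state, which is all it ever sees of the source.

```python
# Sketch: in plain seq2seq, the encoder's final (h, c) is the ONLY
# channel between the source sentence and the decoder.
import torch
import torch.nn as nn

encoder = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
decoder = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

src = torch.randn(1, 10, 8)        # source sentence: 10 steps
_, (h, c) = encoder(src)           # the single fixed-size context vector

tgt = torch.randn(1, 7, 8)         # target-side inputs: 7 steps
dec_out, _ = decoder(tgt, (h, c))  # decoder conditioned only on (h, c)
print(dec_out.shape)               # (1, 7, 16)
```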
Neural machine translation
Sequence-to-sequence: the bottleneck problem
Attention

● Attention provides a solution to the bottleneck problem.


● Main idea: on each step of the decoder, focus on a particular part of the source
sequence
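A minimal sketch of one attention step (dot-product scoring; sizes are illustrative assumptions): the decoder state is compared with every encoder state, and the resulting weights form a fresh context vector at each step.

```python
# Sketch: dot-product attention for a single decoder step.
import torch
import torch.nn.functional as F

enc_outputs = torch.randn(10, 16)  # one encoder hidden state per source word
dec_state = torch.randn(16)        # current decoder hidden state

scores = enc_outputs @ dec_state   # one score per source position, (10,)
weights = F.softmax(scores, dim=0) # attention distribution over the source
context = weights @ enc_outputs    # weighted sum of encoder states, (16,)
print(weights.shape, context.shape)
```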
Seq2Seq with attention
4. Trend in NLP
Neural Machine Translation by Jointly Learning to Align and Translate

Source: Bahdanau et al., ICLR 2015, https://arxiv.org/abs/1409.0473


Attention (Bahdanau et al., 14,164 citations)

● Attention significantly improves NMT performance


○ It's very useful to allow the decoder to focus on certain parts of the source
● Attention solves the bottleneck problem
○ Attention allows the decoder to look directly at the source, bypassing the bottleneck
● Attention provides some interpretability
○ By inspecting attention distribution, we can see what the decoder was focusing on
○ We get alignment for free!
○ This is cool because we never explicitly trained an alignment system
○ The network just learned alignment by itself
Seq2seq is very flexible!

● Sequence-to-sequence is useful for more than just MT

● Many NLP tasks can be phrased as sequence-to-sequence:


○ Summarization (long text → short text)
○ Dialogue (previous utterances → next utterance)
○ Parsing (input text → output parse as sequence)
○ Code generation (natural language → Python code)
Self Attention
Transformer model
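The Transformer replaces recurrence with self-attention: every token attends to every other token. A minimal single-head sketch (dimensions and the learned Q/K/V projections are illustrative assumptions):

```python
# Sketch: scaled dot-product self-attention (single head).
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

d = 16
x = torch.randn(5, d)            # 5 tokens, d-dim embeddings

# Learned projections to queries, keys, and values.
W_q, W_k, W_v = nn.Linear(d, d), nn.Linear(d, d), nn.Linear(d, d)
Q, K, V = W_q(x), W_k(x), W_v(x)

scores = Q @ K.T / math.sqrt(d)  # every token scores every token
attn = F.softmax(scores, dim=-1) # (5, 5) attention matrix
out = attn @ V                   # contextualized token representations
print(attn.shape, out.shape)
```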
Pre-trained language model (LM)
Trends in NLP for Technology

● Seq2Seq (Transformer)
● Transfer Learning: fine-tuning a pre-trained language model (LM) has become the de facto standard for doing transfer learning in natural language processing (see the sketch after this list)
● Combining Supervised & Unsupervised Methods
● Self-supervised learning
● Reinforcement Learning
● NLP model with interpretability
● Combine Feature Engineering and Knowledge base
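A minimal sketch of this transfer-learning recipe with the Hugging Face transformers library; the model name bert-base-uncased, the two-label task, and the example sentence are illustrative assumptions:

```python
# Sketch: fine-tuning a pre-trained LM for classification
# (pip install transformers torch; model/labels are illustrative).
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # pre-trained body + new task head

inputs = tokenizer("This movie was great!", return_tensors="pt")
logits = model(**inputs).logits  # head is still untrained: random-ish scores
print(logits)

# Fine-tuning then trains the whole model (or just the head) on
# labeled task data with a small learning rate.
```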
Trends in NLP for applications

● Sentiment Analysis On Social Media


● Multilingual NLP
● Automation In NLP
● Market Intelligence Monitoring
● NLP is already in use in financial markets, where it helps to assess the market situation, track employment changes and tender-related information, extract information from large repositories, etc.
Key Applications in 2021

● Computational linguistics (i.e., modeling the human capacity for language computationally)
● Information extraction, especially “open” IE
● Question answering and chatbots (e.g., Watson, Siri, Google Now)
● Machine translation
● Summarization
● Opinion and sentiment analysis
● Social media analysis
● Fake News Recognition
5. Conclusion
Challenges

● Localization: language differs across regions and cultures (unlike images)


● Low-resource languages
● Domain-specific language.
● The breakthroughs in NLP in the last 2 years are mainly driven by BERT-like architectures built on the Transformer.
● However, these methods require significant computational resources (memory, time). Sometimes even a simple count vectorization does better than a complex BERT approach.
● Transformers are well understood by only a handful of researchers; to most practitioners they remain a black box.
Journals and Conferences in NLP

http://anthology.aclweb.org/
Conclusion

● Computational linguistics and natural language processing:

○ were originally inspired by linguistics, but are now largely applications of machine learning and statistics
● Methods in NLP are now typically based on deep learning
● Trends in NLP span both techniques and applications
THANKS FOR LISTENING!
