Massp2023 NLP

The document provides an overview of natural language processing (NLP), covering NLP components, problem types, techniques like classification, vector spaces, probabilistic models, sequence models, attention models, and applications. It also lists NLP courses, books, videos, and repositories for further study.

Uploaded by

Thuần Văn
Copyright
© All Rights Reserved

Natural Language Processing
Dao Minh Dung
Table of contents
1. Introduction to NLP
2. NLP with Classification and Vector Spaces
3. NLP with Probabilistic Models
4. NLP with Sequence Models
5. NLP with Attention Models
6. NLP applications
7. NLP study resources
1. Introduction to NLP
What is Natural Language Processing
● NLP is a branch of artificial
intelligence that gives machines the
ability to read, understand, and
derive meaning from human
languages.
● Examples of NLP in daily life:
chatbots, virtual assistants,
autocorrect/spell-check, Google
Translate, spam filters, and
sentiment analysis on social media.
NLP components and problem types
Two components: Natural Language Understanding (NLU) and Natural Language Generation (NLG)

● NLU tasks: machine translation, sentiment analysis, named entity recognition, and question
answering.
● NLG tasks: text summarization, automatic report generation, and chatbot dialogue.

Types of problems NLP tries to solve:

● Syntax: how words are arranged, e.g. part-of-speech tagging, sentence parsing, word
segmentation.
● Semantics: meaning, e.g. word sense disambiguation, semantic similarity, semantic role
labeling.
● Discourse: how the surrounding context influences the interpretation of a sentence, e.g.
pronoun resolution.
● Pragmatics: how language is used in social settings to convey intended meaning; includes
tasks such as sentiment analysis.
2. NLP with Classification and Vector Spaces
Example 1: Sentiment Analysis with Logistic Regression
Supervised ML & sentiment analysis: extract features -> train -> predict
Vocabulary & feature extraction: each tweet becomes a vector of dimension V (vocabulary
size) with 0/1 entries marking which words are present -> sparse
Positive and negative counts
● Divide tweet corpus into two classes: positive and negative
● Count how many times each word appears in each class
● Feature extraction with frequencies: [1, sum(pos. freq.), sum(neg. freq.)], where the leading 1 is the bias term
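The frequency-based features above can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the toy tweets, the whitespace tokenizer, the choice to count each distinct word once per tweet, and the weights `theta` are all assumptions made for the example.

```python
import math
from collections import Counter

def build_freqs(tweets, labels):
    """Count how often each (word, label) pair occurs in the labeled corpus."""
    freqs = Counter()
    for tweet, label in zip(tweets, labels):
        for word in tweet.lower().split():
            freqs[(word, label)] += 1
    return freqs

def extract_features(tweet, freqs):
    """Map a tweet to [bias, sum of positive counts, sum of negative counts]."""
    words = set(tweet.lower().split())  # each distinct word is counted once
    pos = sum(freqs[(w, 1)] for w in words)
    neg = sum(freqs[(w, 0)] for w in words)
    return [1, pos, neg]

def predict(features, theta):
    """Logistic regression: sigmoid of the weighted feature sum."""
    z = sum(t * x for t, x in zip(theta, features))
    return 1.0 / (1.0 + math.exp(-z))

# Toy corpus (hypothetical): label 1 = positive, 0 = negative.
tweets = ["i am happy", "happy happy day", "i am sad", "sad bad day"]
labels = [1, 1, 0, 0]
freqs = build_freqs(tweets, labels)

features = extract_features("i am happy today", freqs)  # [1, 5, 2]
theta = [0.0, 0.5, -0.5]  # hypothetical weights; normally learned by gradient descent
probability = predict(features, theta)  # above 0.5 means "positive"
```

The three-dimensional feature vector replaces the sparse V-dimensional one, which is why training is much cheaper with this representation.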
Preprocessing: lowercasing; removing stopwords, punctuation, handles, and URLs;
stemming (e.g. the Porter algorithm); optionally lemmatization
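As a rough sketch of these preprocessing steps, the snippet below uses only the Python standard library. The stopword list is a tiny illustrative sample, and `toy_stem` is a hypothetical stand-in that strips a few suffixes; it is not the Porter algorithm, which a real pipeline would take from a library such as NLTK.

```python
import re
import string

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "the", "is", "are", "am", "i", "to", "and", "of", "at"}

def toy_stem(word):
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(tweet):
    """Lowercase, drop URLs, handles, punctuation and stopwords, then stem."""
    tweet = tweet.lower()
    tweet = re.sub(r"https?://\S+", "", tweet)  # remove URLs
    tweet = re.sub(r"@\w+", "", tweet)          # remove handles (before punctuation)
    tweet = tweet.translate(str.maketrans("", "", string.punctuation))
    return [toy_stem(t) for t in tweet.split() if t not in STOPWORDS]

tokens = preprocess("@NLPfan I am LOVING the new models at https://example.com !!!")
# tokens == ["lov", "new", "model"]
```

Handles and URLs are removed before punctuation because stripping "@", ":" and "/" first would leave their remnants behind as ordinary tokens.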
Example 2: Sentiment Analysis with Naive Bayes
● NB assumptions: words are independent of each other
given the class, and the relative frequencies of the
classes in the training data are representative
● Error sources: removed punctuation, removed words,
lost word order, adversarial attacks
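A minimal Naive Bayes sketch makes the two assumptions concrete: the prior comes from the relative class frequencies, and the per-word terms are simply added because words are treated as independent. The toy corpus and the use of Laplace (add-one) smoothing are assumptions of this example, not something stated on the slide.

```python
import math
from collections import Counter

def train_naive_bayes(docs, labels):
    """Return the log prior ratio and per-word log likelihood ratios."""
    counts = {0: Counter(), 1: Counter()}
    for doc, label in zip(docs, labels):
        counts[label].update(doc.lower().split())
    vocab = set(counts[0]) | set(counts[1])
    n_pos, n_neg = sum(counts[1].values()), sum(counts[0].values())
    # Prior: relative frequency of the two classes in the training set.
    log_prior = math.log(labels.count(1) / labels.count(0))
    log_likelihood = {}
    for w in vocab:
        # Laplace smoothing avoids zero probabilities for words seen in one class only.
        p_pos = (counts[1][w] + 1) / (n_pos + len(vocab))
        p_neg = (counts[0][w] + 1) / (n_neg + len(vocab))
        log_likelihood[w] = math.log(p_pos / p_neg)
    return log_prior, log_likelihood

def score(doc, log_prior, log_likelihood):
    """Score above 0 means positive; unseen words contribute nothing."""
    return log_prior + sum(log_likelihood.get(w, 0.0) for w in doc.lower().split())

docs = ["i am happy", "great happy day", "i am sad", "bad sad day"]
labels = [1, 1, 0, 0]
log_prior, log_likelihood = train_naive_bayes(docs, labels)

score_pos = score("happy great day", log_prior, log_likelihood)  # positive score
score_neg = score("sad bad day", log_prior, log_likelihood)      # negative score
```

Because the score is a plain sum over words, shuffling the words of a tweet gives the same result, which is exactly the word-order error source listed above.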
Example 3: Vector space models
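One common way to work with a vector space model is to represent each word by its co-occurrence counts and compare words with cosine similarity. The sketch below assumes that setup; the words, context dimensions, and counts are invented for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical co-occurrence counts with the context words ("data", "film").
vectors = {"ai": [20, 2], "ml": [30, 4], "movie": [1, 25]}

sim_ai_ml = cosine_similarity(vectors["ai"], vectors["ml"])
sim_ai_movie = cosine_similarity(vectors["ai"], vectors["movie"])
# "ai" points in nearly the same direction as "ml" and far from "movie".
```

Cosine similarity ignores vector length, so a frequent word and a rare word can still be judged similar when they occur in the same contexts.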
3. NLP with Probabilistic Models
Example 1: Hidden Markov Models
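A typical use of a Hidden Markov Model in NLP is part-of-speech tagging, decoded with the Viterbi algorithm. The following is a small sketch under that assumption; the two tags and all probabilities are made up for the example rather than estimated from data.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at step t, previous state)
    best = [{s: (start_p[s] * emit_p[s].get(observations[0], 0.0), None) for s in states}]
    for obs in observations[1:]:
        row = {}
        for s in states:
            row[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s].get(obs, 0.0), p)
                for p in states
            )
        best.append(row)
    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for row in reversed(best[1:]):
        state = row[state][1]
        path.append(state)
    return path[::-1]

# Hypothetical model parameters for a two-tag tagger.
states = ["NOUN", "VERB"]
start_p = {"NOUN": 0.7, "VERB": 0.3}
trans_p = {"NOUN": {"NOUN": 0.3, "VERB": 0.7}, "VERB": {"NOUN": 0.6, "VERB": 0.4}}
emit_p = {"NOUN": {"dogs": 0.6, "bark": 0.1}, "VERB": {"dogs": 0.05, "bark": 0.7}}

tags = viterbi(["dogs", "bark"], states, start_p, trans_p, emit_p)
# tags == ["NOUN", "VERB"]
```

Real implementations work with log probabilities to avoid numerical underflow on long sentences; plain products are kept here for readability.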
Example 2: N-grams
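An N-gram model estimates the probability of a word from the N-1 words before it. The sketch below assumes the simplest useful case, a bigram model with maximum-likelihood estimates and no smoothing; the three-sentence corpus is invented.

```python
from collections import Counter

def train_bigrams(sentences):
    """Count context words and bigrams, with <s> and </s> as sentence markers."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent word pairs
    return contexts, bigrams

def bigram_prob(prev, word, contexts, bigrams):
    """Maximum-likelihood estimate P(word | prev) = count(prev, word) / count(prev)."""
    if contexts[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / contexts[prev]

corpus = ["i like nlp", "i like pizza", "you like nlp"]
contexts, bigrams = train_bigrams(corpus)

p_nlp = bigram_prob("like", "nlp", contexts, bigrams)  # 2 of 3 "like" are followed by "nlp"
p_like = bigram_prob("i", "like", contexts, bigrams)   # "i" is always followed by "like"
```

Unseen bigrams get probability zero with this estimate, which is why practical language models add smoothing or backoff.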
4. NLP with Sequence Models
Reference: RNN by Tuan Nguyen (nttuan8.com)
https://drive.google.com/file/d/14IPsM6i6t6GH7zSyivky82DWEJC-SHB6/view?usp=sharing
5. NLP with Attention Models
Reference: Seq2seq, Attention, Self attention, Transformer, BERT by nttuan8
https://docs.google.com/presentation/d/1x9aY-LHTZ_H6tZ0AjC_oSMbOFL4FaxR8/edit#slide=id.p1

Demo: HuggingFace model


HuggingFace Transformers
HuggingFace Transformers is an open-source library that provides implementations
of many state-of-the-art NLP models. Features:

● Pretrained Models: >10,000 models on different tasks in over 100 languages.
● Interoperability: Compatible with PyTorch and TensorFlow.
● Comprehensive: covers many Transformer architectures, such as Llama, GPT-2, and RoBERTa.
● Flexibility: Fine-tuning capabilities and ability to create your own models.
● Pipeline API: Easy-to-use API for performing tasks like sentiment analysis, question
answering, and named entity recognition.
● Datasets: Provides a wide range of curated datasets for different NLP tasks.
● Tokenizers: Efficient and fast tokenization for a variety of transformer models.
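The Pipeline API can be tried in a few lines. The sketch below wraps `transformers.pipeline("sentiment-analysis")` in a small helper; the helper name `analyze_sentiment` and its optional `classifier` argument are assumptions introduced for this example so that the function can also be exercised with a stub instead of a downloaded model.

```python
def analyze_sentiment(texts, classifier=None):
    """Return (text, label, score) triples from a sentiment-analysis pipeline."""
    if classifier is None:
        # Imported lazily; requires `pip install transformers` plus PyTorch or TensorFlow.
        from transformers import pipeline
        # With no model specified, the library downloads a default model on first use.
        classifier = pipeline("sentiment-analysis")
    results = classifier(texts)  # a list of {"label": ..., "score": ...} dicts
    return [(text, r["label"], round(r["score"], 3)) for text, r in zip(texts, results)]
```

Calling `analyze_sentiment(["I love this course!", "This lecture was confusing."])` with no classifier needs an internet connection for the first model download; the exact labels and scores depend on the default model the installed library version selects.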
6. NLP applications
Applications across industries
● Healthcare: NLP aids in extracting
medical information from patient
records
● E-Commerce: chatbots help in
customer service, while sentiment
analysis can help understand
customer reviews.
● Education: intelligent tutoring
systems, essay scoring, and
plagiarism detection.
Combination with other techniques
● NLP for Code
○ Understanding and generating code,
aiding in automated debugging, and
code reviews.
○ Intelligent coding assistants that provide
recommendations to developers.
● NLP in Computer Vision
○ Systems that can understand and
describe the content of images and
videos.
○ Assistive technologies for the visually
impaired.
Concerns
● Data privacy and security: models may process
sensitive information (emails, confidential reports)
● Bias: unfair or harmful outputs that affect
certain groups disproportionately
● Misinterpretation: models struggle with
context, sarcasm, irony,
and cultural nuances
● Ethical concerns: fake news generation,
privacy invasion
● Language resources: uneven data coverage leaves
certain languages and communities behind
● Over-reliance on technology: less human
interaction and reduced language
learning/practice
7. NLP study resources
Courses
https://web.stanford.edu/class/cs224n/
● Probabilistic models & word vectors: Word2Vec -> GloVe, Gensim, ...
● Deep parsing: Universal Dependencies (UD)
● (Additional parts for LLMs)
http://web.stanford.edu/class/cs224u/
https://people.cs.umass.edu/~miyyer/cs685/
https://www.phontron.com/class/anlp2022/
https://github.com/fastai/course-nlp
https://huggingface.co/learn/nlp-course/chapter1/1
Books, videos, and repositories
Book: https://web.stanford.edu/~jurafsky/slp3/
Videos:
● https://www.nlpdemystified.org/course (Video-based)
● NLP YouTube playlist:
https://www.youtube.com/playlist?list=PLNvQn5fLVQdhWMqZWOdBFBZpKgPdTVGVF
Repos:
● https://github.com/keon/awesome-nlp
● https://github.com/nlp-with-transformers/notebooks
References
● Natural Language Processing Specialization
● Natural-Language-Processing-Specialization - GitHub
● Tuan Nguyen’s slides
● ChatGPT :)))