NLP Text Classification (Week 4)
NLP Machine Learning

Text Classification

Natural Language Processing
1. Automatic or semi-automatic processing of human language
2. Can be used for various applications, such as:
   a. Sentiment Analysis
   b. Intent Classification
   c. Topic Labeling
General Process
Data → Features → Model → Prediction → Output
● Pre-process the data into the desired text format
● Transform the text into vectors (numbers)
● Feed the data to the model
● Set the prediction criteria once the model converges
● Output the class
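A minimal sketch of this pipeline in Python with scikit-learn (an assumed library choice for illustration; the slides do not prescribe a framework for this overview). The toy texts and labels are hypothetical.

# Minimal sketch of the general process: Data -> Features -> Model -> Prediction -> Output.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy labeled data (hypothetical): pre-processed text plus a class tag.
texts = ["free prize, claim now", "meeting agenda for monday", "win money fast"]
labels = ["spam", "ham", "spam"]

pipeline = Pipeline([
    ("features", TfidfVectorizer()),   # transform the text into vectors (numbers)
    ("model", MultinomialNB()),        # feed the vectors to the model
])
pipeline.fit(texts, labels)

# Once trained, the model outputs the class for new text.
print(pipeline.predict(["claim your free money"]))   # e.g. ['spam']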
Dataset / Text Corpus
- The collection of texts (and its vocabulary) used to train the model
● Either tagged (for supervised learning) or untagged (for unsupervised learning)
● Size depends on the algorithm used
● Should be pre-processed to remove unwanted characters, convert to the desired format, etc.
Dataset / Text Corpus
- Open-source dataset samples
● Amazon Reviews
● NYSK Dataset (news articles)
● Enron Email Dataset
● Ling-Spam Dataset
Feature Extraction
- Transforms texts to numbers (vector
space model)
- Choices:
● One-hot encoding

● Bag-of-words + TF*IDF

● Word2vec
One-hot encoding
- Creates a binary vector for each word: a 1 at the word's index in the vocabulary, 0 elsewhere
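A minimal sketch of one-hot encoding, assuming a small hypothetical vocabulary:

# One-hot encoding: a binary vector with 1 at the word's index in the vocabulary.
vocab = ["eat", "live", "to"]                      # toy vocabulary (hypothetical)
index = {word: i for i, word in enumerate(vocab)}  # index of each word

def one_hot(word):
    """Return a binary vector with 1 at the word's index, 0 elsewhere."""
    vec = [0] * len(vocab)
    vec[index[word]] = 1
    return vec

print(one_hot("live"))  # [0, 1, 0]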
Bag-of-words
- Uses the count of each word in a document as its feature value
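A minimal bag-of-words sketch, assuming scikit-learn's CountVectorizer and a hypothetical two-document corpus:

# Bag-of-words: each document becomes a vector of word counts.
from sklearn.feature_extraction.text import CountVectorizer

docs = ["live to eat", "eat to live to eat"]       # toy corpus (hypothetical)
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)            # one row of counts per document

print(vectorizer.get_feature_names_out())          # ['eat' 'live' 'to']
print(counts.toarray())                            # [[1 1 1]
                                                    #  [2 1 2]]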
TF*IDF
- Term Frequency * Inverse Document Frequency
● Words that occur frequently across documents are typically less important and get lower weights (stopwords such as "is", "are", "the", etc.)
● Weights are assigned per word
BOW + TF*IDF
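A minimal sketch of the BOW + TF*IDF combination, assuming scikit-learn (CountVectorizer plus TfidfTransformer) and a hypothetical toy corpus:

# BOW + TF*IDF: tf-idf(t, d) = tf(t, d) * idf(t), where idf downweights words
# that appear in many documents (e.g. stopwords such as "is", "are", "the").
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

docs = ["the cat sat on the mat", "the dog ate the food"]  # toy corpus (hypothetical)

counts = CountVectorizer().fit_transform(docs)    # bag-of-words counts
tfidf = TfidfTransformer().fit_transform(counts)  # reweight counts by inverse document frequency

print(tfidf.toarray().round(2))                   # one weighted vector per document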
word2vec
- Uses the weights of the hidden layer of a neural network as the features of the words
● Can predict a word from its context, or the context from a word, based on nearby words in the corpus
● Uses a continuous bag-of-words or skip-gram model with a 1-1-1 (input-hidden-output) neural network

word2vec
- Gives better semantic/syntactic relationships between words through the learned vectors
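A minimal word2vec sketch, assuming the gensim library and a hypothetical toy corpus (a real corpus should be far larger):

# word2vec: the hidden-layer weights learned by the network become the word vectors.
from gensim.models import Word2Vec

# Toy tokenized corpus (hypothetical).
sentences = [
    ["live", "to", "eat"],
    ["eat", "to", "live"],
    ["dogs", "eat", "food"],
]

# sg=0 -> continuous bag-of-words, sg=1 -> skip-gram.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=50)

print(model.wv["eat"][:5])                    # first dimensions of the embedding for "eat"
print(model.wv.most_similar("eat", topn=2))   # nearest words by cosine similarity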
Schedule
Machine Learning Model
- A classifier algorithm that transforms an input into the desired class
● Naive Bayes
● K-nearest neighbors
● Multilayer Perceptron
● Recurrent Neural Network + Long Short-Term Memory
Naive Bayes
- Probabilistic model that relies on word counts
● Uses bag-of-words as features
● Assumes that the position of words doesn't matter and that words are independent of each other
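A minimal Naive Bayes sketch over bag-of-words counts, assuming scikit-learn and hypothetical toy data:

# Naive Bayes over word counts: word order is discarded, matching the independence assumption.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["win a free prize now", "project meeting at noon", "free money, win now"]  # hypothetical
labels = ["spam", "ham", "spam"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)         # bag-of-words features
model = MultinomialNB().fit(X, labels)

test = vectorizer.transform(["free prize meeting"])
print(model.predict(test))                  # most probable class
print(model.predict_proba(test).round(3))   # per-class probabilities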
K-Nearest Neighbors
- Assigns the class based on the nearest distances to examples of known classes
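A minimal k-nearest-neighbors sketch, assuming scikit-learn, TF*IDF features, and hypothetical toy data:

# k-NN: the predicted class is the majority class among the k closest training vectors.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier

texts = ["cheap pills online", "team standup notes", "buy cheap meds", "sprint planning notes"]  # hypothetical
labels = ["spam", "ham", "spam", "ham"]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

knn = KNeighborsClassifier(n_neighbors=3).fit(X, labels)
print(knn.predict(vectorizer.transform(["cheap pills for sale"])))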
Multilayer Perceptron
- A feed-forward neural network
● Has at least one hidden layer
● Sigmoid function - binary classification
● Softmax function - multiclass classification
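A minimal multilayer perceptron sketch, assuming scikit-learn's MLPClassifier and hypothetical toy data; scikit-learn uses a logistic (sigmoid) output for binary problems and softmax for multiclass automatically, while "logistic" below is the hidden-layer activation:

# Feed-forward MLP with two hidden layers over TF*IDF features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier

texts = ["claim your prize", "quarterly report attached", "you won a prize", "see report attached"]  # hypothetical
labels = [1, 0, 1, 0]   # 1 = spam, 0 = ham (binary classification)

X = TfidfVectorizer().fit_transform(texts)

mlp = MLPClassifier(hidden_layer_sizes=(16, 8), activation="logistic",
                    max_iter=2000, random_state=0)
mlp.fit(X, labels)
print(mlp.predict(X))   # predicted classes for the training texts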
Assessment

Option 1
● Features: BOW + TF*IDF
● ML Algorithm: Naive Bayes
● Pros: Easier to implement
● Cons: Uses word counts instead of word sequence; e.g. "Live to eat" and "Eat to live" may be treated as the same

Option 2
● Features: word2vec word embeddings
● ML Algorithm: Multilayer Perceptron
● Pros: Produces better results, semantically and syntactically
● Cons: Needs a big labeled dataset to perform well
Main Blocks

ML.NET Learning Curve
- Still studying the framework.
- Not as well documented as Python frameworks/libraries.
- Ex.: It has a method called TextCatalog.FeaturizeText(), but there is no indication of the kind of feature extraction it performs.

Supervised Learning Needs Big Data
- We can use open-source datasets for benchmarking.
- But we need datasets with specific labels for the algorithm to work.
Main Blocks

Model Update Criteria
- Retraining the model for every unknown word is impractical.
- Suggestions (a small sketch follows this list):
  - Set a minimum number of occurrences of a new word before the model is retrained.
  - Ignore rare new words, since they may not affect the overall intent, sentiment, or meaning of the text.
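A minimal sketch of the suggested retraining criterion; the threshold value and helper names are assumptions for illustration, not part of the slides:

# Track occurrences of unseen words and retrain only when a new word is frequent enough.
from collections import Counter

known_vocab = {"meeting", "report", "prize", "free"}   # words the current model was trained on
new_word_counts = Counter()                            # occurrences of unseen words since last training

MIN_OCCURRENCES = 20   # assumed threshold before retraining is considered worthwhile

def observe(tokens):
    """Count unseen words; return True when retraining looks worthwhile."""
    for token in tokens:
        if token not in known_vocab:
            new_word_counts[token] += 1
    # Rare new words are ignored; only frequent ones trigger a retrain.
    return any(count >= MIN_OCCURRENCES for count in new_word_counts.values())

if observe("please review the onboarding checklist".split()):
    print("retrain the model")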
Implementation Plan
- Email Cleaner (a rough sketch follows this list)
  - Clean special characters, HTML tags, the header and footer of the email, etc.
  - Set a standard file format (tsv, csv, txt, etc., or transform to bin)
  - Use a spam dataset in the meantime as a benchmark (binary classification)
- Sentence Tokenizer + Feature Extraction
  - Divide emails into sentences + word2vec
- Create Neural Network
  - 1 input, 2 hidden, 1 output layer
  - Activation function: sigmoid
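A rough sketch of the email cleaner and a naive sentence tokenizer; the specific regex rules are assumptions for illustration, not a final design:

# Email cleaner: strip HTML, simple header lines, and special characters, then split into sentences.
import re

def clean_email(raw):
    text = re.sub(r"(?is)<style.*?</style>|<script.*?</script>", " ", raw)       # drop embedded CSS/JS
    text = re.sub(r"<[^>]+>", " ", text)                                          # strip remaining HTML tags
    text = re.sub(r"(?im)^(from|to|subject|date):.*$", " ", text)                 # drop simple header lines
    text = re.sub(r"[^A-Za-z0-9.,!?'\s]", " ", text)                              # remove special characters
    return re.sub(r"\s+", " ", text).strip()                                      # normalize whitespace

def split_sentences(text):
    """Naive sentence tokenizer; a library tokenizer could be used instead."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

raw = "Subject: Hello\n<p>Win a <b>free</b> prize! Claim it today.</p>"
print(split_sentences(clean_email(raw)))   # ['Win a free prize!', 'Claim it today.']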
References
[1] D. Jurafsky and J. H. Martin, Speech and Language Processing. Upper Saddle River, NJ: Pearson Prentice Hall, 2009.
[2] https://developers.google.com/machine-learning/
[3] Various Stack Overflow / Stack Exchange / Kaggle threads
[4] Various Medium posts
