Text Classification Using NLP
TEXT CLASSIFICATION USING NLP
PRESENTATION BY:
SURYANSH DEV(54912)
HRITIK BATHLA(54892)
ANJALI PANDEY(54886)
TOPICS WE WILL BE DISCUSSING:
What is Text Classification?
What is NLP?
Why NLP?
Text Classification with Code Examples
Text Classification Algorithms
Conclusion.
What is Text Classification?
Text Classification: Text classification, also known as text tagging or text
categorization, is the process of categorizing text into organized
groups. By using Natural Language Processing (NLP), text classifiers can
automatically analyze text and then assign a set of pre-defined tags or
categories based on its content.
Corpus: A corpus is simply a collection of documents.
For example, suppose you are studying a topic in computer science, say
operating systems, and you start collecting related material from the web,
from research papers, and so on. By the end you might have around 100 to
115 documents in hand. That collection of documents is called a corpus,
and you could name it the operating system corpus.
What is NLP?
Natural language processing (NLP) refers to the branch of computer science—
and more specifically, the branch of artificial intelligence or AI—concerned with
giving computers the ability to understand text and spoken words in much the
same way human beings can.
NLP combines computational linguistics—rule-based modeling of human
language—with statistical, machine learning, and deep learning models.
Together, these technologies enable computers to process human language in
the form of text or voice data and to ‘understand’ its full meaning, complete
with the speaker or writer’s intent and sentiment.
NLP drives computer programs that translate text from one language to
another, respond to spoken commands, and summarize large volumes of text
rapidly—even in real time. There’s a good chance you’ve interacted with NLP in
the form of voice-operated GPS systems, digital assistants, speech-to-text
dictation software, customer service chatbots, and other consumer
conveniences. But NLP also plays a growing role in enterprise solutions that help
streamline business operations, increase employee productivity, and simplify
mission-critical business processes.
Why NLP?
NLP is all about processing text data.
There is an abundance of text data, and almost all of it is unstructured.
To make use of this information and build useful applications, one needs
to know how to process this data.
That is where NLP comes into the picture.
Processing text data is essential in almost every search engine, for
example in query completion.
Text preprocessing for NLP typically includes the following steps (an
illustrative cleaning sketch follows the list):
▪ Text cleaning
▪ Stopword removal
▪ Stemming and lemmatization
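As a rough illustration of text cleaning and stopword removal, here is a small
sketch using NLTK; it assumes NLTK is installed and its stopwords corpus has
been downloaded, and the sample sentence is invented for the example.

import re
import nltk
from nltk.corpus import stopwords

nltk.download("stopwords", quiet=True)
stop_words = set(stopwords.words("english"))

def clean_text(text):
    # Lowercase and keep only letters and spaces.
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    # Drop stopwords such as "the", "is", "and".
    tokens = [w for w in text.split() if w not in stop_words]
    return " ".join(tokens)

print(clean_text("The operating system schedules 100 processes, and it is fast!"))
# expected output: "operating system schedules processes fast"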
Text Classification with Code Examples
Stemming Code
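The original code slide is not reproduced here; below is an illustrative
stemming sketch using NLTK's PorterStemmer, with an invented word list.

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
words = ["draft", "drafted", "drafts", "drafting", "studies", "studying"]
# Stemming chops word endings by rule, so results are not always real words.
print([stemmer.stem(w) for w in words])
# expected output: ['draft', 'draft', 'draft', 'draft', 'studi', 'studi']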
Lemmatization Code
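Likewise, the lemmatization slide's code is not included; here is an
illustrative sketch using NLTK's WordNetLemmatizer (it requires the WordNet
corpus to be downloaded), again with invented example words.

import nltk
from nltk.stem import WordNetLemmatizer

nltk.download("wordnet", quiet=True)
lemmatizer = WordNetLemmatizer()

# pos="v" tells the lemmatizer to treat the words as verbs.
print([lemmatizer.lemmatize(w, pos="v") for w in ["drafted", "drafting", "drafts"]])
# expected output: ['draft', 'draft', 'draft']

# Unlike stemming, lemmatization returns dictionary words.
print(lemmatizer.lemmatize("studies", pos="n"))
# expected output: 'study'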
Text Classification Algorithms
1. Naïve Bayes:
The Naive Bayes algorithm is a probabilistic classifier that makes use of Bayes' Theorem – a
rule that uses probability to predict the tag of a text based on prior knowledge of conditions
that might be related. It calculates the probability of each tag for a given text, and then
predicts the tag with the highest probability.
You can also improve Naive Bayes’ performance by applying various techniques:
• Removing stopwords: common words that don't add value, for example 'such
as', 'able to', 'either', 'else', 'ever', etc.
• Lemmatizing words: Grouping different inflections of the same word. For example, draft,
drafted, drafts, drafting, etc.
• N-grams: An n-gram is a sequence of n consecutive words; n-gram models
estimate the probability of such a sequence appearing within a text.
• TF-IDF: Short for term frequency-inverse document frequency, TF-IDF is a metric that
quantifies how important a word is to a document in a document set. It is very powerful
when used to score words, i.e. it increases proportionally to the number of times a specific
word appears in a document, but is offset by the number of documents that contain said
word.
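To make this concrete, here is a minimal, hypothetical sketch of a Naive Bayes
text classifier built with scikit-learn's TfidfVectorizer and MultinomialNB;
the tiny training texts and tags below are invented purely for illustration.

# Naive Bayes picks the tag t that maximizes P(t | text) ~ P(text | t) * P(t).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = [
    "process scheduling in the operating system",
    "memory management and paging",
    "stock prices fell sharply today",
    "quarterly earnings beat market expectations",
]
train_tags = ["os", "os", "finance", "finance"]

# TF-IDF turns each document into weighted word scores (stopwords removed,
# unigrams and bigrams as features); MultinomialNB then predicts the most
# probable tag for a new text.
model = make_pipeline(
    TfidfVectorizer(stop_words="english", ngram_range=(1, 2)),
    MultinomialNB(),
)
model.fit(train_texts, train_tags)

print(model.predict(["the scheduler allocates cpu time to each process"]))
# expected output: ['os']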
2. Support Vector Machines
Support Vector Machines (SVM) is a classification algorithm that performs
at its best when handling a limited amount of data. It finds the boundary
that best separates the vectors belonging to a given group or category from
the vectors that do not belong to it.
For example, let's say you have two tags, expensive and cheap, and the
data has two features, x and y. Each data point is a pair of coordinates
(x, y) that must be classified as either expensive or cheap. To do this,
SVM draws a dividing line between the data points, known as the decision
boundary, and classifies everything that falls on one side as expensive
and everything that falls on the other side as cheap.
The decision boundary divides a space into two subspaces, one for
vectors that belong to a group and another for vectors that don’t belong
to that group. Here, vectors represent training text and a group
represents the tag you use to tag your texts. A perk of using SVM is that it
doesn't require a lot of training data to produce accurate results, although
it does require more computational resources than Naive Bayes.
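Here is a hedged sketch of the same idea in code, using scikit-learn's
LinearSVC on TF-IDF vectors; the "expensive"/"cheap" product descriptions
are made up for illustration.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = [
    "hand-stitched leather briefcase with gold fittings",
    "limited edition designer watch",
    "plastic keychain two for one",
    "budget ballpoint pens sold in bulk",
]
tags = ["expensive", "expensive", "cheap", "cheap"]

# LinearSVC learns the decision boundary (a hyperplane in TF-IDF space) that
# separates the "expensive" vectors from the "cheap" ones with the widest margin.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(texts, tags)

print(model.predict(["designer leather watch"]))
# expected output: ['expensive']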
3. Deep Learning
Deep Learning comprises algorithms and techniques designed to mimic the
human brain. For text classification, two deep learning models are widely
used: Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN).
CNN is a type of neural network that consists of an input layer, an output layer, and multiple
hidden layers that are made of convolutional layers. Convolutional layers are the major
building blocks used in CNN, while a convolution is the linear operation that automatically
applies filters to inputs – resulting in activations. These complex layers are key ingredients in a
convolutional neural network as they assign importance to various inputs and differentiate
one from the other.
Within the context of text classification, a CNN applies its filters to words
or n-grams to extract high-level features.
RNNs are specialized neural networks that process sequential information. At
each step of an input sequence, the RNN's computation is conditioned on the
outputs of the previous steps. The key advantage of an RNN is its ability to
memorize the results of previous computations and use them for the current one.
It’s important to remember that deep learning algorithms require millions of tagged
examples, as they work best when fed more data.
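As a rough sketch of the CNN architecture described above, written with Keras:
the vocabulary size, sequence length, number of tags, and layer sizes are
arbitrary assumptions, and actual training would require a large tagged corpus.

from tensorflow.keras import layers, models

vocab_size = 20000   # assumed vocabulary size
seq_len = 200        # assumed (padded) document length in tokens
num_tags = 5         # assumed number of categories

model = models.Sequential([
    layers.Input(shape=(seq_len,)),
    layers.Embedding(vocab_size, 128),            # word embeddings
    layers.Conv1D(64, 5, activation="relu"),      # filters slide over word windows (n-grams)
    layers.GlobalMaxPooling1D(),                  # keep the strongest activation per filter
    layers.Dense(64, activation="relu"),
    layers.Dense(num_tags, activation="softmax"), # one probability per tag
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(padded_token_ids, tag_ids, epochs=5)  # needs a large tagged dataset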
Thank YOU.