Day19 Machine Learning


NATIONAL INSTITUTE OF ELECTRONICS AND INFORMATION TECHNOLOGY

Sumit Complex, A-1/9, Vibhuti Khand, Gomti Nagar, Lucknow,


Machine Learning using Python


1 Day 19
Course: Machine Learning using Python
Module: Day 19
2 Index
• Text Classification
• Applications of NLP
• Text transformation
• Step 1: Collect Data
• Step 2: Design the Vocabulary
• Step 3: Sparse Matrix
• Text Processing using Python
• References
3 Text Classification
• Text classification, also known as text tagging or text categorization, is the process of sorting
text into organized groups. Using Natural Language Processing (NLP), a text classifier can
automatically analyze text and assign it one or more predefined tags or categories based on its
content.
• Natural Language Processing
   - NLP stands for Natural Language Processing, a field that draws on computer science,
   linguistics, and artificial intelligence. It is the technology machines use to understand,
   analyze, manipulate, and interpret human language. It supports tasks such as translation,
   automatic summarization, Named Entity Recognition (NER), speech recognition, relationship
   extraction, and topic segmentation.
4 Applications of NLP
• Question Answering
• Spam Detection
• Sentiment Analysis
• Machine Translation
• Speech Recognition
• Chatbot
5 Text transformation
• Text transformation here means converting raw text into a numeric form that a machine
learning algorithm can process.
• Bag-of-Words Model
   - The bag-of-words model represents text data for machine learning algorithms. It is simple
   to understand and implement, and it has been used successfully in language modeling and
   text classification.
   - A bag-of-words is a representation of text that describes the occurrence of words within a
   document, ignoring their order. It involves two things:
      - A vocabulary of known words.
      - A measure of the presence of known words.
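The two ingredients of a bag-of-words (a vocabulary and a measure of presence) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the vocabulary is fixed by hand, the sentence in `document` is made up for the example, and the tokenizer is deliberately naive.

```python
from collections import Counter

# Ingredient 1: a vocabulary of known words (fixed by hand for this sketch).
vocabulary = ["good", "job", "done", "very", "bad"]

# A made-up document to encode (hypothetical, for illustration only).
document = "Very good job, very good"

# Naive tokenization: lowercase, drop commas, split on whitespace.
tokens = document.lower().replace(",", "").split()
word_counts = Counter(tokens)

# Ingredient 2a: how many times each known word occurs.
counts = [word_counts[word] for word in vocabulary]

# Ingredient 2b: whether each known word occurs at all (1) or not (0).
presence = [1 if word_counts[word] > 0 else 0 for word in vocabulary]

print(counts)    # [2, 1, 0, 2, 0]
print(presence)  # [1, 1, 0, 1, 0]
```

Word order is lost in both encodings, which is why the model is called a "bag" of words.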
6 Step 1: Collect Data
For example, suppose an Englishman says the following sentences to a group of villagers:
• Good Job Done
• Very Good Job Done
• Bad Job Done
• Very Bad Job Done
• Bad Job
7 Step 2: Design the Vocabulary
• Now we can list all of the words in our model vocabulary. The unique words here
(ignoring case and punctuation) are:

   - Good
   - Job
   - Done
   - Very
   - Bad
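The vocabulary can also be collected programmatically. The following is a minimal sketch that assumes words are runs of letters and keeps them in order of first appearance; the variable names are chosen for this example.

```python
import re

sentences = [
    "Good Job Done",
    "Very Good Job Done",
    "Bad Job Done",
    "Very Bad Job Done",
    "Bad Job",
]

vocabulary = []
for sentence in sentences:
    # Lowercase the text and keep only runs of letters, which drops punctuation.
    for word in re.findall(r"[a-z]+", sentence.lower()):
        if word not in vocabulary:
            vocabulary.append(word)

print(vocabulary)  # ['good', 'job', 'done', 'very', 'bad']
```

Keeping first-appearance order reproduces the hand-written list; a library may order the vocabulary differently (for example alphabetically).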
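8 Step 3: Sparse Matrix
• A matrix whose values are mostly zero is called a sparse matrix. With a realistic vocabulary of
thousands of words, each sentence uses only a few of them, so most entries are zero. The matrix
for the above example, with one row per sentence and one column per dictionary word, is:

Sentence             Good  Job  Done  Very  Bad
Good Job Done         1     1    1     0     0
Very Good Job Done    1     1    1     1     0
Bad Job Done          0     1    1     0     1
Very Bad Job Done     0     1    1     1     1
Bad Job               0     1    0     0     1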
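
The table above can be reproduced with a short loop. This sketch assumes the five sentences and the vocabulary order from Steps 1 and 2, and records presence (1 or 0) rather than counts.

```python
sentences = [
    "Good Job Done",
    "Very Good Job Done",
    "Bad Job Done",
    "Very Bad Job Done",
    "Bad Job",
]
vocabulary = ["good", "job", "done", "very", "bad"]

matrix = []
for sentence in sentences:
    words = sentence.lower().split()
    # One column per vocabulary word: 1 if the word is in the sentence, else 0.
    row = [1 if term in words else 0 for term in vocabulary]
    matrix.append(row)

for sentence, row in zip(sentences, matrix):
    print(f"{sentence:<20}", row)
```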
9 Continue..
• The above matrix can be written with numeric row and column indices, which is how a sparse
matrix format refers to its non-zero entries:

Row/Column index   0  1  2  3  4
0                  1  1  1  0  0
1                  1  1  1  1  0
2                  0  1  1  0  1
3                  0  1  1  1  1
4                  0  1  0  0  1
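
To make the idea of sparse storage concrete, the sketch below keeps only the non-zero entries as (row, column, value) triples and then rebuilds the full matrix from them. It is a simplified coordinate-list illustration in plain Python, not the storage scheme of any particular library.

```python
dense = [
    [1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0],
    [0, 1, 1, 0, 1],
    [0, 1, 1, 1, 1],
    [0, 1, 0, 0, 1],
]

# Keep only the positions whose value is non-zero.
nonzero = [
    (r, c, value)
    for r, row in enumerate(dense)
    for c, value in enumerate(row)
    if value != 0
]

print(len(nonzero))  # 16
print(nonzero[:3])   # [(0, 0, 1), (0, 1, 1), (0, 2, 1)]

# The dense matrix can be recovered from the triples and the shape alone.
n_rows, n_cols = len(dense), len(dense[0])
rebuilt = [[0] * n_cols for _ in range(n_rows)]
for r, c, value in nonzero:
    rebuilt[r][c] = value
```

In this toy example 16 of the 25 entries are non-zero, so little is saved; the benefit appears when the vocabulary is large and each row has only a handful of non-zero columns.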
10 Text Processing using Python
• CountVectorizer():

The CountVectorizer class from scikit-learn provides a simple way to both tokenize a collection
of text documents and build a vocabulary of known words. It can also be used to encode new
documents using that vocabulary.

Sample Text

simple_text = ["Good Job Done", "Very Good Job Done", "Bad Job Done", "Very Bad Job Done", "Bad Job"]
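11 Continue
• Building the vocabulary using CountVectorizer()

from sklearn.feature_extraction.text import CountVectorizer

vect = CountVectorizer()

# Fit: learn the "dictionary" (vocabulary) of the data provided.

vect.fit(simple_text)

• To see the dictionary made from the data

   - get_feature_names_out() - Returns an array of feature names, ordered by their column
   indices. (Older scikit-learn releases called this method get_feature_names(); it was removed
   in version 1.2.)

print(vect.get_feature_names_out())

The names are lowercased and sorted alphabetically (bad, done, good, job, very), so the column
order differs from the hand-built table in Step 3.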
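12 Continue….
• To prepare the data matrix. transform() returns a SciPy sparse matrix; printing it lists each
(row, column) position that holds a non-zero count, together with that count.

data_matrix = vect.transform(simple_text)

print(data_matrix)

• To prepare the dense matrix (a regular NumPy array that also stores the zeros)

dense_matrix = data_matrix.toarray()

print(dense_matrix)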
13 Continue…
• Converting the transformed data into a DataFrame

import pandas as pd

df = pd.DataFrame(data_matrix.toarray(), columns=vect.get_feature_names_out())

print(df)

• Note: Apply these operations to the text data on which you want to perform
Machine Learning. The resulting matrix is your training data.
14 References
• Wikipedia.org

• Tutorialspoint.com

• https://www.geeksforgeeks.org/

• https://www.kaggle.com/

• https://github.com/
15

Thank You!
