Day 19 Machine Learning
Course: NIELIT ‘O’ Level (IT)
Course: Machine Learning using Python
Module: M2-R5: Introduction to ICT Resources
Module: Day 19
Words in the example (matrix columns 0 to 4): Good, Job, Done, Very, Bad
Step 3: Sparse Matrix
Matrices that contain mostly zero values are called sparse matrices. The sparse matrix for the
above example can be constructed as:
Row/Column index   0 (Good)   1 (Job)   2 (Done)   3 (Very)   4 (Bad)
0                  1          1         1          0          0
1                  1          1         1          1          0
2                  0          1         1          0          1
3                  0          1         1          1          1
4                  0          1         0          0          1
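The table above can be reproduced by hand. The following is a minimal sketch in plain Python, not the course's own code. It assumes the five sample sentences used later in these slides and assumes the columns are ordered by first appearance of each word, which is what the table shows. The names `vocabulary`, `matrix` and `nonzero_positions` are illustrative only.

```python
# Sample sentences taken from the slides.
simple_text = ["Good Job Done", "Very Good Job Done", "Bad Job Done",
               "Very Bad Job Done", "Bad Job"]

# Assumption: columns follow the order in which words first appear.
vocabulary = []
for sentence in simple_text:
    for word in sentence.split():
        if word not in vocabulary:
            vocabulary.append(word)

# One row per sentence, one column per word: 1 if the word occurs, else 0.
matrix = [[1 if word in sentence.split() else 0 for word in vocabulary]
          for sentence in simple_text]

print(vocabulary)
for row_index, row in enumerate(matrix):
    print(row_index, row)

# Sparse view: store only the (row, column) positions that hold a 1.
nonzero_positions = [(r, c)
                     for r, row in enumerate(matrix)
                     for c, value in enumerate(row) if value]
print(len(nonzero_positions), "non-zero entries stored")
```

A sparse format stores only the listed positions instead of every cell, which saves memory when the vocabulary is large and each document uses only a few of its words.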
Text Processing using Python
CountVectorizer():
CountVectorizer is a scikit-learn class (in sklearn.feature_extraction.text) that provides a
simple way to both tokenize a collection of text documents and build a vocabulary of known
words. It can also be used to encode new documents using that vocabulary.
Sample Text
simple_text = ["Good Job Done", "Very Good Job Done", "Bad Job Done",
               "Very Bad Job Done", "Bad Job"]
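Building the vocabulary using CountVectorizer()
from sklearn.feature_extraction.text import CountVectorizer
vect = CountVectorizer()
vect.fit(simple_text)  # learn the vocabulary from the sample text
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
print(vect.get_feature_names_out())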
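The learned vocabulary can also be inspected as a word-to-column mapping. The sketch below assumes scikit-learn is installed and default CountVectorizer settings. The variable name `vocab` is illustrative only.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Sample sentences taken from the slides.
simple_text = ["Good Job Done", "Very Good Job Done", "Bad Job Done",
               "Very Bad Job Done", "Bad Job"]

vect = CountVectorizer()
vect.fit(simple_text)

# vocabulary_ maps each token to its column index in the data matrix.
# Sorting by index makes the column order easy to read.
vocab = dict(sorted(vect.vocabulary_.items(), key=lambda item: item[1]))
print(vocab)
```

With default settings the tokens are lowercased and the columns are assigned in alphabetical order, so the column order differs from a table whose columns are labelled in order of first appearance.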
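To prepare the data matrix, i.e. the (document, word) positions that hold a non-zero count:
data_matrix = vect.transform(simple_text)  # sparse matrix of word counts
print(data_matrix)  # prints only the non-zero entries
dense_matrix = data_matrix.toarray()  # convert to a regular NumPy array
print(dense_matrix)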
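The difference between the sparse and the dense form can be explored with a short sketch. It assumes scikit-learn (and SciPy, which it depends on) is installed, and it uses fit_transform() as a shortcut for calling fit() followed by transform(). The names `coo` and `entries` are illustrative only.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Sample sentences taken from the slides.
simple_text = ["Good Job Done", "Very Good Job Done", "Bad Job Done",
               "Very Bad Job Done", "Bad Job"]

vect = CountVectorizer()
data_matrix = vect.fit_transform(simple_text)  # fit() and transform() in one call

print(data_matrix.shape)  # (number of documents, vocabulary size)
print(data_matrix.nnz)    # number of stored non-zero entries

# Coordinate format exposes each stored entry as (row, column, count).
coo = data_matrix.tocoo()
entries = sorted(zip(coo.row.tolist(), coo.col.tolist(), coo.data.tolist()))
for row, col, count in entries[:3]:
    print(f"document {row}, column {col}: count {count}")
```

Only the stored entries take up memory in the sparse form, while toarray() materialises every cell, including the zeros.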
Converting the transformed data into a DataFrame
import pandas as pd
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
df = pd.DataFrame(data_matrix.toarray(), columns=vect.get_feature_names_out())
print(df)
Note: These steps must be applied to any text data before it can be used for machine
learning. After them, the training data is ready.