
Machine Learning NumPy

School of AI Kuala Lumpur

Husein Zolkepli
Bayes' theorem for text classification

Likelihood probability: the probability of vector X when class C occurs.
Prior probability: the probability of class C.
Posterior probability: the probability of class C occurring when the vector is X.
Marginal probability: the probability of vector X; in most cases it is unobserved.
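
For reference, the four quantities above fit together as Bayes' theorem (posterior = likelihood * prior / marginal):

P(C | X) = P(X | C) * P(C) / P(X)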
Rebranding Bayes' theorem
Text classification

index | i | like | chicken | meat | label
------+---+------+---------+------+------
  1   | 1 |  1   |    1    |  0   |   0
  2   | 1 |  1   |    0    |  1   |   1
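
A minimal sketch of how a table like this could feed a naive Bayes classifier in NumPy. The Laplace smoothing constant, the function names, and the prediction example are my own assumptions for illustration, not taken from the slides.

import numpy as np

# Toy bag-of-words table from the slide: columns = [i, like, chicken, meat]
X = np.array([[1, 1, 1, 0],
              [1, 1, 0, 1]])
y = np.array([0, 1])

def train_naive_bayes(X, y, alpha=1.0):
    """Return log priors and log likelihoods with Laplace smoothing (alpha)."""
    classes = np.unique(y)
    log_prior = np.log(np.array([(y == c).mean() for c in classes]))
    # P(word | class): smoothed word counts per class, normalised per row
    counts = np.array([X[y == c].sum(axis=0) for c in classes]) + alpha
    log_likelihood = np.log(counts / counts.sum(axis=1, keepdims=True))
    return classes, log_prior, log_likelihood

def predict(x, classes, log_prior, log_likelihood):
    # posterior is proportional to prior * product of word likelihoods (log space)
    scores = log_prior + (log_likelihood * x).sum(axis=1)
    return classes[np.argmax(scores)]

classes, log_prior, log_lik = train_naive_bayes(X, y)
print(predict(np.array([1, 1, 0, 1]), classes, log_prior, log_lik))  # -> 1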
K-means

1. Initialise random centroids, or use k-means++.

2. Keep iterating: calculate the distance between every individual and every centroid, assign each individual to its nearest centroid, and move each centroid to the mean of the individuals clustered around it.
K-means

3. To calculate the elbow: run K-means for N different numbers of clusters; on every run, calculate the sum of distances between each centroid and the individuals grouped under it, and plot that sum against the number of clusters (a NumPy sketch of the clustering loop and the elbow follows below).
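
A minimal NumPy sketch of steps 1-3, assuming plain random initialisation (not k-means++), Euclidean distance, and made-up sample data; the function and variable names are my own.

import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain K-means: random initial centroids, then assign/update repeatedly."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]   # step 1
    for _ in range(n_iter):                                     # step 2
        # distance from every point to every centroid, shape (n_points, k)
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)          # nearest centroid per point
        for c in range(k):                    # move centroids to cluster means
            if np.any(labels == c):
                centroids[c] = X[labels == c].mean(axis=0)
    # inertia: sum of distances between points and their own centroid (step 3)
    inertia = np.linalg.norm(X - centroids[labels], axis=1).sum()
    return centroids, labels, inertia

# Elbow: run K-means for several k and look where inertia stops dropping quickly.
X = np.random.default_rng(1).normal(size=(200, 2))
inertias = [kmeans(X, k)[2] for k in range(1, 8)]
print(inertias)  # plot these against k and pick the bend (the elbow)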
Principal Component Analysis

1. Visualization

Height, x | Weight, y | BMI, z | Score, a | Hair length, b | Age, c | Steps, d

It does not make sense to plot this table in a vector space as it is; we have 7 dimensions!
Principal Component Analysis

2. Reduce noise

Let's say you want to study the stress level of a student based on:

Height, x | Weight, y | BMI, z | Score, a | Hair length, b | Age, c | Steps, d

Not all of these 7 dimensions carry important information! We want to reject some attributes. Maybe 7 dimensions do not hurt much, but what happens if you have 512 * 512 * 3 (image) dimensions?! Insane!
Principal Component Analysis

3. Reduce memory (computer science)

Height, x | Weight, y | BMI, z | Score, a | Hair length, b | Age, c | Steps, d

Let's say a float takes 10 bytes, and we have 7 columns and 1 billion rows:

7 * 1,000,000,000 * 10 = 70,000,000,000 bytes ≈ 70 GB!

Dropping a column will save us 10 GB of memory!
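
A quick check of that arithmetic; the 10 bytes per float is the slide's hypothetical figure, while real NumPy floats are 4 or 8 bytes (float32/float64).

import numpy as np

rows, cols = 1_000_000_000, 7
bytes_per_float = 10                                   # hypothetical size from the slide
print(rows * cols * bytes_per_float)                   # 70,000,000,000 bytes ~ 70 GB
print(rows * bytes_per_float)                          # 10,000,000,000 bytes saved per dropped column

# Same calculation with a real dtype:
print(rows * cols * np.dtype(np.float64).itemsize)     # 56,000,000,000 bytes ~ 56 GB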


Principal Component Analysis

I have data points.

Let's say this plane is R^n, but we only visualize it in R^2. I want to visualize the data points along axis 0, which is the x-axis.
Principal Component Analysis

Projected onto axis 0 alone, we cannot distinguish between the oranges and the blues! How about the second axis, the y-axis?
Principal Component Analysis

The y-axis is quite okay; only a few data points overlap each other. But we don't want any overlapping, right?!
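
A small sketch of this projection experiment with made-up two-class data, checking how well the raw x- and y-axes separate the two colours; the data, the separation score, and the names are my own illustration, not the slide's.

import numpy as np

rng = np.random.default_rng(0)
# Two synthetic classes that mostly differ along the vertical direction.
blues = rng.normal(loc=[0.0, 0.0], scale=[1.0, 0.5], size=(100, 2))
oranges = rng.normal(loc=[0.5, 2.0], scale=[1.0, 0.5], size=(100, 2))

def separation(proj_blue, proj_orange):
    """Gap between class means relative to spread: higher = easier to distinguish."""
    gap = abs(proj_blue.mean() - proj_orange.mean())
    spread = np.sqrt((proj_blue.var() + proj_orange.var()) / 2)
    return gap / spread

print("axis 0 (x):", separation(blues[:, 0], oranges[:, 0]))  # poor separation
print("axis 1 (y):", separation(blues[:, 1], oranges[:, 1]))  # much better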
Principal Component Analysis

Eigenvector, R1, of our covariance matrix.
Principal Component Analysis

I'm too tired, man, to draw these one by one :(

How to make sense of it?
Principal Component Analysis

Example covariance matrices:

[5,  4]        [ 5, -4]
[4,  6]        [-4,  6]

Here the value 1 is the variance along the y-axis, and the zero off-diagonals mean no correlation:

[5, 0]
[0, 1]
Principal Component Analysis

For the matrix

[5, 0]
[0, 1]

the eigenvector [1., 0.] has eigenvalue lambda = 5 (and [0., 1.] has lambda = 1):

l, v = np.linalg.eig(np.array([[5, 0], [0, 1]]))
l, v
(array([5., 1.]), array([[1., 0.],
       [0., 1.]]))
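
Tying it together: a minimal PCA sketch with NumPy that centres the data, takes the covariance matrix, eigendecomposes it, and projects onto the leading eigenvector. The synthetic data and the variable names here are assumptions for illustration.

import numpy as np

rng = np.random.default_rng(0)
# Correlated 2-D data: the interesting direction is a diagonal, not a raw axis.
X = rng.normal(size=(300, 2)) @ np.array([[2.0, 1.2],
                                          [1.2, 1.0]])

# 1. Centre the data.
X_centered = X - X.mean(axis=0)

# 2. Covariance matrix (2 x 2 here).
cov = np.cov(X_centered, rowvar=False)

# 3. Eigendecomposition; eigh because the covariance matrix is symmetric.
eigvals, eigvecs = np.linalg.eigh(cov)

# 4. Sort eigenvectors by decreasing eigenvalue and keep the first one.
order = np.argsort(eigvals)[::-1]
first_pc = eigvecs[:, order[0]]

# 5. Project every point onto the first principal component.
projected = X_centered @ first_pc

print("explained variance ratio:", eigvals[order[0]] / eigvals.sum())
print("first 5 projections:", projected[:5])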
