
TWP

The document provides an overview of Scikit-Learn, a powerful Python library for machine learning, along with essential libraries like Matplotlib, Seaborn, and NumPy for data visualization and manipulation. It discusses K-means clustering as an unsupervised learning algorithm, highlighting its advantages and drawbacks, as well as potential solutions for its limitations. Additionally, it mentions the use of PCA (Principal Components Analysis) in the context of machine learning.

Uploaded by

Thanh Vu Vu

You may already know Scikit-Learn (Sklearn), one of the most useful and robust
libraries for machine learning in Python. It provides a selection of efficient tools for
machine learning and statistical modeling, including classification, regression,
clustering, and dimensionality reduction, via a consistent interface in Python.
We will use the Python libraries Matplotlib, NumPy, and Seaborn to plot
and visualize data.
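To illustrate what a consistent interface means in practice, here is a minimal sketch, assuming scikit-learn is installed. The synthetic data from `make_blobs` and the particular estimators chosen are assumptions made only for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

# Synthetic data (an assumption for illustration): 150 points, 4 features, 3 groups.
X, y = make_blobs(n_samples=150, n_features=4, centers=3, random_state=0)

# Classification: fit on labeled data, then predict.
clf = LogisticRegression(max_iter=1000).fit(X, y)
class_labels = clf.predict(X)

# Clustering: the same fit/predict pattern, but no labels are passed.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
cluster_labels = km.predict(X)

# Dimensionality reduction: fit, then transform instead of predict.
X_2d = PCA(n_components=2).fit_transform(X)

print(class_labels.shape, cluster_labels.shape, X_2d.shape)
```

Whatever the task, each estimator is created with its settings and then trained with `fit`, which is why switching between models usually requires few code changes.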
#Matplotlib:
Matplotlib is a powerful plotting library in Python used for creating static, animated,
and interactive visualizations. Matplotlib’s primary purpose is to provide users with
the tools and functionality to represent data graphically, making it easier to analyze
and understand.
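As a small illustration, the sketch below draws a labeled line plot. The sample values are hypothetical, and the figure is written to an in-memory buffer only so that the script runs without a display; passing a file name to `savefig` would write it to disk instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display required
import matplotlib.pyplot as plt

# Hypothetical sample data for illustration.
x = [0, 1, 2, 3, 4]
y = [v ** 2 for v in x]

fig, ax = plt.subplots()
ax.plot(x, y, marker="o", label="y = x^2")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("A simple line plot")
ax.legend()

buf = io.BytesIO()
fig.savefig(buf, format="png")  # render the figure as PNG bytes
```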
#Seaborn:
Seaborn is a library for making statistical graphics in Python. It builds on top of
matplotlib and integrates closely with pandas data structures. Seaborn helps you
explore and understand your data. Its plotting functions operate on dataframes and
arrays containing whole datasets and internally perform the necessary semantic
mapping and statistical aggregation to produce informative plots.
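The sketch below shows this semantic mapping and aggregation on a tiny dataframe. The column names and values are invented purely for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display required
import pandas as pd
import seaborn as sns

# Hypothetical data set (values invented for illustration).
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "score": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Seaborn maps the "group" column to the x axis and aggregates "score"
# (the mean by default) for each group before drawing the bars.
ax = sns.barplot(data=df, x="group", y="score")
ax.set_title("Mean score per group")
```

Note that the code never computes the group means itself; it only names the columns, and Seaborn does the rest.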
#NumPy:
NumPy is the fundamental package for scientific computing in Python. It is a Python
library that provides a multidimensional array object, various derived objects (such
as masked arrays and matrices), and an assortment of routines for fast operations
on arrays, including mathematical, logical, shape manipulation, sorting, selecting,
I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations,
random simulation and much more.
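A few of these operations are sketched below on a small array. The values are arbitrary and chosen only for illustration.

```python
import numpy as np

# A 2-D array (values chosen arbitrarily for illustration).
a = np.array([[3, 1, 2],
              [6, 5, 4]])

print(a.shape)              # dimensions of the array
print(a + 10)               # element-wise arithmetic without a Python loop
print(a > 2)                # element-wise logical comparison
print(a.reshape(3, 2))      # shape manipulation
print(np.sort(a, axis=1))   # sort each row
print(a[a % 2 == 0])        # select the even entries
print(a.mean(axis=0))       # basic statistics: column means
print(a @ a.T)              # basic linear algebra: matrix product

rng = np.random.default_rng(seed=0)  # random simulation with a fixed seed
samples = rng.normal(size=5)
print(samples.shape)
```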
In particular, we use Pandas, a Python library for working with data sets, because it
can clean messy data sets and make them readable and relevant. Pandas lets us
analyze large data sets and draw conclusions based on statistical theory.
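A minimal sketch of such cleaning follows. The city names and temperatures are hypothetical and serve only to show typical problems: stray whitespace, inconsistent capitalization, numbers stored as text, a missing value, and a duplicate row.

```python
import pandas as pd

# Hypothetical messy data for illustration.
raw = pd.DataFrame({
    "city": [" Hanoi", "hanoi ", "Hue", "Hue", None],
    "temp_c": ["31", "29", "27", "27", "30"],
})

clean = (
    raw.dropna(subset=["city"])  # drop rows with no city
       .assign(
           city=lambda d: d["city"].str.strip().str.title(),  # normalize text
           temp_c=lambda d: d["temp_c"].astype(float),        # text -> numbers
       )
       .drop_duplicates()        # remove repeated rows
       .reset_index(drop=True)
)

# A simple statistical summary: mean temperature per city.
summary = clean.groupby("city")["temp_c"].mean()
print(summary)
```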

https://www.tutorialspoint.com/scikit_learn/scikit_learn_introduction.htm
https://seaborn.pydata.org/tutorial/introduction.html
https://www.geeksforgeeks.org/python-introduction-matplotlib/
https://numpy.org/doc/stable/user/whatisnumpy.html
Some familiar algorithms will be used here: K-means clustering,
PCA (Principal Component Analysis), and Isolation Forest. We give an overview
of them below:
#K-means clustering
K-means clustering is an unsupervised machine learning algorithm that
groups an unlabeled dataset into different clusters. This section explores the
fundamentals and workings of K-means clustering along with its implementation.
It has some drawbacks:
• The number of clusters must be specified in advance
• Convergence speed depends on the initial centroids
• Clusters need to contain approximately the same number of points
• Clusters need to be roughly circular in shape
• It has trouble when one cluster lies inside another
And the accompanying solutions:
• K-Means++ method: an efficient way to choose the initial centroids for the clusters
• Run the algorithm with different initial centroids, then choose the run that has the
minimum value of the loss function
• Elbow method: helps determine the number of clusters in a dataset
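The choice of initial centroids affects not only how quickly the iteration converges but also which solution it converges to. The from-scratch sketch below of the standard K-means iteration (Lloyd's algorithm) makes this visible. The function name `kmeans`, the four toy points, and the two starting configurations are assumptions for illustration, not part of any library.

```python
import numpy as np

def kmeans(X, init_centroids, max_iter=100):
    """Plain K-means: returns (centroids, labels, loss, iterations used)."""
    centroids = np.asarray(init_centroids, dtype=float).copy()
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its points
        # (a centroid with no points is left where it is).
        new_centroids = centroids.copy()
        for k in range(len(centroids)):
            if np.any(labels == k):
                new_centroids[k] = X[labels == k].mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    loss = float(((X - centroids[labels]) ** 2).sum())
    return centroids, labels, loss, iteration

# Toy data: corners of a wide rectangle, so the natural clusters are left and right.
X = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0]])

_, _, good_loss, _ = kmeans(X, [[0.0, 0.0], [4.0, 1.0]])
_, _, bad_loss, _ = kmeans(X, [[2.0, 0.0], [2.0, 1.0]])
print(good_loss, bad_loss)
```

With the second start the centroids are already the means of their assigned points, so the algorithm stops immediately at a top/bottom split whose loss is much higher than that of the left/right split.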
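These three remedies can be sketched together with scikit-learn's `KMeans`, assuming scikit-learn is installed. The synthetic data with three groups and the range of cluster counts tried are assumptions for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 3 well-separated groups (an assumption for illustration).
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 0], [0, 10]],
    cluster_std=1.0,
    random_state=0,
)

inertias = {}
for k in range(1, 7):
    model = KMeans(
        n_clusters=k,
        init="k-means++",  # K-Means++ initialization
        n_init=10,         # 10 runs with different starts; the lowest-loss run is kept
        random_state=0,
    ).fit(X)
    # inertia_ is the loss: sum of squared distances to the nearest centroid.
    inertias[k] = model.inertia_

for k, loss in inertias.items():
    print(k, round(loss, 1))
```

Reading the printed losses from top to bottom, the value falls steeply up to the true number of groups and only slowly afterwards. That bend is the "elbow" used to pick the number of clusters.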

# PCA
