K-Nearest Neighbors (KNN) Algorithm
Select the Number K of Neighbors: Decide on the value of K, which is the number of nearest
neighbors to consider for classification.
Calculate Distances: Compute the distance (usually Euclidean) between the new data point and
each point in the training dataset.
Identify K Nearest Neighbors: Select the K data points that are closest to the new data point based
on the calculated distances.
Count the Categories: For the K nearest neighbors, count the number of occurrences of each
category or class label.
Assign the New Data Point: Assign the new data point to the category with the majority vote
among the K nearest neighbors.
Model is Ready: The model is ready to classify new data points based on the majority vote of the K
nearest neighbors.
This sequence ensures that the new data point is classified according to the most common class among its
K nearest neighbors.
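The steps above can be sketched in plain Python. This is a minimal illustration of the voting procedure, not an optimized implementation; the toy dataset and function name are made up for the example.

```python
from collections import Counter
import math

def knn_classify(train, new_point, k=3):
    """Classify new_point by majority vote among its k nearest neighbors.

    train: list of (features, label) pairs; features are equal-length tuples.
    """
    # Step 2: compute the Euclidean distance to every training point.
    distances = [
        (math.dist(features, new_point), label) for features, label in train
    ]
    # Step 3: keep the k closest points.
    nearest = sorted(distances)[:k]
    # Steps 4-5: count class labels and return the majority class.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy dataset: two well-separated classes.
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B")]
print(knn_classify(train, (2, 2), k=3))  # -> "A"
```

Note that `k` is usually chosen as an odd number for two-class problems so that the vote cannot tie.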
In K-means clustering, updating the centers involves recalculating the position of each cluster’s center
(also called the centroid) after assigning data points to clusters.
Steps involved:
1. Assign each data point to its nearest cluster center.
2. For each cluster, compute the mean position of all points assigned to it.
3. Move the cluster center to this mean position.
This process is repeated iteratively until the cluster centers stabilize and no longer change significantly, or
a predefined number of iterations is reached.
Imagine you have a group of people (data points) standing in different locations (clusters).
Each group has a leader (cluster center).
To find a new leader, you calculate the average position of everyone in the group.
The new leader (center) is now at this average position, and the process helps ensure that the leader
is in the best possible spot to represent the group.
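The center-update step can be sketched as follows (a minimal illustration; the function name and toy data are made up for the example):

```python
def update_centers(points, assignments, k):
    """Recompute each cluster center as the mean of its assigned points."""
    dims = len(points[0])
    sums = [[0.0] * dims for _ in range(k)]
    counts = [0] * k
    for point, cluster in zip(points, assignments):
        counts[cluster] += 1
        for d in range(dims):
            sums[cluster][d] += point[d]
    # Mean position per cluster; an empty cluster keeps a zero center here.
    return [
        [s / counts[c] if counts[c] else 0.0 for s in sums[c]]
        for c in range(k)
    ]

points = [(1, 1), (2, 2), (9, 9), (10, 10)]
assignments = [0, 0, 1, 1]
print(update_centers(points, assignments, 2))  # -> [[1.5, 1.5], [9.5, 9.5]]
```

In full K-means, this update alternates with reassigning points to their nearest center until the centers stop moving.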
Difference between KNN and K-Means
Purpose: KNN predicts the class (or value) of a new data point; K-Means groups unlabeled data into K clusters.
Type of Learning: KNN is supervised (it requires labeled training data); K-Means is unsupervised (it works on unlabeled data).
Density-Based Clustering:
Definition: Groups data based on how dense (crowded) the data points are.
How It Works: Finds clusters in areas where data points are closely packed together. It can handle
clusters of different shapes and is good at finding outliers (points that don’t fit any cluster).
Challenge: Can be tricky with data that has uneven density or many dimensions.
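The density idea can be sketched with a minimal DBSCAN-style routine in plain Python. This is a simplified teaching sketch (the function name and parameters `eps` and `min_pts` follow DBSCAN convention; the toy data is made up), not a production implementation.

```python
import math

def dbscan(points, eps=1.5, min_pts=3):
    """Minimal DBSCAN sketch: labels[i] is a cluster id, or -1 for outliers."""
    n = len(points)
    labels = [None] * n          # None = not yet visited
    cluster = -1

    def neighbors(i):
        # All points within radius eps of point i (including i itself).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:      # not dense enough: mark as noise
            labels[i] = -1
            continue
        cluster += 1                  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:       # noise can be claimed as a border point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbors(j)
            if len(more) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(more)
    return labels

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(dbscan(pts))  # dense square -> one cluster; (10, 10) -> outlier -1
```

The isolated point (10, 10) has too few neighbors to be dense, so it is labeled -1 (an outlier), illustrating why density-based methods are good at spotting points that fit no cluster.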
Distribution-Based Clustering:
Definition: Groups data by assuming that the data follows specific statistical distributions.
Common Distribution: Gaussian (bell-shaped curve).
Example: Gaussian Mixture Models (GMM) with Expectation-Maximization.
How It Works: Assumes that the data comes from a mix of several distributions. It tries to estimate
the best-fit distributions to form clusters.
Hierarchical Clustering:
Definition: Builds clusters in a tree-like structure without needing to decide the number of clusters
beforehand.
Types:
o Agglomerative: Starts with individual data points and combines them into larger clusters.
o Divisive: Starts with one big cluster and splits it into smaller ones.
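The agglomerative variant can be sketched in a few lines of plain Python using single linkage (merging the two clusters whose closest members are nearest). This is a simplified teaching sketch with made-up names and toy data, not an efficient implementation.

```python
import math

def agglomerative(points, n_clusters=2):
    """Single-linkage agglomerative clustering sketch.

    Start with each point in its own cluster, then repeatedly merge the two
    closest clusters until only n_clusters remain.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        a, b = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a].extend(clusters.pop(b))
    return clusters

pts = [(0, 0), (0, 1), (5, 5), (5, 6)]
print(agglomerative(pts, n_clusters=2))  # -> [[0, 1], [2, 3]]
```

Recording the order of merges yields the tree-like structure (dendrogram) mentioned above; cutting the tree at different heights gives different numbers of clusters.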
Pearson's correlation coefficient (often denoted as Pearson's r) is one of the crucial factors to
consider when assessing the appropriateness of regression analysis. Pearson's r measures the
strength and direction of the linear relationship between two continuous variables.
The requirements when considering the use of Pearson's correlation coefficient are:
1. Scale of measurement should be interval or ratio.
2. The association should be linear.
3. There should be no outliers in the data.
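Pearson's r can be computed directly from its definition, the covariance of the two variables divided by the product of their standard deviations. A minimal sketch (the function name and data are made up for the example):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Numerator: sum of co-deviations from the means.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    # Denominator: product of the deviation magnitudes of each variable.
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]        # perfectly linear in x
print(pearson_r(x, y))       # -> 1.0
```

Values near +1 or -1 indicate a strong linear relationship (positive or negative), while values near 0 indicate little or no linear relationship.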