Assignment 1ML
MACHINE LEARNING
QUESTION 1
What is overfitting in machine learning? How can it be mitigated?
Overfitting happens when a model learns not only the underlying pattern in the training data but
also the noise or random fluctuations. The model performs very well on the training data but
poorly on new, unseen data and fails to generalize.
How it can be Mitigated
Overfitting can be reduced by using techniques that limit model complexity and encourage
better generalization to new data. They include:
i. Data augmentation - Creating additional training data by applying transformations
like rotation, flipping, or scaling to existing data, which helps the model learn
more robust features
ii. Regularization - Adding a penalty term to the loss function that discourages the
model from assigning large weights to parameters, effectively limiting model
complexity.
L1 Regularization (Lasso): Encourages sparsity by pushing some
coefficients to zero.
L2 Regularization (Ridge): Penalizes large weights, preventing them
from becoming too extreme.
iii. Cross-validation - Splitting the dataset into multiple folds, training the model on
different subsets of the data, and evaluating its performance on the remaining folds to
get a more robust estimate of generalization ability.
iv. Feature selection - Choosing only the most relevant features to train the model on,
eliminating noise and reducing model complexity.
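To make the regularization idea concrete, here is a minimal NumPy sketch of ridge (L2) regression solved in closed form; the synthetic data, the `ridge_fit` helper, and the penalty strengths are all made up for this illustration.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Solve the ridge normal equations (X^T X + alpha*I) w = X^T y."""
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

# Synthetic regression problem: 50 samples, 3 features, known true weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_w = np.array([3.0, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=50)

w_no_reg = ridge_fit(X, y, alpha=0.0)    # ordinary least squares
w_strong = ridge_fit(X, y, alpha=100.0)  # heavy L2 penalty

# The penalty shrinks the weights toward zero, limiting model complexity.
print(np.linalg.norm(w_strong) < np.linalg.norm(w_no_reg))  # True
```

The same shrinking effect is what the L2 penalty term contributes inside gradient-based training; L1 (lasso) differs in that it can push individual weights exactly to zero.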
QUESTION 3
How does the k-means clustering algorithm work? Provide an example.
K-means groups similar data points into clusters by minimizing the distance between the
data points in a cluster and their centroid (the mean of the cluster). The primary goal of the
k-means algorithm is to minimize the total distance between points and their assigned cluster centroid.
Steps:
1. Initialize centroids: Choose K initial centroids randomly. These can either be chosen
randomly from the dataset or using some other method like K-means++.
2. Assign data points to the nearest centroid: For each data point in the dataset, compute
the distance to each of the K centroids (commonly using Euclidean distance), and assign
each point to the cluster whose centroid is closest.
3. Recompute centroids: After assigning the data points to the clusters, recalculate the
centroid of each cluster. The new centroid is the mean (average) of all the data points
assigned to that cluster.
4. Repeat: Repeat steps 2 and 3 until the centroids no longer change (or the change is below
a certain threshold) or a maximum number of iterations is reached.
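The four steps above can be sketched directly in NumPy (this is Lloyd's algorithm; the `kmeans` function and its plain random initialization are an illustrative simplification, not K-means++):

```python
import numpy as np

def kmeans(points, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: pick K distinct data points as initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 2: assign each point to the nearest centroid (Euclidean distance).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its assigned points.
        new_centroids = np.array([points[labels == j].mean(axis=0) for j in range(k)])
        # Step 4: stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# The six 2-D points from the example below, clustered with K = 2.
points = np.array([[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]])
labels, centroids = kmeans(points, k=2)
# Points (1, 2), (1.5, 1.8), (1, 0.6) end up in one cluster;
# (5, 8), (8, 8), (9, 11) in the other.
```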
Example:
Imagine you have the following dataset of 2D points:
(1, 2), (1.5, 1.8), (5, 8), (8, 8), (1, 0.6), (9, 11)
Let's say we want to divide these points into 2 clusters (i.e., K = 2).
1. Initialize centroids: Suppose we randomly pick two points as the initial centroids:
o Centroid 1: (1, 2)
o Centroid 2: (5, 8)