Machine Learning Assignment 2
Exercise 1: a, b)
Number of input features = 128 (the number of columns in the dataset, excluding the last column, which holds the label)
Number of outputs = 4 (the ground truth contains only four distinct labels, i.e., 0, 1, 2, 3)
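As a quick sanity check, both numbers can be read off the data itself. The following is a minimal sketch, assuming the dataset is loaded as an array whose last column holds the ground-truth label (the file name and loading details are placeholders, not the assignment's actual code):

    import numpy as np

    # Assumption: the dataset is a comma-separated file whose last column
    # is the ground-truth label (file name is a placeholder).
    data = np.loadtxt("dataset.csv", delimiter=",")

    n_features = data.shape[1] - 1            # all columns except the label column
    n_outputs = len(np.unique(data[:, -1]))   # number of distinct class labels

    print(n_features)  # expected: 128
    print(n_outputs)   # expected: 4 (labels 0, 1, 2, 3)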
Exercise 1: c)
As we increase the threshold value (theta) in fixed uncertainty sampling, the algorithm
becomes more conservative in its selection of examples: it selects examples that are more
certain, i.e., examples whose maximum prediction probability is higher. With a higher
threshold such as theta = 0.8, the algorithm selects only the most confident examples, which
are easier to learn from and contribute positively to the model's generalization ability. This
helps to filter out noise and outliers in the data, which leads to improved accuracy. However,
setting a very high threshold shrinks the pool of labeled examples, which can leave the
algorithm with too little data to learn from and lead to under-fitting.
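A minimal sketch of the selection rule as described above, assuming an incoming example is used only when the model's maximum predicted probability reaches the fixed threshold theta (the probability vector is a placeholder for whatever classifier the assignment uses):

    import numpy as np

    def fixed_uncertainty_select(proba, theta=0.8):
        # proba: predicted class probabilities for one incoming example.
        # Assumption (matching the description above): the example is
        # selected only when the model is confident enough, i.e., when its
        # maximum predicted probability is at least theta.
        return np.max(proba) >= theta

    # Example: a prediction of [0.1, 0.7, 0.1, 0.1] is rejected at theta = 0.8
    # but accepted at theta = 0.6.
    print(fixed_uncertainty_select([0.1, 0.7, 0.1, 0.1], theta=0.8))  # False
    print(fixed_uncertainty_select([0.1, 0.7, 0.1, 0.1], theta=0.6))  # True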
Exercise 1: d)
Initially, with the higher threshold of 0.8, the model is more selective in choosing samples for
training, which means it misses important samples that would improve its performance. As
the model is trained on more samples, its predictions become more accurate and the maximum
predicted probabilities become more reliable. The threshold is therefore lowered, allowing the
model to explore a wider range of samples to learn from. This is why, as time progresses (the
time step increases), the smaller threshold values (0.6 and 0.4) start to outperform the larger
initial threshold of 0.8: they let the model learn from a broader range of samples and thereby
improve its performance.
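A small sketch of the adaptive-threshold idea described here: the threshold starts at 0.8 and is lowered as more samples are seen, so later time steps admit a wider range of samples. The geometric decay schedule below is an illustrative assumption, not the assignment's exact update rule:

    def variable_threshold(theta_init=0.8, theta_min=0.4, decay=0.99):
        # Illustrative assumption: the threshold decays toward a floor as
        # more examples arrive, gradually admitting a wider range of samples.
        theta = theta_init
        while True:
            yield theta
            theta = max(theta_min, theta * decay)

    # Example: the threshold over the first few time steps.
    thresholds = variable_threshold()
    print([round(next(thresholds), 3) for _ in range(5)])  # [0.8, 0.792, 0.784, 0.776, 0.768]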
Exercise 1: e)
Over time, the OSL algorithm gets to train on more and more examples, whereas the OAL
algorithm selectively chooses only the examples it is most uncertain about. As more data is
seen, the OSL algorithm generalizes better, leading to better performance. The variable
uncertainty sampling strategy with an initial threshold of 0.8 is very selective, keeping only
the examples the model is very uncertain about, which results in a sparse and possibly biased
set of training examples. This leads to overfitting and poor generalization over time.
Exercise 2: a)
i. The distance matrix is calculated using Python (code file attached).
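Since the attached code file is not reproduced here, the following is a minimal sketch of how a pairwise Euclidean distance matrix can be computed with NumPy (the array X is a placeholder for the exercise's actual data points):

    import numpy as np

    # Placeholder data: each row is one point (replace with the exercise's points).
    X = np.array([[0.0, 0.0],
                  [3.0, 4.0],
                  [6.0, 8.0]])

    # Pairwise Euclidean distances via broadcasting: D[i, j] = ||X[i] - X[j]||.
    diff = X[:, np.newaxis, :] - X[np.newaxis, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))

    print(D)  # symmetric matrix with zeros on the diagonal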
Exercise 2: b)
If the eigenvalues resulting from PCA on a two-dimensional dataset are identical, it implies
that the two dimensions are equally significant and contribute equally to the variability in the
data; neither dimension is more important than the other for describing the underlying
patterns.
Whether dimensionality reduction is a good choice depends on the specific situation. If the
dataset contains many samples, reducing the dimensionality could be beneficial for
computational efficiency and ease of visualization. However, if the dataset is relatively small
and the two dimensions are equally important, dropping either principal component discards
half of the variance, so reducing the dimensionality would cause information loss and should
be avoided.
Examples:
i. When the dataset consists of points on a line, there is effectively only one dimension
in the data; the eigenvalues resulting from PCA would be identical, indicating that this
single dimension explains all the variability in the data. In this case, pursuing
dimensionality reduction would not be beneficial, since there is only one dimension to
begin with.
ii. When the dataset varies along both dimensions (horizontal and vertical) and the two
are equally important in describing the variability in the data, performing PCA on this
dataset yields identical eigenvalues. In this case, reducing the data to a single principal
component would result in information loss and should be avoided (a numerical sketch
follows these examples).
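A small numerical sketch of example ii: drawing synthetic points that vary equally along both axes and checking that the PCA eigenvalues come out (approximately) identical. The generated data is purely illustrative:

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic 2-D data with equal, independent variance in both directions.
    X = rng.normal(loc=0.0, scale=1.0, size=(10000, 2))

    # PCA eigenvalues are the eigenvalues of the covariance matrix of the data.
    cov = np.cov(X, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(cov)

    print(eigenvalues)  # both values are close to 1.0, i.e., (nearly) identical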
Exercise 2: c)
If we apply PCA on a two-dimensional dataset and get the eigenvalues 6 and 2, it means that
the first principal component explains 75% of the variability in the data, while the second
principal component explains 25% of the variability in the data. If we pursue dimensionality
reduction on this dataset, we will keep only the first principal component, which explains most
of the variability in the data, and discard the second principal component. This would result
in a one-dimensional dataset that retains most of the information in the original dataset, while
reducing its dimensionality.
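The 75% / 25% split follows directly from normalizing the eigenvalues by their sum, as the short check below shows:

    import numpy as np

    eigenvalues = np.array([6.0, 2.0])
    explained_variance_ratio = eigenvalues / eigenvalues.sum()
    print(explained_variance_ratio)  # [0.75 0.25], i.e., 75% and 25%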
For a three-dimensional dataset with eigenvalues 0, 1, and 0, two of the principal components
have zero variance, while the remaining principal component explains all the variance in the
data. This can happen when the three dimensions are linearly dependent in such a way that
the data varies along only a single direction, i.e., two of the dimensions can be expressed as
linear functions of the remaining one. In this case, reducing the dimensionality to one would
retain all the information in the data, as the other two components carry no variance and
provide no additional information.
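A small sketch of such a degenerate case: three-dimensional data that varies along a single direction, whose covariance matrix therefore has one non-zero eigenvalue and two zero eigenvalues. The particular direction and scaling are arbitrary illustrative choices:

    import numpy as np

    rng = np.random.default_rng(0)

    # One-dimensional latent factor ...
    t = rng.normal(size=1000)

    # ... embedded in three dimensions: each column is a multiple of t, so two
    # of the dimensions are linear functions of the remaining one.
    X = np.column_stack([t, 2.0 * t, -1.0 * t])

    cov = np.cov(X, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(cov)
    print(np.round(eigenvalues, 6))  # two eigenvalues are 0; one carries all the variance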