INTRODUCTION TO MACHINE LEARNING

OVERVIEW OF ML
WHAT IS MACHINE LEARNING?

A branch of artificial intelligence concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data.

As intelligence requires knowledge, it is necessary for computers to acquire knowledge.
MACHINE LEARNING PROCESS LIFECYCLE

DATA

FEATURES

ALGORITHMS
The success of a machine learning system also depends on the algorithms.

The algorithms control the search to find and build the knowledge structures.

The learning algorithms should extract useful information from training examples.
LEARNING DIMENSIONS

FLAVORS OF CLASSIFICATION
SOME USEFUL CLASSIFIERS

Example learned rules:
⚫ If patrons=full and day=Friday then wait (0.3/0.7)
⚫ If wait>60 and Reservation=no then wait (0.4/0.9)

Decision trees
⚫ Examples are used to learn the topology and the order of questions

K-nearest neighbors

Association rules
⚫ Examples are used to learn the support and confidence of association rules

SVMs

Neural nets
⚫ Examples are used to learn the topology and the edge weights

Naïve Bayes (Bayes net learning)
⚫ Examples are used to learn the topology and the CPTs
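A minimal sketch (assuming scikit-learn; the Iris dataset and depth limit are illustrative choices, not from the slides) of one of these classifiers: a decision tree learned from examples, printed as if-then rules.

```python
# Minimal sketch (assumes scikit-learn): a decision tree learned from examples.
# The examples determine the tree's structure and the order of the questions
# (splits); the Iris dataset and max_depth=2 are illustrative choices.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Print the learned if-then rules, similar in spirit to the example rules above.
print(export_text(tree))
```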
SUPERVISED LEARNING

UNSUPERVISED LEARNING

INSTANCE BASED LEARNING ALGORITHMS

Instances are nothing but subsets of datasets, and instance-based learning models work on an identified instance or groups of instances that are critical to the problem.
The results across instances are compared, which can include an instance of new data as well.
This comparison uses a particular similarity measure to find the best match and predict.
Instance-based methods are also called lazy learning algorithms.
Here the focus is on the representation of the instances and the similarity measures used for comparison between instances.
Basic idea:
⚫ If it walks like a duck and quacks like a duck, then it's probably a duck.
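A minimal k-nearest neighbors sketch of lazy, instance-based learning (assuming scikit-learn; the Iris data and k=3 are illustrative choices):

```python
# Minimal sketch (assumes scikit-learn): k-nearest neighbors as an
# instance-based (lazy) learner. "Training" merely stores the instances;
# at prediction time the similarity measure (Euclidean distance by default)
# finds the k closest stored instances, which vote on the label.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)          # stores the training instances
print("test accuracy:", knn.score(X_test, y_test))
```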
TRAINING AND TESTING

Universal set (unobserved)
⚫ Training set (observed): data acquisition
⚫ Testing set (unobserved): practical usage
TRAINING AND TESTING
Training is the process of making the system able to learn.

No free lunch rule:
⚫ Training set and testing set come from the same distribution
⚫ Need to make some assumptions or bias
MODEL EVALUATION

Confusion matrix:

                          PREDICTED CLASS
                        Class=Yes   Class=No
ACTUAL CLASS  Class=Yes     a           b
              Class=No      c           d

a: TP (true positive)
b: FN (false negative)
c: FP (false positive)
d: TN (true negative)


TRUE POSITIVE & FALSE POSITIVE
Consider a dataset of dog images where 6 images are 'dog' images and 4 images are 'no dog' images.
So we have a binary classification problem here: 'dog' vs. 'no dog'.
Dog -> Positive class
No dog -> Negative class
Suppose our model has the following predictions.
TRUE POSITIVE & FALSE POSITIVE
Dog -> Positive class
No dog -> Negative class
Let's first talk about the predictions which are 'Dog' and forget the 'No Dog' predictions for now.

There are 7 'Dog' predictions.

Out of these 7, there are 4 correct predictions.
Correct positive predictions are called True Positives (TP). TP tells how many samples from the positive class are predicted correctly.
The remaining 3 incorrect 'Dog' predictions are False Positives (FP): negative-class samples predicted as positive.
TRUE NEGATIVE & FALSE NEGATIVE
Dog -> Positive class
No dog -> Negative class
Let's now talk about the predictions which are 'No Dog' and forget the 'Dog' predictions.
There are 3 'No Dog' predictions. The correct ones are True Negatives (TN), and the 'No Dog' predictions that are actually dogs are False Negatives (FN).
ACCURACY
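Accuracy is the fraction of all predictions that are correct: Accuracy = (TP + TN) / (TP + TN + FP + FN). For the dog example above, the counts work out to TP = 4, FP = 3, FN = 2, TN = 1, so accuracy = (4 + 1) / 10 = 0.5.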
PRECISION AND RECALL

For PRECISION, think about predictions as the base

For RECALL, think about truth as the base

PRECISION and RECALL are for individual classes.


PRECISION – POSITIVE CLASS(DOG)
RECALL- POSITIVE CLASS(DOG)
PRECISION – NEGATIVE CLASS(NO DOG)
RECALL – NEGATIVE CLASS(NO DOG)
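Using the same counts derived from the dog example (TP = 4, FP = 3, FN = 2, TN = 1), the per-class values are:
⚫ Precision (Dog) = TP / (TP + FP) = 4/7
⚫ Recall (Dog) = TP / (TP + FN) = 4/6
⚫ Precision (No Dog) = TN / (TN + FN) = 1/3
⚫ Recall (No Dog) = TN / (TN + FP) = 1/4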
F1 SCORE
F1 score is the harmonic mean of precision and recall.

It is used to evaluate binary classification systems.

F1 score balances precision and recall on the positive class, while accuracy looks at correctly classified observations, both positive and negative.
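A minimal sketch (assuming scikit-learn): the label vectors below are one consistent arrangement of the dog example's counts (6 dogs, 4 no-dogs, 7 'dog' predictions, 4 of them correct), not taken verbatim from the slides.

```python
# Minimal sketch (assumes scikit-learn): metrics for the dog vs. no-dog example.
# 1 = dog (positive class), 0 = no dog (negative class).
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             f1_score, precision_score, recall_score)

y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]   # 6 dogs, 4 no-dogs
y_pred = [1, 1, 1, 1, 0, 0, 1, 1, 1, 0]   # 7 'dog' predictions, 4 correct

print(confusion_matrix(y_true, y_pred))               # [[TN FP], [FN TP]] = [[1 3], [2 4]]
print("accuracy :", accuracy_score(y_true, y_pred))   # (TP + TN) / total = 0.5
print("precision:", precision_score(y_true, y_pred))  # TP / (TP + FP) = 4/7
print("recall   :", recall_score(y_true, y_pred))     # TP / (TP + FN) = 4/6
print("F1       :", f1_score(y_true, y_pred))         # harmonic mean of precision and recall
```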
PERFORMANCE
There are several factors affecting the performance:
⚫ Types of training provided
⚫ The form and extent of any initial background knowledge
⚫ The type of feedback provided
⚫ The learning algorithms used

Two important factors:
⚫ Modeling
⚫ Optimization
Under-fitting VS. Over-fitting (fixed N)

(Figure: error curves for under-fitting vs. over-fitting with fixed N; model = hypothesis + loss functions)
OVERFIT & UNDERFIT MODEL
Consider the example of house price prediction based on one feature: we are trying to predict the home price based on sq ft area.
The figure shows a scatter plot of all the samples present in the dataset.
To train a model, we split this dataset into train and test samples.
OVERFIT & UNDERFIT MODEL
Consider all blue dots as training samples and orange dots as test samples.
We can train a model that can fit the blue dots.
Our trained model can be an overfit model, an underfit model, or a balanced fit model.
OVER FIT MODEL

This figure shows an overfit model. An overfit model tries to exactly fit the training samples, and the training error becomes close to zero.
But the test error can become very high.
OVER FIT MODEL
The training and test samples are selected at random, usually in an 80/20 ratio.
The figure shows the same problem but with a different selection of train/test samples.
HIGH VARIANCE
High variance means there is high variability in the test error depending on the selection of training samples.
As samples are selected randomly, the test error varies randomly, which is not good and is a common issue in overfit models.
LOW VARIANCE
Let's consider a simple model, a linear model, which underfits the training samples.
Even if we change the training and testing samples, there is not a big difference in the train and test error.
LOW VARIANCE
Based on the samples selected for training and testing, the test error doesn't vary much.
Variance is all about the test error.
BIAS
Bias is a measurement of how accurately a model can capture the pattern in the training dataset.

Bias is all about the training error. When the training error is high, the model is said to have high bias.

In case of overfitting, bias will be low because the training error will be minimal.
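A minimal sketch (assuming scikit-learn and NumPy; the synthetic sq-ft/price data and polynomial degrees are illustrative) that contrasts a low-degree model, which tends to underfit here (high bias), with a very high-degree model, which tends to overfit (low train error, higher test error):

```python
# Minimal sketch (assumes scikit-learn/NumPy): synthetic "price vs. sq ft" data
# (sq ft in thousands). Compares train/test MSE of a degree-1 model, which
# tends to underfit this data, with a degree-12 model, which tends to overfit.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
sqft = rng.uniform(0.5, 3.5, size=60)                                    # thousands of sq ft
price = 100 + 150 * sqft + 60 * np.sin(3 * sqft) + rng.normal(0, 10, 60)  # nonlinear + noise
X = sqft.reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(X, price, test_size=0.2, random_state=0)

for degree in (1, 12):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    print(f"degree {degree:2d}: "
          f"train MSE = {mean_squared_error(y_tr, model.predict(X_tr)):.1f}, "
          f"test MSE = {mean_squared_error(y_te, model.predict(X_te)):.1f}")
```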
BALANCED FIT MODEL
BULLS EYE DIAGRAM
The inner circle of the bulls-eye diagram represents the ground truth.
WAYS TO GET A BALANCED FIT MODEL
Cross Validation
Regularization
Dimensionality Reduction
Ensemble Techniques
K FOLD CROSS VALIDATION
OPTIONS TO TRAIN A MODEL
Use all data to train the model and then use some of that data to test the model.
PROBLEMS WITH THIS APPROACH?
We are testing the model on the same data on which it was trained.
Split the available dataset into training and test sets.
PROBLEMS WITH THIS APPROACH?
This approach works well most of the time.
However, suppose a case where most of the training samples are from one class and only a few training samples are from the other class, and most of the test samples are from the other class.
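A minimal sketch (assuming scikit-learn; the Iris data is illustrative) of a random split, where stratify=y keeps the class proportions similar in the train and test sets and avoids the imbalanced situation described above:

```python
# Minimal sketch (assumes scikit-learn): splitting a dataset into train/test sets.
# stratify=y keeps class proportions similar in both splits, so neither split
# is dominated by a single class.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

print("train size:", len(X_train), " test size:", len(X_test))
```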
K FOLD CROSS VALIDATION
We divide the whole dataset into folds, let's say 5 folds, and then we run multiple iterations.
In the first iteration, the first fold is used to test the model and the remaining four folds are used to train it.
In the second iteration, the second fold is used to test the model and the remaining folds are used to train it.
This process is repeated until the last fold, where the last fold is used for testing and the remaining folds are used for training.
Lastly, the results from each iteration are averaged.
K FOLD CROSS VALIDATION
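A minimal sketch of 5-fold cross-validation (assuming scikit-learn; the logistic regression model and Iris data are illustrative choices):

```python
# Minimal sketch (assumes scikit-learn): 5-fold cross-validation.
# Each iteration trains on 4 folds and tests on the held-out fold;
# the 5 scores are then averaged.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

scores = cross_val_score(model, X, y, cv=5)
print("fold scores  :", scores)
print("mean accuracy:", scores.mean())
```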
L1 AND L2 REGULARIZATION
L1 and L2 regularization are some of the techniques that can be used to address the overfitting issue.
Consider the equation for the overfit case: if I somehow make sure that theta 3 and theta 4 are almost close to 0, then the equation changes and the higher-order terms effectively drop out.
L1 AND L2 REGULARIZATION
The idea here is to shrink your parameters theta 0, theta 1, theta 2, theta 3, theta 4: if you can keep these parameters smaller, then you can get a better equation for your prediction function.

Now how do we do that?
L1 AND L2 REGULARIZATION
When we run training, we pass the first sample and calculate the predicted y using some randomly initialized weights, then we compare it with the true value; this is how we calculate the mean squared error (MSE).
L1 AND L2 REGULARIZATION
Here the predicted y is actually h_theta(x_i), where h_theta(x_i) could be a higher-order polynomial equation and x_1, x_2 are nothing but features, so MSE = (1/n) * sum over i of (y_i - h_theta(x_i))^2.
L2 REGULARIZATION
So in this equation, what if I add a penalty term?
There is lambda, which is a free parameter we can control, like a tuning knob, and we square each of the theta parameters, so the loss becomes MSE + lambda * sum of theta_j^2.
So now if a theta gets bigger, this penalty gets bigger and the error gets bigger, and the model will not converge to such a solution.
L2 REGULARIZATION
Essentially we are penalizing higher values of theta here.

So whenever the model tries to make a theta value higher, we are adding a penalty. By adding this penalty we make sure that the theta values don't go too high, so they remain very small.

We can fine-tune this using the parameter lambda.

This is called L2 regularization; it is called L2 because we are using the square of the parameters.
L1 REGULARIZATION
In L1 regularization, we use the absolute value. That is the only difference between L1 and L2: in L1 we use the absolute value of the theta parameters, so the loss becomes MSE + lambda * sum of |theta_j|.
Here again, if a theta is bigger, the overall error is bigger, and the term acts as a penalty so that during training the values of theta remain smaller.
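A minimal sketch (assuming scikit-learn and NumPy; the synthetic data, polynomial degree, and alpha values are illustrative): Ridge implements L2 regularization and Lasso implements L1, with alpha playing the role of lambda.

```python
# Minimal sketch (assumes scikit-learn/NumPy): L2 (Ridge) and L1 (Lasso)
# regularization on a high-degree polynomial fit. alpha plays the role of the
# lambda "tuning knob" above: larger alpha penalizes large theta values more.
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(0, 3, size=(30, 1))
y = 2 * X[:, 0] + rng.normal(0, 0.3, size=30)   # roughly linear target with noise

for name, reg in [("no regularization", LinearRegression()),
                  ("L2 (Ridge)", Ridge(alpha=1.0)),
                  ("L1 (Lasso)", Lasso(alpha=0.1, max_iter=50_000))]:
    model = make_pipeline(PolynomialFeatures(degree=8), reg)
    model.fit(X, y)
    print(f"{name:18s} max |theta| = {np.abs(model[-1].coef_).max():.2f}")
```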
DIMENSIONALITY REDUCTION
Some example dimensionality reduction methods are listed as
follows:
⚫ Multidimensional scaling (MDS)
⚫ Principal component analysis (PCA)
⚫ Projection pursuit (PP)
⚫ Partial least squares (PLS) regression

DIMENSIONALITY REDUCTION
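A minimal sketch (assuming scikit-learn; the Iris data and 2 components are illustrative) of PCA, one of the dimensionality reduction methods listed above:

```python
# Minimal sketch (assumes scikit-learn): PCA reduces the 4 Iris features to
# 2 principal components and reports how much variance they retain.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print("original shape:", X.shape)          # (150, 4)
print("reduced shape :", X_reduced.shape)  # (150, 2)
print("explained variance ratio:", pca.explained_variance_ratio_)
```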
ENSEMBLE METHODS
This is a very powerful and widely adopted class of techniques.
As the name suggests, ensemble methods encompass multiple models that are built independently, and the results of these models are combined and responsible for the overall prediction.
It is critical to identify which independent models are to be combined or included, how the results need to be combined, and in what way, to achieve the required result.
The subset of models that are combined is sometimes referred to as weaker models, as the results of these models need not completely fulfill the expected outcome in isolation.
ENSEMBLE METHODS
ENSEMBLE METHODS
The following are some of the ensemble method algorithms:
⚫ Random forest
⚫ Bagging (bootstrapped aggregation): sampling with replacement
⚫ Boosting (e.g. AdaBoost)
⚫ Stacked generalization (blending)
⚫ Gradient boosting machines (GBM)

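A minimal sketch (assuming scikit-learn; the breast-cancer dataset and default settings are illustrative) comparing two of the ensemble methods listed above:

```python
# Minimal sketch (assumes scikit-learn): two common ensembles.
# A random forest bags many decision trees (sampling with replacement);
# gradient boosting builds trees sequentially, each correcting its predecessors.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

for model in (RandomForestClassifier(n_estimators=100, random_state=0),
              GradientBoostingClassifier(random_state=0)):
    scores = cross_val_score(model, X, y, cv=5)
    print(type(model).__name__, "mean CV accuracy:", round(scores.mean(), 3))
```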
MACHINE LEARNING APPLICATIONS

MACHINE LEARNING TOOLS AND FRAMEWORKS

