Introduction_Machine_Learning

The document provides an overview of machine learning, including its definitions, history, and importance in various sectors. It discusses different types of machine learning such as supervised, unsupervised, and reinforcement learning, along with data preprocessing techniques and algorithms. The document also highlights the evolution of machine learning technologies and their applications in real-world scenarios.
Fundamentals of Machine Learning (DSE 2222)

by
Shavantrevva S. B.
Assistant Professor
Dept. of Data Science and Computer Applications
Manipal Institute of Technology, Manipal

January 14, 2024

Shavantrevva S B , Dept of DSCA Subject Code: DSE 2222 1 / 53


Overview

1 Machine Learning



References
1 K. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.
2 G. James, D. Witten, T. Hastie, R. Tibshirani, An Introduction to Statistical Learning with Applications in R, Springer, 2013.
3 J. Han, M. Kamber, J. Pei, Data Mining: Concepts and Techniques, (2e), Morgan Kaufmann-Elsevier, 2011.
4 T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning, (2e), Springer, 2009.
5 T. M. Mitchell, Machine Learning, (Indian Edition), McGraw Hill, 2017.
6 C. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, 2019.

Introduction to Machine Learning
What is learning? Learning is the acquisition of knowledge or skills through study, observation, experience, or being taught.

Humans learn through study, by observing events, or by being taught; machine learning draws an analogy to this process.

What is machine learning? It is a branch of artificial intelligence that develops algorithms which learn the hidden patterns in datasets and use them to make predictions on new data of a similar type, without being explicitly programmed for each task.

History of Machine Learning
Computing machinery and intelligence:
1950: Alan Turing published a seminal paper, "Computing Machinery and Intelligence," on the topic of artificial intelligence. In it he asked, "Can machines think?"
Machine intelligence in games:
1952: Arthur Samuel, a pioneer of machine learning, created a program that enabled an IBM computer to play checkers. The program performed better the more it played.
1959: The term "machine learning" was coined by Arthur Samuel.
The first "AI winter":
The period from 1974 to 1980 was a tough time for AI and ML researchers and came to be called the AI winter.
During this period machine translation failed to meet expectations and interest in AI declined, which led governments to reduce research funding.


Machine Learning from theory to reality

1959: The first neural network was applied to a real-world problem: removing echoes over phone lines using an adaptive filter.

1985: Terry Sejnowski and Charles Rosenberg invented the neural network NETtalk, which was able to teach itself how to correctly pronounce 20,000 words in one week.

1997: IBM's Deep Blue computer won a chess match against Garry Kasparov, the reigning world champion, and became the first computer to beat a human chess expert of that standing.


Machine Learning in the 21st century

2006: Computer scientist Geoffrey Hinton gave neural-network research a new name, "deep learning," and it has since become one of the most prominent technologies.

2012: Google created a deep neural network that learned to recognize images of humans and cats in YouTube videos.

2014: The chatbot "Eugene Goostman" was reported to have passed the Turing Test. It was the first chatbot to convince 33% of the human judges that it was not a machine.

2014: DeepFace, a deep neural network created by Facebook, was claimed to recognize a person with the same precision as a human.


Machine Learning in the 21st century

2016: AlphaGo beat Lee Sedol, then the world's number-two player, at the game of Go. In 2017 it beat the number-one player, Ke Jie.

2017: Alphabet's Jigsaw team built an intelligent system that learned to recognize online trolling by reading millions of comments from different websites.

ML today
Self-driving cars
Amazon Alexa
Chatbots
Recommender systems, and many more.


Machine Learning Vs. Traditional Programming

Machine Learning | Traditional Programming
Learns from data patterns to make predictions | Involves writing explicit instructions/rules
Adapts and improves over time with new data | Requires manual intervention for changes/adaptation
Generalizes patterns to new, unseen data | Solves specific, predefined problems
Focuses on data preparation and model training | Direct use of programming languages
Suitable for complex, pattern-based problems | Suitable for problems with clear rules/logic
Effort required for data preprocessing and model training | Effort needed for coding, debugging, and ensuring correctness
Errors from biased data, overfitting, etc. | Errors often result from bugs in the code


AI, ML, DL, Data Science & Analytics

Figure 1.1: AI, ML, DL, Data Science & Analytics



Programmatic Vs. Machine Learning Solution

Figure 1.2: Programmatic Vs. Machine Learning Solution

Importance of Machine Learning

Rapid growth in the production of data
Solving complex problems that are difficult for humans
Decision making in various sectors, including finance
Finding hidden patterns and extracting useful information from data


Formal Definition of Machine Learning

Arthur Samuel (1959)
Field of study that gives computers the ability to learn without being explicitly programmed.
Tom Mitchell (1998)
A computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E.


Data pattern

Input data can be:
Structured data (e.g. tabular data)
Unstructured data (e.g. audio, video, text)
Most classical machine learning algorithms are suited to numerical structured data.
There are dedicated algorithms for working with image and text data.
Almost all kinds of data require pre-processing and preparation before model building.
Data can be divided into two broad categories based on its values:
Continuous data (e.g. price: 1, 2, 3, ..., 1000)
Categorical data (e.g. gender: male/female; yes/no)
These categories help in choosing a suitable Exploratory Data Analysis (EDA).
They also help decide which supervised task should be performed: regression for a continuous target, classification for a categorical one.


Type of data for model building

Labelled data: consists of input-output pairs. For every set of input features, the output/response/label is present in the dataset.
e.g. an image labelled as a cat's or a dog's photo
Sample structure: (x1, y1), (x2, y2), (x3, y3), ..., (xn, yn)
A supervised model can be developed when the label is available.

Unlabelled data: there is no output/response/label for the input features in the data.
e.g. news articles, tweets, audio
Sample structure: x1, x2, x3, x4, x5, ..., xn
Unsupervised models are built on unlabelled data.


Data Quality
Poor data quality negatively affects many data processing efforts.
Examples of data quality problems:
Noise and outliers
Missing values
Duplicate data
Wrong data
Examples of noisy data: distortion of a person's voice when talking on a poor phone line, and "snow" on a television screen.
Noise can be quantified by the signal-to-noise ratio (SNR). The left image, of two clean sine waves, contains no noise and therefore has a very high SNR; the right image shows the same two waves combined with noise and has a lower SNR.
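
The signal-to-noise ratio can be made concrete with a short calculation. The sketch below assumes the usual definition, SNR = signal power / noise power, reported in decibels. The function name `snr_db`, the two sine frequencies, and the noise levels are illustrative choices and do not come from the slide.

```python
import math
import random


def snr_db(signal, noisy):
    """Signal-to-noise ratio in decibels: 10 * log10(signal power / noise power)."""
    noise = [n - s for s, n in zip(signal, noisy)]
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(e * e for e in noise) / len(noise)
    if p_noise == 0:
        return math.inf  # no noise at all
    return 10 * math.log10(p_signal / p_noise)


rng = random.Random(0)  # fixed seed so the run is repeatable
t = [i / 1000 for i in range(1000)]

# Two sine waves added together (hypothetical frequencies of 5 Hz and 12 Hz).
clean = [math.sin(2 * math.pi * 5 * x) + math.sin(2 * math.pi * 12 * x) for x in t]

# The same signal with a little and with a lot of Gaussian noise.
light = [c + rng.gauss(0, 0.1) for c in clean]
heavy = [c + rng.gauss(0, 1.0) for c in clean]

print("clean:", snr_db(clean, clean))
print("light noise:", snr_db(clean, light))
print("heavy noise:", snr_db(clean, heavy))
```

The noisier the copy, the smaller the ratio, which is why the clean waves have the highest SNR.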



Data Pre-processing: Need

Real-world data is often missing, inconsistent, and noisy because of its heterogeneous origins.
Data pre-processing is the technique of preparing (cleaning and organizing) raw data to make it suitable for building and training machine learning models.
Duplicate or missing values may give an incorrect view of the overall statistics of the data.
Outliers and inconsistent data points tend to disturb the model's overall learning, leading to false predictions.
Pre-processing transforms raw data into an understandable and readable format.
Machines prefer to process neat and orderly information; they read data as binary 1s and 0s.
Unstructured data, such as text and photos, must be prepared and formatted with the help of data pre-processing.


Data Pre-processing
Steps in data pre-processing
Collecting the dataset
Importing libraries
Importing datasets
Finding missing data
Encoding categorical data
Splitting the dataset into training and test sets
Feature scaling

Import libraries
NumPy: import numpy as np
Matplotlib: import matplotlib.pyplot as mpt
Pandas: import pandas as pd
Read dataset: data_set = pd.read_csv('/path_to/Dataset.csv')
Encode labels: LabelEncoder()
Extracting independent variables: x = data_set.iloc[:, :-1].values
Extracting dependent variable: y = data_set.iloc[:, 3].values


Data pre-processing: code snippet
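
A minimal sketch of the "finding missing data" and variable-extraction steps, written with the Python standard library so that it runs without a CSV file on disk. The inline dataset, its column names, and the helpers `to_float` and `impute_mean` are hypothetical. Filling gaps with the column mean is one common choice, assumed here for illustration.

```python
import csv
import io
from statistics import mean

# Hypothetical four-column dataset; the last column is the label.
RAW = """Country,Age,Salary,Purchased
France,44,72000,No
Spain,27,48000,Yes
Germany,30,,No
Spain,,61000,Yes
"""

rows = list(csv.DictReader(io.StringIO(RAW)))


def to_float(text):
    """Convert a CSV cell to float; an empty cell becomes None (missing)."""
    return float(text) if text.strip() else None


def impute_mean(values):
    """Replace missing entries (None) with the mean of the known entries."""
    known = [v for v in values if v is not None]
    fill = mean(known)
    return [fill if v is None else v for v in values]


ages = impute_mean([to_float(r["Age"]) for r in rows])
salaries = impute_mean([to_float(r["Salary"]) for r in rows])

# Independent variables: every column except the last.
X = [[r["Country"], a, s] for r, a, s in zip(rows, ages, salaries)]
# Dependent variable: the last column.
y = [r["Purchased"] for r in rows]

print(X)
print(y)
```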



Data Pre-processing
LabelEncoder: transforms categorical labels (strings or integers) into numerical labels from 0 to n_classes - 1.

OneHotEncoder: converts categorical variables into binary vectors, with a 1 for the presence of a particular category and 0 otherwise.

StandardScaler: a preprocessing tool that standardizes features by removing the mean and scaling them to unit variance.
For a feature x, the standardized value is $x_{std} = \frac{x - x_{mean}}{\mathrm{std}(x)}$, where std(x) is the standard deviation of x.

Normalization (Min-Max scaling): scales features to a fixed range, usually between 0 and 1.
$x_{norm} = \frac{x - \min(x)}{\max(x) - \min(x)}$
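
The four transformations above can be sketched from scratch in a few lines. This is not the scikit-learn implementation: the function names are hypothetical, classes are numbered in sorted order, and the population standard deviation is used for standardization. Both scaling functions assume the feature is not constant, otherwise the denominator is zero.

```python
from statistics import mean, pstdev


def label_encode(values):
    """Map each distinct category to an integer 0 .. n_classes - 1 (sorted order)."""
    classes = sorted(set(values))
    index = {c: i for i, c in enumerate(classes)}
    return [index[v] for v in values], classes


def one_hot_encode(values):
    """Turn each category into a binary vector with a single 1."""
    codes, classes = label_encode(values)
    return [[1 if j == c else 0 for j in range(len(classes))] for c in codes]


def standardize(xs):
    """x_std = (x - mean) / standard deviation, giving zero mean and unit variance."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]


def min_max(xs):
    """x_norm = (x - min) / (max - min), giving values in [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


colors = ["red", "green", "blue", "green"]  # hypothetical categorical feature
codes, classes = label_encode(colors)
print(classes, codes)
print(one_hot_encode(colors))
print(standardize([2, 4, 6]))
print(min_max([10, 20, 30]))
```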



Classification of Machine Learning

Figure 1.3: Classification of Machine Learning.



Supervised Learning

Predict y from x
Given a labelled set of input-output pairs, map input x to output y
Given: training set (x_i, y_i), i = 1 ... n
Find: a good approximation to f : X → Y
Classification: y is categorical, y ∈ {1, ..., C}
Regression: y is real-valued
Examples: spam detection, digit recognition, stock price prediction


Supervised Learning: Train & Test

Figure 1.4: Supervised Learning: Train & Test



Types of Supervised Learning
Classification:
Accurately assigns data to different categories or classes.
Example: the output variable is a category, such as "spam" or "not spam" for emails, or the type of animal shown in an image.

Regression:
Predicts a numeric value; outputs are continuous rather than discrete.
Example: predicting house prices based on features like size, location, etc.
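
As an illustration of regression, the sketch below fits a straight line to a single feature by ordinary least squares. The house sizes and prices are made-up numbers chosen so that price is exactly three times the size, and `fit_line` and `predict` are hypothetical names.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one feature."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


sizes = [50, 80, 100, 120]     # hypothetical house sizes
prices = [150, 240, 300, 360]  # hypothetical prices, here exactly 3 * size

slope, intercept = fit_line(sizes, prices)


def predict(size):
    """Continuous prediction for an unseen house size."""
    return slope * size + intercept


print(slope, intercept, predict(90))
```

A classifier would instead return one of a fixed set of categories for each input.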



Regression/Classification Algorithms in ML

Regression Algorithms
Linear Regression
Polynomial Regression
Support Vector Machine Regression
Decision Tree Regression
Random Forest Regression

Classification Algorithms
Logistic Regression
Support Vector Machines
Decision Trees
Random Forests
Naive Bayes


Unsupervised Learning
Unsupervised learning allows the model to discover patterns and relationships in unlabelled data.
Clustering algorithms group similar data points together based on their inherent characteristics.

Feature extraction captures essential information from the data, enabling the model to make meaningful distinctions.
Label association assigns categories to the clusters based on the extracted patterns and characteristics.


Types of Unsupervised Learning
Clustering: a clustering problem is one where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behaviour.
Example: K-means clustering

Dimensionality reduction: simplifies inputs by mapping them into a lower-dimensional space. Topic modelling is a related problem, where a program is given a list of human-language documents and is tasked with finding out which documents cover similar topics.
Example: Principal Component Analysis

Figure 1.5: Clustering Example
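
To make clustering concrete, here is a minimal K-means sketch for 2-D points: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. The function name, the fixed seed, and the six sample points are illustrative assumptions. Library implementations add better initialization and convergence checks.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain K-means on 2-D points; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2 + (p[1] - centroids[c][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        updated = []
        for j, members in enumerate(clusters):
            if members:
                updated.append((
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                ))
            else:
                updated.append(centroids[j])  # keep an empty cluster's centroid
        if updated == centroids:
            break  # assignments can no longer change
        centroids = updated
    return centroids, clusters


# Two well-separated hypothetical groups of points.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8.5, 9.5)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```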


Reinforcement Learning
An agent acts in an environment.
The agent needs to learn which action to take at every step.
The action depends on the reward or punishment signals that the agent receives in each state.

Figure 1.7: Framework of Reinforcement Learning


Parametric and Non-parametric Machine Learning

Parametric learning
Fixed model structure: the model makes specific assumptions about the functional form or structure of the relationship between input data and output predictions.
Fixed number of parameters: these models have a fixed set of parameters that define the model's architecture or behaviour.
Learning parameters: during the learning process, these parameters are estimated from the training data, allowing the model to make predictions or decisions based on the learned parameters.
Examples:
Linear Regression
Logistic Regression
Naive Bayes


Parametric and Non-parametric Machine Learning

Non-parametric learning
Suited to modelling with a lot of data and no prior knowledge.
There is no need to choose the right features.
More flexible, but often computationally expensive.
Examples:
Decision Trees
K-Nearest Neighbours
Support Vector Machines


Experimental Evaluation of Learning Algorithms

Evaluating the performance of learning systems is important because learning systems are designed to predict the class of future unlabelled data points.
1 Experimental model evaluation includes
Error: the error term, also called the residual, represents the distance of the observed value from the value predicted by the regression line.
Accuracy
Precision/Recall
2 Typical choices for sampling methods
Random train/test split
K-fold cross-validation


Experimental Model Evaluation for Prediction

How is error measured?

Suppose we aim to predict the value of a specific target attribute for a given instance, denoted as example x.
y is the observed value (ground truth) of the target feature on example x
ŷ is the predicted value of the target feature on example x
ŷ = h(x)
For regression problems, over n examples (x_i, y_i):
Mean absolute error: $\frac{1}{n}\sum_{i=1}^{n} |h(x_i) - y_i|$
Mean squared error: $\frac{1}{n}\sum_{i=1}^{n} (h(x_i) - y_i)^2$
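
The two regression error measures translate directly into code. In this sketch `y_pred` plays the role of h(x_i); the function names and the four sample values are made up for illustration.

```python
def mean_absolute_error(y_true, y_pred):
    """(1/n) * sum of |h(x_i) - y_i|."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """(1/n) * sum of (h(x_i) - y_i)^2."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [3.0, 5.0, 2.0, 7.0]  # observed values y_i
y_pred = [2.5, 5.0, 4.0, 8.0]  # predictions h(x_i)

print(mean_absolute_error(y_true, y_pred))  # errors 0.5, 0, 2, 1 -> 0.875
print(mean_squared_error(y_true, y_pred))   # squares 0.25, 0, 4, 1 -> 1.3125
```

Squaring penalizes the single large error (2.0) more heavily than the absolute error does.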



Experimental Model Evaluation for Classification

Suppose we want to predict the value of a target feature on example x.
y is the observed value of the target feature on example x
ŷ is the predicted value of the target feature on example x
ŷ = h(x)

For classification problems, the misclassification rate over n examples is
$\frac{1}{n}\sum_{i=1}^{n} \delta(h(x_i) - y_i)$, where $\delta(z) = 0$ if $z = 0$ and $1$ otherwise.
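
A short sketch of the misclassification rate: count the examples whose predicted class differs from the observed class and divide by n. The function name and the five example labels are hypothetical.

```python
def misclassification_rate(y_true, y_pred):
    """Fraction of examples where the predicted class differs from the observed class."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


y_true = ["spam", "ham", "spam", "ham", "ham"]    # observed classes
y_pred = ["spam", "spam", "spam", "ham", "spam"]  # predicted classes h(x_i)

print(misclassification_rate(y_true, y_pred))  # 2 of 5 predictions are wrong
```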



Error: Bias & Variance

Bias: the difference between the predictions made by the model and the actual (expected) values is known as bias error, or error due to bias.
It reflects the inability of an ML algorithm to capture the true relationship between the data points.
Every algorithm begins with some amount of bias, because bias arises from the assumptions in the model.
Low bias: a low-bias model makes fewer assumptions about the form of the target function.
High bias: a high-bias model makes more assumptions and becomes unable to capture the important features of the dataset. A high-bias model also cannot perform well on new data.


Error: Bias & Variance

Ways to reduce high bias:
Increase the number of input features, since the model is underfitted.
Decrease the regularization term.
Use more complex models, for example by including polynomial features.
Variance: how much a random variable differs from its expected value.
A model has either low variance or high variance.
Low variance means there is a small variation in the prediction of the target function with changes in the training dataset.
High variance means there is a large variation in the prediction of the target function with changes in the training dataset.
A model with high variance learns a lot from, and performs well on, the training dataset but does not generalize well to unseen data.


Bias-Variance Tradeoff

Decreasing the variance tends to increase the bias.
Decreasing the bias tends to increase the variance.
The bias-variance trade-off is about finding the sweet spot that balances bias and variance errors.


Overfitting & Underfitting in ML

Underfitting: the inability of the model to learn the training data effectively, which results in poor performance on both the training and the testing data.
Reasons for underfitting
The model is too simple, so it may not be capable of representing the complexities in the data.
The input features used to train the model are not adequate representations of the underlying factors influencing the target variable.
The training dataset is not large enough.
Excessive regularization, used to prevent overfitting, constrains the model so that it cannot capture the data well.
Features are not scaled.


Overfitting & Underfitting in ML
Techniques to reduce underfitting
Increase model complexity.
Increase the number of features by performing feature engineering.
Remove noise from the data.
Increase the number of epochs or the duration of training to get better results.
Overfitting: a problem where a machine learning algorithm performs well on the training data but noticeably worse on unseen data.

Reasons for overfitting:
High variance and low bias.
The model is too complex.
The training dataset is too small.

Techniques to reduce overfitting
Increase training data
Reduce model complexity
Early stopping
Regularization
Use dropout for neural networks

Overfitting and Underfitting



Model Evaluation in Classification

True positives (TP): the actual value is positive and the prediction is also positive.
True negatives (TN): the actual value is negative and the prediction is also negative.
False positives (FP): the actual value is negative but the prediction is positive. Also known as a Type 1 error.
False negatives (FN): the actual value is positive but the prediction is negative. Also known as a Type 2 error.
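
The four outcomes can be counted with one pass over the actual and predicted labels. A minimal sketch for the binary case, assuming the positive class is encoded as 1; the function name and the eight example labels are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for a binary classification problem."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # predicted positive, actually positive
        elif predicted != positive and actual != positive:
            tn += 1  # predicted negative, actually negative
        elif predicted == positive:
            fp += 1  # predicted positive, actually negative (Type 1 error)
        else:
            fn += 1  # predicted negative, actually positive (Type 2 error)
    return tp, tn, fp, fn


y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # actual values
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]  # predictions

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)
```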



Model Evaluation: Metrics & Example


When to use Accuracy / Precision / Recall / F1-Score?

Accuracy is used when the true positives and true negatives are more important. Accuracy is a better metric for balanced data.

Precision is used when false positives are much more costly.

Recall is used when false negatives are much more costly.

F1-score is used when both false negatives and false positives are important. F1-score is a better metric for imbalanced data.
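
For reference, the four metrics can be computed from the confusion-matrix counts using their standard definitions. The function name and the counts below are made up, and the sketch assumes at least one predicted positive and one actual positive so that no denominator is zero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that are correct
    precision = tp / (tp + fp)                  # share of predicted positives that are real
    recall = tp / (tp + fn)                     # share of real positives that were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return accuracy, precision, recall, f1


# Hypothetical counts from 100 test examples.
accuracy, precision, recall, f1 = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print(accuracy, precision, recall, f1)
```

Precision falls as false positives grow and recall falls as false negatives grow, which is why each is preferred when that kind of error is the costly one.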
Compare Models



Typical Choices for Sampling Methods

First, the data needs to be shuffled.
Then, we split the rows of the data into two sections: train and test.
from sklearn.model_selection import train_test_split
A validation set is used to tune model parameters.

Figure: (a) Train:Test ratio (b) Train:valid:test


train:test split example

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split

dataset = pd.read_csv('Data.csv')
# Features: every column except the last; target: the column at index 3
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 3].values
# Hold out 33% of the rows as the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)


Typical Choices for Sampling Methods: K-fold cross validation


K-fold cross validation

If your dataset is small, you might end up with tiny training and validation sets that do not represent your data well.
Cross-validation performs several training-validation splits and trains and evaluates the model across all of them, to find the hyperparameters that perform best on the data in general.
Split the training data into K folds (K > 2, up to K = 10).
Each fold gets a turn at being the validation set.
Each round produces a score, so K-fold cross-validation produces K scores. We usually average over the K results.
Note that cross-validation does not shuffle the data by default; shuffling is done in train_test_split.
There are two ways to do cross-validation with sklearn:
cross_val_score()
cross_validate()
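
The fold mechanics described above can be sketched without any library. This is an illustration of the idea, not how sklearn implements it. The helper names, the toy "predict the training mean" model, and the data are all hypothetical, and the folds are contiguous and unshuffled.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)  # spread the remainder over the first folds
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


def cross_val_scores(xs, ys, k, fit, score):
    """Train and score once per fold; return the K scores and their average."""
    scores = []
    for train, val in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(score(model, [xs[i] for i in val], [ys[i] for i in val]))
    return scores, sum(scores) / len(scores)


# Hypothetical model: always predict the mean of the training targets.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)


def mean_abs_error(model, xs, ys):
    return sum(abs(model - y) for y in ys) / len(ys)


xs = list(range(10))
ys = [2.0 * x for x in xs]
scores, average = cross_val_scores(xs, ys, k=3, fit=fit_mean, score=mean_abs_error)
print(scores, average)
```

Each of the 10 samples lands in exactly one validation fold, so every sample is used for validation once and for training K - 1 times.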



Supervised Learning set up

1 Perform the necessary pre-processing.
2 We are given training data with X and y.
3 We split our data into X_train, y_train, X_test, y_test.
4 Hyperparameter optimization using cross-validation on X_train and y_train.
5 We assess the best model using X_test and y_test.
6 The test score tells us how well our model generalizes.
7 If the test score is reasonable, we deploy the model.


Vapnik-Chervonenkis (VC) Dimension

The VC dimension is a measure of model capacity used in statistics and machine learning.

It is the greatest number of points that a classifier is capable of shattering, and so measures the classifier's ability to classify data.

A classifier is said to "shatter" a collection of points if it can correctly realize every possible labelling of those points.

It is used to guide the model selection process while developing machine learning applications.

The VC dimension of the line hypothesis (in the plane) is 3.
The VC dimension of the (axis-aligned) rectangle hypothesis is 4.
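
Shattering can be checked by brute force for small point sets. The sketch below assumes the rectangle hypothesis means axis-aligned rectangles that label points inside as positive. A labelling is realizable exactly when the bounding box of the positive points contains no negative point. The point sets and function names are illustrative. The code shows one set of four points that is shattered and one set of five that is not; it does not prove that no five points can be shattered.

```python
from itertools import product


def rectangle_realizes(points, labels):
    """True if some axis-aligned rectangle contains exactly the positively labelled points."""
    positives = [p for p, is_pos in zip(points, labels) if is_pos]
    if not positives:
        return True  # a rectangle placed away from every point labels all of them negative
    x_lo = min(p[0] for p in positives)
    x_hi = max(p[0] for p in positives)
    y_lo = min(p[1] for p in positives)
    y_hi = max(p[1] for p in positives)
    # The tightest box around the positives must not capture any negative point.
    for p, is_pos in zip(points, labels):
        if not is_pos and x_lo <= p[0] <= x_hi and y_lo <= p[1] <= y_hi:
            return False
    return True


def is_shattered(points):
    """True if every one of the 2^n labellings of the points is realizable."""
    return all(
        rectangle_realizes(points, labels)
        for labels in product([False, True], repeat=len(points))
    )


diamond = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # four points in a diamond
with_centre = diamond + [(0, 0)]              # the same four plus the centre

print(is_shattered(diamond))      # all 16 labellings are realizable
print(is_shattered(with_centre))  # centre negative, outer four positive is impossible
```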



VC-Dimension



No Free Lunch theorem in Machine Learning
There are many algorithms for optimization, so which one is the best?
Answer: none.
Why?
Real-world problems vary widely in complexity and diversity.
A single method cannot cope with all types of problems.
No single model works well for all problems (the No Free Lunch theorem, NFLT).
The theorem states that there is no universal algorithm for all problems.


Thank You
