Introduction to Machine Learning
by
Shavantrevva S. B.
Assistant Professor
Dept. of Data Science and Computer Applications
Manipal Institute of Technology, Manipal
1 Machine Learning
1997: IBM's Deep Blue won a chess match against world champion Garry Kasparov, becoming the first computer to beat a reigning human world chess champion.
2006: Computer scientist Geoffrey Hinton gave neural network research the new name "deep learning"; it has since become one of the most prominent technologies.
2016: AlphaGo beat Lee Sedol, one of the world's top Go players. In 2017 it beat the game's number one player, Ke Jie.
ML today
Self-driving cars
Amazon Alexa
Chatbots
Recommender systems, and many more.
Labelled data – consists of input-output pairs: for every set of input features, the output/response/label is present in the dataset.
Example: an image labelled as a cat's or dog's photo.
Sample structure: (x1, y1), (x2, y2), (x3, y3), ..., (xn, yn)
A supervised model can be developed when the label is available.
For a feature x, the standardized value x_std is: x_std = (x − mean(x)) / std(x)
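A minimal sketch of this formula in Python, assuming x is a NumPy array of made-up feature values; scikit-learn's StandardScaler applies the same transformation column-wise:

import numpy as np

x = np.array([4.0, 8.0, 15.0, 16.0, 23.0, 42.0])  # made-up feature values

# standardize: subtract the mean, divide by the standard deviation
x_std = (x - x.mean()) / x.std()

print(x_std.mean(), x_std.std())  # approximately 0 and 1 after standardizing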
Predict y from x
Given a labelled set of input-output pairs: a training set (xi, yi), i = 1 ... n
Map input x to output y
Find: a good approximation to f : X → Y, where y ∈ {1, . . . , C}
Classification – y is categorical (examples: spam detection, digit recognition)
Regression – y is real-valued (example: stock prices)
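As a minimal sketch of this setup, assuming a toy labelled set of (xi, yi) pairs (all numbers made up), a classifier can be fit to approximate f:

import numpy as np
from sklearn.linear_model import LogisticRegression

# toy labelled set (xi, yi): two input features per sample, binary label y
X = np.array([[1.0, 2.0], [2.0, 1.0], [6.0, 5.0], [7.0, 8.0]])
y = np.array([0, 0, 1, 1])

model = LogisticRegression()
model.fit(X, y)                     # learn an approximation to f : X -> Y
print(model.predict([[6.5, 6.0]]))  # map a new input x to an output y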
Regression:
Predicts a numeric value; outputs are continuous rather than discrete.
Example: predicting house prices based on features like size, location, etc.
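A minimal regression sketch, assuming made-up house sizes (sq. ft.) and prices; the prediction is a continuous number rather than a class:

import numpy as np
from sklearn.linear_model import LinearRegression

sizes = np.array([[800], [1000], [1500], [2000]])    # made-up house sizes
prices = np.array([100000, 130000, 190000, 250000])  # made-up prices

reg = LinearRegression().fit(sizes, prices)
print(reg.predict([[1200]]))  # continuous output: a price estimate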
Regression Algorithms
Linear Regression
Polynomial Regression
Support Vector Machine Regression
Decision Tree Regression
Random Forest Regression
Classification Algorithms
Logistic Regression
Support Vector Machines
Decision Trees
Random Forests
Naive Bayes
Parametric learning
Fixed Model Structure: The model makes specific assumptions about the functional form or structure of the relationship between input data and output predictions.
Fixed Number of Parameters: These models have a fixed set of parameters that define the model's architecture or behavior.
Learning Parameters: During the learning process, these parameters are estimated from the training data, allowing the model to make predictions or decisions based on these learned parameters (see the sketch after the examples below).
Examples:
Linear Regression
Logistic Regression
Naive Bayes
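A sketch of the parametric idea, using linear regression on synthetic data: however many training samples are used, the learned model is fully described by a fixed set of parameters (one weight per feature plus an intercept):

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.random.rand(500, 2)         # 500 synthetic samples, 2 features
y = 3 * X[:, 0] - 2 * X[:, 1] + 1  # underlying linear relationship

model = LinearRegression().fit(X, y)
# the model is summarized by just 3 learned numbers, regardless of data size
print(model.coef_, model.intercept_)  # approximately [3, -2] and 1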
Non-parametric learning
Makes few fixed assumptions about the functional form; suited to problems with a lot of data and no prior knowledge, since model complexity can grow with the training data (see the sketch after the examples below).
Examples:
Decision Trees
K-Nearest Neighbor
Support Vector Machines
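By contrast, a k-nearest-neighbour sketch of the non-parametric idea: the fitted model essentially stores the training data itself, so its complexity grows with the dataset rather than being fixed in advance (toy numbers below are made up):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1, 1], [1, 2], [5, 5], [6, 5]])  # toy training points
y = np.array([0, 0, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)  # "fit" mostly stores the data
print(knn.predict([[5, 4]]))  # majority vote among the 3 nearest stored points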
Accuracy is used when the True Positives and True Negatives are
more important. Accuracy is a better metric for Balanced Data.
F1-Score is used when the False Negatives and False Positives are
important. F1-Score is a better metric for Imbalanced Data.
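A minimal sketch computing both metrics with scikit-learn, using made-up true and predicted labels:

from sklearn.metrics import accuracy_score, f1_score

y_true = [0, 0, 0, 1, 1, 1, 1, 0]  # made-up ground-truth labels
y_pred = [0, 0, 1, 1, 1, 0, 1, 0]  # made-up model predictions

print(accuracy_score(y_true, y_pred))  # fraction of correct predictions
print(f1_score(y_true, y_pred))        # harmonic mean of precision and recall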
Compare Models
# assumes `dataset` is a pandas DataFrame loaded earlier, e.g. with pd.read_csv
X = dataset.iloc[:, :-1].values  # features: all columns except the last
y = dataset.iloc[:, 3].values    # target: the column at index 3
If your dataset is small, you might end up with tiny training and validation sets that do not represent your data well.
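A sketch of the plain split this refers to, assuming X and y prepared as above; on a small dataset, holding out 20% can leave only a handful of validation rows:

from sklearn.model_selection import train_test_split

# hold out 20% of the data for validation (shuffled by default)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42)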
Cross-validation performs several training-validation splits, then trains and evaluates the model across all of them to find the hyperparameters that perform best on the data in general.
Split the training data into K folds (typically K > 2, up to K = 10).
Each fold gets a turn at being the validation set.
Each round produces a score, so K-fold cross-validation produces K scores. We usually average over the K results, as the sketch below illustrates.
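A sketch of the K-fold procedure itself, assuming X and y as above with a classification target, and K = 5; each fold takes one turn as the validation set, and the K scores are averaged:

import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression

kf = KFold(n_splits=5)  # K = 5 folds; no shuffling by default
scores = []
for train_idx, val_idx in kf.split(X):
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[val_idx], y[val_idx]))  # one score per fold
print(np.mean(scores))  # average over the K results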
Note that cross-validation doesn't shuffle the data itself; shuffling is done in train_test_split.
There are two ways we can do cross-validation with sklearn:
cross_val_score()
cross_validate()
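A sketch of both helpers, assuming a model and X, y as above with a binary target; cross_val_score returns one score per fold, while cross_validate also reports timings and can track several metrics at once:

from sklearn.model_selection import cross_val_score, cross_validate
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()

scores = cross_val_score(model, X, y, cv=10)  # K = 10, one score per fold
print(scores.mean())

results = cross_validate(model, X, y, cv=10, scoring=["accuracy", "f1"])
print(results["test_accuracy"].mean(), results["test_f1"].mean())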