Dr. Kapil K. Misal Unit 1- Machine Learning

Unit 1: Introduction to Machine Learning

Introduction: What is Machine Learning, Examples of Machine Learning applications,
Training versus Testing, Positive and Negative Class, Cross-validation. Types of Learning:
Supervised, Unsupervised and Semi-Supervised Learning. Dimensionality Reduction:
Introduction to Dimensionality Reduction, Subset Selection, Introduction to Principal
Component Analysis

1. What is Machine Learning:


Machine learning (ML) is a type of artificial intelligence (AI) that allows software
applications to become more accurate at predicting outcomes without being explicitly
programmed to do so. Machine learning algorithms use historical data as input to
predict new output values.
Humans learn from experience, whereas computers traditionally only follow the
instructions we give them. Can a machine also learn from experience or past data,
as a human does? This is the role of Machine Learning.

Machine learning enables a machine to automatically learn from data, improve its
performance with experience, and make predictions without being explicitly
programmed.

 How does Machine Learning work?


A Machine Learning system learns from historical data, builds a prediction model,
and, whenever it receives new data, predicts the output for it. The accuracy of the
predicted output depends in part on the amount of data: more data generally helps
to build a better model, which predicts the output more accurately.
Suppose we have a complex problem that requires predictions. Instead of writing
code for it by hand, we feed the data to a generic algorithm, which builds the logic
from the data and predicts the output. Machine learning has changed the way we
think about such problems. The block diagram below explains the working of a
Machine Learning algorithm:

2. Examples of Machine Learning applications:

Machine learning is a buzzword in today's technology, and it is growing rapidly.
We use machine learning in our daily lives, often without knowing it, in products
such as Google Maps, Google Assistant and Alexa. Below are some of the most popular
real-world applications of Machine Learning:

3. Training versus Testing:


Let’s say you want to create a model based on some dataset. In machine learning,
this data is divided into two parts: training data and testing data.

 Training data : Training data is the one you feed to a machine learning model, so
it can analyse it and discover some patterns and dependencies. This training set has 3
main characteristics:
 Size. The training set normally has more data than the testing set. In general, the
more relevant data you feed to the machine, the better the model. Once a machine
learning algorithm is provided with data from your records, it learns patterns from it
and builds a model for decision-making.
 Label. A label is the value we are trying to predict (the response variable). For
example, if we want to forecast whether a patient will be diagnosed with cancer based
on their symptoms, the response variable is Yes/No for the cancer diagnosis. Training
data can be labelled or unlabelled; both types are used in machine learning, for
different cases.
 Case details. Algorithms make decisions based on the information you give them. You
need to make sure that the data is relevant and has various cases with different
outcomes. For instance, if you need a model that can score potential borrowers, you
need to include in the training set the information you normally know about your
potential client during the application process:
 Name and contact details, location;
 Demographics, social and behavioural characteristics;
 Source of origin (Meta Ads, website landing page, third party, etc.)
 Factors connected to the behaviour/activity on websites, conversions, time spent on
a website, number of clicks, and more.

 Testing data : After the machine learning model is built, you need to check its
work. Testing data is used to evaluate the performance of your model on examples
it has not seen. If you adjust or optimize the model in response, measure the result
on data that was not used for tuning, otherwise the estimate becomes optimistic.
The testing set should have the following characteristics:
 Unseen. You cannot reuse the same information that was in the training set.
 Large. The data set should be large enough to give a reliable estimate of the
model's performance.
 Representative. The data should represent the actual dataset.

Luckily, you don’t need to collect new data and compare predictions with actual
data manually. The AI can split the existing data into two parts, put testing set
aside while training, and then run tests comparing predictions and actual results
all by itself. Data science has different options for data split, but the most common
proportions are 70/30, 80/20, and 90/10.
So, with a sufficiently large data set at hand, we can check whether the model
makes usable predictions or not.
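
The split described above can be sketched in a few lines of Python. This is only a minimal illustration: the function name train_test_split, the 80/20 proportion and the toy data are assumptions made for the example, not part of any particular platform.

```python
import random

def train_test_split(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of the records and split it into (train, test) lists."""
    shuffled = list(records)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    # The first n_test shuffled records are put aside for testing.
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(10))                      # toy data set of 10 records
train, test = train_test_split(data, test_fraction=0.2)
print(len(train), "training records,", len(test), "testing records")
```

Shuffling before splitting is a design choice: it helps both parts stay representative when the records are stored in some sorted order.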
 Positive and negative class: to make these terms simple to understand, consider
a model that predicts whether a wolf is present:
 Wolf is a positive class
 No wolf is a negative class
 True Positive (TP): is the result that we get if we correctly predict the
positive class
 False Positive (FP): is the outcome that we get if we predict a negative class
as a positive class
 True Negative (TN): is the result that we get if we correctly predict the
negative class
 False Negative (FN): is the outcome that we get if we predict a positive class
as a negative class
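
As a rough sketch, the four outcomes can be counted by comparing actual and predicted labels. The function name confusion_counts and the six example labels below are made up for illustration.

```python
def confusion_counts(actual, predicted, positive="wolf"):
    """Count TP, FP, TN and FN for one positive class."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # predicted wolf, and there was a wolf
        elif p == positive:
            fp += 1          # predicted wolf, but there was no wolf
        elif a == positive:
            fn += 1          # predicted no wolf, but there was a wolf
        else:
            tn += 1          # predicted no wolf, and there was no wolf
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}

# Made-up labels, for illustration only.
actual    = ["wolf", "wolf", "no wolf", "no wolf", "wolf", "no wolf"]
predicted = ["wolf", "no wolf", "no wolf", "wolf", "wolf", "no wolf"]

counts = confusion_counts(actual, predicted)
accuracy = (counts["TP"] + counts["TN"]) / len(actual)
print(counts, "accuracy:", accuracy)
```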

4. Cross-validation :
Cross-validation is a technique for evaluating ML models by training several ML
models on subsets of the available input data and evaluating them on the
complementary subset of the data. Use cross-validation to detect overfitting, i.e.,
failing to generalize a pattern.
In Amazon ML, you can use the k-fold cross-validation method to perform cross-
validation. In k-fold cross-validation, you split the input data into k subsets of data
(also known as folds). You train an ML model on all but one (k-1) of the subsets, and
then evaluate the model on the subset that was not used for training. This process
is repeated k times, with a different subset reserved for evaluation (and excluded
from training) each time.
The following diagram shows an example of the training subsets and complementary
evaluation subsets generated for each of the four models that are created and trained
during a 4-fold cross-validation. Model one uses the first 25 percent of data for
evaluation, and the remaining 75 percent for training. Model two uses the second
subset of 25 percent (25 percent to 50 percent) for evaluation, and the remaining
three subsets of the data for training, and so on.
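
The fold layout described above can be sketched as follows. This is a minimal illustration in plain Python, not the Amazon ML implementation; the function name k_fold_indices and the choice of 8 samples are assumptions.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, eval_indices) for each of the k folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        eval_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, eval_idx
        start = stop

# 4-fold cross-validation over 8 samples: each fold evaluates on 25 percent.
folds = list(k_fold_indices(8, 4))
for number, (train_idx, eval_idx) in enumerate(folds, start=1):
    print("model", number, "evaluates on", eval_idx, "and trains on", train_idx)
```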

Cross-validation is a technique in which we train our model using a subset of the
data-set and then evaluate it using the complementary subset of the data-set. The three
steps involved in cross-validation are as follows:
 Reserve some portion of the sample data-set.
 Train the model using the rest of the data-set.
 Test the model using the reserved portion of the data-set.

 Validation: In this method, we train on 50% of the given data-set and use the
remaining 50% for testing. The major drawback is that the 50% left out may contain
important information that the model never sees during training, i.e. higher bias.
 LOOCV (Leave One Out Cross Validation): In this method, we train on the whole
data-set except for one data-point, test on that point, and repeat for each
data-point. An advantage is that we make use of all data points, so the bias is
low. The major drawback is higher variation in the test result, because each model
is tested against a single data point; if that point is an outlier, the variation
is higher still. Another drawback is execution time, since the method iterates as
many times as there are data points.
 K-Fold Cross Validation: In this method, we split the data-set into k subsets
(known as folds), train on k-1 of them, and leave the remaining one for evaluating
the trained model. We iterate k times, with a different subset reserved for
testing each time.
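
To make the LOOCV loop concrete, here is a small sketch. The "model" is deliberately trivial (it predicts the mean of its training values) so that the loop itself stays visible; the function name and the four data values are assumptions for illustration.

```python
def loocv_mean_squared_error(values):
    """LOOCV for a toy model that predicts the mean of its training values."""
    errors = []
    for i, held_out in enumerate(values):
        # Train on every data point except the one at position i.
        training = values[:i] + values[i + 1:]
        prediction = sum(training) / len(training)
        # Test against the single held-out data point.
        errors.append((held_out - prediction) ** 2)
    return sum(errors) / len(errors)

values = [1.0, 2.0, 3.0, 4.0]               # made-up data
score = loocv_mean_squared_error(values)
print("LOOCV mean squared error:", score)
```

The loop runs once per data point, which is why the execution time grows with the size of the data-set.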

5. Types of Machine Learning:


 Supervised Machine Learning:
Supervised learning is a type of machine learning in which the algorithm is
trained on the labeled dataset. It learns to map input features to targets based
on labeled training data. In supervised learning, the algorithm is provided with
input features and corresponding output labels, and it learns to generalize
from this data to make predictions on new, unseen data.
There are two main types of supervised learning:
 Regression: Regression is a type of supervised learning where the
algorithm learns to predict continuous values based on input features.
The output labels in regression are continuous values, such as stock
prices and housing prices. The different regression algorithms in
machine learning are: Linear Regression, Polynomial Regression, Ridge
Regression, Decision Tree Regression, Random Forest Regression,
Support Vector Regression, etc.
 Classification: Classification is a type of supervised learning where the
algorithm learns to assign input data to a specific category or class
based on input features. The output labels in classification are discrete
values. Classification algorithms can be binary, where the output is one
of two possible classes, or multiclass, where the output can be one of
several classes. The different classification algorithms in machine
learning are: Logistic Regression, Naive Bayes, Decision Tree, Support
Vector Machine (SVM), K-Nearest Neighbors (KNN), etc.
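
The difference between the two types can be sketched with two very small models: a least-squares line for regression and a 1-nearest-neighbour rule for classification. Both functions and all data values are illustrative assumptions, kept to one input feature for brevity.

```python
def fit_line(xs, ys):
    """Simple linear regression: least-squares slope and intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def nearest_neighbour_label(train_points, train_labels, query):
    """1-nearest-neighbour classification on a single feature."""
    distances = [abs(p - query) for p in train_points]
    return train_labels[distances.index(min(distances))]

# Regression: labels are continuous values (here generated from y = 2x + 1).
slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
predicted_value = slope * 5.0 + intercept

# Classification: labels are discrete classes.
predicted_class = nearest_neighbour_label(
    [1.0, 2.0, 8.0, 9.0], ["small", "small", "large", "large"], 7.0)

print(predicted_value, predicted_class)
```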

 Unsupervised Machine Learning:


Unsupervised learning is a type of machine learning where the algorithm learns
to recognize patterns in data without being explicitly trained using labeled
examples. The goal of unsupervised learning is to discover the underlying
structure or distribution in the data.
There are two main types of unsupervised learning:
 Clustering: Clustering algorithms group similar data points together
based on their characteristics. The goal is to identify groups, or clusters,
of data points that are similar to each other, while being distinct from
other groups. Some popular clustering algorithms include K-means,
Hierarchical clustering, and DBSCAN.
 Dimensionality reduction: Dimensionality reduction algorithms reduce
the number of input variables in a dataset while preserving as much of
the original information as possible. This is useful for reducing the
complexity of a dataset and making it easier to visualize and analyze.
Some popular dimensionality reduction algorithms include Principal
Component Analysis (PCA), t-SNE, and Autoencoders.
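
As an illustration of the clustering idea above, here is a bare-bones K-means sketch on one-dimensional data. The starting centroids, the fixed number of iterations and the data points are assumptions chosen for the example; practical implementations handle initialisation and stopping more carefully.

```python
def k_means_1d(points, centroids, iterations=10):
    """Basic K-means on 1-D points, starting from the given centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: abs(p - centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]   # made-up, unlabeled data
centroids, clusters = k_means_1d(points, centroids=[0.0, 5.0])
print(centroids, clusters)
```

No labels are used anywhere: the two groups emerge only from the distances between the points.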

 Reinforcement Machine Learning:


Reinforcement learning is a type of machine learning where an agent learns to
interact with an environment by performing actions and receiving rewards or
penalties based on its actions. The goal of reinforcement learning is to learn a
policy, which is a mapping from states to actions, that maximizes the expected
cumulative reward over time.
There are two main types of reinforcement learning:
 Model-based reinforcement learning: In model-based reinforcement
learning, the agent learns a model of the environment, including the
transition probabilities between states and the rewards associated
with each state-action pair. The agent then uses this model to plan its
actions in order to maximize its expected reward. Some popular model-
based reinforcement learning algorithms include Value Iteration and
Policy Iteration.
 Model-free reinforcement learning: In model-free reinforcement
learning, the agent learns a policy directly from experience without
explicitly building a model of the environment. The agent interacts with
the environment and updates its policy based on the rewards it
receives. Some popular model-free reinforcement learning algorithms
include Q-Learning, SARSA, and Deep Reinforcement Learning.
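
Value Iteration, mentioned above as a model-based method, can be sketched on a tiny environment. Everything about the environment here is an assumption made for illustration: three states in a row, two actions, a reward of 1 for reaching the last state, and a discount factor of 0.9.

```python
GAMMA = 0.9                  # discount factor (assumed value)
STATES = [0, 1, 2]           # state 2 is the terminal goal state
ACTIONS = ["left", "right"]

def step(state, action):
    """Known model of the environment: returns (next_state, reward)."""
    if action == "left":
        next_state = max(state - 1, 0)
    else:
        next_state = min(state + 1, 2)
    reward = 1.0 if next_state == 2 else 0.0
    return next_state, reward

def action_value(state, action, values):
    """Reward for the action plus the discounted value of the next state."""
    next_state, reward = step(state, action)
    return reward + GAMMA * values[next_state]

def value_iteration(sweeps=50):
    values = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        for s in STATES[:-1]:            # the terminal state keeps value 0
            values[s] = max(action_value(s, a, values) for a in ACTIONS)
    # The policy maps each non-terminal state to its best action.
    policy = {}
    for s in STATES[:-1]:
        policy[s] = max(ACTIONS, key=lambda a: action_value(s, a, values))
    return values, policy

values, policy = value_iteration()
print(values)
print(policy)
```

The agent never interacts with the environment here; it plans entirely from the known model in step, which is what distinguishes this from model-free methods such as Q-Learning.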

6. What is Dimensionality Reduction?


Dimensionality reduction is a technique used to reduce the number of features in a
dataset while retaining as much of the important information as possible. In other
words, it is a process of transforming high-dimensional data into a lower-dimensional
space that still preserves the essence of the original data.
In machine learning, high-dimensional data refers to data with a large number of
features or variables. The curse of dimensionality is a common problem in machine
learning, where the performance of the model deteriorates as the number of features
increases. This is because the complexity of the model increases with the number of
features, and it becomes more difficult to find a good solution. In addition, high-
dimensional data can also lead to overfitting, where the model fits the training data
too closely and does not generalize well to new data.
Dimensionality reduction can help to mitigate these problems by reducing the
complexity of the model and improving its generalization performance. There are two
main approaches to dimensionality reduction: feature selection and feature
extraction.
 Feature Selection: Feature selection involves selecting a subset of the original
features that are most relevant to the problem at hand. The goal is to reduce the
dimensionality of the dataset while retaining the most important features. There are
several methods for feature selection, including filter methods, wrapper methods, and
embedded methods. Filter methods rank the features based on their relevance to the
target variable, wrapper methods use the model performance as the criteria for
selecting features, and embedded methods combine feature selection with the model
training process.
 Feature Extraction: Feature extraction involves creating new features by combining
or transforming the original features. The goal is to create a set of features that
captures the essence of the original data in a lower-dimensional space. There are
several methods for feature extraction, including principal component analysis (PCA),
linear discriminant analysis (LDA), and t-distributed stochastic neighbor embedding (t-
SNE). PCA is a popular technique that projects the original features onto a lower-
dimensional space while preserving as much of the variance as possible.
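
A filter method can be sketched by ranking features on the absolute value of their Pearson correlation with the target. This is only one possible relevance score; the feature names, the data values and the decision to keep two features are all assumptions made for the example.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Made-up feature columns and a made-up binary target.
features = {
    "humidity":     [30.0, 45.0, 60.0, 75.0, 90.0],
    "rainfall":     [1.0, 2.0, 3.5, 4.0, 6.0],
    "day_of_month": [3.0, 1.0, 4.0, 1.0, 5.0],
}
target = [0.0, 0.0, 1.0, 1.0, 1.0]

# Filter step: score every feature independently of any model, then rank.
scores = {name: abs(pearson(column, target))
          for name, column in features.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
selected = ranked[:2]
print(ranked, selected)
```

A wrapper method would instead retrain a model for each candidate subset and compare model performance, which is more expensive than this one-pass ranking.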
 Why is Dimensionality Reduction important in Machine Learning and Predictive
Modeling?
An intuitive example of dimensionality reduction is a simple e-mail classification
problem, where we need to classify whether an e-mail is spam or not. This can involve
a large number of features, such as whether or not the e-mail has a generic title,
the content of the e-mail, whether the e-mail uses a template, etc. However, some of
these features may overlap. As another example, a classification problem that relies
on both humidity and rainfall can be collapsed into just one underlying feature,
since the two are correlated to a high degree. Hence, we can reduce the number of
features in such problems. A 3-D classification problem can be hard to visualize,
whereas a 2-D one can be mapped to a simple 2-dimensional space, and a 1-D problem
to a simple line. The figure below illustrates this concept, where a 3-D feature
space is split into two 2-D feature spaces, and later, if found to be correlated,
the number of features can be reduced even further.

 Components of Dimensionality Reduction:


There are two components of dimensionality reduction:
 Feature selection: Here we try to find a subset of the original set of variables,
or features, that can be used to model the problem. It usually involves one of
three approaches:
 Filter
 Wrapper
 Embedded
 Feature extraction: This reduces data in a high-dimensional space to a
lower-dimensional space, i.e. a space with fewer dimensions.
 Methods of Dimensionality Reduction: The various methods used for
dimensionality reduction include:
 Principal Component Analysis (PCA)
 Linear Discriminant Analysis (LDA)
 Generalized Discriminant Analysis (GDA)

Dimensionality reduction may be either linear or non-linear, depending upon
the method used. The prime linear method, Principal Component Analysis (PCA),
is discussed below.

7. Principal Component Analysis


This method was introduced by Karl Pearson. It works on the condition that, when
data in a higher-dimensional space is mapped to a lower-dimensional space, the
variance of the data in the lower-dimensional space should be maximal.

It involves the following steps:

 Construct the covariance matrix of the data.
 Compute the eigenvectors of this matrix.
 Use the eigenvectors corresponding to the largest eigenvalues to reconstruct a
large fraction of the variance of the original data.

Hence, we are left with a smaller number of eigenvectors, and some data may have
been lost in the process. However, the most important variances should be retained
by the remaining eigenvectors.
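
The steps above can be sketched for the simplest case of two features reduced to one. To stay self-contained, this sketch uses the closed-form eigenvalues of a symmetric 2x2 matrix instead of a linear algebra library; the function name pca_2d and the four data points are assumptions for illustration.

```python
import math

def pca_2d(points):
    """PCA for 2-D points: returns (direction, projected, explained_ratio)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Step 1: covariance matrix [[a, b], [b, c]] of the centered data.
    a = sum(x * x for x, _ in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)

    # Step 2: eigenvalues of the symmetric 2x2 matrix, in closed form.
    mid = (a + c) / 2
    spread = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    largest, smallest = mid + spread, mid - spread

    # Eigenvector belonging to the largest eigenvalue, scaled to unit length.
    if b != 0:
        vx, vy = b, largest - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    direction = (vx / norm, vy / norm)

    # Step 3: keep only that eigenvector and project the data onto it.
    projected = [x * direction[0] + y * direction[1] for x, y in centered]
    # Assumes the data has some variance, so the sum below is not zero.
    explained = largest / (largest + smallest)
    return direction, projected, explained

# Two made-up, strongly correlated features.
points = [(2.0, 1.9), (4.0, 4.1), (6.0, 6.0), (8.0, 8.2)]
direction, projected, explained = pca_2d(points)
print(direction, explained)
```

Because the two features are almost perfectly correlated, a single component retains nearly all of the variance, and the second eigenvector can be dropped with little loss.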

 Advantages of Dimensionality Reduction:


 It helps in data compression, and hence reduced storage space.
 It reduces computation time.
 It also helps remove redundant features, if any.
 Improved Visualization: High dimensional data is difficult to visualize, and
dimensionality reduction techniques can help in visualizing the data in 2D or 3D, which
can help in better understanding and analysis.
 Overfitting Prevention: High dimensional data may lead to overfitting in machine
learning models, which can lead to poor generalization performance. Dimensionality
reduction can help in reducing the complexity of the data, and hence prevent
overfitting.
 Feature Extraction: Dimensionality reduction can help in extracting important features
from high dimensional data, which can be useful in feature selection for machine
learning models.
 Data Pre-processing: Dimensionality reduction can be used as a pre-processing step
before applying machine learning algorithms to reduce the dimensionality of the data
and hence improve the performance of the model.
 Improved Performance: Dimensionality reduction can help in improving the
performance of machine learning models by reducing the complexity of the data, and
hence reducing the noise and irrelevant information in the data.

 Disadvantages of Dimensionality Reduction


 It may lead to some amount of data loss.
 PCA tends to find linear correlations between variables, which is sometimes
undesirable.
 PCA fails in cases where mean and covariance are not enough to define datasets.
 We may not know how many principal components to keep; in practice, some rules
of thumb are applied.
 Interpretability: The reduced dimensions may not be easily interpretable, and it may
be difficult to understand the relationship between the original features and the
reduced dimensions.
 Overfitting: In some cases, dimensionality reduction may lead to overfitting, especially
when the number of components is chosen based on the training data.
 Sensitivity to outliers: Some dimensionality reduction techniques are sensitive to
outliers, which can result in a biased representation of the data.
 Computational complexity: Some dimensionality reduction techniques, such as
manifold learning, can be computationally intensive, especially when dealing with
large datasets.
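
One rule of thumb for choosing the number of principal components is to keep the smallest number whose eigenvalues account for a chosen share of the total variance. The sketch below assumes a threshold of 95%, which is a convention rather than a fixed rule, and the eigenvalues are made up for illustration.

```python
def components_to_keep(eigenvalues, threshold=0.95):
    """Smallest number of components reaching the variance threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        # Stop as soon as the retained share of variance is large enough.
        if cumulative / total >= threshold:
            return count
    return len(ordered)

eigenvalues = [6.0, 3.0, 0.7, 0.2, 0.1]     # made-up eigenvalues
kept = components_to_keep(eigenvalues)
print("components to keep:", kept)
```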
