Notes Machine Learning
Machine learning is a subset of artificial intelligence (AI) that involves the development of
algorithms and statistical models that enable computers to perform specific tasks without being
explicitly programmed. Instead of following predefined rules, machine learning systems learn
patterns and make decisions based on data. Here’s a detailed overview:
Key Concepts in Machine Learning
1. Training Data:
o Data used to train machine learning models. The quality and quantity of this data
significantly impact the model's performance.
2. Algorithms:
o Methods and techniques used to build machine learning models. Examples
include decision trees, support vector machines, and neural networks.
3. Model:
o A mathematical representation derived from training data used to make
predictions or decisions.
4. Features:
o Individual measurable properties or characteristics of the data used as input to the
model.
5. Labels:
o The target variable or output the model aims to predict (used in supervised
learning).
Types of Machine Learning
1. Supervised Learning:
o Definition: The model is trained on labeled data, where the input data and
corresponding output labels are provided.
o Examples: Classification (e.g., spam detection), Regression (e.g., predicting
house prices).
o Algorithms: Linear Regression, Logistic Regression, Decision Trees, Support
Vector Machines.
2. Unsupervised Learning:
o Definition: The model is trained on unlabeled data and must find patterns and
relationships within the data.
o Examples: Clustering (e.g., customer segmentation), Association (e.g., market
basket analysis).
o Algorithms: K-means, Hierarchical Clustering, Apriori Algorithm.
3. Semi-Supervised Learning:
o Definition: The model is trained on a combination of labeled and unlabeled data.
o Examples: Useful when acquiring labeled data is expensive or time-consuming.
o Algorithms: Semi-Supervised Support Vector Machines, Co-training.
4. Reinforcement Learning:
o Definition: The model learns by interacting with an environment and receiving
feedback in the form of rewards or penalties.
o Examples: Robotics, Game AI, Autonomous driving.
o Algorithms: Q-Learning, Deep Q-Networks, Policy Gradients.
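As a quick illustration of the difference between supervised and unsupervised learning, the following minimal Python sketch (assuming scikit-learn is installed; the data is synthetic) fits a classifier on labelled points and a clustering model on the same points without labels:

# Minimal sketch: supervised vs. unsupervised learning (assumes scikit-learn).
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Synthetic data: 200 two-dimensional points in 2 groups, with labels y.
X, y = make_blobs(n_samples=200, centers=2, random_state=0)

# Supervised learning: the model sees both the inputs X and the labels y.
clf = LogisticRegression().fit(X, y)
print("predicted class:", clf.predict(X[:1]))

# Unsupervised learning: the model sees only X and must find structure itself.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("assigned cluster:", km.labels_[0])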
Key Steps in Machine Learning
1. Data Collection:
o Gathering relevant data from various sources for training the model.
2. Data Preprocessing:
o Cleaning and transforming the data to make it suitable for analysis. This includes
handling missing values, normalizing data, and encoding categorical variables.
3. Feature Engineering:
o Creating new features or modifying existing ones to improve the model’s
performance.
4. Model Training:
o Using algorithms to learn patterns from the training data and build the model.
5. Model Evaluation:
o Assessing the model's performance using metrics such as accuracy, precision,
recall, and F1 score.
6. Model Deployment:
o Implementing the model in a real-world application where it can make predictions on new data.
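The following minimal Python sketch walks through the preprocessing, training and evaluation steps above, assuming scikit-learn is installed and using a synthetic dataset in place of collected data:

# Minimal sketch of the preprocessing/training/evaluation steps (synthetic data).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Data collection (here: synthetic data standing in for real data).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Data preprocessing: hold out a test set and scale the features.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Model training.
model = LogisticRegression().fit(X_train, y_train)

# Model evaluation with the metrics mentioned above.
pred = model.predict(X_test)
print("accuracy :", accuracy_score(y_test, pred))
print("precision:", precision_score(y_test, pred))
print("recall   :", recall_score(y_test, pred))
print("F1 score :", f1_score(y_test, pred))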
Designing a learning
system
The formal definition of Machine learning as discussed in the
previous blogs of the Machine learning series is “A computer
program is said to learn from experience E with respect to
some class of tasks T and performance measure P, if its
performance at tasks in T, as measured by P, improves with
experience E”.
The perceptron computes a linear combination of its inputs and outputs 1 if the result is positive and -1 otherwise:

o(x1, …, xn) = 1 if w0 + w1·x1 + … + wn·xn > 0, and -1 otherwise,

or, in vector form, o(x) = sgn(w · x).

For convenience, we will sometimes write the perceptron
function as o(x) = sgn(w · x), where sgn(y) = 1 if y > 0 and -1 otherwise.
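A minimal NumPy sketch of this perceptron function (the weights below are arbitrary, chosen only to illustrate the thresholding):

import numpy as np

def perceptron(x, w):
    # Output the sign of the weighted sum w0 + w1*x1 + ... + wn*xn,
    # where w[0] is the bias weight w0 and w[1:] are the input weights.
    return 1 if w[0] + np.dot(w[1:], x) > 0 else -1

w = np.array([-0.5, 1.0, 1.0])           # arbitrary example weights
print(perceptron(np.array([0, 0]), w))   # -1
print(perceptron(np.array([1, 0]), w))   # 1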
Simple Linear
Regression
Simple linear regression is a well-known statistical method
for obtaining a formula to predict values of one variable
from another variable when there is a causal relationship
between the two variables.
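The model has the form y = β0 + β1·x + ϵ, and the coefficients are usually estimated by least squares. A minimal NumPy sketch with made-up data:

import numpy as np

# Made-up data: y is roughly 2*x + 1 plus noise.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 11.1])

# Least-squares estimates of the slope (beta1) and intercept (beta0).
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()
print(f"y ≈ {beta0:.2f} + {beta1:.2f} * x")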
Multiple Linear
Regression
Multiple linear regression, often abbreviated MLR or simply called multiple regression, is a statistical technique that uses several explanatory variables to predict the outcome of a response variable (the dependent variable). MLR is used to express the linear relationship between the independent variables (explanatory variables) and the dependent variable (response variable).
The model can be written as

yi = β0 + β1·xi1 + β2·xi2 + … + βp·xip + ϵ

where, for i = 1, …, n observations:
yi = dependent variable
xi = explanatory variables
β0 = y-intercept (constant term)
β1 … βp = slope coefficients for each explanatory variable
ϵ = the model’s error term (residual)
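A minimal scikit-learn sketch of fitting an MLR model (the data is synthetic, generated so that the true coefficients are known):

import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data following y = 1 + 2*x1 - 3*x2 + noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 1 + 2 * X[:, 0] - 3 * X[:, 1] + rng.normal(scale=0.1, size=100)

model = LinearRegression().fit(X, y)
print("intercept (beta0):", model.intercept_)
print("slope coefficients (beta1, beta2):", model.coef_)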
Derivation of the Backpropagation Rule
Backpropagation’s purpose is to find the partial derivatives of the cost function C with respect to every weight w and bias b in the network. It is a supervised learning algorithm used for Multilayer Perceptrons (Artificial Neural Networks).
Once we have these partial derivatives, we update each weight and bias by subtracting the product of a constant alpha (the learning rate) and the corresponding partial derivative of the cost function. This is the famously-known gradient descent method.
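Written out, the gradient descent updates referred to above are

w ← w − α · ∂C/∂w
b ← b − α · ∂C/∂b

applied to every weight w and bias b in the network, where α is the learning rate.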
Random Variable:
A random variable may be thought of as the name of a
probabilistic experiment. Its value is the outcome of the
experiment.
Dimensionality
Reduction
Dimensionality reduction is the process of decreasing the number of dimensions of the feature set. In machine learning there are often many factors or variables on which the final classification is based; these variables are called features. With a greater number of features it is harder to visualize the training set, and the features are more likely to be correlated and therefore redundant. This is where dimensionality reduction plays a key role, by reducing the number of random variables. It can be divided into feature selection and feature extraction.
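As a rough illustration of the two families (a minimal sketch assuming scikit-learn; the data is synthetic), feature selection keeps a subset of the original columns, while feature extraction builds new ones:

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.decomposition import PCA

X, y = make_classification(n_samples=200, n_features=20, n_informative=5, random_state=0)

# Feature selection: keep the 5 original features that score best against the labels.
X_sel = SelectKBest(score_func=f_classif, k=5).fit_transform(X, y)

# Feature extraction: build 5 new features (principal components) from all 20.
X_ext = PCA(n_components=5).fit_transform(X)

print(X.shape, X_sel.shape, X_ext.shape)   # (200, 20) (200, 5) (200, 5)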
Curse of Dimensionality
The Curse of Dimensionality refers to problems that arise when we work in high dimensions and that do not occur in low dimensions. As the number of features increases, the number of samples needed to cover the different combinations of feature values grows very quickly; for example, with 10 possible values per feature, 3 features give 1,000 combinations while 10 features give 10 billion. With too few samples relative to the number of dimensions, certain algorithms struggle to train effective models. This is known as the Curse of Dimensionality.
Classification models
Classification is the process of predicting the class of given data points. Classification predictive modeling is the task of approximating a mapping function from input variables to discrete output variables. It belongs to supervised learning in Machine Learning, where the targets are provided along with the input data.
Classification can be applied to structured and unstructured data. The main goal of a classification model is to discover the category or class that new data falls under. Classes are also termed targets, labels or categories.
If we give one or more inputs to a classification model, it will try to predict one or more outcomes, where the outcomes are the labels that can be assigned to the data.
Classification models have applications in many domains, such as medical diagnosis and target marketing.
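A minimal sketch of a classification model assigning new data to a class (assumes scikit-learn, whose bundled iris dataset is used here; the new measurement is made up):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# Predict the class (target/label/category) of a new, unseen measurement.
new_flower = [[5.1, 3.5, 1.4, 0.2]]
pred = clf.predict(new_flower)
print(iris.target_names[pred[0]])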
If our sample is small, its mean may be a noisy estimate. We can improve the estimate of the mean by using the bootstrap procedure (sketched in code below):
1. Create many random sub-samples of our dataset with
replacement so that same sample can be selected more than
once.
2. Compute the mean of each sub-sample.
3. Calculate the average of all of our collected means and use that as our estimated mean for the data.
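A minimal NumPy sketch of this procedure (the sample values are made up):

import numpy as np

rng = np.random.default_rng(0)
data = np.array([2.3, 3.1, 2.8, 3.9, 2.5, 3.4, 2.9, 3.0])   # small original sample

# 1. Draw many sub-samples with replacement, 2. take the mean of each,
# 3. average those means to get the bootstrap estimate of the mean.
boot_means = [rng.choice(data, size=len(data), replace=True).mean() for _ in range(1000)]
print("bootstrap estimate of the mean:", np.mean(boot_means))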
BAGGING
Bootstrap Aggregation, also called Bagging, is a simple yet powerful ensemble method. It is an application of the bootstrap procedure to a high-variance machine learning algorithm, typically decision trees.
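A minimal scikit-learn sketch of bagging (synthetic data; by default the base model is a decision tree, and the number of estimators below is arbitrary):

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=300, random_state=0)

# Each base model is trained on a bootstrap sample of the data,
# and their predictions are aggregated by voting.
bag = BaggingClassifier(n_estimators=50, bootstrap=True, random_state=0).fit(X, y)
print("training accuracy:", bag.score(X, y))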
K NEAREST
NEIGHBOURS
K-Nearest Neighbors is one of the most important classification algorithms in Machine Learning. It belongs to the supervised learning field and has powerful applications in pattern recognition, data mining and intrusion detection.
The k-nearest-neighbor algorithm is a “lazy learner”: it does not build a model from the training set until a prediction on new data is actually requested.
The KNN algorithm can be used for both classification and regression problems, although it is more widely used for classification. When building a KNN model there are only a few parameters that need to be chosen to improve the model’s performance.
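A minimal scikit-learn sketch of a KNN classifier (using the bundled iris dataset; k = 5 is an arbitrary choice):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The main parameter to choose is k, the number of neighbours consulted.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print("test accuracy:", knn.score(X_test, y_test))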
K Means Clustering
K-means clustering is one of the simplest and most popular unsupervised machine learning algorithms. Clustering is one of the most common exploratory data analysis techniques, used to get an intuition about the structure of the data. It is defined as the task of finding subgroups in the data such that data points in the same subgroup (cluster) are very similar, while data points in different clusters are very different.
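A minimal scikit-learn sketch of K-means on synthetic data with three natural groups:

from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Synthetic data with 3 natural groups; K-means must recover them without labels.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster centres:\n", km.cluster_centers_)
print("first 10 cluster assignments:", km.labels_[:10])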
Principal Component
Analysis
Principal Component Analysis is usually abbreviated as PCA. It is an unsupervised learning technique: it does not use class labels but concentrates only on the variation in the data in order to reduce its dimensions. In practice the data is often so large that it needs to be reduced in order to avoid overfitting and other problems during prediction. As the number of dimensions increases, visualization becomes harder and the cost of performing calculations on the data grows. Hence, we have to decrease the dimensions of the data by using dimensionality reduction techniques.
Principal component analysis is one of the most widely used dimensionality reduction techniques. The main idea behind PCA is to reduce the dimensionality of a data set whose variables may be correlated with each other, either heavily or lightly, while retaining the variation present in the dataset to the maximum extent.
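A minimal scikit-learn sketch of PCA reducing the 4-dimensional iris data to 2 principal components (the choice of 2 components is arbitrary):

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)

# Standardise the features, then project onto the top 2 principal components.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

print("reduced shape:", X_reduced.shape)                          # (150, 2)
print("variation retained:", pca.explained_variance_ratio_.sum())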
Genetic Algorithms: Motivation and Representation
Genetic Algorithm: Motivation
Genetic algorithms are preferred for solving optimization problems because they can produce a good result in a reasonable amount of time and are relatively fast.
3. Time-efficient.
Many problems, such as the Travelling Salesperson Problem (TSP), have practical uses such as pathfinding and VLSI design, and a GA can give good results in a reasonable amount of time. For example, think how much trouble it would be if our GPS took hours to give us the path from our source to our destination; with a GA involved, we can get a good result in a short time.
Genotype representation
Genotype representation is very important during the implementation of a genetic algorithm, as it directly impacts the performance of the genetic algorithm. A proper representation is one in which the mapping between the genotype space and the phenotype space is well defined.
1. Binary representation
Binary representation is one of the simplest and most common ways of representing a genotype in a GA: each chromosome is encoded as a string of bits.
2. Integer representation
In the case of discrete-valued genes, we cannot always limit the values to binary, so instead we use integer representation. For example, if we had to encode the four cardinal directions, we could encode them as {1, 2, 3, 4} using integer representation.
3. Permutation representation
Whenever the solution is represented by an ordering of elements, we can use permutation representation. Take the same example of the TSP: the salesperson has to travel to all the cities, visiting one city at a time, and then come back to the source city. The order in which the cities are visited is naturally a permutation, therefore we can use the permutation representation (a short code sketch of all three representations is given below).
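A small Python sketch of how the three representations might look in code (a toy illustration; the bit-string length, direction encoding and city names are made up):

import random

random.seed(0)

# 1. Binary representation: a chromosome is a fixed-length bit string.
binary_genotype = [random.randint(0, 1) for _ in range(8)]

# 2. Integer representation: each gene is one of a small set of integers,
#    e.g. the four directions encoded as 1=N, 2=E, 3=S, 4=W.
integer_genotype = [random.randint(1, 4) for _ in range(6)]

# 3. Permutation representation: an ordering of elements, e.g. a TSP tour.
cities = ["A", "B", "C", "D", "E"]
permutation_genotype = random.sample(cities, k=len(cities))   # a random tour

print(binary_genotype, integer_genotype, permutation_genotype)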
Negative Reinforcement
Negative Reinforcement is defined as the strengthening of a behaviour that occurs because a negative condition is stopped or avoided as a consequence of that behaviour.
Merits of reinforcement learning
It increases the frequency of the desired behaviour.
It helps enforce a minimum standard of performance.
Demerits of reinforcement learning
It only provides enough motivation to meet the minimum required behaviour.
Evaluating Hypotheses:
Basics of Sampling
Theory
For estimating hypothesis accuracy, statistical methods are
applied. In this blog, we’ll have a look at evaluating
hypotheses and the basics of sampling theory.
Probability Distribution:
A probability distribution is a statistical function that
specifies all possible values and probabilities for a random
variable in a given range.
This range is bounded by the minimum and maximum possible values, but precisely where a given value is likely to fall on the probability distribution depends on a number of factors.
For each potential value yi of a random variable Y, the distribution specifies the probability Pr(Y = yi) that Y will take on the value yi.
Expected Value:
The expected value (EV) of a random variable is the average of its possible values weighted by their probabilities, E[Y] = Σi yi · Pr(Y = yi); it is the value we expect on average over many repetitions of the experiment.
Standard Deviation:
The standard deviation is the square root of the variance; it measures the dispersion of a dataset relative to its mean.
When data points are further from the mean, there is more
variation within the data set; as a result, the larger the
standard deviation, the more spread out the data is.
Normal Distribution:
The normal distribution, also known as the Gaussian distribution, is a probability distribution that is symmetric about the mean, indicating that data near the mean occur more frequently than data far from the mean. On a graph, the normal distribution appears as a bell curve.
N % confidence interval:
An N% confidence interval estimate for parameter p is an
interval that includes p with probability N%.
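For example, for the sample error of a discrete-valued hypothesis h measured on a set S of n independently drawn test examples (with n reasonably large, say n ≥ 30), the standard N% confidence interval is

error_S(h) ± z_N · sqrt( error_S(h) · (1 − error_S(h)) / n )

where z_N is the constant corresponding to the chosen confidence level (for example, z_N ≈ 1.96 for a 95% interval).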
Bayesian Learning:
Introduction
Bayesian machine learning is a subset of probabilistic
machine learning approaches (for other probabilistic models,
see Supervised Learning). In this blog, we’ll have a look at a
brief introduction to Bayesian learning.
In Bayesian learning, model parameters are treated as
random variables, and parameter estimation entails
constructing posterior distributions for these random
variables based on observed data.
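The central quantity is the posterior distribution given by Bayes’ theorem:

P(h | D) = P(D | h) · P(h) / P(D)

where P(h) is the prior probability of hypothesis h, P(D | h) is the likelihood of the observed data D under h, P(D) is the probability of the data, and P(h | D) is the posterior probability of h after seeing D.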