
Machine Learning

Upendra Verma

01/21/2025
Machine Learning

• Machine Learning is the field of study that gives computers the capability to learn without being explicitly programmed.

• As the name suggests, it gives computers the ability that makes them more similar to humans: the ability to learn. Machine learning is actively used today, perhaps in many more places than one would expect.

Features of Machine Learning
• Machine learning is a data-driven technology. Organizations generate large amounts of data on a daily basis, and by identifying notable relationships in that data, they can make better decisions.

• A machine can learn from past data on its own and improve automatically.

• From a given dataset, it detects various patterns in the data.

• For big organizations, branding is important, and machine learning makes it easier to target a relevant customer base.

• It is similar to data mining because it also deals with huge amounts of data.

Types of machine learning problems
• Supervised learning: The model or algorithm is presented
with example inputs and their desired outputs and then finds
patterns and connections between the input and the output.
The goal is to learn a general rule that maps inputs to outputs.
The training process continues until the model achieves the
desired level of accuracy on the training data.

Examples: image classification, prediction/regression, etc.

Supervised learning
• Regression: Regression is a type of supervised learning where
the algorithm learns to predict continuous values based on
input features. The output labels in regression are continuous
values, such as stock prices and housing prices.

• Common regression algorithms in machine learning include Linear Regression, Polynomial Regression, Ridge Regression, Decision Tree Regression, Random Forest Regression, Support Vector Regression, etc.
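
A minimal sketch of regression with scikit-learn, using a synthetic dataset from make_regression purely for illustration (the dataset and settings are assumptions, not from the slides):

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)
model = LinearRegression().fit(X, y)   # learn coefficients from labeled examples
print(model.predict(X[:5]))            # predicted continuous values for the first five inputs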

Supervised learning
• Classification: Classification is a type of supervised learning
where the algorithm learns to assign input data to a specific
category or class based on input features. The output labels in
classification are discrete values. Classification algorithms can
be binary, where the output is one of two possible classes, or
multiclass, where the output can be one of several classes. The
common classification algorithms in machine learning are: Logistic Regression, Naive Bayes, Decision Tree, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), etc.
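
As an illustrative sketch (the Iris dataset and the logistic-regression settings here are assumptions chosen for brevity), a classifier can be trained and evaluated like this:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)                 # features and discrete class labels
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(clf.score(X_test, y_test))                  # accuracy on held-out data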

Unsupervised learning
• Unsupervised learning: No labels are given to the learning
algorithm, leaving it on its own to find structure in its input. It
is used for clustering populations in different groups.

Examples: clustering, high-dimensional visualization, generative models, etc.

Unsupervised learning

• Clustering: Clustering algorithms group similar data points together based on their characteristics. The goal is to identify groups, or clusters, of data points that are similar to each other, while being distinct from other groups. Some popular clustering algorithms include K-means, hierarchical clustering, and DBSCAN.

• Dimensionality reduction: Dimensionality reduction algorithms reduce the number of input variables in a dataset while preserving as much of the original information as possible. This is useful for reducing the complexity of a dataset and making it easier to visualize and analyze. Some popular dimensionality reduction algorithms include Principal Component Analysis (PCA), t-SNE, and Autoencoders.
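
A minimal sketch combining both ideas (the Iris data, K-means with three clusters, and a 2-D PCA projection are illustrative choices, not prescribed by the slides):

from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)                      # labels are ignored: unsupervised setting
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
X_2d = PCA(n_components=2).fit_transform(X)            # reduce 4 features to 2 for visualization
print(clusters[:10], X_2d.shape)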

Semi-supervised learning

• Semi-supervised learning: Problems where you have a large amount of input data and only some of the data is labeled are called semi-supervised learning problems. These problems sit in between supervised and unsupervised learning. For example, a photo archive where only some of the images are labeled (e.g. dog, cat, person) and the majority are unlabeled.
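
One way to sketch this in code (assuming scikit-learn's LabelSpreading and the Iris data as stand-ins; unlabeled samples are conventionally marked with -1):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import LabelSpreading

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
y_partial = y.copy()
y_partial[rng.random(len(y)) < 0.7] = -1      # hide roughly 70% of the labels
model = LabelSpreading().fit(X, y_partial)    # learns from labeled and unlabeled points together
print((model.transduction_ == y).mean())      # fraction of all labels recovered correctly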

Reinforcement learning

• Reinforcement learning: A computer program interacts with a dynamic environment in which it must perform a certain goal (such as driving a vehicle or playing a game against an opponent). The program is provided feedback in terms of rewards and punishments as it navigates its problem space.

Reinforcement learning

• Model-based reinforcement learning: In model-based reinforcement learning, the agent learns a model of the environment, including the transition probabilities between states and the rewards associated with each state-action pair. The agent then uses this model to plan its actions in order to maximize its expected reward. Some popular model-based reinforcement learning algorithms include Value Iteration and Policy Iteration.
• Model-free reinforcement learning: In model-free
reinforcement learning, the agent learns a policy directly
from experience without explicitly building a model of the
environment. The agent interacts with the environment and
updates its policy based on the rewards it receives. Some
popular model-free reinforcement learning algorithms include
Q-Learning, SARSA, and Deep Reinforcement Learning.
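
A minimal model-free sketch: tabular Q-learning on a hypothetical five-state corridor (the environment, the reward of 1 at the right end, and the hyperparameters are all illustrative assumptions):

import numpy as np

n_states, n_actions = 5, 2                    # actions: 0 = move left, 1 = move right
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.1             # learning rate, discount factor, exploration rate
rng = np.random.default_rng(0)

for _ in range(500):                          # episodes
    s = 2                                     # start in the middle state
    while s not in (0, n_states - 1):         # episode ends at either end of the corridor
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
        s_next = s - 1 if a == 0 else s + 1
        r = 1.0 if s_next == n_states - 1 else 0.0
        # model-free update from the observed transition only (no model of the environment)
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q)                                      # the right-moving action should score higher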
Terminologies of Machine Learning
• Model A model is a specific representation learned from data by
applying some machine learning algorithm. A model is also called
a hypothesis.
• Feature A feature is an individual measurable property of our
data. A set of numeric features can be conveniently described by a
feature vector. Feature vectors are fed as input to the model.
• Target (Label) A target variable or label is the value to be
predicted by our model. For the fruit example discussed in the
features section, the label with each set of input would be the
name of the fruit like apple, orange, banana, etc.
• Training The idea is to give a set of inputs (features) and their expected outputs (labels), so that after training we have a model (hypothesis) that will map new data to one of the categories it was trained on.
• Prediction Once our model is ready, it can be fed a set of inputs, for which it will provide a predicted output (label). Note that only if the model performs well on unseen data can we say that it truly performs well.
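
The terms map onto code roughly as follows (the fruit feature values and the k-nearest-neighbours choice here are made up purely to illustrate the vocabulary):

from sklearn.neighbors import KNeighborsClassifier

X = [[150, 0.8], [170, 0.7], [120, 0.3], [130, 0.2]]    # feature vectors (weight in grams, colour score)
y = ["apple", "apple", "orange", "orange"]              # target labels
model = KNeighborsClassifier(n_neighbors=3).fit(X, y)   # training produces the model (hypothesis)
print(model.predict([[140, 0.6]]))                      # prediction of a label for an unseen input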
ML Training and Prediction

Need for machine learning
• Predictive modeling: Machine learning can be used to build
predictive models that can help businesses make better decisions.
For example, machine learning can be used to predict which
customers are most likely to buy a particular product, or which
patients are most likely to develop a certain disease.
• Natural language processing: Machine learning is used to build
systems that can understand and interpret human language. This is
important for applications such as voice recognition, chatbots, and
language translation.
• Computer vision: Machine learning is used to build systems that
can recognize and interpret images and videos. This is important for
applications such as self-driving cars, surveillance systems, and
medical imaging.
• Fraud detection: Machine learning can be used to detect fraudulent
behavior in financial transactions, online advertising, and other
areas.
• Recommendation systems: Machine learning can be used to build
recommendation systems that suggest products, services, or content
to users based on their past behavior and preferences.
Get started with machine learning

1. Define the Problem
2. Collect Data
3. Explore the Data
4. Pre-process the Data
5. Split the Data
6. Choose a Model
7. Train the Model
8. Evaluate the Model
9. Fine-tune the Model
10. Deploy the Model
11. Monitor the Model
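
Steps 2-8 of this checklist can be compressed into a short scikit-learn sketch (the breast-cancer dataset, the scaling step, and the logistic-regression model are illustrative assumptions):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)                          # 2. collect data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)     # 5. split the data
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))  # 4. pre-process, 6. choose a model
model.fit(X_tr, y_tr)                                               # 7. train the model
print(accuracy_score(y_te, model.predict(X_te)))                    # 8. evaluate the model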

Applications of Machine Learning

• Automation: Machine learning can operate autonomously in many fields without the need for human intervention. For example, robots perform the essential process steps in manufacturing plants.
• Finance Industry: Machine learning is growing in popularity in the finance
industry. Banks are mainly using ML to find patterns inside the data but also to
prevent fraud.
• Government organization: The government makes use of ML to manage public
safety and utilities. Take the example of China with its massive face recognition.
The government uses Artificial intelligence to prevent jaywalking.
• Healthcare industry: Healthcare was one of the first industries to use machine
learning with image detection.
• Marketing: AI is used broadly in marketing thanks to abundant access to data. Before the age of mass data, researchers developed advanced mathematical tools like Bayesian analysis to estimate the value of a customer. With the boom of data, marketing departments rely on AI to optimize customer relationships and marketing campaigns.
• Retail industry: Machine learning is used in the retail industry to analyze
customer behavior, predict demand, and manage inventory. It also helps retailers to
personalize the shopping experience for each customer by recommending products
based on their past purchases and preferences.
• Transportation: Machine learning is used in the transportation industry to
optimize routes, reduce fuel consumption, and improve the overall efficiency of
transportation systems. It also plays a role in autonomous vehicles, where ML
algorithms are used to make decisions about navigation and safety.
Challenges and Limitations

• The primary challenge of machine learning is the lack of data or of diversity in the dataset.
• A machine cannot learn if there is no data available. Besides, a dataset that lacks diversity gives the machine a hard time.
• A machine needs heterogeneity in the data to learn meaningful insights.
• It is rare that an algorithm can extract information when there are no or few variations.
• It is recommended to have at least 20 observations per group to help the machine learn; with fewer, evaluation and prediction tend to be poor.

Python libraries for Machine Learning

Python libraries that are commonly used in Machine Learning are:

• Numpy
• Scipy
• Scikit-learn
• Theano
• TensorFlow
• Keras
• PyTorch
• Pandas
• Matplotlib

Least Squares(OLS) method of linear regression

• A linear regression model establishes the relation between a dependent variable (y) and at least one independent variable (x) as:

y = a + bx

• In the OLS method, we choose the values of a and b such that the total sum of the squares of the differences between the calculated and observed values of y is minimized:

minimize  Σ (yi − (a + b·xi))²

Formula for OLS (the resulting estimates):

b = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)²,  a = ȳ − b·x̄
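
A small NumPy sketch of these closed-form estimates (the toy data points are made up for illustration):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])                 # roughly y = 2x with noise

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)   # slope
a = y.mean() - b * x.mean()                                                 # intercept
print(a, b)                                              # should be close to 0 and 2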

Linear Regression

• Linear regression is a type of supervised machine learning algorithm that computes the linear relationship between a dependent variable and one or more independent features. When the number of independent features is 1, it is known as univariate (simple) linear regression; in the case of more than one feature, it is known as multivariate linear regression.

• Simple Linear Regression
• This is the simplest form of linear regression; it involves only one independent variable and one dependent variable.

• Multiple Linear Regression
• This involves more than one independent variable and one dependent variable.
Hypothesis function in Linear Regression
• Let’s assume there is a linear relationship between X and Y; then Y (for example, a salary) can be predicted using:
y = a + bx

The model obtains the best regression fit line by finding the best a and b values.

How to update a and b values to get the best-fit line

• Cost function
• In Linear Regression, the Mean Squared Error (MSE) cost
function is employed, which calculates the average of the
squared errors between the predicted values and the actual
values.

• Utilizing the MSE function, the iterative process of gradient descent is applied to update the values of a and b. This ensures that the MSE value converges to the global minimum, signifying the most accurate fit of the linear regression line to the dataset.
• This process involves continuously adjusting the parameters a and b based on the gradients calculated from the MSE.

Gradient Descent for Linear Regression
• A linear regression model can be trained using the optimization algorithm gradient descent by iteratively modifying the model’s parameters to reduce the mean squared error (MSE) of the model on a training dataset. To update the a and b values so as to reduce the cost function (minimizing the RMSE value) and achieve the best-fit line, the model uses gradient descent.
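
A minimal gradient-descent sketch for this setting (the toy data, learning rate, and iteration count are illustrative assumptions):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])
a, b = 0.0, 0.0                        # initial parameter values
lr = 0.01                              # learning rate

for _ in range(5000):
    error = (a + b * x) - y            # prediction error for the current a, b
    grad_a = 2 * error.mean()          # d(MSE)/da
    grad_b = 2 * (error * x).mean()    # d(MSE)/db
    a -= lr * grad_a
    b -= lr * grad_b

print(a, b)                            # converges towards the OLS solution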

Evaluation Metrics for Linear Regression
• A variety of evaluation measures can be used to determine the
strength of any linear regression model. These assessment
metrics often give an indication of how well the model is
producing the observed outputs.
• Coefficient of Determination (R-squared)
• R-Squared is a statistic that indicates how much variation the
developed model can explain or capture. It is always in the
range of 0 to 1. In general, the better the model matches the
data, the greater the R-squared number.

• Residual Sum of Squares (RSS): The sum of the squares of the residuals for each data point is known as the residual sum of squares, or RSS. It is a measurement of the difference between the observed output and the predicted output.
Evaluation Metrics for Linear Regression
• Total Sum of Squares (TSS): The sum of the squared deviations of the data points from the response variable’s mean is known as the total sum of squares, or TSS.

• Root Mean Squared Error (RMSE): The square root of the mean of the squared residuals is the Root Mean Squared Error. It describes how well the observed data points match the predicted values, i.e. the model’s absolute fit to the data.
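
These metrics can be computed directly (the observed and predicted values below are made-up numbers for illustration):

import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])       # observed outputs
y_pred = np.array([2.8, 5.1, 7.3, 8.7])       # model predictions

rss = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
tss = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2 = 1 - rss / tss                            # coefficient of determination (R-squared)
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
print(rss, tss, r2, rmse)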

Decision Tree
• A decision tree is a type of supervised learning algorithm that is commonly used in machine learning to model and predict outcomes based on input data. It is a tree-like structure where each internal node tests an attribute, each branch corresponds to an attribute value, and each leaf node represents the final decision or prediction.

• It can be used to solve both regression and classification problems.
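
A minimal sketch of a decision tree classifier (the Iris dataset and max_depth=3 are illustrative assumptions):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
print(tree.score(X_te, y_te))                 # accuracy on held-out data
print(export_text(tree))                      # the learned splits: root, internal and leaf nodes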

Decision Tree Terminologies
• Root Node: The highest node of the tree; it represents the initial choice or feature from which the tree branches.
• Internal Nodes (Decision Nodes): Nodes in the tree whose choices are determined by the values of particular attributes. These nodes have branches that lead to other nodes.
• Leaf Nodes (Terminal Nodes): The ends of the branches, where choices or predictions are finally made. Leaf nodes have no further branches.
• Branches (Edges): Links between nodes that show how decisions are made
in response to particular circumstances.
• Splitting: The process of dividing a node into two or more sub-nodes based
on a decision criterion. It involves selecting a feature and a threshold to
create subsets of data.
• Parent Node: A node that is split into child nodes. The original node from
which a split originates.
• Child Node: Nodes created as a result of a split from a parent node.
• Decision Criterion: The rule or condition used to determine how the data
should be split at a decision node. It involves comparing feature values
against a threshold.
• Pruning: The process of removing branches or nodes from a decision tree to improve its generalization and prevent overfitting.
Why Decision Tree
• Decision trees are versatile at modeling intricate decision-making processes.
• Their hierarchical structure makes it possible to represent complex choice scenarios that take a variety of causes and outcomes into account.
• Because they provide comprehensible insight into the decision logic, decision trees are especially helpful for classification and regression tasks.
• They handle both numerical and categorical data well, and they can easily adapt to a variety of datasets thanks to their automatic feature selection capability.
• Decision trees also lend themselves to simple visualization, which helps in understanding and explaining the underlying decision process of a model.
