PSO 1 Students will have the ability to apply software engineering principles to
design, build, test, and deliver solutions for the software industry.
PSO 2 Students will be able to use programming, database, networking, and
web development concepts to develop solutions for real-life problems.
Course Objectives:
In this course we will study the basic component of intelligent systems, i.e. machine learning: its
functions, mechanisms, policies, and the techniques used in its implementation, along with examples.
COURSE CONTENTS:
Unit –I
Introduction to machine learning, scope and limitations, regression, probability, statistics and
linear algebra for machine learning, convex optimization, data visualization, hypothesis function
and testing, data distributions, data preprocessing, data augmentation, normalizing data sets,
machine learning models, supervised and unsupervised learning.
Unit –II
Linearity vs non-linearity, activation functions such as sigmoid, ReLU, etc., weights and bias,
loss function, gradient descent, multilayer networks, backpropagation, weight initialization,
training, testing, the unstable gradient problem, autoencoders, batch normalization, dropout, L1 and
L2 regularization, momentum, tuning hyperparameters.
Unit –III
Convolutional neural networks, flattening, subsampling, padding, stride, convolution layer,
pooling layer, loss layer, dense layer, 1x1 convolution, inception network, input channels,
transfer learning, one-shot learning, dimension reduction, implementation of CNNs in
frameworks such as TensorFlow, Keras, etc.
Unit –IV
Recurrent neural networks, long short-term memory, gated recurrent units, translation, beam
search and beam width, BLEU score, attention models, reinforcement learning, the RL framework,
MDPs, Bellman equations, value iteration and policy iteration, actor-critic models, Q-learning,
SARSA.
Unit –V
Support vector machines, Bayesian learning, applications of machine learning in computer
vision, speech processing, natural language processing, etc. Case study: the ImageNet competition.
UNIT –I
Lecture No. 1
Today's Agenda:
Introduction to Machine Learning: understanding the origin and historical background
of machine learning.
Real-world examples of Machine Learning: exploring real-world examples of
machine learning.
Types of Machine Learning: briefly overviewing the three types of machine learning.
Introduction to ML
Arthur Samuel, an early American leader in the field of computer gaming and artificial
intelligence, coined the term "Machine Learning" in 1959 while at IBM. He defined machine
learning as "the field of study that gives computers the ability to learn without being explicitly
programmed". However, there is no universally accepted definition of machine learning;
different authors define the term differently. One more widely used definition, due to Tom
Mitchell, is given below.
A computer program is said to learn from experience E with respect to some class of tasks T
and performance measure P, if its performance at tasks T, as measured by P, improves with
experience E.
Examples
[Figure: diagram visualizing the difference between a machine learning algorithm and a machine learning model.]
Types of Machine learning
Supervised Learning
Unsupervised Learning
Reinforcement Learning
A. Supervised learning:
Supervised learning is the group of algorithms that require a dataset consisting of example
input-output pairs. Each pair consists of a data sample used to make a prediction and the
expected outcome, called a label. The word "supervised" comes from the fact that labels need
to be assigned to the data by a human supervisor.
During training, samples are iteratively fed to the model. For every sample, the model
uses the current state of its parameters and returns a prediction. The prediction is compared to
the label, and the difference is called the error. The error is feedback for the model about what
went wrong and how to update itself in order to decrease the error in future predictions. This
means that the model changes the values of its parameters according to the algorithm on which
it was built.
Supervised learning models try to find parameter values that allow them to perform well
on historical data. They are then used to make predictions on unknown data that was not part
of the training dataset.
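As a rough sketch of this training loop, the following Python snippet (with made-up data and a
hypothetical single-parameter model y = w*x) shows predictions being compared to labels and the
parameter being nudged to reduce the error:

import numpy as np

# Toy labeled dataset: each input x is paired with a label y (roughly y = 2x)
X = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

w = 0.0              # the model's single parameter, before training
learning_rate = 0.01

for epoch in range(200):
    for x_i, label in zip(X, y):
        prediction = w * x_i                # model uses current parameter state
        error = prediction - label          # compare prediction to label
        w -= learning_rate * error * x_i    # update parameter to decrease future error

print(f"learned parameter w = {w:.2f}")     # approaches 2.0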
There are two main problems that can be solved with supervised learning (a short code sketch
follows below):
Classification – process of assigning an input data sample to one of a discrete set of
categories (classes). Example usages: spam filtering, medical diagnosis.
Regression – process of predicting a continuous, numerical value for an input data sample.
Example usages: assessing the house price, forecasting grocery store food demand,
temperature forecasting.
[Figure: examples of classification and regression models.]
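As a hedged sketch of the two problem types, the snippet below uses scikit-learn with made-up
values (the house areas/prices and the toy spam feature are illustrative assumptions, not data
from these notes):

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Regression: predict a continuous value (a house price from its area)
areas = np.array([[50], [80], [120], [200]])      # square metres
prices = np.array([150, 240, 350, 600])           # thousands
reg = LinearRegression().fit(areas, prices)
print(reg.predict([[100]]))                       # continuous output

# Classification: predict a discrete label (1 = spam, 0 = not spam)
spam_scores = np.array([[0.1], [0.4], [0.6], [0.9]])
labels = np.array([0, 0, 1, 1])
clf = LogisticRegression().fit(spam_scores, labels)
print(clf.predict([[0.7]]))                       # discrete class label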
Summary
Machine learning (ML) has become incredibly important in today's world, permeating nearly
every aspect of our lives. Its ability to learn from data and improve its performance over time
makes it a powerful tool for a variety of tasks, from recommending products on Amazon to
predicting fraud in financial transactions.
Lecture No. 2
Today's Agenda:
Detailed discussion on linear algebra for Machine Learning: discussion of the
mathematical concepts behind the linear regression model.
Case study of Linear Regression: exploring real-world examples of linear regression.
Linear Regression
Linear regression is a simple and versatile statistical technique used to model the
relationship between a dependent variable (target) and one or more independent
variables (features). The primary objective of linear regression is to find the best-
fitting straight line through the data points, which allows us to predict the dependent
variable based on the values of the independent variables.
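As a minimal sketch of this idea (the numbers are purely illustrative), NumPy's least-squares
polynomial fit can find the best-fitting line through a set of points:

import numpy as np

# Illustrative data: one independent variable x and a dependent variable y
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.2, 3.9, 6.1, 8.0, 9.9])

# Degree-1 least-squares fit: returns the slope and intercept of the best-fitting line
m, c = np.polyfit(x, y, 1)
print(f"y = {m:.2f} * x + {c:.2f}")

# Predict the dependent variable for a new input
print("prediction at x = 6:", m * 6 + c)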
Lecture No. 3
Today's Agenda:
Detailed discussion on Probability & Statistics: discussion of probability and statistics.
Practising numerical problems on Probability & Statistics: working through real-world
numerical examples of probability and statistics.
Probability quantifies the likelihood of an event occurring. For example, if you roll a fair,
unbiased die, then the probability of 1 turning up is 1/6.
Independence
Two events A and B are said to be independent if the occurrence of A does not affect the
probability of B. For example, if you toss a coin and roll a die, the outcome of the die has no
effect on whether the coin shows heads or tails. For two independent events A and B, the
probability that A and B occur together is P(A and B) = P(A) × P(B). So, for example, the
probability that the coin shows heads and the die shows 3 is 1/2 × 1/6 = 1/12.
Now let’s talk about events that are not independent. Consider the following table:
                    Obese   Not obese   Total
Heart problems        45        15        60
No heart problems     10        30        40

Note: a survey of 100 people was taken. 60 had heart problems and 40 didn't. Of the 60 having
a heart problem, 45 were obese. Of the 40 having no heart problem, 10 were obese.
Conditional Probabilities:
If the probability of event A occurring is conditioned on event B, we represent it as
P(A|B)
By the definition of conditional probability, P(A|B) = P(A and B)/P(B). A closely related
theorem, Bayes' rule, lets us reverse the conditioning: P(A|B) = P(B|A) P(A)/P(B).
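Applying this to the survey table above: P(heart problem | obese) = P(heart problem and
obese)/P(obese) = (45/100)/(55/100) = 45/55 ≈ 0.82. Knowing that a person is obese raises the
probability of heart problems from the overall 0.60 to about 0.82, so the two events are clearly
not independent.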
Statistics:
Statistics are used to summarize and make inferences about a large number of data points:
Centrality measures
Distributions (especially normal)
Centrality measures and measures of spreads
Mean:
Mean is just the average of the numbers: to find the mean, sum the numbers and divide by how
many there are. For example, the mean of [1,2,3,4,5] is 15/5 = 3.
Median:
Median is the middle element of a set of numbers when they are arranged in ascending
order. For example, numbers [1,2,4,3,5] are arranged in an ascending order [1,2,3,4,5]. The
middle one of these is 3. Therefore the median is 3. But what if the number of numbers is even
and therefore has no middle number? In that case, you take the average of the two middle-most
numbers. For a sequence of 2n numbers in ascending order, average the nth and (n+1)th number
to get the median. Example – [1,2,3,4,5,6] has the median (3+4)/2 = 3.5
Mode:
Mode is simply the most frequent number in a set of numbers. For example, mode of
[1,2,3,3,4,5,5,5] is 5.
Variance: Variance is not a centrality measure; it measures how the data is spread around the
mean. For n data points x_1, ..., x_n with mean m, it is quantified as Var = (1/n) Σ (x_i − m)².
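As a quick sketch, these measures can be computed with Python's built-in statistics module
(the data list reuses the mode example above):

import statistics

data = [1, 2, 3, 3, 4, 5, 5, 5]

print("mean:", statistics.mean(data))           # sum divided by the count
print("median:", statistics.median(data))       # middle value (average of the two middle values here)
print("mode:", statistics.mode(data))           # most frequent value: 5
print("variance:", statistics.pvariance(data))  # average squared deviation from the mean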
Summary
Probability and statistics are the pillars upon which machine learning stands. They provide the
tools and concepts needed to understand data, train models, and make accurate predictions.
Lecture No. 4
Today's Agenda:
Detailed discussion on Convex Optimization: discussion of convex optimization with
examples.
Convex Optimization
Convex optimization is a powerful tool used to solve optimization problems in various fields
such as finance, engineering, and machine learning. In a convex optimization problem, the goal is to
find a point that minimizes a convex objective function (or, equivalently, maximizes a concave one).
This is achieved through iterative computations involving convex functions, i.e. functions whose
chords always lie on or above their graphs.
The objective function may be subject to both equality and inequality constraints. An equality
constraint requires a constraint function to take an exact value, while an inequality constraint restricts
it to a certain range. These constraints are critical in defining the feasible region, which is the set of
all solutions that satisfy the constraints.
Convex optimization can be used to improve the speed at which algorithms converge to a
solution, and to solve linear systems of equations by finding the best approximation to the system
rather than computing an exact answer. It plays a critical role in training machine learning models,
which involves finding the optimal parameters that minimize a given loss function, and it underlies a
wide range of problems such as linear regression, logistic regression, support vector machines, and
neural networks.
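As a minimal sketch (an unconstrained case, not a full constrained solver), gradient descent on
the convex function f(x) = (x - 3)^2 shows how iterative computation converges to the minimizer:

# f(x) = (x - 3)^2 is convex; its gradient is f'(x) = 2 * (x - 3)
x = 0.0              # starting point
learning_rate = 0.1

for step in range(100):
    gradient = 2 * (x - 3)
    x -= learning_rate * gradient    # step against the gradient

print(f"approximate minimizer: x = {x:.4f}")    # converges to x = 3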
Data Visualization
Data visualization techniques involve generating graphical or pictorial representations of
data, which lead you to understand the insights of a given data set. These visualization
techniques aim to identify the patterns, trends, correlations, and outliers in data sets.
Patterns in business operations: Data visualization techniques help us determine patterns in
business operations. By understanding the problem statement and identifying patterns in the data,
solutions can be applied to eliminate one or more of the inherent problems.
Identify business trends and relate them to data: These techniques help us identify market trends
by collecting data on day-to-day business activities and preparing trend reports, which help track
how the business influences the market and improve our understanding of competitors and
customers. This supports a long-term perspective.
Storytelling and decision making: Storytelling from available data is a niche skill for business
communication, and it plays a vital role in the data science domain specifically. Good visualization
enhances this role and helps reach the objectives of business problems.
Understand current business insights and set goals: Businesses can understand the insights
behind their KPIs, set tangible goals, and plan business strategy, allowing them to optimize the
data for strategic plans and ongoing activities.
Operational and performance analysis: Visualization techniques clarify the KPIs that depict
productivity trends of a manufacturing unit and guide where to improve the productivity of the
plant.
Example:
Line Chart
A line chart is a simple data visualization in Python, available through Matplotlib.

import numpy as np
import matplotlib.pyplot as plt

# Data
x1 = np.linspace(0, 10, 25)
y1 = np.sin(x1) + x1/2
x2 = np.linspace(0, 10, 25)
y2 = np.cos(x2) + x2/2

# Line chart
fig, ax = plt.subplots()
ax.plot(x1, y1, marker="o", label="Sin(x) + x/2")
ax.plot(x2, y2, marker="o", label="Cos(x) + x/2")
ax.legend()
plt.show()
Line charts are used to represent the relation between two variables X and Y on their
respective axes.
Summary
Convex Optimization stands as a powerful foundation for many machine
learning techniques. It's the art of finding the best possible solution within a
set of constraints, ensuring efficiency and reliability in model training.
Data visualization is an indispensable tool in the machine learning
workflow. It empowers us to understand data, build better models,
communicate effectively, and ultimately unlock the full potential of
machine learning.
Lecture No. 5
Today's Agenda:
Detailed discussion on Hypothesis Function: discussion of the hypothesis
function and hypothesis space with examples.
In most supervised machine learning algorithms, our main goal is to find a possible hypothesis
from the hypothesis space that could map the inputs to the proper outputs.
The following figure shows the common method of finding a possible hypothesis from the
hypothesis space:
[Figure: selecting a hypothesis h from the hypothesis space H.]
Hypothesis space is the set of all possible legal hypotheses. This is the set from which the
machine learning algorithm determines the single best hypothesis that would best describe the
target function or the outputs.
Hypothesis (h):
A hypothesis is a function that best describes the target in supervised machine learning. The
hypothesis that an algorithm comes up with depends upon the data and also upon the
restrictions and bias that we have imposed on the data.
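As a hedged illustration, a hypothesis for a one-feature problem can be written as a
parameterised function; the hypothesis space is then the set of all (w, b) choices, and training
selects the pair that best describes the target:

def hypothesis(x, w, b):
    """One candidate hypothesis h(x) = w*x + b from the space of all straight lines."""
    return w * x + b

# Two different candidates from the same hypothesis space
print(hypothesis(2.0, w=1.5, b=0.0))   # one legal hypothesis
print(hypothesis(2.0, w=3.0, b=1.0))   # another legal hypothesis
# The learning algorithm picks the (w, b) that best maps inputs to outputs on the data.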
Summary
In conclusion, the hypothesis (h) is the core concept in machine learning. It's the bridge
between data and models, guiding the learning process and shaping the predictions made.
Understanding its importance is crucial for effectively building and applying machine
learning models to real-world problems.
Lecture No. 6
Today's Agenda:
Detailed discussion on Data Distributions and Data Pre-processing: discussion
of data distributions and data preprocessing, with examples.
Normalization, or feature scaling, transforms the features of a dataset to a similar scale. The
purpose is to ensure that all features contribute equally to the model and to avoid the
domination of features with larger values. Feature scaling becomes necessary when dealing
with datasets whose features differ widely in magnitude. In such cases, the variation in feature
values can lead to biased model performance or difficulties during the learning process.
There are several common scaling methods, including standardization, mean
normalization, and min-max scaling. These methods adjust the feature values onto a
comparable scale. Standardization, for example, involves transforming the data so that it has
a zero mean and a unit variance.
Normalization matters for several reasons. Some algorithms, such as linear regression
and logistic regression, assume that the data is normally distributed, and normalizing the data
can help to improve their performance. Outliers are data points that are far away from the rest
of the data; they can have a large impact on the training of a model, and normalizing the data
can help to mitigate this effect. Numerical instability can occur when the data has a large range
of values, and it can lead to inaccurate results.
Common methods:
1. Mean normalization: subtracts the mean of the data from each data point and divides
by the range, x' = (x − mean)/(max − min).
2. Min-max scaling: rescales each feature to the range [0, 1], x' = (x − min)/(max − min).
3. Z-score normalization (standardization): subtracts the mean of the data from each data
point and divides by the standard deviation, x' = (x − mean)/σ.
Example:
In this example, we'll demonstrate how to normalize a dataset using the min-max scaler
from scikit-learn:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Sample data: three features on very different scales
data = np.array([[1, 200, 3000],
                 [2, 300, 4000],
                 [3, 400, 5000]])

scaler = MinMaxScaler()
normalized_data = scaler.fit_transform(data)
print("Original data:")
print(data)
print("Normalized data:")
print(normalized_data)
Output:
Original data:
[[   1  200 3000]
 [   2  300 4000]
 [   3  400 5000]]
Normalized data:
[[0.  0.  0. ]
 [0.5 0.5 0.5]
 [1.  1.  1. ]]
Summary
In summary, normalizing datasets is essential for ensuring that all features contribute equally
to the model, improving the performance of scale-sensitive algorithms, reducing the influence
of outliers, and avoiding numerical instability.
Lecture No. 7
Today's Agenda:
Detailed discussion on Supervised Machine Learning: discussion of supervised
machine learning with real-world applications.
A. Supervised learning:
Supervised learning is the machine learning task of learning a function that
maps an input to an output based on example input-output pairs. The
given data is labeled. Both classification and regression problems are
supervised learning problems.
B. Unsupervised learning:
Group of algorithms that try to draw inferences from non-labeled data (without
reference to known or labeled outcomes). In unsupervised learning, there are no
correct answers. Models based on these algorithms can be used for discovering
unknown data patterns and the structure of the data itself. For instance, the table
below is unlabeled: it has feature columns but no target column to predict.

Gender   Age
M        48
M        67
F        53
M        49
F        34
M        21
The most common applications of Unsupervised Learning are:
Pattern recognition and data clustering – the process of dividing and grouping similar
data samples together. Groups are usually called clusters. Example usages:
supermarket customer segmentation, user base segmentation, signal denoising.
Reducing data dimensionality – Data dimension is the number of features needed to
describe a data sample. Dimensionality reduction is the process of compressing features
into so-called principal components, which convey similar information more concisely.
By selecting only a few components, the number of features is reduced and a small part
of the information is lost in the process. Example usages: speeding up other machine
learning algorithms by reducing the number of calculations, finding the most reliable
features in data.
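A brief sketch of both applications with scikit-learn; the two-blob data set below is synthetic
and purely illustrative:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic unlabeled data: two groups of samples in 3 dimensions
data = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(5, 1, (50, 3))])

# Clustering: group similar samples together without any labels
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(data)
print("first cluster assignments:", clusters[:5])

# Dimensionality reduction: compress 3 features into 2 principal components
reduced = PCA(n_components=2).fit_transform(data)
print("reduced shape:", reduced.shape)   # (100, 2)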
[Figure: example of the unsupervised learning concept. All data is fed to the model, and it produces
an output on its own based on the similarity between samples and the algorithm used to create the model.]
C. Reinforcement learning:
Training an agent is a process of trial and error. It needs to find itself in various
situations and be rewarded or punished depending on the actions it takes, in order
to learn.
The goal of optimization can be set in many ways depending on the reinforcement
learning approach, e.g. based on a value function, policy gradients, or an environment
model.
There is a broad group of reinforcement learning applications, many of which are
regularly mentioned among the most innovative accomplishments of AI.
[Figure: examples of solutions where reinforcement learning is used, from self-driving cars through
games such as Go, chess, poker, or computer games like Dota and StarCraft, to manufacturing.]
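To make the reward/punishment idea concrete, here is a minimal sketch of a Q-learning update
(covered in detail in Unit IV) on a made-up two-state, two-action environment:

import numpy as np

# Q-table for a toy environment with 2 states and 2 actions, initialised to zero
Q = np.zeros((2, 2))
alpha, gamma = 0.1, 0.9    # learning rate and discount factor

# One observed transition: in state 0, action 1 was punished (reward -1) and led to state 1
state, action, reward, next_state = 0, 1, -1.0, 1

# Q-learning update: move Q(s, a) toward reward + gamma * best future value
Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
print(Q)    # the punished action's value has decreased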
Summary
There are three main types of machine learning problems, each with unique characteristics and
applications:
1. Classification: When inputs are divided into two or more classes, the learner must
produce a model that assigns unseen inputs to one or more (multi-label classification) of these
classes. This is typically tackled in a supervised way. Spam filtering is an example of
classification, where the inputs are email (or other) messages and the classes are “spam” and
“not spam”.
2. Regression: also a supervised problem, covering the case where the outputs are
continuous rather than discrete.
3. Clustering: When a set of inputs is to be divided into groups. Unlike in classification,
the groups are not known beforehand, making this typically an unsupervised task.
Note: labeled data is data that has some predefined tags, such as a name, type, or number; for
example, an image tagged as containing an apple or a banana. Unlabeled data, by contrast,
contains no tags or specified names.
UNIT –II
Today's Agenda:
Detailed discussion on linearity vs non-linearity in machine learning, with
real-world applications.
Here is an example of a linear data set, i.e. a linearly separable data set. The data set used
is the IRIS data set from the sklearn.datasets package. The data represents two different
classes, Setosa and Versicolor. Note that one can easily separate the data represented
using black and green marks with a linear hyperplane/line.
The code used to produce the scatter plot is the following:

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data
y = iris.target

# Sepal length vs sepal width for Setosa (class 0) and Versicolor (class 1)
plt.scatter(X[y == 0, 0], X[y == 0, 1], color='black', label='Setosa')
plt.scatter(X[y == 1, 0], X[y == 1, 1], color='green', label='Versicolor')
plt.legend(loc='upper left')
plt.show()
Here is an example of a non-linear data set, i.e. a linearly non-separable data set. The data set
used is the IRIS data set from the sklearn.datasets package. The data represents two different
classes, Virginica and Versicolor. Note that one can't separate the data represented using
black and red marks with a linear hyperplane; thus, this data can be called non-linear data.
The code used to produce the scatter plot identifying the non-linear dataset is the
following:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data
y = iris.target

# Petal length vs petal width for Versicolor (class 1) and Virginica (class 2)
plt.scatter(X[y == 1, 2], X[y == 1, 3], color='black', label='Versicolor')
plt.scatter(X[y == 2, 2], X[y == 2, 3], color='red', label='Virginica')
plt.legend(loc='upper left')
plt.show()
Use the Simple Regression Method for a Regression Problem
Linear data is data that can be represented on a line graph: there is a clear relationship
between the variables, and the graph will be a straight line. Non-linear data, on the other
hand, cannot be represented on a line graph, because there is no such straight-line
relationship between the variables and the graph will be curved.
In case you are dealing with predicting a numerical value, the technique is to use
scatter plots and also apply simple linear regression to the dataset, and then check the
least-squares error. If the least-squares error is small, it can be implied that the dataset is
linear in nature; otherwise the dataset is non-linear. Here is how a scatter plot would look
for a linear data set when dealing with a regression problem.
[Figure: scatter plot of a linear data set for a regression problem.]
In addition to the above, you can also fit a regression model and examine statistics
such as R-squared, adjusted R-squared, the F-statistic, etc. to validate the linear relationship
between the response and the predictor variables. For instance, if the value of the F-statistic
is greater than the critical value, we reject the null hypothesis that all the coefficients are 0,
which means that there exists some linear relationship between the response and one or more
predictor variables. A sketch of this check is shown below.
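A hedged sketch of this check with scikit-learn, using synthetic data (the linear and non-linear
relationships below are made up for illustration):

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, (50, 1))
y_linear = 3 * x.ravel() + 2 + rng.normal(0, 1, 50)         # clear straight-line relationship
y_nonlinear = 5 * np.sin(x.ravel()) + rng.normal(0, 1, 50)  # curved relationship

model = LinearRegression()
print("R^2 on linear data:", model.fit(x, y_linear).score(x, y_linear))            # close to 1
print("R^2 on non-linear data:", model.fit(x, y_nonlinear).score(x, y_nonlinear))  # much lower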
While linear data is relatively easy to predict and model, non-linear data can be more
difficult to work with. However, non-linear data can also provide more insight into
complex systems.
Summary