Lecture 03
ICT-4261
By-
Dr. Jesmin Akhter
Professor
Institute of Information Technology
Jahangirnagar University
Contents
The course will mainly cover the following topics:
A Gentle Introduction to Machine Learning
Linear Regression
Logistic Regression
Naive Bayes
Support Vector Machines
Decision Trees and Ensemble Learning
Clustering Fundamentals
Hierarchical Clustering
Neural Networks and Deep Learning
Unsupervised Learning
Outline
Clustering is a popular type of unsupervised learning approach. You can even break it down further into different
types of clustering; for example:
– Exclusive clustering: or "hard" clustering
• The kind of grouping in which one piece of data can belong to only one cluster.
– Overlapping clustering: or "soft" clustering
• The kind of grouping in which a single data point may belong to multiple clusters with varying degrees of
membership.
Types of Unsupervised Learning
Clustering Algorithms
– Hierarchical Clustering:
• Hierarchical clustering develops a hierarchy of clusters
by merging or splitting them depending on their
similarity: two clusters that are close to each other
end up in the same cluster.
• If you start with all data items attached to the same
cluster and then perform splits until each data item is
a separate cluster, the approach is called top-down
or divisive hierarchical clustering.
• If instead each data item starts as its own cluster, the
two clusters that are closest to one another are
merged into a single cluster. The merging goes on
iteratively till there is only one cluster left at the top. Such
an approach is known as bottom-up or agglomerative.
• The example shows how seven different clusters (data
points) are merged step by step based on distance until
they all form one large cluster, as in the sketch below.
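As a concrete illustration, here is a minimal sketch of the bottom-up (agglomerative) approach using SciPy; the seven 2-D points are made up to mirror the seven-point example, and single-link distance is just one possible merge criterion.

```python
# A minimal sketch of agglomerative (bottom-up) hierarchical clustering.
# The seven 2-D points below are assumptions made up for illustration.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

points = np.array([[1, 1], [1.5, 1], [5, 5], [5.5, 5.2],
                   [9, 1], [9.5, 1.2], [5, 9]])

# Each point starts as its own cluster; the two closest clusters are
# merged iteratively until only one cluster remains at the top.
Z = linkage(points, method='single')  # single-link (nearest-point) distance

# Cut the resulting hierarchy into, e.g., three flat clusters.
labels = fcluster(Z, t=3, criterion='maxclust')
print(labels)
```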
Types of Unsupervised Learning
Clustering Algorithms
– K-Means Clustering
• In K-means clustering, data is grouped in terms of
characteristics and similarities.
• K is a letter that represents the number of clusters. For
example, if K=5, then the number of desired clusters is 5. If
K=10, then the number of desired clusters is 10.
• It puts the data points into the predefined number of
clusters K.
• Each data item then gets assigned to the nearest cluster
center, called a centroid (the black dots in the picture),
which acts as the point around which the cluster's data accumulates.
• The assignment and centroid-update procedure is repeated several times
until the clusters are well-defined. A short sketch follows below.
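A minimal K-means sketch with scikit-learn, assuming a made-up toy data set; n_clusters plays the role of K.

```python
# A minimal sketch of K-means clustering; the 2-D data is an assumption.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

# K = 2: group the points into two predefined clusters.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # cluster assignment of each data point
print(km.cluster_centers_)  # the centroids (the "black dots")
```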
Types of Unsupervised Learning
Clustering Algorithms
– Fuzzy K-means
• It is an extension of the K-means algorithm used to perform overlapping clustering. Unlike the K-means
algorithm, fuzzy K-means implies that data points can belong to more than one cluster with a certain level of
closeness towards each.
• The closeness is measured by the distance from a data point to the centroid of the cluster. So, sometimes
there may be an overlap between different clusters. A minimal sketch follows below.
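Since common libraries differ here, this is a hand-rolled NumPy sketch of fuzzy K-means (fuzzy c-means); the data points, the fuzzifier value fuzz = 2, and the iteration count are all assumptions for illustration.

```python
# A minimal NumPy sketch of fuzzy K-means: each point gets a membership
# degree in every cluster rather than a single hard assignment.
import numpy as np

def fuzzy_kmeans(X, k, fuzz=2.0, n_iter=100, seed=0):
    """Returns centroids and a membership matrix U (rows sum to 1)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), k))          # random initial memberships
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** fuzz
        # Centroids are membership-weighted means of the data.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distance from every point to every centroid.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # Closer centroids get higher membership; renormalize rows.
        U = d ** (-2.0 / (fuzz - 1.0))
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 5.1], [3.0, 3.0]])
centers, U = fuzzy_kmeans(X, k=2)
print(U.round(2))  # the middle point [3, 3] belongs partly to both clusters
```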
Types of Unsupervised Learning
Clustering Algorithms
– Gaussian Mixture Models (GMMs) are used in
probabilistic clustering.
• The models assume that there is a certain number of Gaussian
distributions, each representing a separate cluster.
• The algorithm is used to decide which cluster a
particular data point most likely belongs to, as in the sketch below.
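A minimal sketch of probabilistic clustering with scikit-learn's GaussianMixture; the toy data and the choice of two components are assumptions.

```python
# A minimal sketch of clustering with a Gaussian Mixture Model.
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.array([[1.0, 2.0], [1.2, 1.8], [0.8, 2.1],
              [8.0, 8.0], [8.3, 7.9], [7.9, 8.2]])

# Assume the data comes from two Gaussian distributions (two clusters).
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gmm.predict(X))        # most likely cluster for each point
print(gmm.predict_proba(X))  # probability of each point under each Gaussian
```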
– Principal Component Analysis (PCA): PCA transforms the original features into a
lower-dimensional space while preserving the maximum amount of information.
– The PCA method is particularly useful when the variables within the data set are highly
correlated. Correlation indicates that there is redundancy in the data. Due to this
redundancy, PCA can be used to reduce the original variables to a smaller number of new
variables (principal components) explaining most of the variance in the original variables,
as in the sketch below.
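A minimal PCA sketch with scikit-learn; the data is made up so that the two columns are nearly redundant (highly correlated), which is exactly the case PCA handles well.

```python
# A minimal sketch of PCA on two highly correlated features.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
# Second feature is almost a multiple of the first: redundancy in the data.
X = np.column_stack([x1, 2 * x1 + rng.normal(scale=0.1, size=100)])

pca = PCA(n_components=1)             # keep a single principal component
Z = pca.fit_transform(X)
print(pca.explained_variance_ratio_)  # close to 1: one component suffices
```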
What is the difference between supervised and
unsupervised learning?
Supervised learning requires labeled data with input features and corresponding output labels, while
unsupervised learning aims to discover patterns or structures in unlabeled data without predefined
output labels.
Reinforcement machine learning
Reinforcement Learning (RL) enables an agent to learn in an interactive environment by trial and error,
using feedback from its own actions and experiences.
Here, agents are self-trained through reward and punishment mechanisms.
The agent can take actions and interact with the environment.
A reinforcement learning algorithm isn't trained using labeled sample data.
A sequence of successful outcomes is reinforced to develop the best recommendation or policy
for a given problem. A minimal sketch of this idea follows below.
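As one illustration of the reward-and-feedback idea, here is a tabular Q-learning sketch on a hypothetical 5-state chain where reaching the rightmost state earns reward +1; all names and numbers are made up, and Q-learning is just one of many RL algorithms.

```python
# A minimal sketch of trial-and-error learning: tabular Q-learning on a
# hypothetical chain of 5 states (actions: 0 = left, 1 = right).
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.1   # learning rate, discount, exploration
rng = np.random.default_rng(0)

for _ in range(2000):
    s = 0
    while s < n_states - 1:
        # Mostly exploit the best-known action, sometimes explore.
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0   # reward at the goal
        # Feedback from the action reinforces the value estimate.
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.argmax(axis=1))  # learned policy: move right in every state
```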
Linear Regression
– Linear models
– A bidimensional example
– Hypothesis function for Linear Regression
– Cost function
Regression
Linear regression is a type of supervised machine learning algorithm that computes the linear
relationship between a dependent variable and one or more independent features.
When there is a single independent feature, it is known as univariate linear
regression; in the case of more than one feature, it is known as multivariate linear
regression.
It predicts a continuous output variable based on the independent input variables, such as the
prediction of house prices based on different parameters like house age, distance from the main
road, location, area, and size.
Linear Regression
The goal of the algorithm is to find the best linear equation that can predict the value of the
dependent variable based on the independent variables.
The equation provides a straight line that represents the relationship between the dependent and
independent variables.
The slope of the line indicates how much the dependent variable changes for a unit change in the
independent variable(s).
Linear regression is used in many different fields, including finance, economics, and psychology, to
understand and predict the behavior of a particular variable.
– For example, in finance, linear regression might be used to understand the relationship between
a company’s stock price and its earnings or to predict the future value of a currency based
on its past performance.
Linear Regression
Y is a dependent or target variable and X is an independent variable also known as the predictor
of Y.
There are many types of functions or models that can be used for regression.
– A linear function is the simplest type of function.
X may be a single feature or multiple features representing the problem.
Linear regression performs the task of predicting a dependent variable value (y) based on a given
independent variable (x). Hence the name Linear Regression.
In the figure, X (input) is the work experience and Y (output) is the salary of a person.
Hypothesis function for Linear Regression
Notation:
– m = number of training examples
– x = input variables / features
– y = output ("target") variable
• (x, y) - a single training example
• (x^(i), y^(i)) - a specific example (the i-th training example)
We have assumed that our independent feature is the experience x, and the respective salary y is
the dependent variable.
x (work experience in years)    y (salary)
2                               3000
3                               4000
4                               5000
5                               6000
10                              11000
12                              13000
Hypothesis function for Linear Regression
Let's assume there is a linear relationship between X and Y. Then the salary can be predicted using:
hθ(x) = θ0 + θ1x
The model gets the best regression fit line by finding the best θ0 and θ1 values.
θ0: intercept
θ1: coefficient of x, or gradient
These parameters are chosen so that hθ(x) is close to y for our training examples.
Once we find the best θ0 and θ1 values, we get the best-fit line. So when we finally use our
model for prediction, it will predict the value of hθ(x) for an input value of x.
– Different values give you different functions
• If θ0 is 1.5 and θ1 is 0, then we get a horizontal line at y = 1.5, parallel to the x-axis
• If θ1 is > 0, then we get a positive slope
– Think of hθ(x) as a "y imitator" - it tries to convert the x into y, and since we already
have y we can evaluate how well hθ(x) does this. A fit to the table above is sketched below.
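A minimal sketch that fits hθ(x) = θ0 + θ1x to the experience/salary table above; solving with a least-squares call is an assumption here (the lecture's cost-function route to the same θ values is discussed next).

```python
# A minimal sketch fitting hθ(x) = θ0 + θ1·x to the table above.
import numpy as np

x = np.array([2, 3, 4, 5, 10, 12], dtype=float)                    # experience
y = np.array([3000, 4000, 5000, 6000, 11000, 13000], dtype=float)  # salary

X = np.column_stack([np.ones_like(x), x])         # design matrix [1, x]
theta, *_ = np.linalg.lstsq(X, y, rcond=None)     # best θ0, θ1
theta0, theta1 = theta
print(theta0, theta1)        # ≈ 1000 and 1000: this table fits exactly
print(theta0 + theta1 * 7)   # predicted salary for 7 years: 8000
```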
Linear models
We can learn with a larger number of features
– So we may have other parameters which contribute towards the price of a house
– Size
– Age
– Number bedrooms
– Number floors
– x1, x2, x3, x4
– With multiple features becomes hard to plot
• Can't really plot in more than 3 dimensions
• Notation becomes more complicated too
– The best way to get around this is the notation of linear algebra
– It gives a notation and a set of things you can do with matrices and vectors
Now we have multiple features. A linear model is based on the assumption that it is possible to
approximate the output values through a regression process based on a linear rule. The hypothesis can be written as:
• hθ(x) = θ0 + θ1x1 + θ2x2 + θ3x3 + θ4x4 = θ0 + Σᵢ₌₁ⁿ θᵢxᵢ, where n is the number of features. A vectorized sketch follows below.
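A minimal sketch of the multi-feature hypothesis as a dot product; the θ values and house-feature values are made up for illustration.

```python
# A minimal sketch: the multi-feature hypothesis as a vector dot product.
import numpy as np

theta = np.array([50.0, 0.2, -1.5, 10.0, 5.0])  # θ0..θ4 (made-up values)
x = np.array([1.0, 1200, 15, 3, 2])             # x0 = 1, then size, age,
                                                # bedrooms, floors

h = theta @ x   # hθ(x) = θ0 + θ1·x1 + θ2·x2 + θ3·x3 + θ4·x4
print(h)
```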
Implementation: cost function (Cont.)
Hypothesis - is like your prediction machine: throw in an x value,
get a putative y value.
The cost function or the loss function is nothing but the error, or
difference, between the predicted value hθ(x) and the true value y.
It is the Mean Squared Error (MSE) between the predicted
value and the true value.
Cost function - is a way to use your training data to determine
values for your θ parameters that make the hypothesis as accurate as
possible.
• This cost function is a reasonable choice for most
regression problems
• It is probably the most commonly used cost function. A small sketch follows below.
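A minimal NumPy sketch of this cost function, J(θ0, θ1) = (1/2m) Σ (hθ(x^(i)) − y^(i))², using the salary data from the earlier table; the 1/(2m) scaling (rather than plain 1/m MSE) is the common convention and an assumption here.

```python
# A minimal sketch of the squared-error cost function J(θ0, θ1).
import numpy as np

def cost(theta0, theta1, x, y):
    m = len(x)
    h = theta0 + theta1 * x              # predictions hθ(x)
    return ((h - y) ** 2).sum() / (2 * m)

x = np.array([2, 3, 4, 5, 10, 12], dtype=float)
y = np.array([3000, 4000, 5000, 6000, 11000, 13000], dtype=float)
print(cost(1000, 1000, x, y))  # 0.0: these parameters fit the table exactly
print(cost(0, 1000, x, y))     # 500000.0: a worse fit costs more
```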
Cost function - a deeper look
Let's consider some intuition about the cost function and why we
want to use it
– The cost function determines the parameters
– The values chosen for the parameters determine how your
hypothesis behaves
– To achieve the best-fit regression line, the model aims to predict
the target value. So the θ0 and
θ1 values are updated to reach the values that minimize the error
between the predicted hθ(x) value and the true y value.
Simplified hypothesis
– Assumes θ0 = 0
The cost function and goal here are very similar to when we have θ0,
but with a simpler parameter
– The simplified hypothesis makes visualizing the cost function J(θ1) a bit
easier
So the hypothesis hθ(x) = θ1x passes through (0, 0)
Cost function - a deeper look (Cont.)
• Two key functions we want to understand:
– hθ(x)
• The hypothesis is a function of x - a function
of what the size of the house is
– J(θ1)
• A function of the parameter θ1
• Plot θ1 vs J(θ1) for the data. For example:
– 1) θ1 = 1, J(θ1) = 0
– 2) θ1 = 0.5, J(θ1) ≈ 0.58
– 3) θ1 = 0, J(θ1) ≈ 2.3
Cost function - a deeper look (Cont.)
If we compute J(θ1) for a range of θ1 values and plot
– J(θ1) vs θ1, we get a polynomial (it looks like a quadratic). The sketch below reproduces the values listed above.
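A minimal sketch that reproduces the J(θ1) values quoted above; the toy data set x = [1, 2, 3], y = [1, 2, 3] is an assumption, chosen because it matches the quoted numbers exactly.

```python
# A minimal sketch of J(θ1) for the simplified hypothesis hθ(x) = θ1·x.
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # assumed toy data matching the quoted J values
y = np.array([1.0, 2.0, 3.0])
m = len(x)

for theta1 in (1.0, 0.5, 0.0):
    J = ((theta1 * x - y) ** 2).sum() / (2 * m)
    print(theta1, round(J, 2))   # 1.0 -> 0.0, 0.5 -> 0.58, 0.0 -> 2.33

# Sweeping θ1 over a range traces out the bowl-shaped (quadratic) curve.
thetas = np.linspace(-0.5, 2.5, 61)
Js = [((t * x - y) ** 2).sum() / (2 * m) for t in thetas]
print(min(Js))  # minimum J is 0, reached at θ1 = 1
```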