Chapter 5 - Machine Learning

Uploaded by

nebiyu daniel
Machine Learning Basics

12/15/2024 AI/COSC 1
Outline
– Knowledge in learning
– Learning probabilistic models
– Supervised learning
– Learning classification models
– Probabilistic model
– Unsupervised learning
– Clustering models
– Reinforcement learning
– Deep learning
– Neural networks and backpropagation
– Convolutional neural networks
– Recurrent neural networks and LSTMs

Learning
– An agent is learning if it improves its performance on
future tasks after making observations about the
world.
– Learning can range from the trivial, as exhibited by jotting
down a phone number, to the profound, as exhibited by
Albert Einstein, who inferred a new theory of the universe.
– Any component of an agent can be improved by learning
from data.
– The improvements, and the techniques used to make them,
depend on four major factors:
  • Which component is to be improved.
  • What prior knowledge the agent already has.
  • What representation is used for the data and the component.
  • What feedback is available to learn from.
– In this chapter, the emphasis is on one type of learning:
from a collection of input–output pairs, learn a function
that predicts the output for new inputs.

Learning
Why would we want an agent to learn?
– If the design of the agent can be
improved, why wouldn’t the designers
just program in that improvement to
begin with?
– There are three main reasons.
  • First, the designers cannot anticipate all possible situations
  that the agent might find itself in. For example, a robot designed
  to navigate mazes must learn the layout of each new maze it encounters.
  • Second, the designers cannot anticipate all changes over time,
  as with a program designed to predict tomorrow’s stock market prices.
  • Third, sometimes human programmers have no idea how to program
  a solution themselves. For example, most people are good at
  recognizing the faces of family members, but even the best
  programmers are unable to program a computer to accomplish
  that task, except by using learning algorithms.
In general:
– Learning is essential for unknown environments,
i.e., when the designer lacks omniscience.
– Learning is useful as a system construction method,
i.e., expose the agent to reality rather than trying to write it down.

Learning Agents

– An additional learning element means these agents can gradually
improve and become more knowledgeable about an environment over time.
They do so by taking feedback from the actions they have performed
and adapting accordingly.
– This requires the learning agent to have four components:
  • the learning element, which learns from experience;
  • the critic, which is the feedback system;
  • the performance element, which decides the external action
  that should be taken;
  • the problem generator, which keeps a history and makes
  new suggestions.

Machine learning
- A subset of artificial intelligence known as machine
learning focuses primarily on the creation of
algorithms that enable a computer to independently
learn from data and previous experiences.
- Arthur Samuel first used the term "machine
learning" in 1959. It could be summarized as
follows:
- Without being explicitly programmed, machine
learning enables a machine to automatically learn
from data, improve performance from experiences,
and predict things.

How does machine learning work?
- A machine learning system learns from previous data, builds
prediction models, and predicts the output whenever it receives
new data. In general, more training data helps to build a better
model, which in turn improves the accuracy of the predicted output.
- Suppose we have a complex problem in which we need to make
predictions. Instead of writing the logic by hand, we feed the data
to generic algorithms, which build the logic from the data and
predict the output.

Why do we need machine learning?
The following key points show the importance of machine learning:
- Rapid growth in the production of data
- Solving complex problems that are difficult for a human
- Decision making in various sectors, including finance
- Finding hidden patterns and extracting useful information from data

How to get datasets for machine learning
- The field of ML depends heavily on datasets for training models
and making accurate predictions.
- Datasets play a vital part in the success of machine learning
projects, and working with them is fundamental to becoming a
skilled data scientist.

What is a dataset?
- A dataset is a collection of data arranged in some order.
- A dataset can contain anything from a simple array to a
database table.

Tabular datasets
- A tabular dataset can be understood as a database table or matrix,
where each column corresponds to a particular variable and each row
corresponds to one record of the dataset.
- The most widely supported file type for a tabular dataset is the
comma-separated values (CSV) file. Tree-like data is stored more
naturally in a JSON file.

Types of data in datasets
The types of data found in datasets are:
– Numerical data: such as house price, temperature, etc.
– Categorical data: such as Yes/No, True/False, Blue/Green, etc.
– Ordinal data: similar to categorical data, but the categories
have a meaningful order and can be compared
(for example, low < medium < high).

Popular sources for machine learning datasets
Below is a list of dataset sources that are freely available
for the public to work with:
1. Kaggle Datasets
2. UCI Machine Learning Repository
3. Datasets via AWS
4. Google's Dataset Search Engine
5. Microsoft Datasets

Classification of machine learning
Machine learning can be classified into three types:
1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning

Supervised Machine learning?
- Supervised learning is a type of
machine learning where an algorithm
learns from labeled training data to
make predictions or decisions.
- In supervised learning, the algorithm
is provided with a dataset that includes
both input features and corresponding
output labels. The goal is for the
algorithm to learn a mapping or
relationship between the input data
and the desired output.

[Figure: how supervised machine learning works]

Steps involved in supervised learning
– First, determine the type of training dataset.
– Collect the labelled training data.
– Split the data into a training set, a validation set,
and a test set.
– Determine the input features of the training dataset, which
should carry enough information for the model to predict the
output accurately.
– Choose a suitable algorithm for the model, such as a support
vector machine, a decision tree, etc.
– Run the algorithm on the training set. Sometimes a validation
set, held out from the training data, is needed to tune the
control parameters.
– Evaluate the accuracy of the model on the test set. If the model
predicts the correct outputs, the model is accurate.

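The splitting step can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function name `train_val_test_split`, the 60/20/20 proportions, and the toy data are all assumptions made for the example.

```python
import random

def train_val_test_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle the rows, then cut them into training, validation, and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

# Hypothetical labelled data: (input feature, output label) pairs.
data = [(x, x % 2) for x in range(10)]
train, val, test = train_val_test_split(data)
print(len(train), len(val), len(test))
```

The model is fitted on `train`, tuned on `val`, and scored once on `test`, so the final accuracy is measured on data the model has never seen.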
Supervised Machine learning?
- Supervised learning can be further grouped into two categories
of algorithms:
  - Classification
  - Regression

Regression
- Regression algorithms are used when the output variable is
continuous and there is a relationship between the input
variables and the output variable.
- They are used to predict continuous variables, as in weather
forecasting, market trends, etc.
- Some popular regression algorithms that come under supervised
learning:
– Linear Regression
– Regression Trees
– Non-Linear Regression
– Bayesian Linear and Polynomial Regression

Linear Regression
- Linear regression is a statistical method
used for modeling the relationship
between a dependent variable and one or
more independent variables.
- The goal is to find the best-fitting linear
relationship that can be used to predict
the values of the dependent variable
based on the values of the independent
variables.
- It is a simple and widely used approach
in both statistics and machine learning.
- Linear regression makes predictions for continuous (real-valued
or numeric) variables such as sales, salary, age, product price, etc.

Linear Regression
- The linear regression algorithm models a linear relationship
between a dependent variable (y) and one or more independent
variables (x), hence the name linear regression.
- Because the relationship is linear, the model shows how the
value of the dependent variable changes with the value of the
independent variable.

Linear Regression
- Mathematically, we can represent a linear regression as:
y = a0 + a1·x + ε
Here,
y = dependent variable (target variable)
x = independent variable (predictor variable)
a0 = intercept of the line (gives an additional degree of freedom)
a1 = linear regression coefficient (scale factor applied to each
input value)
ε = random error
The values of the x and y variables come from the training dataset
used to fit the linear regression model.

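As an illustration of the equation y = a0 + a1·x, the sketch below estimates a0 and a1 from a handful of points. It uses the ordinary least-squares closed form, which the slides do not show and which is assumed here only for brevity; the function name and the toy data are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (a0, a1) for the least-squares line y = a0 + a1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a0 = mean_y - a1 * mean_x
    return a0, a1

# Toy data generated from y = 1 + 2x with no random error (ε = 0).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
a0, a1 = fit_simple_linear_regression(xs, ys)
print(a0, a1)
```

With noise-free data the fit recovers the intercept and slope used to generate the points; with real data the random error ε means the line only approximates the observations.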
Types of Linear Regression
- Linear regression can be further divided into two types
of algorithm:
1. Simple Linear Regression: if a single independent variable is
used to predict the value of a numerical dependent variable, the
algorithm is called Simple Linear Regression.
2. Multiple Linear Regression: if more than one independent
variable is used to predict the value of a numerical dependent
variable, the algorithm is called Multiple Linear Regression.

Linear Regression Line
- A straight line showing the relationship between the dependent
and independent variables is called a regression line. A regression
line can show two types of relationship:
Positive linear relationship:
If the dependent variable (Y-axis) increases as the independent
variable (X-axis) increases, the relationship is called a positive
linear relationship.

Linear Regression Line
Negative linear relationship:
If the dependent variable (Y-axis) decreases as the independent
variable (X-axis) increases, the relationship is called a negative
linear relationship.

Linear Regression example
[Figures: worked example 1 and the exercise "Example 2"]

Finding the best fit line in Linear
Regression
- When working with linear regression, the main goal is to find
the best-fit line, meaning the error between predicted values and
actual values should be minimized. The best-fit line has the
least error.
- Different values of the weights or line coefficients (a0, a1)
give different regression lines, so we need to calculate the best
values of a0 and a1. To do this we use a cost function.

Cost function
- The cost function is used to estimate the coefficient values
(a0, a1) of the best-fit line.
- Minimizing the cost function optimizes the regression
coefficients or weights. The cost function measures how well a
linear regression model is performing.
- We can use the cost function to assess the accuracy of the
mapping function, which maps the input variable to the output
variable. This mapping function is also known as the hypothesis
function.

Mean Squared Error (MSE) Cost
function
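- For linear regression, we use the Mean Squared Error (MSE) cost
function, which is the average of the squared errors between the
predicted values and the actual values. It can be written as:
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\bigl(y_i - (a_1 x_i + a_0)\bigr)^2$$
where N is the number of observations, y_i is the actual value,
and a_1 x_i + a_0 is the predicted value.
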
Gradient Descent
- Gradient descent is used to minimize the MSE by calculating the
gradient of the cost function.
- A regression model uses gradient descent to update the
coefficients of the line so as to reduce the cost function.
- It starts from randomly selected coefficient values and then
iteratively updates them to reach the minimum of the cost function.

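The update loop described above can be sketched as follows. This is a minimal illustration under stated assumptions: the coefficients start at zero rather than at random values so the run is repeatable, and the learning rate `lr`, the step count, and the toy data are hypothetical choices, not values from the slides.

```python
def mse(a0, a1, xs, ys):
    """Mean squared error of the line y = a0 + a1 * x on the data."""
    return sum((y - (a0 + a1 * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient_descent(xs, ys, lr=0.05, steps=2000):
    """Minimize the MSE over (a0, a1) by repeated gradient steps."""
    a0, a1 = 0.0, 0.0  # assumption: start at zero instead of random values
    n = len(xs)
    for _ in range(steps):
        errors = [(a0 + a1 * x) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE with respect to a0 and a1.
        grad_a0 = (2.0 / n) * sum(errors)
        grad_a1 = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        # Move against the gradient to reduce the cost.
        a0 -= lr * grad_a0
        a1 -= lr * grad_a1
    return a0, a1

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # generated from y = 1 + 2x
a0, a1 = gradient_descent(xs, ys)
print(a0, a1, mse(a0, a1, xs, ys))
```

If the learning rate is too large the updates overshoot and the cost grows instead of shrinking, so in practice `lr` is tuned for the data at hand.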
Model Performance
- Goodness of fit describes how well the regression line fits the
set of observations. The process of finding the best model out of
various models is called optimization. Goodness of fit can be
measured with the R-squared method.

R-squared method
- R-squared is a statistical measure of goodness of fit.
- It measures the strength of the relationship between the
dependent and independent variables on a scale of 0–100%.
- A high R-squared value means a small difference between the
predicted values and actual values, and hence represents a
good model.
- It is also called the coefficient of determination, or the
coefficient of multiple determination for multiple regression.

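A short sketch of how R-squared can be computed, assuming the usual definition as one minus the ratio of the residual sum of squares to the total sum of squares (the slides name the measure but do not give a formula). The function name and the toy values are hypothetical.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))  # unexplained variation
    ss_tot = sum((y - mean_y) ** 2 for y in actual)                # total variation
    return 1.0 - ss_res / ss_tot

actual = [3.0, 5.0, 7.0, 9.0]
perfect = r_squared(actual, [3.0, 5.0, 7.0, 9.0])   # predictions match exactly
baseline = r_squared(actual, [6.0, 6.0, 6.0, 6.0])  # always predict the mean
print(perfect, baseline)
```

A model that always predicts the mean scores 0 and a perfect model scores 1, which corresponds to the 0–100% scale mentioned above.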
Classification
- Classification algorithms are used when the output variable is
categorical, which means it takes one of two or more classes,
such as Yes/No, Male/Female, True/False, etc.
- Popular classification algorithms include:
– Random Forest
– Decision Trees
– Logistic Regression
– Support Vector Machines

Classification Algorithm in Machine
Learning
- Classification can be pictured with two classes, Class A and
Class B: items in the same class have features that are similar
to each other and dissimilar to those of the other class.
[Figure: two classes, Class A and Class B]

Classification Algorithm in Machine
Learning
- The algorithm that implements classification on a dataset is
known as a classifier. There are two types of classification:
Binary classifier: if the classification problem has only two
possible outcomes, it is called a binary classifier.
Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG, etc.
Multi-class classifier: if a classification problem has more than
two outcomes, it is called a multi-class classifier.
Examples: classification of types of crops, classification of
types of music.

Classification Algorithm in Machine
Learning
- Classification algorithms can be further divided into two
main categories:
Linear Models
• Logistic Regression
• Support Vector Machines
Non-linear Models
• K-Nearest Neighbours
• Kernel SVM
• Naïve Bayes
• Decision Tree Classification
• Random Forest Classification

Evaluating a Classification model
- Once the model is built, its performance must be evaluated,
whether it is a classification or a regression model. To evaluate
a classification model, we can use:
1. Log loss or cross-entropy loss
2. Confusion matrix

Log Loss or Cross-Entropy Loss

– It is used to evaluate the performance of a classifier whose
output is a probability value between 0 and 1.
– For a good binary classification model, the log loss should be
close to 0.
– Log loss increases as the predicted value deviates from the
actual value.
– A lower log loss indicates a more accurate model.
– For binary classification, cross-entropy can be calculated as:
$$-\bigl(y \log(p) + (1 - y)\log(1 - p)\bigr)$$
where y is the actual label (0 or 1) and p is the predicted
probability of class 1.

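To make the behaviour of log loss concrete, the sketch below averages the binary cross-entropy over a few predictions. The function name, the clipping constant `eps` (added only to avoid taking the logarithm of 0), and the sample probabilities are assumptions for illustration.

```python
import math

def binary_log_loss(labels, probs, eps=1e-15):
    """Average binary cross-entropy; labels are 0/1, probs are P(class = 1)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clip so log() never receives 0
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(labels)

labels = [1, 0, 1]
confident = binary_log_loss(labels, [0.9, 0.1, 0.8])  # close to the true labels
poor = binary_log_loss(labels, [0.4, 0.6, 0.3])       # leaning the wrong way
print(confident, poor)
```

The predictions that sit close to the true labels give the smaller loss, matching the statement that a lower log loss indicates a more accurate model.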
Confusion matrix

- The confusion matrix gives a table as output that describes the
performance of the model.
- It is also known as the error matrix.
- The matrix summarizes the prediction results, giving the total
number of correct predictions and incorrect predictions.

K-Nearest Neighbor(KNN) Algorithm
for ML
- The K-NN algorithm assumes similarity between the new case and
the available cases, and puts the new case into the category that
is most similar to the available categories.

How does K-NN work?
- How K-NN works can be explained by the following algorithm:
Step-1: Select the number K of neighbors.
Step-2: Calculate the Euclidean distance from the new data point
to each available data point.
Step-3: Take the K nearest neighbors according to the calculated
Euclidean distance.
Step-4: Among these K neighbors, count the number of data points
in each category.
Step-5: Assign the new data point to the category with the largest
number of neighbors.
Step-6: Our model is ready.

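The steps above can be sketched directly in Python. This is a minimal illustration: the function name `knn_classify`, the two-dimensional toy points, and the category labels "A" and "B" are assumptions made for the example.

```python
import math
from collections import Counter

def knn_classify(train, new_point, k=5):
    """Classify new_point by majority vote among its k nearest training points.

    train is a list of (features, label) pairs.
    """
    # Step 2: Euclidean distance from the new point to every training point.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], new_point))
    # Step 3: keep the k nearest neighbors.
    nearest = by_distance[:k]
    # Step 4: count the neighbors in each category.
    votes = Counter(label for _, label in nearest)
    # Step 5: the category with the most neighbors wins.
    return votes.most_common(1)[0][0]

# Hypothetical training data: two well-separated categories.
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"),
    ((6.0, 6.0), "B"), ((6.5, 7.0), "B"), ((7.0, 6.5), "B"),
]
label = knn_classify(train, (2.0, 2.0), k=3)
print(label)
```

K-NN has no separate training phase: all of the work happens at prediction time, when distances to the stored points are computed.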
- Suppose we have a new data point and we need to put it in the
required category.
[Figure: a new data point between Category A and Category B]

- First, we choose the number of neighbors; here we choose K = 5.
- Next, we calculate the Euclidean distance between the data
points. The Euclidean distance is the straight-line distance
between two points, familiar from geometry. For two points
A(x_1, y_1) and B(x_2, y_2) it can be calculated as:
$$d(A, B) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$

- Calculating the Euclidean distances gives the five nearest
neighbors: three in category A and two in category B, so the new
data point is assigned to category A.
[Figure: the five nearest neighbors of the new data point]

How to select the value of K in K-NN?
- Some points to remember when selecting the value of K in the
K-NN algorithm:
- There is no particular way to determine the best value for K,
so we need to try several values and pick the best of them.
A commonly used value for K is 5.
- A very low value of K, such as K=1 or K=2, can be noisy and
makes the model sensitive to outliers.
- Large values of K reduce the effect of noise, but they can cause
difficulties of their own, such as blurring the boundaries
between classes.

Example K-NN
[Figures: worked K-NN example]

Probability
– A well-known and well-understood framework for
uncertainty
– Clear semantics
– Provides principled answers for:
  • Combining evidence
  • Predictive and diagnostic reasoning
  • Incorporation of new evidence
– Intuitive (at some level) to human experts
– Can be learned

Frequency Interpretation
– Draw a ball from an urn containing n balls of the
same size, r red and s yellow.
– The probability that the proposition A = “the ball is
red” is true corresponds to the relative frequency
with which we expect to draw a red ball → P(A) = ?

Random Variables
– A proposition that takes the value True with
probability p and False with probability 1-p is a
random variable with distribution (p,1-p)
– If an urn contains balls of 3 possible colors (red, yellow,
and blue), the color of a ball picked at random from the urn is a
random variable with 3 possible values.
– The (probability) distribution of a random variable X
with n values x1, x2, …, xn is (p1, p2, …, pn),
with $P(X = x_i) = p_i$ and $\sum_{i=1}^{n} p_i = 1$.

Bayesian Viewpoint
– Probability is "degree of belief", or "degree of uncertainty".
– To the Bayesian, probability lies subjectively in the mind and
can, with validity, be different for people with different
information, e.g., the probability that Wayne will get rich
from selling his kidney.
– In contrast, to the frequentist, probability lies
objectively in the external world.
– The Bayesian viewpoint has been gaining popularity in the past
decade, largely due to the increase in computational power that
makes feasible many of the calculations that were previously
intractable.

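Generalization
Bayes’ Rule:
Definition of conditional probability:
$$P(A \mid B) = \frac{P(A \land B)}{P(B)}$$
P(A|B) is read as "the probability of A given B". This can also
be written as the product rule:
$$P(A \land B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$$
which gives Bayes' rule:
$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}$$
Generalization with an additional condition C:
$$P(A \land B \land C) = P(A \mid B, C)\,P(B \mid C)\,P(C)$$
$$P(B \mid A, C) = \frac{P(A \mid B, C)\,P(B \mid C)}{P(A \mid C)}$$
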
Example

Toothach Toothach Total


e e
Cavity 0.04 0.06 0.1
Cavity 0.01 0.89 0.9
Total 0.05 0.95 1.o0

Given:
– P(Cavity)=0.1
– P(Toothache)=0.05
– P(Cavity|Toothache)=0.8
Bayes’ rule tells:
– P(Toothache|Cavity)=(0.8x0.05)/0.1= 0.4

12/15/2024 AI/COSC 63
Probabilistic (Statistical) Learning: What is Bayesian
Classification?

Bayes’ rule application in detail

– Bayesian classifiers are statistical classifiers.
– For each new sample they provide the probability that
the sample belongs to each class.

Probabilistic (Statistical) Learning:
Bayes’ Theorem: Basics
– Let X be a data sample (“evidence”) whose class label is unknown.
– Let H be a hypothesis that X belongs to class C.
– Classification is to determine P(H|X), the posterior probability
that the hypothesis holds given the observed data sample X.
– P(H) (prior probability): the initial probability of the hypothesis.
  • E.g., X will buy a computer, regardless of age, income, …
– P(X): the probability that the sample data is observed.
– P(X|H) (likelihood): the probability of observing the sample X,
given that the hypothesis holds.
  • E.g., given that X will buy a computer, the probability that
  X is aged 31..40 with medium income.

Statistical Learning
Classification Part II: Bayes’ Theorem

Given training data X, the posterior probability of a
hypothesis H, P(H|X), follows Bayes' theorem:
$$P(H \mid X) = \frac{P(X \mid H)\,P(H)}{P(X)}$$

Statistical Learning: Towards Naïve
Bayesian Classifiers
– Let D be a training set of tuples and their
associated class labels, where each tuple is
represented by an n-dimensional attribute vector
X = (x1, x2, …, xn).
– Suppose there are m classes C1, C2, …, Cm.
– Classification is to derive the maximum
posterior, i.e., the maximal P(Ci|X).
– This can be derived from Bayes’ theorem:
$$P(C_i \mid X) = \frac{P(X \mid C_i)\,P(C_i)}{P(X)}$$
– Since P(X) is constant for all classes, only
$P(X \mid C_i)\,P(C_i)$ needs to be maximized.

NBC: Training Dataset

Classes:
C1: buys_computer = ‘yes’
C2: buys_computer = ‘no’

Data sample to classify:
X = (age <= 30, income = medium, student = yes, credit_rating = fair)

age      income   student   credit_rating   buys_computer
<=30     high     no        fair            no
<=30     high     no        excellent       no
31…40    high     no        fair            yes
>40      medium   no        fair            yes
>40      low      yes       fair            yes
>40      low      yes       excellent       no
31…40    low      yes       excellent       yes
<=30     medium   no        fair            no
<=30     low      yes       fair            yes
>40      medium   yes       fair            yes
<=30     medium   yes       excellent       yes
31…40    medium   no        excellent       yes
31…40    high     yes       fair            yes
>40      medium   no        excellent       no

NBC: An Example
P(Ci):
P(buys_computer = “yes”) = 9/14 = 0.643
P(buys_computer = “no”) = 5/14 = 0.357

Compute P(X|Ci) for each class:
P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222
P(age = “<=30” | buys_computer = “no”) = 3/5 = 0.6
P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4
P(student = “yes” | buys_computer = “yes”) = 6/9 = 0.667
P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2
P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667
P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4

X = (age <= 30, income = medium, student = yes, credit_rating = fair)

P(X|Ci):
P(X|buys_computer = “yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X|buys_computer = “no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019

P(X|Ci)*P(Ci):
P(X|buys_computer = “yes”) * P(buys_computer = “yes”) = 0.028
P(X|buys_computer = “no”) * P(buys_computer = “no”) = 0.007

Therefore, X belongs to class (“buys_computer = yes”)

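The hand calculation above can be reproduced from the 14 training tuples with a few lines of Python. This is a sketch under assumptions: the function name `naive_bayes_scores` is hypothetical, the age range is written in ASCII as "31..40", and P(X|Ci) is taken as the product of the per-attribute conditional probabilities (the naive independence assumption used in the worked example). No smoothing is applied, so an attribute value never seen with a class yields a score of 0.

```python
from collections import Counter

# Rows: (age, income, student, credit_rating, buys_computer)
DATA = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]

def naive_bayes_scores(rows, sample):
    """Return P(X|Ci) * P(Ci) for every class label Ci found in rows."""
    class_counts = Counter(row[-1] for row in rows)
    scores = {}
    for label, count in class_counts.items():
        score = count / len(rows)  # prior P(Ci)
        members = [row for row in rows if row[-1] == label]
        for i, value in enumerate(sample):
            matches = sum(1 for row in members if row[i] == value)
            score *= matches / count  # P(x_i | Ci), estimated by counting
        scores[label] = score
    return scores

sample = ("<=30", "medium", "yes", "fair")
scores = naive_bayes_scores(DATA, sample)
prediction = max(scores, key=scores.get)
print(scores, prediction)
```

The two scores agree with the 0.028 and 0.007 obtained by hand, and the larger one selects the class buys_computer = yes.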
Naive Bayesian Classifier
Example 2: play tennis?

Outlook    Temperature   Humidity   Windy   Class
sunny      hot           high       false   N
sunny      hot           high       true    N
overcast   hot           high       false   P
rain       mild          high       false   P
rain       cool          normal     false   P
rain       cool          normal     true    N
overcast   cool          normal     true    P
sunny      mild          high       false   N
sunny      cool          normal     false   P
rain       mild          normal     false   P
sunny      mild          normal     true    P
overcast   mild          high       true    P
overcast   hot           normal     false   P
rain       mild          high       true    N

Naive Bayesian Classifier
Example

Class P (9 samples):
Outlook    Temperature   Humidity   Windy   Class
overcast   hot           high       false   P
rain       mild          high       false   P
rain       cool          normal     false   P
overcast   cool          normal     true    P
sunny      cool          normal     false   P
rain       mild          normal     false   P
sunny      mild          normal     true    P
overcast   mild          high       true    P
overcast   hot           normal     false   P

Class N (5 samples):
Outlook    Temperature   Humidity   Windy   Class
sunny      hot           high       false   N
sunny      hot           high       true    N
rain       cool          normal     true    N
sunny      mild          high       false   N
rain       mild          high       true    N

Naive Bayesian Classifier
Example
Given the training set, we compute the probabilities:

Outlook       P     N
sunny         2/9   3/5
overcast      4/9   0
rain          3/9   2/5

Temperature   P     N
hot           2/9   2/5
mild          4/9   2/5
cool          3/9   1/5

Humidity      P     N
high          3/9   4/5
normal        6/9   1/5

Windy         P     N
true          3/9   3/5
false         6/9   2/5

We also have the class probabilities:
– P(P) = 9/14
– P(N) = 5/14

Naive Bayesian Classifier
Example
To classify a new sample X:
– outlook = sunny
– temperature = cool
– humidity = high
– windy = false
• Prob(P|X) ∝ Prob(P)*Prob(sunny|P)*Prob(cool|P)*
Prob(high|P)*Prob(false|P) =
9/14*2/9*3/9*3/9*6/9 = 0.0106
• Prob(N|X) ∝ Prob(N)*Prob(sunny|N)*Prob(cool|N)*
Prob(high|N)*Prob(false|N) =
5/14*3/5*1/5*4/5*2/5 = 0.0137
• Since 0.0137 > 0.0106, X takes class label N

Naive Bayesian Classifier
Example
• Second example X = <rain, hot, high,
false>

• P(X|p)·P(p) =
P(rain|p)·P(hot|p)·P(high|p)·P(false|p)·P(p) =
3/9·2/9·3/9·6/9·9/14 = 0.010582
• P(X|n)·P(n) =
P(rain|n)·P(hot|n)·P(high|n)·P(false|n)·P(n) =
2/5·2/5·4/5·2/5·5/14 = 0.018286

• Sample X is classified in class N (don’t play)

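Both play-tennis classifications can be checked with exact fractions by looking up the conditional probability tables computed from the training set. This is a sketch: the dictionary layout and the names `PRIOR`, `LIKELIHOOD`, `score`, and `classify` are assumptions made for the example.

```python
from fractions import Fraction as F

# Class priors and conditional probabilities from the play-tennis training set.
PRIOR = {"P": F(9, 14), "N": F(5, 14)}
LIKELIHOOD = {
    "P": {"sunny": F(2, 9), "overcast": F(4, 9), "rain": F(3, 9),
          "hot": F(2, 9), "mild": F(4, 9), "cool": F(3, 9),
          "high": F(3, 9), "normal": F(6, 9),
          "true": F(3, 9), "false": F(6, 9)},
    "N": {"sunny": F(3, 5), "overcast": F(0), "rain": F(2, 5),
          "hot": F(2, 5), "mild": F(2, 5), "cool": F(1, 5),
          "high": F(4, 5), "normal": F(1, 5),
          "true": F(3, 5), "false": F(2, 5)},
}

def score(label, sample):
    """P(X|label) * P(label) under the naive independence assumption."""
    result = PRIOR[label]
    for value in sample:
        result *= LIKELIHOOD[label][value]
    return result

def classify(sample):
    """Pick the class with the larger score."""
    return max(PRIOR, key=lambda label: score(label, sample))

first = ("sunny", "cool", "high", "false")
second = ("rain", "hot", "high", "false")
print(classify(first), classify(second))
print(float(score("P", second)), float(score("N", second)))
```

Note the zero entry for overcast in class N: any sample with outlook = overcast gets a score of exactly 0 for N, which is why practical implementations usually add smoothing to the counts.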
Bayes solved examples
[Figures: Bayes solved example 1 and solved example 2]

Classifier Evaluation Metrics
Accuracy, Error Rate, Sensitivity and Specificity

A\P    C     ¬C
C      TP    FN    P
¬C     FP    TN    N
       P’    N’    All

• Classifier accuracy, or recognition rate: percentage of test
set tuples that are correctly classified
  Accuracy = (TP + TN)/All
• Error rate: 1 – accuracy, or
  Error rate = (FP + FN)/All
• Class imbalance problem:
  – One class may be rare, e.g. fraud, or HIV-positive
  – Significant majority of the negative class and minority of
  the positive class
• Sensitivity: true positive recognition rate
  Sensitivity = TP/P
• Specificity: true negative recognition rate
  Specificity = TN/N

Classifier Evaluation Metrics: Confusion Matrix

Confusion matrix:

Actual class \ Predicted class   C1                     ¬C1
C1                               True Positives (TP)    False Negatives (FN)
¬C1                              False Positives (FP)   True Negatives (TN)

Example of confusion matrix:

Actual class \ Predicted class   buy_computer = yes   buy_computer = no   Total
buy_computer = yes               6954 (0.6954)        46 (0.0046)         7000 (0.7)
buy_computer = no                412 (0.0412)         2588 (0.2588)       3000 (0.3)
Total                            7366 (0.7366)        2634 (0.2634)       10000 (1.0)

• Given m classes, an entry CM(i,j) in a confusion matrix indicates
the number of tuples in class i that were labeled by the classifier
as class j.
• The matrix may have extra rows/columns to provide totals.

Classifier Evaluation Metrics: Example

Actual class \ Predicted class   cancer = yes   cancer = no   Total   Recognition (%)
cancer = yes                     90             210           300     30.00 (sensitivity)
cancer = no                      140            9560          9700    98.56 (specificity)
Total                            230            9770          10000   96.50 (accuracy)

– Precision = 90/230 = 39.13%
– Recall = 90/300 = 30.00%

In general, with actual classes as rows and predicted classes
as columns:

           Positive   Negative   Total
Positive   a          b          a+b
Negative   c          d          c+d
Total      a+c        b+d        a+b+c+d

Recall = (a/(a+b))*100
Precision = (a/(a+c))*100

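The metrics defined on these slides can be computed directly from the four cells of a confusion matrix. The sketch below uses the cancer example; the function name `classification_metrics` and the dictionary keys are assumptions made for the example.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute the evaluation metrics from the four confusion-matrix cells."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate, same as recall
        "specificity": tn / (tn + fp),  # true negative rate
        "precision": tp / (tp + fp),
    }

# Cancer example: 90 true positives, 210 false negatives,
# 140 false positives, 9560 true negatives.
m = classification_metrics(tp=90, fn=210, fp=140, tn=9560)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

The example shows why accuracy alone can mislead on imbalanced classes: the classifier is 96.5% accurate overall yet finds only 30% of the actual cancer cases.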
Unsupervised Learning
- Unsupervised learning is a type of machine learning in which
models are trained on an unlabeled dataset and are allowed to act
on that data without any supervision.
- The goal of unsupervised learning is to find the underlying
structure of the dataset, group the data according to similarities,
and represent the dataset in a compressed format.

Why use Unsupervised Learning?
- Some of the main reasons that make unsupervised learning
important:
– Unsupervised learning is helpful for finding useful insights
from the data.
– Unsupervised learning is similar to how a human learns to think
from their own experiences, which makes it closer to real AI.
– Unsupervised learning works on unlabeled and uncategorized data,
which makes it more widely applicable.
– In the real world, we do not always have input data with the
corresponding output; such cases call for unsupervised learning.

[Figure: how unsupervised learning works]

Types of unsupervised learning
- Unsupervised learning algorithms can be further categorized
into two types of problems:
- Clustering: Clustering is a method of grouping objects into
clusters such that objects with the most similarities remain in
one group and have few or no similarities with the objects of
another group. Cluster analysis finds the commonalities between
the data objects and categorizes them according to the presence
or absence of those commonalities.
- Association: Association rule learning is an unsupervised
learning method used for finding relationships between variables
in a large database.

Clustering
- Clustering is a way of grouping data points into different
clusters, each consisting of similar data points. Objects with
possible similarities remain in a group that has few or no
similarities with another group.
- Example algorithm: K-Means

K-means Clustering
- K-means is an unsupervised learning algorithm that groups an
unlabeled dataset into different clusters. Here K is the
pre-defined number of clusters to be created: if K=2 there will
be two clusters, if K=3 there will be three clusters, and so on.
– The k-means clustering algorithm mainly performs two tasks:
• Determines the best positions for the K center points
(centroids) by an iterative process.
• Assigns each data point to its closest centroid. The data
points nearest to a particular centroid form a cluster.

How does K-means clustering work?
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points as initial centroids. (They need
not be points from the input dataset.)
Step-3: Assign each data point to its closest centroid, which
forms the K predefined clusters.
Step-4: Place a new centroid for each cluster at the mean of its
points, which minimizes the variance within the cluster.
Step-5: Repeat step 3, that is, reassign each data point to the
new closest centroid.
Step-6: If any reassignment occurs, go to step 4; otherwise
finish.
Step-7: The model is ready.

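The steps above can be sketched in plain Python. This is a minimal illustration under assumptions: the initial centroids are passed in explicitly instead of being chosen at random so the run is repeatable, and the function name `k_means`, the iteration cap, and the toy points are hypothetical.

```python
import math

def k_means(points, centroids, max_iters=100):
    """Cluster points around the given initial centroids.

    Returns the final centroids and the cluster index of each point.
    """
    centroids = [tuple(c) for c in centroids]
    assignment = None
    for _ in range(max_iters):
        # Assign every point to its closest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Stop as soon as no point changes cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Move each centroid to the mean of the points assigned to it.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # leave a centroid in place if its cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignment

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
centroids, assignment = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids, assignment)
```

The result depends on the starting centroids, which is why k-means is often run several times from different random starts and the best clustering kept.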
Reinforcement learning
- Reinforcement learning is a type of machine learning where an
agent learns how to behave in an environment by performing actions
and receiving rewards or penalties in return.
- The agent's objective is to learn a policy, which is a strategy
that maps states to actions, in order to maximize the cumulative
reward over time.

Reinforcement learning
- Example: Suppose there is an AI agent in a maze environment, and
its goal is to find the diamond. The agent interacts with the
environment by performing actions. Based on those actions, the
state of the agent changes, and it also receives a reward or
penalty as feedback.

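The maze example can be made concrete with a very small sketch. It uses tabular Q-learning, a technique the slides do not name, chosen here only as one simple way to learn from rewards. The "maze" is reduced to a hypothetical corridor of five cells with the diamond in the last cell, and the learning rate `alpha`, discount `gamma`, episode count, and uniformly random exploration are all assumptions.

```python
import random

N_STATES = 5        # corridor cells 0..4; the diamond is in cell 4
GOAL = N_STATES - 1
ACTIONS = (-1, +1)  # action index 0 = move left, 1 = move right

def step(state, action_index):
    """Apply an action; walls keep the agent inside the corridor."""
    next_state = min(max(state + ACTIONS[action_index], 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0  # reward only for finding the diamond
    return next_state, reward

def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn a table q[state][action] of expected discounted returns."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            action = rng.randrange(2)  # explore with uniformly random moves
            next_state, reward = step(state, action)
            # Move the estimate toward reward + discounted best future value.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q = q_learning()
# Greedy policy: in each non-goal cell, pick the action with the larger value.
policy = [max(range(2), key=lambda a: q[s][a]) for s in range(GOAL)]
print(policy)
```

After training, the greedy policy chooses "move right" in every cell, because the learned values grow as the agent gets closer to the diamond.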
Approaches to implement RL
- There are three main ways to implement reinforcement
learning in ML:
Value-based: The value-based approach tries to find the
optimal value function, which gives the maximum value
achievable at a state under any policy. The value of a state s
under a policy π is the long-term return the agent expects
when it starts in s and follows π.
Policy-based: The policy-based approach tries to find the
optimal policy for maximum future reward directly, without
using a value function. The agent applies a policy such that
the action performed at each step helps to maximize the
future reward.
Model-based: In the model-based approach, a virtual model
is created for the environment, and the agent explores that
model to learn it. There is no single solution or algorithm
for this approach because the model representation is
different for each environment.
Real-world Use cases of RL
- Video Games:
RL algorithms are very popular in gaming applications, where they
are used to reach super-human performance. Well-known examples are
AlphaGo and AlphaGo Zero, which play the game of Go.
Resource Management:
The "Resource Management with Deep Reinforcement Learning" paper
showed how to use RL to automatically learn to allocate and schedule
computer resources to waiting jobs in order to minimize average job
slowdown.
Robotics:
RL is widely used in robotics applications. Robots are used in the
industrial and manufacturing area, and these robots are made more
powerful with reinforcement learning. Many industries aim to build
intelligent robots using AI and machine learning technology.
Text Mining:
Text mining, one of the major applications of NLP, is now
being implemented with the help of reinforcement learning by
Salesforce.
Deep learning
– Deep learning is a subset of machine learning,
which is itself a subfield of artificial intelligence
(AI). It uses artificial neural networks to model
and solve complex problems.
– Deep learning architectures are loosely inspired by
the human brain: they are composed of
interconnected nodes (neurons) organized in
layers.
– These neural networks can learn and make
intelligent decisions from data.

Why DL may be advantageous over ML
1. Feature Learning: Deep learning models can
automatically learn hierarchical representations
of data.
- Traditional machine learning models often require
manual feature engineering, where experts need to
extract relevant features from the data.
- Deep learning algorithms, on the other hand, can
automatically learn useful features from raw data,
reducing the need for handcrafted features.

2. Complex Data Representations: Deep neural
networks, especially deep convolutional neural
networks (CNNs) and recurrent neural networks
(RNNs), are well-suited for handling complex data
types like images, speech, and sequential data.
3. Scalability: Deep learning models can scale with
the size of the data.
4. End-to-End Learning: Deep learning models can
learn end-to-end mappings from input to output,

Artificial Neural Networks
– An Artificial Neural Network (ANN) is inspired
by the biological neural networks that make up
the human brain.
– Neural networks are modeled on the human brain
so as to imitate its functionality.
– Just as the human brain is a neural network made
up of many neurons, an Artificial Neural Network
is made up of numerous perceptrons.

Artificial Neural Networks
– A neural network comprises three main kinds of
layers, which are as follows:
– Input layer: The input layer accepts the inputs
that are provided to the network.
– Hidden layer: Between the input and output
layers there is a set of hidden layers, which
perform the computations that lead to the
output.
– Output layer: After the input has undergone a
series of transformations while passing through
the hidden layers, the result is delivered by the
output layer.
Motivation behind Neural Networks
– Neural networks are modeled on neurons, the
cells of the brain.
– A biological neuron receives inputs from other
sources, combines them in some way, performs
a nonlinear operation on the result, and emits
the final result as its output.

Neural Networks
– Before getting into the working of Artificial
Neural Networks, let's break the network down
and look at its basic unit, which is called a
Perceptron.
– A perceptron can be defined as a neural network
with a single layer that classifies linearly
separable data. It consists of four major
components, which are as follows:
– Inputs
– Weights and Bias
– Summation Functions
– Activation or transformation function



Neural Networks
– The main logic behind the concept of the Perceptron is as
follows:
– The inputs (x) are fed into the input layer, where each
input is multiplied by its assigned weight (w). The
products are then added together to form a weighted
sum, and this weighted sum is passed through the
activation function to produce the output.
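A single perceptron can be sketched as follows. The weights, the bias value, and the step activation are assumptions chosen by hand so that the unit behaves like a logical AND gate; in practice these values would be learned from data.

```python
import numpy as np

def perceptron_output(x, w, b):
    """Weighted sum of the inputs plus bias, followed by a step activation."""
    weighted_sum = np.dot(w, x) + b        # summation function
    return 1 if weighted_sum > 0 else 0    # step activation function

# Hand-picked (assumed) parameters that implement a logical AND.
w = np.array([1.0, 1.0])
b = -1.5

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
outputs = [perceptron_output(np.array(x), w, b) for x in inputs]
# outputs is [0, 0, 0, 1]: only the input (1, 1) fires the perceptron.
```

Because AND is linearly separable, one perceptron is enough; a function such as XOR would need at least one hidden layer.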



Neural Networks
– Weights and Bias: When an input variable is fed into the
network, its weight is initialized with a random value.
Each weight represents the importance of its input for
making correct predictions of the result.
– The bias shifts the curve of the activation function so
that the network can produce a more precise output.
– Summation Function: After the weights are assigned to
the inputs, the product of each input and its weight is
computed. The summation function then adds all of
the products together to give the weighted sum.



Neural Networks
– Activation Function: The main objective of the activation
function is to map the weighted sum to the output.
Common activation functions include tanh, ReLU, and
sigmoid.
– Activation functions fall into two main categories:
– Linear Activation Function
– Non-Linear Activation Function
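As a small illustration, the activation functions named above can be written directly in NumPy. The function names and the sample input z are assumptions made for this sketch.

```python
import numpy as np

def linear(z):
    """Linear (identity) activation: the output equals the weighted sum."""
    return z

def sigmoid(z):
    """Non-linear: squashes any real value into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    """Non-linear: squashes any real value into the range (-1, 1)."""
    return np.tanh(z)

def relu(z):
    """Non-linear: keeps positive values and thresholds negatives at zero."""
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])   # sample weighted sums (assumed)
activations = {f.__name__: f(z) for f in (linear, sigmoid, tanh, relu)}
```

For z = 0 the sigmoid returns 0.5 and tanh returns 0, while ReLU maps the sample input to [0, 0, 2].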
– Gradient Descent Algorithm: an optimization algorithm,
used in many machine learning algorithms, that minimizes
the cost function by repeatedly updating the parameters
of the learning model. In linear regression, these
parameters are the coefficients, whereas in a neural
network they are the weights.
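A minimal sketch of gradient descent for the linear regression case follows. The mean squared error cost, the learning rate, the number of steps, and the toy data (generated from y = 2x + 1) are all assumptions made for illustration.

```python
import numpy as np

def fit_line(x, y, learning_rate=0.1, steps=2000):
    """Fit y ≈ w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        error = (w * x + b) - y
        # Gradients of mean(error**2) with respect to w and b.
        grad_w = 2.0 * np.mean(error * x)
        grad_b = 2.0 * np.mean(error)
        # Move each parameter a small step against its gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Toy data (assumed): points that lie exactly on y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0
w, b = fit_line(x, y)
```

On this data the coefficients converge to w ≈ 2 and b ≈ 1. A learning rate that is too large would make the updates diverge instead.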



Backpropagation
– A backpropagation network consists of an input
layer of neurons, an output layer, and at least one
hidden layer.
– Each neuron computes a weighted sum of its
inputs, which is then passed to the activation
function, commonly the sigmoid activation
function.
– It uses supervised learning to teach the network:
the weights of the network are updated repeatedly
until the network produces the desired output.
– The following factors affect the training and
performance of the network:
– The random (initial) values of the weights.
– The number of training cycles.
– The number of hidden neurons.
– The training set.
– Training parameter values such as learning rate and
momentum.



Working of Backpropagation
– The inputs X arrive through the preconnected paths.
– The weights W, which are used to model the input, are
initialized randomly.
– The output of every neuron is then calculated, moving
from the input layer to the hidden layer and then to the
output layer.
– Next, the error in the outputs is evaluated:
Error = Actual Output - Desired Output
– The errors are sent back from the output layer to the
hidden layer to adjust the weights so as to reduce the error.
– The whole process is repeated until the desired result is
achieved.
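The procedure above can be sketched for a tiny network with one hidden layer. This is an illustrative sketch only: the XOR data set, the sigmoid activations, the hidden layer size, the learning rate, and the fixed number of epochs (used in place of a "desired result" check) are all assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_xor(epochs=5000, learning_rate=0.5, hidden_units=4, seed=0):
    rng = np.random.default_rng(seed)
    x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # inputs X
    desired = np.array([[0], [1], [1], [0]], dtype=float)         # desired output
    # Weights W are selected randomly; biases start at zero.
    w1 = rng.normal(0.0, 1.0, (2, hidden_units))
    b1 = np.zeros((1, hidden_units))
    w2 = rng.normal(0.0, 1.0, (hidden_units, 1))
    b2 = np.zeros((1, 1))
    losses = []
    for _ in range(epochs):
        # Forward pass: input layer -> hidden layer -> output layer.
        hidden = sigmoid(x @ w1 + b1)
        actual = sigmoid(hidden @ w2 + b2)
        # Error = Actual Output - Desired Output.
        error = actual - desired
        losses.append(float(np.mean(error ** 2)))
        # Backward pass: gradients of 0.5 * sum(error**2), sent from the
        # output layer back to the hidden layer (sigmoid' = s * (1 - s)).
        delta_out = error * actual * (1.0 - actual)
        delta_hidden = (delta_out @ w2.T) * hidden * (1.0 - hidden)
        # Adjust the weights to lessen the error.
        w2 -= learning_rate * (hidden.T @ delta_out)
        b2 -= learning_rate * delta_out.sum(axis=0, keepdims=True)
        w1 -= learning_rate * (x.T @ delta_hidden)
        b1 -= learning_rate * delta_hidden.sum(axis=0, keepdims=True)
    return losses

losses = train_xor()
```

The list losses records the mean squared error at every epoch, so comparing its first and last entries shows how much the weight updates reduced the error.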



Convolutional neural network
– Convolutional Neural Networks are a
special type of feed-forward artificial neural
network in which the connectivity pattern
between neurons is inspired by the visual
cortex.



Working of a Convolutional neural network
– Generally, a Convolutional Neural Network is built from
the following layers:
– Input: If the image is 32 pixels wide and 32 pixels high
with three channels (R, G, B), the input holds the raw
pixel values of the image ([32x32x3]).
– Convolution: It computes the output of the neurons
that are connected to local regions of the input. Each
neuron calculates a dot product between its weights and
the small region of the input volume it is connected to.
For example, with 12 filters (and padding that preserves
width and height) the result is a volume of [32x32x12].
– ReLU Layer: It applies an activation function
elementwise, such as max(0, x), which thresholds at
zero. The size of the volume is unchanged
([32x32x12]).
– Pooling: This layer performs a downsampling
operation along the spatial dimensions (width, height),
resulting in a [16x16x12] volume.
– Fully Connected: A regular neural network layer that
receives its input from the preceding layer, computes
the class scores, and outputs a 1-dimensional array
whose size equals the number of classes.
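The volume sizes quoted above can be checked with a small NumPy sketch of the layer sequence. Several details are assumptions that the slides do not state: 3x3 filters with stride 1 and zero padding, 2x2 max pooling, 10 output classes, and random (untrained) weights. The loop-based convolution is written for clarity, not speed.

```python
import numpy as np

def conv2d_same(image, filters):
    """Stride-1 convolution (cross-correlation) with zero padding.
    image: (H, W, C); filters: (F, k, k, C); returns (H, W, F)."""
    num_filters, k, _, _ = filters.shape
    pad = k // 2
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)))
    height, width, _ = image.shape
    out = np.zeros((height, width, num_filters))
    for i in range(height):
        for j in range(width):
            patch = padded[i:i + k, j:j + k, :]      # local region of the input
            # Dot product of every filter with the patch.
            out[i, j] = np.tensordot(filters, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

def relu(volume):
    return np.maximum(0.0, volume)                   # max(0, x), elementwise

def max_pool_2x2(volume):
    h, w, c = volume.shape
    # Group pixels into 2x2 blocks and keep the maximum of each block.
    return volume.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))                          # input [32x32x3]
filters = rng.normal(0.0, 0.1, (12, 3, 3, 3))            # 12 filters (3x3 assumed)
fc_weights = rng.normal(0.0, 0.01, (16 * 16 * 12, 10))   # 10 classes (assumed)

conv_out = conv2d_same(image, filters)                   # [32x32x12]
relu_out = relu(conv_out)                                # [32x32x12]
pool_out = max_pool_2x2(relu_out)                        # [16x16x12]
class_scores = pool_out.reshape(-1) @ fc_weights         # one score per class
```

With untrained random weights the class scores are meaningless; the point of the sketch is only how the shape of the volume changes from layer to layer.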



Recurrent neural network and LSTMs
– Recurrent Neural Networks (RNNs) are a type of artificial
neural network designed for sequence modeling and
processing.
– Unlike traditional feedforward neural networks, RNNs
have connections that form directed cycles, allowing
them to maintain a hidden state that captures
information about previous inputs in a sequence.
– This makes RNNs particularly well-suited for tasks
involving sequential data, such as time series analysis,
natural language processing (NLP), speech recognition,
and more.
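The idea of a hidden state that carries information from earlier inputs can be sketched with a basic (non-LSTM) recurrent cell. The tanh update rule, the layer sizes, the random weights, and the two toy sequences are assumptions made for illustration.

```python
import numpy as np

def rnn_forward(inputs, w_xh, w_hh, b_h):
    """Run a simple recurrent cell over a sequence and return all hidden states."""
    hidden = np.zeros(w_hh.shape[0])     # initial hidden state
    states = []
    for x_t in inputs:
        # The new state depends on the current input AND the previous state.
        hidden = np.tanh(w_xh @ x_t + w_hh @ hidden + b_h)
        states.append(hidden)
    return np.array(states)

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4           # assumed sizes
w_xh = rng.normal(0.0, 0.5, (hidden_size, input_size))    # input -> hidden
w_hh = rng.normal(0.0, 0.5, (hidden_size, hidden_size))   # hidden -> hidden (the cycle)
b_h = np.zeros(hidden_size)

# Two sequences that differ only in their FIRST element.
sequence_a = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
sequence_b = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])

states_a = rnn_forward(sequence_a, w_xh, w_hh, b_h)
states_b = rnn_forward(sequence_b, w_xh, w_hh, b_h)
```

Although both sequences end with the same two inputs, their final hidden states differ, because the state still carries information about the first input. A feedforward network applied to the last input alone could not make that distinction.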



Projects
– Movie Recommendations with the MovieLens Dataset
– Sales Forecasting with Walmart
– Breast Cancer Prediction
– Iris Classification
– Turning Handwritten Documents into Digitized Versions
– Traffic Prediction
– Product recommendations
– Spam and Malware Filtering
– Personal Assistant
– Stock Market trading
– Fraud and Preference
– Web scraping
