ML - Unit 1

The document provides an introduction to Machine Learning (ML), defining it as a subset of artificial intelligence that enables computers to learn from data and make predictions without explicit programming. It discusses various applications of ML, including image and speech recognition, traffic prediction, product recommendations, and medical diagnosis, highlighting its advantages over traditional programming. Additionally, it covers essential concepts such as training and testing datasets, classification accuracy, and cross-validation methods.

MACHINE LEARNING

Poonam Girish Fegade

MCA

K.K.Wagh Institute of Engineering Education & Research


Nashik
MACHINE LEARNING INTRODUCTION

Machine Learning Introduction

▪ Arthur Samuel coined the term "Machine Learning". He defined machine
learning as a "field of study that gives computers the capability to learn
without being explicitly programmed".
▪ Machine Learning is a branch of artificial intelligence that develops
algorithms by learning the hidden patterns of datasets and uses them to
make predictions on new, similar data, without being explicitly programmed
for each task.
Machine Learning Introduction
▪ Automating automation
▪ Getting computers to program themselves
▪ Writing software is the bottleneck
▪ Let the data do the work instead!
Traditional Programming:

Data + Program → Computer → Output

Machine Learning:

Data + Output → Computer → Program
| Machine Learning | Traditional Programming |
| --- | --- |
| Machine Learning is a subset of artificial intelligence (AI) that focuses on learning from data to develop an algorithm that can be used to make a prediction. | In traditional programming, rule-based code is written by the developers depending on the problem statement. |
| Machine Learning uses a data-driven approach. It is typically trained on historical data and then used to make predictions on new data. | Traditional programming is typically rule-based and deterministic. It has no self-learning features like Machine Learning and AI. |
| ML can find patterns and insights in large datasets that might be difficult for humans to discover. | Traditional programming is totally dependent on the intelligence of developers, so it has very limited capability. |
| Machine Learning is the subset of AI and is now used in various AI-based tasks like chatbots, question answering, self-driving cars, etc. | Traditional programming is often used to build applications and software systems that have specific functionality. |

When Do We Use Machine Learning?

• Human expertise does not exist (navigating on Mars)
• Humans can't explain their expertise (speech recognition)
• Models must be customized (personalized medicine)
• Models are based on huge amounts of data (genomics)
A classic example of a task that requires machine learning: it is very hard to
say what makes a "2".

Slide credit: Geoffrey Hinton
Applications of Machine learning
Some more examples of tasks that are best solved by
using a learning algorithm

• Recognizing patterns:
– Facial identities or facial expressions
– Handwritten or spoken words
– Medical images
• Generating patterns:
– Generating images or motion sequences
• Recognizing anomalies:
– Unusual credit card transactions
– Unusual patterns of sensor readings in a nuclear power plant
• Prediction:
– Future stock prices or currency exchange rates

1. Image Recognition

▪ Image recognition is one of the most common applications of machine
learning.
▪ It is used to identify objects, persons, places, digital images, etc. A
popular use case of image recognition and face detection is the automatic
friend tagging suggestion.
▪ Facebook provides a feature of automatic friend tagging suggestions.
Whenever we upload a photo with our Facebook friends, we automatically get
a tagging suggestion with names, and the technology behind this is machine
learning's face detection and recognition algorithm.
▪ It is based on the Facebook project named "DeepFace," which is responsible
for face recognition and person identification in pictures.
2. Speech Recognition

▪ While using Google, we get an option of "Search by voice"; this comes
under speech recognition, and it's a popular application of machine
learning.
▪ Speech recognition is the process of converting voice instructions into
text, and it is also known as "speech to text" or "computer speech
recognition."
▪ At present, machine learning algorithms are widely used by various
speech recognition applications. Google Assistant, Siri, Cortana, and
Alexa use speech recognition technology to follow voice instructions.
3. Traffic prediction

▪ If we want to visit a new place, we take the help of Google Maps, which
shows us the correct path with the shortest route and predicts the traffic
conditions.
▪ It predicts traffic conditions, such as whether traffic is clear, slow-
moving, or heavily congested, with the help of two inputs:
▪ Real-time location of the vehicle from the Google Maps app and sensors
▪ Average time taken on past days at the same time
▪ Everyone who uses Google Maps is helping to make the app better. It takes
information from the user and sends it back to its database to improve
performance.
4. Product recommendations

▪ Machine learning is widely used by various e-commerce and entertainment
companies, such as Amazon, Netflix, etc., for product recommendations.
Whenever we search for a product on Amazon, we start getting advertisements
for the same product while surfing the internet in the same browser, and
this is because of machine learning.
▪ Google infers user interest using various machine learning algorithms and
suggests products as per customer interest.
▪ Similarly, when we use Netflix, we find recommendations for entertainment
series, movies, etc., and this is also done with the help of machine
learning.
5. Self-driving cars

▪ One of the most exciting applications of machine learning is self-driving
cars. Machine learning plays a significant role in self-driving cars.
▪ Tesla, a well-known car manufacturer, is working on self-driving cars.
▪ It uses an unsupervised learning method to train the car models to detect
people and objects while driving.
6. Email Spam and Malware Filtering

▪ Whenever we receive a new email, it is filtered automatically as
important, normal, or spam. We always receive important mail in our inbox,
marked with the important symbol, and spam emails in our spam box, and the
technology behind this is machine learning. Below are some spam filters
used by Gmail:
▪ Content filter, header filter, general blacklists filter, rules-based
filters, permission filters
▪ Some machine learning algorithms, such as the Multi-Layer Perceptron,
Decision Tree, and Naïve Bayes classifier, are used for email spam
filtering and malware detection.
7. Online Fraud Detection

▪ Machine learning is making our online transactions safe and secure by
detecting fraudulent transactions. There are various ways a fraudulent
transaction can take place, such as fake accounts, fake IDs, and money
stolen in the middle of a transaction. To detect this, a feed-forward
neural network helps us by checking whether a transaction is genuine or
fraudulent.
▪ For each genuine transaction, the output is converted into some hash
values, and these values become the input for the next round. Each genuine
transaction follows a specific pattern, which changes for a fraudulent
transaction; hence the network detects it and makes our online transactions
more secure.
8. Stock Market trading

▪ Machine learning is widely used in stock market trading. In the stock
market, there is always a risk of ups and downs in shares, so machine
learning's long short-term memory (LSTM) neural network is used for the
prediction of stock market trends.
9. Medical Diagnosis

▪ In medical science, machine learning is used for disease diagnosis. With
this, medical technology is growing very fast and is able to build 3D
models that can predict the exact position of lesions in the brain.
▪ It helps in finding brain tumors and other brain-related diseases easily.
10. Automatic Language Translation

▪ Machine learning helps us by converting text into languages we know.
▪ Google's GNMT (Google Neural Machine Translation) provides this feature:
a neural machine translation system that translates text into our familiar
language; this is called automatic translation.
Defining the Learning Task

Improve on task T, with respect to performance metric P, based on experience E.

T: Playing checkers
P: Percentage of games won against an arbitrary opponent
E: Playing practice games against itself

T: Recognizing hand-written words
P: Percentage of words correctly classified
E: Database of human-labeled images of handwritten words

T: Categorizing email messages as spam or legitimate
P: Percentage of email messages correctly classified
E: Database of emails, some with human-given labels

Slide credit: Ray Mooney
Ingredients of Machine Learning

1. Task: The problems that can be solved with machine learning. A task is an
abstract representation of a problem.
▪ Large problems are broken into small, reasonably independent sub-problems
that are learned separately and then recombined.
▪ A predictive task performs inference on current data in order to make a
prediction.
▪ A descriptive task characterizes the general properties of the data in the
database.
2. Models: The output of machine learning. Different kinds of models are
geometric models, probabilistic models, logical models, and grouping and
grading models.
▪ Instead of transforming the problem to fit some algorithm, in model-based
machine learning you design the algorithm to fit your problem.
3. Features:
▪ A good feature representation is central to achieving high performance in
machine learning.
▪ Feature selection is the process of choosing a subset of features from the
original features so that the feature space is optimally reduced according
to a certain criterion.
How does Machine Learning work?

▪ A Machine Learning system learns from historical data, builds prediction models, and,
whenever it receives new data, predicts the output for it. The accuracy of the predicted
output depends upon the amount of data: a huge amount of data helps to build a better
model, which predicts the output more accurately.
TRAINING AND TESTING DATASET

Training Dataset
▪ The training data is the biggest (in size) subset of the original dataset, which is used to
train or fit the machine learning model.
▪ The training data is fed to the ML algorithms, which lets them learn how to make
predictions for the given task.
▪ For example, for training a sentiment analysis model, the training data could be as below:

| Input | Output (Labels) |
| --- | --- |
| The new UI is great | Positive |
| Update is really slow | Negative |

▪ For unsupervised learning, the training data contains unlabeled data points; for supervised
learning, the training data contains labels in order to train the model to make predictions.
▪ The better the quality of the training data, the better the performance of the model.
Training data is approximately 60% or more of the total data for an ML project.
Test Dataset

▪ The test dataset is another subset of the original data, which is independent of the
training dataset.
▪ Once we train the model with the training dataset, it's time to test the model with
the test dataset. The testing data is used to check the accuracy of the model.
▪ This dataset evaluates the performance of the model and ensures that the model
can generalize well to new or unseen data.
▪ The test dataset is approximately 20-25% of the total original data for an ML
project.
▪ The general ratios for splitting train and test datasets are 80:20, 70:30, or 90:10.
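As an illustrative sketch (assuming the scikit-learn library, which the slides do not name), an 80:20 split can be done with train_test_split; the arrays X and y below are made-up toy data:

```python
# A minimal sketch of an 80:20 train/test split using scikit-learn.
# X and y are made-up toy data, used only to show the mechanics.
from sklearn.model_selection import train_test_split

X = [[i] for i in range(10)]           # 10 toy samples, one feature each
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]     # toy labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)   # 80% train, 20% test

print(len(X_train), len(X_test))  # -> 8 2
```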
POSITIVE AND NEGATIVE CLASS

The Boy Who Cried Wolf

▪ A shepherd boy gets bored tending the town's flock. To have some
fun, he cries out, "Wolf!" even though no wolf is in sight. The
villagers run to protect the flock, but then get really mad when they
realize the boy was playing a joke on them.

▪ One night, the shepherd boy sees a real wolf approaching the flock
and calls out, "Wolf!" The villagers refuse to be fooled again and
stay in their houses. The hungry wolf turns the flock into lamb
chops. The town gets angry. Panic ensues.
Let's make the following definitions:
• "Wolf" is a positive class.
• "No wolf" is a negative class.
"wolf-prediction" model using a 2x2 confusion matrix that depicts all four
possible outcomes:

True Positive (TP): False Positive (FP):


Reality: A wolf threatened. Reality: No wolf threatened.
Shepherd said: "Wolf." Shepherd said: "Wolf."
Outcome: Shepherd is a hero. Outcome: Villagers are angry at
shepherd for waking them up.
False Negative (FN): True Negative (TN):
Reality: A wolf threatened. Reality: No wolf threatened.
Shepherd said: "No wolf." Shepherd said: "No wolf."
Outcome: The wolf ate all the Outcome: Everyone is fine.
sheep.
▪ A true positive is an outcome where the model correctly predicts the positive class.
Similarly, a true negative is an outcome where the model correctly predicts
the negative class.

▪ A false positive is an outcome where the model incorrectly predicts the positive class.
And a false negative is an outcome where the model incorrectly predicts
the negative class.
Let's try calculating accuracy for the following model that classified
100 tumors as the positive class (malignant) or the negative class (benign):

True Positive (TP):
Reality: Malignant
ML model predicted: Malignant
Number of TP results: 1

False Positive (FP):
Reality: Benign
ML model predicted: Malignant
Number of FP results: 1

False Negative (FN):
Reality: Malignant
ML model predicted: Benign
Number of FN results: 8

True Negative (TN):
Reality: Benign
ML model predicted: Benign
Number of TN results: 90
Classification: Accuracy

Accuracy is one metric for evaluating classification models. Informally, accuracy is
the fraction of predictions our model got right. Formally, accuracy has the following
definition:

Accuracy = (Number of correct predictions) / (Total number of predictions)

For binary classification, accuracy can also be calculated in terms of positives and
negatives as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

where TP = True Positives, TN = True Negatives, FP = False Positives, and FN =
False Negatives.

For the tumor classifier above:

Accuracy = (TP + TN) / (TP + TN + FP + FN) = (1 + 90) / (1 + 90 + 1 + 8) = 0.91
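The same calculation can be checked in a few lines of plain Python; the counts are the tumor-classifier numbers from the slide above:

```python
# Accuracy for the 100-tumor example: (TP + TN) / (TP + TN + FP + FN).
tp, fp, fn, tn = 1, 1, 8, 90

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)  # -> 0.91
```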
CROSS VALIDATION

Validation

▪ In this method, we perform training on 50% of the given dataset and the other
50% is used for testing.
▪ The major drawback of this method is that, since we train on only 50% of the
dataset, it is possible that the remaining 50% of the data contains important
information which we miss while training our model, i.e., higher bias.
Leave P Out Cross Validation
▪ In this approach, p data points are left out of the training data. This means
that if there are n data points in the original input dataset, then n-p data
points are used as the training set and the p data points as the validation set.
▪ This is repeated for all combinations, and then the error is averaged.
Pros
▪ It has zero randomness.
▪ The bias will be lower.
Cons
▪ This method is exhaustive and can be computationally infeasible.

LOOCV (Leave One Out Cross Validation)
▪ This method is similar to leave-p-out cross-validation, but instead of p points,
we leave only one data point out of the training set.
▪ For each learning set, only one data point is reserved for testing, and the
remaining dataset is used to train the model. This process repeats for each data
point. Hence for n samples, we get n different training sets and n test sets.
▪ An advantage of this method is that we make use of all data points, hence it
has low bias.
▪ The major drawback of this method is that it leads to higher variation in
testing the model, as we are testing against a single data point. If the data
point is an outlier, it can lead to higher variation.
▪ Another drawback is that it takes a lot of execution time, as it iterates n
times (once per data point).
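As a sketch (again assuming scikit-learn), LeaveOneOut generates the n train/test splits described above:

```python
# Leave-One-Out CV: each of the n data points is the test set exactly once.
from sklearn.model_selection import LeaveOneOut
import numpy as np

X = np.arange(5).reshape(-1, 1)  # 5 toy data points

for train_idx, test_idx in LeaveOneOut().split(X):
    print("train:", train_idx, "test:", test_idx)
# n = 5 points -> 5 iterations, each holding out a single point
```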

K-Fold Cross Validation
▪ In this method, we split the dataset into k subsets (known as folds), then we
perform training on k-1 of the subsets and leave one subset for the evaluation
of the trained model. We iterate k times, with a different subset reserved for
testing each time.
▪ For example, with 25 instances and k = 5: in the first iteration we use the
first 20 percent of the data for evaluation and the remaining 80 percent for
training (fold [1-5] for testing, folds [6-25] for training), while in the
second iteration we use the second subset of 20 percent for evaluation and the
remaining four subsets for training ([6-10] for testing, [1-5] and [11-25] for
training), and so on.
▪ Total instances: 25
▪ Value of k: 5

| Iteration | Training set observations | Testing set observations |
| --- | --- | --- |
| 1 | [5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24] | [0 1 2 3 4] |
| 2 | [0 1 2 3 4 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24] | [5 6 7 8 9] |
| 3 | [0 1 2 3 4 5 6 7 8 9 15 16 17 18 19 20 21 22 23 24] | [10 11 12 13 14] |
| 4 | [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 20 21 22 23 24] | [15 16 17 18 19] |
| 5 | [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19] | [20 21 22 23 24] |
▪ Pros
▪ It helps to overcome the problem of computational cost (compared to exhaustive
methods such as LOOCV).
▪ Models may not be affected much if an outlier is present in the data.
▪ It helps us overcome the problem of variability.
▪ Cons
▪ Imbalanced datasets will impact our model.
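The fold table above can be reproduced with scikit-learn's KFold (a sketch under that assumption; the slides do not prescribe a library):

```python
# Reproducing the 25-instance, k = 5 fold table with scikit-learn's KFold.
from sklearn.model_selection import KFold
import numpy as np

X = np.arange(25).reshape(-1, 1)  # 25 toy instances, indexed 0..24

for i, (train_idx, test_idx) in enumerate(KFold(n_splits=5).split(X), start=1):
    print(f"Iteration {i}: testing set = {test_idx}")
# Iteration 1: testing set = [0 1 2 3 4], iteration 2: [5 6 7 8 9], ...
```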

Stratified K-Fold Cross-Validation

▪ The K-Fold Cross Validation technique will not work as expected for an
imbalanced dataset.
▪ We make a slight change to the K-Fold cross validation technique, such that
each fold contains approximately the same proportion of samples of each output
class as the complete dataset. This variation of using strata in K-Fold Cross
Validation is known as Stratified K-Fold Cross Validation.
▪ It helps in reducing both bias and variance.
▪ Example: let the population of a state be 51.3% male and 48.7% female. Then,
to choose 1000 people from that state, you would pick 513 males (51.3% of 1000)
and 487 females (48.7% of 1000), i.e., 513 males + 487 females (total = 1000
people) to ask their opinion. These groups of people then represent the entire
state. This is called stratified sampling.
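A sketch of stratified splitting on a small imbalanced toy dataset (again assuming scikit-learn); every fold keeps roughly the class proportions of the whole set:

```python
# StratifiedKFold keeps the class ratio of the full dataset in every fold.
from sklearn.model_selection import StratifiedKFold
import numpy as np

X = np.arange(20).reshape(-1, 1)
y = np.array([0] * 16 + [1] * 4)   # imbalanced: 80% class 0, 20% class 1

for train_idx, test_idx in StratifiedKFold(n_splits=4).split(X, y):
    print("test labels:", y[test_idx])  # each fold: four 0s and one 1
```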
TYPES OF LEARNING

Types of Learning

▪ Supervised (inductive) learning
  ▪ Training data includes desired outputs
▪ Unsupervised learning
  ▪ Training data does not include desired outputs
▪ Semi-supervised learning
  ▪ Training data includes a few desired outputs
▪ Reinforcement learning
  ▪ Rewards from a sequence of actions
Supervised learning

Supervised machine learning is based on supervision. In the supervised
learning technique, we train the machines using a "labelled" dataset, and
based on the training, the machine predicts the output.
Here, labelled data means that some of the inputs are already mapped to the
output. More precisely, we first train the machine with the input and
corresponding output, and then we ask the machine to predict the output
using the test dataset.
Let's understand supervised learning with an example. Suppose we
have an input dataset of cat and dog images. First, we provide
training to the machine to understand the images: the shape and
size of the tail of a cat and a dog, the shape of the eyes, colour,
height (dogs are taller, cats are smaller), etc.
After completion of training, we input a picture of a cat and ask the
machine to identify the object and predict the output. The machine
is now well trained, so it will check all the features of the
object, such as height, shape, colour, eyes, ears, tail, etc., and find
that it's a cat. So, it will put it in the cat category. This is how
the machine identifies objects in supervised learning.
Categories of Supervised Machine Learning

▪ Supervised machine learning can be classified into two types of problems,
which are given below:
• Classification
• Regression
Classification

▪ Classification algorithms are used to solve classification problems, in
which the output variable is categorical, such as "Yes" or "No", "Male" or
"Female", "Red" or "Blue", etc. Classification algorithms predict the
categories present in the dataset. Some real-world examples of
classification tasks are spam detection, email filtering, etc.
▪ Some popular classification algorithms are given below (a minimal usage
sketch follows the list):
• Random Forest Algorithm
• Decision Tree Algorithm
• Logistic Regression Algorithm
• Support Vector Machine Algorithm
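A minimal classification sketch using one of the listed algorithms, logistic regression (assuming scikit-learn; the tiny dataset is made up):

```python
# Toy classification: predict a categorical label (1 = spam, 0 = not spam).
from sklearn.linear_model import LogisticRegression

X = [[0, 0], [0, 1], [1, 0], [1, 1]]   # made-up binary features
y = [0, 1, 1, 1]                        # categorical output

clf = LogisticRegression().fit(X, y)
print(clf.predict([[1, 0]]))  # expected: [1]
```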
Regression

▪ Regression algorithms are used to solve regression problems, in which
there is a linear relationship between input and output variables. They
are used to predict continuous output variables, such as market trends,
weather, etc.
▪ Some popular regression algorithms are given below (a minimal usage
sketch follows the list):
• Simple Linear Regression Algorithm
• Multivariate Regression Algorithm
• Lasso Regression
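A matching regression sketch (assuming scikit-learn; the data points are invented to follow an exact linear trend):

```python
# Toy regression: predict a continuous output from a linear trend.
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4]]   # made-up input, e.g. month number
y = [10, 20, 30, 40]       # continuous output, e.g. sales

reg = LinearRegression().fit(X, y)
print(reg.predict([[5]]))  # -> [50.]
```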
Advantages and Disadvantages of Supervised Learning

▪ Advantages:
• Since supervised learning works with labelled datasets, we can have an
exact idea about the classes of objects.
• These algorithms are helpful in predicting the output on the basis of
prior experience.

▪ Disadvantages:
• These algorithms are not able to solve complex tasks.
• The model may predict the wrong output if the test data is different
from the training data.
• It requires a lot of computational time to train the algorithm.
Applications of Supervised Learning

• Image Segmentation: Supervised learning algorithms are used in image
segmentation. In this process, image classification is performed on
different image data with pre-defined labels.
• Medical Diagnosis: Supervised algorithms are also used in the medical
field for diagnosis purposes. This is done using medical images and past
labelled data for disease conditions. With such a process, the machine can
identify a disease for new patients.
• Fraud Detection: Supervised learning classification algorithms are used
for identifying fraudulent transactions, fraudulent customers, etc. This is
done by using historical data to identify patterns that can indicate
possible fraud.
• Spam Detection: In spam detection and filtering, classification
algorithms are used. These algorithms classify an email as spam or not
spam. The spam emails are sent to the spam folder.
• Speech Recognition: Supervised learning algorithms are also used in
speech recognition. The algorithm is trained with voice data, and various
identifications can be done using the same, such as voice-activated
passwords, voice commands, etc.
2. Unsupervised Machine Learning

▪ Unsupervised learning is different from the supervised learning
technique: there is no need for supervision. In unsupervised machine
learning, the machine is trained using an unlabeled dataset, and the
machine predicts the output without any supervision.
▪ In unsupervised learning, the models are trained with data that is
neither classified nor labelled, and the model acts on that data without
any supervision.
▪ The main aim of the unsupervised learning algorithm is to group or
categorize the unsorted dataset according to similarities, patterns, and
differences. Machines are instructed to find the hidden patterns in the
input dataset.
▪ Let's take an example to understand it more precisely. Suppose
there is a basket of fruit images, and we input it into the machine
learning model. The images are totally unknown to the model, and
the task of the machine is to find the patterns and categories of
the objects.
▪ The machine will discover its own patterns and differences, such
as colour differences and shape differences, and predict the output
when it is tested with the test dataset.
Categories of Unsupervised Machine Learning

▪ Unsupervised learning can be further classified into two types, which
are given below:
• Clustering
• Association
1) Clustering

▪ The clustering technique is used when we want to find the inherent
groups in the data. It is a way to group objects into clusters such that
the objects with the most similarities remain in one group and have few or
no similarities with the objects of other groups. An example of a
clustering task is grouping customers by their purchasing behaviour.
▪ Some of the popular clustering algorithms are given below (a minimal
sketch follows the list):
• K-Means Clustering algorithm
• DBSCAN Algorithm
• Principal Component Analysis
• Independent Component Analysis
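A toy k-means sketch (assuming scikit-learn; the 2-D points are made up to form two obvious groups):

```python
# Toy clustering: group unlabelled 2-D points into k = 2 clusters.
from sklearn.cluster import KMeans
import numpy as np

X = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [8.5, 9.0]])  # two groups

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # cluster ids, e.g. [1 1 0 0] (numbering may vary)
print(km.cluster_centers_)  # the two learned centroids
```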
2) Association

▪ Association rule learning is an unsupervised learning technique that
finds interesting relations among variables within a large dataset. The
main aim of this learning algorithm is to find the dependency of one data
item on another data item and map those variables accordingly so that it
can generate maximum profit. This algorithm is mainly applied in market
basket analysis, web usage mining, continuous production, etc.
▪ Some popular algorithms of association rule learning are the Apriori
algorithm, Eclat, and the FP-growth algorithm.
Advantages and Disadvantages of Unsupervised Learning

▪ Advantages:
• These algorithms can be used for more complicated tasks than supervised
ones, because they work on unlabeled datasets.
• Unsupervised algorithms are preferable for various tasks, as getting an
unlabeled dataset is easier than getting a labelled dataset.
▪ Disadvantages:
• The output of an unsupervised algorithm can be less accurate, as the
dataset is not labelled and the algorithms are not trained with the exact
output in advance.
• Working with unsupervised learning is more difficult, as it works with
unlabelled datasets that do not map to an output.
Applications of Unsupervised Learning
• Network Analysis: Unsupervised learning is used in document network
analysis of text data, e.g., for identifying plagiarism and copyright
issues in scholarly articles.
• Recommendation Systems: Recommendation systems widely use unsupervised
learning techniques for building recommendation applications for different
web applications and e-commerce websites.
• Anomaly Detection: Anomaly detection is a popular application of
unsupervised learning, which can identify unusual data points within a
dataset. It is used to discover fraudulent transactions.
3. Semi-Supervised Learning

▪ Semi-supervised learning is a type of machine learning that lies
between supervised and unsupervised machine learning.
▪ It uses a combination of labelled and unlabeled datasets during the
training period.
▪ Semi-supervised learning is particularly useful when there is a large
amount of unlabeled data available, but it's too expensive or difficult
to label all of it.
▪ The concept of semi-supervised learning was introduced to overcome the
drawbacks of supervised and unsupervised learning algorithms.
▪ The main aim of semi-supervised learning is to effectively use all the
available data, rather than only the labelled data as in supervised
learning. Initially, similar data is clustered with an unsupervised
learning algorithm, which then helps to label the unlabeled data. This is
because labelled data is comparatively more expensive to acquire than
unlabeled data.
▪ We may imagine the three types of learning algorithms as follows:
▪ Supervised learning: a student is under the supervision of a teacher at
both home and school.
▪ Unsupervised learning: a student has to figure out a concept himself.
▪ Semi-supervised learning: a teacher teaches a few concepts in class and
gives questions as homework which are based on similar concepts.

▪ Advantages and disadvantages of Semi-supervised Learning
▪ Advantages:
• The algorithm is simple and easy to understand.
• It is highly efficient.
• It is used to solve the drawbacks of supervised and unsupervised
learning algorithms.
▪ Disadvantages:
• Iteration results may not be stable.
• We cannot apply these algorithms to network-level data.
• Accuracy is low.
4. Reinforcement Learning

▪ Reinforcement learning works on a feedback-based process, in which an
AI agent (a software component) automatically explores its surroundings
by trial and error: taking actions, learning from experience, and
improving its performance. The agent gets rewarded for each good action
and punished for each bad action; hence the goal of a reinforcement
learning agent is to maximize the rewards.
▪ In reinforcement learning, there is no labelled data as in supervised
learning; agents learn from their experience only.
▪ The reinforcement learning process is similar to that of a human being;
for example, a child learns various things by experience in his day-to-day
life. An example of reinforcement learning is playing a game, where the
game is the environment, the moves of the agent at each step define
states, and the goal of the agent is to get a high score. The agent
receives feedback in terms of punishments and rewards.
▪ Due to its way of working, reinforcement learning is employed in
different fields such as game theory, operations research, information
theory, and multi-agent systems.

Example: We have an agent and a reward, with many hurdles in between. The
agent is supposed to find the best possible path to reach the reward.
Categories of Reinforcement Learning

▪ Reinforcement learning is categorized mainly into two types of
methods/algorithms:

• Positive Reinforcement Learning: Positive reinforcement learning
specifies increasing the tendency that the required behaviour will occur
again by adding something. It enhances the strength of the agent's
behaviour and positively impacts it.

• Negative Reinforcement Learning: Negative reinforcement learning works
exactly opposite to positive RL. It increases the tendency that the
specific behaviour will occur again by avoiding the negative condition.
Real-world Use cases of Reinforcement Learning

• Video Games: RL algorithms are very popular in gaming applications,
where they are used to attain super-human performance. Some famous
RL-based systems are AlphaGo and AlphaGo Zero.
• Resource Management: The paper "Resource Management with Deep
Reinforcement Learning" showed how to use RL to automatically learn to
schedule computer resources across waiting jobs in order to minimize
average job slowdown.
• Robotics: RL is widely used in robotics applications. Robots are used in
industrial and manufacturing areas, and these robots are made more
powerful with reinforcement learning. Different industries have a vision
of building intelligent robots using AI and machine learning technology.
• Text Mining: Text mining, one of the great applications of NLP, is now
being implemented with the help of reinforcement learning.
Advantages and Disadvantages of Reinforcement Learning

▪ Advantages
• It helps in solving complex real-world problems which are difficult to
solve with general techniques.
• The learning model of RL is similar to human learning; hence very
accurate results can be obtained.
• It helps in achieving long-term results.

▪ Disadvantages
• RL algorithms are not preferred for simple problems.
• RL algorithms require huge amounts of data and computation.
• Too much reinforcement learning can lead to an overload of states, which
can weaken the results.
| Criteria | Supervised ML | Unsupervised ML | Reinforcement ML |
| --- | --- | --- | --- |
| Definition | Learns by using labelled data | Trained using unlabelled data without any guidance | Works by interacting with the environment |
| Type of data | Labelled data | Unlabelled data | No predefined data |
| Type of problems | Regression and classification | Association and clustering | Exploitation or exploration |
| Supervision | Extra supervision | No supervision | No supervision |
| Algorithms | Linear Regression, Logistic Regression, SVM, KNN, etc. | K-Means, C-Means, Apriori | Q-Learning, SARSA |
| Aim | Calculate outcomes | Discover underlying patterns | Learn a series of actions |
| Application | Risk evaluation, sales forecasting | Recommendation systems, anomaly detection | Self-driving cars, gaming, healthcare |
The machine learning framework

▪ Apply a prediction function to a feature representation of the image to
get the desired output:

f(image of an apple) = "apple"
f(image of a tomato) = "tomato"
f(image of a cow) = "cow"

Slide credit: L. Lazebnik
The machine learning framework

y = f(x), where y is the output, f is the prediction function, and x is
the image feature representation.

▪ Training: given a training set of labeled examples {(x1,y1), …, (xN,yN)},
estimate the prediction function f by minimizing the prediction error on
the training set.
▪ Testing: apply f to a never-before-seen test example x and output the
predicted value y = f(x).

Slide credit: L. Lazebnik


Steps

Training: training images → image features → training (together with the
training labels) → learned model.

Testing: test image → image features → learned model → prediction.

Slide credit: D. Hoiem and L. Lazebnik
MODELS OF MACHINE LEARNING

▪ Geometric models use intuitions from geometry such as separating
(hyper-)planes, linear transformations, and distance metrics.
▪ Probabilistic models view learning as a process of reducing uncertainty,
modelled by means of probability distributions.
▪ Logical models are defined in terms of easily interpretable logical
expressions.
▪ Grouping models divide the instance space into segments; in each segment
a very simple (e.g., constant) model is learned.
▪ Grading models learn a single, global model over the instance space.
Geometric Model

▪ A geometric model is a mathematical representation of an object or
system that uses geometry to describe its properties and relationships.
▪ In machine learning, geometric models can be used to represent data in a
way that allows us to analyze its properties and relationships.
▪ A geometric model is constructed directly in instance space, using
geometric concepts such as lines, planes, and distances.
▪ Geometric models are easy to visualise as long as we keep to two or
three dimensions.
▪ They can be used in various areas of machine learning, such as data
analysis, classification, clustering, and regression.
Geometric Model
▪ Example: the nearest neighbour algorithm, used for classification and
regression.
▪ It works by finding the closest data point to a given query point in a
geometric space.
▪ The distance between two data points can be measured using different
metrics, such as Euclidean distance or cosine similarity. Once the closest
data point is found, the algorithm can use its properties to classify or
predict the properties of the query point.
▪ The basic linear classifier constructs a decision boundary by half-way
intersecting the line between the positive and negative centres of mass,
p and n.
▪ It is described by the equation w · x = t, with w = p − n.
▪ The decision threshold can be found by noting that (p + n)/2 is on the
decision boundary, and hence t = (p − n) · (p + n)/2 = (‖p‖² − ‖n‖²)/2,
where ‖x‖ denotes the length of vector x.
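A numeric sketch of this basic linear classifier on made-up 2-D data: compute the class means p and n, set w = p − n and t = (‖p‖² − ‖n‖²)/2, then classify x as positive when w · x > t:

```python
# Basic linear classifier: w = p - n, t = (||p||^2 - ||n||^2) / 2.
import numpy as np

pos = np.array([[2.0, 2.0], [3.0, 3.0]])   # made-up positive examples
neg = np.array([[0.0, 0.0], [1.0, 0.0]])   # made-up negative examples

p, n = pos.mean(axis=0), neg.mean(axis=0)  # positive/negative centres of mass
w = p - n                                  # normal vector of the boundary
t = (p.dot(p) - n.dot(n)) / 2              # threshold; (p + n)/2 lies on boundary

x = np.array([2.5, 2.0])
print("positive" if w.dot(x) > t else "negative")  # -> positive
```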
Geometric Model
▪ Example: the support vector machine (SVM), used in classification tasks.
▪ An SVM works by finding a hyperplane in a high-dimensional space that
separates the data points into different classes.
▪ The hyperplane is chosen in such a way that it maximizes the margin
between the two closest data points from different classes.
▪ For linearly separable data, the learned decision boundary maximises the
margin (indicated by dotted lines in the original figure), and the circled
data points on the margin are the support vectors.
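A toy SVM sketch (assuming scikit-learn) on linearly separable points; the support_vectors_ attribute exposes the margin-defining points described above:

```python
# Toy SVM: find the maximum-margin hyperplane on linearly separable data.
from sklearn.svm import SVC
import numpy as np

X = np.array([[0, 0], [1, 1], [4, 4], [5, 5]])
y = np.array([0, 0, 1, 1])

svm = SVC(kernel="linear").fit(X, y)
print(svm.support_vectors_)    # the points that define the margin
print(svm.predict([[2, 2]]))   # -> [0]
```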
Geometric Model

▪ Geometric models can also be used in clustering tasks, where the goal is to
group similar data points together.
▪ One example of a geometric model for clustering is the k-means algorithm,
which works by partitioning the data into k clusters based on their distance to k
initial centroids. The centroids are then updated iteratively to minimize the
distance between the data points and their respective centroids.

Geometric Model

Challenges:
1. Curse of Dimensionality: As the dimensionality of data increases, it becomes
increasingly difficult to model and analyze the data. In order to overcome this
challenge, various techniques have been developed such as feature selection,
dimensionality reduction, and regularization.
2. Choosing the Right Model: Geometric models come in many different forms, each
with their own strengths and weaknesses. Choosing the right model for a given
problem can be a challenging task, and often requires careful experimentation and
analysis.
3. Interpreting Results: Geometric models can produce complex and high-
dimensional results, making it difficult to interpret and understand the output.
Techniques such as visualization and feature importance analysis can help to
overcome this challenge and make the results more interpretable.
Probabilistic models

▪ Probabilistic models take into consideration the uncertainty inherent in
real-world data.
▪ These models make predictions based on probability distributions, rather
than absolute values, allowing for a more accurate understanding of
complex systems.
▪ Not all data fits well into a probabilistic framework, which can limit
the usefulness of these models in certain applications.
▪ Another challenge is that probabilistic models can be computationally
intensive and require significant resources to develop and implement.
Categories Of Probabilistic Models

1. Generative models:
▪ Generative models aim to model the joint distribution of the input and
output variables.
▪ These models can generate new data based on the probability distribution
of the original dataset.
▪ The joint distribution looks for a relationship between two variables.
Once this relationship is inferred, it is possible to infer new data
points.
▪ Generative models are powerful because they can generate new data that
resembles the training data.
▪ They can be used for tasks such as image and speech synthesis, language
translation, and text generation.
2. Discriminative models:
▪ The discriminative model aims to model the conditional distribution of the output
variable given the input variable.

▪ They learn a decision boundary that separates the different classes of the output
variable.

▪ Discriminative models are useful when the focus is on making accurate predictions
rather than generating new data.

▪ They can be used for tasks such as image recognition, speech recognition, and
sentiment analysis.
3. Graphical models:

▪ These models use graphical representations to show the conditional


dependence between variables.

▪ They are commonly used for tasks such as image recognition, natural
language processing, and causal inference.
Advantages Of Probabilistic Models

▪ The main advantage of these models is their ability to take into account
uncertainty and variability in data. This allows for more accurate predictions and
decision-making, particularly in complex and unpredictable situations.

▪ Probabilistic models can also provide insights into how different factors influence
outcomes and can help identify patterns and relationships within data.
Posterior Probability

Let X denote the variables we know about, e.g., an instance's feature values.
Let Y denote the target variables we are interested in, e.g., the instance's class.
We are interested in the conditional probability P(Y|X).
Suppose Y indicates whether an email is spam, and X indicates whether the email
contains the words 'Viagra' and 'lottery'.
The probability of interest is then P(Y | Viagra, lottery).
For a particular email we may write P(Y | Viagra = 1, lottery = 0). This is called
the posterior probability because it is used after the features X are observed.
A Simple Probabilistic Model
Decision rule

Assuming that X and Y are the only variables we know and care about, the
posterior distribution P(Y|X) helps us to answer many questions of interest.
▪ For instance, to classify a new e-mail we determine whether the words
'Viagra' and 'lottery' occur in it, look up the corresponding probability
P(Y = spam | Viagra, lottery), and predict spam if this probability exceeds
0.5 and ham otherwise.
▪ Such a recipe, predicting a value of Y on the basis of the values of X
and the posterior distribution P(Y|X), is called a decision rule.
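A sketch of this decision rule in code; the posterior table below is made up purely for illustration:

```python
# MAP decision rule: predict spam iff P(Y = spam | X) > 0.5.
# The posterior values below are made-up numbers for illustration only.
posterior_spam = {
    (0, 0): 0.20,   # P(spam | Viagra = 0, lottery = 0)
    (0, 1): 0.60,
    (1, 0): 0.80,
    (1, 1): 0.40,
}

def classify(viagra, lottery):
    return "spam" if posterior_spam[(viagra, lottery)] > 0.5 else "ham"

print(classify(viagra=1, lottery=0))  # -> spam
```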
Likelihood ratio

As a matter of fact, statisticians very often work with a different conditional
probability, given by the likelihood function P(X|Y).
▪ I like to think of these as thought experiments: if somebody were to send me a
spam e-mail, how likely would it be that it contains exactly the words of the
e-mail I'm looking at? And how likely if it were a ham e-mail instead?
▪ What really matters is not the magnitude of these likelihoods, but their ratio:
how much more likely is it to observe this combination of words in a spam e-mail
than in a non-spam e-mail?
▪ For instance, suppose that for a particular e-mail described by X we have
P(X|Y = spam) = 3.5 × 10⁻⁵ and P(X|Y = ham) = 7.4 × 10⁻⁶; then observing X in a
spam e-mail is nearly five times more likely than in a ham e-mail.
▪ This suggests the following decision rule: predict spam if the likelihood ratio
is larger than 1 and ham otherwise.
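The numbers from the example make the rule concrete:

```python
# Likelihood-ratio decision rule with the numbers from the example above.
p_x_given_spam = 3.5e-5   # P(X | Y = spam)
p_x_given_ham = 7.4e-6    # P(X | Y = ham)

ratio = p_x_given_spam / p_x_given_ham
print(round(ratio, 2))                 # -> 4.73, i.e. "nearly five times"
print("spam" if ratio > 1 else "ham")  # -> spam
```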
Naive Bayes Algorithm in Probabilistic Models

▪ Naive Bayes is a probabilistic algorithm that is used for classification
problems.
▪ It is based on the Bayes theorem of probability and assumes that the
features are conditionally independent of each other given the class.
▪ The Naive Bayes algorithm is used to calculate the probability of a given
sample belonging to a particular class.
▪ This is done by calculating the posterior probability of each class given
the sample and then selecting the class with the highest posterior
probability as the predicted class.
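A toy Naive Bayes sketch (assuming scikit-learn's BernoulliNB, which suits binary word-occurrence features; the tiny dataset is invented):

```python
# Toy Naive Bayes spam classifier on binary word-occurrence features.
from sklearn.naive_bayes import BernoulliNB

# Features: [Viagra present, lottery present]; labels: 1 = spam, 0 = ham.
X = [[1, 0], [1, 1], [0, 1], [0, 0], [0, 0]]
y = [1, 1, 1, 0, 0]

nb = BernoulliNB().fit(X, y)
print(nb.predict([[1, 0]]))        # most probable class, here [1] (spam)
print(nb.predict_proba([[1, 0]]))  # posteriors [P(ham), P(spam)]
```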
Logical models

● Logical models use a logical expression to divide the instance space into
segments and hence construct grouping models.
● Once the data is grouped using a logical expression, the data is divided
into homogeneous groupings for the problem we are trying to solve. For
example, for a classification problem, all the instances in a group belong
to one class.
● There are mainly two kinds of logical models: tree models and rule models.
● Rule-based models:
○ Rule models consist of a collection of implications or IF-THEN rules, e.g.:
if Viagra = 1 then Class = Y = spam
if Viagra = 0 ∧ lottery = 1 then Class = Y = spam
if Viagra = 0 ∧ lottery = 0 then Class = Y = ham

Such rules are easily arranged in a tree structure, which we refer to as a feature tree.
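The three rules translate directly into code; a sketch:

```python
# The spam rule model above, written as an ordered IF-THEN chain.
def rule_model(viagra, lottery):
    if viagra == 1:
        return "spam"
    if viagra == 0 and lottery == 1:
        return "spam"
    return "ham"   # remaining case: Viagra = 0 and lottery = 0

print(rule_model(viagra=0, lottery=1))  # -> spam
```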
Tree model

A feature tree:

(left) A feature tree combining two Boolean features. Each internal node or split is labelled with a
feature, and each edge emanating from a split is labelled with a feature value. Each leaf therefore
corresponds to a unique combination of feature values. Also indicated in each leaf is the class
distribution derived from the training set. (right) A feature tree partitions the instance space into
rectangular regions, one for each leaf. We can clearly see that the majority of ham lives in the lower
left-hand corner.
Labelling a feature tree

▪ The leaves of the tree in Figure could be labelled, from left to right, as ham – spam –
spam, employing a simple decision rule called majority class.

▪ Alternatively, we could label them with the proportion of spam e-mail occurring in each
leaf: from left to right, 1/3, 2/3, and 4/5.

▪ Or, if our task was a regression task, we could label the leaves with predicted real values
or even linear functions of some other, real-valued features.

▪ One of the most well-known logical models in machine learning is the decision tree.
Decision trees are a popular classification algorithm that uses a tree-like model of
decisions and their possible consequences to classify data points. Each internal node in the
decision tree represents a decision based on a feature value, and each leaf node represents
a class label.
Grouping and Grading models

The key difference between grouping and grading models is the way they handle the
instance space.
Grouping Model:
▪ A grouping model breaks the instance space into groups or segments, the number of
which is determined at training time.
▪ Grouping models have a fixed and finite resolution and cannot distinguish between
individual instances beyond this resolution.
▪ What grouping models do at this finest resolution is often something very simple,
such as assigning the majority class to all instances that fall into the segment.
▪ The main emphasis of training a grouping model is then on determining the right
segments.
▪ Example: tree-based models. They work by repeatedly splitting the instance space
into smaller subsets. The subsets at the leaves of the tree partition the instance
space with some finite resolution. Instances filtered into the same leaf of the tree
are treated the same, regardless of any features not in the tree that might be able
to distinguish them.
Grouping and Grading models

Grading Model:
▪ Grading models do not employ the notion of a segment.
▪ Rather than applying very simple, local models, they form one global model over
the instance space.
▪ Grading models are able to distinguish between arbitrary instances, no matter how
similar they are.
▪ Their resolution is, in theory, infinite, particularly when working in a Cartesian
instance space.
▪ Example: support vector machines and other geometric classifiers. Because they
work in a Cartesian instance space, they are able to represent and exploit the
minutest differences between instances.
Parametric and Non-Parametric Models

Parametric Model:
A learning model that summarizes data with a set of parameters of fixed size
(independent of the number of training examples) is called a parametric model. No
matter how much data you throw at a parametric model, it won't change its mind about
how many parameters it needs.
▪ Assumptions can greatly simplify the learning process, but can also limit what can
be learned.
▪ Algorithms that simplify the function to a known form are called parametric
machine learning algorithms.
▪ These algorithms involve two steps:
▪ Select a form for the function.
▪ Learn the coefficients for the function from the training data.
▪ An easy-to-understand functional form for the mapping function is a line, as is
used in linear models:
b0 + b1*x1 + b2*x2 = 0
where b0, b1, and b2 are the coefficients of the line that control the intercept
and slope, and x1 and x2 are two input variables.
▪ All we need to do is estimate the coefficients of the line equation, and we have a
predictive model for the problem.
▪ However, the actual unknown underlying function may not be linear; in that case
the assumption is wrong and the approach will produce poor results.
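A sketch of the two steps (form fixed in advance, coefficients learned from data), assuming scikit-learn; the points are made up:

```python
# Parametric learning: the form (a line) is fixed; only b0, b1, b2 are learned.
from sklearn.linear_model import LogisticRegression
import numpy as np

X = np.array([[1, 1], [2, 1], [4, 5], [5, 4]])  # made-up (x1, x2) points
y = np.array([0, 0, 1, 1])

model = LogisticRegression().fit(X, y)
print(model.intercept_, model.coef_)  # b0 and (b1, b2) of the decision line
```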
▪ Some more examples of parametric machine learning algorithms include:
Logistic Regression, Linear Discriminant Analysis, Naive Bayes.
Benefits of Parametric Machine Learning Algorithms:
▪ Simpler: These methods are easier to understand, and their results are easier to
interpret.
▪ Speed: Parametric models are very fast to learn from data.
▪ Less Data: They do not require as much training data and can work well even if the
fit to the data is not perfect.
Limitations of Parametric Machine Learning Algorithms:
▪ Constrained: By choosing a functional form, these methods are highly constrained
to the specified form.
▪ Limited Complexity: The methods are better suited to simpler problems.
▪ Poor Fit: In practice, the methods are unlikely to match the underlying mapping
function exactly.
Non-Parametric Model:
Nonparametric methods are good when you have a lot of data and no prior
knowledge, and when you don't want to worry too much about choosing just the
right features.
▪ Algorithms that do not make strong assumptions about the form of the mapping
function are called nonparametric machine learning algorithms.
▪ They are free to learn any functional form from the training data.
▪ They are also referred to as distribution-free methods: no distribution (normal
distribution, etc.) of any kind is assumed.
▪ Example: the k-nearest neighbors algorithm makes predictions based on the k
most similar training patterns for a new data instance. The method does not
assume anything about the form of the mapping function other than that patterns
that are close are likely to have a similar output variable.
Some more examples of popular nonparametric machine learning algorithms are:
k-Nearest Neighbors, Decision Trees like CART and C4.5, Support Vector Machines.
Benefits of Nonparametric Machine Learning Algorithms:
▪ Flexibility: Capable of fitting a large number of functional forms.
▪ Power: No assumptions (or weak assumptions) about the underlying function.
▪ Performance: Can result in higher performance models for prediction.
Limitations of Nonparametric Machine Learning Algorithms:
▪ More data: Require a lot more training data to estimate the mapping function.
▪ Slower: A lot slower to train as they often have far more parameters to train.
▪ Overfitting: More of a risk to overfit the training data and it is harder to explain why specific
predictions are made.
| Parametric Methods | Non-Parametric Methods |
| --- | --- |
| Use a fixed number of parameters to build the model. | Use a flexible number of parameters to build the model. |
| Always make strong assumptions about the data. | Generally make fewer assumptions about the data. |
| Require less data than non-parametric methods. | Require much more data than parametric methods. |
| Assume the data follows a normal distribution. | Assume no particular distribution. |
| Handle interval or ratio data. | Handle original data. |
| The results or outputs generated can be easily affected by outliers. | The results or outputs generated are not seriously affected by outliers. |
| Can perform well in many situations, but performance peaks when the spread of each group is different. | Can perform well in many situations, but performance peaks when the spread of each group is the same. |
| Computationally faster than non-parametric methods. | Computationally slower than parametric methods. |
| Examples: Logistic Regression, Naïve Bayes model, etc. | Examples: KNN, Decision Tree model, etc. |
THANK YOU !!
Name of Instructor: Poonam G. Fegade

Email of Instructor: [email protected]

K.K.Wagh Institute of Engineering Education & Research, Nashik
