Aiml Iii
Bayesian and Computational Learning: Bayes theorem, concept learning, maximum likelihood,
minimum description length principle, Gibbs Algorithm, Naïve Bayes Classifier Algorithm, Instance-Based
Learning: K-Nearest Neighbor learning
Introduction to Machine Learning (ML): Definition, Evolution, Need, applications of ML in
industry and real world, classification; differences between supervised and unsupervised learning
paradigms.
1. Experiment
1. Tossing Coin,
2. Drawing Card And
3. Rolling Dice, Etc.
2. Sample Space
The results we can get from an experiment are called possible outcomes, and the set of all possible outcomes of an
experiment is known as the sample space.
For example, if our experiment is rolling a die, the sample space will be:
S1 = {1, 2, 3, 4, 5, 6}
Similarly, if our experiment is tossing a coin and recording its outcome, the sample space will be:
S2 = {Head, Tail}
3. Event
Assume that in our experiment of rolling a die there are two events A and B such that:
A = Event when an even number is obtained = {2, 4, 6}
B = Event when a number greater than 4 is obtained = {5, 6}
Probability of the event A, P(A) = Number of favourable outcomes / Total number of possible outcomes
P(A) = 3/6 = 1/2 = 0.5
Similarly, probability of the event B, P(B) = Number of favourable outcomes / Total number of possible outcomes
P(B) = 2/6 = 1/3 = 0.333
Disjoint Event: If the intersection of the events A and B is the empty set, the events are known as disjoint
or mutually exclusive events. (The events A and B above are not disjoint, because 6 belongs to both.)
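The probabilities above can be checked by counting outcomes. The following is a minimal sketch, assuming a fair six-sided die so that every outcome is equally likely; the helper name `probability` is illustrative and not part of the notes.

```python
from fractions import Fraction

# Sample space for one roll of a six-sided die (assumed fair).
S = {1, 2, 3, 4, 5, 6}
A = {x for x in S if x % 2 == 0}   # even number: {2, 4, 6}
B = {x for x in S if x > 4}        # greater than 4: {5, 6}

def probability(event, sample_space):
    """Favourable outcomes divided by total outcomes (equally likely outcomes)."""
    return Fraction(len(event & sample_space), len(sample_space))

p_a = probability(A, S)            # 1/2
p_b = probability(B, S)            # 1/3
are_disjoint = A.isdisjoint(B)     # False, because 6 belongs to both events
```

Using `Fraction` keeps the results exact (1/2 and 1/3) instead of rounded decimals such as 0.333.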
4. Random Variable:
A random variable takes on random values, each with some probability.
A random variable can be discrete, continuous, or a combination of both.
5. Exhaustive Event:
6. Independent Event:
Two events are said to be independent when the occurrence of one event does not affect the occurrence of the other.
In simple words, the probability of the outcome of one event does not depend on the other.
Mathematically, two events A and B are said to be independent if:
P(A ∩ B) = P(A) * P(B)
7. Conditional Probability:
Conditional probability is defined as the probability of an event A, given that another event B has already occurred
(i.e. A conditional on B).
This is represented by P(A|B), and for P(B) > 0 we can define it as:
P(A|B) = P(A ∩ B) / P(B)
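As a worked check, conditional probability can be computed as P(A|B) = P(A ∩ B) / P(B), and independence tested as P(A ∩ B) = P(A) * P(B). The sketch below reuses the die events from the Event section and again assumes a fair die; the function names `prob` and `conditional` are illustrative.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}      # one roll of a fair six-sided die (assumed)
A = {2, 4, 6}               # even number
B = {5, 6}                  # number greater than 4

def prob(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event), len(S))

def conditional(a, b):
    """P(a | b) = P(a and b) / P(b); requires P(b) > 0."""
    if not b:
        raise ValueError("conditioning event must have non-zero probability")
    return prob(a & b) / prob(b)

p_a_given_b = conditional(A, B)                    # (1/6) / (2/6) = 1/2
independent = prob(A & B) == prob(A) * prob(B)     # 1/6 == 1/2 * 1/3
```

Here P(A|B) equals P(A), so knowing that the number is greater than 4 does not change the chance that it is even: these two events happen to be independent.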
Bayes Theorem
Bayes theorem is named after Thomas Bayes, an English statistician and philosopher of the 18th century.
Bayes' ideas are used extensively in decision theory and in probability.
Bayes theorem is also widely used in Machine Learning, where we need to predict outputs accurately.
An important approach built on Bayes theorem is the Bayesian method.
It is used to calculate conditional probabilities in Machine Learning applications that include classification tasks.
Further, a simplified application of Bayes theorem (Naïve Bayes classification) is used to reduce computation time
and the average cost of projects.
Bayes theorem is also known as Bayes rule or Bayes law.
It is used to calculate the probability of one event occurring given that another event has already occurred.
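Bayes' theorem can be derived using the product rule and the conditional probability of event X given event Y:
o According to the product rule, the probability of X and Y occurring together can be expressed using X given Y:
P(X ∩ Y) = P(X|Y) * P(Y)    (equation 1)
o Likewise, it can be expressed using Y given X:
P(X ∩ Y) = P(Y|X) * P(X)    (equation 2)
Mathematically, Bayes theorem can be expressed by equating the right-hand sides of both equations. We will get:
P(X|Y) = P(Y|X) * P(X) / P(Y)
Here P(X|Y) is the posterior, P(Y|X) is the likelihood, P(X) is the prior and P(Y) is the evidence. The result holds
for any two events with P(Y) > 0; X and Y do not need to be independent.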
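Bayes' theorem, P(X|Y) = P(Y|X) * P(X) / P(Y), can be checked numerically. The sketch below is only an illustration: it assumes a fair six-sided die with X = "even number" and Y = "number greater than 4", and the function name `bayes` is hypothetical.

```python
from fractions import Fraction

def bayes(p_y_given_x, p_x, p_y):
    """Posterior P(X|Y) = P(Y|X) * P(X) / P(Y); requires P(Y) > 0."""
    if p_y == 0:
        raise ValueError("P(Y) must be non-zero")
    return p_y_given_x * p_x / p_y

# Die example: X = "even number" {2, 4, 6}, Y = "greater than 4" {5, 6}.
p_x = Fraction(3, 6)
p_y = Fraction(2, 6)
p_y_given_x = Fraction(1, 3)    # only 6 out of {2, 4, 6} is greater than 4

posterior = bayes(p_y_given_x, p_x, p_y)

# Direct count for comparison: only 6 out of {5, 6} is even.
direct = Fraction(1, 2)
```

The value obtained through the theorem matches the value obtained by counting directly inside Y.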
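Minimum Description Length Principle
Total description length = Length of the Model + Length of the Data given the Model
Error versus complexity: a simple hypothesis is short to describe but may make more errors on the data, while a complex
hypothesis makes fewer errors but is long to describe. We select the hypothesis that gives the smallest total length.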
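The trade-off can be sketched as picking the hypothesis with the smallest sum of model length and data-given-model length. The bit counts below are hypothetical numbers chosen only for illustration, and `description_length` is an illustrative helper.

```python
# Hypothetical candidates:
# (name, bits to encode the model, bits to encode the data given the model).
candidates = [
    ("simple model", 10, 120),    # short model, many errors to encode
    ("medium model", 40, 30),
    ("complex model", 200, 5),    # long model, few errors to encode
]

def description_length(model_bits, data_bits_given_model):
    """Total length = length of the model + length of the data given the model."""
    return model_bits + data_bits_given_model

best = min(candidates, key=lambda c: description_length(c[1], c[2]))
best_name = best[0]
```

With these assumed numbers the totals are 130, 70 and 205 bits, so the medium model is selected: it is neither the shortest model nor the one with the fewest errors.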
Gibbs Algorithm
The Bayes optimal classifier gives the best performance because it combines the posterior probabilities of every hypothesis in H.
But it becomes costly because it has to evaluate all the hypotheses in H.
The Gibbs algorithm is a cheaper alternative:
Choose a hypothesis “h” from “H” at random, according to the posterior probability distribution over H.
Use the chosen hypothesis “h” to predict the next outcome.
{Here the hypotheses h1, h2, h3 are members of H}
Only one of h1, h2 or h3 is used for a prediction, and hypotheses with a higher posterior probability (those that fit
the data with fewer errors) are more likely to be chosen.
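The two steps of the Gibbs algorithm can be sketched as follows. This is a minimal illustration, assuming the posterior probabilities are already known; the three threshold hypotheses, their posterior values and the name `gibbs_predict` are all hypothetical.

```python
import random

def gibbs_predict(hypotheses, posteriors, x, rng=random):
    """Draw one hypothesis with probability equal to its posterior, then predict with it."""
    h = rng.choices(hypotheses, weights=posteriors, k=1)[0]
    return h(x)

# Hypothetical hypothesis space: three threshold classifiers on a number.
h1 = lambda x: x > 1
h2 = lambda x: x > 3
h3 = lambda x: x > 5
H = [h1, h2, h3]
posterior = [0.5, 0.3, 0.2]     # assumed values of P(h | D)

rng = random.Random(0)          # seeded so that repeated runs give the same draw
prediction = gibbs_predict(H, posterior, 4, rng)
```

Unlike the Bayes optimal classifier, only one hypothesis is evaluated per prediction, which is why the method is cheaper.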
Maximum likelihood
What Are Likelihood and Probability?
Probability is a branch of mathematics that deals with how likely the outcomes of a random experiment are. The
term "probability" refers to the chance of something happening.
Likelihood measures how well a distribution with particular parameter values explains data that has already been observed.
The goal of maximum likelihood is to find the parameter values that give the distribution that maximizes the
probability of observing the data.
For example, we may start by assuming some normal distribution for mouse weights, but the weights we actually measure
may be more likely under a different one.
Maximum likelihood shifts the normal distribution until the observed weights are as likely as possible; for a normal
distribution, the best mean is the average of all the measured values.
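For a normal distribution, the maximum likelihood estimates are the sample mean and the average squared deviation from it. The sketch below illustrates this under the assumption that the data are normally distributed; the mouse weights are made-up values, and the function names are illustrative.

```python
import math

def normal_mle(samples):
    """Maximum likelihood estimates of mean and variance for a normal distribution."""
    n = len(samples)
    mu = sum(samples) / n
    var = sum((x - mu) ** 2 for x in samples) / n   # divides by n, not n - 1
    return mu, var

def log_likelihood(samples, mu, var):
    """Log of the probability density of the samples under N(mu, var)."""
    return sum(-0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
               for x in samples)

weights = [28.0, 30.0, 32.0, 34.0, 36.0]   # hypothetical mouse weights in grams
mu_hat, var_hat = normal_mle(weights)

# Any other mean gives a lower log-likelihood for the same data.
ll_best = log_likelihood(weights, mu_hat, var_hat)
ll_shifted = log_likelihood(weights, 30.0, var_hat)
```

Comparing `ll_best` with `ll_shifted` shows the idea of "shifting" the distribution: moving the mean away from the sample average makes the observed weights less likely.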
Naïve Bayes Classifier Algorithm
Naïve Bayes algorithm is a supervised learning algorithm, which is based on Bayes theorem and used for
solving classification problems.
It is mainly used in text classification that includes a high-dimensional training dataset.
Naïve Bayes Classifier is one of the simplest and most effective classification algorithms, and it helps in building
fast machine learning models that can make quick predictions.
It is a probabilistic classifier, which means it predicts on the basis of the probability of an object belonging to each class.
Some popular applications of the Naïve Bayes algorithm are spam filtering, sentiment analysis, and classifying
articles.
The name Naïve Bayes is made up of two words, Naïve and Bayes, which can be described as:
o Naïve: It is called Naïve because it assumes that the occurrence of a certain feature is independent of the
occurrence of other features.
For example, if a fruit is identified on the basis of color, shape, and taste, then a red, spherical, and sweet fruit is
recognized as an apple.
Hence each feature individually contributes to identifying it as an apple, without depending on the others.
o Bayes: It is called Bayes because it depends on the principle of Bayes' theorem.
Bayes' Theorem:
o Bayes' theorem is also known as Bayes' rule or Bayes' law, and is used to determine the probability of a hypothesis with
prior knowledge. It depends on the conditional probability.
o The formula for Bayes' theorem is given as:
P(A|B) = P(B|A) * P(A) / P(B)
Where P(A|B) is the posterior probability, P(B|A) is the likelihood, P(A) is the prior probability, and P(B) is the
marginal probability of the evidence.
The working of the Naïve Bayes classifier can be understood with the help of the below example:
Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this dataset, we
need to decide whether we should play or not on a particular day according to the weather conditions. To solve
this problem, we need to follow the below steps:
1. Convert the given dataset into a frequency table.
2. Generate a likelihood table by finding the probabilities of the given features.
3. Use Bayes' theorem to calculate the posterior probability.
Problem: If the weather is sunny, should the player play or not?
Solution: To solve this, first consider the below dataset:
Outlook Play
0 Rainy Yes
1 Sunny Yes
2 Overcast Yes
3 Overcast Yes
4 Sunny No
5 Rainy Yes
6 Sunny Yes
7 Overcast Yes
8 Rainy No
9 Sunny No
10 Sunny Yes
11 Rainy No
12 Overcast Yes
13 Overcast Yes
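Frequency table for the weather conditions:
Weather     Yes   No
Overcast    5     0
Rainy       2     2
Sunny       3     2
Total       10    4
Likelihood table for the weather conditions:
Weather     No           Yes           P(Weather)
Overcast    0            5             5/14=0.36
Rainy       2            2             4/14=0.29
Sunny       2            3             5/14=0.36
All         4/14=0.29    10/14=0.71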
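Applying Bayes' theorem:
P(Yes|Sunny)= P(Sunny|Yes)*P(Yes)/P(Sunny)
P(Sunny|Yes)= 3/10= 0.3
P(Sunny)= 5/14= 0.36
P(Yes)= 10/14= 0.71
So P(Yes|Sunny)= (3/10)*(10/14)/(5/14)= 3/5= 0.60
P(No|Sunny)= P(Sunny|No)*P(No)/P(Sunny)
P(Sunny|No)= 2/4= 0.5
P(No)= 4/14= 0.29
P(Sunny)= 5/14= 0.36
So P(No|Sunny)= (2/4)*(4/14)/(5/14)= 2/5= 0.40
Since P(Yes|Sunny) > P(No|Sunny), on a sunny day the player can play.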
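The calculation above can be reproduced by counting directly from the 14-row dataset. The following is a minimal sketch for this single-feature example; the function name `posterior` and the variable names are illustrative, and exact fractions are used to avoid rounding.

```python
from collections import Counter
from fractions import Fraction

# (Outlook, Play) pairs copied from the 14-row dataset above.
data = [
    ("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"), ("Overcast", "Yes"),
    ("Sunny", "No"), ("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"),
    ("Rainy", "No"), ("Sunny", "No"), ("Sunny", "Yes"), ("Rainy", "No"),
    ("Overcast", "Yes"), ("Overcast", "Yes"),
]

n = len(data)
class_counts = Counter(play for _, play in data)            # Yes: 10, No: 4
outlook_counts = Counter(outlook for outlook, _ in data)    # Sunny: 5, Overcast: 5, Rainy: 4
joint_counts = Counter(data)                                # e.g. (Sunny, Yes): 3

def posterior(play, outlook):
    """P(play | outlook) = P(outlook | play) * P(play) / P(outlook)."""
    likelihood = Fraction(joint_counts[(outlook, play)], class_counts[play])
    prior = Fraction(class_counts[play], n)
    evidence = Fraction(outlook_counts[outlook], n)
    return likelihood * prior / evidence

p_yes = posterior("Yes", "Sunny")   # 3/5
p_no = posterior("No", "Sunny")     # 2/5
decision = "Yes" if p_yes > p_no else "No"
```

The exact posteriors are 3/5 and 2/5, so the decision for a sunny day is "Yes".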
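K-Nearest Neighbor (K-NN) Learning
K-NN assigns a new data point to a category based on its distance to the stored training points. The Euclidean
distance between two points is:
d = sqrt((x2 - x1)^2 + (y2 - y1)^2)
Where,
“d” is the Euclidean distance
(x1, y1) is the coordinate of the first point
(x2, y2) is the coordinate of the second point.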
The working of K-NN can be explained on the basis of the below algorithm:
Step-1: Select the number K of neighbors.
Step-2: Calculate the Euclidean distance from the new data point to every training data point.
Step-3: Take the K nearest neighbors as per the calculated Euclidean distances.
Step-4: Among these K neighbors, count the number of data points in each category.
Step-5: Assign the new data point to the category for which the number of neighbors is maximum.
Step-6: Our model is ready.
Example:- We have a new data point and we need to put it in the required category. Consider the below
image:
Firstly, we will choose the number of neighbors; here we choose k=5.
Next, we will calculate the Euclidean distance between the new data point and the existing data points.
As we can see, 3 of the 5 nearest neighbors are from category A, hence this new data point must belong to category A.
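The K-NN steps can be sketched in a few lines. This is a minimal illustration rather than an optimized implementation: the training points are hypothetical, and the names `euclidean` and `knn_predict` are chosen here for the example.

```python
import math
from collections import Counter

def euclidean(p, q):
    """d = sqrt((x2 - x1)^2 + (y2 - y1)^2), written for any number of coordinates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_predict(train, new_point, k):
    """train is a list of (point, label) pairs; returns the majority label of the k nearest."""
    nearest = sorted(train, key=lambda item: euclidean(item[0], new_point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D training points in two categories.
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"), ((2.5, 2.5), "A"),
    ((6.0, 6.0), "B"), ((6.5, 7.0), "B"), ((7.0, 6.5), "B"),
]
label = knn_predict(train, (3.0, 3.0), k=5)
```

For the new point (3.0, 3.0), the five nearest neighbors are the four A points and one B point, so the majority vote gives category A.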
Below are some points to remember while selecting the value of K in the K-NN algorithm:
There is no particular way to determine the best value for "K", so we need to try some values to find the
best out of them. A commonly used starting value for K is 5.
A very low value for K, such as K=1 or K=2, can be noisy and makes the model sensitive to outliers.
Large values for K smooth out noise, but they can blur the boundaries between categories and increase the computation.
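One way to "try some values" of K is leave-one-out validation: predict each training point from all the others and compare the accuracy for each K. This technique is not described in the notes and is shown only as one possible approach; the data set (including one deliberately mislabeled point) and the function names are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, point, k):
    """Majority label among the k training points closest to the given point."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def leave_one_out_accuracy(data, k):
    """Predict each point from all the others; return the fraction predicted correctly."""
    correct = 0
    for i, (point, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_predict(rest, point, k) == label:
            correct += 1
    return correct / len(data)

# Hypothetical data: two clusters plus one noisy label inside the A cluster.
data = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.8, 1.1), "A"), ((1.1, 1.3), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"), ((5.1, 5.3), "B"),
    ((1.0, 1.2), "B"),   # noisy point
]

scores = {k: leave_one_out_accuracy(data, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
```

Odd values of K are used here so that a vote between two categories cannot end in a tie.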
Advantages of the K-NN algorithm:
It is simple to implement.
It is robust to noisy training data when K is large enough.
It can be more effective if the training data is large.
Disadvantages of the K-NN algorithm:
The value of K always needs to be determined, which may be complex at times.
The computation cost is high, because the distance from the new data point to every training sample has to be calculated.
Can a machine also learn from experiences or past data like a human does? This is where Machine
Learning comes in.
Machine Learning is a subset of artificial intelligence that is mainly concerned with the development of
algorithms which allow a computer to learn from data and past experiences on its own.
The term machine learning was first introduced by Arthur Samuel in 1959.
With the help of sample historical data, which is known as training data, machine learning algorithms build
a mathematical model that helps in making predictions or decisions without being explicitly programmed.
A Machine Learning system learns from historical data, builds the prediction models, and whenever it receives new
data, predicts the output for it.
The accuracy of the predicted output depends on the amount of data, as a larger amount of data generally helps to
build a better model which predicts the output more accurately.
Suppose we have a complex problem where we need to perform some predictions. Instead of writing
code for it, we just need to feed the data to generic algorithms, and with the help of these algorithms the
machine builds the logic from the data and predicts the output.
Machine learning has changed our way of thinking about the problem.
The below block diagram explains the working of Machine Learning algorithm:
Machine learning is used in self-driving cars, cyber fraud detection, face recognition, friend suggestions on
Facebook, etc.
Various top companies such as Netflix and Amazon have built machine learning models that use a vast amount of
data to analyze user interests and recommend products accordingly.
Following are some key points which show the importance of Machine Learning:
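Classification of Machine Learning
At a broad level, machine learning can be classified into three types:
1. Supervised learning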
2. Unsupervised learning
3. Reinforcement learning
1) Supervised Learning
Supervised learning is a type of machine learning method in which we provide labeled sample data to train the system.
The system creates a model from the sample data to learn the relationship between each input and its label.
Once the training and processing are done, we test the model by providing sample data it has not seen
and check whether it predicts the correct output.
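The train-then-test cycle can be sketched with a very small supervised model. The example below fits a straight line by least squares, which is just one possible supervised method; the data follow y = 2x + 1 and are made up for illustration, and `fit_line` is an illustrative name.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept from labeled examples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Training data: inputs with known outputs (labels).
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(train_x, train_y)

# Test data: inputs the model has not seen, with known outputs for checking.
test_x = [5.0, 6.0]
test_y = [11.0, 13.0]
predictions = [slope * x + intercept for x in test_x]
max_error = max(abs(p - y) for p, y in zip(predictions, test_y))
```

The model is judged on `test_x`, not on the data it was trained on, which mirrors the testing step described above.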
2) Unsupervised Learning
Unsupervised learning is a learning method in which a machine learns without any supervision.
The machine is trained with a set of data that has not been labeled, classified, or
categorized, and the algorithm needs to act on that data without any supervision.
The goal of unsupervised learning is to restructure the input data into new features or a group of objects with
similar patterns.
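Grouping unlabeled data can be illustrated with k-means clustering, one common unsupervised method (it is not named in the notes and is used here only as an example). The sketch works on plain numbers; the values, the starting centers and the name `kmeans_1d` are hypothetical.

```python
def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on numbers: assign each value to the nearest center, then re-average.

    Expects iterations >= 1.
    """
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

values = [1.0, 1.2, 0.8, 9.0, 9.5, 10.5]    # hypothetical unlabeled measurements
centers, clusters = kmeans_1d(values, centers=[0.0, 5.0])
```

No labels are given, yet the values are separated into a low group and a high group purely from their similarity.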
Year 1970 :
Backpropagation is a set of algorithms used extensively in Deep Learning; they adjust the weights of a
neural network so that it corrects its own errors.
The method behind backpropagation was published by Seppo Linnainmaa, but at that time it was described as the
reverse mode of Automatic Differentiation (AD).
Year 1980 :
Kunihiko Fukushima successfully built a multilayered neural network called the Neocognitron, a type of Artificial
Neural Network (ANN).
It acted as a platform for the development of Convolutional Neural Networks down the line.
Year 1981 :
Gerald DeJong introduced a new way to teach machines, which he called Explanation Based Learning. This was a very early
Machine Learning implementation: it processed data to create a set of rules, which is another way of saying that it
created an algorithm.
Year 1989 :
Reinforcement Learning is finally realized. The Q-Learning algorithm is developed by Christopher Watkins, which
made it possible to teach a machine to play a game based on risk and reward.
Year 1995 :
Rise of 2 very important algorithms in the Machine Learning space;
“Random Forest Algorithm” and “Support Vector Machines”.
Year 2009 :
ImageNet is created, which facilitated Computer Vision research by giving researchers access to a vast
database categorized by objects and features.
It was a project initiated by Fei-Fei Li from Stanford University.
1. Image Recognition:
Image recognition is one of the most common applications of machine learning.
It is used to identify objects, persons, places, etc. in digital images.
A popular use case of image recognition and face detection is automatic friend tagging suggestion:
Facebook provides a feature of automatic friend tagging suggestion.
Whenever we upload a photo with our Facebook friends, we automatically get a tagging suggestion with
their names, and the technology behind this is machine learning's face detection and recognition algorithm.
It is based on the Facebook project named "DeepFace," which is responsible for face recognition and person
identification in the picture.
2. Speech Recognition
While using Google, we get an option of "Search by voice." It comes under speech recognition, which
is a popular application of machine learning.
Speech recognition is the process of converting voice instructions into text, and it is also known as
"Speech to text" or "Computer speech recognition."
At present, machine learning algorithms are widely used in various applications of speech recognition.
Google Assistant, Siri, Cortana, and Alexa use speech recognition technology to follow
voice instructions.
3. Traffic prediction:
If we want to visit a new place, we take the help of Google Maps, which shows us the correct path with the
shortest route and predicts the traffic conditions.
It predicts traffic conditions, such as whether traffic is clear, slow-moving, or heavily congested, using
information such as:
Real-time location of vehicles from the Google Maps app and sensors
Everyone who uses Google Maps is helping to make the app better.
It takes information from the user and sends it back to its database to improve performance.
4. Product recommendations:
Machine learning is widely used by various e-commerce and entertainment companies, such
as Amazon and Netflix, for product recommendations to the user.
Whenever we search for a product on Amazon, we start getting advertisements for the same
product while surfing the internet on the same browser, and this is because of machine learning.
Google understands the user's interest using various machine learning algorithms and suggests products as per
customer interest.
Similarly, when we use Netflix, we find recommendations for entertainment series, movies, etc., and this
is also done with the help of machine learning.
5. Self-driving cars:
One of the most exciting applications of machine learning is self-driving cars, where machine learning plays a
significant role.
Tesla, a well-known car manufacturer, is working on self-driving cars.
It uses an unsupervised learning method to train the car models to detect people and objects while driving.
6. Email Spam and Malware Filtering:
Whenever we receive a new email, it is filtered automatically as important, normal, or spam.
Important mail arrives in our inbox marked with the important symbol and spam emails go to our
spam box, and the technology behind this is machine learning.
Common types of spam filters include:
o Content filter
o Header filter
o General blacklists filter
o Rules-based filters
o Permission filters
Some machine learning algorithms such as Multi-Layer Perceptron, Decision tree, and Naïve Bayes
classifier are used for email spam filtering and malware detection.
7. Online Fraud Detection:
Machine learning is making our online transactions safe and secure by detecting fraudulent transactions.
Whenever we perform an online transaction, there are various ways that a fraudulent
transaction can take place, such as fake accounts, fake IDs, and money stolen in the middle of a
transaction.
To detect this, a Feed Forward Neural Network helps us by checking whether it is a genuine
transaction or a fraudulent one.
For each genuine transaction, the output is converted into some hash values, and these values
become the input for the next round. Each genuine transaction follows a specific pattern, which
changes for a fraudulent transaction; hence the network detects it and makes our online transactions more
secure.
8. Stock Market Trading:
Machine learning is widely used in stock market trading. In the stock market, there is always a risk of
ups and downs in shares, so a long short-term memory (LSTM) neural network is
used for the prediction of stock market trends.
9. Automatic Language Translation:
Nowadays, if we visit a new place and do not know the language, it is not a problem at all,
as machine learning helps us by converting the text into a language we know.
Google's GNMT (Google Neural Machine Translation) provides this feature. It is a neural machine
translation system that translates text into our familiar language, and this is called automatic translation.
Difference between Supervised and Unsupervised Learning
Supervised and Unsupervised learning are two techniques of machine learning.
The two techniques are used in different scenarios and with different datasets.
Supervised learning needs supervision to train the model, which is similar to how a student learns things in
the presence of a teacher.
Supervised learning can be used for two types of problems:
Classification and Regression.
The goal of unsupervised learning is to find the structure and patterns from the input data.
Unsupervised learning can be used for two types of problems: Clustering and Association.
Example:
To understand the unsupervised learning, we will use the example given above.
So unlike supervised learning, here we will not provide any supervision to the model.
We will just provide the input dataset to the model and allow the model to find the patterns from the data.
With the help of a suitable algorithm, the model will train itself and divide the fruits int
Supervised Learning | Unsupervised Learning
Supervised learning model takes direct feedback to check if it is predicting correct output or not. | Unsupervised learning model does not take any feedback.
Supervised learning model predicts the output. | Unsupervised learning model finds the hidden patterns in data.
Supervised learning can be used for those cases where we know the input as well as corresponding outputs. | Unsupervised learning can be used for those cases where we have only input data and no corresponding output data.