Bayes theorem calculates the probability of one event occurring given that another event has occurred. It breaks down the calculation into prior probability, likelihood, and marginal probability. Naive Bayes is a classification algorithm that applies Bayes theorem with an independence assumption between predictors.
• Bayes' theorem is one of the most popular machine learning concepts: it helps to calculate the probability of one event occurring, under uncertain knowledge, when another event has already occurred.
• Bayes' theorem can be derived from the product rule and the conditional probability of event X given a known event Y.
• According to the product rule, we can express the probability of event X occurring together with a known event Y as follows:
• P(X ∩ Y) = P(X|Y) P(Y)   {equation 1}
• Further, the probability of event Y occurring together with a known event X is:
• P(X ∩ Y) = P(Y|X) P(X)   {equation 2}
• Since both equations express the same joint probability P(X ∩ Y), we can equate their right-hand sides; dividing both sides by P(Y) gives Bayes' theorem:
• P(X|Y) = P(Y|X) P(X) / P(Y)
• P(X|Y) is called the posterior, which is what we need to calculate. It is the updated probability after considering the evidence.
• P(Y|X) is called the likelihood. It is the probability of the evidence when the hypothesis is true.
• P(X) is called the prior probability: the probability of the hypothesis before considering the evidence.
• P(Y) is called the marginal probability. It is the probability of the evidence under any consideration.
• Hence, in words, Bayes' theorem can be written as:
• posterior = likelihood * prior / evidence

Understanding Bayes Theorem
• 1. Experiment
• An experiment is a planned operation carried out under controlled conditions, such as tossing a coin, drawing a card, or rolling a die.
• 2. Sample Space
• The results we get during an experiment are called possible outcomes, and the set of all possible outcomes of an experiment is known as the sample space. For example, if we are rolling a die, the sample space is:
• S1 = {1, 2, 3, 4, 5, 6}
• Similarly, if our experiment is tossing a coin and recording its outcome, the sample space is:
• S2 = {Head, Tail}
• 3. Event
• An event is a subset of the sample space of an experiment; it is also called a set of outcomes.
• Assume that in our experiment of rolling a die there are two events A and B such that:
• A = Event that an even number is obtained = {2, 4, 6}
• B = Event that a number greater than 4 is obtained = {5, 6}
• Probability of event A: P(A) = number of favourable outcomes / total number of possible outcomes = 3/6 = 1/2 = 0.5
• Similarly, probability of event B: P(B) = number of favourable outcomes / total number of possible outcomes = 2/6 = 1/3 ≈ 0.333
• Union of events A and B: A ∪ B = {2, 4, 5, 6}
• Intersection of events A and B: A ∩ B = {6}
• Disjoint (mutually exclusive) events: events whose intersection is empty, so they cannot occur together.
• 4. Random Variable
• A random variable is a real-valued function that maps the sample space of an experiment onto the real line. A random variable takes on random values, each with some probability. It is neither random nor a variable in the usual sense; rather, it behaves as a function, which can be discrete, continuous, or a combination of both.
• 5. Exhaustive Events
• As the name suggests, a set of events of which at least one must occur at a time is called an exhaustive set of events of an experiment.
• 7. Conditional Probability
• Conditional probability is the probability of an event A given that another event B has already occurred (i.e. A conditional on B). It is written P(A|B) and defined as:
• P(A|B) = P(A ∩ B) / P(B)
• 8. Marginal Probability
• Marginal probability is the probability of an event A occurring independently of any other event B; it is also considered the probability of the evidence under any consideration. By the law of total probability:
• P(A) = P(A|B)*P(B) + P(A|~B)*P(~B)
• A short Python sketch after this list reproduces the dice calculations above.
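To make the definitions above concrete, here is a minimal Python sketch (not from the original text; the function and variable names are illustrative) that reproduces the dice example: it computes P(A), P(B), the conditional probability P(A|B), and checks that Bayes' theorem gives the same value.

```python
# Minimal sketch of the dice example: events as Python sets,
# probability = favourable outcomes / total possible outcomes.
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}     # S1
A = {2, 4, 6}                         # even number obtained
B = {5, 6}                            # number greater than 4 obtained

def prob(event):
    """P(event) = favourable outcomes / total possible outcomes."""
    return Fraction(len(event), len(sample_space))

p_A = prob(A)                         # 1/2
p_B = prob(B)                         # 1/3
p_A_and_B = prob(A & B)               # intersection {6} -> 1/6

# Conditional probability: P(A|B) = P(A ∩ B) / P(B)
p_A_given_B = p_A_and_B / p_B         # 1/2

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_B_given_A = p_A_and_B / p_A         # likelihood, 1/3
assert p_B_given_A * p_A / p_B == p_A_given_B

print(p_A, p_B, p_A_given_B)          # 1/2 1/3 1/2
```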
Bayes Theorem in Machine Learning
• Bayes' theorem lets us calculate the single term P(B|A) in terms of P(A|B), P(B), and P(A). This rule is very helpful in scenarios where we have good estimates of three of these terms and need to determine the fourth one.
• The Naïve Bayes classifier is one of the simplest applications of Bayes' theorem; it is used in classification algorithms to separate data into classes with good accuracy and speed.
• Let's understand the use of Bayes' theorem in machine learning with the example below.
• Suppose we have a vector A with i attributes, i.e.
• A = A1, A2, A3, A4, ..., Ai
• Further, we have n classes represented as C1, C2, C3, C4, ..., Cn.
• These two are given to us, and our machine learning classifier has to predict the class of A; the first thing it has to choose is the best possible class. With the help of Bayes' theorem, we can write this as:
• P(Ci|A) = [P(A|Ci) * P(Ci)] / P(A)
• Here:
• P(A) is the condition-independent entity.
• P(A) remains constant across the classes: it does not change its value as the class changes. So, to maximize P(Ci|A), we only have to maximize the term P(A|Ci) * P(Ci).
• With n classes on the probability list, let's assume that every class is equally likely to be the right answer. Considering this factor, we can say that:
• P(C1) = P(C2) = P(C3) = P(C4) = ... = P(Cn)
• This helps to reduce both the computation cost and the time. This is how Bayes' theorem plays a significant role in machine learning, and the naïve independence assumption simplifies the conditional-probability task without greatly affecting precision. Under that assumption we can write:
• P(A|Ci) = P(A1|Ci) * P(A2|Ci) * P(A3|Ci) * ... * P(An|Ci)
• Hence, by using Bayes' theorem in machine learning we can describe the possibility of a larger event in terms of the probabilities of smaller events. A short sketch of this class-selection rule is shown below.
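The following is a hedged Python sketch of that rule: pick the class Ci that maximizes P(A|Ci) * P(Ci), with P(A|Ci) factorized into per-attribute probabilities under the independence assumption. The class names, attribute values, priors, and likelihood tables are made-up illustrative numbers, not taken from the text.

```python
# Sketch of naive Bayes class selection: argmax over P(A|Ci) * P(Ci),
# where P(A|Ci) is a product of per-attribute probabilities P(Ak|Ci).
from math import prod

priors = {"C1": 0.5, "C2": 0.5}   # P(Ci), here assumed equally likely

# Hypothetical per-attribute likelihoods P(Ak = value | Ci)
likelihoods = {
    "C1": [{"red": 0.7, "green": 0.3}, {"round": 0.8, "long": 0.2}],
    "C2": [{"red": 0.2, "green": 0.8}, {"round": 0.3, "long": 0.7}],
}

def score(attributes, c):
    """P(A|Ci) * P(Ci) under the naive independence assumption."""
    return priors[c] * prod(table[a] for table, a in zip(likelihoods[c], attributes))

A = ["red", "round"]              # attribute vector to classify
best_class = max(priors, key=lambda c: score(A, c))
print(best_class)                 # -> "C1"
```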
• Naïve Bayes is also a supervised algorithm; it is based on Bayes' theorem and is used to solve classification problems. It is one of the simplest and most effective classification algorithms in machine learning, and it enables us to build ML models that make quick predictions. It is a probabilistic classifier, which means it predicts on the basis of the probability of an object. Some popular applications of Naïve Bayes are spam filtering, sentiment analysis, and classifying articles.

What Is the Naive Bayes Algorithm?
• It is a classification technique based on Bayes' theorem with an independence assumption among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.
• For example, a fruit may be considered an apple if it is red, round, and about 3 inches in diameter. Even if these features depend on each other or on the existence of the other features, all of these properties independently contribute to the probability that this fruit is an apple, and that is why it is known as 'Naive'.
• A Naive Bayes model is easy to build and particularly useful for very large data sets. Along with its simplicity, Naive Bayes is known to outperform even highly sophisticated classification methods.
• Bayes' theorem provides a way of computing the posterior probability P(c|x) from P(c), P(x) and P(x|c). Look at the equation below:
• P(c|x) = [P(x|c) * P(c)] / P(x)
• Above,
• P(c|x) is the posterior probability of the class (c, target) given the predictor (x, attributes).
• P(c) is the prior probability of the class.
• P(x|c) is the likelihood, which is the probability of the predictor given the class.
• P(x) is the prior probability of the predictor.

Working
• Let's understand it using an example. Below is a training data set of weather and the corresponding target variable 'Play' (suggesting the possibility of playing). We need to classify whether players will play or not based on the weather conditions. Let's follow the steps below to do it.
• Step 1: Convert the data set into a frequency table.
• Step 2: Create a likelihood table by finding the probabilities, e.g. the probability of Overcast is 0.29 and the probability of playing is 0.64.
• Step 3: Now, use the Naive Bayes equation to calculate the posterior probability for each class. The class with the highest posterior probability is the outcome of the prediction.
• Problem: Players will play if the weather is sunny. Is this statement correct?
• We can solve it using the method of posterior probability discussed above.
• P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)
• Here P(Sunny | Yes) * P(Yes) is the numerator, and P(Sunny) is the denominator.
• We have P(Sunny | Yes) = 3/9 = 0.33, P(Sunny) = 5/14 = 0.36, and P(Yes) = 9/14 = 0.64.
• Now, P(Yes | Sunny) = 0.33 * 0.64 / 0.36 = 0.60, which is the higher posterior probability, so the prediction is that the players will play. A small sketch of this calculation follows below.
• Naive Bayes uses a similar method to predict the probability of each class based on various attributes. This algorithm is mostly used in text classification (NLP) and for problems with multiple classes.
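To tie the three steps together, here is a minimal Python sketch that plugs the counts quoted in the example into the posterior formula (the two "Sunny & No" records are inferred from the 5 "Sunny" and 3 "Sunny & Yes" counts; everything else is taken directly from the text).

```python
# Posterior P(class | Sunny) from the frequency counts in the worked example:
# 14 records in total, 9 "Yes" and 5 "No"; "Sunny" appears 5 times, 3 on "Yes" days.
counts = {
    "total": 14,
    "Yes": 9,
    "No": 5,
    "Sunny": 5,
    ("Sunny", "Yes"): 3,
    ("Sunny", "No"): 2,   # inferred: 5 Sunny records minus 3 Sunny-and-Yes records
}

def posterior(play, weather="Sunny"):
    """P(play | weather) = P(weather | play) * P(play) / P(weather)."""
    likelihood = counts[(weather, play)] / counts[play]   # e.g. P(Sunny|Yes) = 3/9
    prior = counts[play] / counts["total"]                # e.g. P(Yes) = 9/14
    evidence = counts[weather] / counts["total"]          # P(Sunny) = 5/14
    return likelihood * prior / evidence

p_yes = posterior("Yes")   # ≈ 0.60
p_no = posterior("No")     # ≈ 0.40
print(f"P(Yes|Sunny) = {p_yes:.2f}, P(No|Sunny) = {p_no:.2f}")
# "Yes" has the higher posterior, so the classifier predicts the players will play.
```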