Chapter 2 Notes
Frequency table:

Weather     Yes   No
Overcast     5     0
Rainy        2     2
Sunny        3     2
Total       10     4
Likelihood table:

Weather     No            Yes
Overcast    0             5             5/14 ≈ 0.36
Rainy       2             2             4/14 ≈ 0.29
Sunny       2             3             5/14 ≈ 0.36
All         4/14 ≈ 0.29   10/14 ≈ 0.71
Applying Bayes' theorem:
P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
P(Sunny|Yes) = 3/10 = 0.3
P(Sunny) = 5/14 ≈ 0.36
P(Yes) = 10/14 ≈ 0.71
So P(Yes|Sunny) = (3/10) * (10/14) / (5/14) = 3/5 = 0.60
Similarly, P(Sunny|No) = 2/4 = 0.5 and P(No) = 4/14 ≈ 0.29, so
P(No|Sunny) = (2/4) * (4/14) / (5/14) = 2/5 = 0.40
Since P(Yes|Sunny) > P(No|Sunny), the classifier predicts Yes on a sunny day.
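As a quick check, the same numbers can be reproduced in a few lines of Python (a minimal sketch; the counts are taken directly from the frequency table above):

```python
# Counts taken from the frequency table above (14 instances in total).
counts = {
    "Overcast": {"Yes": 5, "No": 0},
    "Rainy":    {"Yes": 2, "No": 2},
    "Sunny":    {"Yes": 3, "No": 2},
}

total = sum(c["Yes"] + c["No"] for c in counts.values())  # 14
total_yes = sum(c["Yes"] for c in counts.values())        # 10

p_yes = total_yes / total                               # P(Yes) = 10/14
p_sunny = sum(counts["Sunny"].values()) / total         # P(Sunny) = 5/14
p_sunny_given_yes = counts["Sunny"]["Yes"] / total_yes  # P(Sunny|Yes) = 3/10

# Bayes' theorem: P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
print(round(p_yes_given_sunny, 2))  # 0.6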
Q-4. What are the types of Naïve Bayes model?
Ans- There are three types of Naïve Bayes model, which are given below (a short scikit-learn sketch follows the list):
1. Gaussian: The Gaussian model assumes that features follow a normal distribution. This means that if the predictors take continuous values instead of discrete ones, the model assumes these values are sampled from a Gaussian distribution.
2. Multinomial: The Multinomial Naïve Bayes classifier is used when the data follows a multinomial distribution. It is primarily used for document classification problems, i.e., deciding which category a particular document belongs to, such as sports, politics, education, etc. The classifier uses the frequency of words as the predictors.
3. Bernoulli: The Bernoulli classifier works similarly to the Multinomial classifier, but the predictor variables are independent Boolean variables, such as whether a particular word is present in a document or not. This model is also popular for document classification tasks.
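A minimal scikit-learn sketch of the three variants (assuming scikit-learn is available; the toy arrays are made up purely for illustration):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

y = np.array([0, 0, 1, 1])  # two classes, e.g. Sports vs. Politics

# Continuous features (e.g. measurements) -> GaussianNB
X_cont = np.array([[5.1, 3.5], [4.9, 3.0], [6.2, 2.9], [6.7, 3.1]])
print(GaussianNB().fit(X_cont, y).predict([[6.0, 3.0]]))

# Word counts per document -> MultinomialNB
X_counts = np.array([[3, 0, 1], [2, 1, 0], [0, 2, 3], [0, 1, 2]])
print(MultinomialNB().fit(X_counts, y).predict([[0, 1, 1]]))

# Word presence/absence per document -> BernoulliNB
X_bool = (X_counts > 0).astype(int)
print(BernoulliNB().fit(X_bool, y).predict([[0, 1, 1]]))
```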
Q.5. Why is it called Naïve Bayes?
Ans- The Naïve Bayes algorithm combines two words, Naïve and Bayes, which can be described as:
Naïve: It is called Naïve because it assumes that the occurrence of a certain feature is independent of the occurrence of the other features. For example, if a fruit is identified on the basis of color, shape, and taste, then a red, spherical, and sweet fruit is recognized as an apple. Hence each feature individually contributes to identifying it as an apple, without depending on the others.
Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem.
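Put together, this "naïve" independence assumption lets the classifier factor the joint likelihood into per-feature terms, so for features x1, ..., xn and a class y (written in the same plain style as the formulas above):

P(y|x1, ..., xn) ∝ P(y) * P(x1|y) * P(x2|y) * ... * P(xn|y)

Each factor can then be estimated separately from the training data, exactly as in the weather tables earlier.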
A decision tree may have an overfitting issue, which can be resolved using the Random Forest algorithm. For more class labels, the computational complexity of the decision tree may also increase.
Random Forest predicts output with high accuracy, and even for a large dataset it runs efficiently.
Ans- There are mainly four sectors where Random Forest is mostly used: banking, medicine, land use, and marketing.
Instead of relying on one decision tree, the random forest takes the prediction from each tree and, based on the majority vote of those predictions, produces the final output (see the voting sketch below). A greater number of trees in the forest leads to higher accuracy and prevents the problem of overfitting.
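A minimal Python sketch of the majority-vote step (the tree predictions here are made-up values, purely for illustration):

```python
from collections import Counter

# Hypothetical predictions from five individual decision trees for one sample.
tree_predictions = ["apple", "banana", "apple", "apple", "banana"]

# The forest's final output is the class that wins the majority vote.
final_prediction = Counter(tree_predictions).most_common(1)[0][0]
print(final_prediction)  # apple
```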
Q.16. What are the Advantages and Disadvantages of Random Forest?
Ans- Advantage: It enhances the accuracy of the model and prevents the overfitting issue.
Disadvantage: Although Random Forest can be used for both classification and regression tasks, it is less suitable for regression tasks.
Q. How does the Random Forest algorithm work?
Ans- Random Forest works in two phases: the first is to create the random forest by combining N decision trees, and the second is to make predictions with each tree created in the first phase.
Step-1: Select K random data points from the training set.
Step-2: Build the decision trees associated with the selected data points (subsets).
Step-3: Choose the number N of decision trees that you want to build.
Step-4: Repeat Step-1 and Step-2 until N trees have been built.
Step-5: For new data points, find the predictions of each decision tree, and assign the new data point to the category that wins the majority vote (a scikit-learn sketch follows these steps).
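This procedure is essentially what scikit-learn's RandomForestClassifier automates; a minimal sketch (assuming scikit-learn is installed, with a built-in toy dataset standing in for the fruit images of the example below):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy dataset standing in for a real training set.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_estimators is the number N of decision trees; each tree is trained on a
# bootstrap sample (subset) of the training data.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Each new data point gets the class that wins the majority vote of the trees.
print(forest.score(X_test, y_test))
```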
The working of the algorithm can be better understood from the example below.
Example: Suppose there is a dataset that contains multiple fruit images, and this dataset is given to the Random Forest classifier. The dataset is divided into subsets, and each subset is given to a decision tree. During the training phase, each decision tree produces a prediction result, and when a new data point occurs, the Random Forest classifier predicts the final decision based on the majority of those results.