Sheet ML
2. About 2/3 of your email is spam, so you downloaded an open-source spam filter
based on word occurrences that uses a Naive Bayes classifier. Assume you
collected the following regular and spam emails to train the classifier, and that only
three words are informative for this classification, i.e., each email is
represented as a 3-dimensional binary vector whose components indicate
whether the respective word is contained in the email.
1. You find that the spam filter uses a prior p(spam) = 0.1. Explain (in one sentence) why this
might be sensible.
2. Based on the prior and conditional probabilities above, give the model probability
p(spam | s) that the sentence s = “money for psychology study” is spam.
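As a reminder of how such a posterior is formed, here is a minimal sketch of the Naive Bayes computation for a binary word-occurrence vector. It assumes the Bernoulli form of the model, in which absent words contribute a factor (1 - p). The function name and every number in the usage example other than the prior 0.1 are hypothetical placeholders; they are not the values from the sheet's training data.

```python
def naive_bayes_posterior(x, prior_spam, p_word_given_spam, p_word_given_ham):
    """Return p(spam | x) for a binary word-occurrence vector x.

    x[j] is 1 if informative word j occurs in the email, else 0.
    p_word_given_spam[j] and p_word_given_ham[j] are the per-class
    probabilities that word j occurs (Bernoulli Naive Bayes assumption).
    """
    joint_spam = prior_spam          # starts as p(spam)
    joint_ham = 1.0 - prior_spam     # starts as p(regular)
    for present, p_spam, p_ham in zip(x, p_word_given_spam, p_word_given_ham):
        # Words are treated as conditionally independent given the class.
        joint_spam *= p_spam if present else (1.0 - p_spam)
        joint_ham *= p_ham if present else (1.0 - p_ham)
    # Bayes' rule: normalise by the total probability of x.
    return joint_spam / (joint_spam + joint_ham)


# Hypothetical placeholder probabilities, NOT taken from the sheet.
posterior = naive_bayes_posterior(
    x=[1, 0, 1],
    prior_spam=0.1,
    p_word_given_spam=[0.5, 0.25, 0.5],
    p_word_given_ham=[0.25, 0.5, 0.5],
)
print(posterior)
```

To answer the question itself, substitute the conditional probabilities estimated from the training emails and the occurrence vector of the three informative words in s.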
Course : Machine Learning
Faculty of Computers and Information
Minia University
------------------------------------------------------------------------------------------------------------------------------------------------------------
3. What is the biggest advantage of decision trees when compared to logistic regression
classifiers?
4. What is the biggest weakness of decision trees compared to logistic regression
classifiers?
5. We are given a set of two-dimensional inputs and their corresponding outputs:
{x_{i,1}, x_{i,2}, y_i}. We would like to use the following regression model to predict y:
9. Draw the full decision tree that would be learned for this data.
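One common way to decide which attribute a tree splits on first is information gain. The sketch below assumes an ID3-style learner with entropy as the impurity measure, which the sheet does not specify. The helper names and the toy rows are hypothetical and are not the data from the question.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Reduction in entropy obtained by splitting on rows[i][feature]."""
    n = len(labels)
    remainder = 0.0
    for value in set(row[feature] for row in rows):
        subset = [y for row, y in zip(rows, labels) if row[feature] == value]
        # Weight each branch's entropy by the fraction of examples it receives.
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


# Hypothetical toy data: feature 0 determines the label, feature 1 is noise.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 1, 1]
gains = [information_gain(rows, labels, f) for f in range(2)]
print(gains)
```

Under this assumption, the attribute with the highest gain becomes the root, and the same procedure is repeated on each branch until every leaf is pure.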