Questions and Solutions On Bayes Theorem

5 Naive Bayes (8 pts)

Consider a Naive Bayes problem with three features, x1 ... x3. Imagine that we have seen a total of 12 training examples, 6 positive (with y = 1) and 6 negative (with y = 0). Here is a table with some of the counts:

            y = 0   y = 1
   x1 = 1     6       6
   x2 = 1     0       0
   x3 = 1     2       4

1. Supply the following estimated probabilities. Use the Laplacian correction.

   Pr(x2 = 1 | y = 1) = ____
   Pr(x3 = 0 | y = 0) = ____

2. Which feature plays the largest role in deciding the class of a new instance? Why?

   x3, because it has the biggest difference in the likelihood of being true for the two different classes. The other two features carry no information about the class.

Problem 2. Bayes Rule and Bayes Classifiers (12 points)

Suppose you are given the following set of data with three Boolean input variables a, b, and c, and a single Boolean output variable K.

(data table not reproduced in this copy)

For parts (a) and (b), assume we are using a naive Bayes classifier to predict the value of K from the values of the other variables.

(a) [1.5 pts] According to the naive Bayes classifier, what is P(K = 1 | a = 1 ∧ b = 1 ∧ c = 0)?

Answer: 1/2.

(b) [1.5 pts] According to the naive Bayes classifier, what is P(K = 0 | a = 1 ∧ b = 1)?

Answer: 2/3.

   P(K = 0 | a = 1 ∧ b = 1)
     = P(K = 0 ∧ a = 1 ∧ b = 1) / P(a = 1 ∧ b = 1)
     = P(K = 0) P(a = 1 | K = 0) P(b = 1 | K = 0)
       / [ P(K = 0) P(a = 1 | K = 0) P(b = 1 | K = 0) + P(K = 1) P(a = 1 | K = 1) P(b = 1 | K = 1) ]

Now, suppose we are using a joint Bayes classifier to predict the value of K from the values of the other variables.

(c) [1.5 pts] According to the joint Bayes classifier, what is P(K = 1 | a = 1 ∧ b = 1 ∧ c = 0)?

Answer: 0.

Let num(X) be the number of records in our data matching X. Then

   P(K = 1 | a = 1 ∧ b = 1 ∧ c = 0) = num(K = 1 ∧ a = 1 ∧ b = 1 ∧ c = 0) / num(a = 1 ∧ b = 1 ∧ c = 0) = 0

(d) [1.5 pts] According to the joint Bayes classifier, what is P(K = 0 | a = 1 ∧ b = 1)?

Answer: 1/2.

   P(K = 0 | a = 1 ∧ b = 1) = num(K = 0 ∧ a = 1 ∧ b = 1) / num(a = 1 ∧ b = 1) = 1/2

In an unrelated example, imagine we have three variables X, Y, and Z.

(e) [2 pts] Imagine I tell you the values of P(Z|X) and P(Z|Y). Do you have enough information to compute P(Z | X ∧ Y)? If not, write "not enough info". If so, compute the value of P(Z | X ∧ Y) from the above information.

Answer: Not enough info.

(f) [2 pts] Instead, imagine I tell you the values of P(Z|X), P(Z|Y), P(X), and P(Y). Do you now have enough information to compute P(Z | X ∧ Y)? If not, write "not enough info". If so, compute the value of P(Z | X ∧ Y) from the above information.

Answer: Not enough info.

(g) [2 pts] Instead, imagine I tell you the following (falsifying my earlier statements):

   P(Z ∧ X) = 0.2    P(X) = 0.3    P(Y) = 1

Do you now have enough information to compute P(Z | X ∧ Y)? If not, write "not enough info". If so, compute the value of P(Z | X ∧ Y) from the above information.

Answer: 2/3. Since P(Y) = 1, P(Z | X ∧ Y) = P(Z | X). In this case, P(Z | X ∧ Y) = P(Z ∧ X) / P(X) = 0.2 / 0.3 = 2/3.
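To make the Laplacian correction in the first problem concrete, here is a minimal Python sketch (not part of the exam; the helper name and data layout are mine) that turns the summary counts into smoothed estimates of the form (count + 1)/(class total + 2).

```python
# Minimal sketch of Laplace-corrected Naive Bayes estimates for the count
# table in the first problem above (6 examples per class, 3 binary features).
# The helper name and data layout are illustrative, not from the exam.

counts_x_eq_1 = {            # number of examples with x_i = 1, per class
    "x1": {0: 6, 1: 6},
    "x2": {0: 0, 1: 0},
    "x3": {0: 2, 1: 4},
}
n_per_class = {0: 6, 1: 6}   # 6 negative and 6 positive training examples

def laplace_estimate(count_ones, n_class, value):
    """P(x_i = value | y) with the Laplacian (add-one) correction."""
    count = count_ones if value == 1 else n_class - count_ones
    return (count + 1) / (n_class + 2)

for feature, per_class in counts_x_eq_1.items():
    for y in (0, 1):
        p1 = laplace_estimate(per_class[y], n_per_class[y], 1)
        p0 = laplace_estimate(per_class[y], n_per_class[y], 0)
        print(f"P({feature}=1 | y={y}) = {p1:.3f}   P({feature}=0 | y={y}) = {p0:.3f}")
```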
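For Problem 2, the contrast between the naive and the joint Bayes classifier is just the contrast between multiplying per-feature conditionals and reading the answer off the full table of matching records. The sketch below computes P(K = k | a, b) both ways; since the exam's data table is not reproduced in this copy, the records used are an illustrative toy set, not the exam's.

```python
# Sketch contrasting a naive Bayes estimate with a joint ("full table") Bayes
# estimate of P(K = k | a, b). The records below are an illustrative toy data
# set, NOT the exam's table, which is not reproduced in this copy.
records = [  # (a, b, K)
    (1, 1, 0), (1, 0, 0), (0, 1, 1), (1, 1, 1), (0, 0, 0), (1, 1, 0),
]

def count(pred):
    """Number of records satisfying a predicate."""
    return sum(1 for r in records if pred(r))

def naive_score(a, b, k):
    """Unnormalised naive Bayes score P(K = k) P(a | K = k) P(b | K = k)."""
    n_k = count(lambda r: r[2] == k)
    prior = n_k / len(records)
    p_a = count(lambda r: r[0] == a and r[2] == k) / n_k
    p_b = count(lambda r: r[1] == b and r[2] == k) / n_k
    return prior * p_a * p_b

def naive_bayes(a, b, k):
    """P(K = k | a, b) under the naive (conditional independence) assumption."""
    return naive_score(a, b, k) / sum(naive_score(a, b, kk) for kk in (0, 1))

def joint_bayes(a, b, k):
    """P(K = k | a, b) read directly off the records matching a and b."""
    match = [r for r in records if (r[0], r[1]) == (a, b)]
    return sum(1 for r in match if r[2] == k) / len(match)

print("naive estimate of P(K=0 | a=1, b=1):", naive_bayes(1, 1, 0))
print("joint estimate of P(K=0 | a=1, b=1):", joint_bayes(1, 1, 0))
```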
6 Bayes Classifiers (10 points)

Suppose we are given the following dataset, where A, B, C are binary input random variables and y is a binary output whose value we want to predict.

(data table not reproduced in this copy)

(a) (5 points) How would a naive Bayes classifier predict y given the input A = 0, B = 0, C = …? Assume that in case of a tie the classifier always prefers to predict 0 for y.

Answer: The classifier estimates P(A = 0 | y), P(B = 0 | y), P(C = · | y) and P(y) from the data, and the predicted y is the value that maximizes P(A = 0 | y) P(B = 0 | y) P(C = · | y) P(y). Comparing this product for y = 0 against the product for y = 1, the y = 1 product is larger. Hence, the predicted y is 1.

(b) (5 points) Suppose you know for a fact that A, B, C are independent random variables. In this case, is it possible for any other classifier (e.g., a decision tree or a neural net) to do better than a naive Bayes classifier? (The dataset is irrelevant for this question.)

Answer: Yes. The independence of A, B, C does not imply that they are independent within each class (in other words, they are not necessarily independent when conditioned on y). Therefore a naive Bayes classifier may not be able to model the function well, while a decision tree might. For example, y = A XOR B is a function where A and B might be independent variables, but a naive Bayes classifier will not model the function well, since for a particular class (say, y = 0) A and B are dependent.

5 Bayes Rule (19 points)

(a) (4 points) I give you the following fact:

   P(A|B) = 2/3

Do you have enough information to compute P(B|A)? If not, write "not enough info". If so, compute the value of P(B|A).

Answer: Not enough info.

(b) (5 points) Instead, I give you the following facts:

   P(A|B) = 2/3
   P(A|~B) = 1/3

Do you now have enough information to compute P(B|A)? If not, write "not enough info". If so, compute the value of P(B|A).

Answer: Not enough info.

(c) (5 points) Instead, I give you the following facts:

   P(A|B) = 2/3
   P(A|~B) = 1/3
   P(B) = 1/3

Do you now have enough information to compute P(B|A)? If not, write "not enough info". If so, compute the value of P(B|A).

Answer:

   P(B|A) = P(A|B) P(B) / [ P(A|B) P(B) + P(A|~B) P(~B) ]
          = (2/3)(1/3) / [ (2/3)(1/3) + (1/3)(2/3) ]
          = 1/2

(d) (5 points) Instead, I give you the following facts:

   P(A|B) = 2/3
   P(A|~B) = 1/3
   P(B) = 1/3
   P(A) = 4/9

Do you now have enough information to compute P(B|A)? If not, write "not enough info". If so, compute the value of P(B|A).

Answer: P(B|A) = P(A|B) P(B) / P(A) = (2/3)(1/3) / (4/9) = 1/2.

1 Conditional Independence, MLE/MAP, Probability (12 pts)

1. (4 pts) Show that Pr(X, Y | Z) = Pr(X | Z) Pr(Y | Z) if Pr(X | Y, Z) = Pr(X | Z).

   Pr(X, Y | Z) = Pr(X | Y, Z) Pr(Y | Z)      (chain rule)
                = Pr(X | Z) Pr(Y | Z)

   Common mistake: arguing Pr(X | Y, Z) = Pr(X | Z)  ⇒  X ⊥ Y given Z  ⇒  Pr(X, Y | Z) = Pr(X | Z) Pr(Y | Z). The first implication does not hold if the equation is not assumed for all possible values of the variables.

2. (4 pts) If a data point y follows the Poisson distribution with rate parameter θ, then the probability of a single observation y is

      p(y | θ) = θ^y e^(−θ) / y!,   for y = 0, 1, 2, ...

   You are given data points y_1, ..., y_m independently drawn from a Poisson distribution with parameter θ. Write down the log-likelihood of the data as a function of θ.

      Σ_{i=1}^m ( y_i log θ − θ − log y_i! ) = (Σ_{i=1}^m y_i) log θ − mθ − log( Π_{i=1}^m y_i! )

3. (4 pts) Suppose that in answering a question on a multiple choice test, an examinee either knows the answer, with probability p, or guesses, with probability 1 − p. Assume that the probability of answering a question correctly is 1 for an examinee who knows the answer and 1/m for an examinee who guesses, where m is the number of multiple choice alternatives. What is the probability that an examinee knew the answer to a question, given that he has correctly answered it?

      P(know answer | correct) = P(know answer, correct) / P(correct) = p / ( p + (1 − p)/m )
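The following sketch (not part of the exam) checks the Bayes-rule arithmetic above numerically: the P(B|A) computation from parts (c) and (d) of the Bayes Rule problem, and the P(know | correct) formula from item 3 of the MLE/probability problem, evaluated at illustrative values of p and m.

```python
from fractions import Fraction as F

# Parts (c)/(d) of the Bayes Rule problem: P(B|A) from P(A|B), P(A|~B), P(B),
# using the law of total probability for the denominator.
p_a_given_b     = F(2, 3)
p_a_given_not_b = F(1, 3)
p_b             = F(1, 3)

p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)   # = 4/9, matching part (d)
p_b_given_a = p_a_given_b * p_b / p_a
print("P(A) =", p_a, "  P(B|A) =", p_b_given_a)          # 4/9 and 1/2

# Multiple-choice problem: P(know | correct) = p / (p + (1 - p)/m).
# The values of p and m below are illustrative, not from the exam.
def p_know_given_correct(p, m):
    return p / (p + (1 - p) / m)

print("P(know | correct) =", p_know_given_correct(F(3, 5), 4))   # = 6/7
```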
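For item 2, here is a small sketch with made-up data that evaluates the Poisson log-likelihood (Σᵢ yᵢ) log θ − mθ − Σᵢ log yᵢ! and checks numerically that it peaks at θ equal to the sample mean.

```python
import math

def poisson_log_likelihood(ys, theta):
    """(sum_i y_i) * log(theta) - m*theta - sum_i log(y_i!)"""
    return (sum(ys) * math.log(theta)
            - len(ys) * theta
            - sum(math.lgamma(y + 1) for y in ys))   # log(y!) = lgamma(y + 1)

ys = [2, 0, 3, 1, 4]           # illustrative data, not from the exam
mle = sum(ys) / len(ys)        # the maximum-likelihood estimate is the sample mean

# The log-likelihood at the sample mean beats nearby values of theta.
for theta in (mle - 0.5, mle, mle + 0.5):
    print(f"theta = {theta:.2f}  log-likelihood = {poisson_log_likelihood(ys, theta):.4f}")
```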
3 Gaussian Bayes Classifiers (19 points)

(a) (2 points) Suppose you have the following training set with one real-valued input X and a categorical output Y that has two values:

   X | Y
   1 | A
   2 | A
   3 | B
   4 | B
   5 | B
   6 | B
   7 | B

You must learn the Maximum Likelihood Gaussian Bayes Classifier from this data. Write your answers in this table:

   μ_A = 3/2    σ²_A = 1/4    P(Y = A) = 2/7
   μ_B = 5      σ²_B = 2      P(Y = B) = 5/7

I considered asking you to compute p(X = 2 | Y = A) using the parameters you had learned, but I decided that was too fiddly. So in the remainder of the question you can give your answers in terms of α and β, where

   α = p(X = 2 | Y = A)
   β = p(X = 2 | Y = B)

(b) (2 points) What is p(X = 2 ∧ Y = A) (answer in terms of α)?

   p(X = 2 | Y = A) P(Y = A) = (2/7) α

(c) (2 points) What is p(X = 2 ∧ Y = B) (answer in terms of β)?

   p(X = 2 | Y = B) P(Y = B) = (5/7) β

(d) (2 points) What is p(X = 2) (answer in terms of α and β)?

   (1/7)(2α + 5β)

(e) (2 points) What is P(Y = A | X = 2) (answer in terms of α and β)?

   P(Y = A | X = 2) = p(X = 2 ∧ Y = A) / p(X = 2) = (2/7)α / [(1/7)(2α + 5β)] = 2α / (2α + 5β)

(h) (2 points) Finally, consider the following figure. If you trained a new Bayes Classifier on this data, what class would be predicted for the query location?

(figure not reproduced in this copy: a scatter of points from the two classes around a marked query location)

Answer: Both classes have the same center, but one class has a much larger variance. The query location is far from the common center, so it is much more likely under the class with the larger spread, and that is the class we predict.

6 Naive Bayes (15 points)

Consider a Naive Bayes problem with three features, x1 ... x3. Imagine that we have seen a total of 12 training examples, 6 positive (with y = 1) and 6 negative (with y = 0). Here are the actual data points:

(table of the 12 training points not reproduced in this copy)

Here is a table with the summary counts:

            y = 0   y = 1
   x1 = 1     3       3
   x2 = 1     3       3
   x3 = 1     3       3

1. What are the values of the parameters R_i(1, 0) and R_i(1, 1) for each of the features i (using the Laplace correction)?

   All the parameters are (3 + 1)/(6 + 2) = 0.5.

2. If you see the data point 1, 1, 1 and use the parameters you found above, what output would Naive Bayes predict? Explain how you got the result.

   The prediction is arbitrary, since the score is the same for both classes: S(0) = S(1) = 1/8 (each class contributes 0.5 · 0.5 · 0.5 for the three features, and the class priors are equal).

3. Naive Bayes doesn't work very well on this data; explain why.

   The basic assumption of NB is that the features are independent given the class. In this data set, features 1 and 3 are definitely not independent: the values of these features are opposite for class 0 and equal for class 1. All the information is in these correlations; each feature independently says nothing about the class, so NB is not really applicable. Note that a decision tree would not have any problem with this data set.
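A sketch of the Gaussian Bayes classifier problem above, assuming class-conditional Gaussians with maximum-likelihood parameters (variance divided by n, not n − 1); it reproduces the part (a) table and evaluates the part (e) posterior 2α/(2α + 5β) numerically. The function names are mine, not from the exam.

```python
import math

# ML Gaussian Bayes classifier for the 7-point training set in part (a).
data = [(1, "A"), (2, "A"), (3, "B"), (4, "B"), (5, "B"), (6, "B"), (7, "B")]

def ml_gaussian(xs):
    """Maximum-likelihood mean and variance (divide by n, not n - 1)."""
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return mu, var

def density(x, mu, var):
    """Gaussian density N(x; mu, var)."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

params = {}
for label in ("A", "B"):
    xs = [x for x, y in data if y == label]
    mu, var = ml_gaussian(xs)
    prior = len(xs) / len(data)
    params[label] = (mu, var, prior)
    print(f"class {label}: mu = {mu}, var = {var}, prior = {prior:.3f}")

# Part (e): P(Y = A | X = 2) = (2/7) * alpha / ((2/7) * alpha + (5/7) * beta),
# where alpha = p(X = 2 | Y = A) and beta = p(X = 2 | Y = B).
alpha = density(2, *params["A"][:2])
beta = density(2, *params["B"][:2])
posterior_a = (2 / 7) * alpha / ((2 / 7) * alpha + (5 / 7) * beta)
print("alpha =", alpha, " beta =", beta, " P(Y=A | X=2) =", posterior_a)
```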
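Finally, a sketch of why Naive Bayes fails on the last problem. The 12 points below are a hypothetical data set consistent with the summary counts and with the correlation the solution describes (x1 and x3 equal in class 1, opposite in class 0), not necessarily the exam's actual points; with them, every Laplace-corrected parameter is 0.5 and the two class scores tie on any input, even though the rule x1 == x3 classifies perfectly.

```python
# Hypothetical 12-point data set consistent with the summary counts and the
# correlation structure described in the solution (NOT necessarily the exam's
# actual points): in class 1, x1 == x3; in class 0, x1 != x3.
positives = [(1, 1, 1), (1, 0, 1), (1, 1, 1), (0, 0, 0), (0, 1, 0), (0, 0, 0)]  # y = 1
negatives = [(1, 1, 0), (1, 0, 0), (1, 1, 0), (0, 0, 1), (0, 1, 1), (0, 0, 1)]  # y = 0

def laplace_params(points):
    """Laplace-corrected P(x_i = 1 | y) for each of the three features."""
    n = len(points)
    return [(sum(p[i] for p in points) + 1) / (n + 2) for i in range(3)]

params = {1: laplace_params(positives), 0: laplace_params(negatives)}
print("Laplace parameters:", params)           # every entry is 0.5

def nb_score(x, y):
    """Naive Bayes likelihood score, the product of P(x_i | y) over features.
    The class priors are equal (6 positive, 6 negative), so they are omitted."""
    score = 1.0
    for xi, p in zip(x, params[y]):
        score *= p if xi == 1 else 1 - p
    return score

x = (1, 1, 1)
print("S(0) =", nb_score(x, 0), " S(1) =", nb_score(x, 1))   # tie: 1/8 each

# The class is actually determined by whether x1 and x3 agree, a rule a
# decision tree can represent but this Naive Bayes model cannot.
print("rule x1 == x3 predicts:", int(x[0] == x[2]))
```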
