Notes On Probability: 1 Basic Notions
Basic Notions
Probability is a mathematical tool that we use in inductive inferences. As we know, an inductive argument is one in which the truth of the premises does not guarantee the truth of the conclusion. Rather, the premises make the conclusion more or less likely. The science of quantifying likelihood is called probability theory. A probability is a number between 0 and 1. When all the possibilities are equally likely (this condition is important), a probability is just the ratio of the number of desired outcomes to the number of possible outcomes. For an event E, we denote the probability of the event by P(E).
Example 1 The probability of rolling a 6 with a fair die is:
$$P(\text{rolling a 6}) = \frac{\text{number of 6s on a die}}{\text{number of faces}} = \frac{1}{6} \approx 0.167$$
Example 2 The probability of rolling an even number on a fair die is:
$$P(\text{rolling an even number}) = \frac{\text{number of even-number faces}}{\text{number of faces}} = \frac{3}{6} = 0.5$$
Example 3 Suppose there are 5 answer choices for a question in your exam. What is the probability that you choose the right answer by chance?
$$P(\text{right answer}) = \frac{\text{number of right answers}}{\text{number of answers}} = \frac{1}{5} = 0.2$$
Example 4 What is the probability of picking a jack in a normal 52-card deck?
$$P(\text{jack}) = \frac{\text{number of jacks}}{\text{number of cards}} = \frac{4}{52} \approx 0.077$$
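The ratio used in Examples 1 to 4 can be checked mechanically. The following is only a minimal sketch; the helper name `classical_probability` is our own and does not come from the notes, and it is valid only under the stated condition that all outcomes are equally likely.

```python
from fractions import Fraction

def classical_probability(desired, possible):
    """Ratio of desired outcomes to equally likely possible outcomes."""
    return Fraction(desired, possible)

p_six = classical_probability(1, 6)      # Example 1: one face shows a 6
p_even = classical_probability(3, 6)     # Example 2: faces 2, 4 and 6
p_answer = classical_probability(1, 5)   # Example 3: one right answer out of five
p_jack = classical_probability(4, 52)    # Example 4: four jacks in the deck

# Print each probability as an exact fraction and as a rounded decimal.
for name, p in [("six", p_six), ("even", p_even), ("answer", p_answer), ("jack", p_jack)]:
    print(name, p, round(float(p), 3))
```

Exact fractions are used so that no rounding enters before the final display.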
We can use this method to fix the probability of simple events. Now, we can also find the probability that one of these events happens twice in a row, or the probability that two different events happen together. In this case, we find the number of possible outcomes by multiplying together the number of possible outcomes of each event.
Example 5 How many different outcomes can we obtain if we roll two dice? $6 \times 6 = 36$
Example 6 How many different outcomes can we obtain if we flip a coin three times? $2 \times 2 \times 2 = 8$
Example 7 How many different outcomes can we obtain if we flip a coin and then roll a die? $2 \times 6 = 12$
We can use this counting method to find the probability of complex events. Consider these examples.
Example 8 What is the probability of obtaining two sixes when rolling two fair dice?
$$P(\text{having two sixes}) = \frac{\text{number of pairs of sixes}}{\text{number of outcomes}} = \frac{1}{36} \approx 0.028$$
Example 9 What is the probability of flipping two heads if we flip a coin twice?
$$P(\text{two heads}) = \frac{\text{number of pairs of heads}}{\text{number of outcomes}} = \frac{1}{4} = 0.25$$
Example 10 Now, we consider a more difficult problem. What is the probability, when rolling a die twice, that the sum will equal 5? There are four ways of obtaining numbers summing to 5: 4 and 1, 3 and 2, 2 and 3, 1 and 4. Note that "2 and 3" and "3 and 2" are different, since the first means that we obtained 2 on the first roll, while the second means that we obtained 3 on the first roll. We thus obtain:
$$P(\text{sum of 5}) = \frac{\text{number of outcomes summing to 5}}{\text{number of outcomes}} = \frac{4}{36} \approx 0.111$$
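The count in Example 10 can be verified by listing every ordered pair of rolls. This is a sketch using only the Python standard library; the variable names are ours.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
# Every ordered pair (first roll, second roll): 6 x 6 = 36 outcomes.
outcomes = list(product(faces, repeat=2))
# Keep the pairs whose sum is 5; (2, 3) and (3, 2) are counted separately.
favourable = [pair for pair in outcomes if sum(pair) == 5]

p_sum_5 = Fraction(len(favourable), len(outcomes))
print(len(outcomes), favourable, p_sum_5)
```

Listing the pairs explicitly makes it visible that order matters here.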
We have seen how to count the number of possible outcomes for an event. Sometimes the order of the outcomes is important, and sometimes it is not. When the order is important, we want to be able to find the number of ways to rearrange a certain number of items. For example, if we have the three letters a, b and c, we can rearrange them in 6 ways:
abc, acb, bac, bca, cba, cab
The mathematical process involved here is sampling without replacement. We start with three possible letters and we choose one. When we choose the second letter, there are only two choices left. And when we choose the third letter, there is only one choice left. To find the number of rearrangements of 3 objects, we thus multiply $3 \times 2 \times 1 = 6$. Likewise, for four objects, we would have $4 \times 3 \times 2 \times 1 = 24$. Thus, there are 24 possible rearrangements of four objects.
There is a general formula that gives us the number of ways to rearrange n items. The function that we use is called the factorial, denoted n!, and it is defined as follows:
$$n! = n \times (n-1) \times (n-2) \times \dots \times 3 \times 2 \times 1$$
Note that, by convention, $0! = 1$.
Example 11 In how many ways can we rearrange the ten digits, using each of them only once? The number of ways to rearrange them is:
$$10! = 10 \times 9 \times 8 \times 7 \times 6 \times 5 \times 4 \times 3 \times 2 \times 1 = 3{,}628{,}800$$
Example 12 How many signals can we send with flags of 6 different colors? (We assume that we use each color only once.)
$$6! = 6 \times 5 \times 4 \times 3 \times 2 \times 1 = 720$$
The number of rearrangements of items is also called the number of permutations. On the other hand, when order does not matter, we count the number of combinations. Consider our permutations of the letters a, b, c:
abc, acb, bac, bca, cba, cab
When the order does not matter, all of them are equivalent. Thus, there is only one possible combination of those three letters.
Now, we would like to find a general way to count the number of permutations and combinations. Suppose we have four flags of different colors (G, R, B, W). We may want to ask how many two-flag messages we can send (we use each color only once). Obviously, the number will differ depending on whether the order of the colors matters or not. Let us make a list:
Permutations: GR, RG, GB, BG, GW, WG, RB, BR, RW, WR, BW, WB
Combinations: GR, GB, GW, RB, RW, BW
As we see, there are 12 permutations and 6 combinations. Note: since we obtain the combinations from the permutations by ignoring order, the number of combinations is lower than the number of permutations. To find the number of permutations of 2 items out of 4, we simply multiply $4 \times 3 = 12$.
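The two lists of flag messages can be generated rather than written by hand. The sketch below uses `itertools.permutations` and `itertools.combinations` from the Python standard library; the variable names are ours.

```python
from itertools import combinations, permutations
from math import factorial

flags = ["G", "R", "B", "W"]

# Two-flag messages where order matters (GR and RG are different).
ordered = ["".join(p) for p in permutations(flags, 2)]
# Two-flag messages where order is ignored (GR and RG are the same).
unordered = ["".join(c) for c in combinations(flags, 2)]

print(len(ordered), ordered)
print(len(unordered), unordered)
# Factorials used in the text: 3!, 10! and the convention 0! = 1.
print(factorial(3), factorial(10), factorial(0))
```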
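Example 13 In how many ways can we pick 4 items out of 6, if the order matters? $6 \times 5 \times 4 \times 3 = 360$. Note that this question is equivalent to: In how many ways can we permute 4 items out of 6? Note also that
$$6 \times 5 \times 4 \times 3 = \frac{6 \times 5 \times 4 \times 3 \times 2 \times 1}{2 \times 1} = \frac{6!}{2!}$$
This gives the general formula for counting permutations:
$${}_nP_r = \frac{n!}{(n-r)!}$$
Here, ${}_nP_r$ means that we pick r elements out of n. The P stands for permutations.
Example 14 In how many ways can we arrange 6 flags out of 10 flags of different colors? (Assume we use each color just once.)
$${}_{10}P_6 = \frac{10!}{(10-6)!} = \frac{10!}{4!} = 151{,}200$$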
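As a quick check of the permutation formula, the sketch below computes n!/(n − r)! directly and compares it with `math.perm`, which is available in Python 3.8 and later. The function name `n_p_r` is our own.

```python
from math import factorial, perm

def n_p_r(n, r):
    """Number of ordered selections of r items out of n: n! / (n - r)!."""
    return factorial(n) // factorial(n - r)

# The hand-written formula agrees with the standard-library function.
for n, r in [(4, 2), (6, 4), (10, 6)]:
    assert n_p_r(n, r) == perm(n, r)
    print(n, r, n_p_r(n, r))
```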
As in the last examples, the numbers are often very big. This is why it is important to understand how to use the general formula. Now, as we have seen, the number of combinations is lower than the number of permutations, since we ignore the order. To get the number of combinations, we simply have to divide the number of permutations by the number of possible arrangements of the items we have chosen. For r items, this number is r!.
Example 15 In how many ways can we pick 4 items out of 6, if the order does not matter?
$$\frac{6 \times 5 \times 4 \times 3}{4!} = 15$$
Note also that
$$\frac{6 \times 5 \times 4 \times 3}{4!} = \frac{6!}{4!\,2!}$$
We can now understand the general formula for counting combinations:
$${}_nC_r = \frac{n!}{r!\,(n-r)!}$$
Here, ${}_nC_r$ means that we combine r elements out of n. The C stands for combinations. With these formulas, we are already able to compute the probability of more complex events, such as winning the Lotto 6/49 or having four of a kind in a poker hand.
Example 16 What are the chances of winning Lotto 6/49?
$$P(\text{win}) = \frac{\#\text{ winning combinations}}{\#\text{ possible combinations}} = \frac{1}{{}_{49}C_6} = \frac{1}{\frac{49!}{6!\,43!}}$$
It is approximately one chance in 14,000,000.
Example 17 What are the chances of having four of a kind in a poker hand?
$$P(\text{4 of a kind}) = \frac{\#\text{ four of a kind}}{\#\text{ possible hands}} = \frac{13 \times 48}{{}_{52}C_5} = \frac{624}{2{,}598{,}960} \approx 0.00024$$
Note that it is essential to multiply by 48, since four queens and a 3 of spades is not the same hand as four queens and a 4 of diamonds. As you see, the probability is very low!
Example 18 Suppose we flip a coin five times. What is the probability of having exactly 4 heads? To find the answer, we need the number of ways to have 4 heads in 5 flips (which is ${}_5C_4$) and the number of possible outcomes for five flips (which is $2^5 = 32$):
$$P(\text{4 heads}) = \frac{\#\text{ of ways to have 4 heads}}{\#\text{ of possible outcomes}} = \frac{{}_5C_4}{2^5} = \frac{5}{32} \approx 0.156$$
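The numbers in Examples 16 to 18 are easy to recompute. The sketch below relies on `math.comb` (Python 3.8 and later) for the combinations; the variable names are ours.

```python
from fractions import Fraction
from math import comb

# Example 16: one winning combination among all 6-number draws out of 49.
lotto_combinations = comb(49, 6)
p_lotto = Fraction(1, lotto_combinations)

# Example 17: 13 ranks for the four of a kind, 48 choices for the fifth card.
p_four_of_a_kind = Fraction(13 * 48, comb(52, 5))

# Example 18: ways to place 4 heads among 5 flips, out of 2**5 outcomes.
p_four_heads = Fraction(comb(5, 4), 2 ** 5)

print(lotto_combinations)
print(round(float(p_four_of_a_kind), 5))
print(p_four_heads, round(float(p_four_heads), 3))
```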
A joint occurrence is one in which several events happen together. If we want to find the probability of a joint occurrence of events, we may simply multiply their probabilities, provided that they are independent. By independent, we mean that they do not influence one another.
Example 19 What is the probability of obtaining two sixes when rolling two fair dice?
$$P(\text{having two sixes}) = P(\text{rolling 6}) \times P(\text{rolling 6}) = \frac{1}{6} \times \frac{1}{6} = \frac{1}{36} \approx 0.028$$
In general, if two events denoted by A and B are independent events, we may find their joint probability by means of this formula:
$$P(A \,\&\, B) = P(A) \times P(B)$$
We are often interested in finding the probability of another sort of complex event, namely the probability of alternative occurrences. This arises when we want to know the probability that one thing, or another thing, happens.
Example 20 What is the probability of rolling a 1 or a 2 on a fair die?
$$P(1 \lor 2) = P(1) + P(2) = \frac{1}{6} + \frac{1}{6} = \frac{1}{3} \approx 0.333$$
We may simply add up the probabilities of 1 and 2, since they are mutually exclusive.
Let us explain what mutual exclusion means. We say that two events are mutually exclusive when they cannot both occur at the same time. In our last example, rolling a 1 and rolling a 2 are mutually exclusive, because we cannot roll a 1 and a 2 on the same roll. (Mutual exclusion should not be confused with independence: if one of two mutually exclusive events occurs, the other certainly does not.) Generally speaking, if we have two mutually exclusive events A and B, we can find their alternative probability by means of the formula
$$P(A \lor B) = P(A) + P(B)$$
Example 21 An urn contains 20 red balls and 10 blue balls. What is the probability that, when we pick two balls without replacement, we obtain exactly one red and one blue? Obviously, there are two ways of obtaining this outcome, i.e., R&B and B&R. The order is important, since we pick balls without replacement. Now, these two ways of obtaining exactly one red and one blue ball are just alternative events. Thus,
$$P(\text{1 red \& 1 blue}) = P(R \,\&\, B) + P(B \,\&\, R) = \frac{20}{30} \times \frac{10}{29} + \frac{10}{30} \times \frac{20}{29} = \frac{400}{870} \approx 0.460$$
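Example 21 can be checked by brute force: we list every ordered pair of distinct balls and count those with one ball of each color. Each of the two terms in the sum equals 200/870, so the total is 400/870, about 0.460. This is only a sketch; the variable names are ours.

```python
from fractions import Fraction
from itertools import permutations

urn = ["R"] * 20 + ["B"] * 10

# Ordered pairs of distinct ball positions: 30 x 29 = 870 equally likely draws.
draws = list(permutations(range(len(urn)), 2))
# Draws in which the two balls have different colors (R&B or B&R).
mixed = [d for d in draws if urn[d[0]] != urn[d[1]]]
p_enumerated = Fraction(len(mixed), len(draws))

# The same probability from the addition rule for the two alternative orders.
p_formula = Fraction(20, 30) * Fraction(10, 29) + Fraction(10, 30) * Fraction(20, 29)

print(len(draws), len(mixed), p_enumerated, round(float(p_enumerated), 3))
```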
Now, consider two events, such as drawing a diamond from a deck, and drawing an ace. Clearly, we cannot find the probability of their alternative occurrence by simply adding their probabilities together, since we would count the probability of drawing the ace of diamonds twice. Accordingly, when two events A and B are not mutually exclusive, we find their alternative probability by means of the formula:
$$P(A \lor B) = P(A) + P(B) - P(A \,\&\, B)$$
A special case of alternative and joint occurrences is that in which the complex event we are interested in is described with phrases such as "at least" or "at most". In this case, we almost always want to use the formula
$$P(E \text{ happens}) = 1 - P(E \text{ does not happen})$$
Thus, instead of finding the probability that something happens directly, we find the probability that it does not happen and subtract it from one.
Example 22 What is the probability of having at least one tail if we flip a fair coin five times?
$$P(\text{at least one tail}) = 1 - P(\text{5 heads}) = 1 - \left(\frac{1}{2}\right)^5 \approx 0.969$$
Example 23 What is the probability of rolling at least one six with two fair dice?
$$P(\text{at least one 6}) = 1 - P(\text{zero 6}) = 1 - \left(\frac{5}{6}\right)^2 \approx 0.306$$
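Both rules can be confirmed by direct counting. The sketch below builds a 52-card deck to check the formula for events that are not mutually exclusive, then lists all rolls of two dice to check the "at least one" shortcut. The rank and suit labels are our own choice.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["diamonds", "hearts", "clubs", "spades"]
deck = list(product(ranks, suits))   # 52 (rank, suit) pairs

# Diamond or ace, counted directly: the ace of diamonds appears only once.
diamond_or_ace = [card for card in deck if card[1] == "diamonds" or card[0] == "A"]
p_direct = Fraction(len(diamond_or_ace), len(deck))
# The same value from P(A) + P(B) - P(A & B).
p_rule = Fraction(13, 52) + Fraction(4, 52) - Fraction(1, 52)

# At least one six with two dice, counted directly and via the complement.
rolls = list(product(range(1, 7), repeat=2))
p_counted = Fraction(sum(1 for roll in rolls if 6 in roll), len(rolls))
p_complement = 1 - Fraction(5, 6) ** 2

print(p_direct, p_rule)
print(p_counted, p_complement, round(float(p_counted), 3))
```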
Conditional Probability
Conditional probability is very important for cases in which events are not independent. In such cases, the probability of an event depends on the occurrence of some other events. This is why, in such cases, we talk about the probability of an event, given the occurrence of some other event. The mathematical symbol that we use to express "given" is |. So, the probability of having rolled a 6 on a fair die, given that we have rolled an even number, is expressed as follows:
$$P(\text{rolling a 6} \mid \text{rolling an even})$$
In general, for two events A and B, we say that the probability of A given B is $P(A \mid B)$. Note that if A and B are independent, asking "what is the probability of A" and asking "what is the probability of A given B" will give the same answer (i.e., if A and B are independent, $P(A) = P(A \mid B)$). But in most cases, A and B are not independent. In this case, we find the conditional probability by means of the following formula:
$$P(A \mid B) = \frac{P(A \,\&\, B)}{P(B)}$$
Example 24 An urn contains 100 balls. 20 of them are red. What is the probability of picking two red balls, if we pick without replacement?
$$P(\text{2 reds}) = P(\text{red on first pick}) \times P(\text{red on second pick} \mid \text{red on first pick}) = \frac{20}{100} \times \frac{19}{99} = \frac{19}{495} \approx 0.038$$
Example 25 An urn contains 100 balls. 20 of them are red. What is the probability of picking two red balls, if we pick with replacement?
$$P(\text{2 reds}) = P(\text{red}) \times P(\text{red}) = \left(\frac{20}{100}\right)^2 = 0.04$$
Example 26 What is the probability of drawing two queens from a normal 52-card deck a) with replacement and b) without replacement?
With replacement:
$$P(\text{two queens}) = \left(\frac{4}{52}\right)^2 \approx 0.006$$
Without replacement:
$$P(\text{two queens}) = P(\text{Q on 1st draw}) \times P(\text{Q on 2nd draw} \mid \text{Q on 1st draw}) = \frac{4}{52} \times \frac{3}{51} \approx 0.0045$$
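Examples 24 to 26 differ only in whether the first item is put back before the second pick. A small helper makes the contrast explicit; the name `p_two_in_a_row` is hypothetical and is introduced here only for illustration.

```python
from fractions import Fraction

def p_two_in_a_row(favourable, total, replace):
    """Probability that two successive picks are both favourable."""
    first = Fraction(favourable, total)
    if replace:
        # Independent picks: the second pick faces the same urn or deck.
        second = first
    else:
        # Conditional probability: one favourable item is already gone.
        second = Fraction(favourable - 1, total - 1)
    return first * second

print(p_two_in_a_row(20, 100, replace=False))   # Example 24
print(p_two_in_a_row(20, 100, replace=True))    # Example 25
print(p_two_in_a_row(4, 52, replace=True))      # Example 26 a)
print(p_two_in_a_row(4, 52, replace=False))     # Example 26 b)
```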
Binomial Distribution
The binomial distribution is a mathematical construction that allows us to find the probability that an event will occur a given number of times in a series of trials. Let us try to see how this construction works. As we have seen, the probability P(E) of an event E is always between 0 and 1. In cases where we apply the binomial distribution, different occurrences of event E are always independent. Let us consider an example.
Example 27 There are 100 balls in an urn. 40 of them are blue. What is the probability of drawing exactly 3 blue balls in 5 picks (with replacement, so that the picks are independent)? We need to have 3 blue balls and 2 balls of a different color. Also, we must consider that there are many different ways to obtain 3 blue balls and 2 balls of a different color. Actually, we may verify that this number is the number of combinations ${}_5C_3 = 10$. The probability of picking exactly 3 blue balls in 5 picks is thus:
$$P(\text{3 blue in 5 picks}) = {}_5C_3 \left(\frac{40}{100}\right)^3 \left(\frac{60}{100}\right)^2 \approx 0.230$$
In general, when the probability of an event is p, the probability of failure is $1 - p$ and the probability of r successes in n trials is
$${}_nC_r \, p^r (1-p)^{n-r}.$$
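The general formula translates directly into a short function. This is a sketch assuming independent trials with the same success probability p on each trial; the name `binomial_probability` is ours. It is applied to the urn with 40 blue balls out of 100, so p = 0.4.

```python
from math import comb

def binomial_probability(n, r, p):
    """Probability of exactly r successes in n independent trials."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

# Exactly 3 blue balls in 5 picks when each pick is blue with probability 0.4.
p_three_blue = binomial_probability(5, 3, 0.4)
# The probabilities of 0, 1, ..., 5 blue balls must add up to 1.
total = sum(binomial_probability(5, r, 0.4) for r in range(6))

print(round(p_three_blue, 4), round(total, 10))
```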
Sampling
In the last example, we knew the probability of picking a blue ball. From our knowledge of this number, we asked: What is the probability of drawing exactly 3 blue balls in 5 picks? With sampling problems, the situation is reversed. For example, we would start with the fact (or data) that we have picked exactly three blue balls in 5 picks, and we would ask: What is the probability of picking a blue ball from this urn? Clearly, our answer will contain some uncertainty. Let us call the items that we look at samples and let us call their number the sample size. When our sample size is small, our uncertainty is large. When our sample size is large, our uncertainty is small. Probability theory is the science that allows us to state our degree of uncertainty precisely.
To understand how we quantify our uncertainty, let us consider an example. Suppose we have an urn with 1000 balls. We know that they are either red or blue. We want to know the real probability of picking a red ball (or, alternatively, the real proportion of red balls). In sampling problems, we always start with a set of data. In this example, we decide to pick 100 balls in order to obtain data. We happen to get 40 red balls and 60 blue balls. On the basis of the data obtained, what is our best guess regarding the number of red balls in the urn? Well, since 40% of the balls sampled are red, our best guess is that 400 out of 1000 balls are red. But can we be sure, on the basis of this sample, that there are exactly 400 red balls in the urn? No. It is an inductive inference and, as such, the truth of the conclusion is not guaranteed. Thus, even if 400 is our best guess, 399 would not be a bad guess either. Now, the question is: how far from 400 can we go, and still be making a good guess? The answer will depend on the confidence interval. Before giving a definite answer, let us generalize our reasoning.
In the last paragraph, we asked what number is our best guess. But more generally, we may ask how confident we can be that the real proportion is a given number, plus or minus another number. We could ask, for example, how confident we are that the number of red balls in the urn is between 350 and 450; that can be written mathematically as follows:
$$P(350 < \#\text{ red} < 450)$$
The interval going from 350 to 450 is called the margin of error; it can be written $400 \pm 50$. Given a margin of error, we find the probability that the real proportion is within the margin of error by simply adding up the alternative probabilities. In our example, we would add the probability that there are 350 red balls in the urn, given that we sampled 40 reds out of a hundred, plus the probability that there are 351 red balls in the urn, given that we sampled 40 reds out of a hundred, plus the probability that there are 352 red balls in the urn, given that we sampled 40 reds out of a hundred, and so on, until we reach 450. Obviously, this takes a very long time to calculate. To make our lives a little easier, we can use charts that, basically, contain the answers that we are looking for.
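The long sum described above is tedious by hand but quick for a computer. The notes do not say how each individual probability is obtained, so the sketch below rests on two assumptions of ours: the 100 balls were sampled without replacement, and before sampling every possible number of red balls (0 to 1000) was considered equally likely. Under other assumptions the result would differ.

```python
from fractions import Fraction
from math import comb

URN_SIZE, SAMPLE_SIZE, REDS_SEEN = 1000, 100, 40

def weight(reds_in_urn):
    """Number of 100-ball samples containing exactly 40 reds
    when the urn holds `reds_in_urn` red balls (0 if impossible)."""
    blues_in_urn = URN_SIZE - reds_in_urn
    return comb(reds_in_urn, REDS_SEEN) * comb(blues_in_urn, SAMPLE_SIZE - REDS_SEEN)

# ASSUMPTION: all urn compositions are equally likely beforehand, so the
# probability of each composition given the data is proportional to its weight.
weights = {reds: weight(reds) for reds in range(URN_SIZE + 1)}
total = sum(weights.values())

# Add up the alternative probabilities strictly between 350 and 450 reds.
inside = sum(w for reds, w in weights.items() if 350 < reds < 450)
p_inside = Fraction(inside, total)

print(round(float(p_inside), 3))
```

The most heavily weighted composition is the one closest to the sampled proportion, which matches the "best guess" reasoning in the text.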
However, before using the chart, we must also understand the notion of confidence interval. Consider this expression:
$$P(350 < \#\text{ red} < 450) = 0.95$$
The meaning of such an expression is as follows: We can affirm with a certainty of 95% that the number of red balls in the urn is between 350 and 450, given that we have sampled 40 red balls out of a hundred. [1] We can get an intuitive idea of what this means by having a look at figure 2. The grey area contains 95% of the total area under the curve. In our example, the lower limit would be 350, and the upper limit 450. Thus, when we say that the number of red balls in the urn is between 350 and 450, 19 times out of 20, we are really just saying that the ratio of the grey area under the curve to the white area under the curve is 19:1.
[1] Note that 0.95 is not the correct number here. I use it only for the sake of familiarity.
The chart we will use for our exercises is the following:
              Confidence Level
Sample Size   0.67    0.95    0.99
10            .15     .3      .45
100           .05     .1      .15
1000          .015    .03     .045
Figure 3: Confidence Interval Chart
In relation to sampling, the following notions are important:
Gambler's fallacy
Simpson's paradox
Regression fallacy
They are all explained in detail in the notes and in Kenyon. Be sure to understand them carefully.
12. How must sample size change to increase the level of confidence for a fixed margin of error? (Use Fig. 3) The best way to see the answer from the chart is to find two entries with the same number. We have 0.15 that appears twice. If our margin of error is 0.15, and our sample size is 10, then our level of confidence is 0.67. Moreover, if our margin of error is 0.15, and our sample size is 100, then our level of confidence is 0.99. Thus, to increase our level of confidence, the sample size must increase.
13. What is the probability of randomly drawing a king or a queen or a diamond on a single draw from a standard deck of 52 playing cards? Since the word "or" is used, we know that we have to find the probability of alternative events. The probability of the three alternative events will thus be the sum of their individual probabilities. However, we know that the three events are not mutually exclusive, since there is a king of diamonds and a queen of diamonds. We will thus have to subtract the probability of the king of diamonds and of the queen of diamonds, since their probability will have been included twice in the sum. Thus:
$$P(K \lor Q \lor \diamondsuit) = P(K) + P(Q) + P(\diamondsuit) - P(K \,\&\, \diamondsuit) - P(Q \,\&\, \diamondsuit) = \frac{4}{52} + \frac{4}{52} + \frac{13}{52} - \frac{1}{52} - \frac{1}{52} \approx 0.365$$
14. A jar contains 80 red marbles and 120 blue marbles. What are the chances of drawing no less than 2 red marbles in 10 random draws with replacement? Since the marbles are drawn with replacement, we know that the draws are independent of one another. We can thus use the binomial formula, with a probability of red on each draw of 80/200, since the jar holds 200 marbles in all. Also, since the problem contains the phrase "no less than", we know that it will be simpler to solve if we find the probability that the desired event does not happen, and subtract it from 1. That will yield the same answer, but with simpler calculations. Thus, P(no less than 2 reds in 10 draws) is equal to
$$1 - P(\text{less than 2 reds in 10 draws}) = 1 - P(\text{0 or 1 red in 10 draws}) = 1 - \left[P(\text{0 red in 10 draws}) + P(\text{1 red in 10 draws})\right]$$
$$= 1 - {}_{10}C_0 \left(\frac{80}{200}\right)^0 \left(1 - \frac{80}{200}\right)^{10} - {}_{10}C_1 \left(\frac{80}{200}\right)^1 \left(1 - \frac{80}{200}\right)^9 = 1 - \left(\frac{3}{5}\right)^{10} - 10 \left(\frac{2}{5}\right) \left(\frac{3}{5}\right)^9 \approx 0.954$$
15 a). Using the chart (Fig 3), what sample size is required to be 67% certain that the observed frequency is within 5% (plus or minus) of the actual frequency in the population? The answer is 100. To find it, we simply identify the row that contains 0.05 in the column corresponding to a confidence level of 0.67.
15 b). Use the chart (Fig 3). Assuming there are 30,000 students at UWO, and 1,000 are surveyed at random, how confident can we be that at least 20,100 are from Ontario if 70% of those surveyed said they were from Ontario? We have a sample size equal to 1,000. Now, note that 20,100 / 30,000 = 0.67. Since 0.70 − 0.67 = 0.03, our margin of error is 0.03. By using the chart, we thus know that our confidence level is 0.95.
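The chart lookups in exercises 12 and 15 can be written as a small table and two search functions. The values are those of Figure 3; the names `CHART`, `sample_size_for` and `confidence_for` are hypothetical, chosen only for this sketch.

```python
# Figure 3 as a lookup table: CHART[sample_size][confidence_level] = margin of error.
CHART = {
    10: {0.67: 0.15, 0.95: 0.30, 0.99: 0.45},
    100: {0.67: 0.05, 0.95: 0.10, 0.99: 0.15},
    1000: {0.67: 0.015, 0.95: 0.03, 0.99: 0.045},
}

def sample_size_for(confidence, margin):
    """Smallest charted sample size whose margin of error is within `margin`."""
    for size in sorted(CHART):
        if CHART[size][confidence] <= margin:
            return size
    return None  # the chart has no sample size that is large enough

def confidence_for(sample_size, margin):
    """Highest charted confidence level whose margin of error is within `margin`."""
    best = None
    for level, charted_margin in sorted(CHART[sample_size].items()):
        if charted_margin <= margin:
            best = level
    return best

print(sample_size_for(0.67, 0.05))   # exercise 15 a)
print(confidence_for(1000, 0.03))    # exercise 15 b)
print(confidence_for(10, 0.15), confidence_for(100, 0.15))   # exercise 12
```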
Example 28 An urn contains 100 balls, 75 of which are red, 15 blue, 5 white, and 5 green. What is the probability of randomly drawing at least one red ball if two balls are chosen without replacement?
$$P(\#\text{Red} \ge 1) = 1 - P(\#\text{Red} = 0) = 1 - \frac{25}{100} \times \frac{24}{99} = \frac{31}{33} \approx 0.939 = 93.9\%$$
Example 29 Suppose that a chess player has a 90% chance of winning any game in a competition.
a) What is the probability that (s)he wins exactly 3 games out of a total of 5 games played?
$$P(\#\text{Win} = 3) = {}_nC_r \, p^r (1-p)^{n-r} = \frac{5!}{3!\,(5-3)!} (0.90)^3 (0.10)^2 = 0.0729 = 7.29\%$$
b) Now, suppose that the 5-game match ends as soon as a player wins three games. What is the probability that (s)he wins the match (i.e. 3 games) in exactly 5 games?
$$P(\text{win in 5}) = P(\text{win 2 out of 4 \& win the fifth}) = \frac{4!}{2!\,(4-2)!} (0.9)^2 (0.10)^2 \times 0.9 = 0.04374 \approx 4.4\%$$
c) What is the probability that (s)he wins in exactly three games?
$$P(\text{3 out of 3}) = 0.9^3 = 0.729 = 72.9\%$$
Example 30 How many dice do you have to roll to have at least a 45% chance of getting one or more 6s?
For 1 die:
$$P(\#6 \ge 1) = P(\#6 = 1) = \frac{1}{6} \approx 16.7\%$$
For 2 dice:
$$P(\#6 \ge 1) = 1 - P(\#6 = 0) = 1 - \frac{5}{6} \times \frac{5}{6} \approx 0.3056 \approx 30.6\%$$
For 3 dice:
$$P(\#6 \ge 1) = 1 - P(\#6 = 0) = 1 - \left(\frac{5}{6}\right)^3 \approx 0.421 = 42.1\%$$
For 4 dice:
$$P(\#6 \ge 1) = 1 - P(\#6 = 0) = 1 - \left(\frac{5}{6}\right)^4 \approx 0.518 = 51.8\%$$
Thus, the answer is 4.
Example 31 Suppose that in our tutorial section there are 40 students. 20 are fans of the Maple Leafs, 8 are fans of the Canadiens, 10 don't care at all, and 2 are fans of the Nordiques. All of us, of course, are decent enough not to be fans of the Senators. However, 1 person is a fan of both the Nordiques and the Maple Leafs (anything but the Canadiens!). What is the probability that a randomly selected person (in our section) is a fan of a team with blue jerseys? The teams with blue jerseys (that have some fans in our section) are the Maple Leafs and the Nordiques. So,
$$P(M \lor N) = P(M) + P(N) - P(M \,\&\, N) = \frac{20}{40} + \frac{2}{40} - \frac{1}{40} = \frac{21}{40} = 0.525 = 52.5\%$$
Example 32 Consider the following chart for margin of error:
              Confidence Level
Sample Size   0.67    0.95    0.99
10            .15     .3      .45
100           .05     .1      .15
1000          .015    .03     .045
Suppose a random sample of 100 students is taken and 60 prefer to drink beer over water.
a) What is the expected percentage of students who prefer to drink beer? 60%
b) How confident can we be that more than 50% of the students prefer to drink beer over water? 0.95
*Appendix: Bayes' Theorem and Applications
$$P(A \mid B) = \frac{P(A)\,P(B \mid A)}{P(A)\,P(B \mid A) + P(\neg A)\,P(B \mid \neg A)}$$
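As an illustration of Bayes' theorem in its standard form, P(A|B) = P(A)P(B|A) / [P(A)P(B|A) + P(not A)P(B|not A)], the sketch below reuses the die from the section on conditional probability: A is "rolling a 6" and B is "rolling an even number". The function name `bayes` and the choice of this example are ours, not the notes'.

```python
from fractions import Fraction

def bayes(p_a, p_b_given_a, p_b_given_not_a):
    """P(A | B) from P(A), P(B | A) and P(B | not A)."""
    numerator = p_a * p_b_given_a
    denominator = numerator + (1 - p_a) * p_b_given_not_a
    return numerator / denominator

p_six = Fraction(1, 6)                # P(A): rolling a 6
p_even_given_six = Fraction(1)        # P(B | A): a 6 is always even
p_even_given_not_six = Fraction(2, 5) # P(B | not A): 2 and 4 among 1, 2, 3, 4, 5

p_six_given_even = bayes(p_six, p_even_given_six, p_even_given_not_six)
print(p_six_given_even)
```

The result agrees with the direct count: among the three even faces, one is a 6.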