
Elementary Statistics 3

This document provides an introduction to descriptive statistics and key terminology. It discusses how statistics can be used to justify opposing views and how statistics makes probabilistic rather than deterministic statements. Descriptive statistics involves collecting and summarizing data through tables, graphs, and computations of averages. Key terms introduced include population, sample, variable, parameter, statistic, continuous variable, discrete variable, raw data, and frequency distribution.

Uploaded by

alecksander2005
Copyright
© Attribution Non-Commercial (BY-NC)

Lesson 1: The Language and Terminology

Introduction
1.1 Definitions & Concepts (Homework 1 - 3)
1.2 Frequency Distribution
1.3 Pictorial Representation of Data

Introduction

Most people think of statistics as the study of the numerical features of a subject or population. Statisticians share that view, but they also emphasize the methods of collecting data, summarizing and presenting data, and drawing inferences from data. We all see on TV how political pundits justify opposing points of view by presenting statistics from respectable sources. How can something be a science when it justifies two opposing points of view? The answer is that statistics has a scientific basis, but it can be misrepresented in use.

Example. During the saga of President Clinton's impeachment, we observed the following:

1. One pundit says that, according to statistics, the majority of Americans think that character matters.
2. The other pundit says, also according to statistics, that the majority of Americans think the president is doing a good job.

The implication here is that one of them was "wrong." But the science of statistics says that both were correct. Data was collected and analyzed, and it showed both that the majority of Americans think character matters and that the majority of Americans think the president is doing a good job. It does not matter to the science of statistics which of these statistically established facts you or I want to believe.

Another point about the nature of statistics as a science is that it is not a deterministic science. It does not have laws like "force equals mass times acceleration." Statements in statistics come with a probability (i.e., a quantified chance) of being correct. When a weatherman says that it will rain today, he means that there is, say, a ninety-five percent chance that it will rain today. Roughly, this means that if he makes the same prediction one hundred times, he will be correct 95 times, and it will not rain on the other 5 days. The problem is that a weatherman sometimes hides the fact that there is only a 95 percent chance. Such information hiding is sometimes done for simplicity.

Before I conclude this introduction, let me tell you an interesting anecdote about the development of this subject. When the government of India was considering the proposal to establish the Indian Statistical Institute in Calcutta in the early part of the last century, some critics asked, "Then why not an institute of astrology?" At the inception of statistics as a science there was a lot of skepticism about its scientific validity. Those days are gone, and statistics is no longer likened to astrology! Statistics is a well-founded and precise science. It is nondeterministic in nature: it makes only precise probabilistic statements.

In this course we will be talking about two branches of statistics. The first, called descriptive statistics, deals with methods of processing, summarizing, and presenting data.
The other part deals with the scientific methods of drawing inferences and forecasting from the data, and is called inferential or inductive statistics. In the rest of this lesson and the next we deal with descriptive statistics, which includes the presentation of data in the form of tables, graphs, and computations of various averages of data.

1.1 Basic Definitions and Concepts

In statistics we use a small representative "sample" to study a big "population." The reason for this is the cost, or even the impossibility, of studying the whole population.

Population and Sample

Definitions. A complete collection of data on the group under study is called the population or the universe. A member of the population is called a sampling unit. Therefore, the population consists of all its sampling units. A sample is a collection of sampling units selected from the population.

Most often, we will work with numerical characteristics (like height, weight, and salary) of a group. So usually the population is a large collection of numbers and the sample is a small subset of the population.

Example. Suppose we are studying the daily rainfall in Lawrence. Since daily rainfall can be 0 inches or any amount above 0, the population here is all nonnegative numbers (i.e., the interval [0, ∞)). A sample from this population would be the observed amount of daily rainfall in Lawrence on some number of days; for instance, a sample of size 11 would be the observed daily rainfall in Lawrence on 11 days.

Variables

Many definitions of variables are available in standard textbooks. For our purpose the following definition will suffice.

Definition. A variable is a rule or a formula or a mechanism that associates a value with each member of the population. So, given a member w, a variable X assigns a value X(w) to w. For us, X(w) will be a characteristic (like height, weight, time, salary) of the population member.

Example. Suppose we are studying the KU student population. The population is the whole collection of KU students, and a KU student is a sampling unit. If GPA is the "characteristic" that we are studying, then X = the GPA of a student is a variable. So, given a student, X has a value. For example:
X(Donald Smith) = 3.25, X(Karen Currie) = 3.89, X(Sam Donaldson) = 3.11, X(King Who) = 2.13

On the other hand, if GENDER is the "characteristic" that we are studying, then Y = gender of a student is a variable. So, given a student, Y has a value. For example:
Y(Donald Smith) = Male, Y(Karen Currie) = Female, Y(Sam Donaldson) = Male, Y(King Who) = Male

If HEIGHT is the characteristic that we are studying, then Z = height of a student is a variable. To give another example, if credit hours completed is the characteristic studied, then T = the number of course credit hours completed so far by a student is a variable. Similarly, given any other characteristic, like weight, annual income, or annual expenditure, you can construct a variable for this population.

A variable that takes numerical values is called a quantitative variable. So, the variables X, Z, and T above are quantitative variables, while Y is not. A variable that takes non-numerical values is called a qualitative variable. So, the variable Y above is a qualitative variable. We will mostly be concerned with quantitative variables.

We discuss two types of quantitative variables: continuous and discrete variables. A quantitative variable that can assume any numerical value over an interval is called a continuous variable. Since Z above can (hypothetically) assume any value between 0 and 100 inches, Z is a continuous variable. A quantitative variable is called a discrete variable if there are gaps (breaks) between its successive possible values. T assumes only integer values and is therefore not a continuous variable; T is a discrete variable. A different way to understand a discrete variable is that its possible values can be written down (or counted) in a finite or infinite list. We say that the values of a discrete variable are countable. If a variable assumes only a finite number of values, then it is also called a finite variable; otherwise the variable is called an infinite variable. A finite variable is always a discrete variable.

Examples of Continuous and Discrete Variables
1. Examples of continuous variables are weight, length, volume, area, and time.
2. For this course, examples of discrete variables are always the number of something: number of typos, number of road accidents, number of phone calls.
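Since a variable is a rule that assigns a value to each member of the population, it can be pictured as a lookup table. The sketch below is a minimal illustration, assuming Python dictionaries stand in for the rule; the four students and their values are those from the GPA and GENDER examples above, and the helper `is_quantitative` is hypothetical, written only for this illustration.

```python
# A variable assigns a value to each member of the population.
# Here a dict plays the role of that rule (an illustrative assumption).
X = {  # X(w) = GPA of student w (quantitative)
    "Donald Smith": 3.25,
    "Karen Currie": 3.89,
    "Sam Donaldson": 3.11,
    "King Who": 2.13,
}
Y = {  # Y(w) = gender of student w (qualitative)
    "Donald Smith": "Male",
    "Karen Currie": "Female",
    "Sam Donaldson": "Male",
    "King Who": "Male",
}


def is_quantitative(variable: dict) -> bool:
    """Hypothetical helper: True if every value the variable assigns is numeric."""
    return all(isinstance(v, (int, float)) for v in variable.values())


print(X["Karen Currie"])   # the value X(Karen Currie)
print(is_quantitative(X))  # GPA takes numerical values
print(is_quantitative(Y))  # gender takes non-numerical values
```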

Parameters and Statistics

Definition 1. Given a set of data, any numerical value computed from the data using a formula or a rule is called a quantitative measure of the data.

Definition 2. A quantitative measure of population data is called a parameter. In other words, parameters belong to the whole population and are computed (if feasible) from the WHOLE population data. Examples: the average GPA of all KU students, the height of the tallest KU student, the average income of the entire KU student population. One way to study a population is to know some of its parameters. Unfortunately, computing such parameters could be expensive or even impossible. Essentially, parameters are unknown, and the main game of statistics is to estimate parameters on the basis of small samples collected from the population.

Definition 3. A quantitative measure of sample data is called a statistic. So, any constant that we compute from a sample is a statistic. We use these statistics to estimate the parameters of the population. For example, the average height computed from a sample is a reasonable estimate of the (parameter) average height of the KU student population. Obviously, we do not expect the value of the statistic to be exactly equal to the parameter value. Hopefully, the error will be small, or will exceed our tolerable limit only very rarely (say, once in 100 trials).

Why do we need a statistic? Sometimes it is impossible to know the actual value of a parameter. For example, let μ be the mean length of life of the light bulbs produced by a company. The company cannot test all the bulbs it produces to find this mean. So, the best it can do is to test a few bulbs, compute the sample mean length of life of these bulbs (a statistic), and use it as an estimate of the mean length of life (the parameter μ) of all the bulbs it produces.

Definition 4.
The data that has not been processed or organized in any form is called raw data. When the data is arranged in increasing or decreasing order, it is called an array. The range of the data is the difference between the largest and the smallest value of the data:

range = highest value - lowest value.

1.2 Frequency Distribution

In this section we talk about the representation of data organized in tabular form. Such a representation is called a frequency distribution. We are mostly concerned with numerical data (i.e., quantitative data), but we also consider some non-numerical data (i.e., qualitative data).

Example. (from Khazanie, p. 18) The following is data on the blood group of 36 patients in a hospital:

O A B O A B A A O O
O A O A B A A
O O O AB A O AB O O
O O A B A
O A A
O A

We have four types of blood groups, namely, O, A, B, and AB. Each of these blood groups may be referred to as a "class." The frequency of a class is defined as the number of data members that belong to that class. For example, the frequency of the class O is 16; the frequency of the class A is 14. A table that lists the classes and the corresponding frequencies is called the frequency distribution of this qualitative data. Following is the frequency distribution of this data:

Blood Group   Frequency
O             16
A             14
B             4
AB            2
Total         36
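Counting class frequencies by hand is error-prone, so here is a minimal sketch of the same count, assuming Python's standard `collections.Counter` as the tallying tool (the notes themselves count by hand or with a calculator). The blood groups are entered in the order listed in the example.

```python
from collections import Counter

# Blood groups of the 36 patients, in the order listed in the example.
blood = (
    "O A B O A B A A O O "
    "O A O A B A A "
    "O O O AB A O AB O O "
    "O O A B A "
    "O A A "
    "O A"
).split()

freq = Counter(blood)  # class -> frequency

for group in ["O", "A", "B", "AB"]:
    print(group, freq[group])
print("Total", sum(freq.values()))
```

The printed counts should match the frequency distribution table above.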

Ungrouped Data

For quantitative data, we consider two types of frequency table. When we are working with a large set of data, we group the data into a few classes and construct a "frequency table," which we will discuss later. If the data set is small, or if the number of distinct values that appear in the data is small, we need not group the data. Instead, we make a list of all the data members and give the corresponding frequency of each data member in a table. The number of times a data member (i.e., value) appears in the data is called the frequency of the data member. A list that presents the data members and the corresponding frequencies in tabular form is called a frequency table or frequency distribution. The relative frequency and percentage frequency of a data member x are defined as follows:

$$\text{relative frequency of } x = \frac{\text{frequency of } x}{\text{total \# of data points}}$$

$$\text{percentage frequency of } x = \frac{\text{frequency of } x}{\text{total \# of data points}} \times 100$$

The frequency table may also contain the relative and percentage frequencies. Since we did not group the data into a few classes, we call this the frequency distribution of the ungrouped data.

Example 1.2.1 To estimate the mean time taken by a race car to complete a three-mile drive, the race car did several time trials, and the following sample of times (in seconds) taken to complete the laps was collected:

50 51 55 48 53 48 49 50 51 46 49 50 54 48 52 53 54 49 52 53 51 51 51 53 47 52 55 56 54 54 52 54 50 51 53
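The tally for this sample can be checked with a short script. This is a minimal sketch, assuming Python's `collections.Counter` as the counting tool (the course itself uses a TI-83). It also forms the array and the range from Definition 4 in Section 1.1, and each printed row (value, frequency, relative frequency, percentage frequency) should agree with the frequency table for this example.

```python
from collections import Counter

# Lap times (seconds) from Example 1.2.1.
times = [50, 51, 55, 48, 53, 48, 49, 50, 51, 46, 49, 50, 54, 48, 52, 53, 54, 49,
         52, 53, 51, 51, 51, 53, 47, 52, 55, 56, 54, 54, 52, 54, 50, 51, 53]

n = len(times)                     # size of the sample
array = sorted(times)              # the data arranged as an array
data_range = array[-1] - array[0]  # range = highest value - lowest value
freq = Counter(times)

print("n =", n, " range =", data_range)
for x in sorted(freq):
    f = freq[x]
    # value, frequency, relative frequency, percentage frequency
    print(x, f, f"{f}/{n}", round(100 * f / n, 2))
```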

Note that there are 35 observations here, so we say that the size of the sample (or data) is 35. The values present are 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, and 56. Since there are only 11 distinct values, we can make a frequency table for the ungrouped data. The following is the frequency distribution of this ungrouped data:

Time (in seconds)   Frequency   Relative Frequency   Percentage Frequency
46                  1           1/35                 2.86
47                  1           1/35                 2.86
48                  3           3/35                 8.57
49                  3           3/35                 8.57
50                  4           4/35                 11.43
51                  6           6/35                 17.14
52                  4           4/35                 11.43
53                  5           5/35                 14.29
54                  5           5/35                 14.29
55                  2           2/35                 5.71
56                  1           1/35                 2.86
Total               35          1                    100

Grouped Data

When we are working with a large set of data that has too many distinct values, we group the whole set of data into a few class intervals and give the corresponding "frequency" of each class. When the data is presented in this way, the data is called grouped data. The number of data members that fall in a class interval is called the class frequency, and the relative and percentage frequencies are computed by the same formulas as above. A list that gives various class intervals and the corresponding class frequencies in tabular form is called a class frequency table or class frequency distribution of the data. The frequency distribution may also include the relative and percentage frequencies.

Grouped Data and Loss of Information

Sometimes it is convenient or necessary to group data into class intervals and construct a class frequency distribution. This is the case when there are too many distinct numbers present in the data, too many even to fit into a simple table on a page for presentation. In such situations, we group the data into a few class intervals. While a class frequency distribution is very good for presentation and convenient for other reasons, we lose a lot of information in this process: there is no way to recover the original data from the class frequency distribution.

Given a set of data, a good question would be: how many class intervals should we have? The answer is that there should be neither too few nor too many. If we take too few (say, one), then all the information will be lost. On the other hand, if we take too many, we will be back to working with ungrouped data. (In this course we will always tell you how many classes to take.) Although it is sometimes necessary to take class intervals of varying width, in this course we only consider classes of equal width.

Steps to Construct a Frequency Distribution

1. Range: Pick a suitable number L less than or equal to the smallest value present in the data. Pick a suitable number H greater than or equal to the highest value present in the data. The range R that we consider is R = H - L.

2. Number of Classes: Decide on a suitable number of classes. (In this course we will tell you the number of classes.)

3. Class Width: We have

$$\text{class width} = w = \frac{R}{\text{number of classes}}$$

We will pick L, H, and the number of classes so that the class width is a "round number."

4. Classes: We divide our interval [L, H] into subintervals, to be called classes, as

[L, L+w], [L+w, L+2w], [L+2w, L+3w], ..., [H-w, H].

Since this definition creates an ambiguous situation in which a data value may fall into two classes, we need a convention to address this situation.

5. Frequency: Find the frequency for each of the classes. You can use an advanced calculator or some software (like Excel) to count frequencies.

A few more important definitions. The above intervals are called class intervals. The w above is called the class size or width. The lower end of a class is called its lower limit, and the upper end of a class is called its upper limit. The class mark is the midpoint of the class, defined as follows:

$$\text{class mark} = \frac{\text{lower limit of class} + \text{upper limit of class}}{2}$$

A class limit is also called a class boundary. I took a slightly different approach when I defined the classes, so that for us class limits and class boundaries are the same. Many slightly different approaches are possible depending on the situation, but they are all essentially the same.

Example 1.2.2 The following are the weights (in ounces), at birth, of a certain number of babies:

74 65 74 93 95 143 133 97 143 105
135 123 133 118 125 127 127 103 124 123
124 128 126 120 138 120 92 110 129 124
96 94 147 122 110 124 119 72 134 126
127 138 110 107 150 137 121 78 124 121
72 113 111 86 96 117 138 125 117 119
100 126 121 110 96 106 127 124 89 115
132 98 120 107 130 62 93 81 110 120
74 115 80 97 127 135 113 135 108 85
140 91 145 92 156 91 141 148 99

We will construct a class frequency table of this data by dividing the whole range of the data into class intervals.

Solution: Note that the lowest value is 62 and the highest value is 156. We take L = 60 and H = 160, so R = H - L = 100. We chose L and H precisely so that R = 100 is a "nice" number. Now we decide to have 5 class intervals, so w = R/5 = 20. According to what I said above, our classes should be [60, 80], [80, 100], [100, 120], [120, 140], [140, 160]. But if we do so, then there is a risk that some data members (like 80, 100, 120, 140) will fall in two classes. One way to avoid this is to add 0.5 to all the class boundaries. So, our classes are [60.5, 80.5], [80.5, 100.5], [100.5, 120.5], [120.5, 140.5], [140.5, 160.5], and the frequency distribution is as follows:

Classes         Frequency   Relative Frequency   Percentage Frequency
60.5 - 80.5     9           9/99                 9.09
80.5 - 100.5    20          20/99                20.20
100.5 - 120.5   25          25/99                25.25
120.5 - 140.5   37          37/99                37.37
140.5 - 160.5   8           8/99                 8.08
Total           99          1                    100
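The five construction steps can be followed mechanically. Below is a minimal sketch in Python (an assumption; the notes suggest a calculator or Excel) that reproduces Example 1.2.2 with L = 60, H = 160, and 5 classes, shifting every boundary by 0.5 as in the solution. Because all weights are whole numbers and all boundaries end in .5, no value can land on a boundary, so a strict comparison `lo < x < hi` is enough.

```python
# Birth weights (ounces) from Example 1.2.2.
weights = [
    74, 65, 74, 93, 95, 143, 133, 97, 143, 105, 135, 123, 133, 118, 125, 127,
    127, 103, 124, 123, 124, 128, 126, 120, 138, 120, 92, 110, 129, 124, 96, 94,
    147, 122, 110, 124, 119, 72, 134, 126, 127, 138, 110, 107, 150, 137, 121, 78,
    124, 121, 72, 113, 111, 86, 96, 117, 138, 125, 117, 119, 100, 126, 121, 110,
    96, 106, 127, 124, 89, 115, 132, 98, 120, 107, 130, 62, 93, 81, 110, 120,
    74, 115, 80, 97, 127, 135, 113, 135, 108, 85, 140, 91, 145, 92, 156, 91,
    141, 148, 99,
]

L, H, k = 60, 160, 5               # Steps 1-2: L <= lowest, H >= highest, k classes
assert L <= min(weights) and max(weights) <= H
w = (H - L) // k                   # Step 3: class width, here a "round number"
# Step 4: boundaries shifted by 0.5 so no whole-number weight sits on a boundary.
bounds = [L + 0.5 + i * w for i in range(k + 1)]
classes = list(zip(bounds[:-1], bounds[1:]))

# Step 5: count how many values fall inside each class.
freq = [sum(lo < x < hi for x in weights) for lo, hi in classes]

n = len(weights)
for (lo, hi), f in zip(classes, freq):
    # class, frequency, relative frequency, percentage frequency
    print(f"{lo} - {hi}", f, f"{f}/{n}", round(100 * f / n, 2))
print("Total", sum(freq))
```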

1.3 Pictorial Representation of Data

Another way to represent data is to use pictures and graphs. We see such pictorial representations in newspapers and other sources every day. Pictorial representation is particularly important when you have to present data to people with limited technical background, like newspaper readers or a governmental or congressional body.

The Pie Chart

The pie chart is a commonly used pictorial representation of data. When you do your tax return every year, you find a few pie charts in the instruction book for Form 1040. These charts show what proportion (percentage) of each tax dollar goes to particular expenses. I reproduced the following pie charts from the 1999 Form 1040 instruction book.

Pie charts are self-explanatory; we do not need to discuss them further.

The Histogram

Among pictorial representations, the most useful in this course is the histogram. The histogram of data is the graphical representation of the frequency distribution of the data: we plot the variable on the horizontal axis, and above each class interval we erect a bar whose height equals the frequency of the class. Such a histogram is called a frequency histogram. If, instead, we erect bars whose heights equal the relative frequencies, then the graph is called a relative frequency histogram. Similarly, we can construct a percentage frequency histogram. The following is a histogram.
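To see how bar heights come from a frequency table, here is a text-only sketch in Python, assuming no plotting library is available; the bars are drawn sideways with `#` characters, one row per class. It uses the class frequencies of the birth-weight data (Example 1.2.2). The helper `text_histogram` is hypothetical, written for this illustration. Dividing the heights by n gives the relative frequency histogram, which has the same shape as the frequency histogram.

```python
# Class frequencies of the birth-weight data (Example 1.2.2).
classes = ["60.5-80.5", "80.5-100.5", "100.5-120.5", "120.5-140.5", "140.5-160.5"]
freq = [9, 20, 25, 37, 8]
n = sum(freq)


def text_histogram(labels, heights, scale=1.0):
    """Hypothetical helper: one text bar per class, length proportional to height."""
    rows = []
    for label, h in zip(labels, heights):
        bar = "#" * round(h * scale)
        rows.append(f"{label:>12} | {bar}")
    return rows


# Frequency histogram: bar length = class frequency.
for row in text_histogram(classes, freq):
    print(row)

# Relative frequency histogram: heights f/n, scaled by 100 only for display.
for row in text_histogram(classes, [f / n for f in freq], scale=100):
    print(row)
```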

We have decided to avoid unequal class lengths, which makes our discussion of the histogram fairly simple.

Remark. Take a look at the stem-and-leaf diagram discussed in any textbook.

Example 1.3.1. Following is the frequency table of data on the height (in inches) of some babies at birth. Sketch the histogram of the following data:

Height   Frequency
16-17    3
17-18    8
18-19    34
19-20    60
20-21    72
21-22    18

The Cumulative Frequency Distributions

For a given value x of a variable, the cumulative frequency of the data for x is the number of data members that are less than or equal to x.

Definition. Given a frequency distribution of some data, the cumulative frequency for a class boundary x is the sum of the frequencies of all the classes that lie at or below x. The cumulative frequency distribution is a table that gives the cumulative frequencies against some x values (for us, the class boundaries). We also define the cumulative relative frequency and the cumulative percentage frequency as follows:

$$\text{cumulative relative frequency of } x = \frac{\text{cumulative frequency of } x}{\text{total \# of data points}}$$

$$\text{cumulative percentage frequency of } x = \frac{\text{cumulative frequency of } x}{\text{total \# of data points}} \times 100$$

Example 1.3.2 Once again we consider the data on the birth weight of babies in Example 1.2.2, which we discussed in the last section. A cumulative frequency distribution can be constructed from the frequency distribution.

Solution: We have seen the frequency distribution before. The following are the cumulative distributions:

Weight   Cumulative Frequency   Cumulative Relative Frequency   Cumulative Percentage Frequency
60.5     0                      0                               0
80.5     9                      9/99                            9.09
100.5    29                     29/99                           29.29
120.5    54                     54/99                           54.55
140.5    91                     91/99                           91.92
160.5    99                     1                               100
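A cumulative frequency column is a running total of the class frequencies. The sketch below assumes Python's `itertools.accumulate` for the running sum and reuses the class frequencies of Example 1.2.2; the resulting (boundary, cumulative frequency) pairs are also the points that the ogive joins.

```python
from itertools import accumulate

# Class boundaries and class frequencies from Example 1.2.2.
boundaries = [60.5, 80.5, 100.5, 120.5, 140.5, 160.5]
freq = [9, 20, 25, 37, 8]   # frequencies of the classes between consecutive boundaries
n = sum(freq)

# No data lies below the first boundary, so its cumulative frequency is 0.
cum = [0] + list(accumulate(freq))

for x, c in zip(boundaries, cum):
    # boundary, cumulative frequency, cumulative relative, cumulative percentage
    print(x, c, f"{c}/{n}", round(100 * c / n, 2))
```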

The Ogive

Definition. The ogive is a line graph in which we plot the variable on the horizontal axis and the cumulative frequency on the vertical axis. If we plot the cumulative relative frequency on the vertical axis, then the line graph is called the relative frequency ogive.

Use of Calculators

Because we will be using calculators (TI-83) extensively in this course, let me explain how you enter data in the TI-83.

Use of Calculators (TI-83): Enter Your Data
1. Press the button "stat."
2. Select "Edit" in the Edit menu and press enter.
3. You will find 6 lists named L1, L2, L3, L4, L5, L6.
4. Let's say you want to enter your data in L1. If L1 has some data, clear it by pressing the stat button and selecting ClrList in the Edit menu. When ClrList appears, type L1 and hit enter. To type "L1" on your TI-83, simply press 2nd then 1.
5. Once L1 is cleared, select Edit in the Edit menu and press enter.
6. Now type in your data, entering the values one by one.

It is not easy to construct a frequency table of a data set unless you are systematic. Traditionally, we used "tally marks" to count frequencies. Now you can use software programs (e.g., Excel). Let me show you a method using a calculator (TI-83):
1. Press "stat."
2. To input data, enter "edit."
3. Enter your data (say in L1).
4. Press "stat."
5. Select "SortA(" and enter L1.
6. Press "stat" and then enter "edit."
7. In L1 you will see that the data is sorted in increasing order. Now you can count the frequencies.
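The calculator method (sort first, then count each run of equal values) translates directly into code. This is a minimal sketch, assuming Python's `itertools.groupby`, which groups consecutive equal items and therefore needs sorted input; the helper `tally_sorted` and the small sample are hypothetical, used only to show the method.

```python
from itertools import groupby


def tally_sorted(data):
    """Sort the data, then count each run of equal values (value, frequency)."""
    array = sorted(data)  # plays the role of SortA(L1) on the TI-83
    return [(value, len(list(run))) for value, run in groupby(array)]


# Hypothetical sample, used only to show the method.
sample = [3, 1, 2, 3, 3, 1]
print(tally_sorted(sample))
```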

Problems on 1.2: Frequency Distribution

Exercise 1.2.1 To estimate the mean time taken by a race car to complete a three-mile drive, the race car did several time trials, and the following sample of times (in seconds) taken to complete the laps was collected:

50 51 55 48 53 48 49 50 51 46 49 50 54 48 52 53 54 49 52 53 51 51 51 53 47 52 55 56 54 54 52 54 50 51 53

The following is the frequency distribution of this ungrouped data:

Time (in seconds)   Frequency   Relative Frequency   Percentage Frequency
46                  1           1/35                 2.86
47                  1           1/35                 2.86
48                  3           3/35                 8.57
49                  3           3/35                 8.57
50                  4           4/35                 11.43
51                  6           6/35                 17.14
52                  4           4/35                 11.43
53                  5           5/35                 14.29
54                  5           5/35                 14.29
55                  2           2/35                 5.71
56                  1           1/35                 2.86
Total               35          1                    100

Construct a histogram.

Exercise 1.2.2. The following are the weights (in ounces), at birth, of 96 babies born in Lawrence Memorial Hospital in May 2000.

94 104 105 135 124 123 110 129 119 72 137 121 96 117 110 96 120 107 115 80 119 80
96 111 116 120 109 97 133
123 133 118 125 127 127 103
124 128 126 120 138 120 92
124 96 94 147 122 110 124
134 126 127 138 110 107 150
78 124 121 72 113 111 86
138 125 117 119 100 126 121
106 127 124 89 115 132 98
130 62 93 81 110 120
97 127 135 113 135 108
134 96 112 100 120 148

Construct a class frequency table of this data by dividing the whole range of the data into the class intervals [60.5-70.5], [70.5-80.5], [80.5-90.5], [90.5-100.5], [100.5-110.5], [110.5-120.5], [120.5-130.5], [130.5-140.5], [140.5-150.5].

Solution

Exercise 1.2.3. The following are the lengths (in inches), at birth, of 96 babies born in Lawrence Memorial Hospital in May 2000.

18 19 18 21 20.5 18 17.75 19.5 20.25 18.5 18.5 19 18.5 19 17 20.5 19 20 20 20.5 19 21.5 20 20.5 21 18 18 20 19.5 21.5 18.5 19.5 19.5 19 18.5 20 19 19 19.5 18 19 20 20.75 20 20 19 20 19.5 20 19.5 21 17 20 19.5 20 19 18.5 20 20 18 18 20 21 17.75 20 19.5 20 19.5 20 19 20 18 20 18.5 20 19 18.5 21 20 19 20.5 19.5 19.5 20.75 21 20.5 20 20.5 20.5 20 20 19 21 19 19.5 19

Construct a frequency table for this data by dividing the whole range into the class intervals [16-17], [17-18], [18-19], [19-20], [20-21], [21-22]. Note: If a data member falls on a boundary, count it in the right (upper) class interval.

Solution

Exercise 1.2.4. The following data represents the number of typos in a sample of 30 books published by some publisher.

156 159 156 160 162 159 160 159 158 160 162 156 162 159 159 160 156 156 162 162 156 160 162 158 162 162 162 158 158 160

Construct a frequency table (by sorting in your calculator). Also construct a histogram.

Solution

Exercise 1.2.5. Following is data on the hourly wages (paid only in whole dollars) in an industry.

9 11 8 9 10 11 7 10 12 13
7 13 9 11
11 13 7 9
8 14 12 9
11 12 7 9
14 9 12 10
9 8 7 14
10 12 7 11
9 14 11 12
11 15 13 14
7 9 9 7

Construct a frequency table (by sorting in your calculator). Also construct a histogram.

Solution

Exercise 1.2.6. Following is data on the hourly wages (paid only in whole dollars) of 99 employees in an industry.

7 7 9 9 12 15 15 15 16 16 11 8 13 7 13 15 16 15 15 16 7 11 12 11 7 16 15 16 15 15 11 11 14 9 9 16 15 15 16 16 10 14 7 12 10 15 16 16 15 16 9 9 8 9 14 16 16 15 15 16 10 7 7 12 11 11 17 15 19 13 10 9 14 11 12 7 16 17 8 12 12 11 15 14 13 18 16 16 16 8 13 7 9 9 7 19 13 12 17

Construct a frequency table (by sorting in your calculator).

Lesson 2: Measures of Central Tendency and Measures of Dispersion

Introduction
2.1 Measures of Central Tendency: Mean (Homework 4 - 7)
2.2 Measures of Central Tendency: Mean, Median, Mode
2.3 Measures of Dispersion

Introduction

In this lesson we talk about two types of constants that we compute from data:
1. measures of central tendency and
2. measures of dispersion.

A measure of central tendency represents an "average value." Mean, median, and mode (if you already know these) are measures of central tendency. A measure of dispersion measures how widely the data is scattered.

2.1 Measure of Central Tendency: Mean

The most common measure of central tendency is the mean, or arithmetic mean.

Definition. The mean or the arithmetic mean of a set of data is given by

$$\text{mean} = \frac{\text{sum of all the data values}}{\text{size of the data}}.$$

If we denote a data value (i.e., the variable) by x and if n is the size of the data, then the above formula is written as

$$\text{mean} = \bar{x} = \frac{\sum x}{n} \quad \text{or} \quad \text{mean} = \bar{x} = \sum x / n,$$

where $\sum$ denotes summation.

If the data is a sample, then the mean is called the sample mean. Again, if x denotes the variable, the data is sometimes denoted by $x_1, x_2, \ldots, x_n$, and then

$$\text{mean} = \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \quad \text{or} \quad \bar{x} = \sum_{i=1}^{n} x_i / n.$$

If you have not seen this notation before, it simply means summation. For example,

$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n.$$

Weighted Mean

Sometimes different values in the data carry different weights. Let us consider the following data and the corresponding frequency distribution that we computed earlier:

Example 2.1.1 To estimate the mean time taken by a race car to complete a three-mile drive, the race car did several time trials. The following are sample times (in seconds) taken to complete the laps:

50 48 49 46 54 53 52 51 47 56 52 51 51 53 50 49 48 54 53 51 52 54 54 53 55 48 51 50 52 49 51 53 55 54 50

Following is the frequency distribution of this data:

Time (in seconds)   Frequency
46                  1
47                  1
48                  3
49                  3
50                  4
51                  6
52                  4
53                  5
54                  5
55                  2
56                  1

Now we want to compute the mean time. So, we add all the data values and divide by the data size, 35. We have already computed the frequency distribution, which tells us that, in the data, 46 was present 1 time, 47 was present 1 time, 48 was present 3 times, and so on. So, using the frequency distribution, we compute the mean as follows:

$$\text{mean} = \bar{x} = \frac{46 \times 1 + 47 \times 1 + 48 \times 3 + 49 \times 3 + 50 \times 4 + 51 \times 6 + 52 \times 4 + 53 \times 5 + 54 \times 5 + 55 \times 2 + 56 \times 1}{1 + 1 + 3 + 3 + 4 + 6 + 4 + 5 + 5 + 2 + 1} = \frac{1799}{35} = 51.4$$
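The two routes to the mean (adding the raw data, or weighting each distinct value by its frequency) should give the same number. Here is a minimal Python sketch that checks this for Example 2.1.1; it assumes `fractions.Fraction` for exact arithmetic, so the comparison is not disturbed by floating-point rounding.

```python
from fractions import Fraction

# Lap times (seconds) from Example 2.1.1.
times = [50, 48, 49, 46, 54, 53, 52, 51, 47, 56, 52, 51, 51, 53, 50, 49, 48, 54,
         53, 51, 52, 54, 54, 53, 55, 48, 51, 50, 52, 49, 51, 53, 55, 54, 50]

# Mean from the raw data: sum of all values divided by the size of the data.
mean_raw = Fraction(sum(times), len(times))

# Mean from the frequency distribution: sum of x_i * f_i over sum of f_i.
table = {46: 1, 47: 1, 48: 3, 49: 3, 50: 4, 51: 6, 52: 4, 53: 5, 54: 5, 55: 2, 56: 1}
mean_table = Fraction(sum(x * f for x, f in table.items()), sum(table.values()))

print(float(mean_raw))
print(mean_table == mean_raw)
```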

The mean of the original data is the weighted mean of the data values 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, and 56, with the corresponding frequencies as the weights. So, a new formula for the mean would be

$$\text{mean} = \bar{x} = \frac{\sum_{i=1}^{n} x_i f_i}{\sum_{i=1}^{n} f_i} \quad \text{or} \quad \bar{x} = \sum_{i=1}^{n} f_i x_i \Big/ \sum_{i=1}^{n} f_i,$$

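where $f_i$ is the frequency of $x_i$ (here the sums run over the distinct values $x_1, \ldots, x_n$). The weighted mean is defined in a more general context as follows:

Definition. If the values $x_1, x_2, \ldots, x_n$ in a data set have different weights, and the value $x_i$ has weight $w_i$, then the weighted mean is defined as

$$\text{weighted mean} = \bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} \quad \text{or} \quad \bar{x} = \sum_{i=1}^{n} w_i x_i \Big/ \sum_{i=1}^{n} w_i.$$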

Properties of the Mean

1. Combining two means. Suppose we have two sets of data. The mean of the first set is $\bar{x}$, and the size of the first set is m; the mean of the second set is $\bar{y}$, and the size of the second set is n. The mean of the combined data is
$$\text{combined mean} = \frac{m\bar{x} + n\bar{y}}{m + n}.$$
This is the weighted mean of $\bar{x}$ and $\bar{y}$ with weights m and n, respectively.

2. Effect of translation. Let $\bar{x}$ be the mean of $x_1, x_2, \ldots, x_n$. Then the mean of $y_1 = x_1 + d, y_2 = x_2 + d, \ldots, y_n = x_n + d$ is given by
$$\bar{y} = \bar{x} + d.$$

3. Effect of multiplication by a constant. Let $\bar{x}$ be the mean of $x_1, \ldots, x_n$. Then the mean of $z_1 = cx_1, z_2 = cx_2, \ldots, z_n = cx_n$ is given by
$$\bar{z} = c\bar{x}.$$
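The three properties can be checked numerically on small examples. The sketch below is a minimal Python illustration, assuming `fractions.Fraction` for exact equality checks; the data sets `xs` and `ys` and the constants d and c are hypothetical values chosen only for the demonstration.

```python
from fractions import Fraction


def mean(data):
    """Arithmetic mean, computed exactly with fractions."""
    return Fraction(sum(data), len(data))


xs = [2, 4, 9]        # hypothetical first data set, size m = 3, mean 5
ys = [1, 5, 6, 12]    # hypothetical second data set, size n = 4, mean 6
m, n = len(xs), len(ys)

# 1. Combined mean = weighted mean of the two means, weights m and n.
assert mean(xs + ys) == (m * mean(xs) + n * mean(ys)) / (m + n)

# 2. Translation: adding d to every value adds d to the mean.
d = 7
assert mean([x + d for x in xs]) == mean(xs) + d

# 3. Multiplication: multiplying every value by c multiplies the mean by c.
c = Fraction(3, 2)
assert mean([c * x for x in xs]) == c * mean(xs)

print("all three properties hold for this example")
```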

Remark (effect of translation): Your teacher tells you that the mean score for the midterm in your class is 73. After you complained and requested a change, he agreed to add 7 points to everyone's score. The new mean score is (old mean + 7) = 73 + 7 = 80. This is what we meant by "effect of translation."

Example (effect of multiplication by c): Suppose you have some data x1, x2, ..., xn on salaries in an industry in the United States, and the mean is $37,000. On a certain day, 1 U.S. dollar = 1.4729 Canadian dollars (say c = 1.4729). So, in Canadian dollars the mean is 37000 × c = 37000 × 1.4729 = 54,497.30. Similarly, a change of units (inches to feet or centimeters) is a "multiplication by a constant c."

Example 2.1.2. A student took PHSX 115 (College Physics), PSYC 120 (Personality), FREN 110 (Elementary French), BUS 241 (Managerial Accounting), and MATH 365 (Elementary Statistics). The number of credit hours and the student's grade for each course are given in the following table:

Course     Grade (Points)   Credit Hours
PHSX 115   B (3 points)     4
PSYC 120   A (4 points)     3
FREN 110   B (3 points)     5
BUS 241    C (2 points)     3
MATH 365   B (3 points)     3

What is the student's GPA? Solution. The GPA is the weighted average of the points (corresponding to the grades), weight being the course-credit hours. So, the GPA = (3x4+4x3+3x5+2x3+3x3)/(4+3+5+3+3) = 54/18 = 3.
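The GPA computation is a weighted mean with grade points as the values and credit hours as the weights. A minimal Python sketch; `weighted_mean` is a hypothetical helper name, not a library function.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Example 2.1.2: (grade points, credit hours) for each course.
courses = {
    "PHSX 115": (3, 4),
    "PSYC 120": (4, 3),
    "FREN 110": (3, 5),
    "BUS 241": (2, 3),
    "MATH 365": (3, 3),
}
points = [p for p, _ in courses.values()]
hours = [h for _, h in courses.values()]
gpa = weighted_mean(points, hours)
print(gpa)  # 3.0
```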

2.2 Measures of Central Tendency: Median and Mode

The Median

The median represents the middle value of the data. Half the data will be less than or equal to the median, and half the data will be greater than or equal to the median. Roughly speaking, your income is above the median American income if more than half of Americans earn less than you do.

Definition. Suppose the data is arranged in increasing order (i.e., in an array). If the size of the data is ODD, then the median is the middle value. If it is EVEN, then the median is the mean of the two middle values.
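The median definition translates directly into code: sort, then pick the middle value or average the two middle values. A minimal Python sketch (the function name is ours; Python's standard library also offers `statistics.median`, which follows the same rule):

```python
def median(data):
    """Median: sort the data, then take the middle value (odd size)
    or the mean of the two middle values (even size)."""
    if not data:
        raise ValueError("median of empty data")
    s = sorted(data)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

print(median([1, 3, 5, 5, 7]))     # 5
print(median([1, 1, 3, 5, 5, 7]))  # 4.0
```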

The Percentiles

Definition. For a number p between 0 and 100, the pth percentile xp of the data is a number such that at least p percent of the data values are at or below xp and at least (100 - p) percent of the data values are at or above xp.

1. The 25th percentile is called the first quartile Q1.
2. The median is the 50th percentile, also called the second quartile Q2.
3. The 75th percentile is called the third quartile Q3.
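The percentile definition can be checked directly for any candidate value. The sketch below reads "below" and "above" inclusively (at or below, at or above), which is what lets the middle value of odd-sized data qualify as the 50th percentile; the function name `is_percentile` is hypothetical. The definition may allow more than one value, and calculators and software (for example Python's `statistics.quantiles`) each pick a specific value by their own convention, so their quartiles can differ slightly.

```python
def is_percentile(data, p, x):
    """True if at least p% of the data are <= x and at least
    (100 - p)% of the data are >= x."""
    n = len(data)
    at_or_below = sum(1 for v in data if v <= x)
    at_or_above = sum(1 for v in data if v >= x)
    # Compare 100 * count with p * n to avoid rounding percentages.
    return 100 * at_or_below >= p * n and 100 * at_or_above >= (100 - p) * n

data = [1, 3, 5, 5, 7]
print(is_percentile(data, 50, 5))  # True: the median
print(is_percentile(data, 50, 3))  # False
print(is_percentile(data, 25, 3))  # True: a first quartile
# For even-sized data several values qualify; the median rule picks 4 here.
print(is_percentile([1, 1, 3, 5, 5, 7], 50, 4))  # True
```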

The Mode

There is one other measure of central tendency that should be mentioned.

Definition. The MODE of the data is the value or values that have the highest frequency. For example, the mode of the data {1, 3, 5, 5, 7} is 5, because 5 has the highest frequency. The modes of {1, 1, 3, 5, 5, 7} are 1 and 5, because 1 and 5 both have the highest frequency. Such data is said to be bimodal.
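Counting frequencies is all the mode requires. A minimal Python sketch using `collections.Counter`; the function name `modes` is ours. (Python 3.8+ also provides `statistics.multimode`, which returns the modes in order of first appearance rather than sorted.)

```python
from collections import Counter

def modes(data):
    """Return the sorted list of values that have the highest frequency."""
    if not data:
        return []
    counts = Counter(data)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

print(modes([1, 3, 5, 5, 7]))     # [5]
print(modes([1, 1, 3, 5, 5, 7]))  # [1, 5]  (bimodal)
```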

Use of Calculators (TI-83): Entering your data

1. Press the STAT button.
2. Select "Edit" in the EDIT menu and press ENTER.
3. You will find six lists named L1, L2, L3, L4, L5, L6.
4. Let's say you want to enter your data in L1.
5. If L1 already has some data, clear it by pressing the STAT button and selecting ClrList in the EDIT menu.
6. Once L1 is cleared, select Edit in the EDIT menu and press ENTER.
7. Now type in your data values one by one, pressing ENTER after each.

Sorting data and computing the median

1. Enter your data in a list, say L1.
2. Select SortA in the EDIT menu and press ENTER.
3. The calculator will ask for the list. Type in the list (L1), close the parentheses, and press ENTER.
4. The calculator will say Done.
5. Press STAT, select Edit in the EDIT menu, and press ENTER.
6. You will see that your data in L1 has been sorted in increasing order.
7. If the data size is odd, the median is the middle value. If the data size is even, the median is the average of the two middle values.

Computing the mean if only raw data is given

1. Enter your data in a list, say L1.
2. Select "1-Var Stats" in the CALC menu and press ENTER.
3. The calculator will ask for the list. Type in the list L1 and press ENTER.
4. The calculator will give a list of numbers; x̄ (x-bar) is the mean.

Computing the mean if the frequency table is given

1. Enter the frequency table in the calculator, say, x-values in L1 and frequencies in L2.
2. Select "1-Var Stats" in the CALC menu and press ENTER.
3. The calculator will ask for the lists. Type in the lists L1, L2 and press ENTER.
4. The calculator will give a list of numbers; x̄ (x-bar) is the mean.
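Outside the calculator, the same two computations take a few lines. A minimal Python sketch, assuming the raw data sits in one list (the analogue of L1) and a frequency table sits in two parallel lists (the analogue of L1 and L2); the function names and the small data set are hypothetical.

```python
def mean_raw(data):
    """Mean of raw data: (x1 + ... + xn) / n."""
    return sum(data) / len(data)

def mean_freq(values, freqs):
    """Mean from a frequency table: sum(f_i * x_i) / sum(f_i)."""
    if len(values) != len(freqs):
        raise ValueError("values and freqs must have the same length")
    n = sum(freqs)
    return sum(f * x for x, f in zip(values, freqs)) / n

raw = [1, 1, 3, 5, 5, 7]                      # hypothetical raw data ("L1")
values, freqs = [1, 3, 5, 7], [2, 1, 2, 1]    # same data as a table ("L1", "L2")
print(mean_raw(raw), mean_freq(values, freqs))
```

Both calls give the same mean, 22/6, because the frequency table is just a compressed listing of the raw data.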

Problems on 2.2: Mean and Median Exercise 2.2.1. The following is the price (in dollars) of a stock (say, CISCO SYSTEMS) checked by a trader several times on a particular day. 138 142 127 137 148 130 142 133

Find the median price and mean price observed by the trader.

Exercise 2.2.2. The following figures refer to the GPA of six students. 3.0 3.3 3.1 3.0 3.1 3.1

Find the median and mean GPA.

Exercise 2.2.3. The following data give the lifetime (in days) of light bulbs. 138 952 980 967 992 197 215 157

Find the mean and median lifetime of these bulbs.

Exercise 2.2.4. An athlete ran an event 32 times. The following frequency table gives the time taken (in seconds) by the athlete to complete the events.

Time (in seconds)   26  27  28  29  30  31  Total
Frequency            3   6   5   6   9   3     32

Compute the mean and median time taken by the athlete.

Exercise 2.2.5. Following is data on the weight (in ounces), at birth, of 96 babies born in Lawrence Memorial Hospital in May 2000. 94 104 96 111 116 120 109 97 133 105 135 123 133 118 125 127 127 103 124 123 124 128 126 120 138 120 92 110 129 124 96 94 147 122 110 124 119 72 134 126 127 138 110 107 150 137 121 78 124 121 72 113 111 86 96 117 138 125 117 119 100 126 121 110 96 106 127 124 89 115 132 98 120 107 130 62 93 81 110 120 115 80 97 127 135 113 135 108 119 80 134 96 112 100 120 148

Compute the mean and median weight, at birth, of the babies.

Exercise 2.2.6. Following is data on the hourly wages (paid only in whole dollars) of 99 employees in an industry. 7 7 9 9 12 15 15 15 16 16 11 8 13 7 13 15 16 15 15 16 7 11 12 11 7 16 15 16 15 15 11 11 14 9 9 16 15 15 16 16 10 14 7 12 10 15 16 16 15 16 9 9 8 9 14 16 16 15 15 16 10 7 7 12 11 11 17 15 19 13 10 9 14 11 12 7 16 17 8 12 12 11 15 14 13 18 16 16 16 8 13 7 9 9 7 19 13 12 17

Compute the mean and median hourly wage.

Exercise 2.2.7. Following is the frequency table on the number of typos in a sample of 30 books published by a publisher.

No. of Typos   156  158  159  160  162
Frequency        6    4    5    6    9

Find the mean and median number of typos in a book.

Exercise 2.2.8. Following is data on the length (in inches), at birth, of 96 babies born in Lawrence Memorial Hospital in May 2000. 18 19 18 21 20.5 18 17.75 19.5 20.25 18.5 18.5 19 18.5 19 17 20.5 19 20 20 20.5 19 21.5 20 20.5 21 18 18 20 19.5 21.5 18.5 19.5 19.5 19 18.5 20 19 19 19.5 18 19 20 20.75 20 20 19 20 19.5 20 19.5 21 17 20 19.5 20 19 18.5 20 20 18 18 20 21 17.75 20 19.5 20 19.5 20 19 20 18 20 18.5 20 19 18.5 21 20 19 20.5 19.5 19.5 20.75 21 20.5 20 20.5 20.5 20 20 19 21 19 19.5 19

Compute the mean and median length, at birth, of these babies.

2.3 Measures of Dispersion

Range

Clearly, the measures of central tendency (mean, median, mode) cannot tell us the "whole story" about the data.

Example 2.3.1. Suppose two sections of the statistics class have the following percentage score distributions at the end of the semester:

Section A: 81, 84, 83, 80, 82
Section B: 72, 93, 92, 82, 71

Both sections have the same mean, 82. But in Section A, everybody will get a B grade. In Section B, we will have two C's, one B, and two A's. A measure of dispersion is a measure of how widely the data is scattered. In Section A, the data has very small dispersion (variability), whereas Section B has large dispersion. A very simple measure of dispersion is the range of the data, as we have defined before: range = largest value - smallest value.

Mean Deviation, Sample Variance, and Standard Deviation

We will discuss three more measures of dispersion. Suppose we have a data set x1, x2, ..., xn of size n. We will denote the mean of the data by x̄. Three definitions follow.

Definition. The mean deviation of the data is defined as follows:

$$\text{mean deviation} = \frac{|x_1 - \bar{x}| + \cdots + |x_n - \bar{x}|}{n}$$

So, the mean deviation is the mean of the absolute deviations |xi - x̄| from the mean.

Definition. The sample variance s² of the data is defined as follows:

$$s^2 = \frac{(x_1 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$$

Remark.
1. Note that we denote the sample variance as the square of a number s.
2. Also note that we divide by n - 1, not by n. Dividing by n - 1 makes s² an unbiased estimator of the population variance: averaged over repeated samples, it neither overestimates nor underestimates the population variance.
3. We would like our measure of dispersion to have the same units as our data, but the formula involves squares (xi - x̄)², so the unit of s² is the unit of the data squared. If the data is in feet, the variance is in square feet. To solve this problem we define another measure of dispersion, the standard deviation, denoted s.

Definition. The sample standard deviation s is defined as the square root of the sample variance s2. So, to compute the sample standard deviation, we have to compute the sample variance first.

If we simplify the definition of sample variance, we get the following formula:

$$s^2 = \frac{(x_1^2 + x_2^2 + \cdots + x_n^2) - n\bar{x}^2}{n - 1}$$

Let us quickly do some computations with Example 2.3.1 above. The mean deviation for Section A is (1+2+1+2+0)/5 = 6/5 = 1.2, and the mean deviation for Section B is (10+11+10+0+11)/5 = 42/5 = 8.4. Since the variability of Section B is much higher, its mean deviation is much higher. Let us compute the sample variances. For Section A the sample variance is ((81-82)² + (84-82)² + (83-82)² + (80-82)² + (82-82)²)/(5-1) = (1+4+1+4+0)/4 = 10/4 = 2.5. For Section B the sample variance is ((72-82)² + (93-82)² + (92-82)² + (82-82)² + (71-82)²)/(5-1) = (100+121+100+0+121)/4 = 442/4 = 110.5. So the standard deviations are √2.5 ≈ 1.58 and √110.5 ≈ 10.51.

Application of Standard Deviation

The mean and the standard deviation tell us a lot about how the data is distributed.
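The computations for Example 2.3.1 can be reproduced as follows. This is a minimal Python sketch; the function names are our own, and it implements both the definition of s² and the simplified formula so they can be compared.

```python
from math import sqrt

def mean_deviation(data):
    """Mean of the absolute deviations |x_i - mean|."""
    m = sum(data) / len(data)
    return sum(abs(x - m) for x in data) / len(data)

def sample_variance(data):
    """Definition: sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    m = sum(data) / n
    return sum((x - m) ** 2 for x in data) / (n - 1)

def sample_variance_shortcut(data):
    """Simplified formula: (sum of x_i^2 - n * mean^2) / (n - 1)."""
    n = len(data)
    m = sum(data) / n
    return (sum(x * x for x in data) - n * m * m) / (n - 1)

section_a = [81, 84, 83, 80, 82]
section_b = [72, 93, 92, 82, 71]
for name, data in [("A", section_a), ("B", section_b)]:
    var = sample_variance(data)
    print(name, "range:", max(data) - min(data),
          "mean deviation:", mean_deviation(data),
          "variance:", var, "sd:", sqrt(var))
```

The simplified formula is convenient by hand, but for large values it subtracts two nearly equal numbers and can lose precision in floating-point arithmetic; the definition form is the safer one in code.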

Chebyshev's Rule. This rule applies to all kinds of data. Suppose x̄ is the mean and s is the standard deviation of the data. Then we have the following:
1. At least 0 percent of the observations will fall within 1 standard deviation of the mean, i.e., within (x̄ - s, x̄ + s). This is trivially true and tells us nothing.
2. At least 75 percent of the observations will fall within 2 standard deviations of the mean, i.e., within (x̄ - 2s, x̄ + 2s).
3. At least 88.9 percent (that is, 8/9) of the observations will fall within 3 standard deviations of the mean, i.e., within (x̄ - 3s, x̄ + 3s).
4. More generally, for k > 1, at least 100(1 - 1/k²) percent of the data will be within k standard deviations of the mean, i.e., within (x̄ - ks, x̄ + ks).
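Chebyshev's bound and the observed fraction of data within k standard deviations can be compared directly. A minimal Python sketch using the standard `statistics` module; `chebyshev_bound` and `fraction_within` are hypothetical names, and Section B from Example 2.3.1 serves as the data.

```python
from statistics import mean, stdev

def chebyshev_bound(k):
    """Chebyshev's guaranteed minimum fraction within k standard
    deviations of the mean: 1 - 1/k^2 (no information when k <= 1)."""
    return max(0.0, 1 - 1 / k**2)

def fraction_within(data, k):
    """Observed fraction of values in (mean - k*s, mean + k*s),
    where s is the sample standard deviation."""
    m, s = mean(data), stdev(data)
    return sum(1 for x in data if m - k * s < x < m + k * s) / len(data)

section_b = [72, 93, 92, 82, 71]
for k in (1, 2, 3):
    print(k, fraction_within(section_b, k), chebyshev_bound(k))
```

The observed fractions never fall below the bound, which is exactly what Chebyshev's Rule promises for any data.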

Chebyshev's Rule makes no assumption about the data or the variable. If we make some assumptions about the data, then we can improve the above rule as follows.

The Empirical Rule: Suppose the histogram of the data is symmetric around the vertical line x = x̄ and bell-shaped.

[Figure: Bell-shaped Curve]

In other words, the histogram should fit a bell-shaped curve. Then we have the following:
1. Approximately 68.3 percent of the observations will fall in the interval (x̄ - s, x̄ + s).
2. Approximately 95.4 percent of the observations will fall in the interval (x̄ - 2s, x̄ + 2s).
3. Approximately 99.7 percent of the observations will fall in the interval (x̄ - 3s, x̄ + 3s).
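The Empirical Rule percentages are the areas under the normal (bell-shaped) curve within 1, 2, and 3 standard deviations of its center. Assuming the data follows a normal curve, these areas equal erf(k/√2), which the standard library's `math.erf` can evaluate; the function name `normal_within` is hypothetical. Comparing with Chebyshev's bounds (0, 75, and 88.9 percent) shows how much the bell-shape assumption buys.

```python
from math import erf, sqrt

def normal_within(k):
    """Fraction of a normal distribution within k standard deviations
    of its mean: P(|Z| < k) = erf(k / sqrt(2))."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * normal_within(k), 1))
# 1 68.3
# 2 95.4
# 3 99.7
```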

Question: What does it mean when the variance or mean deviation of some data is zero? The answer is that all the data members are EQUAL! Practice Problem. Consider the exercises 2.2.1 through 2.2.8. For each problem, compute the mean and standard deviation of the data and find what percentage of the data are within one, two, or three standard deviations from the mean.

Use of the Frequency Table

When a frequency table is given, we can use new formulas to compute the mean and variance of the data.

Formulas. Suppose the data, consisting of n observations, are given in an (ungrouped) frequency table. Let xi denote the values and fi the frequency of xi. Then
1. the mean is $\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{\sum f_i x_i}{n}$,
2. the variance is $s^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n - 1}$,
3. a simplified formula for the variance is $s^2 = \frac{1}{n - 1}\left[\left(\sum f_i x_i^2\right) - n\bar{x}^2\right]$.
4. If the data is given in a frequency table of grouped data, we use the same formulas, with xi as the class mark, which is the average of the class limits.

Example 2.3.2. The following table extends the frequency table of the time taken to complete a lap by a race car (Example 2.1.1) to compute the mean and variance using the above formulas.

Time x   Frequency f   fx     fx²
46       1             46     2116
47       1             47     2209
48       3             144    6912
49       3             147    7203
50       4             200    10000
51       6             306    15606
52       4             208    10816
53       5             265    14045
54       5             270    14580
55       2             110    6050
56       1             56     3136
Total    35            1799   92673

So, the mean is x̄ = 1799/35 = 51.4 and the variance is s² = (92673 - 35 × 51.4²)/(35 - 1) = 6.0118.

Example 2.3.3. Following is the class frequency distribution of the data on the birth weight of some babies (Exercise 1.2, Lesson 1):

Classes       Frequency f   Class Mark x   fx        fx²
60.5-80.5     9             70.5           634.5     44732.25
80.5-100.5    20            90.5           1810      163805
100.5-120.5   25            110.5          2762.5    305256.25
120.5-140.5   37            130.5          4828.5    630119.25
140.5-160.5   8             150.5          1204      181202
Total         99                           11239.5   1325114.75

We can use the above formulas to compute the (approximate) mean, variance, and standard deviation of the birth weight. The mean is x̄ = 11239.5/99 ≈ 113.53, and the variance is s² = (1325114.75 - 99 × x̄²)/(99 - 1) ≈ 500.93 (using the unrounded mean), so s ≈ 22.38.

Remarks.
1. Note that we only get an approximate mean and variance when we use the class marks with the above formulas. If you use the original data, you may notice a difference.
2. Because of the availability of computers, the importance of such approximations has declined.
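Both examples can be checked with a few lines of code. A minimal Python sketch of formulas 1 and 3 above; `freq_mean_var` is a hypothetical name, the class marks are computed as the averages of the class limits, and the results are rounded only for display.

```python
def freq_mean_var(values, freqs):
    """Mean and sample variance from a frequency table:
    mean = sum(f*x)/n and s^2 = (sum(f*x^2) - n*mean^2)/(n - 1)."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(values, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))
    m = sum_fx / n
    return m, (sum_fx2 - n * m * m) / (n - 1)

# Example 2.3.2 (ungrouped): lap times and their frequencies.
times = list(range(46, 57))
m1, v1 = freq_mean_var(times, [1, 1, 3, 3, 4, 6, 4, 5, 5, 2, 1])

# Example 2.3.3 (grouped): class mark = average of the class limits.
limits = [(60.5, 80.5), (80.5, 100.5), (100.5, 120.5), (120.5, 140.5), (140.5, 160.5)]
marks = [(low + high) / 2 for low, high in limits]
m2, v2 = freq_mean_var(marks, [9, 20, 25, 37, 8])

print(round(m1, 2), round(v1, 4))   # 51.4 6.0118
print(round(m2, 2), round(v2, 2))   # 113.53 500.93
```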

Comment: We have had detailed discussions of various formulas for the mean, variance, and other constants. It is important to understand these concepts and formulas. It is equally important to appreciate the value of calculators and other software (like Excel): for all but the smallest data sets, computing these constants by hand is tedious and error-prone.

Use of Calculators (TI-83): Computing the variance and standard deviation
1. Follow the same steps used for computing the mean (using either raw data or the frequency table).
2. The calculator will give a list of numbers; Sx is the sample standard deviation.
3. The variance is the square of the standard deviation.
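In Python, the standard `statistics` module plays the role of 1-Var Stats: `statistics.stdev` divides by n - 1 like the calculator's Sx, while `statistics.pstdev` divides by n (the population version, shown as σx on the calculator). The module has no frequency-list argument, so this minimal sketch expands a frequency table into raw data first; `expand` is a hypothetical helper name.

```python
from statistics import pstdev, stdev, variance

def expand(values, freqs):
    """Turn a frequency table into raw data: repeat each x_i f_i times."""
    return [x for x, f in zip(values, freqs) for _ in range(f)]

section_a = [81, 84, 83, 80, 82]          # Example 2.3.1, Section A
print(variance(section_a), stdev(section_a), pstdev(section_a))

laps = expand(range(46, 57), [1, 1, 3, 3, 4, 6, 4, 5, 5, 2, 1])  # Example 2.3.2
print(len(laps), variance(laps), stdev(laps))
```

When reading calculator output, make sure to use Sx rather than σx for the sample standard deviation defined in this lesson.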

Problems on 2.3: Variance, Standard Deviation, and Use of the Frequency Table Exercise 2.3.1. The following is the price (in dollars) of a stock (say, CISCO SYSTEMS) checked by a trader several times on a particular day. 138 142 127 137 148 130 142 133

Find the variance and standard deviation of the price.

Exercise 2.3.2. The following figures refer to the GPA of six students. 3.0 3.3 3.1 3.0 3.1 3.1

Find the variance and standard deviation of GPA.

Exercise 2.3.3. The following data give the lifetime (in days) of certain light bulbs. 138 952 980 967 992 197 215 157

Find the variance and standard deviation of the lifetime of these bulbs.

Exercise 2.3.4. An athlete ran an event 32 times. The following frequency table gives the time taken (in seconds) by the athlete to complete the events.

Time (in seconds)   15.6  15.7  15.8  15.9  16.0  16.1  Total
Frequency              3     6     5     6     9     3     32

Compute the variance and standard deviation of time taken by the athlete.

Exercise 2.3.5. Following is data on the weight (in ounces), at birth, of 96 babies born in Lawrence Memorial Hospital in May 2000. 94 104 96 111 116 120 109 97 133 105 135 123 133 118 125 127 127 103 124 123 124 128 126 120 138 120 92 110 129 124 96 94 147 122 110 124 119 72 134 126 127 138 110 107 150 137 121 78 124 121 72 113 111 86 96 117 138 125 117 119 100 126 121 110 96 106 127 124 89 115 132 98 120 107 130 62 93 81 110 120 115 80 97 127 135 113 135 108 119 80 134 96 112 100 120 148

Compute the variance and standard deviation of the weight, at birth, of these babies.

Exercise 2.3.6. Following is data on the hourly wages (paid only in whole dollars) of 99 employees in an industry. 7 7 9 9 12 15 15 15 16 16 11 8 13 7 13 15 16 15 15 16 7 11 12 11 7 16 15 16 15 15 11 11 14 9 9 16 15 15 16 16 10 14 7 12 10 15 16 16 15 16 9 9 8 9 14 16 16 15 15 16 10 7 7 12 11 11 17 15 19 13 10 9 14 11 12 7 16 17 8 12 12 11 15 14 13 18 16 16 16 8 13 7 9 9 7 19 13 12 17

Compute the variance and standard deviation of the hourly wages.

Exercise 2.3.7. Following is the frequency table on the number of typos in a sample of 30 books published by a publisher.

No. of Typos Frequency

156 6

158 4

159 5

160 6

162 9

Find the mean number, variance, and standard deviation of typos in a book.

Exercise 2.3.8. Following is data on the length (in inches), at birth, of 96 babies born in Lawrence Memorial Hospital in May 2000. 18 19 18 21 20.5 18 17.75 19.5 20.25 18.5 18.5 19 18.5 19 17 20.5 19 20 20 20.5 19 21.5 20 20.5 21 18 18 20 19.5 21.5 18.5 19.5 19.5 19 18.5 20 19 19 19.5 18 19 20 20.75 20 20 19 20 19.5 20 19.5 21 17 20 19.5 20 19 18.5 20 20 18 18 20 21 17.75 20 19.5 20 19.5 20 19 20 18 20 18.5 20 19 18.5 21 20 19 20.5 19.5 19.5 20.75 21 20.5 20 20.5 20.5 20 20 19 21 19 19.5 19

Compute the variance and standard deviation of the length, at birth, of these babies.

Exercise 2.3.9. The following is the frequency table of the weight (in pounds) of some salmon in a river.

Weight x      31  32  33  34  35  36  37
Frequency f    3   2   4   5   6   5   9

Find the variance and the standard deviation.

Exercise 2.3.10. The following data represents the time (in minutes) taken by students to drive to campus. 23 26 13 17 37 27 19 29 32 24 19 32 42 35 23 33 18 35 20 30 25 22 21 33 15 11 24 9 23 23

Find the mean, variance, and standard deviation of the data.

Lesson 3: Probability

Introduction
3.1 Basic Concept of Probability
3.2 Sets and Subsets, Statistical Experiments, Sample Space, Events, Probability
3.3 Laws of Probability
3.4 Counting Techniques and Probability
3.5 Conditional Probability and Independent Events
Homework 8 - 11

Introduction Probability to a statistician is the probability of the occurrence of an event. To an ordinary person it is the quantified chance of occurrence of that event. Some of the early theory of probability originated in gambling and later theories developed in bioscience. We get very tempted when we see somebody win $1 million in a lottery, but lottery operators design their games and machines in such a way that they will make more money than they give, in the long run.

3.1 Basic Concept of Probability

We are all familiar with simple probabilistic statements. If you toss a coin, the probability that HEADS will show up is 1 out of 2. If you roll a die, the probability of getting the face 5 is 1 out of 6. The probability of having an accident on a particular busy street on a particular day is 1 out of 100. (When a child says "probably we should invite Aaron for my birthday," however, the "probably" may have little to do with the mathematics of probability, but it shows an awareness of the concept of probability at a basic human level.)

When we toss a coin a large number of times, we find that heads shows up about half the time. As we continue to toss, the ratio of the number of heads to the number of tosses stays close to, and fluctuates around, 1/2. So we say that if we toss a coin, the probability that heads will show up is 0.50. On the other hand, if this ratio stayed close to and fluctuated around 0.49, then we would say the probability of heads is 0.49.

Similarly, suppose we observe the accidents on a street over a long period of time and find that there is an accident on about one in a hundred days. The longer we observe, the more closely the ratio of the number of days with an accident to the number of days observed stays near 1/100. So we say that the probability of an accident on a given day on that street is 1 percent.

These examples explain the basic notion of probability. The probability of an event is understood as its long-run "relative frequency": the ratio of the number of occurrences of the EVENT to the total number of times the EXPERIMENT is repeated.

3.2 Sets and Subsets, Statistical Experiments, Sample Space, Events, Probability

This section provides basic definitions that we will need for the rest of the course.

Sets and Subsets

Definition. By a set S we mean a collection of objects.
The objects in this set S are also called elements of the set. A set E is said to be a subset of a set S if each element of E is also an element of S. We write E ⊆ S to mean that E is a subset of S. Obviously, a subset E of S is a collection smaller than or equal to S. The following are some examples. We also explain the usage of braces to describe a set.

1. Let D = the collection of all 52 cards in a deck. Then D is a set. Let E be the collection of all the hearts in this deck. Then E is a subset of D. In brace notation, E = {x ∈ D : x is a heart}. This is read "the set of x in D such that x is a heart."

2. Let T be the collection of all those who filed a tax return with the IRS for the year 1999. Then T is a set. Let L be the collection of those whose Adjusted Gross Income on the return was less than or equal to $30,000. Then L is a subset of T. Let C be the collection of those who declared capital gains income. Then C is a subset of T. We write L ⊆ T and C ⊆ T. In brace notation, L = {x ∈ T : the Adjusted Gross Income of x is less than or equal to $30,000}. The symbol ∈ means "an element of"; x ∈ T means x is an element of T.

3. Let N be the collection of all integers, and let E be the collection of even integers. Then N and E are sets and E ⊆ N. In brace notation, N = {n : n is an integer} and E = {n ∈ N : n is even}.

4. Let R be the set of all (real) numbers. Let I be the set of all numbers between 0 and 1, not equal to 0 or 1. Then R and I are sets and I is a subset of R. In brace notation, R = {x : x is a real number} and I = {x ∈ R : 0 < x < 1}.

5. S = {1, 7, 13, 17, 19} is a set.

6. Let S be the collection of you and your siblings, B the collection of your brothers, and F the collection of your sisters. Then S, B, F are sets and we have F ⊆ S and B ⊆ S.
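Python's built-in `set` type mirrors these ideas closely: a set comprehension plays the role of brace notation {x ∈ D : condition}, and `<=` tests for a subset. A minimal sketch with the card deck of example 1 and the set of example 5; representing a card as a (rank, suit) pair is our own choice. Infinite sets such as N or R cannot be listed this way, so only finite examples are shown.

```python
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]

# D: the 52 cards of a deck, each represented as a (rank, suit) pair.
D = {(rank, suit) for rank in ranks for suit in suits}

# Brace notation E = {x in D : x is a heart} as a set comprehension.
E = {x for x in D if x[1] == "hearts"}

S = {1, 7, 13, 17, 19}   # example 5, listed element by element

print(len(D), len(E), E <= D)   # 52 13 True
print(13 in S, 2 in S)          # True False
```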

Statistical Experiments and Sample Space Definitions.

1. A statistical experiment is a procedure that produces exactly one out of many possible outcomes. All the
possible outcomes are known, but which outcome will result when you perform the experiment is not known.

2. Given an experiment, the set of all possible outcomes is called the sample space.

3. Given an experiment, an outcome of the experiment is called a sample point. So, the sample space consists of sample points.

Examples. The following are examples of some experiments and their sample spaces.

1. Suppose your experiment is tossing a coin. The outcomes are H (heads) and T (tails). So, the sample space is S =
{H,T}.

2. Suppose your experiment is tossing a coin twice. The sample points (or outcomes) are HH, HT, TH, TT and the sample space is S = {HH, HT, TH, TT}.

3. Your experiment is rolling a die. The outcomes are 1,2,3,4,5,6 and the sample space is S = {1,2,3,4,5,6}.
4. Suppose that your experiment is rolling a die twice. Then the sample space is

S = { (1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
      (2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
      (3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
      (4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
      (5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
      (6,1) (6,2) (6,3) (6,4) (6,5) (6,6) }

In brace notation, we can write S = {(i,j) : i = 1,2,3,4,5,6 and j = 1,2,3,4,5,6}.

5. Suppose your experiment is to determine the number of road accidents in Lawrence on a particular day. So, the sample space is S = {0,1,2,3 ... }.

6. Suppose the experiment is to determine the sex of an unborn child. Then the sample space is S = {Female, Male}.

7. Suppose your experiment is to determine the blood group of a patient in a lab. Then the sample space is S = {O,A,B,AB}.

8. Suppose your experiment is to observe the annual wheat production in Kansas. Then the sample space is S = {x : x is a nonnegative number} = {x ∈ R : x ≥ 0} = [0, ∞).

Definition. The sample space S is called a finite sample space if S has only a finite number of outcomes. If S has infinitely many elements, it is called an infinite sample space. Note that examples 1, 2, 3, 4, 6, and 7 above have finite sample spaces, and examples 5 and 8 have infinite sample spaces.

Events

Definitions. Given an experiment and its sample space S, the following are important definitions.

1. A subset of the sample space S is called an event. So, an event E consists of outcomes, and we have E ⊆ S.

2. An event that has no outcome is called the empty event or impossible event, and is denoted ∅. The impossible event consists of no outcome; if you perform the experiment, the impossible event will never occur.

3. Since S is also a subset of S, S is an event. This event S is called the sure event. If you perform the experiment,
this event is sure to occur.

4. A simple event consists of a single outcome.


Remark. Often, we will describe events in "English," and we may have to identify them as a subset of the sample space and also conversely.
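Sample spaces and events can also be built in code, which is a handy way to check a description "in English" against the subset it defines. A minimal Python sketch using `itertools.product` for the coin-tossing and die-rolling experiments; the variable names follow the text where possible (E5, T5, T13) and are otherwise our own.

```python
from itertools import product

# Tossing a coin twice: S = {HH, HT, TH, TT}.
S_coin = {a + b for a, b in product("HT", repeat=2)}

# Rolling a die twice: S = {(i, j) : i, j = 1, ..., 6}.
S_dice = set(product(range(1, 7), repeat=2))

# Events are subsets of S selected by an English-language condition.
E5 = {(i, j) for (i, j) in S_dice if i == 5}          # first roll shows 5
T5 = {(i, j) for (i, j) in S_dice if i + j == 5}      # sum of the rolls is 5
T13 = {(i, j) for (i, j) in S_dice if i + j == 13}    # impossible event

print(sorted(S_coin), len(S_dice))
print(sorted(T5), T13 == set())
```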

Examples. The following are some examples of events.

1. Look at example 2 above (the experiment of tossing a coin twice). Let E be the event that at least one of the tosses gave T, and let F be the event that both tosses gave the same face. Then E = {HT, TH, TT} and F = {HH, TT}.

2. Look at example 4 above (the experiment of rolling a die twice). Let E5 be the event that the first roll showed 5. Then E5 = {(5,1), (5,2), (5,3), (5,4), (5,5), (5,6)}. Let T5 be the event that the sum of the two rolls is 5. Then T5 = {(1,4), (2,3), (3,2), (4,1)}.

Let T1 be the event that the sum of the two rolls is 1. Because T1 has no outcome, it is an impossible event. Let T13 be the event that the sum of the two rolls is 13. Then T13 is also an impossible event.

3. Look at example 5 above (the experiment on road accidents). Let E be the event that there is no accident on that day. Then E = {0}.

4. Look at example 8 above (the experiment on annual wheat production). Let E be the event that there will be more than 1000 units of wheat production in 1998. Then E = (1000, ∞).

The Theory of Probability

Given a sample space S, in the MATHEMATICS of probability we have rules for computing the probability of an event E. Although the MATHEMATICS of probability was inspired by the empirical concept of probability, we do not derive anything from our intuitive ideas. We are guided by the precise rules and laws that we set up. For now we will be dealing with finite sample spaces.

Definition. Let S = {e1, e2, ..., en} be a finite sample space. The probability of a simple event {e} is a number (possibly given), denoted by P({e}), which has the following properties:
1. 0 ≤ P({e}) ≤ 1.
2. The sum of the probabilities of all the simple events is 1: P({e1}) + P({e2}) + ... + P({en}) = 1.
3. If E is an event, then the probability of E, P(E), is defined as the sum of the probabilities of all the simple events in E:
$$P(E) = \sum_{e \in E} P(\{e\}).$$
4. So, we also have P(impossible event) = P(∅) = 0 and P(sure event) = P(S) = 1.
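The definition can be sketched in Python with a dictionary mapping each outcome e to P({e}). This is a minimal sketch, assuming exact `fractions.Fraction` probabilities (with floats, the sum in property 2 is checked only up to rounding via `math.isclose`); the function name `prob` is hypothetical. The example assigns probability 1/4 to each outcome of tossing a coin twice.

```python
import math
from fractions import Fraction

def prob(event, p):
    """P(E) as the sum of P({e}) over the outcomes e in E.
    `p` maps every outcome of a finite sample space to P({e})."""
    # Property 1: each simple-event probability lies in [0, 1].
    if any(not 0 <= v <= 1 for v in p.values()):
        raise ValueError("each P({e}) must lie between 0 and 1")
    # Property 2: the simple-event probabilities sum to 1.
    if not math.isclose(sum(p.values()), 1):
        raise ValueError("the probabilities of the simple events must sum to 1")
    if not set(event) <= set(p):
        raise ValueError("the event must be a subset of the sample space")
    # Property 3: add up the probabilities of the outcomes in E.
    return sum(p[e] for e in event)

p = {outcome: Fraction(1, 4) for outcome in ["HH", "HT", "TH", "TT"]}
E = {"HT", "TH", "TT"}   # at least one toss gave T
F = {"HH", "TT"}         # both tosses gave the same face

print(prob(E, p), prob(F, p))           # 3/4 1/2
print(prob(set(), p), prob(set(p), p))  # 0 1
```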

Remark. If we know the probabilities P({e}) of all the simple events {e}, we will be able to compute the probability of any event E using property 3. The probabilities of the simple events will either (1) be given, or (2) be computed by a given rule.

Probability with Equally Likely Outcomes

One of the most frequently used models for computing probabilities of simple events is called EQUALLY LIKELY OUTCOMES.

Definition. Let S = {e1, ..., eN} be a finite sample space. We say that all the outcomes are equally likely if all the outcomes have the same probability. So, in this case, we have P({e1}) = P({e2}) = ... = P({eN}) = 1/N. Also, in this case, for an event E,

$$P(E) = \sum_{e \in E} P(\{e\}) = \sum_{e \in E} \frac{1}{N} = \frac{\text{Number of outcomes in } E}{\text{Number of outcomes in } S}.$$

If n(E) denotes the number of outcomes in E, then

$$P(E) = \frac{n(E)}{n(S)}.$$
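With equally likely outcomes, P(E) = n(E)/n(S) is just a ratio of counts. The sketch below computes it exactly for the event "the sum of two rolls of a die is 5", then compares it with a simulated relative frequency, tying back to the empirical idea of Section 3.1. It is a minimal Python sketch; the function name, the seed, and the number of trials are arbitrary choices.

```python
import random
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))      # rolling a die twice
T5 = [(i, j) for (i, j) in S if i + j == 5]   # the sum is 5

def equally_likely_prob(event, sample_space):
    """P(E) = n(E) / n(S) when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

exact = equally_likely_prob(T5, S)

# Relative frequency: repeat the experiment many times and count successes.
random.seed(0)
trials = 100_000
hits = sum(1 for _ in range(trials)
           if random.randint(1, 6) + random.randint(1, 6) == 5)
print(exact, hits / trials)   # 1/9 and a value close to 0.111
```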

Problems on 3.2

Probability of Simple Events Given in a Table

Exercise 3.2.1. The following table gives the blood group distribution of a certain population.

Blood Group Distribution
Blood Group   Percentage of Population
O             47
A             42
B             8
AB            3

Find the probability that a random sample of blood will be of Blood Group A or B or AB. (Here S = {O, A, B, AB} and we want to compute the probability P(E) of the event E = {A, B, AB}.)

Exercise 3.2.2. A student wants to pick a school based on its grade distribution. Following is the most recent grade distribution in a school:

Grade Distribution (hypothetical data)
Grade   Percentage of Students
A       19
B       33
C       31
D       14
F       3

Find the probability that a randomly picked student will have at least a B average.

Exercise 3.2.3. The following table gives the probability distribution of a loaded die.

Probability Distribution for a Die
Face   Probability
1      0.20
2      0.15
3      0.15
4      0.10
5      0.05
6      0.35

Find the probability that the face 2 or 3 or 6 will show up when you roll the die.

Find the Probability with Equally Likely Outcomes

Exercise 3.2.4. An urn contains 7 apples, 3 oranges, and 5 pears. One piece of fruit is picked at random. Find the probability that
1. the fruit is an apple,
2. the fruit is either an apple or a pear, and
3. the fruit is an orange.

Exercise 3.2.5. A die is rolled twice. Find the probability that
1. the sum is 8,
2. only 2 or 3 showed up in both the rolls, and
3. the first roll produced a bigger number.

Exercise 3.2.6. A letter is chosen at random from the letters of the English alphabet. Find the probability that
1. the letter is either I or U,
2. the letter is in the word ALWAYS, and
3. the letter is not in the word NEVER.

3.3 Laws of Probability

Notations from Set Theory

Following are a few notations from set theory, which we will be using in the context of sample spaces and events.

Notations. Let S be a set and E, F be two subsets of S.

1. The union E ∪ F of E and F is the set defined as follows:

E ∪ F = {x ∈ S : x ∈ E or x ∈ F}.

So, if you put together the elements of E and F in a single collection, you get the union E ∪ F.

2. The intersection E ∩ F of E and F is defined as follows:

E ∩ F = {x ∈ S : both x ∈ E and x ∈ F}.

So, if you take all the elements common to both E and F, you get the intersection of E and F.

3. The complement Ec of E is defined as follows:

Ec = {x ∈ S : x ∉ E}.

So, the complement Ec of E is the collection of all the elements in S that are not in E.

Remark. If we can understand and interpret the above definitions in our context of sample spaces and events, that is adequate. For us, S will be a fixed sample space and E, F will be events.

1. E ∪ F is the event that consists of all outcomes that are either in E or in F (or both). So the occurrence of either E or F is the same as the occurrence of E ∪ F. That is why some textbooks use the notation (E or F) for E ∪ F. So, notationally, as in some textbooks, E ∪ F = E or F.

2. E ∩ F is the event that consists of all the outcomes that are both in E and F. So the simultaneous occurrence of E and F is the same as the occurrence of E ∩ F. That is why E ∩ F is denoted by (E and F) in some texts. Notationally, as in some textbooks, E ∩ F = E and F.

3. Similarly, Ec is the event that consists of all the outcomes in S that are not in E. So, the occurrence of Ec is the same as the nonoccurrence of E. Notationally, as in some textbooks, Ec = (not E).
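Python's set operators match these notations: `|` for union, `&` for intersection, and set difference from S for the complement. A minimal sketch with the coin-tossing events E (at least one tail) and F (both tosses show the same face); the variable names are ours.

```python
S = {"HH", "HT", "TH", "TT"}   # tossing a coin twice
E = {"HT", "TH", "TT"}         # at least one tail
F = {"HH", "TT"}               # both tosses show the same face

union = E | F          # E ∪ F: "E or F"
intersection = E & F   # E ∩ F: "E and F"
complement = S - E     # Ec: "not E"

print(sorted(union), sorted(intersection), sorted(complement))
# ['HH', 'HT', 'TH', 'TT'] ['TT'] ['HH']
```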

Laws of Probability

Following are some of the laws of probability. Probability behaves like area, and the laws of probability are like those of area. Let S be a sample space and let E and F be two events.

1. We have P(E ∪ F) = P(E or F) = P(E) + P(F) - P(E ∩ F) = P(E) + P(F) - P(E and F). We subtract P(E ∩ F) because we counted it twice: once in P(E) and once in P(F).

2. Definition. We say E and F are mutually exclusive if E ∩ F = ∅, i.e., E and F have no outcome in common. Since P(∅) = 0, it follows from 1 that if E and F are mutually exclusive, then P(E ∪ F) = P(E) + P(F).

3. We also have P(Ec) = 1 - P(E).

4. Definition. Let E be an event. We say that the odds of an event E occurring are a to b if P(E) = a/(a+b).

Remark: This concept of ODDS is often used in gambling. When the odds in favor of a horse are 2 to 3, essentially this means that the probability the horse will win is 2/5. We say "essentially" because in actual betting the probability is slightly less than 2/5, so that in the long run the gambling establishment takes in more money than it pays out. (This instructor is not particularly experienced in such betting or horse races.)
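Laws 1 and 3 and the odds conversion can be checked with exact fractions. A minimal Python sketch, assuming equally likely outcomes for the coin tossed twice; `P`, `odds_to_prob`, and `prob_to_odds` are hypothetical helper names.

```python
from fractions import Fraction

S = {"HH", "HT", "TH", "TT"}
E = {"HT", "TH", "TT"}   # at least one tail
F = {"HH", "TT"}         # same face twice

def P(A):
    """Probability of event A under equally likely outcomes."""
    return Fraction(len(A), len(S))

# Law 1 (addition rule) and law 3 (complement rule).
assert P(E | F) == P(E) + P(F) - P(E & F)
assert P(S - E) == 1 - P(E)

def odds_to_prob(a, b):
    """Odds of a to b in favor of E means P(E) = a / (a + b)."""
    return Fraction(a, a + b)

def prob_to_odds(p):
    """Convert P(E) to odds (a, b) in lowest terms with P(E) = a / (a + b)."""
    p = Fraction(p)
    return p.numerator, p.denominator - p.numerator

print(odds_to_prob(2, 3), prob_to_odds("2/5"))   # 2/5 (2, 3)
```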

Problems on 3.3: Laws of Probability Exercise 3.3.1. Let E, F, G be three events. It is given

P(E)=0.3 P( E F) = 0.2 Find the probability that 1. E or F occur, 2. both E and G occur, and 3. E does not occur. Solution Exercise 3.3.2. Let E, F, G be events .

P(F)=0.7 P(G)=0.6 P( E G) = 0.7

1. If the odds in favor of E are 3 to 5, find the probability that E occurs. 2. If the odds against F are 3 to 4, find P(F). 3. If P(G) = 7/10, what are the odds in favor of G?

Exercise 3.3.3. The probability that a Christmas tree is taller than 6 feet is .30; the probability that a Christmas tree weighs more than sixty pounds is 0.25; and the probability that a Christmas tree is either taller than 6 feet or more than sixty pounds is .4. 1. Find the probability that a Christmas tree is both taller than 6 feet and weighs more than sixty pounds. 2. Find the probability that a Christmas tree is not taller than 6 feet.

3. Find the probability that a Christmas tree is either less than 6 feet tall or less than sixty pounds in weight. 4. Find the probability that a Christmas tree is neither taller than 6 feet nor heavier than sixty pounds.

Exercise 3.3.4. The probability that a student majors in liberal arts is .44; the probability that a student majors in business is .33; and the probability that a student majors in either liberal arts or business is .65. Find the probabilities 1. that a student majors in both liberal arts and business, 2. that a student majors in neither liberal arts nor business.

3.4 Counting Techniques and Probability

Counting techniques are important and useful to learn. You might like to know, for example,
1. the number of (formal) English words of 5 letters (a formal word is any sequence of letters from the English alphabet; for example, eezq is a formal word),
2. the number of ways you can deal a hand of 13 cards from a deck of 52 cards, or
3. the number of ways you can assign the first row of 11 seats to 231 guests.

Before we go further into counting, let us recall the factorial notation.

Notation. Let n be a positive integer. Then n! (read "n factorial") is defined as
\(n! = 1 \cdot 2 \cdots (n-2)(n-1)\,n\), and we set \(0! = 1\).
So n! is the product of all integers from 1 up to n.

One of the main tools for such counting is the following principle.

The Basic Counting Principle. Suppose we have an experiment that is a combination of r sub-experiments, performed one after the other, such that

1. the first sub-experiment has \(n_1\) outcomes;
2. corresponding to each outcome of the first sub-experiment, the second sub-experiment has \(n_2\) outcomes;
3. corresponding to each outcome of the first and the second sub-experiments, the third sub-experiment has \(n_3\) outcomes;
...
r. corresponding to each outcome of each of the previous r-1 sub-experiments, the rth sub-experiment has \(n_r\) outcomes.

Then our original experiment has \(n_1 n_2 \cdots n_r\) outcomes.
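For small cases the principle can be checked by listing every outcome and comparing the count with the product. This Python sketch is illustrative only; it models each stage as picking one of `range(n_i)` (so the stages have fixed sizes, as in the four-letter-word example), and the function names are hypothetical.

```python
from itertools import product
from math import prod

def count_by_listing(*stage_sizes):
    """Count outcomes of a multi-stage experiment by listing every one.

    Stage i is modeled as choosing one of range(n_i); each outcome is a
    tuple with one entry per stage.
    """
    return sum(1 for _ in product(*(range(n) for n in stage_sizes)))

def count_by_principle(*stage_sizes):
    """Basic counting principle: n1 * n2 * ... * nr."""
    return prod(stage_sizes)

print(count_by_listing(2, 3, 4), count_by_principle(2, 3, 4))  # 24 24
print(count_by_principle(26, 26, 26, 26))                       # 456976 formal words of length 4
```

Listing grows as fast as the answer itself, which is exactly why the principle (a single multiplication) is the practical tool. `math.prod` needs Python 3.8 or later.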

Remark. Here we have used the word "experiment" in a slightly different sense than in statistical experiments. The basic counting principle will be used to count the number of outcomes in sample spaces and events.

Examples.

3.4.1. Count the number of words of length four that you can construct from the English alphabet. We use the counting principle by splitting this experiment into four sub-experiments:

Stage   Job to do               Number of ways
1       Pick the 1st letter     26
2       Pick the 2nd letter     26
3       Pick the 3rd letter     26
4       Pick the 4th letter     26

Answer = product = 26 × 26 × 26 × 26 = 456976.

3.4.2. Count the number of ways you can assign the 11 seats in the first row of a concert hall to 231 guests.

Stage   Job to do          Number of ways
1       Assign seat 1      231
2       Assign seat 2      230
3       Assign seat 3      229
4       Assign seat 4      228
5       Assign seat 5      227
6       Assign seat 6      226
7       Assign seat 7      225
8       Assign seat 8      224
9       Assign seat 9      223
10      Assign seat 10     222
11      Assign seat 11     221

Answer = product = 231 × 230 × ... × 222 × 221.

3.4.3. Contrast: How many ways can you form a committee of 11 members from a group of 231 people? Unlike assigning seats, here the order of selection of the members is ignored. Permuting the 11 members among themselves gives different seat assignments but the same committee. Forming the committee is a "combination" problem, discussed below.

Remark. The difference between assigning 11 seats in a row and forming a committee of 11 is that in the first case the order of assignment is important. Assigning the first row to the same 11 guests in two different ways counts as two different outcomes. When we form a committee, the order in which we pick the 11 members makes no difference.

Definition. Suppose we have n objects. We pick r of them one by one (without ever putting them back) and arrange them in a row. Such an ordered arrangement will be called a permutation of n objects taken r at a time. The number of permutations of n objects taken r at a time is denoted by \({}_nP_r\). It follows from the basic counting principle that

\({}_nP_r = n(n-1)(n-2)\cdots(n-r+1) = \frac{n!}{(n-r)!}\).
That is, the number of permutations \({}_nP_r\) is the product of r integers, counting down from n.

In contrast, we can pick r objects from a collection of n objects one by one, placing each object back in the collection before the next pick, and arrange them in a row. Such selection and arrangement is called picking with replacement. Constructing a formal word of length 4 is an experiment of picking with replacement.

Remark: Example 3.4.1 is a problem on picking with replacement, because a letter can be selected more than once. Example 3.4.2 is a permutation problem.
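The two kinds of ordered picks can be compared by brute force on a small case (n = 5 objects, r = 3 picks, chosen only for illustration) and then checked against the formulas. A Python 3.8+ sketch using the standard library:

```python
from itertools import permutations, product
from math import factorial, perm, prod

n, r = 5, 3
objects = range(n)

# Without replacement (order matters): nPr = n(n-1)...(n-r+1) = n!/(n-r)!
no_replacement = sum(1 for _ in permutations(objects, r))
# With replacement: each of the r picks has all n choices, so n**r outcomes.
with_replacement = sum(1 for _ in product(objects, repeat=r))

print(no_replacement, perm(n, r), factorial(n) // factorial(n - r))  # 60 60 60
print(with_replacement, n ** r)                                       # 125 125

# Example 3.4.2: 11 seats and 231 guests give the product 231 * 230 * ... * 221.
seat_assignments = perm(231, 11)
print(seat_assignments == prod(range(221, 232)))                     # True
```

`itertools.permutations` never reuses an object within one arrangement, while `itertools.product(..., repeat=r)` does, which mirrors the distinction in the definition.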

Definition. Suppose we have n objects in a container. We pick r of them all at a time. In this case the order of selection does not come into consideration. Such a selection is called a combination of n objects taken r at a time. The number of combinations of n objects taken r at a time is denoted by \({}_nC_r\) and is given by \({}_nC_r = \frac{n!}{r!\,(n-r)!}\).

Examples. 1. Count the number of ways you can form a committee of 11 from a group of 231 people. Answer: \({}_{231}C_{11}\). 2. Count the number of ways you can deal a hand of 13 cards from a deck of 52 cards. Answer: \({}_{52}C_{13}\).
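A combination ignores order, so each unordered selection of r objects corresponds to r! ordered ones; that is where the r! in the denominator comes from. The following Python 3.8+ sketch checks this on a small case (n = 5, r = 3, our own choice) and evaluates the card example with `math.comb`.

```python
from itertools import combinations
from math import comb, factorial, perm

n, r = 5, 3
# Each unordered selection of r objects can be arranged in r! orders,
# so nCr = nPr / r! = n! / (r! (n-r)!).
committees = sum(1 for _ in combinations(range(n), r))
print(committees, comb(n, r), perm(n, r) // factorial(r))  # 10 10 10

bridge_hands = comb(52, 13)  # 52C13, hands of 13 cards from 52
print(bridge_hands)          # 635013559600
print(comb(231, 11) == factorial(231) // (factorial(11) * factorial(220)))  # True
```

Python integers have arbitrary precision, so even the large factorials in \({}_{231}C_{11}\) are computed exactly.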

Problems on 3.4: Counting Techniques and Probability

Exercise 3.4.1. Find 5!.

Exercise 3.4.2. A homeowner would like to install a new storm door. The local store offers 2 brand names; each brand has 4 different styles and 3 colors. How many choices does the homeowner have?

Exercise 3.4.3. Suppose in the World Cup soccer tournament, group A has 8 teams. Each team of group A has to play all the other teams in the group. How many games will be played among the group A teams? Answer: \({}_8C_2\).

Exercise 3.4.4. How many ways can you deal a hand of 13 cards from a deck of 52 cards? Answer: \({}_{52}C_{13}\).

Exercise 3.4.5. How many ways can you deal a hand of 4 spades, 3 hearts, 3 diamonds, and 3 clubs?

Exercise 3.4.6. We have 13 students in a class. How many ways can we assign the 4 seats in the first row?

Exercise 3.4.7. Programming languages sometimes use a hexadecimal system (also called "hex") of numbers. In this system, 16 digits are used, denoted by 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F. Suppose you form a 6-digit number in the hexadecimal system. 1. What is the probability that the number will start with a letter digit? 2. What is the probability that the number is divisible by 16 (i.e., ends with 0)?

Solution. Here the sample space is the collection of all 6-digit hex numbers (leading zeros allowed). Using the counting principle, the number of such hex numbers is \(n(S) = 16^6\).

Let E be the event that the number starts with a letter digit. There are 6 choices (A to F) for the first digit and 16 for each of the other five, so by the counting principle \(n(E) = 6 \cdot 16^5\). So, \(P(E) = n(E)/n(S) = 6/16 = 3/8\).

Let F be the event that the number is divisible by 16. In hex, a number is divisible by 16 exactly when its last digit is 0. So \(n(F) = 16^5 \cdot 1 = 16^5\), and \(P(F) = 16^5/16^6 = 1/16\).
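The counting argument can be double-checked by brute force. Enumerating all \(16^6\) strings is slow in pure Python, so this sketch (our own check, assuming leading zeros are allowed as in the solution) enumerates shorter hex strings; the argument gives the same ratios 6/16 and 1/16 for any length.

```python
from fractions import Fraction
from itertools import product

HEX_DIGITS = "0123456789ABCDEF"
LETTERS = set("ABCDEF")

def hex_probabilities(length):
    """Enumerate every hex string of the given length (leading zeros allowed)
    and return (P(starts with a letter), P(divisible by 16))."""
    total = starts_with_letter = divisible_by_16 = 0
    for digits in product(HEX_DIGITS, repeat=length):
        total += 1
        if digits[0] in LETTERS:
            starts_with_letter += 1
        if int("".join(digits), 16) % 16 == 0:  # same as: last digit is "0"
            divisible_by_16 += 1
    return Fraction(starts_with_letter, total), Fraction(divisible_by_16, total)

print(hex_probabilities(4))  # (Fraction(3, 8), Fraction(1, 16))
```

The test `int(..., 16) % 16 == 0` checks divisibility directly rather than looking at the last digit, so agreement with 1/16 confirms the "ends with 0" reasoning instead of assuming it.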

Exercise 3.4.8. You are playing Bridge and you are dealt a hand of 13 cards. 1. What is the probability that you will get a hand of 4 spades, 3 hearts, 3 diamonds and 3 clubs? 2. What is the probability that you will get all 4 aces? 3. What is the probability that you will get all 13 spades?

Exercise 3.4.9. A committee of 9 is selected at random from a group of 11 students, 17 mothers and 13 fathers. 1. What is the probability that the committee has 3 students, 3 mothers, and 3 fathers, i.e., is a balanced committee? 2. What is the probability that the committee has 4 mothers and 5 fathers? 3. What is the probability that the committee has all students?

Exercise 3.4.10. Three scholarships of unequal value will be awarded from a group of 35 applicants. How many ways can such a selection be made?

3.5 Conditional Probability and Independent Events

Sometimes when new information becomes available, the probability of an event may have to be reevaluated in light of this new information. Suppose we have a sample space S and an event E. Now suppose we have new information that an event C has occurred. We then have to reevaluate the probability of E given that C has occurred. The conditional probability of E given that C has occurred is denoted by P(E|C). Clearly, P(E|C) may be different from P(E). In fact, now that C has occurred, our old sample space is no longer relevant, and C assumes the role of the new sample space.

Example. Suppose we pick a KU student at random and let E be the event that the student is taller than 6 feet. Then we have the following observations.
1. The sample space S is the whole KU student population.
2. Since all the outcomes are equally likely, we have
\(P(E) = \frac{\text{number of KU students who are taller than 6 feet}}{\text{total number of KU students}} = \frac{n(E)}{n(S)}\).
3. Now suppose we know that the student selected is a male. Let us denote the event that the student is a male by C. The probability that the student is taller than 6 feet, given that the student is a male, is likely higher than the "simple" P(E). In fact, our new sample space is C, which is the whole KU male student population, not S, which is the whole KU student population.
4. We now have the probability that the student is taller than 6 feet given that the student is a male:
\(P(E|C) = \frac{\text{number of male KU students who are taller than 6 feet}}{\text{total number of male KU students}} = \frac{n(E \cap C)}{n(C)}\).
Dividing the numerator and the denominator by n(S) shows that
\(P(E|C) = \frac{n(E \cap C)/n(S)}{n(C)/n(S)} = \frac{P(E \cap C)}{P(C)}\).

Based on the above example, we give the following definition and formula.

Definition. Let S be a sample space and E, C be two events.
1. The conditional probability of E given that C has occurred is
\(P(E|C) = \frac{P(E \cap C)}{P(C)}\) if \(P(C) \neq 0\).
2. We get the following formula:
\(P(E \cap C) = P(E|C)\,P(C)\).

Independent Events

If the conditional probability P(E|F) equals the "simple" probability P(E), then we say that E and F are independent. In this case, \(P(E \cap F) = P(E)P(F)\).

Definition. We say that two events E and F are independent if \(P(E \cap F) = P(E)P(F)\). If two events are not independent, then they are said to be dependent.

Remark. Let us also describe what we mean by independence of 3 or more events. We say events \(E_1, E_2, \ldots, E_n\) are independent if the "multiplication rule" applies to every sub-collection of two or more of them. For example, E, F, G, H are independent if all of the following hold:
2 events: \(P(E \cap F) = P(E)P(F)\), \(P(E \cap G) = P(E)P(G)\), \(P(E \cap H) = P(E)P(H)\), \(P(F \cap G) = P(F)P(G)\), \(P(F \cap H) = P(F)P(H)\), \(P(G \cap H) = P(G)P(H)\);
3 events: \(P(E \cap F \cap G) = P(E)P(F)P(G)\), \(P(E \cap F \cap H) = P(E)P(F)P(H)\), \(P(E \cap G \cap H) = P(E)P(G)P(H)\), \(P(F \cap G \cap H) = P(F)P(G)P(H)\);
4 events: \(P(E \cap F \cap G \cap H) = P(E)P(F)P(G)P(H)\).
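The definitions can be tried out on a small equally-likely sample space. The two-dice example below is our own illustration (it is not from the KU example): E is "the sum is at least 10", C is "the first die shows 6", and F is "the second die is even". The sketch computes P(E|C) from the formula and checks the multiplication rule for independence.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered results of rolling two fair dice; all 36 outcomes equally likely.
S = list(product(range(1, 7), repeat=2))

def P(event):
    """P(event) = n(event) / n(S) for an event given as a test on outcomes."""
    return Fraction(sum(1 for w in S if event(w)), len(S))

def P_given(event, condition):
    """P(E|C) = P(E and C) / P(C), defined only when P(C) != 0."""
    p_c = P(condition)
    if p_c == 0:
        raise ValueError("P(C) must be nonzero")
    return P(lambda w: event(w) and condition(w)) / p_c

def big_sum(w):       # E: the sum is at least 10
    return w[0] + w[1] >= 10

def first_is_six(w):  # C: the first die shows 6
    return w[0] == 6

def second_even(w):   # F: the second die is even
    return w[1] % 2 == 0

print(P(big_sum), P_given(big_sum, first_is_six))  # 1/6 1/2
# C and F satisfy the multiplication rule, so they are independent.
print(P(lambda w: first_is_six(w) and second_even(w)) == P(first_is_six) * P(second_even))  # True
```

Learning C raises the probability of E from 1/6 to 1/2, so E and C are dependent, while C and F are independent.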

Problems on 3.5: Conditional Probability and Independent Events Exercise 3.5.1. Let A, B be two events. Given that P(A) = .66 Find P(B|A). Solution Exercise 3.5.2. Given P(A|B) = .8 Find P(AB). Solution P(B) = .1 P(A B) = .11

Exercise 3.5.3. In a certain county, the probability that a person took a flu shot is .45 and the probability that a person will get flu, given that he/she took a flu shot, is .06. What is the probability that a randomly selected person took a flu shot and will get flu?

Exercise 3.5.4. Consider the following two circuit diagrams:

Circuit 1

Circuit 2

For each of the two circuits do the following. As you can see, current flows through two switches A and B to the radio and back to the battery. It is given that the probability that switch A is closed is 0.91 and the probability that switch B is closed is 0.83. Assume that the two switches function independently. Find the probability that the radio is playing.

Exercise 3.5.5. An airplane has two engines. The probability that engine 1 fails is 0.023 and the probability that engine 2 fails is 0.06. Assume that the engines function independently. 1. What is the probability that both engines fail? 2. What is the probability that both will not fail? 3. What is the probability that neither will fail?

Exercise 3.5.6. Following are data from a hospital emergency room: 1. The probability that a patient in the emergency room will have health insurance is 0.75. 2. The probability that a patient in the emergency room will survive the treatment is 0.85. 3. The probability that a patient in the emergency room will have health insurance and will also survive is 0.7. What is the conditional probability that a patient in the emergency room will survive, given that he/she has health insurance?

Exercise 3.5.7. The probability that you will receive a wrong number call this week is 0.3; the probability that you will receive a sales call this week is 0.8; and the probability that you will receive a survey call this week is 0.5. What is the probability that you will receive one of each this week? (Assume that all these calls are independent.)

Lesson 4: Random Variables

Homework 12 and 13

4.1 Random Variables
4.2 Probability Distribution
4.3 The Bernoulli and Binomial Experiments

4.1 Random Variables

Definition. Let S be a sample space. Then a random variable X assigns a numerical value X(w) to each outcome w in S.

Examples. Suppose we pick a KU student at random. Then our sample space S is the whole population of KU students.
1. Let X be the GPA of the student. If w is a student, X has a value X(w), which is the GPA of w.
2. Define Y as follows: Y(w) = 0 if w is male; Y(w) = 1 if w is female.
3. Let Z be the height of student w.
4. Let T be the number of credit hours completed by w.
5. Let W be the weight of w.
6. Let D be the total expenses (rounded up to the nearest dollar) of w in 1997.

Then X, Y, Z, T, W, D are all random variables.

Definitions. A random variable X is said to be a discrete random variable if the values that X can assume can be written in a (possibly infinite) list \(x_1, x_2, x_3, \ldots\). A random variable X is said to be a continuous random variable if X can assume any value in an interval.

Remark. In this course, examples of discrete random variables are typically the number of something: number of typos, number of accidents on a street, number of defective items in a lot, and so on. Examples of continuous random variables are length, weight, and time. So, Z and W are continuous random variables and X, Y, T, D are discrete random variables.

Examples. 1. Let X be the number of wrong number calls you receive in a day. Then X is a discrete random variable. 2. Let X be the waiting time before you receive the next wrong number call. Then X is a continuous random variable.

First, we will be concerned with the discrete random variables.

4.2 Probability Distribution The probability distribution of a random variable X is a table or a rule or a method that answers probability-related questions regarding X.

Definition. Suppose X is a discrete random variable that assumes the values \(x_1, x_2, \ldots\). The probability distribution of X can be described by giving \(p(x_i) = P(X = x_i)\) in a table or by a formula. This function \(p(x_i)\) is called the probability function of X. So, if the probability distribution of X is given in a table, then it looks like this:

Value x            x1       x2       x3       ...
Probability p(x)   p(x1)    p(x2)    p(x3)    ...

Properties of the probability function. Suppose X is a discrete random variable that assumes values \(x_1, x_2, x_3, \ldots\), and let p(x) be the probability function. Then we have the following:
1. \(0 \le p(x_i) \le 1\).
2. \(\sum p(x_i) = 1\).

Definition. Let X be a discrete random variable that assumes the values \(x_1, x_2, x_3, \ldots\). Then the mean of X is defined as \(\mu = \sum x_i p(x_i)\). The mean is also called the expected value of X and is denoted by E(X). The mean is also called the population mean.

Example. Suppose you design a coin toss game. In this game, you give the opponent $3 if a head comes and you collect $1 if a tail comes. Let X be the money you receive. Then X assumes the values -3 and 1. You also have a loaded coin so that P(H) = 1/9 and P(T) = 8/9. Then the probability distribution of X is given by

Value x            -3      1
Probability p(x)   1/9     8/9

So, the mean of X is given by \(\mu = \sum x_i p(x_i) = (-3)(1/9) + 1(8/9) = 5/9\).

Interpretation of the mean of X. In this example, the mean tells us your average win per game if you play for a long time. Similarly, if Z is the height (see the first example in section 4.1), then the mean \(\mu = E(Z)\) is the actual mean height of the KU student population. If we take a large sample from the KU student population and compute the sample mean, it should approximate \(\mu\).

Definition. Let X be a discrete random variable that assumes values \(x_1, x_2, x_3, \ldots\). Then the variance \(\sigma^2\) of X is defined as
\(\sigma^2 = \mathrm{Variance}(X) = \sum (x_i - \mu)^2 p(x_i)\).

Some simplification will show
\(\sigma^2 = \mathrm{Variance}(X) = \sum x_i^2 p(x_i) - \mu^2\).

The standard deviation of X is defined as the positive square root of the variance of X:
standard deviation of X \(= \sigma = \sqrt{\mathrm{Variance}(X)}\).
The variance \(\sigma^2\) is also called the population variance. If we take a large sample and compute the sample variance \(s^2\), then \(s^2\) will be an estimate for \(\sigma^2\). Similarly, \(\sigma\) is called the population standard deviation.
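Applying these definitions to the coin toss game above, the variance works out to \(128/81\) by either formula. The following Python sketch stores the probability function as a dictionary (value to probability, a representation we chose for illustration) and computes the mean, both variance formulas, and the standard deviation with exact fractions.

```python
import math
from fractions import Fraction

# Probability function of X in the coin toss game: lose $3 on heads, win $1 on tails.
dist = {-3: Fraction(1, 9), 1: Fraction(8, 9)}

def mean(dist):
    """mu = sum of x_i * p(x_i)."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Definition: sum of (x_i - mu)^2 * p(x_i)."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def variance_shortcut(dist):
    """Simplified form: sum of x_i^2 * p(x_i) minus mu^2."""
    return sum(x ** 2 * p for x, p in dist.items()) - mean(dist) ** 2

assert sum(dist.values()) == 1                  # property 2 of a probability function
print(mean(dist))                               # 5/9
print(variance(dist), variance_shortcut(dist))  # 128/81 128/81
print(math.sqrt(variance(dist)))                # standard deviation, about 1.257
```

Both variance formulas give the same exact value, which is a quick way to catch arithmetic slips when working the exercises by hand.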

Problems on 4.2: Probability Distribution

Exercise 4.2.1. The number of passengers X in a car on a freeway has the following probability distribution.

X = x    1      2      3      4      5
p(x)     0.35   0.30   0.15   0.15   0.05

Find: 1. the expected number of passengers in a car; 2. the variance \(\sigma^2\) of the number of passengers; 3. the probability that the number of passengers in a car is at least 3.

Exercise 4.2.2. Karin is a plumber who works for 3 different employers. Employer A pays her $120 a day, employer B pays her $70 a day, and employer C pays her $180 a day. She works for whoever calls her first. The probability that employer A calls her first is 0.30; the probability that employer B calls first is .20; and the probability that employer C calls her first is 0.40 (the probability that no one calls is .10). What are Karin's expected income per day and its variance?

Exercise 4.2.3. An insurance company sells a flight insurance policy at a flat rate of $500 per flight. If a policyholder dies in flight, the insurance company pays $100,000 to the survivors. The probability that a policyholder will die in flight is .003. What are the expected gain and the variance of the company per sale?

4.3 The Bernoulli and Binomial Experiments There are many random variables that we encounter fairly often. The first one that we discuss is called a Bernoulli random variable.

Definition. There are many statistical experiments that have only two outcomes. In such cases, the outcomes may be called a success or a failure. So the sample space is S = {s, f}. Here s means success and f means failure. Such an experiment is called a Bernoulli trial. Given a Bernoulli trial, we can define a random variable as
X = 1 if success,
X = 0 if failure.

If the probability P(success) = p, then we have P(failure) = 1 - p. So, the probability distribution of a Bernoulli random variable is given by

Value x            0        1
Probability p(x)   1 - p    p

The mean of X is
\(\mu = 0(1-p) + 1 \cdot p = p\).
The variance of X is
\(\sigma^2 = \sum x_i^2 p(x_i) - \mu^2 = (0^2(1-p) + 1^2 \cdot p) - p^2 = p - p^2 = p(1-p)\).

Binomial Random Variable

Definition. An interesting statistical experiment is a combination of n "identical and independent" Bernoulli trials. Such an experiment is called a binomial experiment. More formally, given a positive integer n and a number p with \(0 \le p \le 1\), a binomial(n,p) experiment (or B(n,p) experiment) is characterized as follows:
1. A binomial experiment consists of n identical and independent Bernoulli trials.
2. The probability of success in each trial remains fixed and is equal to p.

Definition. Given a B(n,p)-experiment, let X = total number of successes in these n trials. Then X is called a binomial(n,p)-random (or B(n,p)-random) variable. Following are some important facts about a B(n,p)-random variable X:
1. X can assume values 0, 1, ..., n. The probability distribution is given by
\(p(r) = P(X = r) = P(r \text{ successes}) = {}_nC_r\, p^r (1-p)^{n-r}\),
where r runs through 0, 1, 2, ..., n.
2. The mean of X is \(\mu = E(X) = np\).
3. The variance of X is \(\sigma^2 = \mathrm{Variance}(X) = np(1-p)\).
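The three facts can be checked numerically: build the probability function from the formula in fact 1, then compute the total probability, mean, and variance directly with the definitions from section 4.2. The values n = 10 and p = 0.4 are our own choices for illustration; the case n = 1 is a single Bernoulli trial, so it should reproduce the Bernoulli mean p and variance p(1-p).

```python
from math import comb, isclose

def binomial_pmf(r, n, p):
    """p(r) = nCr * p**r * (1-p)**(n-r) for a B(n, p) random variable."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

def binomial_summary(n, p):
    """Return (total probability, mean, variance) computed directly from the pmf."""
    pmf = [binomial_pmf(r, n, p) for r in range(n + 1)]
    mu = sum(r * pr for r, pr in enumerate(pmf))
    var = sum(r ** 2 * pr for r, pr in enumerate(pmf)) - mu ** 2
    return sum(pmf), mu, var

n, p = 10, 0.4
total, mu, var = binomial_summary(n, p)
print(isclose(total, 1), isclose(mu, n * p), isclose(var, n * p * (1 - p)))  # True True True

# n = 1 is one Bernoulli trial: mean p and variance p(1-p), up to rounding.
print(binomial_summary(1, 0.3))
```

Floating-point arithmetic introduces tiny rounding errors, so the comparisons use `math.isclose` rather than `==`.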

Problems on 4.3: Binomial Experiments

Exercise 4.3.1. Let X be a B(6, .3)-random variable. Find P(X = 2). Also find the probability that X is at least 2.

Exercise 4.3.2. According to a report entitled "Pediatric Nutrition Surveillance" published by the Centers for Disease Control (CDC), 18 percent of children younger than 2 years had anemia in 1997. On a particular day, a pediatrician examined 11 children.
1. What is the probability that none will have anemia?
2. What is the probability that exactly 5 will have anemia?
3. What is the probability that all will have anemia?
4. Compute the expectation and variance of the number of children with anemia.
5. What is the probability that at least 7 will have anemia?

Exercise 4.3.3. A gardener planted 15 seeds. The probability that a seed will germinate is 0.1.
1. What is the probability that exactly 3 seeds will germinate?
2. What is the probability that exactly 4 seeds will germinate?
3. What is the probability that exactly 9 seeds will germinate?
4. Compute the expected number of seeds that will germinate.
5. Compute the standard deviation of the number of seeds that will germinate.
6. What is the probability that at most 4 seeds will germinate?

Exercise 4.3.4. In a particular county, 60 percent of the population is Hispanic. 1. What is the probability that a jury of 12 will have exactly 6 Hispanic members? 2. What is the probability that a jury of 12 will have more than 6 Hispanic members?

Exercise 4.3.5. From the hiring statistics of a corporation (say IBM), it is known that for every 4 interviews they give, they make 1 job offer. Suppose that the corporation interviews 8 candidates each time it comes to campus. What are the mean and standard deviation of the number of job offers made each time?

Lesson 5: Continuous Random Variables

Homework 14 - 16

5.1 Probability Density Function (pdf)
5.2 The Normal Random Variable
5.3 Normal Approximation to Binomial

5.1 Probability Density Function (pdf)

Given a sample space S, a continuous random variable was defined as a random variable X that can assume any value in an interval. The probability distribution of a continuous random variable is described very differently from that of a discrete random variable. We describe it as follows.

Definition. Let S be a sample space and X be a continuous random variable. Then there is a function f(x), of real numbers x, called the probability density function, abbreviated as pdf, of X. This pdf f(x) has the following properties:
1. We have \(f(x) \ge 0\) for all real numbers x.
2. For any two real numbers \(a \le b\) (also for \(a = -\infty\) and \(b = \infty\)), the probability that X will be between a and b is given by the area under the graph of y = f(x), above the x-axis and between the vertical lines x = a and x = b. In mathematical notation,
\(P(a \le X \le b) = P(a \le X < b) = P(a < X \le b) = P(a < X < b)\) = the area under the graph of y = f(x), above the x-axis, between the vertical lines x = a and x = b.
3. If you have had calculus, we have
\(P(a \le X \le b) = P(a \le X < b) = P(a < X \le b) = P(a < X < b) = \int_a^b f(x)\,dx\).
4. It follows that for any real number a, \(P(X = a) = 0\). This is very much in contrast with discrete random variables.
5. The whole area under the graph of y = f(x) above the x-axis must be one.

Look at the animations on 1. exponential probability and 2. normal probability.

Remark. Given a continuous random variable X, to get a model for f(x) we look at a large sample and look at the relative frequency histogram of the X-values.

Example. Let X have the following pdf:
f(x) = 1 if \(0 \le x \le 1\), and f(x) = 0 otherwise.

Then we say X is uniformly distributed between 0 and 1, because it has the same density everywhere between 0 and 1. Similarly, Y is said to be uniformly distributed between -1 and 3 if the pdf of Y is given by
g(x) = 1/4 if \(-1 \le x \le 3\), and g(x) = 0 otherwise.
(The height 1/4 makes the total area \(4 \times 1/4 = 1\), as property 5 requires.)

The Mean and Variance

The mean \(\mu\), variance \(\sigma^2\), and standard deviation \(\sigma\) of a continuous random variable X are interpreted as we did for discrete random variables. As before, the mean \(\mu\), which is also called the expectation E(X), represents the average value of X. But the definitions involve some calculus, which we are trying to avoid. If you have had calculus, here are the definitions. Suppose f(x) is the pdf of a continuous random variable X. Then the mean of X is
\(\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx\),
the variance of X is
\(\sigma^2 = \mathrm{Variance}(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx\),
and the standard deviation \(\sigma\) is the square root of the variance \(\sigma^2\).

Look at the following flash animations of graphs of some pdfs:
1. Example 1. Normal
2. Example 2. t Distribution
3. Example 3. Chi-square Distribution
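For readers without calculus, the areas and integrals above can be approximated by adding up thin rectangles (the midpoint rule). This Python sketch applies that numerical method, which is our choice and not part of the notes, to the uniform pdf g(x) of Y on [-1, 3]. It recovers the total area 1, the probability P(0 ≤ Y ≤ 1) = 1/4, the mean 1, and the variance 4/3.

```python
def g(x):
    """pdf of Y, uniformly distributed between -1 and 3."""
    return 0.25 if -1 <= x <= 3 else 0.0

def integrate(func, a, b, steps=100_000):
    """Approximate the integral of func from a to b by the midpoint rule."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

# g is zero outside [-1, 3], so integrating over [-1, 3] covers the whole real line.
total_area = integrate(g, -1, 3)                          # should be about 1
p_0_to_1 = integrate(g, 0, 1)                             # P(0 <= Y <= 1), about 1/4
mu = integrate(lambda x: x * g(x), -1, 3)                 # mean, about 1
var = integrate(lambda x: (x - mu) ** 2 * g(x), -1, 3)    # variance, about 4/3
print(total_area, p_0_to_1, mu, var)
```

The results agree with the exact values to many decimal places; for a uniform distribution on [a, b], the exact variance is \((b-a)^2/12\), here \(16/12 = 4/3\).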

5.2 The Normal Random Variable The most commonly encountered random variable in nature is the normal random variable. As we have seen in the last section, the probability distribution of a random variable is determined by the pdf of the random variable. The pdf of a normal random variable is described below. PDF of a Normal Random Variable: Suppose f(x) is the pdf of a normal random variable X. Then we have the following properties of f(x).

1. The graph of the pdf y = f(x) has a symmetric bell shape as illustrated below:
Look at the flash animation of the pdf of normal random variables.
2. The graph is symmetric around the vertical line \(x = \mu\).
3. The graph is also peaked at \(x = \mu\).
4. The pdf f(x) is completely known if we know the mean \(\mu\) and the standard deviation \(\sigma\).
5. The graph approaches the x-axis at both ends of the x-axis.
6. The larger the standard deviation \(\sigma\) is, the flatter the graph of y = f(x) will be.
7. In fact,
\(f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]\) for \(-\infty < x < \infty\).

If X is a normal random variable, we say X is normally distributed, or X has a normal distribution. We also write that X has the \(N(\mu, \sigma)\)-distribution.
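The formula in property 7 can be evaluated directly to see properties 2, 3, and 6 numerically. In this sketch the mean \(\mu = 2\), the offset 1.3, and the three values of \(\sigma\) are arbitrary illustrative choices.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal pdf: exp(-(x - mu)**2 / (2 sigma**2)) / (sigma * sqrt(2 pi))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

mu = 2.0
# Symmetric about x = mu: points the same distance to the left and right get equal density.
print(math.isclose(normal_pdf(mu - 1.3, mu, 0.5), normal_pdf(mu + 1.3, mu, 0.5)))  # True

# The peak value f(mu) = 1 / (sigma * sqrt(2 pi)) shrinks as sigma grows: a flatter curve.
for sigma in (0.5, 1.0, 2.0):
    print(sigma, round(normal_pdf(mu, mu, sigma), 4))
```

Because the total area must stay equal to one, a larger \(\sigma\) spreads the same area over a wider range, which is why the peak drops.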

Definition. A normal random variable is called a standard normal random variable if it has mean \(\mu = 0\) and standard deviation \(\sigma = 1\). So, an N(0,1)-random variable is called a standard normal variable. In some textbooks the standard normal random variable is denoted by Z. The GOOD NEWS is that a table is available to compute its probabilities. The following properties of Z will be useful.
1. The graph of the pdf y = f(x) of the standard normal random variable Z is symmetric around the y-axis.
2. The total area under the graph above the x-axis is one.
3. So, on each side of the y-axis, the area under the graph above the x-axis is .5.
4. Visit the flash animation on Standard Normal Probability to see illustrations of the above.

Using the Probability Tables: Tables are widely used to compute probabilities. However, now that software for computing probabilities is widely available, the importance of such tables has declined. In this chapter, we will use the Z-table to compute probabilities for the standard normal random variable. We note the following:
1. Tables are available in many different formats.
2. Visit the Z-table and try to understand it.
3. This table gives P(Z < z) for numbers z.
4. The probability P(Z < z) is the area to the left of z, under the bell curve.
5. The number z is read from the left column and the top row. The probability P(Z < z) is given in the body of the table.
6. So, P(a < Z < b) = P(Z < b) - P(Z < a), the difference between the probabilities P(Z < b) and P(Z < a) that we read from the table.
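As a software alternative to the Z-table, P(Z < z) can be computed from the error function in Python's standard `math` module, using the identity \(P(Z < z) = \frac{1}{2}\left(1 + \mathrm{erf}(z/\sqrt{2})\right)\). The function names `phi` and `prob_between` are our own; this is a sketch, not a replacement for understanding the table.

```python
import math

def phi(z):
    """P(Z < z) for the standard normal Z, the value a Z-table lists."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def prob_between(a, b):
    """P(a < Z < b) = P(Z < b) - P(Z < a)."""
    return phi(b) - phi(a)

print(round(phi(1.96), 4))                 # 0.975
print(round(prob_between(-1.0, 1.0), 4))   # 0.6827
```

`phi(0)` is exactly .5, matching property 3 above: half of the area lies on each side of the y-axis.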

Inverse Probability: Sometimes we will be given the probability and asked to compute a "cut-off" point.
1. Example: We may be given P(Z < c) = .975 and asked to compute c. You will see from the table that P(Z < 1.96) = .975 and conclude that c = 1.96.
2. Example: We may be given P(l < Z) = .005 and asked to compute l. P(l < Z) represents the area to the right of l, under the bell curve. So, P(Z < l) = 1 - .005 = .995. From the table, P(Z < 2.58) = .995 (actually .9951, but an exact match is not always expected). So, l = 2.58.
3. Visit the animation on Inverse Z distribution to inspect a particular type of cut-off problem that we will use later.
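Software can also do the inverse lookup. Python 3.8+ includes `statistics.NormalDist`, whose `inv_cdf(p)` method returns the cut-off c with P(Z < c) = p. This sketch redoes the two examples above; note that a right-tail probability must first be converted to a left-tail one, exactly as in the second example.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # standard normal

# P(Z < c) = .975, so c is the .975 cut-off.
c = Z.inv_cdf(0.975)
# P(l < Z) = .005, so P(Z < l) = 1 - .005 = .995.
l = Z.inv_cdf(1 - 0.005)
print(round(c, 2), round(l, 3))  # 1.96 2.576
```

The more precise value 2.576 explains why the table entry for 2.58 is only approximately .995.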

Given an \(N(\mu, \sigma)\)-random variable X, we can use the Z-table to compute probabilities for X because of the following theorem.

Theorem. Let X be an \(N(\mu, \sigma)\)-random variable. Then \(Z = \frac{X - \mu}{\sigma}\) is a standard normal random variable. So,
\(P(a < X < b) = P\left(\frac{a-\mu}{\sigma} < Z < \frac{b-\mu}{\sigma}\right)\),
or \(P(a < X < b) = P(A < Z < B)\), where \(A = (a-\mu)/\sigma\) and \(B = (b-\mu)/\sigma\). Now we can use the Z-table.

Problem Solving: We will have two types of problems in this section: probability computation and problems of inverse probability (or cut-off points).
1. For a problem on a normal random variable X with mean \(\mu\) and standard deviation \(\sigma\), the first step is STANDARDIZATION.
2. Then, we look at the Z-table.

Example: Suppose X is an N(2, .5) random variable and P(X < L) = .95. What is the cut-off L? First, we standardize, and we have
\(P\left(\frac{X-\mu}{\sigma} < \frac{L-\mu}{\sigma}\right) = P\left(Z < \frac{L-\mu}{\sigma}\right) = .95\).
From the table, P(Z < 1.65) = .95 (approximately). So, \((L-\mu)/\sigma = 1.65\) and \(L = \mu + 1.65\sigma = 2 + 1.65(.5) = 2.825\).

Ubiquity of Normal Random Variables: Very many random variables that we encounter in nature are either normal or approximately normal. If there is one concept that you take from this course, it is this: nature's random variables are very often normal or approximately normal. You will hear about normal random variables and the bell curve in your workplace or anywhere you may have to use statistics.
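The standardization example can be reproduced with `statistics.NormalDist` (Python 3.8+), both by standardizing first as the notes do and by working with X directly. The check values a = 1.5 and b = 3.0 in the second part are our own choices for illustrating the theorem.

```python
from statistics import NormalDist

mu, sigma = 2, 0.5  # X is N(2, .5): mean 2, standard deviation .5

# Cut-off L with P(X < L) = .95, by standardizing as in the text ...
z = NormalDist().inv_cdf(0.95)  # about 1.645 (the table gives about 1.65)
L_standardized = mu + z * sigma
# ... and directly on X, without standardizing.
L_direct = NormalDist(mu, sigma).inv_cdf(0.95)
print(round(L_standardized, 3), round(L_direct, 3))  # 2.822 2.822

# The theorem: P(a < X < b) = P((a - mu)/sigma < Z < (b - mu)/sigma).
a, b = 1.5, 3.0
lhs = NormalDist(mu, sigma).cdf(b) - NormalDist(mu, sigma).cdf(a)
rhs = NormalDist().cdf((b - mu) / sigma) - NormalDist().cdf((a - mu) / sigma)
print(abs(lhs - rhs) < 1e-9)  # True
```

The small difference from the hand answer 2.825 comes only from rounding z to 1.65 when reading the table.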

Problems on 5.2: The Normal Random Variable

Exercise 5.2.1. Let Z be the standard normal random variable.
1. Find the probability P(-1.1 < Z < 2.5).
2. Find the probability P(Z < -2.1).
3. Find the probability P(-2.1 < Z < -1.5).
4. Find the probability P(1.5 < Z).
Experiment with the normal animation.

Exercise 5.2.2. Let X be a normal random variable with mean \(\mu = 3\) and standard deviation \(\sigma = 1.5\).
1. Find the probability P(-1.1 < X < 2.5).
2. Find the probability P(X < -2.1).
3. Find the probability P(-1.2 < X < -0.5).
4. Find the probability P(1.5 < X).

Experiment with the normal animation.

Exercise 5.2.3. The length of life of some light bulbs produced in a factory is normally distributed with mean 8640 hours and standard deviation 1440 hours. Find the probability that a bulb will last 1. less than 5040 hours; 2. between 5040 hours and 8640 hours.

Exercise 5.2.4. The length X of a fish in a lake has normal distribution with mean 67 cm and standard deviation 21 cm. What proportion (i.e., probability) of fish are between 44 cm and 110 cm long?

Exercise 5.2.5. The diameter of the pumpkins in my patch has normal distribution with mean 13 inches and standard deviation 4.5 inches. What proportion (i.e., probability) of pumpkins is above 22 inches?

Exercise 5.2.6. The annual expenditure X of a student is approximately normally distributed with mean \(\mu\) = 11,000 dollars and standard deviation \(\sigma\) = 1500 dollars. What percent of students spend less than 10,000 dollars?

Exercise 5.2.7. Suppose the annual production X of milk per cow is normally distributed with \(\mu\) = 5500 liters and standard deviation \(\sigma\) = 150 liters. What percent of cows have an annual yield less than 5155 liters?

Exercise 5.2.8. The amount of vegetable oil X produced by a machine in a day is normally distributed with \(\mu\) = 130 liters and standard deviation \(\sigma\) = 25 liters. What is the probability that a machine will produce between 120 liters and 150 liters on a day?

Exercise 5.2.9. The weight X at birth of babies is normally distributed with mean \(\mu\) = 114 oz and standard deviation \(\sigma\) = 18 oz. What percent of babies will have birth weight below 141 oz?

Exercise 5.2.10. Let Z be the standard normal random variable.
1. Given that P(-1.1 < Z < c) = .6881, find c.
2. Given that P(Z < c) = 0.0222, find c.
3. Given that P(c < Z < 1.5) = 0.0919, find c.
4. Given that P(c < Z) = 0.102, find c.

Experiment with the normal animation.

Exercise 5.2.11. The length X of a fish in a lake has normal distribution with mean 67 cm and standard deviation 21 cm. On a fishing trip to the lake, you are instructed to release those in the lower 33 percent in length. What is the cut-off length?

Exercise 5.2.12. The telephone company's data show that the length X of their international calls has normal distribution with mean 11.5 minutes and standard deviation 4.3 minutes. The company decided to give a special rate for the longest 20 percent of calls. What is the cut-off time length?

Exercise 5.2.13. The weight X of babies (of a fixed age) is normally distributed with mean \(\mu\) = 212 oz and standard deviation \(\sigma\) = 25 oz. Doctors would be concerned (not necessarily alarmed) if a baby is among the lower 5.05 percent in weight. Find the cut-off weight L below which the doctors will be concerned.

Exercise 5.2.14. Monthly water consumption X per household, in a subdivision in Kansas City, has normal distribution with mean 15000 gallons and standard deviation 3000 gallons. It has been decided that a surcharge will be imposed on those in the top 25 percent. Find the cut-off consumption U in gallons.

5.3 Normal Approximation to Binomial

A wide range of random variables behave approximately like a normal random variable. One such example is the binomial B(n,p) random variable. Roughly, if X is a B(n,p) random variable, then X behaves approximately like a normal random variable with mean \(\mu = np\) and standard deviation \(\sigma = \sqrt{np(1-p)}\). As we know, a B(n,p) random variable X is discrete and
\[ P(X = r) = \binom{n}{r} p^r (1-p)^{n-r}. \]
On the other hand, if Y is a \(N(\mu, \sigma)\) random variable, then P(Y = r) = 0 for every r. Because of this, a correction is needed. The following theorem states how to use the normal approximation for binomial random variables.

Theorem. Suppose X is a B(n,p) random variable. If n is large and p is not very close to 0 or 1, then X behaves, approximately, like a \(N(\mu, \sigma)\) random variable, where \(\mu = np\) and \(\sigma = \sqrt{np(1-p)}\). For r = 0, 1, ..., n we have
\[ P(X = r) = P(r - 0.5 < X < r + 0.5) \approx P(L < Z < R), \quad L = \frac{r - 0.5 - \mu}{\sigma}, \quad R = \frac{r + 0.5 - \mu}{\sigma}. \]
More generally, for r, s = 0, 1, ..., n with r \(\le\) s,
\[ P(r \le X \le s) = P(r - 0.5 < X < s + 0.5) \approx P(L < Z < R), \quad L = \frac{r - 0.5 - \mu}{\sigma}, \quad R = \frac{s + 0.5 - \mu}{\sigma}. \]
Now use the Z table. This adjustment by 0.5 on both sides is called the continuity correction.
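The theorem can be checked numerically. The sketch below uses only the Python standard library, and its function names are illustrative rather than from the text. It computes P(r ≤ X ≤ s) in two ways: exactly, by summing the binomial probabilities, and with the normal approximation plus continuity correction. The numbers of Exercise 5.3.1 serve as a test case.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_exact(n, p, r, s):
    """Exact P(r <= X <= s) for X ~ B(n, p): sum of nCk p^k (1-p)^(n-k)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r, s + 1))


def binom_normal_approx(n, p, r, s):
    """Normal approximation to P(r <= X <= s) with the continuity correction."""
    mu = n * p                      # mean np
    sigma = sqrt(n * p * (1 - p))   # standard deviation sqrt(np(1-p))
    z = NormalDist()                # standard normal Z
    left = (r - 0.5 - mu) / sigma   # L in the theorem
    right = (s + 0.5 - mu) / sigma  # R in the theorem
    return z.cdf(right) - z.cdf(left)


# Exercise 5.3.1: n = 400, p = 0.35; "more than 120" means 121 <= X <= 400.
approx = binom_normal_approx(400, 0.35, 121, 400)
exact = binom_exact(400, 0.35, 121, 400)
print(f"normal approximation {approx:.4f}, exact binomial {exact:.4f}")
```

With n this large the two answers agree to about two decimal places, which is what the theorem promises.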

Problems on 5.3: Normal Approximation to Binomial

Exercise 5.3.1. A Lawrence bank knows that 35 percent of its customers will visit the drive-through window. If 400 customers visit the bank, what is the approximate probability that more than 120 will visit the drive-through window?

Exercise 5.3.2. It is known that the probability that a household owns a food processor is 0.1. If 190 households are interviewed, find the approximate probability that 1. more than 26 households own a food processor; 2. fewer than 30 households own a food processor.

Exercise 5.3.3. The campaign committee of a candidate claims that sixty percent of the voters are in favor of the candidate. You interview 150 voters. Assuming that the campaign committee's claim is accurate, what is the approximate probability that fewer than 77 will favor the candidate?

Exercise 5.3.4. A technique is used to fertilize eggs in a fertility clinic laboratory. It is known that the probability that an egg will be fertilized by this technique is 0.1. If 500 eggs are treated, what is the probability that at least 60 eggs will be fertilized?

Exercise 5.3.5. The probability that a computer chip produced in a factory is defective is 0.2. If you have a sample of 60 chips, what is the probability that the number of defective chips will be less than 20?

Exercise 5.3.6. The probability that a light bulb produced by a machine is defective is p = 0.2. Suppose a quality control inspector takes a sample of 120 bulbs. What is the probability that more than 30 bulbs will be defective?

Exercise 5.3.7. Suppose the probability that a student has access to the Internet is p = 0.8. Suppose you interview 160 students. What is the probability that fewer than 120 students will have access to the Internet?

Exercise 5.3.8. Suppose that the probability that a person favors medical use of marijuana is p = 0.6. If 780 individuals are interviewed, what is the probability that fewer than 450 will be in favor?

Exercise 5.3.9. Suppose that the probability that a middle-income family invests in the stock market is p = 0.8. If we interview 880 middle-income families, what is the probability that more than 700 have invested in the stock market?

Exercise 5.3.10. Suppose that an insurance company knows from experience that the probability that a life-insurance policyholder will survive another 10 years is p = 0.9. The company has 2280 policyholders. What is the probability that more than 2025 will survive another 10 years?

Lesson 6: Sampling Distribution

Introduction

6.1 Central Limit Theorem and Sampling Distribution of the Proportion

Homework 17

Introduction

The sample mean \(\bar{x}\) that we have computed in the previous chapters is, in fact, the observed value of a random variable \(\bar{X}\). Similarly, the sample variance \(s^2\) that we have computed before is the observed value of a random variable \(S^2\). Each time you collect a sample, the computed sample mean \(\bar{x}\) is the value of the random variable \(\bar{X}\) for this sample. This is explained in the following example.

Example. Suppose we want to study the height distribution of the U.S. population. We collect data of size n = 1713. We shall consider that the height \(x_i\) of the ith individual in this sample is, in fact, the observed value of a random variable \(X_i\). Here \(X_i\) is the notation for the height of the ith member of the sample, which could be the height of any person from the whole U.S. population. When we have finished collecting data, we have n measurements \(x_1, x_2, \dots, x_n\). They are, respectively, the observed values of n random variables \(X_1, X_2, \dots, X_n\). We (re)define the sample mean \(\bar{X}\) as the random variable
\[ \bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}. \]
We also (re)define the sample variance \(S^2\) as the random variable
\[ S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2. \]

So, the sample mean we computed before in Lesson 2 is a value of \(\bar{X}\). We also say that \(X_1, X_2, \dots, X_n\) is a sample from the population X = height of an American. We assume that our sampling was done with replacement. Such a sample has the following properties.

1. Let X = height of an American, and let the mean of X be \(\mu\) and its variance \(\sigma^2\). Then X is called the parent or the population random variable. Also, \(\mu\) and \(\sigma^2\) are called the population mean and variance.
2. Each sample member \(X_i\) has the same distribution as X. So, the mean of \(X_i\) is \(\mu\) and the variance of \(X_i\) is \(\sigma^2\).
3. The sample members \(X_1, X_2, \dots, X_n\) are all mutually independent.
4. The distribution of \(\bar{X}\) is called the sampling distribution of \(\bar{X}\).
5. Theorem. The mean of the sample mean \(\bar{X}\) is the population mean \(\mu\), that is, \(E(\bar{X}) = E(X) = \mu\). The variance of the sample mean \(\bar{X}\) is given by \(\operatorname{Var}(\bar{X}) = \sigma^2/n\). So, the standard deviation of \(\bar{X}\), denoted by \(\sigma_{\bar{X}}\), is given by \(\sigma_{\bar{X}} = \sigma/\sqrt{n}\).
6. Definition. The standard deviation \(\sigma_{\bar{X}}\) is also called the standard error.
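A small simulation makes the theorem concrete. The sketch below assumes a hypothetical parent population, not data from the text: heights uniformly distributed between 150 and 190 cm. It draws many samples of size n with replacement and compares the mean and variance of the resulting sample means with \(\mu\) and \(\sigma^2/n\).

```python
import random
from statistics import fmean, pvariance

rng = random.Random(2024)          # fixed seed so the run is reproducible
LOW, HIGH = 150.0, 190.0           # hypothetical parent population: uniform heights (cm)
pop_mean = (LOW + HIGH) / 2        # mu = 170
pop_var = (HIGH - LOW) ** 2 / 12   # sigma^2 of a uniform distribution
n = 16                             # sample size
reps = 20_000                      # number of repeated samples

# Draw many samples (with replacement) and record each sample mean.
sample_means = [fmean(rng.uniform(LOW, HIGH) for _ in range(n)) for _ in range(reps)]

print("mean of sample means:", round(fmean(sample_means), 3), "theory:", pop_mean)
print("variance of sample means:", round(pvariance(sample_means), 3),
      "theory:", round(pop_var / n, 3))
```

The parent population here is deliberately not normal. The two facts \(E(\bar{X}) = \mu\) and \(\operatorname{Var}(\bar{X}) = \sigma^2/n\) do not depend on normality.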

Remark. In the above discussion, we have assumed that the sampling was done with replacement. That means that each time a sample member is drawn, it is placed back before we select the next member. A member could, therefore, appear more than once. This may seem unnatural. However, when we are working with a large population a repeat is unlikely, and sampling with replacement is the most natural model from the statistical point of view. (How often would one receive calls twice for the same poll?) The type of sampling where we do not place back the item selected before we select the next one is called sampling without replacement. Although many textbooks have a lengthy discussion of this concept, we will not emphasize it. All our samples are drawn with replacement and have the above properties.

6.1 Central Limit Theorem and Sampling Distribution of the Proportion

Central Limit Theorem

Suppose \(X_1, X_2, \dots, X_n\) is a sample from a population X with mean \(\mu\) and variance \(\sigma^2\). Assume n is large.
1. Then the sample mean \(\bar{X}\) is, approximately, distributed as \(N(\mu, \sigma_{\bar{X}})\), where \(\sigma_{\bar{X}} = \sigma/\sqrt{n}\).
2. So, approximately,
\[ P(a < \bar{X} < b) = P(L < Z < R), \quad L = \frac{a - \mu}{\sigma_{\bar{X}}}, \quad R = \frac{b - \mu}{\sigma_{\bar{X}}}, \]
or, equivalently,
\[ P(a < \bar{X} < b) = P\left( \frac{a - \mu}{\sigma/\sqrt{n}} < Z < \frac{b - \mu}{\sigma/\sqrt{n}} \right). \]
3. If the parent population X is normal, then 1) and 2) are exact.

Sampling Distribution of the Proportion

Suppose you are conducting a poll to determine the proportion p (or percentage) of people in favor of a certain presidential candidate. You interview a randomly selected sample of n voters. Then you let X be the number of people among these n voters who are in favor of the candidate. Then X/n is the proportion in this sample that are in favor of the candidate. We use this sample proportion X/n as an estimate for the proportion of the entire voter population that are in favor of the candidate. This is the number X/n that the pollsters report on TV every evening before the election. Here p is the proportion of voters that are in favor of the candidate. So, X is a B(n,p) random variable. We have already seen (Section 5.3 in Lesson 5) that, approximately, X follows a \(N(\mu, \sigma)\) distribution, where \(\mu = np\) and \(\sigma = \sqrt{np(1-p)}\). From this it follows that the sample proportion X/n approximately has a \(N(p, \sigma)\) distribution, where \(\sigma = \sqrt{p(1-p)/n}\).

In fact, the same could be derived from the central limit theorem. Let Y = 1 if success and Y = 0 if failure. Here by "success" we mean that the voter is in favor of the candidate. Then Y is a Bernoulli(p) random variable, the mean of Y is p, and Var(Y) = p(1-p). The response of each voter in the sample can thus be represented as a random variable: \(X_i = 1\) if the ith sample member is a success and \(X_i = 0\) if it is a failure. Then \(X_1, X_2, \dots, X_n\) is a sample from the Y population, and the sample proportion \(X/n = \bar{X} = (X_1 + X_2 + \cdots + X_n)/n\) is the sample mean. So, by the CLT, the sample proportion \(\bar{X} = X/n\) approximately has a \(N(p, \sigma)\) distribution, where \(\sigma = \sqrt{p(1-p)/n}\).

The final formulas regarding the sample proportion \(\bar{X} = X/n\) are as follows:
1. The mean and the standard deviation of \(\bar{X} = X/n\) are given by
\[ \mu_{\bar{X}} = p, \qquad \sigma_{\bar{X}} = \sqrt{p(1-p)/n}. \]
2. So, approximately,
\[ P(a < X/n < b) = P(L < Z < R), \quad L = \frac{a - p}{\sigma_{\bar{X}}}, \quad R = \frac{b - p}{\sigma_{\bar{X}}}. \]
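The CLT statement and the proportion formulas reduce to the same computation. You standardize with the appropriate standard error and read off the standard normal distribution. Below is a minimal Python sketch applied to Exercises 6.1.1 and 6.1.4. It assumes Python 3.8+ and uses `statistics.NormalDist` in place of the Z table. The helper name `prob_between` is illustrative.

```python
from math import inf, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal random variable


def prob_between(center, std_error, a=-inf, b=inf):
    """Approximate P(a < statistic < b) when the statistic is roughly N(center, std_error)."""
    left = (a - center) / std_error   # L
    right = (b - center) / std_error  # R
    return Z.cdf(right) - Z.cdf(left)


# Exercise 6.1.1 (CLT): mu = 2050, sigma = 310, n = 64, P(sample mean > 2060)
p_tuition = prob_between(2050, 310 / sqrt(64), a=2060)

# Exercise 6.1.4 (sample proportion): p = 0.18, n = 180, P(X/n > 0.20)
p, n = 0.18, 180
p_anemia = prob_between(p, sqrt(p * (1 - p) / n), a=0.20)

print(round(p_tuition, 4), round(p_anemia, 4))
```

Only the standard error changes between the two cases: \(\sigma/\sqrt{n}\) for a sample mean and \(\sqrt{p(1-p)/n}\) for a sample proportion.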

Remark. The same thing applies whenever you are trying to estimate a proportion p of success. Some examples are the proportion of defective items, the proportion of people in favor of capital punishment, and the proportion of immigrants. Remark. The normal approximation of the sample proportion given above is not really different from the normal approximation of the binomial random variable (Section 5.3). The only difference is the way we use them. In Section 5.3, we used the continuity correction; for large n, the continuity correction is negligible.

Problems on 6.1: Central Limit Theorem and Sampling Distribution of the Proportion

Problems on Central Limit Theorem:

Exercise 6.1.1. It is known that the tuition paid per semester by students in a university has a distribution with mean $2,050 and standard deviation $310. If 64 students are interviewed, what is the approximate probability that the sample mean tuition paid will be above $2,060?

Exercise 6.1.2. The monthly water consumption X per household in a subdivision in Kansas City has normal distribution with mean 15000 gallons and standard deviation 3000 gallons. What is the probability that the mean consumption of the 44 households in the subdivision will exceed 16000 gallons?

Exercise 6.1.3. According to some data, the annual Kansas wheat export X has mean 733 million dollars and standard deviation 163 million dollars. What is the probability that over the next 10 years Kansas wheat exports will exceed a total of 8040 million dollars?

Problems on Population Proportion:

Exercise 6.1.4. According to a report entitled "Pediatric Nutrition Surveillance" published by the Centers for Disease Control (CDC), 18 percent of the children younger than two had anemia in 1997. On a particular day in that year, a pediatrician examined 180 children. 1. What is the expected (sample) proportion of children with anemia? 2. What is the variance of the sample proportion of children with anemia? 3. What is the probability that the proportion will exceed 0.20?

Exercise 6.1.5. On one day during an impeachment hearing, it is claimed that 75 percent of eligible voters think the President should not be impeached. Suppose we interview 700 voters. Assuming the above, what is the probability that the sample proportion of voters who do not think the President should be impeached 1. is less than 0.73? 2. is less than 0.70? 3. is less than 0.60?

Lesson 7: Estimation

Introduction

7.1 Point and Interval Estimation

7.2 When \(\sigma\) Is Unknown

7.3 Confidence Interval for \(\sigma^2\)

7.4 About the Population Proportion

Homework 18 - 24

Introduction

The name of the game in statistics is trying to understand the POPULATION on the basis of the information available in the SAMPLE. Part of what we mean by "understand" is estimating the values of the population parameters. The game here is to use suitable sample STATISTICS to estimate population parameters. For example, we may like to use the sample mean \(\bar{x}\) as an estimate for the population mean \(\mu\). We consider two methods of estimating parameters.

1. The first one is called point estimation. In point estimation, we give a number as an estimate for the parameter. For example, if we are trying to estimate the mean height \(\mu\) of the American population, we may take a sample of a certain size, compute the sample mean height \(\bar{x}\), and call it an estimate for \(\mu\).
2. The second one is called interval estimation. In interval estimation, we give an interval (L, U) and say that the parameter will be within this interval (with a certain level of confidence). For example, when estimating the mean height of the American population, we may take a sample, compute the sample mean \(\bar{x}\), and say that the population mean \(\mu\) is in the interval \((\bar{x} - 1, \bar{x} + 1)\). Obviously, in interval estimation, the smaller the length U - L of the interval and the higher the level of confidence, the better the estimation is.

7.1 Point and Interval Estimation

As we have already mentioned, we use a statistic to estimate a parameter. The statistic T used to estimate a parameter \(\theta\) is called an estimator of \(\theta\). The computed value t of T is called a point estimate, or an estimate, of \(\theta\). For example, the sample mean \(\bar{X}\) is an estimator of \(\mu\) and the computed value \(\bar{x}\) is an estimate of \(\mu\). The estimator is a sampling random variable. Similarly, the sample variance \(S^2\) is an estimator of the population variance \(\sigma^2\) and the computed value \(s^2\) is an estimate of \(\sigma^2\). It may be intuitively clear to you why \(\bar{X}\) and \(S^2\) would be reasonable estimators, respectively, for \(\mu\) and \(\sigma^2\). Mathematically, the reasons are as follows:
1. We have \(E(\bar{X}) = \mu\) and \(E(S^2) = \sigma^2\). For this reason we say \(\bar{X}\) and \(S^2\) are unbiased estimators, respectively, for \(\mu\) and \(\sigma^2\).
2. \(\operatorname{Var}(\bar{X}) = \sigma^2/n\) is small if n is large. So, as n grows, the standard deviation \(\sigma_{\bar{X}}\) of \(\bar{X}\) decreases. This means that values of \(\bar{X}\) will be close to the mean \(\mu\) more frequently. This improves the level of confidence for \(\bar{X}\) as an estimator of \(\mu\). View the animation on normal distribution to see how the probability mass concentrates around the mean as the standard deviation decreases.

Interval Estimation

We would almost never expect a point estimate t of a parameter \(\theta\) to be exactly equal to the actual value of \(\theta\). This is why it is more reasonable to give an interval (L, U) and say that \(\theta\) would be within this interval. Here L, U will be statistics. Since the computed values L = l, U = u depend on the sample, we do not expect the value of \(\theta\) to always be within the computed interval (l, u). We are happy as long as the true value of \(\theta\) falls within the interval (l, u) most often (or often enough), allowing the possibility of being "wrong" a few times. But how often is often enough? The probability \(P(L < \theta < U)\) tells us how often the parameter will fall within (l, u). So, it is also reasonable to give the probability \(P(L < \theta < U)\), that is, \(P(\theta \in (L, U))\). This is what we do in interval estimation; the resulting interval is also called a confidence interval of \(\theta\).

Definition. Let \(\theta\) be a population parameter. An interval estimate for \(\theta\) provides the following:
1. It gives an interval (L, U) as an estimate for \(\theta\). Here L, U are statistics.
2. It also gives the probability \(P(L < \theta < U)\). This number \(P(L < \theta < U) = 1 - \alpha\) is called the level of confidence, and (L, U) is said to be a \((1-\alpha)100\) percent confidence interval of \(\theta\).
3. In practice, \(\alpha\) will be a small number, like 0.1, 0.01, 0.05.

We need the following definition.

Definition. Given a number \(0 < \alpha < 1\), the number \(z_\alpha\) is defined by the formula \(P(Z > z_\alpha) = \alpha\). View the animation on inverse Z-distribution to understand the numbers \(z_\alpha\). As mentioned above, for us \(\alpha\) will be a small number such as 0.1, 0.01, 0.05, and so on. At the end of the Z-table is a list of the numbers \(z_\alpha\) that we may need frequently.

A \((1-\alpha)100\) percent confidence interval for the mean \(\mu\): Suppose X is a random variable with mean \(\mu\) and variance \(\sigma^2\). We want to construct a confidence interval for \(\mu\). We assume that \(\sigma\) is known. Let \(X_1, X_2, \dots, X_n\) be a sample from X. From the CLT we have, approximately,
\[ P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha, \quad \text{where } Z = \frac{(\bar{X} - \mu)\sqrt{n}}{\sigma}. \]
If we simplify, we get
\[ P(\bar{X} - E < \mu < \bar{X} + E) = 1 - \alpha, \quad \text{where } E = z_{\alpha/2}\,\sigma/\sqrt{n}. \]
So we have the following theorem.

Theorem. Assume that \(\sigma\) is known. Then a \((1-\alpha)100\) percent confidence interval for \(\mu\) is given by
\[ \bar{X} - E < \mu < \bar{X} + E, \quad \text{where } E = z_{\alpha/2}\,\sigma/\sqrt{n}. \]

Remarks.
1. If you go on computing \((1-\alpha)100\) percent confidence intervals on a regular basis, the true value of \(\mu\) will not lie within every one of them. In the long run, about \((1-\alpha)100\) percent of the intervals will contain \(\mu\).
2. The confidence interval we computed above may also be called a \((1-\alpha)100\) percent two-sided confidence interval for \(\mu\). There could be all kinds of confidence intervals. For example, if \(P(L < \mu < \infty) = 1 - \alpha\), then \((L, \infty)\) will be a \((1-\alpha)100\) percent one-sided (upper) confidence interval for \(\mu\).
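The theorem translates directly into code. Here is a sketch assuming Python 3.8+. It uses `statistics.NormalDist().inv_cdf` to replace the Z-table lookup of \(z_{\alpha/2}\), and the function name `z_interval` is illustrative. Exact quantiles can differ from rounded table values such as 2.575 in the third decimal.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, conf):
    """(1 - alpha)100 percent confidence interval for mu when sigma is known."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}: P(Z > z) = alpha/2
    margin = z * sigma / sqrt(n)             # E = z_{alpha/2} * sigma / sqrt(n)
    return xbar - margin, xbar + margin, margin


# Exercise 7.1.1: sigma = 15, n = 25, sample mean 81, 99 percent confidence
low, high, E = z_interval(81, 15, 25, 0.99)
print(f"({low:.2f}, {high:.2f}), margin of error {E:.2f}")
```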

Definitions and Formulas:
1. The length l of this \((1-\alpha)100\) percent confidence interval for \(\mu\) is given by \(l = 2 z_{\alpha/2}\,\sigma/\sqrt{n}\).
2. The margin of error E is defined as \(E = z_{\alpha/2}\,\sigma/\sqrt{n}\).
3. The sample size n needed for a \((1-\alpha)100\) percent confidence interval to have a preassigned margin of error E is given by
\[ n = \left( \frac{z_{\alpha/2}\,\sigma}{E} \right)^2. \]

To be safe, always round upward in this class. Also use the Z-table for online homework.
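The sample-size formula, with its "always round upward" rule, is a one-liner. Here is a sketch with an illustrative function name. Because it uses the exact quantile rather than the rounded table value 1.645, a borderline case could occasionally differ from a hand computation by one.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma, E, conf):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= E, i.e. (z sigma / E)^2 rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    return ceil((z * sigma / E) ** 2)


# Exercise 7.1.4: variance 289 (so sigma = 17), E = 3, 90 percent confidence
print(sample_size_for_mean(17, 3, 0.90))
```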

Use of Calculators (if you have a TI-83): Z-interval
1. Press STAT and then select TESTS.
2. Select Z-interval and enter.
3. Input: you will have to select Stats (not Data) in this section.
4. Feed in the values of \(\sigma\), \(\bar{x}\), n and the C-level.
5. Select Calculate and enter. It will give you the confidence interval.
6. The margin of error = E = (width of the interval)/2.
7. To compute the sample size, use the formula above.

Problems on 7.1: Point and Interval Estimation

Exercise 7.1.1. Assume that you have a normal population with mean \(\mu\) and standard deviation \(\sigma\) = 15. Suppose you have collected a sample of size 25 and the sample mean \(\bar{x}\) was found to be 81. 1. Find a 99 percent confidence interval for \(\mu\). 2. Find the margin of error at the 99 percent level of confidence.

Exercise 7.1.2. Assume that you have a normal population with mean \(\mu\) and standard deviation \(\sigma\) = 9.8. Suppose you have collected a sample of size 14 and the sample mean \(\bar{x}\) was found to be 151.1. 1. Find a 99 percent confidence interval for \(\mu\). 2. Find the margin of error at the 99 percent level of confidence.

Exercise 7.1.3. The time taken by an athlete to run an event is normally distributed with mean \(\mu\) and known standard deviation \(\sigma\) = 3.5 seconds. To estimate the mean \(\mu\), he ran 16 times and the sample mean was found to be \(\bar{x}\) = 33 seconds. 1. Find the margin of error in estimating the true mean with a 95 percent level of confidence. 2. Find a 99 percent confidence interval for \(\mu\).

Exercise 7.1.4. A population has normal distribution with variance \(\sigma^2\) = 289. How large a sample do we need to estimate the mean \(\mu\) within 3 units of its true value, with 90 percent confidence?

Exercise 7.1.5. The tuition X paid by a student per semester in a university has a distribution with mean \(\mu\) and \(\sigma\) = $416. How large a sample should you draw so that you are 95 percent sure that the true value of \(\mu\) will be within $10 of the sample mean \(\bar{x}\)?

7.2 When \(\sigma\) Is Unknown

Let X be a normal random variable with mean \(\mu\) and variance \(\sigma^2\). Unlike in the last section, in this section we assume that \(\sigma\) is not known, and we try to compute a confidence interval of \(\mu\). In the last section, the main tool (or fact) that we used was that \(Z = (\bar{X} - \mu)\sqrt{n}/\sigma\) has N(0,1) distribution. In this section, we use the distribution of
\[ T = \frac{(\bar{X} - \mu)\sqrt{n}}{S}. \]
The distribution of T is known as the t-distribution with n - 1 degrees of freedom, which we have not discussed. As we did for the N(0,1) random variable, we will now give the properties of the t-distribution.

About t-distribution

Given a positive integer \(\nu\), there is a random variable \(T = t_\nu\) that is said to have t-distribution with \(\nu\) degrees of freedom. The useful properties of the t-distribution are listed below:
1. A t-random variable has \(\nu\) degrees of freedom. If a random variable T has t-distribution with \(\nu\) degrees of freedom, then we say that T has \(t_\nu\) distribution.
2. The t-random variables are continuous random variables.
3. The mean of a t-random variable is ZERO (for \(\nu \ge 2\)).
4. The graph of the pdf of a t-random variable is symmetric around the y-axis and has a bell shape.
5. For a \(t_\nu\) random variable, if the degrees of freedom \(\nu\) is large, then it can be approximated by a N(0,1) random variable.
6. For a number \(0 < \alpha < 1\) and any positive integer \(\nu\), we define a number \(t_{\nu,\alpha}\) by the equation \(P(T > t_{\nu,\alpha}) = \alpha\), where T has t-distribution with \(\nu\) degrees of freedom. View the animation on inverse-T distribution to understand the numbers \(t_{\nu,\alpha}\).
7. Tables are available, one for each degree of freedom \(\nu\), that can be used to compute probabilities for t-random variables. We will need only some of the numbers \(t_{\nu,\alpha}\). A table sufficient for us is provided at the link for a table.

Theorem. Let X be a normal random variable with mean \(\mu\) and standard deviation \(\sigma\). Let \(X_1, X_2, \dots, X_n\) be a sample of size n from the X population. Then
\[ T = \frac{(\bar{X} - \mu)\sqrt{n}}{S} \]
has t-distribution with n - 1 degrees of freedom. So,
\[ P\left(-t_{n-1,\alpha/2} < \frac{(\bar{X} - \mu)\sqrt{n}}{S} < t_{n-1,\alpha/2}\right) = 1 - \alpha. \]
If we simplify, we get
\[ P(\bar{X} - E < \mu < \bar{X} + E) = 1 - \alpha, \quad \text{where } E = t_{n-1,\alpha/2}\, S/\sqrt{n}. \]

A \((1-\alpha)100\) percent Confidence Interval for \(\mu\): Under the set-up of the theorem, a \((1-\alpha)100\) percent confidence interval for \(\mu\) is given by
\[ \bar{x} - E < \mu < \bar{x} + E, \quad \text{where } E = t_{n-1,\alpha/2}\, s/\sqrt{n}. \]
E is also called the margin of error.

A Frequently Asked Question: To estimate \(\mu\), when do we use the ZInterval and when do we use the TInterval? Answer: We use the TInterval only when \(\sigma\) is not known.
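Python's standard library has no t-distribution, so the sketch below computes the critical value \(t_{n-1,\alpha/2}\) numerically. It integrates the t density with Simpson's rule and then solves \(P(T > t) = \alpha/2\) by bisection. This is an illustrative stand-in for the t table or a library routine such as SciPy's `scipy.stats.t.ppf`. It is meant for the degrees of freedom (2 or more) and the confidence levels used in these notes, and the function names are hypothetical.

```python
from math import exp, lgamma, pi, sqrt


def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    c = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus the Simpson's-rule integral of the pdf on [0, t]."""
    h = t / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def t_critical(df, alpha):
    """t_{df,alpha} with P(T > t_{df,alpha}) = alpha, found by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid  # too much area to the right: move the cut-off up
        else:
            hi = mid
    return (lo + hi) / 2


def t_interval(xbar, s, n, conf):
    """(1 - alpha)100 percent CI for mu when sigma is unknown: xbar +/- t s / sqrt(n)."""
    t = t_critical(n - 1, (1 - conf) / 2)  # t_{n-1, alpha/2}
    margin = t * s / sqrt(n)
    return xbar - margin, xbar + margin, margin


# Exercise 7.2.1: n = 18, sample mean 170.5, s = 13.3, 99 percent confidence
low, high, E = t_interval(170.5, 13.3, 18, 0.99)
print(f"({low:.2f}, {high:.2f}), margin of error {E:.2f}")
```

For raw data, as in Exercises 7.2.3 and 7.2.4, \(\bar{x}\) and s can first be obtained with `statistics.mean` and `statistics.stdev`. The latter divides by n - 1, matching \(S\) in the text.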

Use of Calculators (if you have a TI-83): T-interval
1. If we have raw data, enter the data into the calculator.
2. Press STAT and then select TESTS.
3. Select T-interval and enter.
4. Input: you will have to select Stats or Data, depending on what is given.
5. Feed in the values of the sample standard deviation s, \(\bar{x}\), n (or the list where you have the data) and the C-level.
6. Select Calculate and enter. It will give you the confidence interval.
7. The margin of error = E = (width of the interval)/2.

Problems on 7.2: When \(\sigma\) Is Unknown

Exercise 7.2.1. Assume that we have a normal population with mean \(\mu\) and standard deviation \(\sigma\). We have a sample of size n = 18 that has sample mean \(\bar{x}\) = 170.5 and standard deviation s = 13.3. Find the margin of error and compute a 99 percent confidence interval for \(\mu\).

Exercise 7.2.2. Suppose that the time taken to complete a problem in a Math 365 test is normally distributed with mean \(\mu\) and standard deviation \(\sigma\). A sample of size 23 was taken, and the sample mean and standard deviation were found to be \(\bar{x}\) = 4.7 and s = 0.47. Estimate the mean time taken to complete a problem using a 98 percent confidence interval.

Exercise 7.2.3. It is assumed that the lifetime (in hours) of lightbulbs produced in a factory is normally distributed with mean \(\mu\) and standard deviation \(\sigma\). To estimate \(\mu\), the following data was collected on the lifetime of bulbs.

5110 7783 4671 4560 6441 6074 3331 4777 5055 4707 5270 5263 5335 4978 4973 5418 1837 5123

Compute a 95 percent confidence interval for \(\mu\). Write down the formula for the \((1-\alpha)100\) percent confidence interval that you use here.

Exercise 7.2.4. To estimate the mean weight \(\mu\) (in pounds) of salmon in a river, the following sample was collected:

34.7 31.8 33.8 41.5 38.2 44.5 20.3 29.2 27.8 25.3 45.3 29.6 43.1 39.5 37.3 29.1 32.5 37.3 32.3

Compute a 99 percent confidence interval for the mean \(\mu\). Write down the formula for the \((1-\alpha)100\) percent confidence interval that you use here.

Exercise 7.2.5. Suppose we collect a sample of size n = 40 from a normal population, with sample mean \(\bar{x}\) = 18.6 and standard deviation s = 9.486. Construct a 95 percent confidence interval for the mean \(\mu\).

Exercise 7.2.6. The time taken by an athlete to run an event is normally distributed with mean \(\mu\) and unknown standard deviation \(\sigma\). To estimate the mean \(\mu\), he ran 16 times and the sample mean was found to be \(\bar{x}\) = 33 seconds and the sample standard deviation s = 3.5 seconds. 1. Find the margin of error in estimating the true mean with a 99 percent level of confidence. 2. Find a 99 percent confidence interval for \(\mu\).

7.3 Confidence Interval for \(\sigma^2\)

Let X be a normal random variable with mean \(\mu\) and variance \(\sigma^2\). In this section, we will construct a confidence interval for \(\sigma^2\). We will take a sample \(X_1, X_2, \dots, X_n\) of size n from the X population. Let \(\bar{X}\) be the sample mean and let \(S^2\) be the sample variance. To compute a confidence interval for \(\sigma^2\), we will be using the distribution of
\[ U = \frac{(n-1)S^2}{\sigma^2}. \]
The distribution of U is known as the \(\chi^2\) distribution with n - 1 degrees of freedom, which we have not discussed. Next we will give the properties of a \(\chi^2\) random variable.

About \(\chi^2\)-distribution

Given a positive integer \(\nu\), there is a random variable \(\chi^2_\nu\) that is said to have \(\chi^2\) distribution with \(\nu\) degrees of freedom. The useful properties of the \(\chi^2\) distribution are listed below.
1. A \(\chi^2\) random variable has \(\nu\) degrees of freedom. If a random variable U has \(\chi^2\) distribution with \(\nu\) degrees of freedom, then we say that U has \(\chi^2_\nu\) distribution.
2. The \(\chi^2\) random variables are all continuous random variables.
3. A \(\chi^2\) random variable is always nonnegative.
4. The graph of the pdf of a \(\chi^2\) random variable is skewed to the right. If the degrees of freedom \(\nu\) is large, then it can be approximated by a normal random variable with mean \(\nu\) and variance \(2\nu\). View the animations on pdf of Chi-Square random variable and probability distribution of Chi-Square.
5. If U is a \(\chi^2_\nu\) random variable, then the mean of U is \(\nu\). (We will not need this.) This fact is reflected in the animation above.
6. For a number \(0 < \alpha < 1\) and any positive integer \(\nu\), we define a number \(\chi^2_{\nu,\alpha}\) by the equation \(P(U > \chi^2_{\nu,\alpha}) = \alpha\), where U has \(\chi^2\) distribution with \(\nu\) degrees of freedom. View the animation on inverse Chi-Square distribution to understand the numbers \(\chi^2_{\nu,\alpha}\).
7. Tables are available, one for each degree of freedom \(\nu\), that can be used to compute probabilities for \(\chi^2\) random variables. For our purpose, only some of the numbers \(\chi^2_{\nu,\alpha}\) will be needed. Here is a link for a table that will be sufficient for us.

Theorem. Let X be a normal random variable with mean \(\mu\) and variance \(\sigma^2\). Let \(X_1, X_2, \dots, X_n\) be a sample of size n from the X population. Then \((n-1)S^2/\sigma^2\) has \(\chi^2\) distribution with n - 1 degrees of freedom. So,
\[ P\left(\chi^2_{n-1,1-\alpha/2} < \frac{(n-1)S^2}{\sigma^2} < \chi^2_{n-1,\alpha/2}\right) = 1 - \alpha. \]
If we simplify, we get \(P(L < \sigma^2 < U) = 1 - \alpha\), where
\[ L = \frac{(n-1)S^2}{\chi^2_{n-1,\alpha/2}}, \qquad U = \frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha/2}}. \]

Theorem. Under the same set-up as in the above theorem, a \((1-\alpha)100\) percent confidence interval for the variance \(\sigma^2\) is given by \(l < \sigma^2 < u\), where
\[ l = \frac{(n-1)s^2}{\chi^2_{n-1,\alpha/2}}, \qquad u = \frac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/2}}. \]
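The TI-83 cannot produce this interval, so a sketch in Python may help. The standard library has no chi-square distribution, so the code computes the chi-square CDF from the series for the regularized lower incomplete gamma function. It then finds the upper-tail points \(\chi^2_{\nu,\alpha}\) by bisection. This stands in for the chi-square table or a library routine such as SciPy's `scipy.stats.chi2.ppf`. It is intended for moderate degrees of freedom, and all function names are hypothetical.

```python
from math import exp, lgamma, log, sqrt
from statistics import variance


def chi2_cdf(x, df):
    """P(U <= x) for U ~ chi-square(df), via the incomplete-gamma series."""
    if x <= 0:
        return 0.0
    a, y = df / 2, x / 2
    term = total = 1.0 / a
    k = 0
    while term > total * 1e-15:      # add terms until they no longer matter
        k += 1
        term *= y / (a + k)
        total += term
    return total * exp(-y + a * log(y) - lgamma(a))


def chi2_critical(df, alpha):
    """chi^2_{df,alpha} with P(U > chi^2_{df,alpha}) = alpha, found by bisection."""
    lo, hi = 0.0, df + 10 * sqrt(2 * df) + 10
    for _ in range(100):
        mid = (lo + hi) / 2
        if 1 - chi2_cdf(mid, df) > alpha:
            lo = mid  # too much area to the right: move the cut-off up
        else:
            hi = mid
    return (lo + hi) / 2


def variance_interval(s2, n, conf):
    """(1 - alpha)100 percent CI for sigma^2 from a normal sample of size n."""
    alpha = 1 - conf
    df = n - 1
    lower = df * s2 / chi2_critical(df, alpha / 2)      # divide by the larger point
    upper = df * s2 / chi2_critical(df, 1 - alpha / 2)  # divide by the smaller point
    return lower, upper


# Exercise 7.3.1: n = 26, s^2 = 26.7, 95 percent confidence
print(variance_interval(26.7, 26, 0.95))

# Exercise 7.3.2: compute s^2 from the raw data first (statistics.variance divides by n - 1)
harvest = [206, 600, 300, 225, 200, 933, 385, 320, 280, 260]
print(variance_interval(variance(harvest), len(harvest), 0.99))
```

Note that the lower endpoint divides by the larger point \(\chi^2_{n-1,\alpha/2}\), exactly as in the theorem.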

Use of Calculators: The TI-83 will not compute the confidence interval for \(\sigma^2\). If data is given, it is important to use the calculator to compute the sample variance \(s^2\).

Problems for 7.3: Confidence Interval for \(\sigma^2\)

Exercise 7.3.1. Suppose that we have collected a sample of size n = 26 from a normal population with mean \(\mu\) and variance \(\sigma^2\). The sample variance was found to be \(s^2\) = 26.7. Compute a 95 percent confidence interval for \(\sigma^2\).

Exercise 7.3.2. The following is sample data on the amount (in 1000 bushels) of wheat harvested by Kansas farmers in 2002.

206 600 300 225 200 933 385 320 280 260

1. Compute a 99 percent confidence interval for the variance \(\sigma^2\) of the harvest.

Exercise 7.3.3. The following is data on monthly gas consumption (in ccf) during the winter months by a household.

154 228 222 240 264 393 257 278 127 140

1. Compute a 99 percent confidence interval for the variance \(\sigma^2\).

7.4 About the Population Proportion

Once again, let p be the population proportion of a certain attribute. We want to compute a confidence interval for p. We let X = 1 if success and X = 0 if failure, where "success" means that the sample member has the attribute. So, X is a Bernoulli(p) random variable. We draw a sample \(X_1, X_2, \dots, X_n\) from the X population. We let \(X = X_1 + \cdots + X_n\) be the total number of successes and \(\bar{X} = X/n\) be the sample proportion of success. We have seen that, approximately, the sample proportion \(\bar{X}\) has \(N(\mu_{\bar{X}}, \sigma_{\bar{X}})\) distribution, where \(\mu_{\bar{X}} = p\) and \(\sigma_{\bar{X}} = \sqrt{p(1-p)/n}\). Therefore,
\[ P\left(-z_{\alpha/2} < \frac{\bar{X} - p}{\sigma_{\bar{X}}} < z_{\alpha/2}\right) = 1 - \alpha. \]
In an attempt to compute a confidence interval for p, we simplify and get
\[ P(\bar{X} - z_{\alpha/2}\,\sigma_{\bar{X}} < p < \bar{X} + z_{\alpha/2}\,\sigma_{\bar{X}}) = 1 - \alpha. \]
Since \(\sigma_{\bar{X}}\) depends on the unknown p, this will not produce a confidence interval for p. But the sample proportion \(\bar{x}\) of success is a point estimate of p. So we have an approximate \((1-\alpha)100\) percent confidence interval for p given by
\[ \bar{x} - e < p < \bar{x} + e, \quad \text{where } e = z_{\alpha/2}\sqrt{\bar{x}(1-\bar{x})/n}. \]
Following are some of the useful formulas and definitions that we may need.
1. The margin of error e is defined as \(e = z_{\alpha/2}\sqrt{\bar{x}(1-\bar{x})/n}\).
2. A conservative margin of error E is defined as \(E = z_{\alpha/2}/\sqrt{4n}\). Since \(\bar{x}(1-\bar{x}) \le 1/4\) for every \(\bar{x}\), the margin of error e is always less than or equal to the conservative margin of error E.
3. Theorem. For a \((1-\alpha)100\) percent confidence interval for p, if we are given a preassigned conservative margin of error E, then the sample size n that we need to take is given by \(n = \left(z_{\alpha/2}/(2E)\right)^2\), rounded up to the next integer.
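The three formulas above can be sketched together in Python 3.8+, using `statistics.NormalDist` for \(z_{\alpha/2}\). The function names are illustrative, not from the text. The usage lines reproduce Exercise 7.4.1 and the 0.03079 check of Exercise 7.4.6.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_crit(conf):
    """z_{alpha/2} for a (1 - alpha)100 percent two-sided interval."""
    return NormalDist().inv_cdf(1 - (1 - conf) / 2)


def proportion_interval(successes, n, conf):
    """Approximate CI for p: xbar +/- z_{alpha/2} * sqrt(xbar (1 - xbar) / n)."""
    xbar = successes / n
    e = z_crit(conf) * sqrt(xbar * (1 - xbar) / n)
    return xbar - e, xbar + e, e


def conservative_margin(n, conf):
    """E = z_{alpha/2} / sqrt(4n), the margin of error when xbar = 1/2 (its largest value)."""
    return z_crit(conf) / sqrt(4 * n)


def sample_size_for_proportion(E, conf):
    """n = (z_{alpha/2} / (2E))^2, rounded up to the next integer."""
    return ceil((z_crit(conf) / (2 * E)) ** 2)


low, high, e = proportion_interval(19, 197, 0.99)   # Exercise 7.4.1
print(f"({low:.4f}, {high:.4f}), e = {e:.4f}")
print(round(conservative_margin(1013, 0.95), 4))    # Exercise 7.4.6
print(sample_size_for_proportion(0.02, 0.99))       # Exercise 7.4.4
```

As with the mean, the sample size from the exact \(z_{\alpha/2}\) may differ by a unit or two from one computed with a rounded table value.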

Remark. In the days of Clinton's impeachment, we often heard TV newscasters read something like the following: "President Clinton has a 64 percent approval rating. The poll has a margin of error of plus or minus 3.1 percentage points. The poll surveyed 972 people." They mean that the sample proportion \(\bar{x}\) of people who approve of President Clinton is 0.64. Normally they don't tell us the level of confidence they are using. Assuming that they are using a 95 percent confidence interval, they mean that \(E = z_{\alpha/2}/\sqrt{4n} = 1.96/\sqrt{4 \times 972} \approx 0.031\).

Use of Calculators (if you have a TI-83): 1-PropZInt
1. Press STAT and then select TESTS.
2. Select 1-PropZInt and enter.
3. Feed in the values of the number of successes x, n and the C-level.
4. Select Calculate and enter. It will give you the confidence interval.
5. The margin of error = e = (width of the interval)/2.
6. To compute the conservative margin of error, use the formula in the definition.
7. To compute the sample size, use the formula above.

Problems on 7.4: About the Population Proportion

Exercise 7.4.1. In a sample of 197 apples from a lot, 19 were found to be sour. Set a 99 percent confidence interval for the proportion p of sour apples in the lot.

Exercise 7.4.2. A new vaccine was tried on 147 randomly selected individuals, and it was determined that 97 of them developed immunity. Find a 95 percent confidence interval for the proportion p of individuals in the population for whom the vaccine would help.

Exercise 7.4.3. Before a congressional election, a poll was conducted. Out of 887 randomly selected voters interviewed, 389 said that they would vote for Candidate A, and 359 said that they would vote for Candidate B. 1. Construct a 98 percent confidence interval for the proportion p of voters who would vote for A. 2. Construct a 98 percent confidence interval for the proportion p of voters who would vote for B. 3. What is the conservative margin of error for both?

Exercise 7.4.4. If a pollster wanted to estimate the proportion p of Americans who think that the President should not be impeached, how large a sample should he/she take so that the true value of p will be within 0.02 of the sample proportion, with 99 percent confidence?

Exercise 7.4.5. The proportion p of defective lightbulbs produced by a machine needs to be estimated within 0.01 to determine whether the machine needs to be replaced. How large a sample should we take to do this with 90 percent confidence?

Exercise 7.4.6. In a poll released on October 28, 1998, it was revealed that 60 percent of Americans wanted President Clinton rebuked but not impeached. The poll was conducted among 1,013 adults, and it had a margin of error of 3 percentage points. 1. Can you relate the last two numbers? 2. What is the level of confidence used here?

Solution: News media polls typically use 95 percent confidence intervals. When they say "margin of error," they mean "conservative margin of error." The conservative margin of error E and level of confidence \(1 - \alpha\) are related by the formula \(E = z_{\alpha/2}/\sqrt{4n}\). For this problem E = 0.03, \(1 - \alpha\) = 0.95, and n = 1,013. We can check that \(z_{\alpha/2}/\sqrt{4n} = 1.96/\sqrt{4 \times 1013} \approx 0.03079\).

Lesson 8: Comparing Two Populations

Introduction
8.1 Confidence Interval of $\mu_1 - \mu_2$
8.2 When $\sigma_1$ and $\sigma_2$ are Unknown
8.3 Comparing Two Population Proportions
Homework 25 - 27

Introduction

In this lesson we compare two populations. We will consider the following:

1. Compute a confidence interval for the difference $\mu_1 - \mu_2$ of the means of two populations. For example, we may want to estimate the difference $\mu_1 - \mu_2$ between $\mu_1$ = the mean annual male income and $\mu_2$ = the mean annual female income in the United States.

2. Compute a confidence interval for the difference $p_1 - p_2$ of the proportions of an attribute present (or proportions of "success") in two populations. For example, we may want to estimate the difference $p_1 - p_2$ between $p_1$ = the proportion of defective items produced by the new machine and $p_2$ = the proportion of defective items produced by the old machine.

8.1 Confidence Interval of $\mu_1 - \mu_2$

Suppose $X$, $Y$ are two similar random variables. Let the mean and standard deviation of $X$ be $\mu_1$ and $\sigma_1$, respectively, and let the mean and standard deviation of $Y$ be $\mu_2$ and $\sigma_2$, respectively. We want to compute a confidence interval for the difference $\mu_1 - \mu_2$. We proceed as follows.

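1. We draw a sample $X_1, X_2, \ldots, X_m$ of size $m$ from the $X$ population and a sample $Y_1, Y_2, \ldots, Y_n$ of size $n$ from the $Y$ population. Let
$$\bar X = (X_1 + X_2 + \cdots + X_m)/m, \qquad \bar Y = (Y_1 + Y_2 + \cdots + Y_n)/n$$
be the corresponding sample means.

2. By the CLT, $\bar X$ has $N(\mu_1, \sigma_1/\sqrt{m})$ distribution and $\bar Y$ has $N(\mu_2, \sigma_2/\sqrt{n})$ distribution (approximately for large samples, and exactly if the populations are normal).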

3. You would agree that $\bar X - \bar Y$ is a natural estimator of $\mu_1 - \mu_2$.


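4. Now we assume that the $X$ sample and the $Y$ sample are mutually independent. In that case, it follows that $\bar X - \bar Y$ has $N(\mu_1 - \mu_2, \sigma)$ distribution, where
$$\sigma = \sqrt{\sigma_1^2/m + \sigma_2^2/n}.$$

5. It follows that
$$P\left(-z_{\alpha/2} < \frac{(\bar X - \bar Y) - (\mu_1 - \mu_2)}{\sigma} < z_{\alpha/2}\right) = 1 - \alpha,$$
where $\sigma$ is as above in (4).

6. If we simplify, we get
$$P\left(\bar X - \bar Y - z_{\alpha/2}\,\sigma < \mu_1 - \mu_2 < \bar X - \bar Y + z_{\alpha/2}\,\sigma\right) = 1 - \alpha,$$
where $\sigma$ is as above in (4).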

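7. Theorem. A $(1-\alpha)100$ percent confidence interval for $\mu_1 - \mu_2$ is given by
$$\bar x - \bar y - z_{\alpha/2}\,\sigma < \mu_1 - \mu_2 < \bar x - \bar y + z_{\alpha/2}\,\sigma,$$
where $\sigma$ is as above in (4). This formula is usable only if we know the values of $\sigma_1$ and $\sigma_2$.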

8. The margin of error is given by
$$E = z_{\alpha/2}\,\sigma,$$
where $\sigma$ is as above in (4).
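Here is a minimal Python sketch of the theorem in (7) and the margin of error in (8), using only the standard library (`statistics.NormalDist().inv_cdf` gives $z_{\alpha/2}$). The function name is hypothetical; the inputs are the numbers of Exercise 8.1.1 below ($\sigma_1 = 8.1$, $\sigma_2 = 11.3$, $m = 64$, $n = 99$, $\bar x = 3.7$, $\bar y = 4.1$).

```python
import math
from statistics import NormalDist


def two_sample_z_interval(xbar, ybar, sigma1, sigma2, m, n, conf):
    """(1 - alpha)100 percent CI for mu1 - mu2 when sigma1, sigma2 are known.

    Returns (lower, upper, E), where E = z_{alpha/2} * sqrt(sigma1^2/m + sigma2^2/n).
    """
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)           # z_{alpha/2}
    sigma = math.sqrt(sigma1**2 / m + sigma2**2 / n)   # sd of Xbar - Ybar, item (4)
    E = z * sigma                                      # margin of error, item (8)
    d = xbar - ybar                                    # point estimate of mu1 - mu2
    return d - E, d + E, E


# Exercise 8.1.1 data
low, high, E = two_sample_z_interval(3.7, 4.1, 8.1, 11.3, 64, 99, 0.95)
print(f"E = {E:.3f}, interval = ({low:.3f}, {high:.3f})")
# E = 2.982, interval = (-3.382, 2.582)
```

The interval is centered at the point estimate $\bar x - \bar y = -0.4$, so its width divided by 2 recovers $E$.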

Use of Calculators (if you have a TI-83): 2-SampZInt

1. Press STAT and then select TESTS.
2. Select 2-SampZInt and press ENTER.
3. Input: you will have to select Stats (not Data) in this section.
4. Feed in the values of $\sigma_1$, $\sigma_2$, $\bar x$, $\bar y$, $m$, $n$ and the C-Level.
5. Select Calculate and press ENTER. It will give you the confidence interval.
6. The margin of error is $E$ = (width of the interval)/2.

Problems on 8.1: Confidence Interval of $\mu_1 - \mu_2$

Exercise 8.1.1. Suppose we have two normal populations with means $\mu_1$, $\mu_2$ and standard deviations $\sigma_1$, $\sigma_2$, respectively. It is known that $\sigma_1 = 8.1$ and $\sigma_2 = 11.3$. A sample of size $m = 64$ was collected from the first population, and the sample mean was found to be $\bar x = 3.7$. A sample of size $n = 99$ was collected from the second population, and the sample mean was found to be $\bar y = 4.1$. Compute a 95 percent confidence interval for the difference of means $\mu_1 - \mu_2$.

Exercise 8.1.2. The birth weights of babies in developed and developing countries are normally distributed with means $\mu_1$, $\mu_2$ and standard deviations $\sigma_1$, $\sigma_2$, respectively. (My data is not real.) It is given that $\sigma_1 = 2.3$ pounds and $\sigma_2 = 2.9$ pounds. A sample of $m = 35$ babies from the developed nations was collected, and the sample mean birth weight was found to be $\bar x = 8.9$ pounds. A sample of $n = 48$ babies from the developing nations was collected, and the sample mean birth weight was found to be $\bar y = 7.1$ pounds.
1. Compute a point estimate of the difference of mean birth weights $\mu_1 - \mu_2$.
2. Determine the margin of error for $\mu_1 - \mu_2$ at the 95 percent level of confidence.
3. Construct a 95 percent confidence interval for $\mu_1 - \mu_2$.

Exercise 8.1.3. African elephants and Indian elephants differ in height, weight, and length of ear and tusk. It is natural to assume that all these are normally distributed. The mean height and standard deviation of African elephants are $\mu_1$ and $\sigma_1 = 1.2$ feet, respectively. The mean height and standard deviation of Indian elephants are $\mu_2$ and $\sigma_2 = 1.1$ feet, respectively. A sample of 25 African elephants was collected, and the sample mean height was found to be $\bar x = 10.9$ feet. A sample of 28 Indian elephants was collected, and the sample mean height was found to be $\bar y = 9.1$ feet.
1. Compute a point estimate of the difference of mean heights $\mu_1 - \mu_2$.
2. Determine the maximum error of the difference $\mu_1 - \mu_2$ at the 99 percent level of confidence.
3. Construct a 99 percent confidence interval for $\mu_1 - \mu_2$.

8.2 When $\sigma_1$ and $\sigma_2$ are Unknown

As in the last section, we have two populations $X$, $Y$. We assume that $X$ has $N(\mu_1, \sigma_1)$ distribution and $Y$ has $N(\mu_2, \sigma_2)$ distribution. Unlike in the last section, we assume that $\sigma_1$, $\sigma_2$ are unknown. We want to find a confidence interval for $\mu_1 - \mu_2$.

We take a sample $X_1, X_2, \ldots, X_m$ of size $m$ from the $X$ population, and a sample $Y_1, Y_2, \ldots, Y_n$ of size $n$ from the $Y$ population. Following are some facts and notation.

1. Assumptions: We make the important assumption that the variances $\sigma_1^2$ and $\sigma_2^2$ are equal. So, we write
$$\sigma_1 = \sigma_2 = \sigma.$$
We also assume that the $X$-sample and the $Y$-sample are mutually independent.

2. Let $\bar X$ and $S_X^2$ be the sample mean and sample variance of the $X$-sample, and let $\bar Y$ and $S_Y^2$ be the sample mean and sample variance of the $Y$-sample.

3. Definition. Define the pooled estimate $S_p^2$ for $\sigma^2$ as follows:
$$S_p^2 = \frac{(m-1)S_X^2 + (n-1)S_Y^2}{m+n-2} = \frac{\sum_{i=1}^{m}(X_i - \bar X)^2 + \sum_{j=1}^{n}(Y_j - \bar Y)^2}{m+n-2}.$$
Although both $S_X^2$ and $S_Y^2$ are estimators of $\sigma^2$, $S_p^2$ is a better estimator for $\sigma^2$ because it uses both samples. One can see that $S_p^2$ is a weighted average of $S_X^2$ and $S_Y^2$, with weights proportional to $m-1$ and $n-1$.

4. It follows that
$$T = \frac{(\bar X - \bar Y) - (\mu_1 - \mu_2)}{S_p\sqrt{1/m + 1/n}}$$
has a $t$-distribution with $m+n-2$ degrees of freedom.

5. Using the same kind of computations that we have done before, we see that a $(1-\alpha)100$ percent confidence interval for $\mu_1 - \mu_2$ is given by
$$\bar x - \bar y - E < \mu_1 - \mu_2 < \bar x - \bar y + E, \qquad \text{where } E = t_{m+n-2,\,\alpha/2}\; s_p\sqrt{1/m + 1/n}.$$
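The pooled computation in (3) and (5) can be sketched in Python as follows. This is a minimal illustration, not the author's procedure: the function name is hypothetical, and because the Python standard library has no $t$ quantile function, the critical value $t_{m+n-2,\,\alpha/2}$ is passed in from a $t$-table. The numbers are those of Exercise 8.2.1 below, reading $s_2 = 2.73$ as the second sample's standard deviation (an assumption); then $m+n-2 = 22$ and, for 90 percent confidence, $t_{22,\,0.05} \approx 1.717$.

```python
import math


def pooled_t_interval(xbar, s1, m, ybar, s2, n, t_crit):
    """CI for mu1 - mu2 assuming equal (unknown) variances.

    t_crit must be t_{m+n-2, alpha/2}, looked up in a t-table.
    Returns (lower, upper, E, sp).
    """
    # Pooled variance: weighted average of s1^2 and s2^2 with weights m-1 and n-1
    sp2 = ((m - 1) * s1**2 + (n - 1) * s2**2) / (m + n - 2)
    sp = math.sqrt(sp2)
    E = t_crit * sp * math.sqrt(1 / m + 1 / n)   # margin of error
    d = xbar - ybar                              # point estimate of mu1 - mu2
    return d - E, d + E, E, sp


# Exercise 8.2.1: m = 11, n = 13, so 22 degrees of freedom; t_{22, 0.05} = 1.717
low, high, E, sp = pooled_t_interval(13.2, 2.33, 11, 11.5, 2.73, 13, t_crit=1.717)
print(f"sp = {sp:.3f}, E = {E:.3f}, interval = ({low:.3f}, {high:.3f})")
# sp = 2.556, E = 1.798, interval = (-0.098, 3.498)
```

With raw data (as in Exercises 8.2.3 and 8.2.4), one would first compute each sample's mean and standard deviation, for example with `statistics.mean` and `statistics.stdev`, and then call the same function.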

Use of Calculators (if you have a TI-83): 2-SampTInt

1. If we have raw data, enter the data into the calculator in two lists (say L1, L2).
2. Press STAT and then select TESTS.
3. Select 2-SampTInt and press ENTER.
4. Input: you will have to select Stats or Data, depending on what is given.
5. Feed in the values of the sample standard deviations $s_1$, $s_2$, $\bar x$, $\bar y$, $m$, $n$ (or the lists where you have the data) and the C-Level. Set Pooled to Yes, since we assume equal variances.
6. Select Calculate and press ENTER. It will give you the confidence interval and also the pooled estimate $s_p$ of the common standard deviation $\sigma$.
7. The margin of error is $E$ = (width of the interval)/2.

Problems on 8.2: When $\sigma_1$ and $\sigma_2$ Are Unknown

Exercise 8.2.1. Suppose that we are comparing two "similar" normal populations with means $\mu_1$, $\mu_2$, respectively, and that both populations have standard deviation $\sigma$. We collected a sample of size $m = 11$ from the first population that produced a sample mean $\bar x = 13.2$ and sample standard deviation $s_1 = 2.33$. A sample of size $n = 13$ was collected from the second population that had sample mean $\bar y = 11.5$ and sample standard deviation $s_2 = 2.73$.
1. Compute the pooled estimate $s_p$ for $\sigma$.
2. Find a point estimate for $\mu_1 - \mu_2$.
3. Compute the margin of error in estimating $\mu_1 - \mu_2$ at the 90 percent level of confidence.
4. Compute a 90 percent confidence interval for $\mu_1 - \mu_2$.

Exercise 8.2.2. Suppose we have two normal populations with means $\mu_1$, $\mu_2$ and equal standard deviation $\sigma$. A sample of size $m = 64$ was collected from the first population, and the sample mean and standard deviation were found to be $\bar x = 3.7$, $s_1 = 9.2$. A sample of size $n = 99$ was collected from the second population, and the sample mean and standard deviation were $\bar y = 4.1$, $s_2 = 8.7$.
1. Compute the pooled estimate $s_p$ for $\sigma$.
2. Compute the margin of error for a 95 percent confidence interval for $\mu_1 - \mu_2$.
3. Compute a 95 percent confidence interval for the difference of means $\mu_1 - \mu_2$.
Exercise 8.2.3. The birth weights of babies in developed and developing countries are normally distributed with means $\mu_1$, $\mu_2$ and equal standard deviation $\sigma$. (My data is not real.) Suppose the following data about birth weights (in pounds) from developed and developing nations were collected.

Developed: 8.8 7.1 8.2 10.1 7.2 8.1 5.3 7.9 9.9 6.3 7.7 8.3 8.8 9.7 9.1 8.9 7.8 6.3 8.1 9.0 5.2 6.3 7.1 9.1 6.3 5.7
Developing: 5.2 8.1 8.1 7.1 6.8 8.3 7.9 7.0 6.3 8.3 5.9 6.3 4.9 6.1 7.7 5.5 6.9 5.3 5.8

1. Compute a point estimate of the difference of mean birth weights $\mu_1 - \mu_2$.
2. Compute the pooled estimate $s_p$ for $\sigma$.
3. Determine the maximum error of the difference $\mu_1 - \mu_2$ at the 95 percent level of confidence.
4. Construct a 95 percent confidence interval for $\mu_1 - \mu_2$.
Exercise 8.2.4. African elephants and Indian elephants differ in height, weight, and length of ear and tusk. It is natural to assume that all these are normally distributed. Assume that the heights of African and Indian elephants have an equal standard deviation $\sigma$. The mean heights of African elephants and Indian elephants are $\mu_1$, $\mu_2$, respectively. Suppose the following data (in feet) were collected on the heights of elephants from the two continents (these are not real data).

African: 10.9 8.8 9.1 13.1 11.7 12.9 8.7 12.9 9.3 11.7 10.5 9.5 9.9 9.1 11.3 10.7 11.5 11.1 12.3 11.3 7.1 9.3 7.9 8.7 8.3 9.7 9.9 8.8
Indian: 8.2 8.9 9.2 9.3 9.1 8.8 8.8 10.1 10.3 9.1 8.1 9.9 9.9

1. Compute a point estimate of the difference of mean heights $\mu_1 - \mu_2$.
2. Determine the maximum error of the difference $\mu_1 - \mu_2$ at the 99 percent level of confidence.
3. Construct a 99 percent confidence interval for $\mu_1 - \mu_2$.

8.3 Comparing Two Population Proportions

In this section, we compute a confidence interval for the difference $p_1 - p_2$ of two population proportions. An example follows.

Example. We would like to estimate the difference between the proportion $p_1$ of males who make more than fifty thousand dollars annually and the proportion $p_2$ of females who make more than fifty thousand dollars annually. We construct a confidence interval for $p_1 - p_2$. Similarly, we might want to compare the proportions of defective items produced by an old machine and a new machine in a factory.

Assume we have two populations. Let $p_1$ be the proportion of Population 1 that has an attribute A, and let $p_2$ be the proportion of Population 2 that has the attribute A. We want to compute a confidence interval for $p_1 - p_2$. So, we take a sample of size $m$ from Population 1; let $X$ be the number of sample members that have the attribute A, and let $\hat X = X/m$ be the sample proportion that has the attribute A. (We may say that $X$ is the number of "successes" in this sample from Population 1 and $\hat X = X/m$ is the proportion of "successes.") We take a sample of size $n$ from Population 2, independent of the first sample. Let $Y$ be the number of sample members that have the attribute A, and let $\hat Y = Y/n$ be the sample proportion that has the attribute A. (So, $\hat Y = Y/n$ is the sample proportion of "successes" from Population 2.)

(Let me explain the context of the example above. We interview $m$ males; $X$ is the number of males in this sample who make more than fifty thousand dollars annually, and $\hat X = X/m$ is the proportion of the males in this sample who do. Similarly, we interview $n$ females, and $\hat Y = Y/n$ is the proportion of females in this sample who make more than fifty thousand dollars annually.)

We develop a confidence interval for $p_1 - p_2$ as follows.

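1. Notation. For the sample proportions, we use the following notation:
$$\hat X = X/m, \qquad \hat Y = Y/n.$$

2. As we have seen before, by the CLT, $\hat X$ has approximately $N(p_1, \sigma_1)$ distribution, where $\sigma_1 = \sqrt{p_1(1-p_1)/m}$, and $\hat Y$ has approximately $N(p_2, \sigma_2)$ distribution, where $\sigma_2 = \sqrt{p_2(1-p_2)/n}$.

3. You would agree that $\hat X - \hat Y$ is a natural estimator of $p_1 - p_2$.

4. As we have assumed that the $X$ sample and the $Y$ sample are mutually independent, it follows that $\hat X - \hat Y$ has approximately $N(p_1 - p_2, \sigma)$ distribution, where $\sigma = \sqrt{\sigma_1^2 + \sigma_2^2}$.

5. So, it follows that
$$P\left(-z_{\alpha/2} < \frac{(\hat X - \hat Y) - (p_1 - p_2)}{\sigma} < z_{\alpha/2}\right) = 1 - \alpha.$$

6. If we simplify, we get
$$P\left((\hat X - \hat Y) - z_{\alpha/2}\,\sigma < p_1 - p_2 < (\hat X - \hat Y) + z_{\alpha/2}\,\sigma\right) = 1 - \alpha.$$

7. Since $\sigma$ depends on the unknown $p_1$ and $p_2$, as in section 7.4 we use $\hat x$ as an estimate for $p_1$ and $\hat y$ as an estimate for $p_2$, and get the following theorem.

Theorem. An approximate $(1-\alpha)100$ percent confidence interval for $p_1 - p_2$ is given by
$$\hat x - \hat y - E < p_1 - p_2 < \hat x - \hat y + E, \qquad \text{where } E = z_{\alpha/2}\sqrt{\frac{\hat x(1-\hat x)}{m} + \frac{\hat y(1-\hat y)}{n}}.$$

8. The quantity $E$ is called the margin of error.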
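The theorem above can be sketched in Python as follows (standard library only; the function name is hypothetical). It uses the counts of Exercise 8.3.1 below. Note that the sample proportions stand in for $p_1$, $p_2$ inside the square root, as in step (7).

```python
import math
from statistics import NormalDist


def two_prop_interval(x, m, y, n, conf):
    """Approximate CI for p1 - p2 from x successes in m trials and y in n trials.

    Returns (lower, upper, E), with
    E = z_{alpha/2} * sqrt(xhat(1 - xhat)/m + yhat(1 - yhat)/n).
    """
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    xhat, yhat = x / m, y / n                 # sample proportions of "success"
    E = z * math.sqrt(xhat * (1 - xhat) / m + yhat * (1 - yhat) / n)
    d = xhat - yhat                           # point estimate of p1 - p2
    return d - E, d + E, E


# Exercise 8.3.1: x = 55 of m = 117, y = 37 of n = 79, 95 percent confidence
low, high, E = two_prop_interval(55, 117, 37, 79, 0.95)
print(f"E = {E:.3f}, interval = ({low:.3f}, {high:.3f})")
# E = 0.142, interval = (-0.141, 0.144)
```

Since the interval contains 0, these data do not indicate a difference between $p_1$ and $p_2$ at this confidence level.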

Use of Calculators (if you have a TI-83): 2-PropZInt

1. Press STAT and then select TESTS.
2. Select 2-PropZInt and press ENTER.
3. Feed in the numbers of successes $x$, $y$, the sample sizes $n_1$, $n_2$ (our $m$, $n$), and the C-Level.
4. Select Calculate and press ENTER. It will give you the confidence interval.
5. The margin of error is $E$ = (width of the interval)/2.

Problems on 8.3: Comparing Two Population Proportions

Exercise 8.3.1. Suppose two independent samples were collected from two populations. We want to compare the proportions $p_1$, $p_2$, respectively, of an attribute A present in these two populations. Use a 95 percent confidence interval to estimate $p_1 - p_2$. We are given that $x = 55$ had the attribute A in a sample of size $m = 117$ from the first population, and $y = 37$ had the attribute A in a sample of size $n = 79$ from the second population.

Exercise 8.3.2. To compare the proportions $p_1$, $p_2$ of defective items produced by new and old machines, respectively, samples were collected. In a sample of 57 items from the new machine, 6 were found to be defective; and in a sample of 41 items from the old machine, 9 were defective. Compute a 99 percent confidence interval for $p_1 - p_2$.

Exercise 8.3.3. To compare the proportions $p_1$, $p_2$ of men and women, respectively, who watch football, data were collected. In a sample of 199 men, 83 said that they watch football; and in a sample of 161 women, 51 said they watch football. (These are not real data.) Construct a 99 percent confidence interval for $p_1 - p_2$.

Lesson 9: Testing Hypotheses

9.1 The Philosophy of Testing Hypotheses
9.2 Developing a Test
9.3 Testing on a Single Population
9.4 Testing Hypotheses on Variance $\sigma^2$
9.5 Population Proportion
9.6 Testing of Hypotheses to Compare Two Populations
9.7 Comparing Means of Two Populations: $\sigma_1$, $\sigma_2$ Unknown
9.8 Comparing Proportions $p_1$, $p_2$ of Two Populations
Homework 28 - 32

9.1 The Philosophy of Testing Hypotheses

In this lesson we will test a hypothesis $H_0$, called the Null hypothesis, against a hypothesis $H_A$, called the alternative hypothesis. Only one of these two hypotheses is true. Based on the collected sample and the testing criterion that we will set up, we will accept one of them and reject the other.

Example 1. Suppose we want to test the hypothesis that the disparity between the wages (annual incomes) of working men and women no longer exists. Let $\mu_1$ be the mean annual income of working men and $\mu_2$ be the mean annual income of working women. So, our Null hypothesis $H_0$ and the alternative hypothesis $H_A$ may be written as
$$H_0: \mu_1 - \mu_2 > 0, \qquad H_A: \mu_1 - \mu_2 = 0.$$

Example 2. A TV commentator mentions that only about 10 years ago the average life expectancy of a human being was 75 years, and that it has now increased substantially. To test the claim of this commentator, we let $\mu$ be the average life expectancy of a human being. Then we set up our Null and alternative hypotheses as follows:
$$H_0: \mu = 75, \qquad H_A: \mu > 75.$$

1. Definition. A statistical hypothesis is a statement, claim, or proposition regarding a population. Most often, it is about the values of the population parameters. In the above two examples, $H_0$ and $H_A$ are statistical hypotheses.

2. It is important to consider which is the Null hypothesis and which is the alternative hypothesis in a given context. Essentially, one is the negation of the other.

3. The Null hypothesis $H_0$ represents the status quo; it is something that you have believed for a long time, or some assumption or method that has been working reliably for you for a long time. You want to hold on to the Null hypothesis unless there is very strong evidence, in the collected data, that the alternative hypothesis is better.

4. The alternative hypothesis represents a new claim or something out of the ordinary. It could be a researcher's new technology or some salesperson's claim that his/her product is better. We would be very skeptical about the alternative hypothesis and would accept it only if there is very strong evidence, in the collected data, in favor of it.

5. Given a Null hypothesis $H_0$ and an alternative hypothesis $H_A$, a test of hypothesis is a rule or a procedure to decide, based on the collected sample, whether to accept $H_0$ or $H_A$. Our test will be based on the value of a test statistic. The rule is also called the decision rule or a test of significance.

6. Two types of errors. In this process of testing, we may commit two types of errors.
   1. If we reject $H_0$ when it is in fact true, then it is called a type one error.
   2. If we accept $H_0$ when it is in fact false, then it is called a type two error.
   3. The probability of committing a type one error is called the level of significance and is normally denoted by $\alpha$. Usually, $\alpha$ will be .1, .05, .01, or some other small number.

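9.2 Developing a Test

Let $X$ be a random variable with mean $\mu$ and standard deviation $\sigma$. Some of our hypothesis tests will look like the following:
$$H_0: \mu = 75,\ H_A: \mu \neq 75 \qquad\text{or}\qquad H_0: \mu = 75,\ H_A: \mu > 75 \qquad\text{or}\qquad H_0: \mu = 75,\ H_A: \mu < 75.$$
More generally, we test hypotheses like
$$H_0: \mu = \mu_0,\ H_A: \mu \neq \mu_0 \qquad\text{or}\qquad H_0: \mu = \mu_0,\ H_A: \mu > \mu_0 \qquad\text{or}\qquad H_0: \mu = \mu_0,\ H_A: \mu < \mu_0.$$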

To develop a test: Suppose we have a random variable $X$ with mean $\mu$ and standard deviation $\sigma$. We want to develop a test procedure for the following Null and alternative hypotheses:
$$H_0: \mu = \mu_0, \qquad H_A: \mu \neq \mu_0.$$
We take a sample $X_1, X_2, \ldots, X_m$ of size $m$ from the $X$ population, and let $\bar X$ be the sample mean.

1. We assume that the sample size $m$ is large enough, so by the CLT $\bar X$ has (approximately) $N(\mu, \sigma_{\bar X})$ distribution, where $\sigma_{\bar X} = \sigma/\sqrt{m}$.

2. Both type one and type two errors can be reduced by increasing the sample size $m$. But once the sample size is fixed, it is not possible to control both simultaneously: if you reduce the probability of a type one error, the probability of a type two error goes up, and vice versa. Since we are more concerned about type one errors, we control the probability of a type one error, which is also called the level of significance. So we want to develop a test at the level of significance $\alpha$.

3. Since $\bar X$ is a good estimator for $\mu$, and since the alternative hypothesis is $H_A: \mu \neq \mu_0$, we will reject the Null hypothesis $H_0$ only if $\bar X$ and $\mu_0$ are far apart, that is, if $|\bar X - \mu_0|$ is large.

4. Also, if $H_0$ is true, then $\mu = \mu_0$ and
$$Z = \frac{\bar X - \mu_0}{\sigma_{\bar X}}$$
has $N(0,1)$ distribution, where $\sigma_{\bar X} = \sigma/\sqrt{m}$. The expression $Z$ above will be called a test statistic, and we will accept $H_0$ if the observed absolute value $|z|$ of $|Z|$ is small and reject $H_0$ if $|z|$ is large.

5. If $H_0$ is true, then $P\big(Z \notin (-z_{\alpha/2}, z_{\alpha/2})\big) = \alpha$.

6. So, at the level of significance $\alpha$, our decision rule is:
Reject $H_0$ if $z \notin (-z_{\alpha/2}, z_{\alpha/2})$, where $z = (\bar x - \mu_0)/\sigma_{\bar X}$;
Accept $H_0$ otherwise.

7. The above decision rule works only if we know the value of $\sigma$.

Some Hypotheses and Decision Rules. We will assume that the value of $\sigma$ is known.

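1. Two-tail test: Suppose we are testing $H_0: \mu = \mu_0$ against $H_A: \mu \neq \mu_0$. At the level of significance $\alpha$, our decision rule is:
Reject $H_0$ if $z \notin (-z_{\alpha/2}, z_{\alpha/2})$, where $z = (\bar x - \mu_0)/\sigma_{\bar X}$;
Accept $H_0$ otherwise.

2. Left-tail test: Suppose we are testing $H_0: \mu = \mu_0$ against $H_A: \mu < \mu_0$. At the level of significance $\alpha$, our decision rule is:
Reject $H_0$ if $z < -z_\alpha$, where $z = (\bar x - \mu_0)/\sigma_{\bar X}$;
Accept $H_0$ otherwise.

3. Right-tail test: Suppose we are testing $H_0: \mu = \mu_0$ against $H_A: \mu > \mu_0$. At the level of significance $\alpha$, our decision rule is:
Reject $H_0$ if $z > z_\alpha$, where $z = (\bar x - \mu_0)/\sigma_{\bar X}$;
Accept $H_0$ otherwise.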
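The three z-test decision rules can be sketched in a few lines of Python. This is a minimal illustration, assuming $\sigma$ is known as in this section; `z_test` and its `tail` argument are hypothetical names, and the critical values come from `statistics.NormalDist().inv_cdf` in the standard library. The numbers are those of Exercises 9.2.1 to 9.2.3 below ($\sigma = 15$, $m = 25$, $\bar x = 81$, $\mu_0 = 75$), so $z = 6/3 = 2$.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1)


def z_test(xbar, mu0, sigma, m, alpha, tail):
    """Z-test for H0: mu = mu0 with sigma known.

    tail is "two", "left", or "right" (the form of the alternative hypothesis).
    Returns (z, reject), where reject is True if H0 is rejected.
    """
    z = (xbar - mu0) / (sigma / math.sqrt(m))  # test statistic
    if tail == "two":
        crit = STD_NORMAL.inv_cdf(1 - alpha / 2)     # z_{alpha/2}
        reject = not (-crit < z < crit)
    elif tail == "left":
        reject = z < -STD_NORMAL.inv_cdf(1 - alpha)  # z < -z_alpha
    elif tail == "right":
        reject = z > STD_NORMAL.inv_cdf(1 - alpha)   # z > z_alpha
    else:
        raise ValueError("tail must be 'two', 'left', or 'right'")
    return z, reject


# Exercises 9.2.1 - 9.2.3: sigma = 15, m = 25, xbar = 81, mu0 = 75
print(z_test(81, 75, 15, 25, 0.05, "two"))    # (2.0, True)
print(z_test(81, 75, 15, 25, 0.01, "two"))    # (2.0, False)
print(z_test(81, 75, 15, 25, 0.05, "right"))  # (2.0, True)
```

The same $z = 2$ leads to different decisions because the critical value changes with $\alpha$ and with the form of $H_A$.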

Definition. The set of values (that is, the intervals) that leads to the rejection of the Null hypothesis $H_0$ is called the rejection region or the critical region.

Definition. Suppose we have a test statistic $T$ to test $H_0$ against $H_A$, and let the observed value of $T$ be $t$. The P-value is defined as the probability, assuming $H_0$ is true, that $T$ will take a value at least as extreme as $t$ (in the direction of $H_A$). In the above decision rules, our test statistic is $Z = (\bar X - \mu_0)/\sigma_{\bar X}$. If $z$ is the observed value of $Z$, then we have the following.
1. For the two-tail test, the P-value is given by $p = P\big(Z \notin (-|z|, |z|)\big)$.
2. For the left-tail test, the P-value is given by $p = P(Z < z)$.
3. For the right-tail test, the P-value is given by $p = P(Z > z)$.

Use of Calculators and P-values:

1. In the TI-83 menu the above test is called the Z-Test, which comes under TESTS.
2. When we use calculators (say a TI-83) for testing hypotheses, the calculator will give us z-values and p-values.
3. We can use the z-values with the above decision rules to test hypotheses.
4. Alternatively, at the level of significance $\alpha$, if the P-value is $p$, then:
Reject $H_0$ if $p < \alpha$;
Accept $H_0$ otherwise.

Remark. For the rest of this chapter, we will test hypotheses for various parameters.

1. In each case, as above, we will have three tests: the two-tail test, the left-tail test, and the right-tail test.
2. In each case, the calculator will give the value of the test statistic (like the z-value above) and the p-value.
3. If we use the p-value for a test, then the decision rule will remain the same for all the tests to come:
Reject $H_0$ if $p < \alpha$;
Accept $H_0$ otherwise.
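The P-value route can be sketched the same way. In this minimal Python sketch (function names hypothetical; the standard normal CDF is `statistics.NormalDist().cdf`), the three P-value formulas for the z statistic are computed and the common rule "reject $H_0$ if $p < \alpha$" is applied. For $z = 2$ in a two-tail test, $p = 2(1 - \Phi(2)) \approx 0.0455$, so $H_0$ is rejected at $\alpha = .05$ but not at $\alpha = .01$, the same decisions the critical-value rules give.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def z_p_value(z, tail):
    """P-value of an observed z for the two-, left-, or right-tail test."""
    if tail == "two":
        return 2 * (1 - PHI(abs(z)))  # P(Z outside (-|z|, |z|))
    if tail == "left":
        return PHI(z)                 # P(Z < z)
    if tail == "right":
        return 1 - PHI(z)             # P(Z > z)
    raise ValueError("tail must be 'two', 'left', or 'right'")


def decide(p, alpha):
    """The common decision rule: reject H0 if p < alpha."""
    return "reject H0" if p < alpha else "accept H0"


# z = 2.0 (the data of Exercise 9.2.1) in the two-tail test
p = z_p_value(2.0, "two")
print(round(p, 4), decide(p, 0.05), decide(p, 0.01))  # 0.0455 reject H0 accept H0
```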

Problems on 9.2: Developing a Test, $\sigma$ Known

Exercise 9.2.1. Assume that you have a normal population with mean $\mu$ and standard deviation $\sigma = 15$. Suppose you have collected a sample of size 25, and the sample mean $\bar x$ was found to be 81. We want to test the Null hypothesis $H_0: \mu = 75$ against $H_A: \mu \neq 75$. At the 5 percent level of significance, will you reject or accept the Null hypothesis?

Exercise 9.2.2. (Change the level of significance.) Assume that you have a normal population with mean $\mu$ and standard deviation $\sigma = 15$. Suppose you have collected a sample of size 25, and the sample mean $\bar x$ was found to be 81. We want to test the Null hypothesis $H_0: \mu = 75$ against $H_A: \mu \neq 75$. At the 1 percent level of significance, will you reject or accept the Null hypothesis?
Solution: Same as 9.2.1.

Exercise 9.2.3. (Change the alternative hypothesis.) Assume that you have a normal population with mean $\mu$ and standard deviation $\sigma = 15$. Suppose you have collected a sample of size 25, and the sample mean $\bar x$ was found to be 81. We want to test the Null hypothesis $H_0: \mu = 75$ against $H_A: \mu > 75$. At the 5 percent level of significance, will you reject or accept the Null hypothesis?

Exercise 9.2.4. The time taken by an athlete to run an event is normally distributed with mean $\mu$ and known standard deviation $\sigma = 3.5$ seconds. The coach believes that the athlete's mean time has improved from last year's mean of 34 seconds. To test this, the athlete ran 16 times, and the sample mean was found to be $\bar x = 31$ seconds.
1. Formulate the Null and the alternative hypotheses.
2. At the 5 percent level of significance, would the coach accept or reject his belief that the athlete has improved?

9.3 Testing on a Single Population

In this section, we assume that $X$ is a $N(\mu, \sigma)$ random variable. In the last section, we assumed that $\sigma$ was known; in this section we assume that $\sigma$ is not known. We will do all three tests as in the above section, but without knowing the value of $\sigma$. Once again, we draw a sample $X_1, X_2, \ldots, X_m$ of size $m$ from the $X$ population. Let $\bar X$ and $S^2$ be the sample mean and variance, respectively. The test statistic we use is
$$T = \frac{(\bar X - \mu_0)\sqrt{m}}{S}.$$

If $H_0: \mu = \mu_0$ is true, then $T$ has a $t$-distribution with $m-1$ degrees of freedom. Using the same kind of arguments, we formulate the following decision rules.

1. Two-tail test: Suppose we are testing $H_0: \mu = \mu_0$ against $H_A: \mu \neq \mu_0$. At the level of significance $\alpha$, our decision rule is:
Reject $H_0$ if $t \notin (-t_{m-1,\,\alpha/2}, t_{m-1,\,\alpha/2})$, where $t = (\bar x - \mu_0)\sqrt{m}/s$;
Accept $H_0$ otherwise.

2. Left-tail test: Suppose we are testing $H_0: \mu = \mu_0$ against $H_A: \mu < \mu_0$. At the level of significance $\alpha$, our decision rule is:
Reject $H_0$ if $t < -t_{m-1,\,\alpha}$, where $t = (\bar x - \mu_0)\sqrt{m}/s$;
Accept $H_0$ otherwise.

3. Right-tail test: Suppose we are testing $H_0: \mu = \mu_0$ against $H_A: \mu > \mu_0$. At the level of significance $\alpha$, our decision rule is:
Reject $H_0$ if $t > t_{m-1,\,\alpha}$, where $t = (\bar x - \mu_0)\sqrt{m}/s$;
Accept $H_0$ otherwise.
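The one-sample $t$ decision rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and since the standard library has no $t$ quantile function, the critical values ($t_{8,\,0.05} \approx 1.860$ and $t_{18,\,0.005} \approx 2.878$) are taken from a $t$-table and passed in. The first computation uses the summary numbers of Exercise 9.3.5; the second computes $\bar x$ and $s$ from the raw data of Exercise 9.3.2, read as a two-tail test at the 1 percent level.

```python
import math
from statistics import mean, stdev


def t_statistic(xbar, s, m, mu0):
    """Observed t = (xbar - mu0) * sqrt(m) / s; it has m - 1 degrees of freedom."""
    return (xbar - mu0) * math.sqrt(m) / s


def t_test(t, t_crit, tail):
    """Return True if H0: mu = mu0 is rejected.

    t_crit is t_{m-1, alpha/2} for the two-tail test and t_{m-1, alpha} otherwise,
    read from a t-table.
    """
    if tail == "two":
        return not (-t_crit < t < t_crit)
    if tail == "left":
        return t < -t_crit
    if tail == "right":
        return t > t_crit
    raise ValueError("tail must be 'two', 'left', or 'right'")


# Exercise 9.3.5 (summary data): mu0 = 33, m = 9, xbar = 35, s = 2.2, right tail,
# alpha = .05, table value t_{8, 0.05} = 1.860
t1 = t_statistic(35, 2.2, 9, 33)
print(round(t1, 3), t_test(t1, 1.860, "right"))  # 2.727 True

# Exercise 9.3.2 (raw data): two-tail test of mu0 = 35 at alpha = .01,
# table value t_{18, 0.005} = 2.878; statistics.stdev uses the divisor m - 1
salmon = [34.7, 45.3, 31.8, 29.6, 33.8, 43.1, 41.5, 39.5, 38.2, 37.3,
          44.5, 29.1, 20.3, 32.5, 29.2, 37.3, 27.8, 32.3, 25.3]
t2 = t_statistic(mean(salmon), stdev(salmon), len(salmon), 35)
print(round(t2, 3), t_test(t2, 2.878, "two"))
```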

Use of Calculators and P-values:

1. In the TI-83 menu the above test is called the T-Test, which comes under TESTS. Use it when $\sigma$ is not known.
2. The calculator will give us t-values and p-values.
3. We can use the t-values with the above decision rules to test hypotheses.
4. Alternatively, at the level of significance $\alpha$, if the P-value is $p$, then:
Reject $H_0$ if $p < \alpha$;
Accept $H_0$ otherwise.

Problems on 9.3: Testing on a Single Population, $\sigma$ Unknown

Exercise 9.3.1. It is assumed that the lifetime (in hours) of light bulbs produced in a factory is normally distributed with mean $\mu$ and standard deviation $\sigma$. The mean lifetime of an average light bulb on the market is 6000 hours. To estimate $\mu$, the following data were collected on the lifetimes of light bulbs:
5110 7783 4671 4560 6441 6074 3331 4777 5055 4707 5270 5263 5335 4978 4973 5418 1837 5123 5487 5017
The producer claims that the mean lifetime of these bulbs is more than that of the average bulbs on the market.
1. Formulate your Null and alternative hypotheses.
2. Write down your decision rule.
3. At the one percent level of significance, what will you decide?

Exercise 9.3.2. To estimate the mean weight (in pounds) of salmon in a river, the following sample was collected:
34.7 45.3 31.8 29.6 33.8 43.1 41.5 39.5 38.2 37.3 44.5 29.1 20.3 32.5 29.2 37.3 27.8 32.3 25.3
Last year the mean weight was found to be 35 pounds. You want to determine whether the mean weight has changed significantly this year.
1. Formulate your Null and alternative hypotheses.
2. Write down your decision rule.
3. At the one percent level of significance, what will you decide?

Exercise 9.3.3. A supplier of light bulbs claims that the mean lifetime of his bulbs is longer than that of the bulbs available on the market. It is known that the mean lifetime of the bulbs on the market is 3456 hours. To test the claim of the supplier, you test a sample of 26 bulbs and find the sample mean to be $\bar x = 3720$ hours and the sample standard deviation to be $s = 1152$ hours. At the 5 percent level of significance, would you accept the claim of the supplier?

Exercise 9.3.4. It is believed that the mean length of babies at birth in the United States is higher than the worldwide mean of 18.7 inches. A sample of 26 babies in the United States was collected, and the sample mean and standard deviation were found to be $\bar x = 19$ inches and $s = 1$ inch. At the 1 percent level of significance, do you believe that babies in the United States are longer?

Exercise 9.3.5. A car manufacturer claims that a new model of car will get more miles per gallon than the old model. The old model gets a mean mileage of 33 miles per gallon. To test the claim, 9 cars of the new model were tested, and the sample mean was found to be $\bar x = 35$ miles per gallon with sample standard deviation $s = 2.2$ miles per gallon. At the 5 percent level of significance, would you accept the claim of this manufacturer?

9.4 Testing Hypotheses on Variance $\sigma^2$

Once again, let $X$ be a $N(\mu, \sigma)$ random variable. We would like to test the Null hypothesis $H_0: \sigma^2 = \sigma_0^2$. As usual, we draw a sample $X_1, X_2, \ldots, X_m$ of size $m$ from the $X$ population. Let $S^2$ be the sample variance. The test statistic we use is
$$Y = \frac{(m-1)S^2}{\sigma_0^2}.$$
If $H_0: \sigma^2 = \sigma_0^2$ is true, then $Y$ has a $\chi^2$-distribution with $m-1$ degrees of freedom. Using the same kind of arguments, we formulate the following decision rules.

1. Two-tail test: Suppose we are testing $H_0: \sigma^2 = \sigma_0^2$ against $H_A: \sigma^2 \neq \sigma_0^2$. At the level of significance $\alpha$, our decision rule is:
Reject $H_0$ if $y \notin (\chi^2_{m-1,\,1-\alpha/2}, \chi^2_{m-1,\,\alpha/2})$, where $y = (m-1)s^2/\sigma_0^2$;
Accept $H_0$ otherwise.

2. Left-tail test: Suppose we are testing $H_0: \sigma^2 = \sigma_0^2$ against $H_A: \sigma^2 < \sigma_0^2$. At the level of significance $\alpha$, our decision rule is:
Reject $H_0$ if $y < \chi^2_{m-1,\,1-\alpha}$, where $y = (m-1)s^2/\sigma_0^2$;
Accept $H_0$ otherwise.

3. Right-tail test: Suppose we are testing $H_0: \sigma^2 = \sigma_0^2$ against $H_A: \sigma^2 > \sigma_0^2$. At the level of significance $\alpha$, our decision rule is:
Reject $H_0$ if $y > \chi^2_{m-1,\,\alpha}$, where $y = (m-1)s^2/\sigma_0^2$;
Accept $H_0$ otherwise.

Remark. The TI-83 does not have a test for $\sigma^2$. So, one has to use the above decision rules for this section.
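Since the TI-83 has no test for $\sigma^2$, a short computation can stand in. The Python sketch below (hypothetical function names) computes $y = (m-1)s^2/\sigma_0^2$ and applies the decision rules above; the $\chi^2$ critical values must be looked up in a $\chi^2$-table, because the standard library has no $\chi^2$ quantile function. For Exercise 9.4.1 ($n = 23$, $s^2 = 46.7$, $\sigma_0^2 = 25$, right tail at 5 percent) we assume the table value $\chi^2_{22,\,0.05} \approx 33.924$.

```python
def chi_square_statistic(s2, m, sigma0_sq):
    """Observed y = (m - 1) s^2 / sigma0^2, with m - 1 degrees of freedom."""
    return (m - 1) * s2 / sigma0_sq


def variance_test(y, tail, upper=None, lower=None):
    """Decision rule for H0: sigma^2 = sigma0^2; returns True if H0 is rejected.

    upper and lower are chi-square table values:
      two-tail:   lower = chi2_{m-1, 1-alpha/2}, upper = chi2_{m-1, alpha/2}
      left-tail:  lower = chi2_{m-1, 1-alpha}
      right-tail: upper = chi2_{m-1, alpha}
    """
    if tail == "two":
        return not (lower < y < upper)
    if tail == "left":
        return y < lower
    if tail == "right":
        return y > upper
    raise ValueError("tail must be 'two', 'left', or 'right'")


# Exercise 9.4.1: n = 23, s^2 = 46.7, H_A: sigma^2 > 25 at 5 percent;
# table value chi2_{22, 0.05} = 33.924
y = chi_square_statistic(46.7, 23, 25)
print(round(y, 3), variance_test(y, "right", upper=33.924))  # 41.096 True
```

For raw data, as in Exercises 9.4.2 and 9.4.3, `statistics.variance` computes $s^2$ with the divisor $m-1$ used here.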

Problems on 9.4: Testing Hypotheses on Variance $\sigma^2$

Exercise 9.4.1. Suppose that we have collected a sample of size $n = 23$ from a normal population with mean $\mu$ and variance $\sigma^2$. The sample variance was found to be $s^2 = 46.7$. At the 5 percent level of significance, would you conclude that $\sigma^2$ is bigger than 25?

Exercise 9.4.2. Following are data on the life expectancies of a group of people older than 75:
87 92 81 76 81 87 79 88 88 79 81 89 97 91 82
At the one percent level of significance, would you conclude that the variance $\sigma^2$ of life expectancies is higher than 16?

Exercise 9.4.3. Following are data on a household's monthly gas consumption (in ccf) during the winter months:
154 228 222 240 264 393 257 278 127 140
At the 5 percent level of significance, would you conclude that the variance $\sigma^2$ of gas consumption is less than 6400 ccf$^2$?

9.5 Population Proportion

Let $p$ be the proportion of a population that has a particular attribute A. We want to test the Null hypothesis $H_0: p = p_0$. As usual, we draw (or interview) a sample of size $m$. Let $X$ be the number of sample members that have this attribute, and let $\hat X = X/m$ be the sample proportion. (So, $\hat X$ is the sample proportion of "success.") The test statistic we use is
$$Z = \frac{\hat X - p_0}{\sigma_{\hat X}}, \qquad \text{where } \sigma_{\hat X} = \sqrt{p_0(1-p_0)/m}.$$

If $H_0: p = p_0$ is true, then $Z$ has approximately $N(0,1)$ distribution. As before, our decision rules are as follows.

1. Two-tail test: Suppose we are testing $H_0: p = p_0$ against $H_A: p \neq p_0$. At the level of significance $\alpha$, our decision rule is:
Reject $H_0$ if $z \notin (-z_{\alpha/2}, z_{\alpha/2})$, where $z = (\hat x - p_0)/\sigma_{\hat X}$;
Accept $H_0$ otherwise.

2. Left-tail test: Suppose we are testing $H_0: p = p_0$ against $H_A: p < p_0$. At the level of significance $\alpha$, our decision rule is:
Reject $H_0$ if $z < -z_\alpha$, where $z = (\hat x - p_0)/\sigma_{\hat X}$;
Accept $H_0$ otherwise.

3. Right-tail test: Suppose we are testing $H_0: p = p_0$ against $H_A: p > p_0$. At the level of significance $\alpha$, our decision rule is:
Reject $H_0$ if $z > z_\alpha$, where $z = (\hat x - p_0)/\sigma_{\hat X}$;
Accept $H_0$ otherwise.
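The proportion test can be sketched in Python as below (hypothetical function names; standard library only). For Exercise 9.5.2 we read "the vaccine is effective" as $H_A: p < 0.5$, where $p$ is the proportion of vaccinated individuals who get the virus; this reading is our assumption. The sketch computes $z$ with $\sigma_{\hat X} = \sqrt{p_0(1-p_0)/m}$ and applies the left-tail rule at the three levels asked.

```python
import math
from statistics import NormalDist


def one_prop_z(x, m, p0):
    """z = (xhat - p0) / sqrt(p0 (1 - p0) / m), with xhat = x / m."""
    xhat = x / m
    return (xhat - p0) / math.sqrt(p0 * (1 - p0) / m)


def reject_left(z, alpha):
    """Left-tail rule: reject H0 if z < -z_alpha."""
    return z < -NormalDist().inv_cdf(1 - alpha)


# Exercise 9.5.2: 61 of 147 got the virus; H0: p = 0.5 against H_A: p < 0.5
z = one_prop_z(61, 147, 0.5)
print(round(z, 3))  # -2.062
for alpha in (0.01, 0.05, 0.10):
    print(alpha, reject_left(z, alpha))
```

Here $H_0$ is rejected at the 5 and 10 percent levels but not at the 1 percent level, which illustrates how the conclusion can depend on the chosen $\alpha$.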

Use of Calculators and P-values:

1. In the TI-83 menu the above test is called the 1-PropZTest, which comes under TESTS.
2. The calculator will ask for $p_0$, the number of successes $x$, and the sample size $n$.
3. The calculator will give us z-values and p-values; the value it labels p-cap ($\hat p$) is, in fact, the sample proportion of success $\hat x = x/n$.
4. We can use the z-values with the above decision rules to test hypotheses.
5. Alternatively, at the level of significance $\alpha$, if the P-value is $p$, then:
Reject $H_0$ if $p < \alpha$;
Accept $H_0$ otherwise.

Problems on 9.5: Population Proportion

Exercise 9.5.1. In a sample of 197 apples from a lot, 19 were found to be sour.
1. At the one percent level of significance, would you conclude that more than 10 percent of the apples are sour?
2. At the five percent level of significance, would you conclude that more than 10 percent of the apples are sour?
3. At the ten percent level of significance, would you conclude that more than 10 percent of the apples are sour?

Exercise 9.5.2. A new vaccine was tried on 147 randomly selected individuals, and it was determined that 61 of them got the virus. It is known that usually fifty percent of the population get the virus.
1. At the one percent level of significance, would you conclude that the vaccine is effective?
2. At the five percent level of significance, would you conclude that the vaccine is effective?
3. At the ten percent level of significance, would you conclude that the vaccine is effective?

Exercise 9.5.3. Before an election for a congressional seat, a poll was conducted. Of 887 randomly selected voters interviewed, 389 said that they would vote for Candidate A, and 359 said that they would vote for Candidate B.
1. At the five percent level of significance, would you conclude that Candidate A will receive more than 40 percent of the vote?
2. At the ten percent level of significance, would you conclude that Candidate A will receive more than 40 percent of the vote?
3. At the ten percent level of significance, would you conclude that Candidate B will receive more than 40 percent of the vote?

9.6 Testing of Hypotheses to Compare Two Populations

As we have computed confidence intervals to compare two populations, in this section we will do significance tests to compare two populations. Let $X$ be a random variable with mean $\mu_1$ and standard deviation $\sigma_1$, and let $Y$ be a random variable with mean $\mu_2$ and standard deviation $\sigma_2$. (For example, $X$ could be the height of an American male and $Y$ could be the height of an American female.) We may like to compare the equality (or inequality) of the means $\mu_1, \mu_2$. So, our Null hypothesis is given by $H_0: \mu_1 = \mu_2$, or equivalently $H_0: \mu_1 - \mu_2 = 0$. So, as before, we collect a sample $X_1, X_2, \ldots, X_m$ of size $m$ from the $X$-population and a sample $Y_1, Y_2, \ldots, Y_n$ of size $n$ from the $Y$-population. Let $\bar{X}$ and $S_1^2$ be the sample mean and variance, respectively, of the $X$-sample. Let $\bar{Y}$ and $S_2^2$ be the sample mean and variance, respectively, of the $Y$-sample.

First, assume that $\sigma_1, \sigma_2$ are known. If $\sigma_1, \sigma_2$ are known, then the test statistic that we use is

$Z = (\bar{X} - \bar{Y})/\sigma_d$, where $\sigma_d = \sqrt{\sigma_1^2/m + \sigma_2^2/n}$.

If the Null hypothesis $H_0: \mu_1 - \mu_2 = 0$ is true, then $Z$ has $N(0,1)$ distribution. As before, our decision rules are formulated as follows.
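The claim that $Z$ is standard normal under $H_0$ rests on a single variance computation, spelled out below. This sketch assumes that the two samples are drawn independently of each other, and that $X$ and $Y$ are normal, or that $m$ and $n$ are large enough for the central limit theorem to apply. The section relies on these assumptions without stating them.

```latex
\begin{aligned}
E(\bar{X} - \bar{Y}) &= \mu_1 - \mu_2 = 0 \quad \text{under } H_0,\\
\operatorname{Var}(\bar{X} - \bar{Y}) &= \operatorname{Var}(\bar{X}) + \operatorname{Var}(\bar{Y})
  = \frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n} = \sigma_d^2,\\
Z &= \frac{\bar{X} - \bar{Y}}{\sigma_d} \sim N(0,1).
\end{aligned}
```

The variances add only because the samples are independent. The paired design of the Paired t-test is used precisely when this fails.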

1. Two-tail test: Suppose we are testing $H_0: \mu_1 - \mu_2 = 0$ versus $H_A: \mu_1 - \mu_2 \neq 0$. At the level of significance $\alpha$, our decision rule is: Reject $H_0$ if $z \notin (-z_{\alpha/2}, z_{\alpha/2})$, where $z = (\bar{x} - \bar{y})/\sigma_d$; Accept $H_0$ otherwise.

2. Left-tail test: Suppose we are testing $H_0: \mu_1 - \mu_2 = 0$ versus $H_A: \mu_1 - \mu_2 < 0$. At the level of significance $\alpha$, our decision rule is: Reject $H_0$ if $z < -z_{\alpha}$, where $z = (\bar{x} - \bar{y})/\sigma_d$; Accept $H_0$ otherwise.

3. Right-tail test: Suppose we are testing $H_0: \mu_1 - \mu_2 = 0$ versus $H_A: \mu_1 - \mu_2 > 0$. At the level of significance $\alpha$, our decision rule is: Reject $H_0$ if $z > z_{\alpha}$, where $z = (\bar{x} - \bar{y})/\sigma_d$; Accept $H_0$ otherwise.

Remark. If the sample sizes $m, n$ are large, we can use $S_1, S_2$ as estimates of $\sigma_1, \sigma_2$ in the above expression for $Z$. So, the modified formula for $Z$ would be $Z = (\bar{X} - \bar{Y})/S_d$, where $S_d = \sqrt{S_1^2/m + S_2^2/n}$.

Use of Calculators and P-values:

1. In the TI-83 menu the above test is called the 2-SampZTest, which comes under TESTS.
2. Use it when $\sigma_1$ and $\sigma_2$ are known.
3. The calculator will give us z-values and p-values.
4. We can use the z-values with the above decision rules to test hypotheses.
5. Alternately, at the level of significance $\alpha$, if the P-value is $p$, then: Reject $H_0$ if $p < \alpha$; Accept $H_0$ otherwise.
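Decision rules 1 to 3 and the P-value rule can be sketched in a few lines of Python. This minimal sketch uses only the standard library (`statistics.NormalDist`, available from Python 3.8 onward). It is not the TI-83's internal algorithm. The helper names `two_sample_z` and `reject_by_critical_value` are hypothetical. For illustration, the sketch is applied to the data of Exercise 9.6.2, which appears among the problems for this section.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1)


def two_sample_z(xbar, ybar, sd1, sd2, m, n, tail="two"):
    """Z statistic and P-value for H0: mu1 - mu2 = 0.

    sd1, sd2 are the known sigma_1, sigma_2 (or, for large m and n,
    the sample standard deviations S_1, S_2, as in the Remark).
    tail is "two", "left" or "right", matching the alternative H_A.
    """
    sigma_d = sqrt(sd1 ** 2 / m + sd2 ** 2 / n)
    z = (xbar - ybar) / sigma_d
    if tail == "two":
        p = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    elif tail == "left":
        p = STD_NORMAL.cdf(z)
    elif tail == "right":
        p = 1 - STD_NORMAL.cdf(z)
    else:
        raise ValueError("tail must be 'two', 'left' or 'right'")
    return z, p


def reject_by_critical_value(z, alpha, tail="two"):
    """Decision rules 1-3: compare z with z_{alpha/2} or z_alpha."""
    if tail == "two":
        c = STD_NORMAL.inv_cdf(1 - alpha / 2)
        return not (-c < z < c)
    c = STD_NORMAL.inv_cdf(1 - alpha)
    if tail == "left":
        return z < -c
    if tail == "right":
        return z > c
    raise ValueError("tail must be 'two', 'left' or 'right'")


# Exercise 9.6.2: developed (x) vs developing (y) birth weights, H_A: mu1 > mu2
z, p = two_sample_z(8.9, 7.6, 2.3, 2.9, 35, 48, tail="right")
for alpha in (0.05, 0.01):
    print(alpha, round(z, 3), round(p, 4),
          reject_by_critical_value(z, alpha, "right"), p < alpha)
```

The critical-value rule and the P-value rule should reach the same decision for every $\alpha$. Because `sd1` and `sd2` are plain numbers, the same function also covers the large-sample version in the Remark: pass $S_1, S_2$ in their place.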

Problems on 9.6: Testing of Hypotheses to Compare Two Populations ($\sigma_1, \sigma_2$ Known)

Exercise 9.6.1. Suppose we have two normal populations with means $\mu_1, \mu_2$ and standard deviations $\sigma_1, \sigma_2$, respectively. It is known that $\sigma_1 = 8.1$ and $\sigma_2 = 11.3$. A sample of size $m = 64$ was collected from the first population, and the sample mean was found to be $\bar{x} = 3.7$. A sample of size $n = 99$ was collected from the second population, and the sample mean was found to be $\bar{y} = 4.1$. At 5 percent level of significance, would you conclude that $\mu_1 \neq \mu_2$?

Exercise 9.6.2. Suppose the birth weights of babies in developed and developing countries are normally distributed with means $\mu_1, \mu_2$ and standard deviations $\sigma_1, \sigma_2$, respectively. (My data is not real, as is often the case.) It is known that $\sigma_1 = 2.3$ pounds and $\sigma_2 = 2.9$ pounds. A sample of size $m = 35$ babies from the developed nations was collected, and the sample mean birth weight was found to be $\bar{x} = 8.9$ pounds. A sample of size $n = 48$ babies from the developing nations was collected, and the sample mean birth weight was found to be $\bar{y} = 7.6$ pounds.
1. At 5 percent level of significance, would you conclude that the mean birth weight of babies in the developed nations is higher than that of the developing nations?
2. At 1 percent level of significance, would you conclude that the mean birth weight of babies in developed nations is higher than that of developing nations?

Exercise 9.6.3. African elephants and Indian elephants are different in height, weight, and length of ear and tusk. It is natural to assume that all these are normally distributed. The mean and standard deviation of the height of African elephants are $\mu_1$ and $\sigma_1 = 1.5$ feet, respectively. The mean and standard deviation of the height of Indian elephants are $\mu_2$ and $\sigma_2 = 1.3$ feet, respectively. A sample of 25 African elephants was collected, and the sample mean height was found to be $\bar{x} = 10.9$ feet. A sample of 28 Indian elephants was collected, and the sample mean height was found to be $\bar{y} = 9.1$ feet.
1. At 5 percent level of significance, would you conclude that the mean height of African elephants is higher than that of the Indian elephants?
2. At 1 percent level of significance, would you conclude that the mean height of African elephants is higher than that of the Indian elephants?

9.7 Comparing Means of Two Populations: $\sigma_1, \sigma_2$ Unknown

As we did with confidence intervals, we consider the case where $\sigma_1, \sigma_2$ are not known, but we assume that the standard deviations are equal: $\sigma_1 = \sigma_2 = \sigma$. In this case, we have the estimator $S_p$ for $\sigma$ given by

$S_p = \sqrt{\frac{(m-1)S_X^2 + (n-1)S_Y^2}{m+n-2}}$,

where $S_X$ and $S_Y$ are the respective sample standard deviations of the corresponding samples. The test statistic that we use is

$T = \frac{\bar{X} - \bar{Y}}{S_p\sqrt{1/m + 1/n}}$.

If the Null hypothesis $H_0: \mu_1 - \mu_2 = 0$ is true, then $T$ has a t-distribution with $m+n-2$ degrees of freedom. We formulate the test hypotheses and the decision rules as follows.

1. Two-tail test: Suppose we are testing $H_0: \mu_1 - \mu_2 = 0$ versus $H_A: \mu_1 - \mu_2 \neq 0$. At the level of significance $\alpha$, our decision rule is: Reject $H_0$ if $t \notin (-t_{m+n-2,\alpha/2}, t_{m+n-2,\alpha/2})$, where $t = (\bar{x} - \bar{y})/\left(s_p\sqrt{1/m + 1/n}\right)$; Accept $H_0$ otherwise.

2. Left-tail test: Suppose we are testing $H_0: \mu_1 - \mu_2 = 0$ versus $H_A: \mu_1 - \mu_2 < 0$. At the level of significance $\alpha$, our decision rule is: Reject $H_0$ if $t < -t_{m+n-2,\alpha}$, where $t = (\bar{x} - \bar{y})/\left(s_p\sqrt{1/m + 1/n}\right)$; Accept $H_0$ otherwise.

3. Right-tail test: Suppose we are testing $H_0: \mu_1 - \mu_2 = 0$ versus $H_A: \mu_1 - \mu_2 > 0$. At the level of significance $\alpha$, our decision rule is: Reject $H_0$ if $t > t_{m+n-2,\alpha}$, where $t = (\bar{x} - \bar{y})/\left(s_p\sqrt{1/m + 1/n}\right)$; Accept $H_0$ otherwise.

Use of Calculators and P-values:

1. In the TI-83 menu the above test is called the 2-SampTTest, which comes under TESTS.
2. Use it when $\sigma_1 = \sigma_2 = \sigma$ are UNKNOWN and equal. Either $s_1$ and $s_2$ will be given or raw data will be given.
3. Always use the pooled estimate of $\sigma$ by selecting YES for "Pooled".
4. The calculator will give t-values and p-values and also the pooled estimate SXP.
5. We can use the t-values with the above decision rules to test hypotheses.
6. Alternately, at the level of significance $\alpha$, if the P-value is $p$, then: Reject $H_0$ if $p < \alpha$; Accept $H_0$ otherwise.
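The pooled test can also be sketched in Python. The standard library has no t-distribution, so this minimal sketch obtains P-values by integrating the t density numerically with Simpson's rule. In practice a statistics package such as SciPy would normally be used instead. The helper names `t_pdf`, `t_cdf` and `pooled_t_test` are hypothetical, and this is not how the TI-83 computes its output. The example applies the sketch to the summary data of Exercise 9.7.2, which appears among the problems for this section.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x) via Simpson's rule on [0, |x|] (steps must be even) and symmetry."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # integral of the density from 0 to |x|
    return 0.5 + area if x >= 0 else 0.5 - area


def pooled_t_test(xbar, ybar, s1, s2, m, n, tail="two"):
    """Pooled t statistic, degrees of freedom, S_p and P-value for H0: mu1 = mu2."""
    df = m + n - 2
    sp = math.sqrt(((m - 1) * s1 ** 2 + (n - 1) * s2 ** 2) / df)   # pooled S_p
    t = (xbar - ybar) / (sp * math.sqrt(1 / m + 1 / n))
    if tail == "two":
        p = 2 * (1 - t_cdf(abs(t), df))
    elif tail == "left":
        p = t_cdf(t, df)
    elif tail == "right":
        p = 1 - t_cdf(t, df)
    else:
        raise ValueError("tail must be 'two', 'left' or 'right'")
    return t, df, sp, p


# Exercise 9.7.2 summary data, two-sided alternative
t, df, sp, p = pooled_t_test(3.1, 4.4, 9.2, 8.7, 64, 99, tail="two")
print(f"t = {t:.3f}, df = {df}, s_p = {sp:.3f}, P-value = {p:.3f}")
```

`sp` plays the role of the pooled estimate that the calculator reports. With raw data, as in Exercises 9.7.3 and 9.7.4, first compute $\bar{x}, \bar{y}, s_1, s_2$, for example with `statistics.mean` and `statistics.stdev`. The latter divides by $n-1$.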

Problems on 9.7: Comparing Means of Two Populations ($\sigma_1, \sigma_2$ Unknown)

Exercise 9.7.1. Suppose that we are comparing two similar normal populations with means $\mu_1, \mu_2$, respectively, and equal standard deviation $\sigma$. We collected a sample of size $m = 11$ from the first population that produced a sample mean $\bar{x} = 13.2$ and sample standard deviation $s_1 = 2.33$. A sample of size $n = 13$ was collected from the second population that had sample mean $\bar{y} = 11.5$ and sample variance $s_2 = 2.73$. At 5 percent level of significance, would you conclude that $\mu_1 \neq \mu_2$?

Exercise 9.7.2. Suppose we have two normal populations with means $\mu_1, \mu_2$ and equal standard deviation $\sigma$. A sample of size $m = 64$ was collected from the first population, and the sample mean and standard deviation were found to be $\bar{x} = 3.1$, $s_1 = 9.2$. A sample of size $n = 99$ was collected from the second population, and the sample mean and standard deviation were $\bar{y} = 4.4$, $s_2 = 8.7$. At 5 percent level of significance, would you conclude that $\mu_1 \neq \mu_2$?

Exercise 9.7.3. Suppose the birth weights of babies in developed and developing countries are normally distributed with means $\mu_1, \mu_2$ and equal standard deviation $\sigma$. (My data is not real, as is often the case.) The following data about birth weight in developed and developing nations were collected. 8.8 7.1 8.2 10.1 7.2 8.1 5.3 7.9 9.9 6.3 7.7 8.3 8.8 9.7 9.1 8.9 7.8 6.3 8.1 9.0 5.2 6.3 7.1 9.1 6.3 5.7 5.2 8.1 8.1 7.1 6.8 8.3 7.9 7.0 6.3 8.3 5.9 6.3 4.9 6.1 7.7 5.5 6.9 5.3 5.8

1. At 5 percent level of significance, would you conclude that the mean birth weight of babies in the developed countries is higher than that in developing countries?
2. At 1 percent level of significance, would you conclude that the mean birth weight of babies in the developed countries is higher than that in developing countries?

Exercise 9.7.4. African elephants and Indian elephants are different in height, weight, and length of ear and tusk. It is natural to assume that all these are normally distributed. Assume that the heights of African and Indian elephants have an equal standard deviation $\sigma$. The mean heights of the African elephants and Indian elephants are $\mu_1, \mu_2$, respectively. The following data were collected on the height of the elephants from the two continents (these are not real data): 10.9 8.8 9.1 13.1 11.7 12.9 8.7 12.9 9.3 11.7 10.5 9.5 9.9 9.1 11.3 10.7 11.5 11.1 12.3 11.3 7.1 9.3 7.9 8.7 9.9 8.3 9.7 9.9 8.8 8.2 8.9 9.2 9.3 9.1 8.8 8.8 10.1 10.3 9.1 8.1 9.9

1. At 5 percent level of significance, would you conclude that the mean height of African elephants is higher than that of Indian elephants?
2. At 1 percent level of significance, would you conclude that the mean height of African elephants is higher than that of Indian elephants?

9.7 Paired t-test

Once again, we are testing equality of means $\mu_1, \mu_2$ of two populations. So, our Null Hypothesis is $H_0: \mu_1 - \mu_2 = 0$. We continue to denote the first population random variable by $X$ and the second population random variable by $Y$. We also assume that $X$ and $Y$ have normal distribution, and that they are independent. In certain situations, it is natural to collect samples in "pairs" $(X, Y)$ from the two populations and consider the difference $D = X - Y$. So, $D$ has mean $\mu_D = \mu_1 - \mu_2$, and our Null hypothesis becomes $H_0: \mu_D = 0$. Also, $D$ has $N(\mu_D, \sigma_D)$-distribution, where $\sigma_D = \sqrt{\sigma_1^2 + \sigma_2^2}$. We will collect samples in pairs $(X_1, Y_1), \ldots, (X_n, Y_n)$ and look at the corresponding $D$-sample: $D_1 = X_1 - Y_1, \ldots, D_n = X_n - Y_n$. Let

$\bar{D} = (D_1 + \cdots + D_n)/n$ and $S_D^2 = \sum_{i=1}^{n}(D_i - \bar{D})^2/(n-1)$

be the sample mean and variance, respectively, of the $D$-sample. The test statistic that we will use is

$T = \bar{D}\sqrt{n}/S_D$.

If the Null hypothesis $H_0: \mu_D = \mu_1 - \mu_2 = 0$ is true, then $T$ has a t-distribution with $n-1$ degrees of freedom. The following are decision rules for the Paired t-test.

1. Two-tail test: Suppose we are testing $H_0: \mu_1 - \mu_2 = 0$ versus $H_A: \mu_1 - \mu_2 \neq 0$. At the level of significance $\alpha$, our decision rule is: Reject $H_0$ if $t \notin (-t_{n-1,\alpha/2}, t_{n-1,\alpha/2})$, where $t = \bar{d}\sqrt{n}/s_D$; Accept $H_0$ otherwise.

2. Left-tail test: Suppose we are testing $H_0: \mu_1 - \mu_2 = 0$ versus $H_A: \mu_1 - \mu_2 < 0$. At the level of significance $\alpha$, our decision rule is: Reject $H_0$ if $t < -t_{n-1,\alpha}$, where $t = \bar{d}\sqrt{n}/s_D$; Accept $H_0$ otherwise.

3. Right-tail test: Suppose we are testing $H_0: \mu_1 - \mu_2 = 0$ versus $H_A: \mu_1 - \mu_2 > 0$. At the level of significance $\alpha$, our decision rule is: Reject $H_0$ if $t > t_{n-1,\alpha}$, where $t = \bar{d}\sqrt{n}/s_D$; Accept $H_0$ otherwise.

Example. Suppose we are comparing two models of cars to see how fast they accelerate. In this case, to avoid any variation due to individual drivers, we take $n$ drivers and let each driver drive one car of each model. So, $(x_i, y_i)$ are the accelerations of the first and second models driven by driver $i$. Thus, we will have $n$ pairs of observations.

Remark. The same technique of paired t-test will give us that a $(1-\alpha)100$ percent confidence interval for $\mu_D = \mu_1 - \mu_2$ is $\bar{d} - t_{n-1,\alpha/2}\, s_D/\sqrt{n} < \mu_1 - \mu_2 < \bar{d} + t_{n-1,\alpha/2}\, s_D/\sqrt{n}$.
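The Paired t-test is a one-sample t-test on the differences $D_i$. This minimal sketch computes $T = \bar{D}\sqrt{n}/S_D$, its two-sided P-value, and the confidence interval of the Remark. Python's standard library has no t-distribution, so the density is integrated numerically with Simpson's rule. The critical value $t_{n-1,\alpha/2}$ is found by bisection, assuming it lies below 100. The function names are hypothetical helpers. The five pairs are hypothetical illustration values for the car example, not data from the text.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x) via Simpson's rule on [0, |x|] (steps must be even) and symmetry."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_upper(alpha, df, hi=100.0):
    """Critical value t_{df,alpha} with P(T > t) = alpha, by bisection on [0, hi]."""
    lo = 0.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > alpha:
            lo = mid          # right tail still too large: critical value is further right
        else:
            hi = mid
    return (lo + hi) / 2


def paired_t(xs, ys, alpha=0.05):
    """Paired t-test of H0: mu_D = 0 and the (1 - alpha)100% interval for mu_1 - mu_2."""
    d = [x - y for x, y in zip(xs, ys)]   # D_i = X_i - Y_i
    n = len(d)
    dbar, s_d = mean(d), stdev(d)         # stdev divides by n - 1
    t = dbar * math.sqrt(n) / s_d
    p_two = 2 * (1 - t_cdf(abs(t), n - 1))
    half = t_upper(alpha / 2, n - 1) * s_d / math.sqrt(n)
    return t, n - 1, p_two, (dbar - half, dbar + half)


# Hypothetical accelerations of model 1 (xs) and model 2 (ys) for five drivers
xs = [10, 12, 9, 11, 13]
ys = [9, 11, 9, 10, 12]
t, df, p_two, ci = paired_t(xs, ys)
print(f"t = {t:.3f}, df = {df}, two-sided P-value = {p_two:.4f}, 95% CI = {ci}")
```

Each driver contributes one difference. Driver-to-driver variation therefore cancels inside each $D_i$ instead of inflating the spread, which is the point of the pairing in the car example.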

9.8 Comparing Proportions $p_1, p_2$ of Two Populations

Once again, we have two populations. Let $p_1$ be the proportion of Population 1 that has a certain attribute A, and let $p_2$ be the proportion of Population 2 that has attribute A. We want to compare $p_1$ and $p_2$; specifically, we want to test the equality of these two proportions. So, our Null hypothesis is $H_0: p_1 - p_2 = 0$. We take a sample of size $m$ from Population 1, let $X$ be the number of the sample members that have attribute A, and let $\bar{X} = X/m$ be the sample mean. Similarly, we take a sample (or interview) of size $n$ from Population 2, let $Y$ be the number of the sample members that have attribute A, and let $\bar{Y} = Y/n$ be the sample mean. (So, $\bar{X}, \bar{Y}$ are the proportions of "success" in the two samples.) Write

$\hat{P} = (X + Y)/(m + n)$.

If the null hypothesis $H_0: p_1 = p_2$ is true, then $\hat{P}$ is the natural estimate of the common value $p_1 = p_2$. The sample statistic we use here is

$Z = (\bar{X} - \bar{Y})/s_D$, where $s_D = \sqrt{\hat{P}(1-\hat{P})(1/m + 1/n)}$.

If $H_0: p_1 - p_2 = 0$ is true, then $Z$ has, approximately, $N(0,1)$ distribution. Now our test hypotheses and the decision rules are as follows.

1. Two-tail test: Suppose we are testing $H_0: p_1 - p_2 = 0$ versus $H_A: p_1 - p_2 \neq 0$. At the level of significance $\alpha$, our decision rule is: Reject $H_0$ if $z \notin (-z_{\alpha/2}, z_{\alpha/2})$, where $z = (\bar{x} - \bar{y})/s_D$; Accept $H_0$ otherwise.

2. Left-tail test: Suppose we are testing $H_0: p_1 - p_2 = 0$ versus $H_A: p_1 - p_2 < 0$. At the level of significance $\alpha$, our decision rule is: Reject $H_0$ if $z < -z_{\alpha}$, where $z = (\bar{x} - \bar{y})/s_D$; Accept $H_0$ otherwise.

3. Right-tail test: Suppose we are testing $H_0: p_1 - p_2 = 0$ versus $H_A: p_1 - p_2 > 0$. At the level of significance $\alpha$, our decision rule is: Reject $H_0$ if $z > z_{\alpha}$, where $z = (\bar{x} - \bar{y})/s_D$; Accept $H_0$ otherwise.

Use of Calculators and P-values:

1. In the TI-83 menu the above test is called the 2-PropZTest, which comes under TESTS.
2. The calculator will give us z-values and p-values. Also, in our notation, $\hat{p}_1 = \bar{X}$, $\hat{p}_2 = \bar{Y}$ and $\hat{p} = \hat{P}$.
3. We can use the z-values with the above decision rules to test hypotheses.
4. Alternately, at the level of significance $\alpha$, if the P-value is $p$, then: Reject $H_0$ if $p < \alpha$; Accept $H_0$ otherwise.
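The pooled two-proportion statistic can be sketched directly from the definitions of $\bar{X}$, $\bar{Y}$, $\hat{P}$ and $s_D$ above. This minimal sketch uses only the Python standard library. The helper name `two_proportion_z` is hypothetical, and the sketch is not the calculator's implementation. It is shown on the data of Exercise 9.8.3 from the problems for this section.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x, m, y, n, tail="two"):
    """Pooled two-proportion z statistic and P-value for H0: p1 - p2 = 0."""
    xbar, ybar = x / m, y / n            # sample proportions of "success"
    p_pool = (x + y) / (m + n)           # P-hat, the estimate of p1 = p2 under H0
    s_d = sqrt(p_pool * (1 - p_pool) * (1 / m + 1 / n))
    z = (xbar - ybar) / s_d
    phi = NormalDist().cdf
    if tail == "two":
        p = 2 * (1 - phi(abs(z)))
    elif tail == "left":
        p = phi(z)
    elif tail == "right":
        p = 1 - phi(z)
    else:
        raise ValueError("tail must be 'two', 'left' or 'right'")
    return z, p


# Exercise 9.8.3: 83 of 199 men and 51 of 161 women watch football, H_A: p1 > p2
z, p = two_proportion_z(83, 199, 51, 161, tail="right")
for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: z = {z:.3f}, P-value = {p:.4f}, reject H0: {p < alpha}")
```

The pooled $\hat{P}$ enters only the standard error. The numerator still compares the two separate sample proportions.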

9.8: Problems on Comparing Proportions $p_1, p_2$ of Two Populations

Exercise 9.8.1. Suppose two independent samples were collected from two populations. We want to compare the proportions $p_1, p_2$, respectively, of an attribute A present in these two populations. We are given that $x = 55$ had the attribute A in a sample of size $m = 117$ from the first population, and $y = 37$ had the attribute A in a sample of size $n = 79$ from the second population. At 1 percent level of significance, would you conclude that $p_1 > p_2$?

Exercise 9.8.2. To compare the proportions $p_1, p_2$ of defective items produced by new and old machines, respectively, samples were collected. In a sample of 57 items from the new machine, 6 were found to be defective; and in a sample of 41 items from the old machine, 9 were defective. At 5 percent level of significance, would you conclude that $p_1 < p_2$?

Exercise 9.8.3. Data were collected to compare the proportions $p_1, p_2$ of men and women, respectively, who watch football. In a sample of 199 men, 83 said that they watch football; and in a sample of 161 women, 51 said they watch football. (These are not real data.)
1. At 5 percent level of significance, would you conclude that the proportion of men who watch football is higher than the proportion of women who watch football?
2. At 1 percent level of significance, would you conclude that the proportion of men who watch football is higher than the proportion of women who watch football?
