Lecture 15
AIDS data indicating the number of months a patient with AIDS lives after taking a
new antibody drug are as follows (smallest to largest):
3; 4; 8; 8; 10; 11; 12; 13; 14; 15; 15; 16; 16; 17; 17; 18; 21; 22; 22; 24; 24; 26; 26;
26; 27; 27; 29; 29; 31; 32; 33; 33; 34; 34; 35; 37; 40; 44; 44; 47;
What are the mean, median and mode?
The mean = 23.6
The median location = (40+1)/2 = 20.5
The median is located between the 20th and 21st values (the two 24s)
Therefore median = (24+24)/2 = 24
The mode = 26
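The three summary statistics above can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and not part of the lecture.

```python
from statistics import mean, median, multimode

# Survival times in months, copied from the data set above (40 values).
months = [
    3, 4, 8, 8, 10, 11, 12, 13, 14, 15, 15, 16, 16, 17, 17, 18, 21, 22, 22,
    24, 24, 26, 26, 26, 27, 27, 29, 29, 31, 32, 33, 33, 34, 34, 35, 37, 40,
    44, 44, 47,
]

n = len(months)
avg = mean(months)            # sum of the values divided by n
med = median(months)          # average of the 20th and 21st sorted values
modes = multimode(months)     # list of the most frequent value(s)

print(f"n = {n}, mean = {avg}, median = {med}, mode(s) = {modes}")
```

`multimode` is used rather than `mode` so that a tie between several most-frequent values would be reported instead of hidden.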
The standard deviation is useful when comparing data values that come from
different data sets.
If the data sets have different means and standard deviations, then comparing the
data values directly can be misleading.
For each data value, calculate how many standard deviations it lies away from its mean.
Use the formula: value = mean + (#ofSTDEVs)(standard deviation).
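Two students, John and Ali, from different high schools, wanted to find out who
had the higher GPA when compared to his school. The two schools use different
grading scales:
Student    GPA     School mean GPA    School standard deviation
John       2.85    3.0                0.7
Ali        77      80                 10
Which student had the higher GPA when compared to his school?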
For each student, determine how many standard deviations (#ofSTDEVs) his
GPA is away from the average, for his school.
Pay careful attention to signs when comparing and interpreting the answer.
z = #ofSTDEVs = (value – mean) / standard deviation = (x – μ)/σ
For John, z = #ofSTDEVs = (2.85 – 3.0)/0.7 = –0.21
For Ali, z = #ofSTDEVs = (77 – 80)/10 = –0.3
John has the better GPA when compared to his school because his GPA is 0.21
standard deviations below his school's mean while Ali's GPA is 0.3 standard
deviations below his school's mean.
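The comparison can be reproduced in a few lines. This is a minimal sketch; the helper name `z_score` is an illustrative choice, and the numbers are the ones used in the calculation above.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies from mean (negative = below)."""
    return (value - mean) / std_dev

# GPA, school mean, and school standard deviation for each student.
john_z = z_score(2.85, 3.0, 0.7)
ali_z = z_score(77, 80, 10)

# The larger (less negative) z-score is the better GPA relative to the school.
better = "John" if john_z > ali_z else "Ali"
print(f"John z = {john_z:.2f}, Ali z = {ali_z:.2f}, better relative GPA: {better}")
```

Because both z-scores are negative, the sign matters: the student whose z-score is closer to zero is the one closer to his school's mean.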
The t-test is any statistical hypothesis test in which the test statistic follows a Student's
t-distribution under the null hypothesis.
A t-test is most commonly applied when the test statistic would follow a normal
distribution if the value of a scaling term in the test statistic were known.
The t-test can be used to determine if the means of two sets of data are significantly
different from each other.
• An independent samples t-test compares the means for two groups.
• A paired sample t-test compares means from the same group at different times
(say, one year apart).
• A one sample t-test tests the mean of a single group against a known mean.
The t-test tells you how significant the differences between groups are; in other
words, it lets you know whether those differences (measured in means/averages) could
have happened by chance.
Up until the mid-1970s, some statisticians used the normal distribution
approximation for large sample sizes and used the Student's t-distribution only for
sample sizes of at most 30.
With the paired t-test, the null hypothesis is that the mean of the pairwise
differences between the two tests is zero (H0: µd = 0).
A paired t-test is used when you run a t-test on dependent samples.
Dependent samples are essentially connected: they are tests on the same
person or thing.
The t-score is a ratio between the difference between two groups and the
difference within the groups.
The larger the t-score, the more difference there is between groups.
The smaller the t-score, the more similarity there is between groups.
A t-score of 3 means that the groups are three times as different from each
other as they are within each other.
For example, a p-value of 0.01 means that, if the null hypothesis were true, there
would be only a 1% probability of seeing results at least as extreme as those observed.
In most cases, a p-value below 0.05 (5%) is accepted as statistically significant.
1. Calculate the difference (di = yi − xi) between the two observations on each
pair, making sure you distinguish between positive and negative differences.
2. Calculate the mean difference, d̄.
3. Calculate the standard deviation of the differences, Sd, and use this to
calculate the standard error of the mean difference, SE(d̄) = Sd / √n.
4. Calculate the t value, T = d̄ / SE(d̄). Under the null hypothesis, this statistic
follows a t-distribution with n − 1 degrees of freedom (df).
5. Then use a table of t values with the df to look up the p-value for the paired
t-test.
Of course, you can use Excel to calculate this.
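The steps above can also be sketched in code. The example below uses only Python's standard library and stops at the t value and degrees of freedom, leaving the p-value to a t table as in step 5. The `before` and `after` measurements are hypothetical values invented for illustration; they do not come from the lecture.

```python
import math
from statistics import mean, stdev

# Hypothetical paired measurements on the same five subjects (illustration only).
before = [72, 75, 68, 80, 77]   # x_i
after = [75, 78, 70, 79, 82]    # y_i

# Step 1: signed differences d_i = y_i - x_i.
diffs = [y - x for x, y in zip(before, after)]

# Step 2: mean difference.
d_bar = mean(diffs)

# Step 3: sample standard deviation of the differences and standard error.
n = len(diffs)
s_d = stdev(diffs)              # uses the n - 1 denominator
se = s_d / math.sqrt(n)

# Step 4: t statistic and degrees of freedom.
t_value = d_bar / se
df = n - 1

print(f"d_bar = {d_bar}, s_d = {s_d:.3f}, SE = {se:.3f}, t = {t_value:.3f}, df = {df}")
```

The printed t value and df are what you would take to a t table (or to Excel) to obtain the p-value.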
If researchers desire a specific margin of error, then they can use the error
bound formula to calculate the required sample size.
The error bound can be calculated from the confidence interval and the
mean
For example, if the confidence interval is (67.18, 68.82):
The mean = (67.18 + 68.82)/2 = 68
The Error Bound (EB) = (68.82 − 67.18)/2 = 0.82
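The same arithmetic as a short sketch; the variable names are illustrative. The interval is symmetric about the sample mean, so the mean is the midpoint and the error bound is half the width.

```python
# Endpoints of the confidence interval from the example above.
lower, upper = 67.18, 68.82

sample_mean = (lower + upper) / 2    # midpoint of the interval
error_bound = (upper - lower) / 2    # half the width of the interval

print(f"mean = {sample_mean:.2f}, EB = {error_bound:.2f}")
```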
The sample size n = (z²σ²) / EB²
Here z is the critical value corresponding to the desired confidence level.
A researcher planning a study who wants a specified confidence level and error
bound can use this formula to calculate the size of the sample needed for the
study.
ECE 710 Lecture 15 E. Lou 21
Example
The population standard deviation for the age of Foothill College students is 15
years. If we want to be 95% confident that the sample mean age is within two
years of the true population mean age of Foothill College students, how many
randomly selected Foothill College students must be surveyed?
From the problem, we know that σ = 15 and EB = 2.
z = 1.96, because the confidence level is 95%. (see slide 10)
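Substituting these values into the sample size formula completes the example. The sketch below assumes the usual convention of rounding the result up to the next whole student, so that the error bound is not exceeded; the function name is illustrative.

```python
import math

def required_sample_size(z, sigma, error_bound):
    """Smallest whole n satisfying n >= z^2 * sigma^2 / EB^2."""
    return math.ceil((z ** 2) * (sigma ** 2) / (error_bound ** 2))

z = 1.96       # critical value for 95% confidence
sigma = 15     # population standard deviation of age, in years
eb = 2         # desired error bound, in years

raw_n = (z ** 2) * (sigma ** 2) / (eb ** 2)   # about 216.09 before rounding
n = required_sample_size(z, sigma, eb)
print(f"n = {raw_n:.2f} before rounding, so survey {n} students")
```

Under that rounding convention, 217 students would need to be surveyed.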
A hypothesis test involves collecting data from a sample and evaluating the data.
Then, the statistician makes a decision as to whether or not there is sufficient
evidence, based upon analyses of the data, to reject the null hypothesis.
To perform a hypothesis test, a statistician will:
1. Set up two contradictory hypotheses.
2. Collect sample data.
3. Determine the correct distribution to perform the hypothesis test.
4. Analyze sample data by performing the calculations that ultimately will allow you to
reject or decline to reject the null hypothesis.
5. Make a decision and write a meaningful conclusion.
Since the null and alternative hypotheses are contradictory, you must examine
evidence to decide if you have enough evidence to reject the null hypothesis or not.
The evidence is in the form of sample data.
H0                              Ha
Equal (=)                       Not equal (≠) or greater than (>) or less than (<)
Greater than or equal to (≥)    Less than (<)
Less than or equal to (≤)       Greater than (>)
H0 : µ = 2.0
Ha : µ ≠ 2.0
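When you perform a hypothesis test, there are four possible outcomes
depending on the actual truth (or falseness) of the null hypothesis H0 and the
decision to reject or not. The outcomes are summarized in the following table:
Decision            H0 is actually true    H0 is actually false
Do not reject H0    Correct outcome        Type II error
Reject H0           Type I error           Correct outcome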
Suppose the null hypothesis, H0, is: The victim of an automobile accident is
alive when he arrives at the emergency room of a hospital.
Type I error: The emergency crew thinks that the victim is dead when, in fact, the
victim is alive.
Type II error: The emergency crew does not know if the victim is alive when, in fact,
the victim is dead.
α = probability that the emergency crew thinks the victim is dead when, in fact, he is
really alive = P(Type I error).
β = probability that the emergency crew does not know if the victim is alive when, in
fact, the victim is dead = P(Type II error).
The error with the greater consequence is the Type I error. (If the emergency crew
thinks the victim is dead, they will not treat him.)
Use the sample data to calculate the actual probability of getting the test
result, called the p-value.
The p-value is the probability that, if the null hypothesis is true, the
results from another randomly selected sample will be as extreme as,
or more extreme than, the results obtained from the given sample.
A large p-value calculated from the data indicates that we should not reject the
null hypothesis.
The smaller the p-value, the more unlikely the outcome, and the stronger the
evidence is against the null hypothesis.
We would reject the null hypothesis if the evidence is strongly against it.
A p-value of approximately zero tells us that it is highly unlikely that a loaf of bread
rises no more than 15 cm, on average. That is, almost 0% of all samples of loaves
would have a mean height of at least 17 cm purely by chance had the population mean
height really been 15 cm.
There is sufficient evidence that the true mean height for the population of the
baker's loaves of bread is greater than 15 cm.
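To see why the p-value comes out at approximately zero, here is a minimal sketch of a one-sided z-test for the bread example. Only the 15 cm hypothesized mean and the 17 cm observed height come from the text above. The sample size (10 loaves) and the known population standard deviation (0.5 cm) are assumptions introduced for illustration, as is the choice of a z-test rather than a t-test.

```python
import math
from statistics import NormalDist

mu_0 = 15.0          # hypothesized mean height under H0 (from the text), in cm
sample_mean = 17.0   # observed mean height (from the text), in cm
sigma = 0.5          # ASSUMED population standard deviation, in cm
n = 10               # ASSUMED number of loaves in the sample

# Test statistic: how many standard errors the sample mean lies above mu_0.
standard_error = sigma / math.sqrt(n)
z = (sample_mean - mu_0) / standard_error

# Right-tailed p-value: P(Z >= z) when H0 is true.
p_value = 1 - NormalDist().cdf(z)

alpha = 0.05
decision = "reject H0" if p_value < alpha else "do not reject H0"
print(f"z = {z:.2f}, p-value = {p_value:.3g}, decision: {decision}")
```

With these assumed inputs the sample mean lies more than twelve standard errors above 15 cm, which is why the p-value is effectively zero and the null hypothesis is rejected.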