Lecture 15
Uploaded by Gursimar Singh
Copyright © All Rights Reserved

Data Analysis

ECE 710 Lecture 15 E. Lou 1


Descriptive Statistics

The most common terms used in basic statistics are:


Mean, Mode and Median

For the data set:

1, 2, 2, 3, 4, 7, 9
Mean = (1+2+2+3+4+7+9)/7 = 28/7 = 4
Median = 3 (the middle value of the 7 ordered values)
Mode = 2 (the most frequent value)
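A minimal sketch, using Python's standard `statistics` module (an illustrative choice, not part of the lecture), that reproduces the three values for this data set:

```python
from statistics import mean, median, mode

data = [1, 2, 2, 3, 4, 7, 9]

print("Mean   =", mean(data))    # data sum / number of data values
print("Median =", median(data))  # middle value of the sorted data
print("Mode   =", mode(data))    # most frequent value
```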



Mean, Mode, Median



Mean, Mode, Median

AIDS data indicating the number of months a patient with AIDS lives after taking a
new antibody drug are as follows (smallest to largest):
3; 4; 8; 8; 10; 11; 12; 13; 14; 15; 15; 16; 16; 17; 17; 18; 21; 22; 22; 24; 24; 26; 26;
26; 27; 27; 29; 29; 31; 32; 33; 33; 34; 34; 35; 37; 40; 44; 44; 47
What are the mean, median and mode?
The mean = 944/40 = 23.6
The median location = (40+1)/2 = 20.5
The median is located between the 20th and 21st values (the two 24s)
Therefore median = (24+24)/2 = 24
The mode = 26
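The slide finds the median with the location rule (n + 1)/2. A minimal sketch of that rule in Python, assuming the 40 values are already sorted as listed; the helper name `median_by_location` is hypothetical:

```python
import math
from collections import Counter

# The 40 survival times (months) listed above, already sorted
months = [3, 4, 8, 8, 10, 11, 12, 13, 14, 15, 15, 16, 16, 17, 17, 18, 21, 22,
          22, 24, 24, 26, 26, 26, 27, 27, 29, 29, 31, 32, 33, 33, 34, 34, 35,
          37, 40, 44, 44, 47]

def median_by_location(sorted_values):
    """Median via the (n + 1)/2 location rule; positions are 1-based as on the slide."""
    n = len(sorted_values)
    location = (n + 1) / 2                            # 20.5 when n = 40
    lower = sorted_values[math.floor(location) - 1]   # lower neighbour (20th value here)
    upper = sorted_values[math.ceil(location) - 1]    # upper neighbour (21st value here)
    return location, (lower + upper) / 2              # same value twice when n is odd

mean_months = sum(months) / len(months)
location, median_months = median_by_location(months)
mode_months, mode_count = Counter(months).most_common(1)[0]
print(mean_months, location, median_months, mode_months)
```

For an odd n the location is a whole number, so `floor` and `ceil` pick the same value and the function returns it unchanged.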



Statistics – Sample Distribution

You can think of a sampling distribution as a relative frequency distribution with a great many samples.
Suppose thirty randomly selected students were asked the number of movies that
they watched the previous week. The results are in the relative frequency table
shown below.

Number of movies   Relative frequency
0                  5/30
1                  15/30
2                  6/30
3                  3/30
4                  1/30



Statistics – Sample Distribution

What is the average number of movies that the students watched?

The sample mean x̄ is an example of a statistic, which estimates the
population mean μ.

Mean = (data sum) / (number of data values)

Mean = (5×0 + 15×1 + 6×2 + 3×3 + 1×4)/30 = 40/30 ≈ 1.3 movies
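When the data arrive as a frequency table, the data sum is the sum of value × count. A minimal sketch, assuming the frequencies 5, 15, 6, 3 and 1 for 0 to 4 movies (the last term of the calculation read as 1 student × 4 movies so that the counts total 30); the function name is hypothetical:

```python
# Number of movies -> number of students who gave that answer
freq = {0: 5, 1: 15, 2: 6, 3: 3, 4: 1}

def mean_from_frequencies(freq):
    n = sum(freq.values())                                        # number of data values
    total = sum(value * count for value, count in freq.items())  # data sum
    return total / n

print(round(mean_from_frequencies(freq), 1))
```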



Standard Deviation

If x is a number, then the difference "x – mean" is called its deviation.


In a data set, there are as many deviations as there are items in the data set. The
deviations are used to calculate the standard deviation.
If the numbers belong to a population, in symbols a deviation is x – μ.
For sample data, in symbols a deviation is x – x̄.
The lower case letter s represents the sample standard deviation and the Greek
letter σ (sigma, lower case) represents the population standard deviation.
If the sample has the same characteristics as the population, then s should be a
good estimate of σ.
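A minimal sketch of how the deviations feed into σ and s, reusing the small data set 1, 2, 2, 3, 4, 7, 9 purely for illustration. The slides do not give the formulas; the sketch assumes the usual convention of dividing by n for a population and by n − 1 for a sample, and cross-checks against `statistics.pstdev` and `statistics.stdev`:

```python
import math
from statistics import mean, pstdev, stdev

data = [1, 2, 2, 3, 4, 7, 9]
m = mean(data)                        # 4
deviations = [x - m for x in data]    # one deviation per data value

ss = sum(d * d for d in deviations)   # sum of squared deviations
sigma = math.sqrt(ss / len(data))     # population SD: divide by n
s = math.sqrt(ss / (len(data) - 1))   # sample SD: divide by n - 1

print(deviations)
print(f"sigma = {sigma:.3f} (pstdev: {pstdev(data):.3f}), s = {s:.3f} (stdev: {stdev(data):.3f})")
```

Note that the deviations sum to zero, which is why they are squared before averaging.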



Standard Deviation – Z score

The standard deviation is useful when comparing data values that come from
different data sets.
If the data sets have different means and standard deviations, then comparing the
data values directly can be misleading.
For each data value, calculate how many standard deviations it lies away from its mean.
Use the formula: value = mean + (#ofSTDEVs)(standard deviation)

#ofSTDEVs is often called a "z-score"



Example

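Two students, John and Ali, from different high schools, wanted to find out who
had the higher GPA when compared to his school. John's GPA is 2.85, and his school's
mean GPA is 3.0 with a standard deviation of 0.7. Ali's GPA is 77, and his school's
mean GPA is 80 with a standard deviation of 10. Which student had the higher GPA
when compared to his school?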



Example

For each student, determine how many standard deviations (#ofSTDEVs) his
GPA is away from the average, for his school.

Pay careful attention to signs when comparing and interpreting the answer.
z = # of STDEVs = (value – mean)/ standard deviation = (x – μ)/σ
For John, z = #ofSTDEVs = (2.85 – 3.0)/0.7 = −0.21
For Ali, z = #ofSTDEVs = (77 – 80)/10 = −0.3

John has the better GPA when compared to his school because his GPA is 0.21
standard deviations below his school's mean while Ali's GPA is 0.3 standard
deviations below his school's mean.
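A minimal sketch of the comparison in Python; `z_score` is a hypothetical helper implementing z = (x – μ)/σ, and the school figures are the ones used in the calculation above:

```python
def z_score(value, mean, sd):
    """Number of standard deviations `value` lies from `mean` (negative = below)."""
    return (value - mean) / sd

students = {
    # name: (GPA, school mean GPA, school standard deviation)
    "John": (2.85, 3.0, 0.7),
    "Ali": (77, 80, 10),
}

scores = {name: z_score(*vals) for name, vals in students.items()}
for name, z in scores.items():
    print(f"{name}: z = {z:.2f}")

# The larger (less negative) z-score is the better GPA relative to the school
best = max(scores, key=scores.get)
print("Higher GPA relative to his school:", best)
```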



Confidence Interval

In statistics, a confidence interval (CI) is a type of estimate computed from the statistics of the observed data.
The interval has an associated confidence level: the proportion of intervals
constructed this way (over many repeated samples) that would contain the true parameter.



Example
If you worked in the marketing department of an entertainment company, you might be
interested in the mean number of songs a consumer downloads a month from iTunes.
Suppose we do not know the population mean μ, but we do know that the population standard
deviation is σ = 1, and our sample size is 100. Then, by the central limit theorem, the
standard deviation of the sample mean (its standard error) is
σ/√n = 1/√100 = 0.1

For a 95% confidence level, z ≈ 1.96 ≈ 2, so the error bound is about 2 × 0.1 = 0.2,
and x̄ – 0.2 and x̄ + 0.2 are the CI boundaries.

If x̄ = 2, the 95% confidence interval is (1.8, 2.2).
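A minimal sketch of the same interval in Python, assuming σ is known so the normal (z) critical value applies; it uses the exact 1.96 from `statistics.NormalDist` rather than the rounded 2, and `z_confidence_interval` is a hypothetical helper:

```python
import math
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, level=0.95):
    se = sigma / math.sqrt(n)                      # standard deviation of the sample mean
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    eb = z * se                                    # error bound
    return x_bar - eb, x_bar + eb

low, high = z_confidence_interval(x_bar=2, sigma=1, n=100)
print(f"({low:.3f}, {high:.3f})")  # rounds to (1.8, 2.2) at one decimal place
```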



Student’s t-test

The t-test is any statistical hypothesis test in which the test statistic follows a Student's
t-distribution under the null hypothesis.
A t-test is most commonly applied when the test statistic would follow a normal
distribution if the value of a scaling term in the test statistic were known.
The t-test can be used to determine if the means of two sets of data are significantly
different from each other.
•An Independent Samples t-test compares the means for two groups.
•A Paired sample t-test compares means from the same group at different times
(say, one year apart).
•A One sample t-test tests the mean of a single group against a known mean



Student’s t-test

The t-test tells you how significant the differences between groups are. In other
words, it lets you know whether those differences (measured in means/averages) could
have happened by chance.
Up until the mid-1970s, some statisticians
used the normal distribution
approximation for large sample sizes and
used the Student's t-distribution only for
sample sizes of at most 30.



Paired t-test

With the paired t-test, the null hypothesis is that the mean pairwise difference between
the two tests is zero (H0: µd = 0).
A paired t-test is used when you run a t-test on dependent samples.
Dependent samples are essentially connected — they are tests on the same
person or thing.



T-Score

The t-score is a ratio between the difference between two groups and the
difference within the groups.

The larger the t-score (in absolute value), the more difference there is between groups.
The smaller the t-score (in absolute value), the more similarity there is between groups.

A t score of 3 means that the groups are three times as different from each
other as they are within each other.



T-values and P-values

Every t-score (t-value) has a p-value to go with it.

A p-value is the probability, assuming the null hypothesis is true, of obtaining
results at least as extreme as those observed in your sample data.
P-values range from 0% to 100% and are usually written as a decimal:
a p-value of 5% is 0.05. Low p-values are good for the researcher's claim:
they indicate your data would be unlikely to occur by chance alone.

For example, a p-value of 0.01 means that, if the null hypothesis were true, there
would be only a 1% probability of obtaining results at least as extreme as those
from the experiment.
In most cases, a p-value at or below 0.05 (5%) is accepted as statistically significant.



Example
Suppose a sample of n students were given a diagnostic test before studying a
particular module and then again after completing the module. We want to find out
if, in general, our teaching leads to improvements in students’ knowledge/skills (i.e.
test scores). We can use the results from our sample of students to draw
conclusions about the impact of this module in general.
Let x = test score before the module, y = test score after the module
Use Paired t-test
Null hypothesis: on average, there is no difference in test scores after completing the module (H0: µd = 0)



Example

1. Calculate the difference (di = yi − xi) between the two observations on each
pair, making sure you distinguish between positive and negative differences.
2. Calculate the mean difference, d̄.
3. Calculate the standard deviation of the differences, Sd, and use this to
calculate the standard error of the mean difference, SE(d̄) = Sd / √n.
4. Calculate the t value = d̄ / SE(d̄). Under the null hypothesis, this statistic
follows a t-distribution with n − 1 degrees of freedom (df).
5. Then use a table of t values with the df to look up the p-value for the paired
t-test.
Of course, you can also use Excel to calculate this.



Example
Student   Pre-Score   Post-Score
1         18          22
2         21          25
3         16          17
4         22          24
5         19          16
6         24          29
7         17          20
8         21          23
9         23          19
10        18          20
11        14          15
12        16          15
13        16          18
14        19          26
15        18          18
16        20          24
17        12          18
18        22          25
19        15          19
20        17          16

Mean difference = 2.05
Sd = 2.837
SE(d̄) = 0.634
t = 3.231, df = 19
p = 0.004

Therefore, there is strong evidence that, on average, the module does lead to
improvements.
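A minimal, standard-library-only sketch of the five steps, using the pre/post scores transcribed from the table above. In place of a t-table or Excel, it computes the two-sided p-value by integrating the Student's t density numerically with Simpson's rule; this integration is an illustrative substitute, and a library routine such as `scipy.stats.ttest_rel` would normally be used instead. The helper names are hypothetical:

```python
import math
from statistics import mean, stdev

pre = [18, 21, 16, 22, 19, 24, 17, 21, 23, 18, 14, 16, 16, 19, 18, 20, 12, 22, 15, 17]
post = [22, 25, 17, 24, 16, 29, 20, 23, 19, 20, 15, 15, 18, 26, 18, 24, 18, 25, 19, 16]

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=10_000):
    """P(|T| >= |t|) = 1 - 2 * (area under the pdf from 0 to |t|); Simpson's rule, steps even."""
    h = abs(t) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * total * h / 3

def paired_t_test(x, y):
    d = [yi - xi for xi, yi in zip(x, y)]  # step 1: differences d_i = y_i - x_i
    n = len(d)
    d_bar = mean(d)                        # step 2: mean difference
    s_d = stdev(d)                         # step 3: SD of the differences ...
    se = s_d / math.sqrt(n)                # ... and standard error of the mean difference
    t = d_bar / se                         # step 4: t statistic
    df = n - 1
    return d_bar, s_d, se, t, df, two_sided_p(t, df)  # step 5: p-value

d_bar, s_d, se, t, df, p = paired_t_test(pre, post)
print(f"mean diff = {d_bar:.2f}, Sd = {s_d:.3f}, SE = {se:.3f}, t = {t:.3f}, df = {df}, p = {p:.3f}")
```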



Sample Size

If researchers desire a specific margin of error, then they can use the error
bound formula to calculate the required sample size.
The error bound can be calculated from the confidence interval and the
mean
e.g. if the confidence interval is (67.18, 68.82)
The mean = (67.18+68.82)/2 = 68
The Error Bound (EB) = (68.82 - 67.18)/2 = 0.82
The sample size n = z²σ² / EB²
where z is the critical value corresponding to the desired confidence level.
A researcher planning a study who wants a specified confidence level and error
bound can use this formula to calculate the size of the sample needed for the
study.
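A minimal sketch of recovering the mean and error bound from a symmetric confidence interval, using the (67.18, 68.82) example; the helper name is hypothetical:

```python
def mean_and_error_bound(lower, upper):
    """Point estimate = midpoint of the CI; error bound = half its width."""
    point_estimate = (lower + upper) / 2
    error_bound = (upper - lower) / 2
    return point_estimate, error_bound

m, eb = mean_and_error_bound(67.18, 68.82)
print(f"mean = {m:.2f}, EB = {eb:.2f}")
```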
Example

The population standard deviation for the age of Foothill College students is 15
years. If we want to be 95% confident that the sample mean age is within two
years of the true population mean age of Foothill College students, how many
randomly selected Foothill College students must be surveyed?
From the problem, we know that σ = 15 and EB = 2.
z = 1.96, because the confidence level is 95%. (see slide 10)

n = (1.96)²(15)²/(2)² = 216.09, using the sample size equation.


Use n = 217: Always round the answer UP to the next higher integer to
ensure that the sample size is large enough
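A minimal sketch of the sample size formula with the round-up rule built in; `required_sample_size` is a hypothetical helper, and when no z is passed it assumes a two-sided critical value from `statistics.NormalDist`:

```python
import math
from statistics import NormalDist

def required_sample_size(sigma, error_bound, level=0.95, z=None):
    """n = z^2 * sigma^2 / EB^2, always rounded UP to the next integer."""
    if z is None:
        z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # two-sided critical value
    return math.ceil((z * sigma / error_bound) ** 2)

print(required_sample_size(sigma=15, error_bound=2, z=1.96))  # 217
```

Using the unrounded z ≈ 1.95996 instead of 1.96 gives n ≈ 216.08 before rounding, so the answer is still 217.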



Hypothesis Testing

A hypothesis test involves collecting data from a sample and evaluating the data.
Then, the statistician makes a decision as to whether or not there is sufficient
evidence, based upon analyses of the data, to reject the null hypothesis.
To perform a hypothesis test, a statistician will:
1. Set up two contradictory hypotheses.
2. Collect sample data.
3. Determine the correct distribution to perform the hypothesis test.
4. Analyze sample data by performing the calculations that ultimately will allow you to
reject or decline to reject the null hypothesis.
5. Make a decision and write a meaningful conclusion.



Null and Alternative Hypotheses

The actual test begins by considering two hypotheses.


They are called the null hypothesis and the alternative hypothesis.
These hypotheses contain opposing viewpoints.
H0: The null hypothesis: It is a statement of no difference between sample means
or proportions. In other words, the difference equals 0.
Ha: The alternative hypothesis: It is a claim about the population that is
contradictory to H0 and what we conclude when we reject H0.

Since the null and alternative hypotheses are contradictory, you must examine
evidence to decide if you have enough evidence to reject the null hypothesis or not.
The evidence is in the form of sample data.



Null and Alternative Hypotheses

Mathematical Symbols Used in H0 and Ha:

H0                              Ha
Equal (=)                       Not equal (≠) or greater than (>) or less than (<)
Greater than or equal to (≥)    Less than (<)
Less than or equal to (≤)       Greater than (>)

H0 always has a symbol with an equal in it.


Ha never has a symbol with an equal in it.



Example

We want to test whether the mean GPA of students in American colleges is
different from 2.0 (out of 4.0). The null and alternative hypotheses are:

H0 : µ = 2.0
Ha : µ ≠ 2.0



Type I and Type II Errors

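When you perform a hypothesis test, there are four possible outcomes
depending on the actual truth (or falseness) of the null hypothesis H0 and the
decision to reject or not. The outcomes are summarized in the following table:

Action              H0 is actually true             H0 is actually false
Do not reject H0    Correct outcome                 Type II error (probability β)
Reject H0           Type I error (probability α)    Correct outcome

α and β should be as small as possible because they are probabilities of errors.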


Example

Suppose the null hypothesis, H0, is: The victim of an automobile accident is
alive when he arrives at the emergency room of a hospital.
Type I error: The emergency crew thinks that the victim is dead when, in fact, the
victim is alive.
Type II error: The emergency crew believes the victim is alive (does not reject H0) when,
in fact, the victim is dead.
α = probability that the emergency crew thinks the victim is dead when, in fact, he is
really alive = P(Type I error).
β = probability that the emergency crew believes the victim is alive when, in
fact, the victim is dead = P(Type II error).
The error with the greater consequence is the Type I error. (If the emergency crew
thinks the victim is dead, they will not treat him.)



p-value

Use the sample data to calculate the actual probability of getting the test
result, called the p-value.
The p-value is the probability that, if the null hypothesis is true, the
results from another randomly selected sample will be as extreme as
or more extreme than the results obtained from the given sample.

A large p-value calculated from the data indicates that we should not reject the
null hypothesis.
The smaller the p-value, the more unlikely the outcome, and the stronger the
evidence is against the null hypothesis.
We would reject the null hypothesis if the evidence is strongly against it.



Example
Suppose a baker claims that his bread height is more than 15 cm, on average. Several
of his customers do not believe him. To persuade his customers that he is right, the
baker decides to do a hypothesis test.
He bakes 10 loaves of bread. The mean height of the sample loaves is 17 cm. The
baker knows from baking hundreds of loaves of bread that the standard deviation for
the height is 0.5 cm and the distribution of heights is normal.
The null hypothesis is H0: μ ≤ 15, and the alternative hypothesis is Ha: μ > 15.
Since σ is known (σ = 0.5 cm), the distribution of the sample mean is normal, with
mean μ = 15 under H0 and
standard error of the mean = σ/√n = 0.5/√10 ≈ 0.16



Example
Suppose the null hypothesis is true (the mean height of the loaves is no more than 15
cm). Is the mean height calculated from the sample (17 cm) then unexpectedly large?
The hypothesis test works by asking the question how unlikely the sample mean
would be if the null hypothesis were true.
The graph shows how far out the sample mean is on the normal curve.
The p-value is the probability that, if we were to take other samples, any other
sample mean would fall at least as far out as 17 cm.
The p-value, then, is the probability that a
sample mean is the same or greater than 17
cm. when the population mean is, in fact,
15 cm. We can calculate this probability using
the normal distribution for means.
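A minimal sketch of this right-tailed calculation in Python, assuming the normal sampling distribution of the mean described above. It uses `math.erfc` for the upper tail, P(Z ≥ z) = ½·erfc(z/√2), because computing 1 − cdf directly rounds to exactly 0 when z is this large:

```python
import math

mu_0 = 15     # hypothesized population mean under H0 (cm)
sigma = 0.5   # known population standard deviation (cm)
n = 10        # number of loaves in the sample
x_bar = 17    # observed sample mean (cm)

se = sigma / math.sqrt(n)                    # standard error of the mean, about 0.16
z = (x_bar - mu_0) / se                      # how many standard errors 17 cm lies above 15 cm
p_value = 0.5 * math.erfc(z / math.sqrt(2))  # P(sample mean >= 17 | mu = 15), right tail since Ha: mu > 15

print(f"SE = {se:.2f}, z = {z:.1f}, p-value = {p_value:.1e}")
```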



Example

A p-value of approximately zero tells us that it is highly unlikely that a loaf of bread
rises no more than 15 cm, on average. That is, almost 0% of all samples of 10 loaves
would have a mean height of at least 17 cm purely by CHANCE had the population mean
height really been 15 cm.

Because the outcome of 17 cm is so unlikely (meaning it is NOT happening by chance
alone), we conclude that the evidence is strongly against the null
hypothesis (the mean height is at most 15 cm).

There is sufficient evidence that the true mean height for the population of the
baker's loaves of bread is greater than 15 cm.
