03 - Statistics Foundations Part 2
Welcome
My name is Eddie Davila, and I'm a university instructor with degrees in business
and engineering. I write ebooks, and of course I develop online educational
content. I'm a huge sports fan. I love to follow the entertainment industry. And
I'm passionate about science and health. And I can tell you that in every
important facet of my life, having a better understanding of statistics allows me to
improve my performance and often to find a greater level of satisfaction whether
I'm working or playing.
And I'll tell you, even if you know what many of these things are, I think you'll
walk away with a new perspective. Actually, I'm hoping you'll never look at these
concepts the same way again. You won't just understand the power of data and
statistics. You'll know their inherent weaknesses too. Welcome to Statistics
Fundamentals Part Two. Improved performance and increased satisfaction are just
around the corner.
And hopefully, you're also comfortable with normal distribution curves and z-
scores. Plus, you should be comfortable with means, medians, standard deviation,
and basic probabilities. And look, even if some of those things make you a little
bit uncomfortable, don't worry. Often we use pictures, charts, and tables to help
illustrate the concept. Sometimes, we attack problems in more than one way. And
of course, through the power of the internet, you can always pause and rewind.
So, whether your math muscles are strong, or you're just beginning to
rediscover math concepts, I think the probability of success and discovery is quite
high. Thanks for exploring Statistics Fundamentals Part 2, and good luck.
And while tables and charts are helpful, often they're not enough. A data pool of
10,000 data points would give us tables and charts that might be too
overwhelming to provide the guidance we seek. Often we want to understand the
data set by knowing its center. There are several ways to determine the center of
a data set. One way is to determine the mean, otherwise known as the
average. An average of all 10,000 data points. We might also consider the
median.
If we listed all 10,000 values, from the smallest to the largest value, the median
would be the one right in the middle. Knowing the central point is helpful, but
perhaps you want to know how the 10,000 values are distributed. For example,
you might want to know the range, the difference between the biggest and
smallest value in a data set. Perhaps you want a standard deviation. The standard
deviation is sort of the average distance between each data point and the mean
of the data set.
It is essentially a measure of how much variation exists between the data points in
our pool. Let's set this up just a bit. Do you remember the concept of a normal
distribution? Remember, a data pool that is normally distributed would mean that
your data is symmetrically distributed around the data pool's mean. That's where
we get our very pretty and also very helpful normal distribution curve. These
distribution curves tell us how data is distributed, how many data points are at
the mean, how many are located at this reading, and how many we would expect
at this reading.
So, now let's pair this up with the concept of standard deviation. How are the
concepts of standard deviation and the normal distribution curve related? Well
hopefully you remember that the empirical rule tells us that we expect about 68%
of our data to be within one standard deviation of the mean. We would then
expect 95% of our data points to be within two standard deviations of the
mean. And finally, 99.7% of our data points should be within three standard
deviations from the mean, which brings us to Z-Scores.
Z-Scores are a measure of the number of standard deviations a particular data
point is from the mean. So for a data point we will call X, a Z-Score of 1.55 means
that X is 1.55 standard deviations from the mean. Data sets, tables,
charts, means, medians, ranges, standard deviations, normal distributions, and Z-
Scores. Do you remember these concepts? Do you know what they mean? Do you
know how to find or calculate some of these statistics? If so, good job.
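By the way, if you'd like to check these ideas with a little code, here's a minimal Python sketch using the standard library's statistics module. The data values are made up purely for illustration.

```python
import statistics

# A small, made-up data set standing in for a much larger pool of readings.
data = [12, 15, 15, 18, 20, 22, 25, 25, 30, 38]

mean = statistics.mean(data)        # the average
median = statistics.median(data)    # the middle value once sorted
data_range = max(data) - min(data)  # biggest value minus smallest
stdev = statistics.pstdev(data)     # population standard deviation

# A z-score: how many standard deviations a data point x sits from the mean.
x = 30
z = (x - mean) / stdev

print(f"mean={mean}, median={median}, range={data_range}, stdev={stdev:.2f}")
print(f"z-score of {x}: {z:.2f}")
```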
- So you understand data sets, tables and charts. You understand means,
medians, ranges, and standard deviations. Excellent. But that's not all that was
covered in Statistics Fundamentals 1. So, what else do you need to
remember before we start covering new statistical territory? Well, Statistics
Fundamentals 1 spent quite a bit of time exploring the basics of
probability. Probability is essentially a ratio. The ratio of a particular event or
outcome versus all the possible outcomes.
What are the odds that the second card in the deck is an ace, given that the
first card off the top of the deck was already revealed to be an ace? This would be
a case of conditional probability. Probability can even help us understand more
complex issues, like false positives in the world of medicine. Using Bayes’
theorem, you could calculate the chances of a person testing positive for a
disease even though they didn't actually have the disease.
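To see the flavor of that calculation, here's a minimal Python sketch of Bayes' theorem for a screening test. Every number in it, the prevalence, the sensitivity, the false-positive rate, is invented for illustration; it is not from any real test.

```python
# All numbers below are invented for illustration.
p_disease = 0.01             # 1% of people actually have the disease
p_pos_given_disease = 0.95   # test sensitivity
p_pos_given_healthy = 0.05   # false-positive rate

# Total probability of testing positive (law of total probability).
p_positive = (p_pos_given_disease * p_disease
              + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: the chance a positive test really means disease.
p_disease_given_pos = p_pos_given_disease * p_disease / p_positive

print(f"P(disease | positive) = {p_disease_given_pos:.1%}")
# About 16% here, so most positives would be false positives.
```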
This would be an example of a discrete random variable, since when we roll a
die, the possible outcomes are one, two, three, four, five, or six. These are discrete
numbers, so we cannot get an outcome of 2.4 or 5.99. On the other hand, if we
are measuring the time it took runners to run 100 meters, now the outcomes are
continuous.
We may have times of 12.45 seconds, 13.954, 10.35278. And so, as we explore
probabilities, we need to be aware of the differences between measuring
probabilities in systems with discrete outcomes and probabilities in systems
with continuous outcomes. Which brings us around to normal
distributions, probability densities, and even something called the Fuzzy Central
Limit Theorem.
And let's not forget, binomial experiments. Experiments where we only have two
possible outcomes. Pass or fail. Acceptable or defective. Heads or tails. All of
these things were introduced, explained, and explored in Statistics Fundamentals
Part 1. Do you remember these concepts? Do you know what they mean? If so,
good job. If not, don't be shy about revisiting these concepts in Statistics
Fundamentals 1 to refresh your memory.
In one way or another, a basic understanding of these terms and concepts will be
important in moving through Statistics Fundamentals 2.
- In Statistics Fundamentals part one, you saw means and medians, standard
deviations, probability, normal distributions. You now have a nice foundation
of statistics knowledge. So let's jump into some new material. What lies ahead in
Statistics Fundamentals part two? Well, it's time to start to put our foundational
statistics to use. We'll be looking at three primary issues in statistics: sampling,
confidence intervals, and hypothesis testing. In our sampling section, we'll take
samples of data, calculate standard statistics like means or probabilities.
We will then look at the distribution of our results. Of course, we'll also
discuss how much data needs to be collected and how to properly collect this
data. So as we consider the issues of samples and sample size, we're really
looking to collect data in order to find meaningful statistics that will inform us
about a population. These are called inferential statistics.
For example, what percentage of American adults are in favor of the death
penalty? Without asking every American adult, how can we find the answer to this
question? Take samples.
Measure the results of each sample. Of course, you'll need to consider when,
where and how these samples are collected. This is a serious issue and these
results could impact politicians, communities, the media and perhaps even the
courts. Understanding sampling is vital to truthful communication and informed
decision making. What happens if someone provides you with sample data and
asks you to infer from it answers to specific questions? Suppose I tell you that out
of a sample of 100 registered voters in a town with a population of 150
thousand, 60% of the 100 registered voters in our sample reported that they
planned on voting for keeping the present mayor.
Should the mayor feel confident that she will win the upcoming election? Does
this really mean that she can expect to get 60% of the votes on election
day? Well, we can't really say exactly what percentage of the votes the incumbent
mayor will get but we can create a confidence interval. We could provide
calculations that might tell us that based on the sample collected, we are 95%
certain that the incumbent mayor will get between 50.4% and 69.6% of the vote.
This type of language illustrates a level of confidence, or a confidence interval. If
you were the mayor, how would that report make you feel? As you can see,
confidence intervals provide a level of confidence for a given interval. In our last
section, we will put certain results to the test. For example, a university has one
thousand students. 500 students are male and 500 are female.
The annual tuition for this school is $50,000. 20 of the one thousand students will
be chosen at random to get free tuition for the year. 19 of the winners are
female. Only one is male. How can we test whether or not this outcome is a result
of chance versus likely being an outcome that was influenced by outside
forces? Hypothesis testing will provide us with a process to test these findings.
As you can see, while means and probabilities still dominate the discussion, we
are now putting these numbers to work. By understanding sampling, confidence
intervals and hypothesis testing, you'll have the power to investigate, discover
and better understand the world around you.
Chapter 2 – Sampling
Sample considerations
Measuring everything is just way too expensive, too time consuming and in some
cases, it's just impossible. Political operatives can't poll every voter. Cell phone
companies can't measure the quality level of every single item they produce. A
farmer can't measure the actual size of every tomato grown. Scientists, they can't
track the health of every single person in the country. Instead of measuring
everything, they just measure a small group or subset of the total population.
That small subset of measurements is a sample. And under the right
circumstances, this sample can act as a representative of the entire
population. Gathering that representative sample is challenging though. Let's
consider a political poll for the mayor of the city. The city has a population of one
million eligible voters. A polling organization is trying to predict the election
outcome between two candidates.
One named Silver and one named Diamond. The polling organization reports that
if the election were held today, Diamond would get 60% of the vote and Silver
would get 40% of the vote. You work on the Silver campaign, so you're
concerned. Before you panic though, you need to question the quality of the
sample used. What are probably some of your biggest concerns about the
sample? How many of the one million eligible voters were polled? Would a
hundred be enough? How about a thousand? Or would you want a sample size of
at least a hundred thousand? The bigger the required sample size, the more
expensive the survey is to conduct.
How were those people chosen? Who actually decided to answer the survey? Did
some people decline to be surveyed? Which organization did the polling? Did
they intentionally, or perhaps unintentionally collect data that was not
representative of the population? Did they drive the poll to give
favourable results for one particular candidate? What specifically were the
questions asked in the survey? Were they confusing or misleading? And
depending on the nature of the study, political, scientific, environmental,
commercial or entertainment related, there are likely many other factors that
should be considered in evaluating the quality of a sample.
Yes, the size of a sample is important in determining the worth of the data
collected. But before we approach the concept of sample size, let's consider the
other aspects of gathering a quality sample. Strangely enough, despite the
endless list of sample considerations, the best samples, they're the ones that are
chosen at random. Yep, a random sample is the gold standard when it comes to
collecting data.
But as you might now expect in the world of statistics, nothing comes easy. And
so before we move forward, we confront the difficulty of gathering a truly
random sample.
Random samples
- So often, we cannot gather the data for an entire population. It's either too
expensive or just not possible. Ideally, statisticians look to gather sample
data from parts of the population. Actually, to get statistically reliable
results, statisticians need to make sure their data was selected randomly from the
population. And the most dependable type of data comes from what we call a
simple random sample. This means that the sample is chosen such that each
individual in the population has the same probability of being chosen at any
stage during the sampling process.
And each subset of k individuals has the same probability of being chosen for
the sample as any other subset of k individuals. But while the name tells
us gathering a truly random sample is simple, it's not. It's actually quite
difficult. Why? Well, a simple random sample must exhibit two key
characteristics. The sample must be unbiased, and the data points must be
independent.
Let's discuss what these two things are, and let's also see why each characteristic
can be so elusive. A simple random sample is one where every member of the
population has an equal chance of being chosen. Now, why is this so difficult to
achieve? Let's consider surveys performed over the phone. We can just select
some phone numbers from a list. How could this be biased? Some people have
multiple phones, work, home, cell phones.
They may be more likely to get calls. The time of the call might better target
certain types of people. Some people may never answer calls from unknown
numbers. How about in-person polls and surveys? The location of the pollster
may favour certain people. The pollster may seek out certain types of
people. Perhaps the pollster is attractive or intimidating to certain people. That
could influence participation.
If the data was collected on a university campus, perhaps the majority of the
people are from very similar demographics. How about online surveys? They are
completely optional. So what kind of people actually fill out these surveys? Can
the same person fill out multiple surveys? If the person polled is paid for their
participation in the survey, this might mean we are gathering data from people
that want or need money.
This probably has you questioning almost every study or poll you've ever
seen. Yes, some unethical organizations intentionally seek out biased
samples. But as you can see, sometimes even the most reputable
organizations can have difficulty gathering a truly random sample. Let's move to
the characteristic of independence. The data in the sample must exhibit
independence. What does this mean? It means that the selection of one
member must not influence the selection of other units.
How about when you have a complete data set in a spreadsheet? Let's say you
have data for 5,000 adults that were surveyed. How can you gather a simple
random sample of 20? Well, in most spreadsheets, you can ask the spreadsheet to
choose 20 responses at random. If you plan on doing some statistics
computations, you may want to research your software package. Then again, who
is to say the 5,000 adult responses were gathered in the appropriate manner? So
the simple random sample can be elusive, but it is vital to presenting data that is
statistically dependable.
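As a quick illustration of that spreadsheet idea, here's how you might draw a simple random sample of 20 in Python. The list of 5,000 response IDs is just a stand-in for your real data.

```python
import random

# Stand-in for the 5,000 survey responses (here, just ID numbers).
responses = list(range(1, 5001))

# A simple random sample of 20, drawn without replacement:
# every response has the same chance of being selected.
sample = random.sample(responses, k=20)
print(sample)
```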
Yes, a simple random sample is the gold standard when it comes to collecting
data. But as you might have already discovered, in the world of statistics, nothing
comes easy.
You understand that the simple random sample is still the only way to get
dependable statistical outcomes. I can already hear you saying, hey, if the simple
random sample is the only way to ensure dependable results, why would anyone
use these alternatives? Well, these alternative methods are simpler to
organize, easier to carry out, and often, they seem both logical and sound.
For an opportunity sample, the sampler simply takes the first n number of units
that come along. Suppose you wanted to do a study on being a parent in the
modern age. Maybe you distribute 10,000 surveys to parents of children that
attend public elementary schools in the city of Chicago. 525 surveys are
returned. The location, the age of the children, the public school setting, the letter
that accompanies the survey, and the motivations of those that actually fill
out the survey will likely create both bias and independence challenges.
A stratified sample is one where the total population is broken up into
homogeneous groups. Let's say we're trying to figure out the average amount of
sugar in a single cookie, regardless of the type of cookie. We could break up the
population into so many different cookie types: chocolate chip, peanut butter,
oatmeal, sugar, ginger, snickerdoodle, oatmeal raisin. From there, we might take a
sample of 30 cookies from each category.
Perhaps chocolate chip cookies make up 50% of all cookies and ginger cookies
make up only 3% of all the cookies. Our very fair-looking system might actually
be biased against the most popular cookies. A cluster sample is similar to
stratified samples in that we are breaking things up into groups. What's the
difference? In stratified groups, all the members of each group were the same. In
clusters, the groups are likely to have a mix of characteristics.
They're heterogeneous. Suppose we are testing a new product. We might ask
samples of people in 20 major cities what they think about the new
product. While the people in a single sample might all be from the same
city, each sample might contain men and women, people of different
races, politics and socio-economic backgrounds. So, as you can see, these
alternative sampling methods appear rather logical and in some cases, fairly
simple. Actually, my guess is that if you yourself have ever done a simple
study, you've probably used one of these methods to collect data, for the very
reasons I just stated.
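And if you wanted to script something like the cookie-style stratified sample, a minimal sketch might look like this. The cookie counts are invented for illustration.

```python
import random

# Invented population: cookies grouped by type (the strata).
population = {
    "chocolate_chip": [f"cc_{i}" for i in range(500)],
    "peanut_butter":  [f"pb_{i}" for i in range(200)],
    "ginger":         [f"gi_{i}" for i in range(30)],
}

# Stratified sample: the same 30 cookies from every homogeneous group,
# no matter how big that group is in the overall population.
stratified = {
    kind: random.sample(cookies, k=min(30, len(cookies)))
    for kind, cookies in population.items()
}

for kind, picks in stratified.items():
    print(kind, len(picks))
```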
Is that a bad thing? No, not really, but again, it is important to understand the
limitations of your study, the reported results and very likely, your analysis.
At the same time, these simple but flawed sampling techniques may be windows
into things we did not consider. They may inspire bigger questions, help us gain
funding, and ultimately, these alternative sampling methods may spark a more
robust and rigorous study. The simple random sample will always be the gold
standard, but these alternative sampling methods should not be completely
dismissed.
- A sample is a group of units drawn from a population, and the sample size is the
number of units drawn and measured for that particular sample. The total
population itself may be very large or, perhaps, immeasurable, so a sample is just
looking at a slice of the population in the hopes of providing us a representative
picture of the entire population. As you might guess, the larger the sample
size, the more accurate our measurement or, at least, the more confidence we
have that our sample is actually providing us a glimpse of the whole population.
But just how important is sample size? Let's first establish how an experiment
might look. Let's say we own a machine that manufactures forks. The forks
manufactured from this system are either judged as acceptable or as
defective. You might remember this type of scenario from when we looked at
binomial random variables in Stats Fundamentals One.
Anyway, this magic fork-manufacturing machine, over its entire existence, it will
manufacture 90% good forks and 10% defective forks. This machine will not break
down, it will not get tired, as I said, it's a magic fork machine. So the value of p for
this machine is 0.90. Remember, p is the proportion of good forks this machine
produces.
90% of all the forks this machine will ever produce will be good forks. Look, we
don't actually know that, the machine can't tell us that, and we can't know the
actual true p until the machine has produced a lifetime of forks. But, if enough
samples are collected, over time, we would find that the average of the
proportions of all the samples measured would approach the true p, 0.90.
In our central limit theorem video, I will show you how this happens, but for now,
just take my word for it. Anyway, if we had five forks in our sample and four were
good ones and one was a bad one, p-hat, which is the acceptable proportion for
this particular sample, p-hat would be four good forks over five total forks in our
sample. P-hat is equal to 0.80.
So, you'd collect a bunch of samples, average their p-hats, and hope that the
samples were pointing you toward the true p. I know, I know, you're wondering,
what does this have to do with sample size? Well, without actually knowing the
true p, 0.90, using only the average of all of our previous p-hats, we'd like to
know, for any given sample, how likely is it that I'm close to the real p, 0.90? Let's
assume our p-hats are normally distributed.
If that's true, we can calculate our standard deviation for our p-hat and assume
that 68% of our p-hats would fall within one standard deviation of the true value
of p. And this is where sample size becomes important. Why? Well, let's take a
look at the formula for the standard deviation, σ = √(p(1 − p)/n), where n is our
sample size. The p and the quantity one minus p won't change, but sample size will.
So, the bigger the sample size, the smaller our standard deviation. For this
sample, if n is equal to five, then one standard deviation would be 13.4% from
90% in either direction, which means that, with a sample size of five, we would
expect 68% of all of our samples to have between 76.6% and 100% good forks,
since we can't have more than 100% good forks in any sample.
If n equals 25, one standard deviation would be 6%. At n equals 100, one
standard deviation would be 3%. And at n equals 400, one standard
deviation would be 1.5%, which means that, with a sample size of 400, we would
expect at least 68% of all of our samples to have between 88.5% and 91.5% good
forks.
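If you'd like to verify those fork-machine numbers yourself, here's a quick Python sketch of that standard deviation formula.

```python
import math

p = 0.90  # the machine's true proportion of good forks

def stdev_of_p_hat(p, n):
    """Standard deviation of the sample proportion for sample size n."""
    return math.sqrt(p * (1 - p) / n)

for n in (5, 25, 100, 400):
    sd = stdev_of_p_hat(p, n)
    lo, hi = p - sd, min(p + sd, 1.0)  # a sample can't top 100% good forks
    print(f"n={n:>3}: one standard deviation = {sd:.3f} "
          f"-> {lo:.1%} to {hi:.1%}")
```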
As you can see, a larger sample size can really make us feel so much more
comfortable with our results. It gives us more confidence when we apply the
sample results to our larger population.
- The central limit theorem. Just saying the words can be a little intimidating. It
sounds complicated. Perhaps something only a hardcore statistician would dare
study. It turns out that it's a rather simple concept, a concept that's not only
important to our world of statistics, but dare I say, it's also a concept that is rather
interesting. Let's start simple. A distribution of discrete numbers. We start on the
left, where we have five values of five.
We move right along our distribution. Two units of 10. Four units of 15. Six units
of 20. And on the right of our distribution, we have three values of 25. 20
different readings in our entire population. If we average out the values of our 20
different readings, we get an average of 15.0. Now suppose we didn't want to tally
up all 20 values, but we still wanted to find the average of the data set.
Could we use samples to direct us to the population mean? Let's try it. Let's take
samples of four units every day. Here's our first sample. Sample one, 10, 15, 20,
25. Our sample mean for this sample is 17.5. Sample two, five, 15, 20, and 20. Our
sample mean for this sample is 15.
Sample three, five, five, 15, and 20. Our sample mean here is 11.25. We have three
samples, thus we have three sample means, 17.5, 15, and 11.25. If we average
those, we get the mean of our means, 14.58. Not too far off from our actual
population mean of 15.0.
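If you want to try this tiny experiment yourself, here's a minimal Python sketch of it. Your three random samples will differ from mine, of course.

```python
import random
import statistics

# The 20-reading population from the example:
# five 5s, two 10s, four 15s, six 20s, and three 25s.
population = [5]*5 + [10]*2 + [15]*4 + [20]*6 + [25]*3
print(statistics.mean(population))  # 15, the true population mean

# Take three random samples of size 4 and average their means.
sample_means = [statistics.mean(random.sample(population, 4))
                for _ in range(3)]
print(sample_means, "-> mean of the means:", statistics.mean(sample_means))
```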
Now I know what you're saying, these are three hand-picked samples, each with
only a sample size of four. Couldn't this just be luck? Well, the central limit
theorem tells us that it's not luck. The central limit theorem tells us the more
samples we take, the closer the means of our sample means will get to the
population mean.
Actually, it's even more interesting, because as we start to take many more
samples, dozens of samples, hundreds of samples, even thousands of
samples, the sample means, if plotted as a histogram, would start to look like this.
Yes, we start to see that the sampling distribution of our sample means is looking
quite a bit like a normal distribution. But wait, it gets more interesting. In our very
simple example, we had a sample size of four. Suppose we increase our sample
size to six. If we again take thousands of samples, look what happens to our
distribution.
And look what happens if our sample size increases to ten. As you are probably noticing, as
sample size increases, the curve becomes more normal. The curve got taller and
more narrow. And this means that the standard deviation gets smaller. As I said,
the central limit theorem is sort of simple. Really interesting, and obviously
incredibly important in the world of statistics.
But wait, there's more. Not only does the central limit theorem work with our
example with a tiny population and discrete values, it works with massive
populations and continuous values. So no matter if you're interested in learning
about the average test scores of a small school, the average weight of
watermelons grown in North America, or the political preferences of voters in the
United States, the central limit theorem is there to provide you the guidance to
understand the overall population with the assistance of some simple random
samples.
- We've already begun to see the impact of sample size. In general, the larger the
sample size, the more confidence we have in our results. Now let's shift our
attention to the standard error. In short, the standard error is the standard
deviation of our proportion distribution. Through an example, let's take a look at
how we calculate the standard error and also what that calculated number would
mean to us. In the cell phone industry, companies struggle to keep their clients
happy.
Suppose a reputable, national poll finds that 60% of adults are satisfied with their
cell phone provider. Let's take that as our population proportion 0.60. We'd like
to see if cell phone service in our city reflects what is being seen nationally. In an
attempt to measure this, we'll take simple random samples of 100 cell phone
users in our city. Now, we know that 100 people can't possibly reflect the
satisfaction levels of everyone in our city, therefore we can assume that each
sample will carry with it some level of standard error.
So, how big is this standard error? Well, the answer to that question really depends
on, you guessed it, sample size. The standard error is ultimately related to
the standard deviation. Our formula for the standard deviation is σ = √(p(1 − p)/n),
where p is the population proportion, in this case 0.60, and n is the sample size, in this case 100.
So we can see that for n = 100, our standard deviation is approximately 0.05, or
5%. And remember, if we assume a normal distribution, 68% of all the samples
taken should fall within one standard deviation of the population proportion. So
in this situation, for simple random samples with 100 ratings, we would expect
68% of the samples to provide sample proportions or p-hats between 55% and
65%.
55% is our lower limit, 65% is our upper limit, and 60%, our population
proportion, would be at the center. So if tomorrow we gathered a simple
random sample of 100 cell phone customers and 57% of those customers were
satisfied, we can say that our city was likely on par with the national proportion
of 60% because we were within the 5% standard error.
Then again, if we could afford to take simple random samples with sample sizes
of 1,000 cell phone customers, notice what happens: since n is now
1,000, our standard deviation drops to about 0.015, or 1.5%. So if n = 1,000, we
would expect 68% of those larger samples to have p-hats between 58.5% and
61.5%.
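Here's a quick sketch that reproduces both of those standard error calculations, up to rounding.

```python
import math

p = 0.60  # national proportion of satisfied cell phone customers

for n in (100, 1000):
    se = math.sqrt(p * (1 - p) / n)  # standard error of the sample proportion
    print(f"n={n}: standard error = {se:.3f}, so 68% of samples "
          f"should land between {p - se:.1%} and {p + se:.1%}")
```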
So let's recap, the standard error in situations where we are looking at
proportions is the standard deviation. This is the formula for the standard
deviation of a sample proportion, p-hat. The bigger our sample size, the smaller
our standard deviation. This standard deviation is our standard error. The
standard error allows us to set up a range around the population proportion that
extends the equivalent of one standard deviation in both the positive and
negative direction.
The formula for our upper limit is p plus the standard deviation, and for the lower
limit, p minus our standard deviation. Once the range is established, if we
assume the probability distribution is nearly normal, then we would expect that
68% of the simple random samples gathered in the upcoming weeks
would fall within one standard deviation, or the standard error. What
happens when 68% of our samples are not falling within our calculated upper and
lower limits? Perhaps it signals that something in our city is different from the
overall nation.
Customers, companies, or a combination of the two might create a unique
environment in our city. Perhaps there is a flaw in the reported national
average of 60%, maybe their data gathering techniques were flawed. Perhaps, the
market has changed since that number was first reported or perhaps our
sampling method was biased. Don't forget, while standard errors are there to
help us judge and analyze future samples, samples that fall beyond the standard
error should be analyzed, not necessarily judged as failures.
- Suppose we know that the average player in a men's college basketball league
weighs 180 pounds. Let's also say that the median player weighs about 190
pounds, so that means quite a few of the smaller players in the league are
bringing down that average. This league has over 4,000 players. Would we have
to weigh every one of those 4,000-plus players to know the average weight of a
player in the league? Well, if you remember, the Central Limit Theorem tells us
that by taking some simple random samples, we can get a very good
approximation of the true population average.
If we take five random samples, with a sample size of only four, we might find
that those five tiny samples will have sample means that average to perhaps 182
pounds. Now, if we take five random samples, but increase the sample size to
25, we would likely see the mean of the sample means closer to 180.5
pounds. Don't believe me? Try this yourself.
There are plenty of online simulations that will run these types of
random experiments for you. That's actually how I came up with my numbers. I
played around with some of these online simulations. I had simulators pull five
simple random samples with varying sample sizes for the given
distribution. Population mean of 180 pounds and population median of 190
pounds. These were the numbers I got. It's incredible, isn't it? Only five simple
random samples with sample sizes as low as four can get us very close to our
true population mean of 180 pounds.
Now, this basketball league has the capability of tracking all 4,000 plus players, so
we knew that the average weight of the players in this league was actually 180
pounds, but hopefully my example has helped convince you that even when we
have a massive population, the Central Limit Theorem tells us we can trust our
simple random samples to point us in the direction of the true population
mean. Let's say instead of 4,000 well-tracked male college basketball players, we
wanted to know the average weight of 18 to 24 year old men in United States
colleges.
There are millions of young males in college, and these men are not tracked
nearly as well as the basketball players, but that shouldn't be a concern. The
Central Limit Theorem tells us that if we are diligent in collecting simple random
samples, we should trust our sample means in this scenario, with a population of
over three million students, to be as accurate as the example of five samples
collected from the pool of 4,000 college basketball players. And it works the other
way too.
If you have a school of only 50 students, a very small population, you can
approximate the population mean for those 50 students by simply taking a few
simple random samples. The Central Limit Theorem. It really helps us feel at ease
as we use sampling to help us approximate population means in so many
different scenarios.
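If you'd rather script the experiment than hunt down an online simulator, here's one way to do it with numpy. The player-weight population below is invented; it's just a mixture tuned so its mean lands near 180 pounds and its median near 190, like our example.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# Invented population of 4,000 player weights: a big group near 195 lbs
# plus a smaller group of lighter players near 145 lbs, mixed so the
# mean is about 180 and the median about 190, as in the example.
population = np.concatenate([
    rng.normal(195, 10, size=2800),
    rng.normal(145, 10, size=1200),
])
print(f"population mean ~ {population.mean():.1f}, "
      f"median ~ {np.median(population):.1f}")

# Five simple random samples at each sample size.
for size in (4, 25):
    means = [rng.choice(population, size=size, replace=False).mean()
             for _ in range(5)]
    print(f"sample size {size}: mean of five sample means = "
          f"{np.mean(means):.1f}")
```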
- Through the use of the central limit theorem, we've seen how taking just a few
random samples can guide us in the direction of the population mean. Of course
when we use only a few samples to try and figure out a population mean, we
understand that the average of our sample means comes with a standard
error. So how do we figure out the standard error for our simple random
samples? Let's say we're trying to figure out how long it takes to get our coffee
drinks from our local cafe between 7 a.m.
If we take the average of the sample means, we will find that the average time
to get a coffee drink was about 1.52 minutes. And the standard deviation of
those four sample means is 0.25 minutes. The standard deviation of our sample
means is our estimated standard error. What's interesting is that there's a
relationship between the standard error of our sample means and the
standard deviation of the population.
Take a look at this formula: σx̄ = σ / √n, that is, sigma x-bar is equal to sigma over
the square root of our sample size n. Sigma x-bar is our standard error, the standard
deviation of our four sample means. On the other side of the equation, we have another
sigma; this sigma is the standard deviation for the entire population.
So by plugging in our calculated standard error from our cafe example, which was
0.25 minutes, and then plugging in 5 for n, our sample size, we can then solve for
sigma, our population's standard deviation. We can see that based on these four
samples with a sample size of five, we can estimate the population's standard
deviation is 0.56 minutes.
Again, we can see how sample size can have a huge impact on working with
samples to find out information about the entire population. In essence, what the
formula tells us is that if we use larger sample sizes, our standard error gets
smaller. This is also important as we collect samples in the future. Why? Well,
when we collect a sample of drink service times from our cafe tomorrow and the
rest of the week, we will know that 68% of our samples should have sample
means that fall within 0.25 minutes of 1.52 minutes, our average.
Our upper limit would then be 1.77 minutes and our lower limit, 1.27 minutes. The
standard error formula is very simple, but still very informative. By
understanding the simple relationship between sample size, the standard
deviation of our population, and the standard deviation of our sample means, we can
better understand our population as well as the samples we take going forward.
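And here's the cafe arithmetic as a short script, using only the summary numbers from the example.

```python
import math

n = 5                  # drink orders measured in each sample
standard_error = 0.25  # standard deviation of our sample means (minutes)
mean_of_means = 1.52   # average of the sample means (minutes)

# sigma_xbar = sigma / sqrt(n), so sigma = sigma_xbar * sqrt(n).
population_sd = standard_error * math.sqrt(n)
print(f"estimated population standard deviation: {population_sd:.2f} min")

# About 68% of future sample means should land within one standard
# error of our average.
print(f"68% band: {mean_of_means - standard_error:.2f} to "
      f"{mean_of_means + standard_error:.2f} minutes")
```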
- At this point, you should hopefully feel comfortable with the concepts of sample
size and the central limit theorem. The central limit theorem tells us that if we
take enough simple random samples, we can get an excellent approximation of
our population means. In other words, rather than measure everything in the
population, we can take some random samples. Those random samples will
provide us with the measurements that will be nearly normal in our
distribution, and will direct us to the population mean. And you also, hopefully,
remember that the larger the sample size of those random samples, the smaller
the standard deviation of our distributions, so the more certain we are about our
resulting population mean.
Lots of samples make us feel confident about our population numbers. I have so
many data points, the evidence to support our approximation is very strong. In
this section, in which we will cover confidence intervals, we're going to go in the
opposite direction. What happens when we have only one sample? If we have
only one sample, how confident are we that this single sample mean is near our
actual population mean? In this section, you'll often see results that look like this.
We are 95% confident that the average adult in the United States drinks between
two and three liters of beverages per day. As you can see, one random sample
will allow us to calculate our range and attach to it a level of confidence. Think
about how incredibly powerful this is, the efficient use of resources, the overall
savings, and the ability to be 95% confident, or perhaps even more confident
than that.
But before we move on, let's take a moment to discuss what a 95% confidence
level means. It means that if, instead of taking a poll only once, we took a similar
poll 20 times. 19 times, the results of the poll, in other words, the resulting ranges
of those 19 polls, they would capture the population mean. But of course, one of
the 20 times, the reported range would not include our population mean.
Remember that the next time a pre-election poll predicts the wrong candidate to
win. So, let's get started with that exact example. Let's see how they create
confidence intervals for those pre-election polls.
- Let's create a 95% confidence interval for an election poll where the voters have
two choices: Candidate A and Candidate B. As you may have guessed, we'll be
working with proportions. Before we start creating a 95% confidence interval for
this scenario, let's recap a few things. First, if we took a lot of voter samples, the
distribution would be approximately normal. Second, the larger the voter sample
size, the smaller the variation, and thus, the smaller the standard deviation of the
resulting distribution.
Third, a 95% interval is one where we are 95% certain that our interval, which will
be centered at the sample's proportion, will contain the actual population
proportion. Now what we're going to do is take a single sample. This single
random sample might include 50 eligible voters, but the resulting sample
proportion of eligible voters that favor Candidate A is a single number we can call
p-hat.
This single sample proportion is a single dot on this distribution. Now the
question is, is it a dot that is close to the population's true proportion, or is it a
dot that is very far away from the true proportion? I also want you to
remember we don't actually know the true population proportion. So while our
single sample is here, we don't know if the true population proportion is here or
here or here.
So, what are we actually doing? Well, let's imagine that this is the true population
proportion distribution. We're going to take a sample and build an
interval around that sample. And we're going to hope that the true population
proportion falls within that interval. So maybe, this is our sample and its
interval. That interval actually contains the population proportion.
Here's another sample proportion and its interval. This too captures the
population proportion.
And here's another sample proportion and its interval. But as you can see, this
interval does not contain the population proportion. Now, what we really want to
create is an interval, an interval of a certain length we will call y. This interval will
have a lower limit and an upper limit.
And this interval will be centered on our sample proportion, p-hat. This interval
would need to be big enough where if we took 20 samples, 19 of the 20
intervals would contain the true population proportion. In other words, 95% of
my samples would have an interval that captures p. So let's recap. When we
gather a sample with a certain sample proportion p-hat and when we create an
interval around that p-hat, we are 95% certain that the real population proportion
is somewhere between the lower and upper limit of that interval.
And now that we understand what we're trying to do, let's attempt to create a
real confidence interval.
Anything over 50% in the real election would result in a win for your
candidate. So far, based on the results of the poll, things look promising for your
candidate but remember, this was just one sample with a sample size of
100. Now, look, I understand that my small pre-election poll likely didn't provide
the actual percentage of votes Candidate A will get on the actual day of the
election but maybe we're close.
So let's create a 95% confidence interval. In other words, let's use our sample
result to create an interval that very likely includes the actual percentage of
votes Candidate A will get on election day. Let's take a look at a normal
distribution curve. If we want a 95% confidence level, that would mean we'd want
to capture 95% of the area under the curve between two points equidistant from
the sample proportion which means 2.5% of the area under the curve on the right
side of the curve would not be included and 2.5% of the area under the curve on
the left side of the curve would not be included.
So how do we find these two points which establish our interval? We'll have to
take a look at z-scores. Z-scores tell us how many standard deviations away
from the mean we would need to be to capture a certain percentage of the
total distribution. So we pull up a z-score table. We find 0.975. This means
97.5% of the data points are to the left of this point and thus, 2.5% of the data
points would fall to the right of this point.
As you can see, our z-score is 1.96. This means if we go 1.96 standard
deviations in the positive direction and 1.96 standard deviations in the negative
direction, 95% of the area under our distribution will fall between these two
limits. So let's see where we are so far. Our sample proportion is 0.55.
This will be the center of our interval. Therefore, the upper limit will be 0.55 plus
1.96 times our standard deviation and our lower limit will be 0.55 minus 1.96
times our standard deviation. So of course now, we need to find our standard
deviation. In the absence of the population proportion and the population's
standard deviation, we can use the formula for the standard error.
It's basically the formula for the standard deviation, but we use the sample
proportion, p-hat, in place of the population proportion, p. Since we use the
sample proportion instead of the population proportion, we can't call it the
standard error; when you use the sample proportion, it's called the sampling
error. So we put p-hat into
our formula and of course, in our particular example, we polled 100 voters so n is
equal to 100.
As you can see, our standard deviation or sampling error in this case is 0.05. So
when we plug that standard deviation into our formula, we get an interval that
has a lower limit of 0.452 and an upper limit of 0.648. Uh oh, our interval goes all
the way down to 0.452 which means that our margin of error tells us that losing is
still possible.
Then again, a large proportion of our interval is over 50%. Still, your candidate is
probably a bit nervous but how about if we took a bigger sample? How about if
the campaign is willing to fund a poll of one thousand voters? The numbers on
this poll are a bit lower for your candidate. This poll tells us 54% of the voters are
for Candidate A. Let's calculate our sampling error with our new sample
proportion and sample size of one thousand.
Our sampling error is now 0.016. So let's calculate our interval limits. Look at
that. Our new interval stretches from 0.509 to 0.571. If the election were to have a
result identical to anything within our 95% confidence interval, Candidate A
would win. Remember, according to this sample, there is now a 95% chance that
on election day, Candidate A will receive between 50.9% and 57.1% of the vote.
There's only a 5% chance that the election day results will fall outside of that
interval. And don't forget, it's possible that those 5% might include results that
are even better than 57.1% of the vote for Candidate A. No matter how you slice
it, that should make Candidate A's team feel pretty good, right? Yeah, there's
always that one person on the team that asks, can we get a 96% interval or what
about 98%? So for those people, next, we'll create confidence intervals that are
greater than 95%.
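Here's a minimal sketch that reproduces both of those 95% confidence intervals.

```python
import math

def confidence_interval(p_hat, n, z=1.96):
    """z = 1.96 captures the middle 95% of a normal distribution."""
    sampling_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * sampling_error, p_hat + z * sampling_error

# The two polls from the example.
print(confidence_interval(0.55, 100))   # about (0.452, 0.648)
print(confidence_interval(0.54, 1000))  # about (0.509, 0.571)
```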
- For some people, 95% just isn't good enough. So what happens if someone
demands a 98% confidence interval? Well, let's remember a 95% confidence
interval stretches in equal distances in opposite directions from our sample
proportion. How far? Enough to include 95% of the probability distribution, which
means that a 98% confidence interval would have to stretch a little bit farther so
our interval would include 98% of the probability distribution.
Notice my numbers didn't really get any better. It's more like saying, "I'm 75%
sure my lost car keys are in my living room, but I'm 99% sure my lost car
keys are somewhere in this house." I simply increased the likely location of my
keys, and that increased the likelihood that this area contained my keys. So, when
someone demands that we provide a 98% confidence interval instead of a
confidence interval of 95%, it's important that they understand what the
difference is between the two intervals.
With that in mind, let's go ahead and figure out how to calculate the limits of this
expanded interval. Actually, it's very simple. Remember, to find the limits of our
95% confidence interval, we used these formulas. Our sample proportion, p-hat
plus or minus our sampling error, which is really just our sample's standard
deviation, times 1.96. Why 1.96? Because that was the appropriate z-score for
95%.
So really, the only thing we will change to adjust our interval is the z-score of
1.96. But how do we find the right z-score for, let's say, 98%? Don't get fooled,
you do not want the z-score for 0.98. You actually need the z-score for
0.99. Why? Well, let's take a look at our distribution.
We want to set limits where 98% of the data is under the curve between the
limits, so 2% of the distribution falls outside of the limits. 1% of the distribution
on the right end of the curve and 1% on the left end of the curve. So we need to
find the z-score for 0.99. Here's a z-score table.
Within the table, we are looking for 0.9900 or the closest number that is greater
than that. In this case, the appropriate z-score is 2.33, and since the interval will
stretch in equal distances in opposite directions, we will use 2.33 for both our
upper and lower limit. So, if we poll 1,000 people and 540 of those people favor
Candidate A instead of Candidate B, then we know we have a p-hat of 0.54 and
we know that n is equal to 1,000.
Using this formula, we can calculate our sampling error. Our sampling error,
therefore, is 0.016. We now have what we need to calculate our 98% confidence
interval. Look at that. If an election victory requires at least 50% of the
vote, victory is still within our margin of error. I know, I know there's always the
person that wants more. How about 99%? Using the same logic as with 98%, we
realize we need to find the z-score for 0.995, which is 2.58.
Let's calculate our 99% interval. Here we see that the chance for a narrow election
loss is within our margin of error. Nonetheless, it looks like, based on our simple
random sample, we can feel fairly confident that an election win is likely. Still,
even with this strong statistical evidence, the candidate can lose.
If the candidate lost after we reported that a win seemed rather likely, how might
that loss be explained? Let's try and figure that out next.
And now they realize that they actually lost. How could this have happened? Well
this is where it helps to be a well-rounded statistician. Beyond having a
knowledge of the numbers and formulas, you need to understand the real
environment that surrounds the poll. In this case, it would be helpful if we
understood how political polls are done and also the nature of the actual
election. What might go wrong during the actual poll? Lying. Respondents
might want to throw off the polls; they may just lie.
Or perhaps, they're embarrassed to tell a pollster about their true opinions, and
thus they would rather give them an answer that would please the
pollster. Maybe they didn't lie. Perhaps the respondent just changed their
mind between the time of the poll and the actual date of the election. It's possible
some people were unsure who they wanted to vote for on the day of the poll. But
they chose a candidate for the poll just to please the pollster.
Perhaps there were issues in gathering a random sample. The location of the
poll. The people chosen. How they were chosen. The incentives used to entice
more participants. It takes a very experienced organization to gather a truly
random sample. Sometimes, in an effort to influence voters that are still uncertain
about who they will vote for, some politically biased polling organizations might
actually seek biased polling results that they can use in the media to show that
their candidate is popular among likely voters.
These organizations may have had poor sample selection, poorly worded
questions, or other questionable if not deceitful practices. As you can see, the
polling process itself is filled with challenges. So let's move on to election
day. What might go wrong on Election Day? Bad weather. A health epidemic.
Unsafe travel conditions. Car trouble. Work or family commitments. This is just a
short list of reasons people might not be able to get to the voting booth on
Election day. Perhaps between the time of the pre-election poll and election day,
something changed. An event occurred that changed the way voters made their
decisions. Perhaps a scandal involving one of the candidates was
uncovered. Sometimes, voters just choose to stay home.
Why? They don't really care that much. Maybe they just forgot. Perhaps they
heard the lines were long. Maybe they had been watching the news, and analysts
made it seem as though the outcomes were certain. Perhaps the voter thought
the election is a done deal. My vote isn't going to make a difference at this
point. Hopefully you aren't growing distrustful of statistics. If you investigate
confidence intervals that have been reported over the last few decades, you'll see
that very often, they are very accurate.
Nonetheless, when the confidence interval misses the mark, it's important to
know where the poll might have gone wrong. Actually, it's best to know these
things before the study is even performed. If you're looking to use confidence
intervals to make important decisions, be sure you investigate how the study was
done and which assumptions and simplifications were included in the
development of the study. As I said, a good statistician has to know more than
numbers and formulas.
They need to really understand the environment that they're looking to measure.
So, if we wanted a different confidence interval, we would just find the
appropriate Z-Score. Now let's move from proportions to means. Here are the
formulas for developing a 95% confidence interval for means. Pretty much the
same thing, except we substitute in the sample mean where we had the
sample proportion. So, suppose we wanted to create a 95% confidence
interval for the average time it takes females, in their twenties, to run a mile.
We then use this standard deviation, 1.36, to compute our sampling error. If we
plug these numbers into our formulas for the 95% confidence interval, we will get
an interval from 8.08 minutes to 9.84 minutes. According to this simple random
sample, there is a 95% chance that the population mean for the one mile run time
for females in their twenties is somewhere between 8.08 minutes and 9.84
minutes.
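If you'd like to reproduce that interval in code, one caution: the transcript quotes only the 1.36 standard deviation and the final limits, so the sample mean of 8.96 minutes and the sample size of 9 in this sketch are back-calculated assumptions that roughly recreate those limits.

```python
import math

# Assumed values: the transcript quotes only the standard deviation
# (1.36) and the final interval, so the mean and n here are inferred.
x_bar = 8.96  # assumed sample mean, the midpoint of the quoted interval
s = 1.36      # sample standard deviation (minutes)
n = 9         # assumed sample size
z = 1.96      # z-score for 95% confidence

sampling_error = s / math.sqrt(n)
lower, upper = x_bar - z * sampling_error, x_bar + z * sampling_error
print(f"95% CI: {lower:.2f} to {upper:.2f} minutes")
# Roughly the 8.08-to-9.84-minute interval quoted above.
```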
Just as with our proportion intervals, by adjusting our sample size, which
influences our sampling error, we can impact the size of the interval. And, of
course, when we ask for an interval with a different confidence level, we can just
adjust our Z-Score, which will also influence our confidence interval. There you
have it.
You are now ready to develop confidence intervals for situations that use
proportions as well as situations that require means. Congratulations.
- Have you ever come upon situations, outcomes, or events that just seem
odd? In a city made up of 51% women, where jury pools are said to be chosen at
random, a certain jury pool of 50 people contains only eight women. A national
restaurant chain provides a game piece for every drink a customer buys. There are
10 prizes worth over $100,000. Two of those prizes are won by relatives of
restaurant employees.
Three employees from a particular chemical factory with 400 employees are
diagnosed with brain cancer in a two-year period. When you hear things like this,
they make you think. It doesn't seem right. Is that even possible? And if it is
possible, how likely is it that it could have happened at random? Sometimes these
questions and the related answers could impact our careers and companies.
They may help us make decisions. They might influence our superiors to
act. Perhaps you work at a healthcare company. Your company has developed a
drug to treat the common cold. It's reported that the average adult with the
common cold will experience cold symptoms for about 8.5 days. When testing
this new medicine on a random sample of 250 people with the common cold, it's
found that these patients recovered about 1.2 days sooner than those that did not
take this drug.
Is this significant? Could this sample just be the result of chance, or did this drug
have an impact? Should the drug be tested further? Does this mean this new drug
should be approved for use? This is where hypothesis testing comes
in. Hypothesis testing is an extremely popular method for exploring outcomes. In
general, statisticians will make an assumption about a population.
They then collect a random sample from the population. They measure the
sample. And finally, they see whether or not the sample measurement supports
their assumption. It can be complex, but when done properly, hypothesis testing
can be extremely powerful. But, hypothesis testing really requires that you put to
use almost everything you've seen in Stats Fundamentals One, as well as
everything covered thus far in Stats Fundamentals Two.
Science, medicine, business, education, public policy, and even sports and
entertainment. No matter your field, make sure you understand the most basic
elements of hypothesis testing.
- The adult residents of a large town with an adult population of 35,000 are half
male and half female. Each week, 50 adults are chosen at random to participate in
jury duty. Women have complained that they are getting called to jury duty more
often than the men. Jury administrators contend the system is random and fair. A
committee is set up to investigate. They use the next jury pool as a sample. They
find that in that pool of 50 potential jurors, 14 are men and 36 are women.
In our first step, we need to set up our hypotheses. There will typically be two
hypotheses. H-sub-zero, or H-naught, is our null hypothesis. We might refer to
this as what we consider to be the status quo. In our case, this basically
accepts that these jury numbers did happen by chance. In this hypothesis test, the
null hypothesis states that everything's okay, and thus the odds of a woman
being picked for jury duty were no more than 50%.
So, our null hypothesis is p is less than or equal to 0.50. In other words, women
had a 50% chance, or perhaps even less than a 50% chance, of being chosen for
jury duty. Let's move to our alternative hypothesis, H-sub-a. This would be the
opposite of the null hypothesis. This one would say that women did not have a
50% chance of being chosen for jury duty.
In fact, the chance of a woman being chosen for the jury is greater than 50%. So
here, our alternative hypothesis is p is greater than 0.50. With our two hypotheses
now stated, we'll then also want to state as significance level. Essentially, this sets
a threshold for our test. In other words, suppose through our test we find that 36 or more women might end up on a 50-person jury pool by chance 30% of the time, or 20% of the time. Or, what if it's only 10% of the time? Would you believe that this actually happened by chance, or would you say that, if the probability fell below some significance level alpha, something must be wrong? So, let's set our significance level at 5%. If 36 or more women ending up on a jury has less than a 5% chance of occurring at random, then we will reject our null hypothesis.
In our second step, we look to find a statistic that will assess the validity of our
null hypothesis. How could we see whether this outcome, 36 women and 14 men, or an outcome that is even more extreme, could happen at random under these circumstances? Here, we'll go back to stats one and use binomial probabilities. Our p here is equal to 0.50. That's the probability of a woman being chosen for the jury panel.
And, we will have 50 trials since that's how many seats there are on the jury
panel. Finally, the number of successful trials will be 36. We have our null
hypothesis. We also have our test statistic. Now, we find the p-value for that test statistic. The p-value is the probability that this outcome, 36 women and 14
men, or an outcome even more extreme, could occur by chance.
So, we're looking for the probability that the number of women chosen for the
jury panel would be 36 or more. What do we find? Whether you did the
calculation the hard way, with a binomial table, with a spreadsheet, or perhaps
with an online binomial calculator, you would find the probability of this outcome, or one even more extreme, would be 0.0013, or 0.13%.
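If you'd like to verify that 0.13% figure yourself, here is a minimal sketch in Python, assuming SciPy is available; it's one stand-in for the binomial table or online calculator mentioned above.

from scipy.stats import binom

n, p = 50, 0.50   # 50 jury seats; the null says a woman is chosen with p = 0.5
k = 36            # observed number of women

# P(X >= 36): the survival function sf(k - 1) includes k itself
p_value = binom.sf(k - 1, n, p)
print(f"P(36 or more women out of 50) = {p_value:.4f}")  # ~0.0013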
Those are some pretty long odds. So, our final step: time to compare our p-value to our fixed significance level. What we found was that, assuming a man and a woman were equally likely to be chosen for a jury, there was only a 0.13% chance that, at random, 36 or more women would be chosen for a panel of 50 potential jurors.
Our fixed significance level, alpha, was 0.05 or 5%. Clearly, the p-value fell well below our significance level, and thus we must reject the null hypothesis. This means we believe that something is making it much more likely for a woman to be chosen versus a man. Now, it doesn't prove that the cause is evil or intentional, nor does it prove that the cause is unintentional and innocent.
It simply means we reject the null hypothesis. Let's briefly talk about that too. The only possible outcomes for this test were to reject the null hypothesis, which is what we just saw, or to not reject the null hypothesis. Notice, that does not mean we said accept the null hypothesis. We were looking to contradict the null hypothesis. It's sort of like saying a person on trial is guilty or not guilty.
Guilty means the evidence is there to convict. Not guilty means there was a lack
of evidence. Not guilty does not necessarily mean the jury believed the person
was innocent; they just lacked the evidence to prove guilt. Hypothesis testing is an extremely important part of the world of statistics and every field that leans on statistics for assistance. This was a very abbreviated version of the hypothesis
testing process.
It's probably not enough to make you an expert, but it will be helpful in
understanding statistical studies presented at work and also in the media.
- Consider a few claims. It's reported that people between the ages of 18 and 24 check their phones 74 times per day, though some doubt that number. Recall also our cold medicine example, where a sample of patients recovered from the common cold faster than the 8.5-day average. The company that developed this medicine thinks the drug should be considered for federal approval. Finally, consider the national average for the college entrance exam, 1000 points. The Regent Test Prep Academy claims that their
students consistently beat that national average. These are all situations where
hypothesis testing would be useful. But each of these situations would require a
different type of hypothesis test.
Let's look at each situation individually. In our first situation, we had a claim that
said people between the ages of 18 and 24 checked their phones 74 times per
day. Some folks doubted that claim though. Notice, this group did not say the
number was too high, nor did they contend the number was too low. They just
expressed doubt in the stated average of 74 times per day. In this case, our hypotheses look like this.
Our null hypothesis, H sub-zero, or H-naught: Mu is equal to 74.0. Our alternative hypothesis, H sub-a: Mu is not equal to 74.0. If we look at our normal distribution, what we have is this: 74.0 is the mean of our null hypothesis. Let's say that we thought that anything more than 1.7 standard deviations from the mean, in either direction, would mean we could reject our null hypothesis.
On the other hand, anything that was less than 1.7 standard deviations from the null hypothesis mean would tell us that we could not reject the null hypothesis. As you can see, we have two rejection areas here: one rejection area in the positive direction, greater than the mean, and the other in the negative direction, less than the mean. This is considered a two-tailed test because the null hypothesis is tested in both directions.
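Note that 1.7 standard deviations is just an illustrative cutoff here, not a standard alpha. As a quick check on what that cutoff implies, here is a sketch assuming SciPy.

from scipy.stats import norm

cutoff = 1.7  # illustrative threshold, in standard deviations

# Chance of landing beyond +/-1.7 standard deviations at random
two_tailed_alpha = 2 * (1 - norm.cdf(cutoff))   # ~0.0891

# A one-tailed test with the same cutoff uses only half that area
one_tailed_alpha = 1 - norm.cdf(cutoff)         # ~0.0446

print(f"two-tailed alpha: {two_tailed_alpha:.4f}, one-tailed alpha: {one_tailed_alpha:.4f}")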
On the other hand, in our example where the average person recovers from the cold in 8.5 days, our test group recovers in 7.3 days. This is a one-tailed test. Why? Well, in this case, our hypotheses look like this. Our null hypothesis, H sub-zero: Mu is greater than or equal to 8.5 days. Our alternative hypothesis, H sub-a: Mu is less than 8.5 days.
So our null hypothesis is saying that patients do not recover faster with the
drug and perhaps they may even take longer to recover. Both of these
situations would indicate the medicine was not helpful. The alternative
hypothesis is indicating that the medicine does in fact have an impact. On our
normal distribution graph, we have 8.5 as our null hypothesis mean.
We can mark a point 1.7 standard deviations below the mean, dividing the curve into two areas. The difference here is that the drug can only be considered helpful if the patients actually get better faster. This means that the area to the left, this small single tail, represents the area where we would reject the null hypothesis. The large area to the right would indicate that we could not reject the null hypothesis.
That would be bad news for the drug company. So let's take a look at our test prep school. This example is very similar to the cold medicine example. The
difference here is that we are looking for an increase in the test scores. Here are
our hypotheses for this situation. Our null hypothesis, H sub-zero, or H-naught: Mu is less than or equal to 1000 points. Our alternative hypothesis, H sub-a: Mu is greater than 1000 points.
Our null hypothesis is saying students of the Regent School do not see increased
test scores. Instead they see average scores or perhaps even below average
scores. The alternative hypothesis says that the Regent students do score over the
national average. What do we see on our normal distribution? Again, we see one
tail. If we land in this area on the right, we would reject the null hypothesis.
If we land anywhere else in the large area to the left, we would not reject the null
hypothesis. As you start to look for opportunities to utilize hypothesis testing, be sure you consider whether your hypothesis test is a one-tailed test or a two-tailed test.
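To see how a one-tailed test like the cold medicine example might play out numerically, here is a sketch. The course never gives the population standard deviation for recovery times, so the sigma below is a made-up figure purely for illustration.

from math import sqrt
from scipy.stats import norm

mu0 = 8.5     # null hypothesis mean recovery time, in days
xbar = 7.3    # sample mean: 1.2 days sooner
n = 250       # sample size from the example
sigma = 4.0   # HYPOTHETICAL population standard deviation (not from the course)

z = (xbar - mu0) / (sigma / sqrt(n))
p_one_tailed = norm.cdf(z)             # left tail only: we only care about "faster"
p_two_tailed = 2 * norm.cdf(-abs(z))   # what a two-tailed test would compute instead

print(f"z = {z:.2f}, one-tailed p = {p_one_tailed:.6f}")

With these made-up numbers the drug would look very effective; with a larger sigma, the same 1.2-day improvement could easily fail to reach significance.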
- Suppose a politician wants to know whether they will win an upcoming election, and a random poll of 500 voters finds that 54% plan to vote for the candidate. Our null hypothesis, H sub-zero, is that p is less than or equal to 0.5. This hypothesis states that the candidate would get 50% or less of the votes, and
thus not have enough of the votes to win the election. H sub a, our alternative
hypothesis, the candidate wins. This would be the opposite of the null
hypothesis. This one would say that this candidate would get a majority of the
vote and thus win the election. Our alternative hypothesis is p is greater than 0.5.
Our significance level for this test will be 5%. If this has less than a 5% chance of
occurring, then we reject our null hypothesis. We're looking at a one-tailed
test, where the rejection region is on the right-hand side of our distribution. If our
proportion does not fall in the rejection region, we'll not have enough evidence
to reject the null hypothesis.
So let's go to step two: we're going to identify the test statistic. In this situation, our test statistic is a z-score. We will call it z sub p. This z-score will establish the point on our distribution which divides the do-not-reject area from the rejection area. The formula is z sub p equals p-hat minus p sub zero, divided by the square root of p sub zero times one minus p sub zero over n, where p-hat is the sample proportion, p sub zero is the proportion from our null hypothesis, and n is our sample size.
So in our case, p-hat is equal to 0.54, p sub zero is equal to 0.50, and n, our sample size, is 500. And if we use these numbers, we get a z-score of 1.79. So now we move on to step three, our p-value. Our test statistic z sub p is 1.79. If we look at this number on our z-score chart, you'll find that 1.79 leads you to 0.9633. So our p-value is 1 minus 0.9633, which gives us a p-value of 0.0367. Step four:
now we're going to compare our p-value to our fixed significance level. Our fixed
significance level, alpha, was 5% or 0.05. Our p-value was 0.0367.
That's smaller than our 0.05 significance level, thus we can reject the null
hypothesis. Graphically, we can look at this a few ways. Here's our distribution. We established that the left side of the distribution is the do-not-reject-the-null-hypothesis area, and the right tail is the reject-the-null-hypothesis area. Our alpha was 0.05, which means that 95% of the distribution was to the left of the critical value, and 5% was to the right.
We can also look at this by comparing z-scores. The z-score for 0.05 on a one-tailed test is 1.65. That would be 1.65 standard deviations from the null hypothesis population proportion, which was 0.50. Our calculated z sub p, though, was 1.79, or 1.79 standard deviations from the population proportion.
Again, we land in this region on the right. So we reject the null hypothesis. The
politician can breathe easy. Unless they demand a hypothesis test with a 2%
significance level. So here is where the 2% significance level would be on our
distribution. Here is where our p-value of 0.0367 would put us. If we wanted to
use our z-scores, the z-score for 0.02 on a one-tailed test is 2.06.
That would be 2.06 standard deviations from the null hypothesis population
proportion which was 0.50. Our calculated z sub p though was 1.79. No matter
the method, we are now in the do-not-reject-the-null-hypothesis area. In this case, the hypothesis test tells us we cannot reject the hypothesis that the
candidate will get 50% or less of the vote.
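Here is the whole poll calculation as a sketch in Python, reproducing the z-score of 1.79 and the p-value of about 0.037, and checking it against both significance levels.

from math import sqrt
from scipy.stats import norm

p_hat, p0, n = 0.54, 0.50, 500   # sample proportion, null proportion, sample size

# One-proportion z-test statistic
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 1 - norm.cdf(z)        # one-tailed: rejection region on the right

print(f"z = {z:.2f}, p-value = {p_value:.4f}")   # z = 1.79, p ~ 0.037

for alpha in (0.05, 0.02):
    verdict = "reject" if p_value < alpha else "do not reject"
    print(f"alpha = {alpha}: {verdict} the null hypothesis")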
- K-Nosh is a national gourmet dog food company. They sell thousands of bags
of dog food each day. They sell dog food in eight, 20, and 40-pound bags. And
the 20-pound bag is by far the most popular size. K-Nosh's high-end customers
demand outstanding products and excellent service. Customers don't want a bag
with less than 20 pounds. So while the bag is labeled as 20 pounds, K-Nosh sets
the desired weight of each bag at 20.15 pounds to ensure customers get at least
20 pounds in each bag.
Each day, K-Nosh employees pull a random sample of 100 bags out of the
thousands they ship. Based on the 100-bag sample, they will either send out the
shipment or they will reject the shipment for that day. Today's sample had an
average weight of 20.10 pounds, and the population standard deviation is 0.26
pounds. So let's start our four-step process.
Step one, develop the hypotheses and state the significance level. So, let's
develop our hypotheses. Our null hypothesis, H sub zero or H-naught, mu is
greater than or equal to 20.15 pounds. This hypothesis states that the bags of
dog food are equal to or greater than 20.15 pounds. It's what we would consider
the standard state. Our alternative hypothesis, H sub a, this one says the bags of
dog food weigh less than 20.15 pounds.
This would be the opposite of the null hypothesis. Our alternative hypothesis is
mu is less than 20.15 pounds. As usual, we will see whether or not we will reject
the null hypothesis. If we reject the null hypothesis, that would mean K-Nosh
would not make any shipments of 20-pound bags on that date. Our significance
level for this test will be five percent. If this has less than a five percent chance of
occurring, then we reject our null hypothesis.
We're looking at a one-tailed test where the rejection region is on the left-hand
side of our distribution. If our sample falls in the rejection region, we will reject
the entire shipment. So now we move on to step two, identify the test statistic. In
this situation, our test statistic is a z-score. This z-score will establish the point on
our distribution which divides the "do not reject" area from the rejection area.
The formula is z equals x-bar minus mu, divided by sigma over the square root of n, where x-bar is the sample mean, mu is the mean from our null hypothesis, n is our sample size, and sigma is the population standard deviation. In our case, x-bar was equal to 20.10 pounds, mu was 20.15 pounds, n, our sample size, was 100, and sigma, the population standard deviation, was 0.26 pounds.
Plugging in these numbers gives a z-score of -1.92. In step three, we find the p-value for this test statistic: looking up -1.92 on the z-score chart gives 0.0274. Step four: we're going to compare our p-value to our fixed significance level. Our
fixed significance level alpha was five percent or 0.05. Our p-value was
0.0274. That's smaller than our 0.05 significance level. Thus, we have to reject the
null hypothesis. Graphically, we can look at this a few ways.
Here's our distribution. The right side of the distribution is our "do not reject the null hypothesis" area. The left tail of the distribution is the "reject the null hypothesis" area. Our alpha was 0.05, which means that 95% of the distribution was to the right of the critical value and five percent was to the left. Remember, we want to be close to our goal of 20.15 pounds. If we're too far to the left of 20.15 pounds, the bags are likely too light.
If we compare z-scores, the z-score for 0.05 on a one-tailed test is -1.65. That would be 1.65 standard deviations below the null hypothesis mean, 20.15. Our calculated z, though, was -1.92, or 1.92 standard deviations below the mean. No matter how you look at it, we must reject the null hypothesis.
The bags are too light. And so, we must reject the entire shipment. I'm guessing
some of you might see this as harsh. But believe it or not, this quality control
technique, which is called acceptance sampling, was very popular in the past and is
still used in some industries today.
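For those who want to reproduce the K-Nosh numbers, here is a sketch. Note the transcript's p-value of 0.0274 comes from rounding z to -1.92 before using the table; computing directly gives roughly the same answer.

from math import sqrt
from scipy.stats import norm

xbar, mu0 = 20.10, 20.15   # sample mean and null hypothesis mean, in pounds
sigma, n = 0.26, 100       # population standard deviation and sample size

z = (xbar - mu0) / (sigma / sqrt(n))
p_value = norm.cdf(z)      # one-tailed: rejection region is on the left

print(f"z = {z:.2f}, p-value = {p_value:.4f}")   # z = -1.92, p ~ 0.027
if p_value < 0.05:
    print("Reject the null hypothesis: the bags run light; reject the shipment.")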
- In our hypothesis tests, we've always set up a null hypothesis and an alternative
hypothesis. The null hypothesis typically assumes that the status quo prevails. The
null hypothesis might state that the system works, it might tell us that nothing
has changed in our system. Our alternative hypothesis assumes the opposite. The
alternative hypothesis might tell us that the system is broken. It might tell us that
things have changed. Let's use a special type of cancer screening test as an
example.
This fictional screening would provide a reading based on your blood. The
average reading is 100. People that get a reading over 125 get a positive
result. This would indicate they have cancer. If we were going to equate this to a
hypothesis test, we would say the cancer screening had two hypotheses. The null
hypothesis would be that everything is okay. The person being tested does not
have cancer. The alternative hypothesis would state that the person being tested
does, in fact, have cancer.
Let's say the screening readings for healthy patients are normally distributed. So, if we were going to look at this on a normal distribution, we might say that 100 is the mean. Anything to the right of 125 would be considered a positive result for cancer. So, left of 125, we do not reject the null hypothesis, but to the right, we would reject the null hypothesis. Up until now, we've assumed that if you are beyond 125, the patient has cancer. But remember, even if 125 represented an alpha of 0.02, or 2%, it would only mean that it is extremely unlikely that someone with a reading over 125 is cancer-free.
It's unlikely, but with an alpha of 2%, it's not impossible. Just as political polls
sometimes predict the wrong candidate to win, cancer screening tests also make
mistakes. But there are two types of mistakes or errors. Let's look at this small
grid. At the top, we see the true state of the system. The patient does not have
cancer, which agrees with our null hypothesis.
And the patient has cancer, this would agree with our alternative
hypothesis. Along the side, we have the two possible outcomes of the test. The
test comes back positive, which means that according to the test, they have
cancer. This is the equivalent of rejecting our null hypothesis. How about the
second outcome for our screening test? The test comes back negative, which
means that according to the test, they do not have cancer.
This is the equivalent of not rejecting our null hypothesis. Now, let's look at the
possible results. If we get a negative test and the patient does not have
cancer, the hypothesis test worked. If we get a positive test and the patient
actually has cancer, the hypothesis test worked. But how about these other two
quadrants? It's possible a person might get a positive test but not actually have
cancer.
This is what we would call a Type One Error. Typically, we refer to this as a false positive. This is the same as a person getting a reading over 125 but not actually having cancer. If we start to see lots of Type One Errors, perhaps our screening test is too sensitive, flagging healthy patients. You might start to question if there are better ways of testing the null hypothesis. The opposite is also possible.
A person might get a negative test even though they do have cancer. This is what is called a Type Two Error. This might also be referred to as a false negative. This would be the same as a person getting a reading under 125 even though they have cancer. If we start to see lots of Type Two Errors, that may mean our screening test is not sensitive enough. Again, we may need to question how we are testing our null hypothesis. Hypothesis tests, even when they are done the right way, can be flawed.
So, it's important to understand that a hypothesis test might make a mistake. And knowing the different types of errors, Type One and Type Two, can help you in developing and interpreting your hypothesis tests and the subsequent results.
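To make the two error types concrete, here is a small simulation sketch. The reading distributions below are invented: the healthy spread is chosen so that roughly 2% of cancer-free patients exceed the 125 threshold, matching the alpha above, and the cancer distribution is entirely hypothetical.

import numpy as np

rng = np.random.default_rng(0)
threshold = 125                            # readings above this count as positive

# HYPOTHETICAL distributions, for illustration only
healthy = rng.normal(100, 12.2, 100_000)   # spread chosen so ~2% exceed 125
cancer = rng.normal(140, 12.2, 100_000)    # invented mean for patients with cancer

type_one = np.mean(healthy > threshold)    # false positives: healthy but flagged
type_two = np.mean(cancer <= threshold)    # false negatives: cancer but cleared

print(f"Type One (false positive) rate: {type_one:.3f}")   # ~0.02
print(f"Type Two (false negative) rate: {type_two:.3f}")

Notice the trade-off: lowering the 125 threshold reduces Type Two Errors but produces more Type One Errors, and vice versa.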
Conclusion
Next steps
- Congratulations, you made it. You survived sampling and sample size. You
created confidence intervals, and you performed some very basic hypothesis
tests. If you weren't already surprised, interested, or perhaps skeptical of the statistics that you encounter at work and in the media, you're probably now hypersensitive to any statistics put in front of you. You might even ask
probing questions about sampling methods. Perhaps you get excited when you
see poll results listed with their margins of error.
And you're probably also keenly aware when people misuse statistics, or when
they present data that is very likely unreliable. Statistics Fundamentals Part 1
provided a really nice foundation that allowed you to interpret data sets and
calculate basic probabilities. And now Statistics Fundamentals Part 2 has given
you the ability to collect reliable data, to establish confidence intervals, and to test hypotheses. Again, congratulations.