IV AI-DS AD3491 FDSA Unit3

UNIT III INFERENTIAL STATISTICS

Populations – samples – random sampling – Sampling distribution- standard error of the mean -
Hypothesis testing – z-test – z-test procedure –decision rule – calculations – decisions –
interpretations - one-tailed and two-tailed tests – Estimation – point estimate – confidence
interval – level of confidence – effect of sample size.

 3.1 POPULATIONS

In statistics, as well as in quantitative methodology, data are collected from a statistical
population with the help of defined procedures. There are two different types of data sets,
namely population and sample. When we calculate the mean deviation, variance, and standard
deviation, it is necessary to know whether we are referring to the entire population or only to
sample data. The population size is usually denoted by N and the sample size by n; in the sample
formulas given later, the divisor n − 1 is used instead of n. Let us take a look at population data
sets and sample data sets in detail.

Population
A population includes all the elements of a data set, and measurable characteristics of the
population, such as the mean and standard deviation, are known as parameters. For example, all
people living in India constitute the population of India.
There are different types of population. They are:
Finite Population
Infinite Population
Existent Population
Hypothetical Population
Let us discuss all the types one by one.
Finite Population
The finite population is also known as a countable population in which the population can be
counted. In other words, it is defined as the population of all the individuals or objects that are
finite. For statistical analysis, the finite population is more advantageous than the infinite
population. Examples of finite populations are the employees of a company or the potential
consumers in a market.
Infinite Population
The infinite population is also known as an uncountable population in which the counting of
units in the population is not possible. An example of an infinite population is the number of
germs in a patient's body, which is uncountable.
Existent Population
The existent population is defined as the population of concrete individuals. In other words, a
population whose units are available in solid form is known as an existent population. Examples
are books, students, etc.
Hypothetical Population
A population whose units are not available in solid form is known as a hypothetical population.
A population consists of sets of observations, objects, etc., that all have something in common.
In some situations, the populations are only hypothetical. Examples are the outcome of rolling a
die or the outcome of tossing a coin.
Sample
A sample includes one or more observations drawn from the population, and a measurable
characteristic of a sample is called a statistic. Sampling is the process of selecting a sample from
the population. For example, a group of people living in India is a sample of the population of India.
Basically, there are two types of sampling. They are:

Probability sampling
Non-probability sampling

Probability Sampling
In probability sampling, the population units are not selected at the discretion of the researcher.
Instead, selection follows procedures that ensure every unit of the population has a fixed
probability of being included in the sample. Such a method is also called random sampling.
Some of the techniques used for probability sampling are:

Simple random sampling
Cluster sampling
Stratified sampling
Disproportionate sampling
Proportionate sampling
Optimum allocation stratified sampling
Multi-stage sampling
Non Probability Sampling
In non-probability sampling, the population units can be selected at the discretion of the
researcher. Such samples use human judgement for selecting units and have no theoretical basis
for estimating the characteristics of the population. Some of the techniques used for
non-probability sampling are:

Quota sampling
Judgement sampling
Purposive sampling
Population and Sample Examples
All the people who have ID proofs form the population, and the group of people who have only a
voter ID with them is the sample.
All the students in a class are the population, whereas the top 10 students in the class are the
sample.
All the members of parliament form the population, and the female candidates among them are
the sample.
Population and Sample Formulas
Listed below are the formulas for the mean absolute deviation (MAD), variance, and standard
deviation for a population of size N with mean μ and for a sample of size n with mean x̄. Note
that the sample variance and standard deviation divide by n − 1 rather than n.

Population MAD = Σ|X − μ| / N; Sample MAD = Σ|x − x̄| / n
Population variance: σ² = Σ(X − μ)² / N; Sample variance: s² = Σ(x − x̄)² / (n − 1)
Population standard deviation: σ = √(Σ(X − μ)² / N); Sample standard deviation: s = √(Σ(x − x̄)² / (n − 1))
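To illustrate the distinction, here is a small sketch (assuming NumPy is available; the data values are made up) that computes both the population and the sample versions of the variance and standard deviation:

import numpy as np

data = np.array([4, 8, 6, 5, 3, 7])

# Population formulas divide by N (ddof=0, NumPy's default)
population_variance = np.var(data, ddof=0)
population_std = np.std(data, ddof=0)

# Sample formulas divide by n - 1 (ddof=1)
sample_variance = np.var(data, ddof=1)
sample_std = np.std(data, ddof=1)

print("Population variance:", population_variance, "Population SD:", population_std)
print("Sample variance:", sample_variance, "Sample SD:", sample_std)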
 3.2 SAMPLES
1. What is the Two-Sample t-test? The two-sample t-test is a hypothesis test that compares the
means of two independent groups to determine if they are statistically different. It is
specifically used when the data follow a normal distribution, and the variances of the two
groups are assumed to be equal.
2. Null and Alternative Hypotheses: Before conducting a two-sample t-test, we must establish
the null hypothesis (H0) and the alternative hypothesis (Ha). The null hypothesis assumes
that there is no significant difference between the means of the two groups, while the
alternative hypothesis suggests the presence of a significant difference.
3. Assumptions of the Two-Sample t-test: To ensure accurate results, the two-sample t-test
relies on several assumptions: a) The data in each group are independent and randomly
sampled. b) The data in each group follow a normal distribution. c) The variances of the two
groups are equal.
4. Calculating the Test Statistic: The test statistic for the two-sample t-test is calculated using
the formula:
t = (x1 − x2) / √((s1² / n1) + (s2² / n2))
where x1 and x2 are the sample means, s1 and s2 are the sample standard deviations, and n1 and
n2 are the sample sizes of the two groups.
Degrees of Freedom and Critical Value: The degrees of freedom (df) for the two-sample t-test is
calculated using the formula:
df = n1 + n2 − 2
The critical value is determined based on the desired significance level (e.g., 0.05) and the
degrees of freedom. If the test statistic exceeds the critical value, we reject the null hypothesis.
Interpreting the Results: If the p-value associated with the two-sample t-test is less than the
chosen significance level, typically 0.05, we reject the null hypothesis and conclude that there is
a significant difference between the means of the two groups. Conversely, if the p-value is
greater than the significance level, we fail to reject the null hypothesis, indicating no significant
difference.
Practical Applications in Data Science: The two-sample t-test finds applications in various
data science scenarios, including:
a) A/B testing: Comparing the performance of two different versions of a website or
application.
b) Market research: Analyzing the preferences of two different customer segments.
c) Medical research: Comparing the effectiveness of two treatment groups.
d) Quality control: Assessing if changes in manufacturing processes lead to significant
differences in product quality.
Python Implementation: Here’s an example implementation of the two-sample t-test in Python
using the SciPy library:
import scipy.stats as stats

# Sample data for two groups
group1 = [15, 18, 22, 20, 25]
group2 = [10, 12, 14, 16, 18]

# Perform the two-sample t-test
t_statistic, p_value = stats.ttest_ind(group1, group2)

# Print the results
print("T-Statistic:", t_statistic)
print("P-Value:", p_value)

In this example, we have two groups, group1 and group2, with sample data representing some
metric of interest. The ttest_ind() function from the SciPy library is used to calculate the two-
sample t-test. It takes the two groups as input and returns the t-statistic and p-value.
The t-statistic measures the difference between the means of the two groups relative to the
variation within each group. A larger absolute t-statistic suggests a larger difference between
the group means. The p-value indicates the probability of obtaining a difference as extreme
as observed (or more extreme) if the null hypothesis is true. A smaller p-value indicates
stronger evidence against the null hypothesis.
By executing this code, you will obtain the t-statistic and p-value, which you can interpret to
draw conclusions about the significance of the difference between the means of the two
groups.
Conclusion: The two-sample t-test is a fundamental statistical tool in data science that allows
us to compare the means of two independent groups. By following the correct procedures
and interpreting the results appropriately, data scientists can make informed decisions based
on statistically significant differences.
Understanding the applications and limitations of the two-sample t-test empowers data
scientists to draw reliable conclusions from their analyses and contribute to evidence-based
decision-making processes. The Python implementation using the SciPy library provides a
practical way to perform the two-sample t-test and obtain the necessary statistics for further
analysis.
In summary, the two-sample t-test serves as a valuable tool for data scientists in various
domains, enabling them to gain insights and make data-driven decisions confidently.

 3.3 RANDOM SAMPLING – SAMPLING DISTRIBUTION

Random sampling, or probability sampling, is a sampling method that allows for the
randomization of sample selection, i.e., each sample has the same probability as other samples to
be selected to serve as a representation of an entire population.

Types of Random Sampling Methods


There are four primary, random (probability) sampling methods. These methods are:

1. Simple random sampling


Simple random sampling is the randomized selection of a small segment of individuals or
members from a whole population. It provides each individual or member of a population with
an equal and fair probability of being chosen. The simple random sampling method is one of the
most convenient and simple sample selection techniques.

2. Systematic sampling
Systematic sampling is the selection of specific individuals or members from an entire
population. The selection often follows a predetermined interval (k). The systematic sampling
method is comparable to the simple random sampling method; however, it is less complicated to
conduct.

3. Stratified sampling
Stratified sampling involves the partitioning of a population into subclasses with notable
distinctions and variances. The stratified sampling method is useful, as it allows the researcher to
make more reliable and informed conclusions by confirming that each respective subclass has
been adequately represented in the selected sample.

4. Cluster sampling
Cluster sampling, like the stratified sampling method, involves dividing a population into
subclasses. Each of the subclasses should portray comparable characteristics to the entire
selected sample. This method entails the random selection of a whole subclass, as opposed to
the sampling of members from each subclass. It is ideal for studies that involve widely
dispersed populations.

A simple random sample is a randomly selected subset of a population. In this sampling method,
each member of the population has an exactly equal chance of being selected.

This method is the most straightforward of all the probability sampling methods, since it only
involves a single random selection and requires little advance knowledge about the population.
Because it uses randomization, any research performed on this sample should have high internal
and external validity, and be at a lower risk for research biases like sampling bias and selection
bias.
When to use simple random sampling
Simple random sampling is used to make statistical inferences about a population. It helps ensure
high internal validity: randomization is the best method to reduce the impact of potential
confounding variables.
In addition, with a large enough sample size, a simple random sample has high external validity:
it represents the characteristics of the larger population.

However, simple random sampling can be challenging to implement in practice. To use this
method, there are some prerequisites:
You have a complete list of every member of the population.
You can contact or access each member of the population if they are selected.
You have the time and resources to collect data from the necessary sample size.
Simple random sampling works best if you have a lot of time and resources to conduct your
study, or if you are studying a limited population that can easily be sampled.
In some cases, it might be more appropriate to use a different type of probability sampling:

Systematic sampling involves choosing your sample based on a regular interval, rather than a
fully random selection. It can also be used when you don’t have a complete list of the
population. Stratified sampling is appropriate when you want to ensure that specific
characteristics are proportionally represented in the sample. You split your population into strata
(for example, divided by gender or race), and then randomly select from each of these
subgroups.
Cluster sampling is appropriate when you are unable to sample from the entire population. You
divide the sample into clusters that approximately reflect the whole population, and then choose
your sample from a random selection of these clusters.

How to perform simple random sampling


There are 4 key steps to select a simple random sample.

Step 1: Define the population


Start by deciding on the population that you want to study.

It’s important to ensure that you have access to every individual member of the population, so
that you can collect data from all those who are selected for the sample.

Example: Population
In the American Community Survey, the population is all 128 million households who live in the
United States (including households made up of citizens and non-citizens alike).
Step 2: Decide on the sample size
Next, you need to decide how large your sample size will be. Although larger samples provide
more statistical certainty, they also cost more and require far more work.

There are several potential ways to decide upon the size of your sample, but one of the simplest
involves using a formula with your desired confidence interval and confidence level, estimated
size of the population you are working with, and the standard deviation of whatever you want to
measure in your population.

The most commonly used margin of error and confidence level are 0.05 and 0.95, respectively. Since you
may not know the standard deviation of the population you are studying, you should choose a
number high enough to account for a variety of possibilities (such as 0.5).

You can then use a sample size calculator to estimate the necessary sample size.

Example: Sample size
The ACS follows 3.5 million households each year. This is a small fraction of the overall
population of 128 million households, but it is a large enough sample size to gather detailed data
on all geographical regions and demographic groups in the United States, including those usually
underrepresented in surveys.
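One simple way to carry out this calculation is Cochran's sample-size formula with a finite population correction. The exact formula is not specified in the text, so treat the sketch below as one common choice; it uses the values mentioned above: 95% confidence, a 0.05 margin of error, 0.5 as a conservative variability estimate, and a population of 128 million households.

import math

z = 1.96           # z-score for a 95% confidence level
e = 0.05           # desired margin of error
p = 0.5            # conservative estimate of population variability
N = 128_000_000    # estimated population size (households)

n0 = (z ** 2) * p * (1 - p) / (e ** 2)   # sample size for an infinite population
n = n0 / (1 + (n0 - 1) / N)              # finite population correction

print("Required sample size:", math.ceil(n))   # about 385 for these inputs

For a population this large the correction barely matters; the ACS samples far more households than this minimum because it needs detailed estimates for small subgroups and regions.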
Step 3: Randomly select your sample
This can be done in one of two ways: the lottery or random number method.

In the lottery method, you choose the sample at random by “drawing from a hat” or by using a
computer program that will simulate the same action.

In the random number method, you assign every individual a number. By using a random
number generator or random number tables, you then randomly pick a subset of the population.
You can also use the random number function (RAND) in Microsoft Excel to generate random
numbers.
Example: Random selection
The Census Bureau randomly selects addresses of 295,000 households monthly (or 3.5 million
per year). Each address has approximately a 1-in-480 chance of being selected.
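As a small illustration of the random number method, the following sketch (the sampling frame and sample size are made-up placeholders) uses Python's standard library to draw a simple random sample without replacement:

import random

# Hypothetical sampling frame: one identifier per member of the population
population = ["household_{}".format(i) for i in range(1, 10001)]

random.seed(42)                              # fix the seed so the draw is reproducible
sample = random.sample(population, k=100)    # select 100 members at random

print(sample[:5])                            # first few selected members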
Step 4: Collect data from your sample
Finally, you should collect data from your sample.

To ensure the validity of your findings, you need to make sure every individual selected actually
participates in your study. If some drop out or do not participate for reasons associated with the
question that you’re studying, this could bias your findings.

For example, if young participants are systematically less likely to participate in your study, your
findings might not be valid due to the underrepresentation of this group.

Example: Data collection


The Census Bureau first sends a letter to ask the respondents to fill the survey out online. If
occupants of an address do not respond, the Bureau calls the home telephone number. If all else
fails, a representative visits the address in person.
Through this variety of methods, the officials collecting data for the ACS manage to receive
responses from 95% of those randomly selected, a high response rate that supports the validity of
their results.

Probability (Random) Sampling vs. Non-Probability Sampling


Probability – or random sampling – is the random selection of sample participants to derive
conclusions and assumptions about an entire population. On the other hand, non-probability
sampling is the selection of sample participants based on specified criteria or suitability.

SAMPLING DISTRIBUTION

A sampling distribution is a concept used in statistics. It is a probability distribution of a statistic
obtained from a larger number of samples drawn from a specific population. The sampling
distribution of a given population is the distribution of frequencies of a range of different
outcomes that could possibly occur for a statistic of a population. This allows entities like
governments and businesses to make more well-informed decisions based on the information
they gather. There are a few methods of sampling distribution used by researchers, including the
sampling distribution of a mean.

How Sampling Distributions Work


Data allows statisticians, researchers, marketers, analysts, and academics to make important
conclusions about specific topics and information. It can help businesses make decisions about
their future and boost their performance, or it can help governments plan for services needed by
a group of people.

A lot of data drawn and used are actually samples rather than populations. A sample is a subset
of a population. Put simply, a sample is a smaller part of a larger group. As such, this smaller
portion is meant to be representative of the population as a whole.
A sampling distribution describes the probability of obtaining the possible values of a statistic
when samples are repeatedly drawn from a population. This distribution depends on a few
different factors, including the sample size, the sampling process involved, and the population as
a whole. There are a few steps involved in constructing a sampling distribution. These include:

Choosing a random sample from the overall population
Determining a statistic from that group, which could be the standard deviation, median, or mean
Establishing a frequency distribution of each sample
Mapping out the distribution on a graph

Once the information is gathered, plotted, and analyzed, researchers can make inferences and
conclusions. This can help them make decisions about what to expect in the future. For instance,
governments may be able to invest in infrastructure projects based on the needs of a certain
community or a company may decide to proceed with a new business venture if the sampling
distribution suggests a positive outcome.

Special Considerations
The number of observations in a population, the number of observations in a sample, and the
procedure used to draw the sample sets determine the variability of a sampling distribution. The
standard deviation of a sampling distribution is called the standard error.

While the mean of a sampling distribution is equal to the mean of the population, the standard
error depends on the standard deviation of the population, the size of the population, and the size
of the sample.
Knowing how spread apart the mean of each of the sample sets are from each other and from the
population mean will give an indication of how close the sample mean is to the population mean.
The standard error of the sampling distribution decreases as the sample size increases.

Determining a Sampling Distribution


Let's say a medical researcher wants to compare the average weight of all babies born in North
America from 1995 to 2005 to those from South America within the same time period. Since
they cannot draw the data for the entire population within a reasonable amount of time, they
would only use 100 babies in each continent to make a conclusion. The data used is the sample
and the average weight calculated is the sample mean.

Now suppose they take repeated random samples from the general population and compute the
sample mean for each sample group instead. So, for North America, they pull data for 100
newborn weights recorded in the U.S., Canada, and Mexico as follows:

Four sets of 100 newborn weights from select hospitals in the U.S.
Five sets of 70 newborn weights from Canada
Three sets of 150 newborn-weight records from Mexico
The researcher ends up with a total of 1,200 weights of newborn babies grouped in 12 sets. They
also collect sample data of 100 birth weights from each of the 12 countries in South America.
The average weight computed for each sample set is the sampling distribution of the mean. Not
just the mean can be calculated from a sample. Other statistics, such as the standard deviation,
variance, proportion, and range can be calculated from sample data. The standard deviation and
variance measure the variability of the sampling distribution.
Types of Sampling Distributions
Here is a brief description of the types of sampling distributions:

Sampling Distribution of the Mean: This method shows a normal distribution where the middle
is the mean of the sampling distribution. As such, it represents the mean of the overall
population. In order to get to this point, the researcher must figure out the mean of each sample
group and map out the individual data.
Sampling Distribution of Proportion: This method involves choosing a sample set from the
overall population to get the proportion of the sample. The mean of the proportions ends up
becoming the proportions of the larger group.
T-Distribution: This type of sampling distribution is common in cases of small sample sizes. It
may also be used when there is very little information about the entire population. T-distributions
are used to make estimates about the mean and other statistical points.

Plotting Sampling Distributions


A population or a single sample set of numbers will often have an approximately normal
distribution. However, because a sampling distribution includes multiple sets of observations, it
will not necessarily have a bell-curved shape.

Following our example, the population average weight of babies in North America and in South
America has a normal distribution because some babies will be underweight (below the mean) or
overweight (above the mean), with most babies falling in between (around the mean). If the
average weight of newborns in North America is seven pounds, the sample mean weight in each
of the 12 sets of sample observations recorded for North America will be close to seven pounds
as well.

But if you graph each of the averages calculated in each of the 1,200 sample groups, the resulting
shape may result in a uniform distribution, but it is difficult to predict with certainty what the
actual shape will turn out to be. The more samples the researcher uses from the population of
over a million weight figures, the more the graph will start forming a normal distribution.

Why Is Sampling Used to Gather Population Data?


Sampling is a way to gather and analyze information about a larger group. It is done because
researchers aren't able to study entire populations due to the sheer volume of subjects involved.
As such, not everyone in the larger group can be included as it may take too long to study and
analyze the data. It allows entities like governments and businesses to make important decisions
about the future, whether that means investing in an infrastructure project, social service
program, or new product.

Why Are Sampling Distributions Used?


Sampling distributions are used in statistics and research. They highlight the chance or
probability of an event that may take place. This is based on a set of data that is gathered from a
small group within a larger population.

What Is a Mean?
A mean is a metric used in statistics and research. It is the average of a set of at least two
numbers. The arithmetic mean is determined by adding up all the numbers and dividing the result
by the count of numbers in that set. The geometric mean is determined by multiplying the values
of a data set and taking the nth root of the product, where n is the number of values in that data
set.

The Bottom Line


Researchers aren't able to make conclusions about very large groups because of the number of
subjects involved. That's why they use sampling. Sampling allows them to take a small group
from a large population and analyze data. Once that data is collected, researchers can plot out
sampling distributions, which allow them to determine whether an event may take place within a
certain population. This may include business growth or population trends, which can help
businesses, governments, and other entities make better decisions for the future.

A sampling distribution depends on three primary factors: the sample size (n), the population as a
whole (N), and the sampling process. So how does it work?
Choose a random sample from the given population.
Calculate a statistic from that group, such as the standard deviation, mean or median.
Construct a frequency distribution of each sample statistic.
Plot the frequency distribution of each sample statistic on a graph. The resulting graph is the
sampling distribution.
For example, if you randomly sample data three times and determine the mean, or the average, of
each sample, all three means are likely to be different and fall somewhere along the graph. That's
variability. You do that many times, and eventually the data you plot may look like a bell curve.
That process is a sampling distribution.
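To make this concrete, here is a minimal simulation sketch (assuming NumPy and Matplotlib are installed; the population values are synthetic) that repeatedly draws samples, records each sample mean, and plots the resulting sampling distribution of the mean:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Synthetic population of 1,000,000 values (deliberately skewed, not normal)
population = rng.exponential(scale=7.0, size=1_000_000)

# Repeatedly draw random samples and record each sample mean
sample_size = 100
num_samples = 1_000
sample_means = [rng.choice(population, size=sample_size).mean()
                for _ in range(num_samples)]

# The histogram of the sample means is the simulated sampling distribution of the mean
plt.hist(sample_means, bins=40)
plt.xlabel("Sample mean")
plt.ylabel("Frequency")
plt.title("Sampling distribution of the mean (n = 100)")
plt.show()

Even though the underlying population here is skewed, the plotted distribution of sample means is close to a bell curve, which is exactly the behaviour described above as the number of samples grows.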

Example of a sampling distribution


Here is an example of a sampling distribution using a fictional scenario with a data set and a
graph:
A professor is interested in understanding the sampling distribution of their students' test scores.
This professor thinks this may help determine a suitable curve for the previous tests their
students completed. The professor recorded test scores from the previous three tests and created
a data table and a sampling distribution graph.
 3.4 STANDARD ERROR OF THE MEAN

The standard error of the mean, or simply standard error, indicates how different the
population mean is likely to be from a sample mean. It tells you how much the sample mean
would vary if you were to repeat a study using new samples from within a single population.

The standard error of the mean (SE or SEM) is the most commonly reported type of standard
error. But you can also find the standard error for other statistics, like medians or proportions.
The standard error is a common measure of sampling error—the difference between a
population parameter and a sample statistic.

In statistics, data from samples is used to understand larger populations. Standard error
matters because it helps you estimate how well your sample data represents the whole
population.

With probability sampling, where elements of a sample are randomly selected, you can
collect data that is likely to be representative of the population. However, even with
probability samples, some sampling error will remain. That’s because a sample will never
perfectly match the population it comes from in terms of measures like means and standard
deviations.

By calculating standard error, you can estimate how representative your sample is of your
population and make valid conclusions.

A high standard error shows that sample means are widely spread around the population
mean—your sample may not closely represent your population. A low standard error shows
that sample means are closely distributed around the population mean—your sample is
representative of your population.

You can decrease standard error by increasing sample size. Using a large, random sample is
the best way to minimize sampling bias.

Standard error vs standard deviation


Standard error and standard deviation are both measures of variability:

The standard deviation describes variability within a single sample.


The standard error estimates the variability across multiple samples of a population.

The standard deviation is a descriptive statistic that can be calculated from sample data. In
contrast, the standard error is an inferential statistic that can only be estimated (unless the
real population parameter is known).

Example: Standard error vs standard deviation


In a random sample of 200 students, the mean math SAT score is 550. In this case, the
sample is the 200 students, while the population is all test takers in the region.
The standard deviation of the math scores is 180. This number reflects on average how much
each score differs from the sample mean score of 550.

The standard error of the math scores, on the other hand, tells you how much the sample
mean score of 550 differs from other sample mean scores, in samples of equal size, in the
population of all test takers in the region.

Standard error formula


The standard error of the mean is calculated using the standard deviation and the sample size.

From the formula, you’ll see that the sample size is inversely proportional to the standard
error. This means that the larger the sample, the smaller the standard error, because the
sample statistic will be closer to approaching the population parameter.

Different formulas are used depending on whether the population standard deviation is
known. These formulas work for samples with more than 20 elements (n > 20).

When population parameters are known


When the population standard deviation is known, you can use it in the below formula to
calculate standard error precisely.

Formula: SE = σ / √n

SE is standard error
σ (sigma) is the population standard deviation
n is the number of elements in the sample
When population parameters are unknown
When the population standard deviation is unknown, you can use the below formula to only
estimate standard error. This formula takes the sample standard deviation as a point estimate
for the population standard deviation.

Formula: SE = s / √n

SE is standard error
s is the sample standard deviation
n is the number of elements in the sample
Example: Using the standard error formula
To estimate the standard error for math SAT scores, you follow two steps.
First, find the square root of your sample size (n).

Next, divide the sample standard deviation by the number you found in step one.

Calculation: √200 ≈ 14.1, so SE = 180 / 14.1 ≈ 12.8

The standard error of the math SAT scores is approximately 12.8.

How should you report the standard error?


You can report the standard error alongside the mean or in a confidence interval to
communicate the uncertainty around the mean.

Example: Reporting the mean and standard error


The mean math SAT score of a random sample of test takers is 550 ± 12.8 (SE).
The best way to report the standard error is in a confidence interval because readers
won’t have to do any additional math to come up with a meaningful interval.

A confidence interval is a range of values where an unknown population parameter is
expected to lie most of the time, if you were to repeat your study with new random samples.

With a 95% confidence level, 95% of all sample means will be expected to lie within a
confidence interval of ± 1.96 standard errors of the sample mean.

Based on random sampling, the true population parameter is also estimated to lie within this
range with 95% confidence.

Example: Constructing a 95% confidence interval


You construct a 95% confidence interval (CI) to estimate the population mean math SAT
score.
For a normally distributed characteristic, like SAT scores, 95% of all sample means fall
within roughly two standard errors (± 1.96 SE) of the population mean.
Confidence interval formula
CI = x̄ ± (1.96 × SE)

x̄ = sample mean = 550


SE = standard error = 12.8

Lower limit: x̄ − (1.96 × SE) = 550 − (1.96 × 12.8) = 525

Upper limit: x̄ + (1.96 × SE) = 550 + (1.96 × 12.8) = 575

With random sampling, a 95% CI [525, 575] tells you that there is a 0.95 probability that the
population mean math SAT score is between 525 and 575.
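These calculations can be reproduced with a short script; the sketch below uses the figures quoted above (s = 180, n = 200, x̄ = 550). Note that without the intermediate rounding used in the worked example, the standard error comes out to about 12.7 rather than 12.8.

import math

s = 180       # sample standard deviation
n = 200       # sample size
mean = 550    # sample mean

se = s / math.sqrt(n)        # standard error of the mean (about 12.7)
lower = mean - 1.96 * se     # lower limit of the 95% CI (about 525)
upper = mean + 1.96 * se     # upper limit of the 95% CI (about 575)

print("Standard error:", round(se, 1))
print("95% CI:", (round(lower), round(upper)))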

 3.5 HYPOTHESIS TESTING

Hypothesis testing is the detective work of statistics, where evidence is scrutinized to determine
the truth behind claims. From unraveling mysteries in science to guiding decisions in business,
this method empowers researchers to make sense of data and draw reliable conclusions. In this
article, we’ll explore the fascinating world of hypothesis testing, uncovering its importance and
practical applications in data analytics.

In this comprehensive guide, we will be learning the theory and types of hypothesis testing.
Additionally, we will be taking sample problem statements and solving them step-by-step using
hypothesis testing. We will be using Python as the programming language.

Hypothesis Testing in data science


Learning Objectives
Understand what hypothesis testing is and when to use it.
Get familiar with various terminologies used in hypothesis testing.
Learn the steps of hypothesis testing and how to apply it to various problems.
Learn about decision rules and confusion matrix in hypothesis testing.
Differentiate between different types of hypothesis tests.

What is Hypothesis Testing and When Do We Use It?


Hypothesis testing is a statistical method used to evaluate a claim or hypothesis about a
population parameter based on sample data. It involves making decisions about the validity of a
statement, often referred to as the null hypothesis, by assessing the likelihood of observing the
sample data if the null hypothesis were true.
This process helps researchers determine whether there is enough evidence to support or reject
the null hypothesis, thereby drawing conclusions about the population of interest. In essence,
hypothesis testing provides a structured approach for making inferences and decisions in the face
of uncertainty, playing a crucial role in scientific research, data analysis, and decision-making
across various domains.

Hypothesis testing is a part of statistical analysis and machine learning, where we test the
assumptions made regarding a population parameter.

We use hypothesis testing in various scenarios, including:

Scientific research: Testing the effectiveness of a new drug, evaluating the impact of a treatment
on patient outcomes, or examining the relationship between variables in a study.
Quality control: Assessing whether a manufacturing process meets specified standards or
determining if a product’s performance meets expectations.
Business decision-making: Investigating the effectiveness of marketing strategies, analyzing
customer preferences, or testing hypotheses about financial performance.
Social sciences: Studying the effects of interventions on societal outcomes, examining attitudes
and behaviors, or testing theories about human behavior.
Note: Don’t confuse the terms Parameter and Statistic.
A Parameter is a number that describes the data from the population, whereas a Statistic is a
number that describes the data from a sample.

Before moving any further, it is important to know the terminology used.

Terminology Used in Hypothesis Testing


In hypothesis testing, several key terms and concepts are commonly used to describe the process
and interpret results:

1. Null Hypothesis (H0): The null hypothesis is a statistical theory suggesting that no statistically
significant difference exists between the populations. It is denoted by H0 and read as H-naught.

2. Alternative Hypothesis (Ha or H1): The alternative hypothesis suggests that there is a significant
difference between the population parameters. The difference could be greater or smaller. Basically, it is the
contrast of the Null Hypothesis. It is denoted by Ha or H1.

Note: H0 must always contain equality (=). Ha always contains a difference (≠, >, <).

For example, if we were to test the equality of average means (µ) of two groups:
for a two-tailed test, we define H0: µ1 = µ2 and Ha: µ1≠µ2
for a one-tailed test, we define H0: µ1 = µ2 and Ha: µ1 > µ2 or Ha: µ1 < µ2

3. Test Statistic: It is the deciding factor in rejecting or accepting the Null Hypothesis, and the
statistic used depends on the test that we run. The four main test statistics are the Z-statistic
(Z-test), the t-statistic (t-test), the F-statistic (F-test/ANOVA), and the chi-square statistic
(chi-square test).
4. Significance Level (α): The significance level, often denoted by α (alpha), represents the
probability of rejecting the null hypothesis when it is actually true. Commonly used significance
levels include 0.05 and 0.01, indicating a 5% and 1% chance of Type I error, respectively.

5. P-value: It is the probability, assuming the Null Hypothesis is true, of obtaining a result at
least as extreme as the observed test statistic. It is denoted by the letter p.

6. Critical Value: Denoted by C, it is the value in the distribution beyond which the Null
Hypothesis is rejected. It is compared to the test statistic.

Now, assume we are running a two-tailed Z-Test at 95% confidence. Then, the level of significance
(α) = 5% = 0.05. Thus, we will have (1-α) = 0.95 proportion of data at the center, and α = 0.05
proportion will be equally shared to the two tails. Each tail will have (α/2) = 0.025 proportion of
data.

The critical value, Zα/2 = 1.96, is obtained from the Z-score table.
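The same critical value can also be looked up programmatically; a minimal sketch using SciPy (already used elsewhere in this unit):

from scipy.stats import norm

alpha = 0.05

# Two-tailed critical value: the point leaving alpha/2 = 0.025 in the upper tail
z_critical = norm.ppf(1 - alpha / 2)

print(round(z_critical, 2))   # 1.96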

Steps of Hypothesis Testing


The steps of hypothesis testing typically involve the following process:

Formulate Hypotheses: State the null hypothesis and the alternative hypothesis.
Choose Significance Level (α): Select a significance level (α), which determines the threshold for
rejecting the null hypothesis. Commonly used significance levels include 0.05 and 0.01.
Select Appropriate Test: Choose a statistical test based on the research question, type of data, and
assumptions. Common tests include t-tests, chi-square tests, ANOVA, correlation tests, and
regression analysis, among others.
Collect Data and Calculate Test Statistic: Collect relevant sample data and calculate the appropriate
test statistic based on the chosen statistical test.
Determine Critical Region: Define the critical region or rejection region based on the chosen
significance level and the distribution of the test statistic.
Calculate P-value: Determine the probability of observing a test statistic as extreme as, or more
extreme than, the one obtained from the sample data, assuming the null hypothesis is true. The p-
value is compared to the significance level to make decisions about the null hypothesis.
Make Decision: If the p-value is less than or equal to the significance level (p ≤ α), reject the null
hypothesis in favor of the alternative hypothesis. If the p-value is greater than the significance
level (p > α), fail to reject the null hypothesis.
Draw Conclusion: Interpret the results based on the decision made in step 7. Provide implications of
the findings in the context of the research question or problem.
Check Assumptions and Validate Results: Assess whether the assumptions of the chosen statistical
test are met. Validate the results by considering the reliability of the data and the appropriateness
of the statistical analysis.
By following these steps systematically, researchers can conduct hypothesis tests, evaluate the
evidence, and draw valid conclusions from their analyses.

Decision Rules
The two methods of concluding the Hypothesis test are using the Test-statistic value and p-value.
In both methods, we start assuming the Null Hypothesis to be true, and then we reject the Null
hypothesis if we find enough evidence.
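To make the steps and the p-value decision rule concrete, here is a minimal sketch (the sample values and the hypothesized mean of 50 are made-up for illustration) that runs a one-sample t-test and applies the decision rule at α = 0.05:

from scipy import stats

# Hypothetical sample data and hypothesized population mean
sample = [52, 48, 55, 51, 49, 53, 54, 50, 47, 56]
mu_0 = 50        # H0: the population mean equals 50
alpha = 0.05     # significance level

# One-sample, two-tailed t-test
t_stat, p_value = stats.ttest_1samp(sample, popmean=mu_0)

print("t-statistic:", round(t_stat, 3))
print("p-value:", round(p_value, 3))

if p_value <= alpha:
    print("Reject H0: the sample mean differs significantly from", mu_0)
else:
    print("Fail to reject H0: no significant difference from", mu_0)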

Confusion Matrix in Hypothesis Testing


To plot a confusion matrix, we can take actual values in columns and predicted values in rows or
vice versa.

Confidence: The probability of accepting a True Null Hypothesis. It is denoted as (1-α)

Power of test: The probability of rejecting a False Null Hypothesis i.e., the ability of the test to
detect a difference. It is denoted as (1-β) and its value lies between 0 and 1.

Type I error: Occurs when we reject a True Null Hypothesis and is denoted as α.

Type II error: Occurs when we accept a False Null Hypothesis and is denoted as β.

Accuracy: Number of correct predictions / Total number of cases

When dealing with continuous data, several common hypothesis tests are used, depending on the
research question and the characteristics of the data. Some of the most widely used hypothesis
tests for continuous data include:

One-Sample t-test: Used to compare the mean of a single sample to a known value or
hypothesized population mean.
Paired t-test: Compares the means of two related groups (e.g., before and after treatment) to
determine if there is a significant difference.
Independent Samples t-test: Compares the means of two independent groups to determine if
there is a significant difference between them.
Analysis of Variance (ANOVA): Used to compare means across three or more independent
groups to determine if there are any statistically significant differences.
Correlation Test (Pearson’s correlation coefficient): Determines if there is a linear relationship
between two continuous variables.
Regression Analysis: Evaluates the relationship between one dependent variable and one or more
independent variables.

When dealing with discrete data, several common hypothesis tests are used to analyze
differences between groups, associations, or proportions. Some of the most widely used
hypothesis tests for discrete data include:

Chi-Square Test of Independence: Determines whether there is a significant association between
two categorical variables by comparing observed frequencies to expected frequencies.
Chi-Square Goodness-of-Fit Test: Assesses whether the observed frequency distribution of a
single categorical variable differs significantly from a hypothesized or expected distribution.
Binomial Test: Determines whether the proportion of successes in a series of independent
Bernoulli trials differs significantly from a hypothesized value.
Poisson Test: Tests whether the observed counts of events in a fixed interval of time or space
follow a Poisson distribution, often used in count data analysis.
McNemar’s Test: Analyzes changes or differences in paired categorical data, typically used in
before-and-after studies or matched case-control studies.
Fisher’s Exact Test: Determines the significance of the association between two categorical
variables in small sample sizes when the assumptions of the chi-square test are not met.
These tests are valuable tools for analyzing categorical data, identifying relationships between
variables, and making inferences about populations based on sample data. The choice of test
depends on the research question, the nature of the data, and the study design.

Types of Errors in Hypothesis Testing


In hypothesis testing, there are two main types of errors:

Type I error (False Positive): This happens when one incorrectly rejects the null hypothesis,
indicating a significant result when no true effect or difference exists in the population being
studied.
Type II error (False Negative): This occurs when one fails to reject the null hypothesis despite
the presence of a true effect or difference in the population.
These errors represent the trade-off between making incorrect conclusions and the risk of
missing important findings in hypothesis testing.

 3.6 Z-TEST – Z-TEST PROCEDURE

The Z-test is a statistical hypothesis test used to determine whether the distribution of the test
statistic we are measuring, like the mean, is part of the normal distribution.

While there are multiple types of Z-tests, we’ll focus on the easiest and most well-known one,
the one-sample mean test. This is used to determine if the difference between the mean of a
sample and the mean of a population is statistically significant.

What Is a Z-Test?
A Z-test determines whether there are any statistically significant differences between the means
of two populations. A Z-test can only be applied if the standard deviation of each population is
known and a sample size of at least 30 data points is available.

The name Z-test comes from the Z-score of the normal distribution. This is a measure of how
many standard deviations away a raw score or sample statistic is from the population’s mean.
Z-tests are among the most common statistical tests conducted in fields such as healthcare and
data science, making them essential to understand.
Requirements for a Z-Test
In order to conduct a Z-test, your statistics need to meet a few requirements:
 A sample size that’s greater than 30. This is because we want to ensure our sample mean
comes from a distribution that is normal. As stated by the central limit theorem, any
distribution can be approximated as normally distributed if it contains more than 30 data
points.
 The standard deviation and mean of the population is known.
 The sample data is collected/acquired randomly.
Z-Test Steps
There are four steps to complete a Z-test. Let’s examine each one:
1. State the Null Hypothesis
The first step in a Z-test is to state the null hypothesis, H_0. This is what you believe to be true
from the population, which could be the mean of the population, μ_0:

H_0: μ = μ_0
2. State the Alternate Hypothesis


Next, state the alternate hypothesis, H_1. This is what you observe from your sample. If the
sample mean is different from the population’s mean, then we say the mean is not equal to μ_0:

H_1: μ ≠ μ_0
3. Choose Your Critical Value


Then, choose your critical value, α, which determines whether you accept or reject the null
hypothesis. Typically, for a Z-test we would use a statistical significance of 5 percent which is z
= +/- 1.96 standard deviations from the population’s mean in the normal distribution:

[Figure: Z-test critical value plot]

This critical value is based on confidence intervals.
4. Calculate Your Z-Test Statistic
Compute the Z-test statistic using the sample mean, μ_1, the population mean, μ_0, the number
of data points in the sample, n, and the population’s standard deviation, σ:

Z = (μ_1 − μ_0) / (σ / √n)
If the test statistic is greater (or lower, depending on the test we are conducting) than the critical
value, we reject the null hypothesis in favor of the alternate hypothesis, because the sample’s
mean is significantly different from the population mean.
Another way to think about this is if the sample mean is so far away from the population mean,
the alternate hypothesis has to be true or the sample is a complete anomaly.
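A minimal sketch of these four steps in Python (the sample values, the hypothesized mean μ_0 = 100, and the population standard deviation σ = 15 are made-up for illustration):

import numpy as np
from scipy.stats import norm

# Steps 1-2: hypotheses. H_0: mu = 100, H_1: mu != 100
mu_0 = 100      # hypothesized population mean
sigma = 15      # known population standard deviation
alpha = 0.05    # significance level

# Made-up sample of more than 30 observations
rng = np.random.default_rng(1)
sample = rng.normal(loc=104, scale=sigma, size=40)
n = len(sample)

# Step 3: two-tailed critical value
z_crit = norm.ppf(1 - alpha / 2)   # about 1.96

# Step 4: Z-test statistic
z = (sample.mean() - mu_0) / (sigma / np.sqrt(n))

print("z =", round(z, 3), "critical value =", round(z_crit, 3))
if abs(z) > z_crit:
    print("Reject H_0: the sample mean differs significantly from", mu_0)
else:
    print("Fail to reject H_0")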
 3.7 DECISION RULE – CALCULATIONS – DECISIONS

The decision to either reject or not to reject a null hypothesis is guided by the distribution the
test statistic assumes. This means that if the variable involved follows a normal distribution,
we use the level of significance of the test to come up with critical values that lie along the
standard normal distribution.
Note that before one makes a decision to reject or not to reject a null hypothesis, one must
consider whether the test should be one-tailed or two-tailed. This is because the number of
tails determines the value of α (significance level). The following is a summary of the
decision rules under different scenarios.
Left One-tailed Test
H1: Parameter < X
Decision rule: Reject H0 if the test statistic is less than the critical value. Otherwise, do not
reject H0.

Right One-tailed Test


H1: Parameter > X
Decision rule: Reject H0 if the test statistic is greater than the critical value. Otherwise, do not
reject H0.

Two-tailed Test
H1: Parameter ≠ X (not equal to X)
Decision rule: Reject H0 if the test statistic is greater than the upper critical value or less than
the lower critical value.

Critical values link confidence intervals to hypothesis tests. For example, to construct a 95%
confidence interval assuming a normal distribution, we would need to determine the critical
values that correspond to a 5% significance level. Similarly, if we were to conduct a test of
some given hypothesis at the 5% significance level, we would use the same critical values
used for the confidence interval to subdivide the distribution space into rejection and non-
rejection regions.
Example: Hypothesis Testing
A survey carried out using a sample of 50 Level I candidates reveals an average IQ of 105.
Assuming that IQs are distributed normally, carry out a statistical test to determine whether
the mean IQ is greater than 100. You are instructed to use a 5% level of significance.
(Previous studies give a standard deviation of IQs of approximately 20.)
Solution
First, state the hypothesis:
H0: μ = 100 vs H1: μ > 100
Since IQs follow a normal distribution, under H0, (X̄ – 100) / (σ/√n) ∼ N(0, 1).
Next, we compute the test statistic: (105 – 100) / (20/√50) = 1.768.
This is a right one-tailed test, and IQs are distributed normally. Therefore, we should
compare our test statistic to the upper 5% point of the normal distribution.
From the normal distribution table, this value is 1.6449. Since 1.768 is greater than 1.6449,
we have sufficient evidence to reject the H0 at the 5% significance level. Therefore, it
is reasonable to conclude that the mean IQ of CFA candidates is greater than 100.
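The calculation above can be verified with a short script (a sketch that uses only the summary figures quoted in the example):

from math import sqrt
from scipy.stats import norm

x_bar = 105   # sample mean IQ
mu_0 = 100    # hypothesized population mean
sigma = 20    # population standard deviation
n = 50        # sample size

z = (x_bar - mu_0) / (sigma / sqrt(n))   # test statistic, about 1.768
z_crit = norm.ppf(0.95)                  # upper 5% point, about 1.6449
p_value = 1 - norm.cdf(z)                # one-tailed p-value, about 0.038

print(round(z, 3), round(z_crit, 4), round(p_value, 3))

Since z = 1.768 exceeds the critical value of 1.6449 (equivalently, the p-value is below 0.05), the null hypothesis is rejected.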
Statistical Significance vs. Economic Significance
Statistical significance refers to the use of a sample to carry out a statistical test meant to
reveal any significant deviation from the stated null hypothesis. We then decide whether to
reject or not reject the null hypothesis.
Economic significance entails the statistical significance and the economic effect inherent in
the decision made after data analysis and testing.
The need to separate statistical significance from economic significance arises because some
statistical results may be significant on paper but not economically meaningful. The
difference from the hypothesized value may carry some statistical weight but lack economic
feasibility, making implementation of the results very unlikely. Perhaps an example can help
you gain a deeper understanding of the two concepts.
Example: Statistical Significance and Economic Feasibility
A well-established pharmaceutical company wishes to assess the effectiveness of a newly
developed drug before commercialization. The company’s board of directors commissions a
pilot test. The drug is administered to a few patients to whom none of the existing drugs has
been prescribed. A statistical test follows and reveals a significant decrease in the average
number of days taken before full recovery. The company considers the evidence sufficient to
conclude that the new drug is more effective than existing alternatives.
However, the production of the new drug is significantly more expensive because of the
scarcity of the active ingredient. Furthermore, the company would have to engage in a year-
long lobbying exercise to convince the Food and Drug Administration and the general public
that the drug is indeed an improvement to the existing brands. At the end of the day, the
management decides to delay the commercialization of the drug because of the higher
production and introduction costs.
Other factors that may affect the economic feasibility of statistical results include:
 Tax: Financial institutions generally avoid projects that may increase the tax payable.
 Shareholders: They are often trying to increase returns on investment from one year to
the next, not taking into account the long run because their investment horizon is too
short.
 Risk: We may have a statistically significant project that is too risky. Projects that are
capital intensive are, in the long term, particularly, very risky. In fact, the additional risk
is excluded from statistical tests.
Evidence of returns based solely on statistical analysis may not be enough to guarantee the
implementation of a project. In particular, large samples may produce results that have high
statistical significance but very low applicability.
Question
Using the same data as in the example above (a sample of 50 candidates with an average IQ of
105 and a standard deviation of 20), a test is carried out at the 5% significance level to determine
whether the mean IQ is greater than 102. Which of the following is the most appropriate conclusion?
A. There is sufficient evidence to justify the rejection of the H0 and inform the conclusion
that the average IQ is greater than 102.
B. There is insufficient evidence to justify the rejection of the H0 and guide the conclusion
that the average IQ is not more than 102.
C. There is sufficient evidence to justify the rejection of the H0 and inform the conclusion
that the average IQ is greater than 102.
Solution
The correct answer is B.
Just like in the example above, start with the statement of the hypotheses: H0: μ = 102 vs. H1: μ > 102.
The test statistic is z = (105 − 102) / (20 / √50) = 1.061.
Again, this is a right one-tailed test but this time, 1.061 is less than the upper 5% point of a
standard normal distribution (1.6449). Therefore, we do not have sufficient evidence to reject
the H0 at the 5% level of significance. It is, therefore, reasonable to conclude that the average
IQ of CFA candidates is not more than 102.
There are many applications of decision rules in business and finance, including:
 Credit card companies use decision rules to approve credit card applications.
 Retailers use associative rules to understand customers' habits and preferences (market
basket analysis) and apply the finding to launch effective promotions and advertising.
 Banks use decision rules induced from data about bankrupt and non-bankrupt firms to
support credit-granting decisions.
 Telemarketing and direct marketing companies use decision rules to reduce the number
of calls made and increase the ratio of successful calls.
DESCRIBING AND COMPARING INFORMATION ATTRIBUTES
The examples (information) from which decision rules are induced are expressed in terms of
some characteristic attributes. For instance, companies could be described by the following
attributes: sector of activity, localization, number of employees, total assets, profit, and risk
rating. From the viewpoint of conceptual content, attributes can be one of the following
types:
 Qualitative attributes (symbolic, categorical, or nominal), such as sector of activity or localization
 Quantitative attributes, including number of employees or total assets
 Criteria or attributes whose domains are preferentially ordered, including profit, because
a company having large profit is preferred to a company having small profit or even loss
The objects are compared differently depending on the nature of the attributes considered. More
precisely, with respect to qualitative attributes, the objects are compared on the basis of an
indiscernibility relation: two objects are indiscernible if they have the same evaluation with respect to
the considered attributes. The indiscernibility relation is reflexive (i.e., each object is indiscernible
with itself), symmetric (if object A is indiscernible with object B, then object B also is indiscernible
with object A), and transitive (if object A is indiscernible with object B and object B is indiscernible
with object C, then object A also is indiscernible with object C). Therefore, the indiscernibility
relation is an equivalence relation.
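As an illustration of the indiscernibility relation, here is a small, hypothetical Python sketch; the company attribute values are made up, but the attribute names (sector of activity, localization, number of employees) follow the example above.

```python
def indiscernible(obj_a: dict, obj_b: dict, attributes: list) -> bool:
    """Two objects are indiscernible if they have the same evaluation
    on every attribute in the considered set."""
    return all(obj_a[attr] == obj_b[attr] for attr in attributes)

# Hypothetical companies described by some of the attributes listed above
company_a = {"sector": "manufacturing", "localization": "north", "employees": 2710}
company_b = {"sector": "manufacturing", "localization": "north", "employees": 3000}

# With respect to the qualitative attributes only, A and B are indiscernible
print(indiscernible(company_a, company_b, ["sector", "localization"]))  # True
```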
With respect to quantitative attributes, objects are compared using a similarity relation defined by a tolerance threshold. For instance, with respect to the attribute “number of employees,” fixing a threshold at 10 percent, Company A having 2,710 employees is similar to Company B having 3,000 employees. The similarity relation is reflexive, but neither symmetric nor transitive; abandoning the transitivity requirement is easily justified, recalling, for example, Luce's paradox of the cups of tea (Luce, 1956). As for symmetry, notice that the proposition yRx, which means “y is similar to x,” is directional: there is a subject y and a referent x, and in general this is not equivalent to the proposition “x is similar to y.”
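The directional nature of the similarity relation can be made concrete with a short, hypothetical Python sketch using the 10 percent threshold and the employee counts from the example; this is an illustration only, not a formal definition.

```python
def similar(y: float, x: float, threshold: float = 0.10) -> bool:
    """Directional relation y R x: the subject y is similar to the referent x
    if y deviates from x by at most `threshold` of x's value."""
    return abs(y - x) <= threshold * x

# Company A (2,710 employees) compared with referent Company B (3,000 employees)
print(similar(2710, 3000))  # True:  |2710 - 3000| = 290 <= 0.10 * 3000 = 300

# Swapping subject and referent changes the tolerance, so the relation
# is not symmetric in general
print(similar(3000, 2710))  # False: |3000 - 2710| = 290 >  0.10 * 2710 = 271
```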
 3.8 INTERPRETATIONS

Data interpretation is conducted in 4 steps:


 Assembling the information you need (like bar graphs and pie charts);
 Developing findings or isolating the most relevant inputs;
 Developing conclusions;
 Coming up with recommendations or actionable solutions.
Considering how these findings dictate the course of action, data analysts must be
accurate with their conclusions and examine the raw data from multiple angles. Different
variables may allude to various problems, so having the ability to backtrack data and
repeat the analysis using different templates is an integral part of a successful business
strategy.
What Should Users Question During Data Interpretation?
To interpret data accurately, users should be aware of potential pitfalls present within this
process. You need to ask yourself if you are mistaking correlation for causation. If two
things occur together, it does not indicate that one caused the other.

Data Interpretation Methods


Data analysts or data analytics tools help people make sense of the numerical data that has been
aggregated, transformed, and displayed. There are two main methods for data interpretation:
quantitative and qualitative.
Qualitative Data Interpretation Method
This is a method for breaking down or analyzing so-called qualitative data, also known as categorical data. It is important to note that no bar graphs or line charts are used in this method. Instead, it relies on text. Because qualitative data is collected through person-to-person techniques, it isn't easy to present using a numerical approach.
Surveys can still be used to collect this data because they allow you to assign numerical values to answers, making them easier to analyze. Relying solely on the raw text would be a time-consuming and error-prone process, which is why qualitative data is usually transformed or coded before analysis.

Quantitative Data Interpretation Method


This data interpretation method is applied when we are dealing with quantitative or numerical data. Since we are dealing with numbers, the values can be displayed in a bar chart or pie chart. There are two main types of numerical data: discrete and continuous. Moreover, numbers are easier to analyze because they lend themselves to statistical techniques such as the mean and standard deviation.

Mean is the average value of a particular data set, calculated by dividing the sum of the values in that data set by the number of values in the set.
Standard Deviation is a technique used to ascertain how responses align with or deviate from the average value, or mean. It relies on the mean to describe the consistency of the replies within a particular data set. You can use it, for example, when calculating the average pay for a certain profession and then displaying the upper and lower values in the data set.

Benefits Of Data Interpretation


Multiple data interpretation benefits explain its significance within the corporate world, medical
industry, and financial industry:

Informed decision-making. The managing board must examine the data to take action and
implement new methods. This emphasizes the significance of well-analyzed data as well as a
well-structured data collection process.
Anticipating needs and identifying trends. Data analysis provides users with relevant insights
that they can use to forecast trends. It would be based on customer concerns and expectations.

For example, a large number of people are concerned about privacy and the leakage of personal
information. Products that provide greater protection and anonymity are more likely to become
popular.

Clear foresight. Companies that analyze and aggregate data better understand their own
performance and how consumers perceive them. This provides them with a better understanding
of their shortcomings, allowing them to work on solutions that will significantly improve their
performance.

 3.9 ONE-TAILED AND TWO-TAILED TESTS

In statistical hypothesis testing, we need to judge whether it is a one-tailed or a two-tailed test so


that we can find the critical values in tables such as Standard Normal z Distribution Table and t
Distribution Table. Then, by comparing the test statistic with the critical value, or by checking whether the test statistic falls in the critical region, we conclude either to reject the null hypothesis or to fail to reject it.
How can we tell whether it is a one-tailed or a two-tailed test? It depends on the original claim in
the question. A one-tailed test looks for an “increase” or “decrease” in the parameter whereas a
two-tailed test looks for a “change” (could be increase or decrease) in the parameter.
Therefore, if we see words such as “increased, greater, larger, improved and so on” or “decreased, less, smaller and so on” in the original claim of a question (> or < is used in H1), a one-tailed test is applied. If words such as “change, the same, different/difference and so on” are used in the claim of the question (≠ is used in H1), a two-tailed test is applied.
In a one-tailed test, the critical region has just one part. It can be a left-tailed test or a right-tailed test. Left-tailed test: the critical region is in the extreme left region (tail) under the curve. Right-tailed test: the critical region is in the extreme right region (tail) under the curve.
One-tailed Tests
A one-tailed test may be either left-tailed or right-tailed.
A left-tailed test is used when the alternative hypothesis states that the true value of the
parameter specified in the null hypothesis is less than the null hypothesis claims.
A right-tailed test is used when the alternative hypothesis states that the true value of the
parameter specified in the null hypothesis is greater than the null hypothesis claims.
Two-tailed Tests
The main difference between one-tailed and two-tailed tests is that one-tailed tests will only have one critical region whereas two-tailed tests will have two critical regions. If we require a 100(1−α)% confidence interval, we have to make some adjustments when using a two-tailed test.
The confidence interval must remain a constant size, so if we are performing a two-tailed test, as there are twice as many critical regions, these critical regions must each be half the size. This means that when we read the tables for a two-tailed test, we need to consider α/2 rather than α.
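The α versus α/2 adjustment can be seen directly by computing the critical values. The sketch below is a minimal Python illustration (assuming SciPy is available) for a 5% significance level.

```python
from scipy.stats import norm

alpha = 0.05  # 5% significance level

one_tailed_critical = norm.ppf(1 - alpha)      # ~1.6449: one critical region of size alpha
two_tailed_critical = norm.ppf(1 - alpha / 2)  # ~1.9600: two critical regions of size alpha/2

print(f"one-tailed critical value:  {one_tailed_critical:.4f}")
print(f"two-tailed critical values: ±{two_tailed_critical:.4f}")
```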
Example
A light bulb manufacturer claims that its energy-saving light bulbs last an average of 60 days.
Set up a hypothesis test to check this claim and comment on what sort of test we need to use.
Solution
So we have:
H0: The mean lifetime of an energy-saving light bulb is 60 days.
H1: The mean lifetime of an energy-saving light bulb is not 60 days.
Because of the “is not” in the alternative hypothesis, we have to consider both the possibility that the lifetime of the energy-saving light bulb is greater than 60 days and that it is less than 60 days. This means we have to use a two-tailed test.
Example
The manufacturer now decides that it is only interested in whether the mean lifetime of an energy-saving light bulb is less than 60 days. What changes would you make from Example 1?
Solution
So we have:
H0: The mean lifetime of an energy-saving light bulb is 60 days.
H1: The mean lifetime of an energy-saving light bulb is less than 60 days.
Now we have a “less than” in the alternative hypothesis. This means that instead of performing a
two-tailed test, we will perform a left-sided one-tailed test.
Example:
The true value of one type of degree or diploma cannot be quantitatively measured, but we can
measure its relative impact on starting salary. Graduates from Quebec universities with a B.A.
or B.Sc. degree have a mean annual starting salary of $28,300. Sixty-five Quebec graduates
with a civil engineering degree are randomly selected. Their starting salaries have a mean of
$36,300. If the standard deviation is $1670, use a 0.01 level of significance to test the claim that
Quebec graduates with a civil engineering degree have a mean starting salary that is greater
than the mean for graduates with a B.A. or B.Sc. degree from Quebec.
The final sentence of the question above states the original claim. The words “greater than” show that it is a one-tailed (right-tailed) test, and the sign used in H1 is “>”. The null hypothesis and the alternative hypothesis are:
H0: µ = 28,300
H1: µ > 28,300
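The text stops after stating the hypotheses; the sketch below (a Python illustration, assuming SciPy is available) carries the z-test through using the figures given in the question.

```python
from math import sqrt
from scipy.stats import norm

x_bar = 36_300   # sample mean starting salary of civil engineering graduates
mu_0 = 28_300    # hypothesized mean (B.A./B.Sc. graduates)
sigma = 1_670    # standard deviation
n = 65           # sample size
alpha = 0.01     # significance level

z = (x_bar - mu_0) / (sigma / sqrt(n))
critical_value = norm.ppf(1 - alpha)  # right-tailed test at the 1% level

print(f"z = {z:.2f}, critical value = {critical_value:.4f}")
if z > critical_value:
    print("Reject H0: civil engineering graduates appear to have a higher mean starting salary.")
```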

 3.10 ESTIMATION – POINT ESTIMATE

Point estimators are functions that are used to find an approximate value of a population
parameter from random samples of the population. They use the sample data of a population to
calculate a point estimate or a statistic that serves as the best estimate of an unknown parameter
of a population.

Most often, the existing methods of finding the parameters of large populations are unrealistic.
For example, when finding the average age of kids attending kindergarten, it will be impossible
to collect the exact age of every kindergarten kid in the world. Instead, a statistician can use the
point estimator to make an estimate of the population parameter.

Properties of Point Estimators


The following are the main characteristics of point estimators:

1. Bias
The bias of a point estimator is defined as the difference between the expected value of the estimator and the value of the parameter being estimated. When the expected value of the estimator equals the value of the parameter being estimated, the estimator is considered unbiased.

Also, the closer the expected value of the estimator is to the value of the parameter being measured, the smaller the bias is.

2. Consistency
Consistency tells us how close the point estimator stays to the value of the parameter as it
increases in size. The point estimator requires a large sample size for it to be more consistent and
accurate.
we can also check if a point estimator is consistent by looking at its corresponding expected
value and variance. For the point estimator to be consistent, the expected value should move
toward the true value of the parameter.

3. Most efficient or unbiased


The most efficient point estimator is the one with the smallest variance of all the unbiased and consistent estimators. The variance measures the level of dispersion of the estimate around the parameter, and the estimator with the smallest variance varies the least from one sample to another.

Generally, the efficiency of the estimator depends on the distribution of the population. For
example, in a normal distribution, the mean is considered more efficient than the median, but the
same does not apply in asymmetrical distributions.

Point Estimation vs. Interval Estimation


The two main types of estimators in statistics are point estimators and interval estimators. Point
estimation is the opposite of interval estimation. It produces a single value while the latter
produces a range of values.

A point estimator is a statistic used to estimate the value of an unknown parameter of a


population. It uses sample data when calculating a single statistic that will be the best estimate of
the unknown parameter of the population.

On the other hand, interval estimation uses sample data to calculate an interval of possible values for an unknown parameter of a population. The interval is constructed so that it contains the parameter with a given probability, typically 95% or higher; such an interval is known as a confidence interval.

The confidence interval is used to indicate how reliable an estimate is, and it is calculated from
the observed data. The endpoints of the intervals are referred to as the upper and lower
confidence limits.

Common Methods of Finding Point Estimates


The process of point estimation involves utilizing the value of a statistic that is obtained from
sample data to get the best estimate of the corresponding unknown parameter of the population.
Several methods can be used to calculate the point estimators, and each method comes with
different properties.

1. Method of moments
The method of moments of estimating parameters was introduced in 1887 by Russian
mathematician Pafnuty Chebyshev. It starts by taking known facts about a population and then
applying the facts to a sample of the population. The first step is to derive equations that relate
the population moments to the unknown parameters.

The next step is to draw a sample from the population and use it to estimate the population moments. The equations derived in step one are then solved using the sample moments in place of the population moments. This produces estimates of the unknown population parameters.
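As a simple, hypothetical illustration of the method of moments (not taken from the text), the sketch below estimates the rate of an exponential distribution by equating the population mean, 1/rate, to the sample mean; the data are simulated with NumPy.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
sample = rng.exponential(scale=2.0, size=1_000)  # simulated data, true rate = 0.5

sample_mean = sample.mean()         # first sample moment
rate_estimate = 1.0 / sample_mean   # solve the moment equation E[X] = 1/rate

print(f"method-of-moments estimate of the rate: {rate_estimate:.3f}")  # close to 0.5
```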
2. Maximum likelihood estimator
The maximum likelihood method of point estimation attempts to find the unknown parameter values that maximize the likelihood function. It takes a known model and finds the parameter values under which the observed data are most likely.

For example, a researcher may be interested in knowing the average weight of babies born
prematurely. Since it would be impossible to measure all babies born prematurely in the
population, the researcher can take a sample from one location.

Because the weight of pre-term babies follows a normal distribution, the researcher can use the
maximum likelihood estimator to find the average weight of the entire population of pre-term
babies based on the sample data.
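A minimal Python sketch of this idea follows; the baby weights are simulated rather than real, and the example relies on the standard result that, for normally distributed data, the maximum likelihood estimates of the mean and variance are the sample mean and the uncorrected sample variance.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
weights = rng.normal(loc=2.4, scale=0.4, size=200)  # simulated pre-term weights in kg

mle_mean = weights.mean()           # value of the mean that maximizes the likelihood
mle_variance = weights.var(ddof=0)  # MLE of the variance (divides by n, not n - 1)

print(f"estimated mean weight: {mle_mean:.2f} kg, variance: {mle_variance:.3f}")
```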

 3.11 CONFIDENCE INTERVAL – LEVEL OF CONFIDENCE – EFFECT OF SAMPLE SIZE
A confidence interval is a type of interval estimate in statistics, derived from observed data, that is likely to contain the true value of an unknown parameter. It is linked to the confidence level, which measures how confident we can be that the method used to construct the interval captures the parameter.
A confidence interval shows the range of values, centered around the sample estimate, within which a parameter is expected to fall. Confidence intervals show the degree of uncertainty or certainty in a sampling method. They are typically constructed using confidence levels of 95% or 99%.

When Do You Use Confidence Intervals?


The width of a 90% confidence interval for a given estimate is one way to gauge how precise it is: the wider the interval, the more care must be taken when using the estimate. Confidence intervals serve as a crucial reminder of an estimate's limitations.
What Does a 95% Confidence Interval Mean?
A 95% confidence interval is a range constructed by a method that, across repeated samples, produces intervals containing the parameter being estimated about 95% of the time. The sample mean (the center of the CI) will vary from sample to sample because of natural sampling variability.
Statisticians use confidence intervals to measure the uncertainty in a sample variable. The
confidence is in the method, not in a particular CI. Approximately 95% of the intervals
constructed would capture the true population mean if the sampling method was repeated many
times.
Confidence Interval Formula
The formula to find the confidence interval is:
CI = X̄ ± Z × (S / √n)
where:

 X bar is the sample mean.


 Z is the z-value corresponding to the chosen confidence level (the number of standard deviations from the mean).
 S is the standard deviation in the sample.
 n is the size of the sample.
The value after the ± symbol is known as the margin of error.
Question: In a tree, there are hundreds of mangoes. You randomly choose 40 mangoes, with a mean weight of 80 and a standard deviation of 4.3. Construct a confidence interval for the mean weight of all the mangoes on the tree.
Solution:
Mean = 80
Standard deviation = 4.3
Number of observations = 40
Take the confidence level as 95%. Therefore, the value of Z = 1.960. Substituting the values in the formula, we get
= 80 ± 1.960 × [ 4.3 / √40 ]
= 80 ± 1.960 × [ 4.3 / 6.32]
= 80 ± 1.960 × 0.6803
= 80 ± 1.33
The margin of error is 1.33
The mean weight of all the hundreds of mangoes is likely (with 95% confidence) to lie between 78.67 and 81.33.
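The same calculation can be reproduced programmatically. The following is a short Python sketch (assuming SciPy is available) using the sample mean, standard deviation, and sample size stated in the mango question.

```python
from math import sqrt
from scipy.stats import norm

x_bar, s, n = 80, 4.3, 40
z = norm.ppf(0.975)                # ~1.960 for a 95% confidence level

margin_of_error = z * s / sqrt(n)  # ~1.33
lower, upper = x_bar - margin_of_error, x_bar + margin_of_error

print(f"95% CI for the mean weight: ({lower:.2f}, {upper:.2f})")  # about (78.67, 81.33)
```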

Calculating A Confidence Interval


Imagine a group of researchers who are trying to decide whether or not the oranges produced on a
certain farm are large enough to be sold to a potential grocery chain. This will serve as an
example of how to compute a confidence interval.

Step 1: Determine the sample size (n).


46 oranges are chosen at random by the researchers from the farm's trees. Consequently, n = 46.

Step 2: Determine the sample mean (x̄).


The researchers next determine the sample's mean weight, which comes out to be 86 grammes, so x̄ = 86.

Step 3: Determine the standard deviation (s).


Although utilising the population-wide standard deviation is ideal, this figure is frequently unavailable to researchers. If this is the case, the researchers should apply the standard deviation determined from the sample.

Let's assume, for our example, that the researchers have chosen to compute the standard deviation from their sample. They get a 6.2-gramme standard deviation, so s = 6.2.

Step 4: Determine the confidence level to be used.

In ordinary market research studies, 95% and 99% are the most popular choices for confidence levels.

For this example, let's assume that the researchers employ a 95 per cent confidence level.

Step 5: Find the Z value for the chosen confidence level.

The researchers would subsequently use the following table to establish their Z value:

Confidence Level Z

80% 1.282

85% 1.440

90% 1.645

95% 1.960

99% 2.576

99.5% 2.807

99.9% 3.291

Step 6: Calculate the following formula


The next step would be for the researchers to enter their known values into the formula.
Following our example, this formula would look like this:

86 ± 1.960 × (6.2 / √46) = 86 ± 1.960 × (6.2 / 6.782)

This calculation yields 86 ± 1.79, which the researchers use as their confidence interval.

Step 7: Come to a decision.


According to the study's findings, the real mean of the larger population of oranges is probably
(with a 95% confidence level) between 84.21 grammes and 87.79 grammes.

Confidence Interval For Proportions


In newspaper stories during election years, confidence intervals are often expressed as proportions or percentages. For instance, a survey for a specific presidential contender may indicate that the candidate has 40% of the vote, within three percentage points (if the sample is large enough). Because election polls are frequently computed with a 95% confidence level, the pollsters would be 95% certain that the actual percentage of voters who supported the candidate is between 37% and 43%.
Stock market investors are most interested in knowing the actual percentage of equities that rise
and fall each week. The percentage of American households with personal computers is relevant
to companies selling computers. Confidence intervals may be established for the weekly
percentage change in stock prices and the percentage of American homes with personal
computers.

Confidence Interval For Non-Normally Distributed Data


In data analysis, calculating the confidence interval is a typical step that may be easily carried out for populations with normally distributed data using the well-known x̄ ± t·(s/√n) formula. The confidence interval, however, is not always easy to determine when working with data that is not normally distributed. There are fewer, and far less easily available, references for this case in the literature.

We explain, in plain terms, the percentile, bias-corrected, and accelerated versions of the bootstrap method for calculating confidence intervals. This approach is suitable for both normal and non-normal data sets and may be used to calculate a broad range of metrics, including the mean, the median, the slope of a calibration curve, and so on. As a practical example, the bootstrap method can determine the confidence interval around the median level of cocaine in femoral blood.
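A percentile bootstrap is straightforward to sketch in code. The example below uses NumPy with simulated, skewed data (the values are made up, not the femoral-blood data mentioned above) to build a 95% confidence interval for the median.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
sample = rng.lognormal(mean=0.0, sigma=0.8, size=60)  # skewed, non-normal sample

n_resamples = 10_000
boot_medians = np.empty(n_resamples)
for i in range(n_resamples):
    # Resample the data with replacement and record the median of each resample
    resample = rng.choice(sample, size=sample.size, replace=True)
    boot_medians[i] = np.median(resample)

# Percentile method: take the 2.5th and 97.5th percentiles of the bootstrap medians
lower, upper = np.percentile(boot_medians, [2.5, 97.5])
print(f"95% bootstrap CI for the median: ({lower:.3f}, {upper:.3f})")
```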
