Acc117 Chapter5 Report

This document covers inferential statistics, focusing on point and interval estimates, confidence intervals, and hypothesis testing. It explains key concepts such as confidence levels, significance levels, and the use of z and t distributions for different sample sizes. Additionally, it outlines the systematic procedure for hypothesis testing, including formulating null and alternative hypotheses, selecting significance levels, and making decisions based on statistical evidence.

Uploaded by

Johny balance

STATISTICAL ANALYSIS WITH SOFTWARE APPLICATION ACC 117

INFERENTIAL STATISTICS
CHAPTER 5
POINT ESTIMATE AND INTERVAL ESTIMATE

POINT ESTIMATE

A point estimate is a statistic, computed from sample information, which is used to estimate the population parameter.

The sample mean, x̄, is the point estimate of the population mean, 𝜇.

The sample standard deviation, s, is the point estimate of the population standard deviation, σ.

The sample proportion, p, is the point estimate of the population proportion, 𝛑.
INTERVAL ESTIMATE AND CONFIDENCE INTERVAL

CONFIDENCE INTERVAL

A confidence interval is a range of values generated from sample data so that the population parameter is likely to fall within that range at a specified probability.
This specified probability is called the confidence level.

CONFIDENCE LEVEL

The confidence level is your degree of trust, faith, or conviction that if you repeated the survey or study, you would get the same results.
In statistics, confidence levels of 90 percent, 95 percent, and 99 percent are frequently used.

SIGNIFICANCE LEVEL

The significance level (α) is the complement of the confidence level: α = 1 - confidence level. For example, a confidence level of 95 percent corresponds to a significance level of 5 percent.



MARGIN OF SAMPLING ERROR

The margin of sampling error is usually indicated as +/- x percentage points, where x is any positive real number.

It gives us an idea of the width of the confidence interval or how certain we are about the true value of the population parameter.
CONFIDENCE INTERVAL ESTIMATION FOR LARGE SAMPLES

Knowing the shape of the sampling distribution of the sample mean, x̄, allows us to construct an interval that has a specified probability (the confidence level) of containing the population mean, 𝜇.

Using the results of the central limit theorem, we can state the following for a reasonably large sample size:
95 percent of the sample means selected from a population will be within 1.96 standard errors (standard deviations of the sampling distribution) of the population mean, 𝜇.
99 percent of the sample means selected from a population will be within 2.58 standard errors of the population mean, 𝜇.
ROLE OF CONFIDENCE LEVEL IN DETERMINING CONFIDENCE INTERVAL

Suppose that we have a confidence level of 95 percent. Because it refers to the middle 95 percent of the observations, we divide the bell curve into two at its center, where the z score is 0.
Because the curve is symmetrical, both divisions are mirror images.
Because the sum of probabilities under the entire bell curve is equal to 1, each division has a sum of probabilities equal to 0.5.
We have 0.025 in each tail since the remaining 5 percent is equally split between the two tails.
Using the standard normal table, locate 0.025 (can be found in the standard normal table for negative z scores).
It corresponds to a z score of -1.96.
Therefore, the area under the bell curve from the z score of -1.96 to 0 is 0.475.
Correspondingly, we do the same thing for the other half to the right of z score 0 (mirror image).
Using the standard normal table, locate 0.975 (can be found in the standard normal table for positive z scores).
It corresponds to a z score of 1.96.
Therefore, the area under the bell curve from the z score of 0 to 1.96 is also 0.475.
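The table lookups described above can be checked numerically. The following is a minimal sketch that assumes Python's standard-library statistics.NormalDist as a stand-in for the printed standard normal table; the variable names are illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area in the left tail, to the left of z = -1.96
left_tail = std_normal.cdf(-1.96)
# Area from z = -1.96 up to the center at z = 0
left_half = std_normal.cdf(0) - left_tail
# Mirror image: area from the center at z = 0 up to z = 1.96
right_half = std_normal.cdf(1.96) - std_normal.cdf(0)
# The middle region covered by a 95 percent confidence level
middle = left_half + right_half

print(round(left_tail, 3), round(left_half, 3), round(right_half, 3), round(middle, 2))
```

Each half rounds to 0.475 and each tail to 0.025, matching the values read from the table.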
Determining z for Different Confidence Levels

The z score for a corresponding confidence level is important in constructing the confidence interval.
It is also important that we are aware of our sample size and the availability of the population standard deviation to know which formula to use in constructing the confidence interval.

CONFIDENCE LEVEL         0.9     0.95    0.99    0.999   =1-α
SIGNIFICANCE LEVEL       0.1     0.05    0.01    0.001   =α
Z SCORE (LEFT-TAILED)    -1.28   -1.64   -2.33   -3.09   =normsinv(α)
Z SCORE (RIGHT-TAILED)   1.28    1.64    2.33    3.09    =normsinv(1-α)
Z SCORE (TWO-TAILED)     ±1.64   ±1.96   ±2.58   ±3.29   =±normsinv(α/2)
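The spreadsheet formulas in the table (normsinv) can be mirrored outside a spreadsheet. The sketch below assumes Python's statistics.NormalDist().inv_cdf as the equivalent of normsinv; the function name z_scores is hypothetical.

```python
from statistics import NormalDist

def z_scores(confidence_level):
    """Return left-tailed, right-tailed, and two-tailed z scores for a confidence level."""
    alpha = 1 - confidence_level          # significance level
    inv = NormalDist().inv_cdf            # plays the role of normsinv
    return {
        "left": inv(alpha),               # =normsinv(α)
        "right": inv(1 - alpha),          # =normsinv(1-α)
        "two": inv(1 - alpha / 2),        # magnitude of =±normsinv(α/2)
    }

for level in (0.90, 0.95, 0.99, 0.999):
    scores = z_scores(level)
    print(level, {name: round(z, 2) for name, z in scores.items()})
```

Note that the one-tailed value for a 0.05 significance level is about 1.645, which the table shows as 1.64.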
Equation 5.1   Confidence interval for the population mean when n≥30 and population standard deviation is available
Equation 5.2   Confidence interval for the population mean when n≥30 and the population standard deviation is not available

If your sample size is at least 30 and you have information on the population standard deviation, Equation 5.1 should be used to construct the confidence interval.
If the population standard deviation is not available, Equation 5.2 should be used.
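As an illustration of a large-sample interval, the sketch below assumes the usual textbook form, sample mean ± z × (standard deviation / √n), which is presumably what Equations 5.1 and 5.2 express (with σ in the first and s in the second). The function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample_mean, std_dev, n, confidence_level=0.95):
    """Large-sample (n >= 30) confidence interval for a population mean.

    std_dev is the population standard deviation if it is available,
    otherwise the sample standard deviation s.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)  # two-tailed z score
    margin = z * std_dev / sqrt(n)                            # z times the standard error
    return sample_mean - margin, sample_mean + margin

# Hypothetical sample: mean 50, standard deviation 10, n = 100
low, high = z_confidence_interval(50, 10, 100)
print(round(low, 2), round(high, 2))
```

With these made-up numbers the standard error is 1, so the 95 percent interval extends about 1.96 on either side of the sample mean.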
CONFIDENCE INTERVAL ESTIMATION FOR SMALL SAMPLES

Let us say the sample size is less than 30, and we do not know what the population standard deviation is. We are unable to apply the z distribution (the standard normal distribution) in this situation.
We would instead apply the t distribution.

The t distribution, also called Student's t, is a continuous, bell-shaped, and symmetrical distribution.
Equation 3.21  Standard Normal Value (z score)
Equation 4.4   Standard Normal Value (z score) when population standard deviation is known, regardless of n
Equation 4.5   Standard Normal Value (z score) when population standard deviation is unknown and n≥30
Equation 5.3   t distribution (t statistic) when population standard deviation is unknown and n<30
Equation 5.4   Confidence interval for the population mean when n<30 and population standard deviation is not available

Under the assumption that the population of interest has a normal distribution, the t distribution is also described as follows:
There is a family of t distributions. While all t distributions are centered at mean 0, their standard deviation differs according to sample size. A smaller sample size has a higher standard deviation.
Compared to the standard normal distribution, the t distribution is more spread out and flatter at the center.
t Distribution Table
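A small-sample interval can be sketched the same way as a large-sample one, replacing z with a value read from the t table. The sketch below assumes the usual form, sample mean ± t × (s / √n), for Equation 5.4. The dictionary holds a few two-tailed 95 percent critical values copied from a standard t table, and the sample data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Two-tailed 95 percent critical t values from a standard t table,
# keyed by degrees of freedom (n - 1). Only a few rows are included.
T_TABLE_95 = {4: 2.776, 9: 2.262, 14: 2.145, 24: 2.064, 29: 2.045}

def t_confidence_interval(sample):
    """95 percent confidence interval for the mean of a small sample (n < 30)."""
    n = len(sample)
    t = T_TABLE_95[n - 1]                  # raises KeyError if the row is not in the short table
    margin = t * stdev(sample) / sqrt(n)   # t times the estimated standard error
    center = mean(sample)
    return center - margin, center + margin

sample = [12, 15, 11, 14, 13]              # hypothetical measurements, n = 5
low, high = t_confidence_interval(sample)
print(round(low, 2), round(high, 2))
```

Because 2.776 is larger than 1.96, the interval is wider than a z-based interval on the same data, which reflects the flatter, more spread-out shape of the t distribution.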
CONFIDENCE INTERVAL ESTIMATION FOR PROPORTION

A proportion is the fraction, ratio, or percentage quantifying the portion of the sample or the population possessing a particular trait or interest.
We can create confidence intervals for nominal-level data when observations are divided into two or more categories that are mutually exclusive.
Equation 5.5   p = x/n   Sample proportion

p = sample proportion
x = number of “successes”
n = sample size

Note that we designate 𝛑 as the population proportion. That is, it refers to the percentage of “successes” in the population.

To construct a confidence interval for a proportion, the following has to be satisfied:
The characteristics of a binomial probability distribution enumerated in chapter 3.
The values of n𝛑 and n(1-𝛑) should both be greater than or equal to 5 in order for the standard normal distribution to hold.

Finding a point estimate for a population proportion and constructing a confidence interval for a population proportion follow the same procedure as for a mean.

The sample proportion is our best approximation of the unknown population proportion.
Equation 5.2   Confidence interval for the population mean when n≥30 and the population standard deviation is not available
Equation 5.4   Confidence interval for the population mean when n<30 and population standard deviation is not available
Equation 5.6   p±zσp   Confidence interval for population proportion

To construct a confidence interval for the population proportion, we make some adjustments to Equations 5.2 and 5.4, so we have Equation 5.6.
CONFIDENCE INTERVAL ESTIMATION FOR PROPORTION

Equation 5.7 Standard error of the sample proportion

Note that σp is the standard error of the proportion that measures the variability in the
sampling distribution of the sample proportion.

Equation 5.8 Confidence interval for population proportion

Using Equation 5.7, we restate Equation 5.6 into Equation 5.8, which is the formula to
construct a confidence interval for a population proportion.
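The steps for a proportion can be sketched as follows. This assumes the standard error takes the usual form √(p(1-p)/n), so the interval is p ± z√(p(1-p)/n). The sample proportion is used as a stand-in for 𝛑 when checking the "at least 5" conditions, which is an assumption of this sketch; the function name and counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(x, n, confidence_level=0.95):
    """Confidence interval for a population proportion from x successes in n trials."""
    p = x / n                                   # sample proportion
    # Normal approximation check, using p as a stand-in for the unknown population proportion
    if n * p < 5 or n * (1 - p) < 5:
        raise ValueError("normal approximation may not hold")
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    std_error = sqrt(p * (1 - p) / n)           # standard error of the sample proportion
    return p - z * std_error, p + z * std_error

# Hypothetical survey: 120 "successes" out of 400 respondents
low, high = proportion_confidence_interval(120, 400)
print(round(low, 3), round(high, 3))
```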
FINITE POPULATION CORRECTION FACTOR

While the majority of populations can be quite vast, it is also possible for populations to be small and limited in number.
That is, a population that has a fixed upper bound is finite.
A finite population can be very large or small.
It can be as large as the population of the Philippines or as small as the number of students in your statistics class.
Hence, we need to make some adjustments in our computation of the standard error of the sample mean and the standard error of the sample proportion.
Equation 5.9    Standard error of the sample mean with finite-population correction factor
Equation 5.10   Standard error of the sample proportion with finite-population correction factor

A finite population is denoted by N, and the sample size is denoted by n.
Equation 5.9 and Equation 5.10 are used to make adjustments. The adjustment term in each is called the finite-population correction factor.
Mathematically, the finite-population correction factor is less than 1 whenever the sample has more than one observation, so it reduces the standard error. The adjustment is usually applied only when the sample is more than about 5 percent of the population; below that, the reduction is negligible. Consequently, this reduction in the standard error gives a narrower range of values in estimating the population mean or the population proportion.
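To see how much the correction changes the standard error, the sketch below assumes the usual textbook form of the factor, √((N-n)/(N-1)), multiplied into the standard error of the sample mean. The function names and numbers are hypothetical.

```python
from math import sqrt

def fpc(N, n):
    """Finite-population correction factor (assumed form: sqrt((N - n) / (N - 1)))."""
    return sqrt((N - n) / (N - 1))

def standard_error_mean(std_dev, n, N):
    """Standard error of the sample mean, adjusted for a finite population of size N."""
    return std_dev / sqrt(n) * fpc(N, n)

# Hypothetical case: sample of 100 from a population of 500 (20 percent sampled)
se_small_population = standard_error_mean(10, 100, 500)
# Same sample from a population of 1,000,000 (0.01 percent sampled)
se_large_population = standard_error_mean(10, 100, 1_000_000)

print(round(se_small_population, 4), round(se_large_population, 4))
```

In the first case the standard error falls from 1 to roughly 0.9, while in the second it is practically unchanged, which is why the correction matters mainly when the sample is a sizable share of the population.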
HYPOTHESIS TESTING

A hypothesis is a statement about a population that needs to be validated using appropriate testing procedures.
Technically, the term hypothesis comes from the Greek word hypotithenai, which means “to put under”. Hence, it means an assumption that must be verified.
The process by which we determine the validity of hypotheses or test their reasonableness is called hypothesis testing.
Because we are using the tools of statistics to conduct hypothesis testing procedures, we call this process statistical hypothesis testing.
Statistical hypothesis testing is a procedure based on sample evidence and probability theory to assess whether a hypothesis is to be rejected or not, beyond a reasonable doubt.
STATISTICAL HYPOTHESIS TESTING PROCEDURE
We systematize hypothesis testing through this five-step procedure:

1. State the null (Ho) and the alternative (H1) hypotheses.
2. Select a significance level (α).
3. Select a test statistic.
4. Formulate a decision rule.
5. Make a decision: Either do not reject Ho or reject Ho and do not reject H1.
STEP 1: State the null and alternative hypothesis

To begin hypothesis testing, state the hypothesis being tested. This statement is called the null hypothesis, denoted by Ho and read as “H naught” or “H sub-zero”.
The null hypothesis is a hypothesis that indicates “no change”, “no difference”, or the current or reported condition.
Hence, it is always stated in equality terms.
It is also stated in the negative (“no change”, “no difference”) because the burden is on the sample evidence to show that something is indeed true.
Thus, the null hypothesis is what we test with statistics.
We either reject or fail to reject the null hypothesis.

When you have empirical evidence to reject the null hypothesis, you need to have an alternative. This is where the alternative hypothesis comes into play.
The alternative hypothesis states what you will choose if the null hypothesis is rejected.
Formally, it is a statement that is not rejected if the sample data provide sufficient empirical evidence to reject the null hypothesis.
It is denoted by H1, read as “H sub one”.
If the null hypothesis is called the test statement, the alternative hypothesis is called the research statement.
STEP 2: Select a significance level (α)

Once you have stated the null and alternative hypotheses, you need to decide on your significance level, denoted by α.
In hypothesis testing, the significance level is the probability of rejecting the null hypothesis when it should not have been rejected.
The significance level is also called the risk level because it is the risk you take of rejecting the null hypothesis when it should not have been rejected.

TYPE I ERROR AND TYPE II ERROR

When you reject the null hypothesis when it should not have been rejected, you commit a Type I Error, and the probability of committing it is represented by the significance level.
However, it is also possible that you did not reject the null hypothesis even though it should have been rejected. In this scenario, you committed a Type II Error: you did not reject the null hypothesis when it should have been rejected.
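The link between the significance level and the Type I Error can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population is normal with a known standard deviation, the null hypothesis is true by construction, and all parameter values (mean 100, standard deviation 15, samples of 36, 2,000 repetitions) are made up.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(0)                      # fixed seed so the run is repeatable

ALPHA = 0.05                        # chosen significance level
MU, SIGMA, N, TRIALS = 100, 15, 36, 2000
critical_z = NormalDist().inv_cdf(1 - ALPHA / 2)   # two-tailed critical value

rejections = 0
for _ in range(TRIALS):
    # Draw a sample from a population where the null hypothesis (mean = MU) is true
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    z = (mean(sample) - MU) / (SIGMA / sqrt(N))
    if abs(z) > critical_z:         # rejecting a true null hypothesis is a Type I Error
        rejections += 1

type_i_rate = rejections / TRIALS
print(type_i_rate)
```

The share of wrongly rejected samples should land close to the chosen significance level of 0.05.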
STEP 3: Select a test statistic

A test statistic is a numerical value obtained from sample information used to conclude whether the null hypothesis should be rejected or not.

There are various statistics, which include z, t, F, and χ² (chi-squared). Each has its specific use given the nature of the hypothesis, data distribution, and the testing procedure itself.
STEP 4: Formulate a decision rule

DECISION RULE: CRITICAL VALUE APPROACH


A decision rule is a statement of the specific conditions under which the null hypothesis should be rejected.
The rejection region or critical region needs to be identified when creating a decision rule.
Assuming the null hypothesis is true, this region contains all values of the test statistic that are so small or so large that the probability of their occurrence is very low.
The critical region is demarcated by the critical value, which is the dividing point between the region where we reject the null hypothesis and the region where we do not.
The critical region’s location is indicated by the inequality sign in the alternative hypothesis, which points to the tail of the distribution where the region lies.
STEP 5: Make a decision

DECISION RULE: p-value Approach


When the test statistic falls within the critical region, or when the probability value
or p-value is below the significance level, we reject the null hypothesis.
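The two decision rules lead to the same conclusion, as the sketch below illustrates for a two-tailed z test. The computed test statistic of 2.10 is a hypothetical value chosen for illustration.

```python
from statistics import NormalDist

ALPHA = 0.05
z_statistic = 2.10                  # hypothetical computed test statistic
std_normal = NormalDist()

# Critical value approach: reject if the statistic falls in either tail region
critical_value = std_normal.inv_cdf(1 - ALPHA / 2)
reject_by_critical_value = abs(z_statistic) > critical_value

# p-value approach: probability of a statistic at least this extreme in either tail
p_value = 2 * (1 - std_normal.cdf(abs(z_statistic)))
reject_by_p_value = p_value < ALPHA

print(round(critical_value, 2), round(p_value, 3))
print(reject_by_critical_value, reject_by_p_value)
```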
APPROACHES TO HYPOTHESIS TESTING

HYPOTHESIS TEST ABOUT A POPULATION MEAN (ONE-SAMPLE, z Test)

From our discussion of constructing a confidence interval, we have to use a specific formula depending on our data’s distribution, sample size, level of measurement, and availability of population parameters.
The same is true for implementing statistical hypothesis testing for a population mean using one sample only.
Here we test whether sample data results are less than or greater than the hypothesized population parameter value.
STEP 1: STATE THE NULL AND ALTERNATIVE HYPOTHESES

Null hypothesis          Ho : 𝜇 = k   Where k is any hypothesized value of the population mean
Alternative hypothesis   H1 : 𝜇 < k   One-tailed, left-tailed test. Also used when null is stated as Ho : 𝜇 ≥ k
Alternative hypothesis   H1 : 𝜇 > k   One-tailed, right-tailed test. Also used when null is stated as Ho : 𝜇 ≤ k
Alternative hypothesis   H1 : 𝜇 ≠ k   Two-tailed test
STEP 2: SELECT THE LEVEL OF SIGNIFICANCE (α)

CONFIDENCE LEVEL SIGNIFICANCE LEVEL

90% 10%

95% 5%

99% 1%
STEP 3: SELECT A TEST STATISTIC

Depending on your data’s distribution, sample size, level of measurement, and availability of population parameters, you may refer to the following formulas in determining the appropriate test statistic:

Equation 5.11   Test statistic in hypothesis testing for the population mean (𝜇) when population standard deviation (𝜎) is known, regardless of sample size (n)
Equation 5.12   Test statistic in hypothesis testing for the population mean (𝜇) when population standard deviation (𝜎) is unknown but proxied by the sample standard deviation (s), and the sample size (n) is large
Equation 5.13   Test statistic in hypothesis testing for the population mean (𝜇) when 𝜎 is unknown but proxied by the sample standard deviation (s), and the sample size (n) is small
Equation 5.14   Test statistic in hypothesis testing concerning proportion, with n𝛑 ≥ 5 and n(1-𝛑) ≥ 5 for the standard normal distribution to hold
STEP 4: FORMULATE A DECISION RULE

Formulate a decision rule


STEP 5: MAKE A DECISION (USING CRITICAL REGION)

To make a decision, we apply the five-step hypothesis testing procedure.
State whether the null hypothesis and the alternative hypothesis are rejected or not rejected.
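The five steps for a one-sample z test can be sketched as follows. The test statistic is assumed to take the usual form z = (x̄ - k) / (σ/√n) when the population standard deviation is known; the function name, the tail labels, and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, k, sigma, n, alpha=0.05, tail="two"):
    """One-sample z test of Ho: mean = k. tail is 'left', 'right', or 'two'."""
    # Step 3: compute the test statistic
    z = (sample_mean - k) / (sigma / sqrt(n))
    inv = NormalDist().inv_cdf
    # Step 4: decision rule based on the critical value for the chosen tail
    if tail == "left":
        reject = z < inv(alpha)
    elif tail == "right":
        reject = z > inv(1 - alpha)
    else:
        reject = abs(z) > inv(1 - alpha / 2)
    # Step 5: make a decision
    return z, "reject Ho" if reject else "do not reject Ho"

# Steps 1 and 2 (hypothetical): Ho: mean = 200, H1: mean ≠ 200, significance level 0.05
z, decision = one_sample_z_test(sample_mean=203.5, k=200, sigma=16, n=64)
print(z, decision)
```

Here the standard error is 16/8 = 2, so z = 1.75, which lies inside ±1.96 and does not fall in the critical region.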
HYPOTHESIS TEST ABOUT A POPULATION MEAN (ONE-SAMPLE, t Test)

A one-sample t-test is a statistical hypothesis test that determines if a population mean is different from a given value.
It's used when you have a random sample from a normal population and want to compare the sample mean to a specific value.
How to perform a one-sample t test

Calculate the sample mean.

Calculate the sample standard deviation.

Calculate the test statistic.

Compare the test statistic to a t-distribution to calculate the p-value, which is the probability of observing a test statistic at least as extreme under the null hypothesis.

Compare the p-value to the significance level to decide whether to reject the null hypothesis.
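The first three steps above can be sketched with the standard library alone. Because the standard library has no t-distribution function, this sketch swaps the p-value comparison for the equivalent critical value comparison, using a value read from a t table. The statistic is assumed to take the usual form t = (x̄ - k) / (s/√n); the data and hypothesized mean are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

sample = [11, 13, 12, 14, 10, 12]   # hypothetical measurements, n = 6
k = 10                              # hypothesized population mean (Ho: mean = 10)

n = len(sample)
sample_mean = mean(sample)          # calculate the sample mean
s = stdev(sample)                   # calculate the sample standard deviation
t_statistic = (sample_mean - k) / (s / sqrt(n))   # calculate the test statistic

# Two-tailed critical value for a 0.05 significance level and
# n - 1 = 5 degrees of freedom, read from a standard t table
T_CRITICAL = 2.571
reject_null = abs(t_statistic) > T_CRITICAL

print(round(t_statistic, 3), reject_null)
```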
HYPOTHESIS TEST ABOUT A PROPORTION (ONE-SAMPLE, Z TEST)

A one-sample proportion z-test is a statistical test that compares the proportion of a sample to a theoretical proportion.

It is used when you have one sample with a categorical variable.
How to perform a one-sample z test

Identify the null and alternative hypotheses

Calculate the test statistic

Find the critical value or the p-value

Apply the decision rule

Make a decision
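The steps above can be sketched for a left-tailed test of a proportion. The test statistic is assumed to take the usual form z = (p - 𝛑) / √(𝛑(1-𝛑)/n), where 𝛑 is the hypothesized proportion; all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Identify the hypotheses (hypothetical): Ho: proportion = 0.80, H1: proportion < 0.80
pi_0 = 0.80
x, n = 1550, 2000                   # hypothetical: 1,550 "successes" out of 2,000
alpha = 0.05

# Normal approximation conditions from the text
assert n * pi_0 >= 5 and n * (1 - pi_0) >= 5

# Calculate the test statistic
p = x / n
z = (p - pi_0) / sqrt(pi_0 * (1 - pi_0) / n)

# Find the critical value and the p-value for a left-tailed test
critical_value = NormalDist().inv_cdf(alpha)
p_value = NormalDist().cdf(z)

# Apply the decision rule and make a decision
reject_null = z < critical_value
print(round(z, 2), round(critical_value, 2), reject_null)
```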
HYPOTHESIS TEST ABOUT A POPULATION MEAN (INDEPENDENT TWO-SAMPLE, Z TEST)

A two-sample z-test is a statistical hypothesis test that compares the means of two independent samples to determine if they come from the same population.
When to use a two-sample z-test
The two samples must be independent.

The sample size for each sample must be large enough (𝑛≥30).

The population standard deviations for both samples are unknown, but each sample contains at least 30 randomly selected observations.
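Under the conditions above, a two-sample z test can be sketched as follows. The statistic is assumed to take the usual large-sample form z = (x̄1 - x̄2) / √(s1²/n1 + s2²/n2), with the sample standard deviations standing in for the unknown population values. The function name and summary figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z_test(mean1, s1, n1, mean2, s2, n2):
    """Two-tailed z test of Ho: the two population means are equal (n1, n2 >= 30)."""
    std_error = sqrt(s1**2 / n1 + s2**2 / n2)   # standard error of the difference
    z = (mean1 - mean2) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical summary statistics for two independent samples
z, p_value = two_sample_z_test(mean1=5.5, s1=0.4, n1=50, mean2=5.3, s2=0.3, n2=100)
reject_null = p_value < 0.05
print(round(z, 2), reject_null)
```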
HYPOTHESIS TEST ABOUT A POPULATION MEAN (INDEPENDENT TWO-SAMPLE, T TEST)

A two-sample t-test, also called an independent samples t-test, is a statistical test that compares the means of two independent groups.

It is used to determine if there's a statistically significant difference between the two groups.
When to use an independent two-sample t test
To use the t statistic to test for the difference between two means, the following assumptions have to be satisfied:

The two sampled populations approximate the normal distribution.

The two samples must be independent.

The two population standard deviations are equal (i.e., equal variances).
ILLUSTRATION

A researcher wants to compare the mean test scores of two groups of students who were taught using different teaching methods (Method A and Method B).
The researcher collects the following data:

METHOD A   METHOD B
85         78
90         82
88         80
92         85
87         83
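The illustration data can be run through a pooled-variance t test. This is a sketch, not necessarily the deck's own worked solution: it assumes a two-tailed test at a 0.05 significance level, the equal-variances form t = (x̄A - x̄B) / √(sp²(1/nA + 1/nB)), and a critical value read from a standard t table.

```python
from math import sqrt
from statistics import mean, variance

method_a = [85, 90, 88, 92, 87]
method_b = [78, 82, 80, 85, 83]

n_a, n_b = len(method_a), len(method_b)
df = n_a + n_b - 2                                  # degrees of freedom

# Pooled variance: weighted average of the two sample variances
pooled_variance = ((n_a - 1) * variance(method_a)
                   + (n_b - 1) * variance(method_b)) / df

std_error = sqrt(pooled_variance * (1 / n_a + 1 / n_b))
t_statistic = (mean(method_a) - mean(method_b)) / std_error

# Two-tailed critical value for a 0.05 significance level and 8 degrees
# of freedom, read from a standard t table
T_CRITICAL = 2.306
reject_null = abs(t_statistic) > T_CRITICAL

print(df, round(t_statistic, 2), reject_null)
```

The sample means are 88.4 and 81.6, and both sample variances equal 7.3, so the test statistic of about 3.98 exceeds 2.306 and the null hypothesis of equal means is rejected under these assumptions.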
HYPOTHESIS TEST ABOUT A PROPORTION (Two-Sample, z Test)

We can also conduct statistical hypothesis testing for a proportion for two samples, which is similar to a one-sample hypothesis test.
A hypothesis test for two proportions compares the proportions of two independent samples.
In implementing a statistical hypothesis test of proportions for two samples, we will continue to apply the five-step hypothesis testing procedure.
Hypothesis Test about a Proportion (Two-Sample, z Test)

Equation 5.18 represents the pooled proportion, that is, the proportion that has the trait in the combined samples, which is the pooled estimate of the population proportion. We then substitute Equation 5.18 into Equation 5.19.
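The pooling step can be sketched as follows. It assumes the usual forms: the pooled proportion is (x1 + x2) / (n1 + n2), and the test statistic is z = (p1 - p2) / √(pc(1-pc)/n1 + pc(1-pc)/n2), which are presumably what Equations 5.18 and 5.19 express. The function name and counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-tailed z test of Ho: the two population proportions are equal."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)              # pooled proportion (assumed Equation 5.18)
    std_error = sqrt(pooled * (1 - pooled) / n1 + pooled * (1 - pooled) / n2)
    z = (p1 - p2) / std_error                   # test statistic (assumed Equation 5.19)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return pooled, z, p_value

# Hypothetical counts: 19 of 100 in the first sample, 62 of 200 in the second
pooled, z, p_value = two_proportion_z_test(19, 100, 62, 200)
reject_null = p_value < 0.05
print(round(pooled, 2), round(z, 2), reject_null)
```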
