Fds Unit 3 Final Correction

AD3491 FUNDAMENTALS OF DATA SCIENCE AND ANALYTICS

II YEAR / IV SEMESTER (B.Tech - ARTIFICIAL INTELLIGENCE AND DATA SCIENCE)

UNIT – III
INFERENTIAL STATISTICS

PREPARED BY

S. SANTHI PRIYA, M.E., (AP/AI&DS)

VERIFIED BY

HOD PRINCIPAL CEO/CORRESPONDENT

DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

SENGUNTHAR COLLEGE OF ENGINEERING, TIRUCHENGODE - 637 205.


UNIT III
INFERENTIAL STATISTICS
• Populations
• Samples
• Random sampling
• Sampling distribution
• Standard error of the mean
• Hypothesis testing
• z-test
• z-test procedure
• Decision rule
• Calculations
• Decisions
• Interpretations
• One-tailed and two-tailed tests
• Estimation
• Point estimate
• Confidence interval
• Level of confidence
• Effect of sample size

LIST OF IMPORTANT QUESTIONS

UNIT III
INFERENTIAL STATISTICS

PART A (2 marks)
1. What does population mean?

2. What is a population, and what are its types?

3. What is meant by a finite population?

4. What is meant by an infinite population?

5. What is meant by an existent population?

6. What is meant by a hypothetical population?

7. What is meant by a sample?

8. Explain probability sampling.

9. Explain non-probability sampling.

10. What is the difference between a population and a sample?

PART B (16 marks)

1. Explain the differences between a population and a sample.

2. Write a step-by-step guide to hypothesis testing with easy examples.

3. Write a real-time example of hypothesis testing.

4. What is meant by a z-test, and what are its types?

5. Write the differences between the z-test and the t-test.

6. A telecom service provider claims that individual customers pay on average Rs. 400 per month,
with a standard deviation of Rs. 25. A random sample of 50 customers' bills during a given month
is taken, with a mean of Rs. 250 and a standard deviation of Rs. 15. What can be said with
respect to the claim made by the service provider?

UNIT III
INFERENTIAL STATISTICS
PART A

1. What does population mean?

A population is the complete group of individuals under study, whether that group comprises
a nation or a group of people with a common characteristic. In statistics, a population is the
pool of individuals from which a statistical sample is drawn for a study.

2. What is a population, and what are its types?

A population is the complete set of units under study. The types of population are:
• Finite population
• Infinite population
• Existent population
• Hypothetical population

3. What is meant by a finite population?

A finite population is also known as a countable population: its units can be counted. In
other words, it is a population of individuals or objects that is finite in number. For statistical
analysis, a finite population is more advantageous than an infinite population.

Examples of finite populations are the employees of a company and the potential consumers in
a market.

4. What is meant by an infinite population?

An infinite population is also known as an uncountable population: counting the units in the
population is not possible.

An example of an infinite population is the germs in a patient's body, whose number is
uncountable.

5. What is meant by an existent population?

An existent population is defined as a population of concrete individuals. In other words, a
population whose units are available in solid form is known as an existent population.
Examples are books, students, etc.

6. What is meant by a hypothetical population?

A population whose units are not available in solid form is known as a hypothetical
population. A population consists of sets of observations, objects, etc. that all have
something in common; in some situations, the population is only hypothetical.

Examples are the outcomes of rolling a die and the outcomes of tossing a coin.

7. What is meant by a sample?

A sample includes one or more observations that are drawn from the population. A measurable
characteristic of a sample is called a statistic. Sampling is the process of selecting the
sample from the population. For example, a group of people living in India is a sample of the
population of India.

Basically, there are two types of sampling. They are:

• Probability sampling
• Non-probability sampling

8. Explain probability sampling.

In probability sampling, the population units cannot be selected at the discretion of the
researcher. Instead, the selection follows defined procedures which ensure that every unit
of the population has a fixed, known probability of being included in the sample. Such a method
is also called random sampling. Some of the techniques used for probability sampling are:

• Simple random sampling
• Cluster sampling
• Stratified sampling
• Disproportionate sampling
• Proportionate sampling
• Optimum allocation stratified sampling
• Multi-stage sampling

9. Explain non-probability sampling.

In non-probability sampling, the population units can be selected at the discretion of the
researcher. Such samples rely on human judgement for selecting units and have no
theoretical basis for estimating the characteristics of the population. Some of the techniques
used for non-probability sampling are:

• Quota sampling
• Judgement sampling
• Purposive sampling

10. What are the population and sample formulas?

Here are the formulas for the mean absolute deviation (MAD), variance and standard deviation
for a population and for a sample. Suppose n denotes the number of observations and x̄ their
mean. The population formulas divide by n, while the sample formulas divide by n − 1:

$$\text{Population MAD} = \frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$$

$$\text{Population Variance} = (\sigma_x)^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

$$\text{Sample MAD} = \frac{1}{n-1}\sum_{i=1}^{n} |x_i - \bar{x}|$$

$$\text{Sample Variance} = (S_x)^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

$$\text{Population Standard Deviation} = \sigma_x = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

$$\text{Sample Standard Deviation} = S_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
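The difference between the population and sample divisors can be checked numerically. The following is a minimal sketch in Python; the data values are illustrative assumptions, and the standard-library `statistics` module is used only as a cross-check.

```python
import statistics

# Illustrative data (an assumption for this sketch, not from a real study).
data = [5, 10, 12, 15, 20]

n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean, shared by both formulas.
sum_sq = sum((x - mean) ** 2 for x in data)

pop_var = sum_sq / n          # population variance: divide by n
samp_var = sum_sq / (n - 1)   # sample variance: divide by n - 1

pop_sd = pop_var ** 0.5       # population standard deviation
samp_sd = samp_var ** 0.5     # sample standard deviation

# Cross-check against the standard library implementations.
assert abs(pop_var - statistics.pvariance(data)) < 1e-9
assert abs(samp_var - statistics.variance(data)) < 1e-9

print(pop_var, samp_var, pop_sd, samp_sd)
```

The sample variance is always the larger of the two for the same data, because the same sum of squares is divided by a smaller number.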

11. Difference between Population and Sample
Some of the key differences between population and sample are given below:

Comparison      | Population                                    | Sample
----------------|-----------------------------------------------|----------------------------------------
Meaning         | Collection of all the units or elements that  | A subgroup of the members of the
                | possess common characteristics                | population
Includes        | Each and every element of a group             | Only a handful of units of the population
Characteristics | Parameter                                     | Statistic
Data collection | Complete enumeration or census                | Sampling or sample survey
Focus on        | Identification of the characteristics         | Making inferences about the population

12. What is simple random sampling?

Simple random sampling is a technique where every item in the population has an equal
chance of being selected. Here, the selection of items depends entirely on chance;
therefore, this sampling technique is also sometimes known as a method of chance.

Simple random sampling is a fundamental sampling method and can easily be a component of a
more complex sampling method. The main attribute of this sampling method is that every
sample of a given size has the same probability of being chosen.

13. Write the simple random sampling formula.

Consider a hospital that has 1000 staff members and needs to allocate a night shift to 100
of them. All their names are put in a bucket to be randomly selected. Since each person
has an equal chance of being selected, and since we know the population size (N) and sample
size (n), the probability P that a given person is selected can be calculated as follows:

$$P = 1 - \frac{N-1}{N}\cdot\frac{N-2}{N-1}\cdots\frac{N-n}{N-(n-1)}$$

Cancelling the common factors:

$$P = 1 - \frac{N-n}{N} = \frac{n}{N} = \frac{100}{1000} = 10\%$$
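The cancellation in the product can be verified with exact fractions. This is a small sketch; the function name `inclusion_probability` is a hypothetical helper introduced here for illustration.

```python
from fractions import Fraction

def inclusion_probability(N: int, n: int) -> Fraction:
    """Probability that one given unit is among n draws without replacement from N units."""
    p_never_picked = Fraction(1)
    for k in range(n):
        # Before draw k+1 there are N - k units left; the given unit is
        # missed on that draw with probability (N - k - 1) / (N - k).
        p_never_picked *= Fraction(N - k - 1, N - k)
    return 1 - p_never_picked

# Hospital example: N = 1000 staff members, n = 100 night-shift slots.
print(inclusion_probability(1000, 100))  # prints 1/10
```

The product telescopes to (N − n)/N, so the result equals n/N for any valid N and n.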

14. Write the advantages of simple random sampling.

• It is a fair sampling method and, if applied appropriately, it helps reduce bias
compared with other sampling methods.
• Since it involves a large sampling frame, it is usually easy to pick a smaller sample
from the existing larger population.
• The person conducting the research doesn't need to have prior knowledge of the data he or
she is collecting. One can simply ask questions to gather the data; the researcher need not
be a subject expert.
• This sampling method is a fundamental method of collecting data. You don't need any
technical knowledge. You only require essential listening and recording skills.
• Since the population size is large in this type of sampling method, there is no restriction on
the sample size that the researcher needs to create. From a larger population, you can
get a small sample quite quickly.
• The data collected through this sampling method is well informed; the more samples there
are, the better the quality of the data.

15. Write the steps for simple random sampling.

• Make a list of all the employees working in the organization. (In this example there
are 500 employees in the organization, so the record must contain 500 names.)
• Assign a sequential number to each employee (1, 2, 3, …, n). This is your sampling frame
(the list from which you draw your simple random sample).
• Figure out what your sample size is going to be. (In this case, the sample size is 100.)
• Use a random number generator to select the sample, using your sampling frame
(population size) from Step 2 and your sample size from Step 3.
• For example, if your sample size is 100 and your population is 500, generate 100 distinct
random numbers between 1 and 500.
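The steps above can be sketched with Python's standard `random` module. The function name and the fixed seed are assumptions made for this illustration; `random.Random.sample` draws without replacement, so the selected numbers are distinct.

```python
import random

def simple_random_sample(population_size: int, sample_size: int, seed=None):
    """Draw distinct employee numbers from a sampling frame numbered 1..population_size."""
    rng = random.Random(seed)                  # seed only makes the demo repeatable
    frame = range(1, population_size + 1)      # Step 2: the sampling frame
    return sorted(rng.sample(frame, sample_size))  # Step 4: draw without replacement

# Steps 1 and 3: 500 employees, sample size 100.
chosen = simple_random_sample(500, 100, seed=42)
print(len(chosen), chosen[:5])
```

In practice the seed would be omitted so that each run gives a fresh random selection.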

16. What is meant by sampling distribution?

A sampling distribution is the probability distribution of a statistic obtained from
random samples of a given population. Also known as a finite-sample distribution, it
describes how much the values of the statistic vary from sample to sample for a specific
population.

17. Write down the formula for calculating the standard error.

The standard error of an estimate can be calculated as the standard deviation divided by
the square root of the sample size:

SE = σ / √n
Where,

σ = the population standard deviation

√n = the square root of the sample size

If the population standard deviation is not known, you can substitute the sample standard
deviation, s, in the numerator to approximate the standard error.
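Because the sample size appears under a square root, the standard error shrinks as n grows. A minimal sketch, assuming an illustrative population standard deviation of 15 (the value and the function name are assumptions for this example):

```python
import math

def standard_error(sd: float, n: int) -> float:
    """SE = sd / sqrt(n); sd may be sigma or, as an approximation, the sample s."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sd / math.sqrt(n)

sigma = 15.0  # assumed population standard deviation
for n in (25, 100, 400):
    # Quadrupling the sample size halves the standard error.
    print(n, standard_error(sigma, n))
```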

18. What is meant by the standard error of the mean (SEM)?

The standard error of the mean, also called the standard deviation of the mean, is the
standard deviation of the sampling distribution of the sample mean. It is abbreviated as SEM.

For example, the usual estimator of the population mean is the sample mean. But if we draw
another sample from the same population, it may give a different value.

Thus, there is a population of sample means with its own mean and variance. The SEM may be
defined as the standard deviation of the sample means of all the possible samples taken from
the same population. When it is computed from a sample, the SEM is an estimate of that
standard deviation. It is calculated as the ratio of the standard deviation to the square root
of the sample size:

SEM = σ / √n

19. How to calculate the standard error?

Step 1: Note the number of measurements (n) and determine the sample mean (x̄). It is the
average of all the measurements.

Step 2: Determine how much each measurement deviates from the mean.

Step 3: Square all the deviations determined in step 2 and add them together: Σ(xi – x̄)²

Step 4: Divide the sum from step 3 by one less than the total number of measurements (n-1).

Step 5: Take the square root of the obtained number, which is the sample standard deviation (s).

Step 6: Finally, divide the standard deviation obtained by the square root of the number of
measurements (n) to get the standard error of your estimate.
Go through the example given below to understand the method of calculating standard error.

20. Calculate the standard error of the given data: y: 5, 10, 12, 15, 20

First we have to find the mean of the given data:

Mean = (5+10+12+15+20)/5 = 62/5 = 12.4
Now, the sample standard deviation can be calculated as:
S = √( Σ(yi – Mean)² / (n – 1) )
Hence,
S = √( (54.76 + 5.76 + 0.16 + 6.76 + 57.76) / 4 ) = √(125.2 / 4) = √31.3
After solving the above equation, we get:

S = 5.59
Therefore, SE can be estimated with the formula:
SE = S/√n
SE = 5.59/√5 = 2.50
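As a check on the arithmetic, the calculation can be recomputed from the raw data by following the six steps for the standard error, dividing by n − 1 in step 4. This is a sketch using only the standard library; the variable names are chosen for this illustration.

```python
import math

y = [5, 10, 12, 15, 20]

n = len(y)
mean = sum(y) / n                          # Step 1: sample mean
deviations = [v - mean for v in y]         # Step 2: deviation of each value
sum_sq = sum(d ** 2 for d in deviations)   # Step 3: sum of squared deviations
variance = sum_sq / (n - 1)                # Step 4: divide by n - 1
s = math.sqrt(variance)                    # Step 5: sample standard deviation
se = s / math.sqrt(n)                      # Step 6: standard error

print(round(mean, 2), round(s, 2), round(se, 2))
```

Run on this data, the sketch gives a mean of 12.4, a sample standard deviation of about 5.59 and a standard error of about 2.50.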

21. What is a good standard error?

The SE indicates the expected precision of the sample mean as an estimate of the
population mean. The bigger the standard error, the greater the spread and the likelihood
that any given sample mean is not close to the population mean. A small standard error is
thus a good attribute.
22. What is a big standard error?

A big standard error means that the sample means are widely spread, so the statistic is a
less accurate estimate of the population parameter.
23. What is hypothesis testing?
In hypothesis testing, analysts use a random sample from a population to test two competing
hypotheses: the null hypothesis and the alternative hypothesis. The null hypothesis is usually
a hypothesis of equality between population parameters; e.g., a null hypothesis may state that
the population mean return is equal to zero.
24. What are the 5 steps of hypothesis testing?
Step 1: State your null and alternate hypothesis. ...
Step 2: Collect data. ...
Step 3: Perform a statistical test. ...
Step 4: Decide whether to reject or fail to reject your null hypothesis. ...
Step 5: Present your findings.
26. What are the 3 main types of hypothesis?
• Simple hypothesis
• Complex hypothesis
• Directional hypothesis

27. What is a z-test?

A z-test is a statistical test used to determine whether two population means are different
when the variances are known and the sample size is large.

28. What is the difference between a z-test and a t-test?
A z-test is a statistical hypothesis test used to determine whether the means of two samples
are different when the standard deviation is known and the sample is large.
In contrast, a t-test determines how the averages of different data sets differ when the
standard deviation or the variance is unknown.
29. What are the types of z-test?
The following types of z-test are used to perform different types of hypothesis testing:
• One-sample z-test for means
• Two-sample z-test for means
• One-sample z-test for proportions
• Two-sample z-test for proportions

30. How do I calculate the z test statistic?

• Compute the arithmetic mean of your sample.
• From this mean, subtract the mean postulated in the null hypothesis.
• Multiply by the square root of the sample size.
• Divide by the population standard deviation.
• That's it, you've just computed the z test statistic.
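The steps for the z test statistic translate directly into a short function. This is a sketch; the function name and the numbers in the usage line are hypothetical values chosen for illustration.

```python
import math

def z_statistic(sample_mean: float, hypothesized_mean: float,
                population_sd: float, n: int) -> float:
    """(sample mean - hypothesized mean) * sqrt(n) / population standard deviation."""
    difference = sample_mean - hypothesized_mean   # subtract the postulated mean
    scaled = difference * math.sqrt(n)             # multiply by sqrt of sample size
    return scaled / population_sd                  # divide by population SD

# Hypothetical numbers: sample mean 52, H0 mean 50, sigma 10, n = 100.
z = z_statistic(sample_mean=52.0, hypothesized_mean=50.0, population_sd=10.0, n=100)
print(z)
```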
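31. What is a decision rule? Give an example.
A decision rule is a simple IF-THEN statement consisting of a condition (also called the
antecedent) and a prediction. For example: IF it rains today AND it is April (condition),
THEN it will rain tomorrow (prediction). In hypothesis testing, the decision rule states the
condition under which the null hypothesis is rejected, for example: IF the z statistic falls
in the rejection region, THEN reject the null hypothesis.

Figure 1: Example of a decision rule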

32. What is a decision? Give an example.
A decision is the act of, or need for, making up one's mind, as in "this is a difficult
decision". It is also something that is decided; a resolution, as in "she made a poor decision
when she dropped out of school".
33. What are the 4 types of decisions?

The four decision-making styles are:

• Analytical
• Directive
• Conceptual
• Behavioral
34. What is Data Interpretation?

Data interpretation is the process of reviewing data through some predefined processes
which will help assign some meaning to the data and arrive at a relevant conclusion. It involves
taking the result of data analysis, making inferences on the relations studied, and using them to
conclude.

35. What are Data Interpretation Methods?

Data interpretation methods are how analysts help people make sense of numerical
data that has been collected, analyzed and presented. Data, when collected in raw form, may be
difficult for the layman to understand, which is why analysts need to break down the information
gathered so that others can make sense of it.

For example, when founders are pitching to potential investors, they must interpret data (e.g.
market size, growth rate, etc.) for better understanding. There are 2 main methods in which this
can be done, namely; quantitative methods and qualitative methods.

36. What are the types of interpretation of data?


There are two methods to interpret data: quantitative method and qualitative method.
Types of data interpretation include bar graphs, line graphs, histograms, heat maps, tables,
scatter plots and pie charts.
37. What are the five steps for data interpretation?
Steps of data analysis:
Step 1: Determining the objective.
Step 2: Gathering the data.
Step 3: Cleaning the data.
Step 4: Interpreting the data.
Step 5: Sharing the results.
38. What is a one-tailed test?
A one-tailed test is a statistical test in which the critical area of a distribution is one-sided,
so that it is either greater than or less than a certain value, but not both. If the test
statistic falls into the one-sided critical area, the null hypothesis is rejected in favour of
the alternative hypothesis.
39. What is a two-tailed test example?
For example, let's say you were running a z test with an alpha level of 5% (0.05). In a one
tailed test, the entire 5% would be in a single tail. But with a two tailed test, that 5% is split
between the two tails, giving you 2.5% (0.025) in each tail.
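The effect of splitting alpha between the tails can be seen by comparing the critical values. The following sketch uses `statistics.NormalDist` from the Python standard library; the z statistic of 1.80 is a hypothetical value chosen to fall between the two cutoffs.

```python
from statistics import NormalDist

alpha = 0.05
z = 1.80  # hypothetical test statistic, chosen for illustration

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# One-tailed (right): the whole 5% sits in one tail.
one_tailed_critical = std_normal.inv_cdf(1 - alpha)
# Two-tailed: 2.5% in each tail, so the cutoff moves further out.
two_tailed_critical = std_normal.inv_cdf(1 - alpha / 2)

reject_one_tailed = z > one_tailed_critical
reject_two_tailed = abs(z) > two_tailed_critical

print(round(one_tailed_critical, 3), round(two_tailed_critical, 3))
print(reject_one_tailed, reject_two_tailed)
```

With these assumed numbers the same statistic is significant in the one-tailed test but not in the two-tailed test, because the two-tailed cutoff (about 1.96) is further from zero than the one-tailed cutoff (about 1.645).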
40. When would you use a one-tailed test?
Use a one-tailed test when only one direction of the effect is of interest. While you gain
more statistical power in that direction, the test has absolutely no power in the other
direction. Suppose you are testing a new vaccine and want to determine whether it is
better than the current vaccine. You use a one-tailed test to improve the test's ability to
detect whether the new vaccine is better.
41. What is meant by estimation in statistics?
Estimation is concerned with inference about the numerical value of unknown
population values from incomplete data such as a sample. If a single figure is
calculated for each unknown parameter, the process is called point estimation.
42. What is the best definition of estimation?
Estimation is the act of estimating something; it also refers to the value, amount, or size
arrived at in an estimate.
43. What are the different confidence levels?
The most common confidence levels are 90%, 95% and 99%. The following table
contains a summary of the critical values of z corresponding to these common confidence
levels for a two-sided interval. (Note that the "confidence coefficient" is merely the
confidence level reported as a proportion rather than as a percentage.)

Confidence level | Confidence coefficient | Critical value of z
90%              | 0.90                   | 1.645
95%              | 0.95                   | 1.960
99%              | 0.99                   | 2.576
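The critical values of z for the common confidence levels can be computed from the standard normal distribution. This is a minimal sketch using `statistics.NormalDist`; the function name `z_multiplier` is a hypothetical helper, and a two-sided interval is assumed.

```python
from statistics import NormalDist

def z_multiplier(confidence_level_percent: float) -> float:
    """Two-sided critical z value for a confidence level given as a percentage."""
    coefficient = confidence_level_percent / 100   # confidence coefficient
    alpha = 1 - coefficient                        # total area left in the tails
    # Half of alpha goes in each tail of the standard normal distribution.
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (90, 95, 99):
    print(level, round(z_multiplier(level), 3))
```

A higher confidence level gives a larger multiplier, and therefore a wider interval for the same data.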
44. What are the 3 types of confidence?
Here are the 3 types of confidence one can have:
• Self-Centered Confidence
• Perfection-Seeking Confidence
• Faith-filled Confidence

PART B (16 marks)
1. Differences Between Population and Sample

Now, try to understand what a sample and a population are, with the help of suitable examples.

Population                                         | Sample
---------------------------------------------------|---------------------------------------------------
All residents of a country would constitute the    | All residents who live above the poverty line
Population set                                     | would be the Sample
All residents above the poverty line in a country  | All residents who are millionaires would make up
would be the Population                            | the Sample
All employees in an office would be the Population | Out of all the employees, all managers in the
                                                   | office would be the Sample

Table 1: Population vs Sample

2. Write a step-by-step guide to hypothesis testing with easy examples.

Hypothesis testing is a formal procedure for investigating our ideas about the world
using statistics. It is most often used by scientists to test specific predictions, called hypotheses,
that arise from theories.

There are 5 main steps in hypothesis testing:

1. State your research hypothesis as a null hypothesis (H0) and an alternate hypothesis
(Ha or H1).
2. Collect data in a way designed to test the hypothesis.
3. Perform an appropriate statistical test.
4. Decide whether to reject or fail to reject your null hypothesis.
5. Present the findings in your results and discussion section.

Step 1: State your null and alternate hypothesis

After developing your initial research hypothesis (the prediction that you want to
investigate), it is important to restate it as a null (H0) and alternate (Ha) hypothesis so that you
can test it mathematically.

The alternate hypothesis is usually your initial hypothesis that predicts a relationship between
variables. The null hypothesis is a prediction of no relationship between the variables you are
interested in.

Hypothesis testing example


You want to test whether there is a relationship between gender and height. Based on your
knowledge of human physiology, you formulate a hypothesis that men are, on average, taller
than women. To test this hypothesis, you restate it as:

H0: Men are, on average, not taller than women.


Ha: Men are, on average, taller than women.

Step 2: Collect data

For a statistical test to be valid, it is important to perform sampling and collect data in a
way that is designed to test your hypothesis. If your data are not representative, then you
cannot make statistical inferences about the population you are interested in.

Hypothesis testing example :


To test differences in average height between men and women, your sample should have
an equal proportion of men and women, and cover a variety of socio-economic classes and any
other control variables that might influence average height.
You should also consider your scope (Worldwide? For one country?) A potential data source in
this case might be census data, since it includes data from a variety of regions and social
classes and is available for many countries around the world.

Step 3: Perform a statistical test

There are a variety of statistical tests available, but they are all based on the comparison
of within-group variance (how spread out the data is within a category) versus between-group
variance (how different the categories are from one another).

If the between-group variance is large enough that there is little or no overlap between
groups, then your statistical test will reflect that by showing a low p-value. This means it is
unlikely that the differences between these groups came about by chance.
Alternatively, if there is high within-group variance and low between-group variance, then
your statistical test will reflect that with a high p-value. This means it is likely that any difference
you measure between groups is due to chance.

Your choice of statistical test will be based on the type of variables and the level of
measurement of your collected data.

Hypothesis testing example:

Based on the type of data you collected, you perform a one-tailed t-test to test whether men
are in fact taller than women. This test gives you:

• an estimate of the difference in average height between the two groups.
• a p-value showing how likely you are to see this difference if the null hypothesis of
no difference is true.

Your t-test shows an average height of 175.4 cm for men and an average height of 161.7 cm for
women, with an estimate of the true difference ranging from 10.2 cm to infinity. The p-value is
0.002.

Step 4: Decide whether to reject or fail to reject your null hypothesis

Based on the outcome of your statistical test, you will have to decide whether to reject or
fail to reject your null hypothesis.

In most cases you will use the p-value generated by your statistical test to guide your
decision. And in most cases, your predetermined level of significance for rejecting the null
hypothesis will be 0.05 – that is, when there is a less than 5% chance that you would see these
results if the null hypothesis were true.

In some cases, researchers choose a more conservative level of significance, such as
0.01 (1%). This minimizes the risk of incorrectly rejecting the null hypothesis (Type I error).

Hypothesis testing example :


In your analysis of the difference in average height between men and women, you
find that the p-value of 0.002 is below your cutoff of 0.05, so you decide to reject your null
hypothesis of no difference.

Step 5: Present your findings

The results of hypothesis testing will be presented in the results and discussion
sections of your research paper, dissertation or thesis.
In the results section you should give a brief summary of the data and a summary of the
results of your statistical test (for example, the estimated difference between group means and
associated p-value). In the discussion, you can discuss whether your initial hypothesis was
supported by your results or not.

Stating results in a statistics assignment example:


In our comparison of mean height between men and women we found an average
difference of 13.7 cm and a p-value of 0.002; therefore, we can reject the null hypothesis that
men are not taller than women and conclude that there is likely a difference in height between
men and women.

3. Write a real-time example of hypothesis testing.

Peppermint Essential Oil:
Essential oils are becoming more and more popular. Chamomile, lavender, and
ylang-ylang are commonly touted as anxiety remedies. Perhaps you'd like to test the healing
powers of peppermint essential oil. Your hypothesis might go something like this:

• Null hypothesis - Peppermint essential oil has no effect on the pangs of anxiety.
• Alternative hypothesis - Peppermint essential oil alleviates the pangs of anxiety.
• Significance level - The significance level is 0.25 (allowing for a better shot at proving
your alternative hypothesis).
• P-value - The p-value is calculated as 0.05.
• Conclusion - After providing one group with peppermint oil and the other with a
placebo, you gauge the difference between the two based on self-reported levels of
anxiety. Based on your calculations, the difference between the two groups is
statistically significant with a p-value of 0.05, well below the defined alpha of 0.25. You
conclude that your study supports the alternative hypothesis that peppermint essential
oil can alleviate the pangs of anxiety.

4. What is meant by a z test, and what are its types?
A z test is a test that is used to check whether the means of two populations are different,
provided the data follows a normal distribution. For this purpose, the null hypothesis and the
alternative hypothesis must be set up and the value of the z test statistic must be calculated. The
decision criterion is based on the z critical value.

Z Test Definition

A z test is conducted on a population that follows a normal distribution with independent
data points and has a sample size that is greater than or equal to 30. It is used to check whether
the means of two populations are equal to each other when the population variance is
known. The null hypothesis of a z test can be rejected if the z test statistic is statistically
significant when compared with the critical value.

Z Test Formula

The z test formula compares the z statistic with the z critical value to test whether there is
a difference in the means of two populations. In hypothesis testing, the z critical value divides
the distribution graph into the acceptance and the rejection regions. If the test statistic falls in the
rejection region then the null hypothesis can be rejected otherwise it cannot be rejected. The z
test formula to set up the required hypothesis tests for a one sample and a two-sample z test are
given below.

One-Sample Z Test

A one-sample z test is used to check if there is a difference between the sample
mean and the population mean when the population standard deviation is known. The formula
for the z test statistic is:

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

where $\bar{x}$ is the sample mean, $\mu$ is the population mean, $\sigma$ is the population
standard deviation and $n$ is the sample size.

The algorithm to set up a one-sample z test based on the z test statistic is given as follows:

Left Tailed Test:

Null Hypothesis: $H_0 : \mu = \mu_0$

Alternate Hypothesis: $H_1 : \mu < \mu_0$

Decision Criteria: If the z statistic < z critical value then reject the null hypothesis.

Right Tailed Test:

Null Hypothesis: $H_0 : \mu = \mu_0$

Alternate Hypothesis: $H_1 : \mu > \mu_0$

Decision Criteria: If the z statistic > z critical value then reject the null hypothesis.

Two Tailed Test:

Null Hypothesis: $H_0 : \mu = \mu_0$

Alternate Hypothesis: $H_1 : \mu \neq \mu_0$

Decision Criteria: If the absolute value of the z statistic > z critical value then reject the
null hypothesis.
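The three decision criteria can be collected into one function. This is a sketch, not a definitive implementation: the function name, the `tail` parameter and the example numbers are assumptions, and the two-tailed branch compares the absolute value of z with the positive critical value.

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05, tail="two"):
    """Return (z, critical value, reject H0?) for a one-sample z test."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    std_normal = NormalDist()
    if tail == "left":
        critical = std_normal.inv_cdf(alpha)          # negative cutoff
        reject = z < critical
    elif tail == "right":
        critical = std_normal.inv_cdf(1 - alpha)      # positive cutoff
        reject = z > critical
    elif tail == "two":
        critical = std_normal.inv_cdf(1 - alpha / 2)  # alpha split over both tails
        reject = abs(z) > critical
    else:
        raise ValueError("tail must be 'left', 'right' or 'two'")
    return z, critical, reject

# Hypothetical numbers: H0 mean 100, sigma 15, n = 36.
print(one_sample_z_test(103, 100, 15, 36, tail="two"))
print(one_sample_z_test(105, 100, 15, 36, tail="right"))
```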

Two Sample Z Test

A two sample z test is used to check if there is a difference between the means of two
samples. The z test statistic formula is given as follows:

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

$\bar{x}_1$, $\mu_1$, $\sigma_1^2$ are the sample mean, population mean and population variance
respectively for the first sample. $\bar{x}_2$, $\mu_2$, $\sigma_2^2$ are the sample mean,
population mean and population variance respectively for the second sample.

The two-sample z test can be set up in the same way as the one-sample test. However, this test
will be used to compare the means of the two samples. For example, the null hypothesis is given
as $H_0 : \mu_1 = \mu_2$.
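A minimal sketch of the two-sample z statistic follows. The function name and the example figures are hypothetical; under the null hypothesis of equal means the hypothesized difference is 0.

```python
import math

def two_sample_z(mean1, mean2, var1, var2, n1, n2, hypothesized_diff=0.0):
    """z = ((mean1 - mean2) - hypothesized_diff) / sqrt(var1/n1 + var2/n2)."""
    # var1 and var2 are the known population variances of the two groups.
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    return ((mean1 - mean2) - hypothesized_diff) / standard_error

# Hypothetical numbers: means 75 and 72, variances 36 and 64, sizes 36 and 64.
z = two_sample_z(mean1=75.0, mean2=72.0, var1=36.0, var2=64.0, n1=36, n2=64)
print(round(z, 3))
```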

Figure 2 : Rejection Region

Z Test for Proportions

A z test for proportions is used to check the difference in proportions. A z test can either
be used for one proportion or two proportions. The formulas are given as follows.

One Proportion Z Test

• A one proportion z test is used when there are two groups and compares the value of an
observed proportion to a theoretical one. The z test statistic for a one proportion z test is
given as follows:

$$z = \frac{p - p_0}{\sqrt{\dfrac{p_0 (1 - p_0)}{n}}}$$

• Here, $p$ is the observed value of the proportion, $p_0$ is the theoretical proportion value
and $n$ is the sample size.
• The null hypothesis is that the two proportions are the same while the alternative
hypothesis is that they are not the same.

Two Proportion Z Test

A two proportion z test is conducted on two proportions to check if they are the same or
not. The test statistic formula is given as follows:

$$z = \frac{p_1 - p_2 - 0}{\sqrt{p (1 - p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

where

$$p = \frac{x_1 + x_2}{n_1 + n_2}$$

$p_1$ is the proportion of sample 1, with sample size $n_1$ and $x_1$ successes.

$p_2$ is the proportion of sample 2, with sample size $n_2$ and $x_2$ successes.
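Both proportion statistics can be sketched in a few lines. The function names and example counts are hypothetical, and x1 and x2 are treated here as the counts of successes in each sample, so that each sample proportion is x/n.

```python
import math

def one_proportion_z(p_observed: float, p0: float, n: int) -> float:
    """z = (p - p0) / sqrt(p0 * (1 - p0) / n)."""
    return (p_observed - p0) / math.sqrt(p0 * (1 - p0) / n)

def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """z = (p1 - p2) / sqrt(p * (1 - p) * (1/n1 + 1/n2)) with pooled proportion p."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)   # pooled proportion under H0: p1 = p2
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error

# Hypothetical numbers for illustration.
z_one = one_proportion_z(p_observed=0.6, p0=0.5, n=100)
z_two = two_proportion_z(x1=60, n1=100, x2=50, n2=100)
print(round(z_one, 3), round(z_two, 3))
```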

5. Write the differences between the z test and the t-test.

Both the z test and the t-test are univariate tests used on the means of two datasets. The
differences between the two tests are outlined in the table given below:

Z Test                                              | T-Test
----------------------------------------------------|----------------------------------------------------
A z test is a statistical test that is used to      | A t-test is used to check if the means of two
check if the means of two data sets are different   | data sets are different when the population
when the population variance is known.              | variance is not known.
The sample size is greater than or equal to 30.     | The sample size is less than 30.
The data follows a normal distribution.             | The data follows a Student's t distribution.
The one-sample z test statistic is given by         | The t test statistic is given by
z = (x̄ - μ) / (σ/√n)                                | t = (x̄ - μ) / (s/√n), where s is the sample
                                                    | standard deviation

TABLE 2 : Z Test And T Test

6. A Telecom service provider claims that individual customers pay on an


average 400 rs. per month with standard deviation of 25 rs. A random sample of 50
customers bills during a given month is taken with a mean of 250 and standard
deviation of 15. What to say with respect to the claim made by the service provider?
Solution:
First thing first, Note down what is given in the question:

H0 (Null Hypothesis) : μ = 400

H1 (Alternate Hypothesis): μ ≠ 400 (not equal means either μ > 400 or μ < 400, hence it
will be validated with a two-tailed test)

σ = 25 (Population standard deviation)

LoS (α) = 5% (Take 5% if not given in question)

n = 50 (Sample size)

x̄ = 250 (Sample mean)

s = 15 (Sample standard deviation)

n ≥ 30 and σ is known, hence we use the z-test.

Step 1: Calculate z using the z-test formula:

z = (x̄ - μ) / (σ/√n)
z = (250 - 400) / (25/√50)
z = -42.42

Step 2: Get the critical z values from the z table for α = 5% (two-tailed):

z critical values = (-1.96, +1.96)

For the claim to be retained, the calculated z should lie between the critical values:

-1.96 < z < +1.96

But the calculated z (-42.42) < -1.96, which means we reject the null hypothesis: the sample
does not support the provider's claim of an average monthly bill of 400.

Figure 3: Region Of Rejection
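The arithmetic of this solution can be checked with a minimal Python sketch. It uses only the figures given in the question; the variable names are illustrative, and the critical value is taken from the standard normal distribution instead of a printed z table.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma = 400, 25     # claimed mean and population standard deviation
n, xbar = 50, 250        # sample size and sample mean
alpha = 0.05             # level of significance (5%)

# Test statistic: z = (x̄ - μ) / (σ/√n)
z = (xbar - mu0) / (sigma / sqrt(n))

# Two-tailed critical value: alpha is split equally between both tails.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

reject_h0 = abs(z) > z_crit
print(f"z = {z:.2f}, critical values = ±{z_crit:.2f}, reject H0: {reject_h0}")
```

The sample standard deviation s = 15 is not used, because the population standard deviation is known.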

7. Write an example problem for hypothesis testing.


In the population, the average IQ is 100 with a standard deviation of 15. A team of
scientists wants to test a new medication to see if it has either a positive or negative effect on
intelligence, or no effect at all. A sample of 30 participants who have taken the medication has
a mean of 140. Did the medication affect intelligence?

Steps
Follow these simple steps to solve this question.

Step 1: Set up the null and alternate hypothesis


H0: the medication does not affect intelligence (μ = 100)
Ha: the medication affects intelligence (μ ≠ 100) (note that the alternate hypothesis is always
the opposite of the null hypothesis)

Step 2: Determine the type of test to use


Since the sample size is 30 and the population standard deviation is known, we use the
z-test.

Step 3: Calculate the test statistic z using the formula

z = (x̄ − μ0) / (σ / √n)

This formula can also be written like this:

z = √n (x̄ − μ0) / σ

Using the data given in the question we have the following:
μ0 = 100
σ = 15
n = 30
x̄ = 140

Plugging the values into the formula we have: z = (140 − 100) / (15 / √30) = 14.606

Step 4: Look up the critical value of z from statistical tables.
For a two-tailed test at the 5% level of significance, the table gives a value of 1.96.

Step 5: Draw a conclusion


In this case the calculated value of the test statistic z is more than the critical value
obtained from statistical tables.

14.606 > 1.96

Therefore we reject the null hypothesis: a sample mean of 140 is too far from the population
mean of 100 to be explained by chance alone.

8. Explain the z-test and its types.

A z-test is a statistical test in which the distribution of the test statistic under the null
hypothesis can be approximated by a normal distribution. It is used to determine whether
two sample means are approximately the same or different when their variance is known and
the sample size is large (n >= 30).

When to Use Z-test:


 The sample size should be 30 or more. Otherwise, we should use the t-test.
 Samples should be drawn at random from the population.
 The standard deviation of the population should be known.
 Samples that are drawn from the population should be independent of each other.
 The data should be normally distributed; however, for a large sample size the sample mean is
assumed to be approximately normally distributed.

Hypothesis Testing
A hypothesis is an educated guess/claim about a particular property of an object. Hypothesis
testing is a way to validate the claim of an experiment.

 Null Hypothesis: The null hypothesis is a statement that the value of a population
parameter (such as proportion, mean, or standard deviation) is equal to some claimed
value. We either reject or fail to reject the null hypothesis. The null hypothesis is denoted by
H0.
 Alternate Hypothesis: The alternative hypothesis is the statement that the parameter has
a value that is different from the claimed value. It is denoted by HA.
Level of significance: It is the probability of rejecting the null hypothesis when it is actually
true. Since 100% accuracy is not possible in most experiments when accepting or rejecting a
hypothesis, we select a level of significance in advance. It is denoted by alpha (α).
Steps to perform Z-test:
 First, identify the null and alternate hypotheses.
 Determine the level of significance (α).
 Find the critical value of z from the z table.
 Calculate the z-test statistic. Below is the formula for calculating the z-test statistic.

z = (X̄ − μ) / (σ / √n)

where,
X̄: mean of the sample.
μ (Mu): mean of the population.
σ (Sd): standard deviation of the population.
n: sample size.
 Now compare the test statistic with the critical value and decide whether to reject or not to
reject the null hypothesis.
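The steps above can be sketched as one small Python function. This is a minimal illustration for the two-tailed case only; the function name `one_sample_z_test` and its parameters are hypothetical, and the example call reuses the IQ figures from question 7 with an assumed 5% level of significance.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Two-tailed one-sample z-test.

    Returns (z, critical value, True if H0 is rejected).
    """
    # Critical value of z for the chosen level of significance.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Test statistic: z = (x̄ - μ) / (σ/√n)
    z = (xbar - mu0) / (sigma / sqrt(n))
    # Decision: reject H0 when z falls in either tail.
    return z, z_crit, abs(z) > z_crit


# IQ example: population mean 100, sd 15, sample of 30 with mean 140.
z, z_crit, reject = one_sample_z_test(xbar=140, mu0=100, sigma=15, n=30)
print(f"z = {z:.3f}, critical value = {z_crit:.2f}, reject H0: {reject}")
```

Stating the hypotheses and choosing α remain decisions made by the analyst before the function is called.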
Types of Z-test
 Left-tailed Test: In this test, our region of rejection is located to the extreme left of the
distribution. Here the null hypothesis is that the population mean is greater than or equal to
the claimed value (H0: μ ≥ μ0), and the alternative is that it is smaller (H1: μ < μ0).

Figure 4 : Left tailed test

 Right-tailed Test: In this test, our region of rejection is located to the extreme right of the
distribution. Here the null hypothesis is that the population mean is less than or equal to
the claimed value (H0: μ ≤ μ0), and the alternative is that it is larger (H1: μ > μ0).

Figure 5 : Right Tailed Test

 Two-tailed Test: In this test, our region of rejection is located at both extremes of the
distribution. Here the null hypothesis is that the population mean is equal to the claimed
value (H0: μ = μ0), and the alternative is that it differs (H1: μ ≠ μ0).

Figure 6: Double Tailed Test
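How the region of rejection changes with the type of test can be illustrated with a short Python sketch. The function name `rejects_h0` and the example statistic z = 1.8 are hypothetical; the critical values come from the standard normal distribution.

```python
from statistics import NormalDist


def rejects_h0(z, alpha=0.05, tail="two"):
    """Return True if z falls in the region of rejection for the given tail."""
    std_normal = NormalDist()
    if tail == "left":
        # All of alpha lies in the left tail.
        return z < std_normal.inv_cdf(alpha)
    if tail == "right":
        # All of alpha lies in the right tail.
        return z > std_normal.inv_cdf(1 - alpha)
    if tail == "two":
        # alpha is split equally between the two tails.
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError("tail must be 'left', 'right' or 'two'")


# The same statistic can lead to different decisions.
for tail in ("left", "right", "two"):
    print(tail, rejects_h0(1.8, alpha=0.05, tail=tail))
```

At α = 0.05 the one-tailed critical value is about 1.645 and the two-tailed value is about 1.96, so z = 1.8 is significant only in a right-tailed test.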

9. What is meant by the F-test? Explain its procedure.

The F-Test helps to determine the overall significance of the regression. It is useful in
various situations, such as when a quality controller wants to determine whether the product’s
quality is deteriorating over time. In addition, it might be useful for an economist to determine
whether income variability varies between two populations.

F-Test in Statistics Explained

The F-test in statistics helps to decide whether the variances of two populations are equal. It
is also called the variance ratio test because it calculates the ratio of two variance estimates.
It was propounded by the British polymath R.A. Fisher and is named in his honour; G.W.
Snedecor later developed the test.

The following conditions are critical for using the F-test to compare the variances of two
populations:

1. Normality: the populations must have a normal distribution.

2. Independent and random selection of sample items: the selection of the samples’
components should be independent and random.
3. More than unity: the variance ratio must be greater than or equal to one; it cannot be less
than one. When forming the ratio, the larger variance estimate is divided by the smaller one.
4. The additive property states that the total of different variance components will equal the total
variance, i.e., the total variance is the sum of the variance between samples and the variance
within samples.

Formula

1. Sample variances: the variance of each sample is calculated as S² = Σ(x − x̄)² / (n − 1),
where x̄ is the sample mean and n is the sample size (an online F-test calculator can make it
easier).

2. Null hypothesis: the null hypothesis is either
(a) the two samples were drawn from the same group, or
(b) the variances of the populations from which the two samples were drawn are equal.

3. To compute the variance ratio, use the formula F = larger estimate of variance divided by
the smaller estimate of variance. Regardless of whether it is S1² or S2², the numerator is
always the larger value.

4. When calculating degrees of freedom, v1 = n − 1 for the sample with the larger variance
(numerator) and v2 = n − 1 for the sample with the smaller variance (denominator).

5. Table value of F: the critical value of F is available from the “F-Table” (F-test table) at the
determined significance level.

6. Analysis: This involves the comparison of the computed value and the tabulated value. For
various levels of significance, there are several F Tables (F-test tables).

(a) The variance ratio is insignificant if F ≤ F0.05. We can assume that the values are from
the same group or groups with similar variances.

(b) The null hypothesis is rejected, and the variance ratio is considered significant, if
F > F0.05.
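The procedure can be sketched in Python as follows. The function name `f_statistic` and both data sets are hypothetical, chosen so the arithmetic is easy to follow. The critical value is not computed: 6.39 is the F table value for (4, 4) degrees of freedom at the 5% level and would have to be looked up again for other sample sizes.

```python
from statistics import variance


def f_statistic(sample_1, sample_2):
    """Return (F, v1, v2) with the larger variance estimate in the numerator."""
    var_1 = variance(sample_1)   # sample variance, divisor n - 1
    var_2 = variance(sample_2)
    if var_1 >= var_2:
        return var_1 / var_2, len(sample_1) - 1, len(sample_2) - 1
    return var_2 / var_1, len(sample_2) - 1, len(sample_1) - 1


# Hypothetical samples of five observations each.
sample_a = [10, 12, 14, 16, 18]   # variance 10
sample_b = [11, 12, 13, 14, 15]   # variance 2.5

F, v1, v2 = f_statistic(sample_a, sample_b)
f_critical = 6.39                 # F table value for (4, 4) d.f. at 5%
significant = F > f_critical
print(f"F = {F:.2f}, d.f. = ({v1}, {v2}), significant: {significant}")
```

Putting the larger estimate in the numerator guarantees F ≥ 1, which is why only the upper tail of the F table is consulted.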

10. Consider the example of the population in a village:

Village | A | B
Sample size | 10 | 12
Mean monthly income | 150 | 140
Sum of squared deviations Σ(x − x̄)² | 92 | 110

Test the equality of the sample variances at a significance level of 5% with the data given
above.

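Dividing each tabulated sum of squared deviations by n − 1 gives the sample variances:

Sample variance S1² (sample 1) = 92 / (10 − 1) = 10.22

and

Sample variance S2² (sample 2) = 110 / (12 − 1) = 10

F = S1² / S2² = 10.22 / 10 = 1.022

The degrees of freedom are v1 = 10 − 1 = 9 and v2 = 12 − 1 = 11, and the table value of F at
the 5% significance level is 2.90. Since the calculated F (1.022) is less than the table value
(2.90), the null hypothesis is not rejected: the two variances do not differ significantly.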
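The village example can be reproduced with a minimal Python sketch. It assumes that the tabulated figures 92 and 110 are sums of squared deviations, which is what the ratio F = 10.22/10 implies; the table value 2.90 is copied from the text, not computed.

```python
# Figures from the village table; 92 and 110 are treated as sums of
# squared deviations (an assumption implied by F = 10.22 / 10).
sum_sq = {"A": 92, "B": 110}
size = {"A": 10, "B": 12}

# Sample variance = sum of squared deviations / (n - 1)
s_sq = {village: sum_sq[village] / (size[village] - 1) for village in size}

# The larger variance estimate goes in the numerator.
larger, smaller = sorted(s_sq, key=s_sq.get, reverse=True)
F = s_sq[larger] / s_sq[smaller]
v1, v2 = size[larger] - 1, size[smaller] - 1

f_critical = 2.90   # F table value for (9, 11) d.f. at 5%, as quoted in the text
reject_h0 = F > f_critical
print(f"F = {F:.3f}, d.f. = ({v1}, {v2}), reject H0: {reject_h0}")
```

Because F is well below the table value, the sketch reaches the same decision as the hand calculation.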

11. A teacher claims that the mean score of students in his class is greater than 82 with a
standard deviation of 20. If a sample of 81 students was selected with a mean score of 90 then
check if there is enough evidence to support this claim at a 0.05 significance level.

 Solution: As the sample size is 81 and population standard deviation is known, this is
an example of a right-tailed one-sample z test.

H0: μ = 82

H1: μ > 82

From the z table, the critical value at α = 0.05 is 1.645.

z = (x̄ − μ) / (σ/√n)

x̄ = 90, μ = 82, n = 81, σ = 20

z = (90 − 82) / (20/√81) = 3.6

As 3.6 > 1.645 thus, the null hypothesis is rejected and it is concluded that there is enough
evidence to support the teacher's claim.

Answer: Reject the null hypothesis
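A short Python check of this right-tailed test is given here as a sketch. The variable names are illustrative, and the p-value is an addition that the worked solution does not report.

```python
from math import sqrt
from statistics import NormalDist

xbar, mu0 = 90, 82       # sample mean and hypothesised mean
sigma, n = 20, 81        # population standard deviation and sample size
alpha = 0.05

z = (xbar - mu0) / (sigma / sqrt(n))

# Right-tailed test: all of alpha sits in the upper tail.
z_crit = NormalDist().inv_cdf(1 - alpha)
p_value = 1 - NormalDist().cdf(z)

reject_h0 = z > z_crit
print(f"z = {z:.2f}, critical value = {z_crit:.3f}, p-value = {p_value:.5f}")
```

Comparing the p-value with α leads to the same decision as comparing z with the critical value.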

12. Does Intervention Increase HIV Knowledge?

Early in the HIV epidemic, there was poor knowledge of HIV transmission risks among health
care staff. A short training was developed to improve knowledge and attitudes around HIV
disease. Was the training effective in improving knowledge?

Table - Mean (± SD) knowledge scores, pre- and post-intervention, n=15

Pre-Intervention | Post-Intervention | Change
18.3 ± 3.8 | 21.9 ± 4.4 | 3.53 ± 3.93

The raw data for this comparison is shown in the next table.

Subject | Kscore1 | Kscore2 | Difference
1 | 17 | 22 | 5
2 | 17 | 21 | 4
3 | 15 | 21 | 6
4 | 19 | 26 | 7
5 | 18 | 20 | 2
6 | 14 | 14 | 0
7 | 27 | 31 | 4
8 | 20 | 18 | -2
9 | 12 | 22 | 10
10 | 21 | 20 | -1
11 | 20 | 27 | 7
12 | 24 | 23 | -1
13 | 17 | 15 | -2
14 | 17 | 24 | 7
15 | 17 | 24 | 7

mean difference = 3.533333


sddiff= 3.925497

p-value= 0.003634

The strategy is to calculate the pre-/post- difference in knowledge score for each person and
test whether the mean difference is 0.

First, establish the null and alternative hypotheses.


H0: μd = 0

H1: μd ≠ 0

Then compute the test statistic for paired or matched samples
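A minimal Python sketch of this computation follows, assuming the usual paired statistic t = d̄ / (s_d / √n) with n − 1 degrees of freedom, which is consistent with the reported p-value. The scores are copied from the raw data table; the 5% level of significance and the table value 2.145 (two-tailed, 14 degrees of freedom) are assumptions, not taken from the text.

```python
from math import sqrt
from statistics import mean, stdev

# Knowledge scores before (Kscore1) and after (Kscore2) the training.
kscore1 = [17, 17, 15, 19, 18, 14, 27, 20, 12, 21, 20, 24, 17, 17, 17]
kscore2 = [22, 21, 21, 26, 20, 14, 31, 18, 22, 20, 27, 23, 15, 24, 24]

# Per-subject change in score (post minus pre).
diffs = [post - pre for pre, post in zip(kscore1, kscore2)]
n = len(diffs)

d_bar = mean(diffs)      # mean difference
s_d = stdev(diffs)       # standard deviation of the differences (n - 1 divisor)

# Paired test statistic under H0: μd = 0
t = d_bar / (s_d / sqrt(n))

t_crit = 2.145           # assumed: t table, 14 d.f., two-tailed, alpha = 0.05
reject_h0 = abs(t) > t_crit
print(f"mean diff = {d_bar:.4f}, sd = {s_d:.4f}, t = {t:.3f}, reject H0: {reject_h0}")
```

The mean and standard deviation of the differences match the values listed above (3.533333 and 3.925497), and the statistic exceeds the table value, in line with the small p-value.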
