Fds Unit 3 Final Correction
PREPARED BY
AI&DS)
VERIFIED BY
LIST OF IMPORTANT QUESTIONS
UNIT III
INFERENTIAL STATISTICS
PART A (2 marks)
1. What does population mean?
PART B(16 marks)
UNIT III
INFERENTIAL STATISTICS
PART A
The finite population is also known as a countable population in which the population can
be counted. In other words, it is defined as the population of all the individuals or objects that are
finite. For statistical analysis, the finite population is more advantageous than the infinite
population.
The infinite population is also known as an uncountable population in which the counting
of units in the population is not possible.
6. What is meant by a hypothetical population?
A population whose units do not exist in concrete form is known as a
hypothetical population. A population consists of sets of observations, objects, etc. that all have
something in common. In some situations, the populations are only hypothetical.
Examples are the outcomes of rolling a die or tossing a coin.
Probability sampling
Non-probability sampling
In probability sampling, the population units are not selected at the discretion of the
researcher. Instead, selection follows set procedures which ensure that every unit
of the population has a fixed, known probability of being included in the sample. Such a method
is also called random sampling. Some of the techniques used for probability sampling are:
Stratified Sampling
Disproportionate sampling
Proportionate sampling
Optimum allocation stratified sampling
Multi-stage sampling
9. Explain non-probability sampling.
In non-probability sampling, the population units can be selected at the discretion of the
researcher. Such samples rely on human judgement for selecting units and have no
theoretical basis for estimating the characteristics of the population. Some of the techniques
used for non-probability sampling are:
Quota sampling
Judgement sampling
Purposive sampling
The formulas for mean absolute deviation (MAD), variance and standard deviation are given
here for a population and for a sample. Suppose n denotes the number of observations and
x̄ denotes their mean. The population formulas divide by n, while the sample formulas divide
by n − 1:
$$\text{Population MAD} = \frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}|$$
$$\text{Population Variance} = (\sigma_x)^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
$$\text{Sample MAD} = \frac{1}{n-1} \sum_{i=1}^{n} |x_i - \bar{x}|$$
$$\text{Sample Variance} = (S_x)^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
$$\text{Population Standard Deviation} = \sigma_x = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$
$$\text{Sample Standard Deviation} = S_x = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$
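The six formulas above can be checked with a short Python sketch. The function name `spread_measures` and the data set `scores` are hypothetical, chosen only for illustration; the sample MAD divides by n − 1 because that is how these notes define it.

```python
from math import sqrt


def spread_measures(data):
    """Return population and sample MAD, variance and standard deviation."""
    n = len(data)
    mean = sum(data) / n
    abs_dev = sum(abs(x - mean) for x in data)   # sum of |x_i - mean|
    sq_dev = sum((x - mean) ** 2 for x in data)  # sum of (x_i - mean)^2
    return {
        "population_mad": abs_dev / n,
        "sample_mad": abs_dev / (n - 1),  # n - 1 divisor as defined in these notes
        "population_variance": sq_dev / n,
        "sample_variance": sq_dev / (n - 1),
        "population_sd": sqrt(sq_dev / n),
        "sample_sd": sqrt(sq_dev / (n - 1)),
    }


# Hypothetical data, not taken from the notes.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
result = spread_measures(scores)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

The sample versions are always slightly larger than the population versions because the same sum is divided by a smaller number.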
11. Difference between Population and Sample
Some of the key differences between population and sample are clearly given below:
Simple random sampling is a technique where every item in the population has an equal
chance of being selected. Here, the selection of items depends entirely on chance;
therefore, this sampling technique is also sometimes known as a method of chance.
Consider a hospital that has 1000 staff members and needs to allocate a night shift to 100
of them. All their names are put in a bucket and drawn at random. Since each person
has an equal chance of being selected, and since we know the population size (N) and sample
size (n), the probability that a given person is selected can be calculated as follows:
$$P = 1 - \frac{N-1}{N} \cdot \frac{N-2}{N-1} \cdots \frac{N-n}{N-(n-1)}$$
Cancelling the common factors:
$$P = 1 - \frac{N-n}{N} = \frac{n}{N} = \frac{100}{1000} = 10\%$$
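The cancellation can be verified numerically. The sketch below multiplies out the product term by term using exact fractions; the function name `inclusion_probability` is hypothetical, and it assumes the draws are made without replacement, as in the bucket example.

```python
from fractions import Fraction


def inclusion_probability(N, n):
    """Probability that one given unit appears in a simple random sample
    of n units drawn without replacement from a population of N units."""
    p_never_drawn = Fraction(1)
    for k in range(n):
        # Probability of not being picked on draw k + 1, given not picked so far.
        p_never_drawn *= Fraction(N - k - 1, N - k)
    return 1 - p_never_drawn


p = inclusion_probability(1000, 100)
print(p, float(p))
```

For the hospital example this gives 1/10, which agrees with n/N = 100/1000.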
It is a fair sampling method, and if applied appropriately, it helps reduce any bias involved
compared to any other sampling method.
Since it involves a large sample frame, it is usually easy to pick a smaller sample size
from the existing larger population.
The person conducting the research doesn’t need to have prior knowledge of the data
he/she is collecting. One can simply ask a question to gather the data; the researcher need
not be a subject expert.
This sampling method is a fundamental method of collecting the data. You don’t need any
technical knowledge. You only require essential listening and recording skills.
Since the population size is vast in this type of sampling method, there is no restriction on
the sample size that the researcher needs to create. From a larger population, you can
get a small sample quite quickly.
The data collected through this sampling method is well informed; the more samples there
are, the better the quality of the data.
Make a list of all the employees working in the organization. (as mentioned above, there
are 500 employees in the organization, the record must contain 500 names).
Assign a sequential number to each employee (1,2,3…n). This is your sampling frame
(the list from which you draw your simple random sample).
Figure out what your sample size is going to be. (In this case, the sample size is 100).
Use a random number generator to select the sample, using your sampling frame
(population size) from Step 2 and your sample size from Step 3.
For example, if your sample size is 100 and your population is 500, generate 100 random
numbers between 1 and 500.
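The four steps above can be sketched in Python with the standard `random` module. The function name `draw_simple_random_sample` and the `seed` argument are assumptions added for illustration (the seed only makes the draw repeatable); the numbers are drawn without repetition so that no employee is picked twice.

```python
import random


def draw_simple_random_sample(population_size, sample_size, seed=None):
    """Pick sample_size distinct sequence numbers from 1..population_size."""
    rng = random.Random(seed)
    sampling_frame = range(1, population_size + 1)  # Step 2: numbers 1..N
    return sorted(rng.sample(sampling_frame, sample_size))  # Step 4


# Step 1 and Step 3: 500 employees, sample size 100.
selected = draw_simple_random_sample(500, 100, seed=42)
print(selected[:10])
```

Each number in `selected` identifies one employee in the sampling frame.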
The standard error of an estimate can be calculated as the standard deviation divided by
the square root of the sample size:
SE = σ / √n
Where σ is the population standard deviation and n is the sample size.
If the population standard deviation is not known, you can substitute the sample standard
deviation, s, in the numerator to approximate the standard error.
The standard error of the mean, also called the standard deviation of the mean, is the
standard deviation of the sampling distribution of the sample mean.
It is abbreviated as SEM. For example, the usual estimator of the population mean is
the sample mean. But if we draw another sample from the same population, it may give a
different value.
Thus, there is a population of sample means with its own mean and variance.
The SEM may be defined as the standard deviation of the sample means of all the possible samples
taken from the same population. In practice, the SEM is estimated from a single sample.
It is calculated as the ratio of the standard deviation to the square root
of the sample size:
SEM = s / √n
Step 2: Determine how much each measurement varies from the mean.
Step 3: Square all the deviations determined in step 2 and add them together: Σ(xi – x̄)²
Step 4: Divide the sum from step 3 by one less than the total number of measurements (n-1).
Step 5: Take the square root of the obtained number, which is the sample standard deviation (s).
Step 6: Finally, divide the standard deviation obtained by the square root of the number of
measurements (n) to get the standard error of your estimate.
Go through the example given below to understand the method of calculating standard error.
20. Calculate the standard error of the given data: y: 5, 10, 12, 15, 20
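A minimal Python sketch of the calculation for this data is given below. It follows Steps 2 to 6 listed above; the first step, computing the mean, is assumed here because it is not listed. The function name `standard_error` is hypothetical.

```python
from math import sqrt


def standard_error(values):
    n = len(values)
    mean = sum(values) / n                         # assumed Step 1: the mean
    deviations = [x - mean for x in values]        # Step 2: deviation from the mean
    sum_of_squares = sum(d ** 2 for d in deviations)  # Step 3: squared and summed
    variance = sum_of_squares / (n - 1)            # Step 4: divide by n - 1
    sd = sqrt(variance)                            # Step 5: standard deviation
    return sd / sqrt(n)                            # Step 6: divide by sqrt(n)


y = [5, 10, 12, 15, 20]
se = standard_error(y)
print(round(se, 3))
```

For y the mean is 12.4, the sum of squared deviations is 125.2, the standard deviation is about 5.595, and the standard error is about 2.502.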
21. What is a good standard error?
SE is an indication of the expected precision of the sample mean as compared with the
population mean. The bigger the value of the standard error, the greater the spread and the
likelihood that any sample mean is not close to the population mean. A small standard error is thus
a good attribute.
22. What is a big standard error?
A bigger standard error means more spread among the sample means, so the statistic is a
less accurate estimate.
23. What is hypothesis testing?
Analysts use a random sample from the population to test two different hypotheses: the null
hypothesis and the alternative hypothesis. The null hypothesis is usually a hypothesis of equality
between population parameters; e.g., a null hypothesis may state that the population mean
return is equal to zero.
24. What are the 5 steps of hypothesis testing?
Step 1: State your null and alternate hypothesis. ...
Step 2: Collect data. ...
Step 3: Perform a statistical test. ...
Step 4: Decide whether to reject or fail to reject your null hypothesis. ...
Step 5: Present your findings.
26. What are the 3 main types of hypothesis?
Simple hypothesis.
Complex hypothesis
Directional hypothesis.
when the variances are known and the sample size is large.
28. What is difference between z-test and t-test?
The Z-test is a statistical hypothesis test used to determine whether the means of two
samples are different when the standard deviation is known and the sample is large.
In contrast, the T-test determines how the averages of different data sets differ when the
standard deviation or the variance is unknown.
29. What are the types of z-test?
The following types of Z-test are used to perform different types of
hypothesis testing:
One-sample Z-test for means.
Two-sample Z-test for means.
One sample Z-test for proportion.
Two sample Z-test for proportions.
32. What is a decision? Give an example.
A decision is the act of, or need for, making up one's mind: "this is a difficult decision".
It is also something that is decided, a resolution: "she made a poor decision when she dropped out of school".
33. What are the 4 types of decisions?
Directive.
Analytical.
Conceptual.
Behavioral.
34. What is Data Interpretation?
Data interpretation is the process of reviewing data through some predefined processes
which will help assign some meaning to the data and arrive at a relevant conclusion. It involves
taking the result of data analysis, making inferences on the relations studied, and using them to
conclude.
Data interpretation methods are how analysts help people make sense of numerical
data that has been collected, analyzed and presented. Data, when collected in raw form, may be
difficult for the layman to understand, which is why analysts need to break down the information
gathered so that others can make sense of it.
For example, when founders are pitching to potential investors, they must interpret data (e.g.
market size, growth rate, etc.) for better understanding. There are 2 main methods in which this
can be done, namely; quantitative methods and qualitative methods.
Step 3: Cleaning the data.
Step 4: Interpreting the data.
Step 5: Sharing the results.
38. What is a one-tailed test?
A one-tailed test is a statistical test in which the critical area of a distribution is one-sided
so that it is either greater than or less than a certain value, but not both. If the sample being
tested falls into the one-sided critical area, the alternative hypothesis will be accepted instead of
the null hypothesis.
39. What is a two-tailed test example?
For example, let's say you were running a z test with an alpha level of 5% (0.05). In a one
tailed test, the entire 5% would be in a single tail. But with a two tailed test, that 5% is split
between the two tails, giving you 2.5% (0.025) in each tail.
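The split of the alpha level between tails can be sketched with Python's `statistics.NormalDist`. The function name `z_critical` is hypothetical; it returns the area in each tail and the matching critical z value for a standard normal distribution.

```python
from statistics import NormalDist


def z_critical(alpha, tails):
    """Return (area per tail, positive critical z) for a 1- or 2-tailed z test."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    tail_area = alpha / tails  # the whole alpha, or alpha split in two
    return tail_area, NormalDist().inv_cdf(1 - tail_area)


one_area, one_z = z_critical(0.05, tails=1)
two_area, two_z = z_critical(0.05, tails=2)
print(f"one-tailed: area {one_area}, critical z {one_z:.3f}")
print(f"two-tailed: area {two_area} per tail, critical z {two_z:.3f}")
```

With alpha = 0.05 the one-tailed critical value is about 1.645, while the two-tailed test needs about 1.960 because only 0.025 lies in each tail.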
40. When would you use a one-tailed test?
While you gain more statistical power in one direction, the test has absolutely no power in
the other direction. Suppose you are testing a new vaccine and want to determine whether it's
better than the current vaccine. You use a one-tailed test to improve the test's ability to learn
whether the new vaccine is better.
41. What is meant by estimation in statistics?
Estimation is concerned with inference about the numerical value of unknown
population values from incomplete data such as a sample. If a single figure is
calculated for each unknown parameter, the process is called point estimation.
42. What is the best definition of estimation?
Estimation is the act of estimating something. It also refers to the value, amount, or size
arrived at in an estimate.
43. What are the different confidence levels?
The most common confidence levels are 90%, 95% and 99%. The following table
contains a summary of the values corresponding to these common confidence levels. (Note
that the "confidence coefficient" is merely the confidence level reported as a proportion rather
than as a percentage.)
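As an illustration, the two-sided critical z values for these confidence levels can be computed in Python. This is a sketch that assumes the values in question are standard normal critical values; the function name `z_for_confidence` is hypothetical.

```python
from statistics import NormalDist


def z_for_confidence(level):
    """Two-sided critical z value for a confidence level given as a proportion."""
    alpha = 1 - level
    return NormalDist().inv_cdf(1 - alpha / 2)


z_values = {level: z_for_confidence(level) for level in (0.90, 0.95, 0.99)}
for level, z in z_values.items():
    # level is the confidence coefficient; level * 100 is the confidence level in %
    print(f"confidence coefficient {level:.2f}  ->  z = {z:.3f}")
```

This gives approximately 1.645, 1.960 and 2.576 for the 90%, 95% and 99% levels.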
44. What are the 3 types of confidence?
Here are the 3 types of confidence one can have:
Self-Centered Confidence.
Perfection-Seeking Confidence.
Faith-filled Confidence.
PART B
16-marks
Now, try to understand what a sample and a population are, with the help of suitable examples.
Population | Sample
All residents of a country would constitute the Population set | All residents who live above the poverty line would be the Sample
All residents above the poverty line in a country would be the Population | All residents who are millionaires would make up the Sample
1. State your research hypothesis as a null hypothesis (Ho) and alternate hypothesis
(Ha or H1).
2. Collect data in a way designed to test the hypothesis.
3. Perform an appropriate statistical test.
4. Decide whether to reject or fail to reject your null hypothesis.
5. Present the findings in your results and discussion section.
Step 1: State your null and alternate hypothesis
After developing your initial research hypothesis (the prediction that you want to
investigate), it is important to restate it as a null (Ho) and alternate (Ha) hypothesis so that you
can test it mathematically.
The alternate hypothesis is usually your initial hypothesis that predicts a relationship between
variables. The null hypothesis is a prediction of no relationship between the variables you are
interested in.
For a statistical test to be valid, it is important to perform sampling and collect data in a
way that is designed to test your hypothesis. If your data are not representative, then you
cannot make statistical inferences about the population you are interested in.
There are a variety of statistical tests available, but they are all based on the comparison
of within-group variance (how spread out the data is within a category) versus between-group
variance (how different the categories are from one another).
If the between-group variance is large enough that there is little or no overlap between
groups, then your statistical test will reflect that by showing a low p-value. This means it is
unlikely that the differences between these groups came about by chance.
Alternatively, if there is high within-group variance and low between-group variance, then
your statistical test will reflect that with a high p-value. This means it is likely that any difference
you measure between groups is due to chance.
Your choice of statistical test will be based on the type of variables and the level of
measurement of your collected data.
a p-value showing how likely you are to see this difference if the null hypothesis of
no difference is true.
Your t-test shows an average height of 175.4 cm for men and an average height of 161.7 cm for
women, with an estimate of the true difference ranging from 10.2 cm to infinity. The p-value is
0.002.
Based on the outcome of your statistical test, you will have to decide whether to reject or
fail to reject your null hypothesis.
In most cases you will use the p-value generated by your statistical test to guide your
decision. And in most cases, your predetermined level of significance for rejecting the null
hypothesis will be 0.05 – that is, when there is a less than 5% chance that you would see these
results if the null hypothesis were true.
The results of hypothesis testing will be presented in the results and discussion
sections of your research paper, dissertation or thesis.
In the results section you should give a brief summary of the data and a summary of the
results of your statistical test (for example, the estimated difference between group means and
associated p-value). In the discussion, you can discuss whether your initial hypothesis was
supported by your results or not.
ylang-ylang are commonly touted as anxiety remedies. Perhaps you'd like to test the healing
powers of peppermint essential oil. Your hypothesis might go something like this:
Null hypothesis - Peppermint essential oil has no effect on the pangs of anxiety.
Significance level - The significance level is 0.25 (allowing for a better shot at proving
Conclusion - After providing one group with peppermint oil and the other with a
placebo, you gauge the difference between the two based on self-reported levels of
anxiety. Based on your calculations, the difference between the two groups is
statistically significant with a p-value of 0.05, well below the defined alpha of 0.25. You
conclude that your study supports the alternative hypothesis that peppermint essential
4. What is meant by a z-test, and what are its types?
A z test is a test that is used to check if the means of two populations are different or not
provided the data follows a normal distribution. For this purpose, the null hypothesis and the
alternative hypothesis must be set up and the value of the z test statistic must be calculated. The
decision criterion is based on the z critical value.
Z Test Definition
Z Test Formula
The z test formula compares the z statistic with the z critical value to test whether there is
a difference in the means of two populations. In hypothesis testing, the z critical value divides
the distribution graph into the acceptance and the rejection regions. If the test statistic falls in the
rejection region then the null hypothesis can be rejected otherwise it cannot be rejected. The z
test formula to set up the required hypothesis tests for a one sample and a two-sample z test are
given below.
One-Sample Z Test
The algorithm to set a one sample z test based on the z test statistic is given as follows:
Left-tailed test: alternative hypothesis μ < μ0
Decision Criteria: If the z statistic < z critical value then reject the null hypothesis.
Right-tailed test: alternative hypothesis μ > μ0
Decision Criteria: If the z statistic > z critical value then reject the null hypothesis.
Two-tailed test: alternative hypothesis μ ≠ μ0
Decision Criteria: If the absolute value of the z statistic > z critical value then reject the null hypothesis.
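The three decision criteria can be sketched as one Python function. It assumes the usual one-sample statistic z = (x̄ − μ0) / (σ / √n); the function name `one_sample_z_test`, its parameters and the numbers in the usage lines are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05, tail="two"):
    """Return (z statistic, z critical value, reject_null) for a one-sample z test."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    if tail == "left":       # alternative: mu < mu0
        critical = std_normal.inv_cdf(alpha)
        reject = z < critical
    elif tail == "right":    # alternative: mu > mu0
        critical = std_normal.inv_cdf(1 - alpha)
        reject = z > critical
    elif tail == "two":      # alternative: mu != mu0
        critical = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > critical
    else:
        raise ValueError("tail must be 'left', 'right' or 'two'")
    return z, critical, reject


# Hypothetical numbers: sample mean 52, claimed mean 50, sigma 10, n = 100.
z_two, crit_two, reject_two = one_sample_z_test(52, 50, 10, 100, tail="two")
z_left, crit_left, reject_left = one_sample_z_test(52, 50, 10, 100, tail="left")
print(z_two, round(crit_two, 3), reject_two)
print(z_left, round(crit_left, 3), reject_left)
```

Here z = 2.0, so the two-tailed test rejects the null hypothesis (2.0 > 1.960), while the left-tailed test does not (2.0 is not below −1.645).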
A two sample z test is used to check if there is a difference between the means of two
samples. The z test statistic formula is given as follows:
The two-sample z test can be set up in the same way as the one-sample test. However, this test
will be used to compare the means of the two samples. For example, the null hypothesis is given
as H0 : μ1 = μ2.
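A sketch of a two-sample z statistic in Python is given below. It assumes the standard form z = ((x̄1 − x̄2) − (μ1 − μ2)) / √(σ1²/n1 + σ2²/n2) with known population standard deviations; the function name `two_sample_z` and the numbers are hypothetical.

```python
from math import sqrt


def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2, hypothesized_diff=0.0):
    """z statistic for the difference between two sample means."""
    standard_error = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    return (mean1 - mean2 - hypothesized_diff) / standard_error


# Hypothetical samples; under H0 the hypothesized difference mu1 - mu2 is 0.
z = two_sample_z(mean1=106, mean2=100, sigma1=12, sigma2=15, n1=36, n2=45)
print(z)
```

With these numbers the standard error is 3 and z = 2.0, which exceeds the two-tailed 5% critical value of 1.96.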
Figure 2 : Rejection Region
A z test for proportions is used to check the difference in proportions. A z test can either
be used for one proportion or two proportions. The formulas are given as follows.
A one proportion z test is used when there are two groups and compares the value of an
observed proportion to a theoretical one. The z test statistic for a one proportion z test is
given as follows:
$$z = \frac{p - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}}$$
Here, p is the observed value of the proportion, p0 is the theoretical proportion value
and n is the sample size.
The null hypothesis is that the two proportions are the same while the alternative
hypothesis is that they are not the same.
A two proportion z test is conducted on two proportions to check if they are the same or
not. The test statistic formula is given as follows:
$$z = \frac{p_1 - p_2 - 0}{\sqrt{p(1 - p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
where $p = \frac{x_1 + x_2}{n_1 + n_2}$
p1 is the proportion of sample 1 with sample size n1 and x1 successes.
p2 is the proportion of sample 2 with sample size n2 and x2 successes.
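Both proportion statistics can be sketched in Python. The function names and the example counts are hypothetical; x1 and x2 are taken to be the counts from which p1 = x1/n1 and p2 = x2/n2 are computed.

```python
from math import sqrt


def one_proportion_z(p_observed, p0, n):
    """z statistic comparing an observed proportion with a theoretical one."""
    return (p_observed - p0) / sqrt(p0 * (1 - p0) / n)


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for the difference of two proportions, using the pooled p."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    return (p1 - p2 - 0) / sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))


# Hypothetical data: 60% observed against a theoretical 50% with n = 100.
z_one = one_proportion_z(0.6, 0.5, 100)
# Hypothetical data: 60 out of 100 in sample 1, 50 out of 100 in sample 2.
z_two = two_proportion_z(60, 100, 50, 100)
print(round(z_one, 3), round(z_two, 3))
```

The first statistic is 2.0 and the second is about 1.421, so at a two-tailed 5% level only the first exceeds 1.96.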
Both z test and t-test are univariate tests used on the means of two datasets. The
differences between both tests are outlined in the table given below:
Z Test | T-Test
The sample size is greater than or equal to 30. | The sample size is less than 30.
The data follows a normal distribution. | The data follows a Student's t-distribution.
H1 (Alternate Hypothesis): μ ≠ 400 (Not equal means either μ > 400 or μ < 400 Hence it
σ = 25 (Population Standard Deviation)
n = 50 (Sample size)
Step 2:
but the calculated z (-42.42) < -1.96, which means the null hypothesis is rejected
Steps
Follow these simple steps in solving this question.
Using the data given in the equation we would have the following:
μ0 = 100
σ = 15
n = 30
x̄ = 140
Step 4: Look up the value of z (called the critical value) from statistical tables.
From the table, we get a value of 1.96.
8. Explain the z-test and its types.
Hypothesis Testing
A hypothesis is an educated guess/claim about a particular property of an object. Hypothesis
testing is a way to validate the claim of an experiment.
Null Hypothesis: The null hypothesis is a statement that the value of a population
parameter (such as proportion, mean, or standard deviation) is equal to some claimed
value. We either reject or fail to reject the null hypothesis. The null hypothesis is denoted by H0.
Alternate Hypothesis: The alternative hypothesis is the statement that the parameter has
a value that is different from the claimed value. It is denoted by HA.
Level of significance: It is the threshold at which we accept or reject the null hypothesis.
Since 100% accuracy is not possible when accepting or rejecting a hypothesis in most
experiments, we select a level of significance. It is denoted by alpha (α).
Steps to perform Z-test:
First, identify the null and alternate hypotheses.
Determine the level of significance (α).
Find the critical value of z in the z-test using
Calculate the z-test statistic. Below is the formula for calculating the z-test statistic.
$$z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
where,
X̄: mean of the sample.
μ (Mu): mean of the population.
σ (Sd): standard deviation of the population.
n: sample size.
Now compare with the hypothesis and decide whether to reject or not to reject the null
hypothesis
Type of Z-test
Left-tailed Test: In this test, our region of rejection is located to the extreme left of the
distribution. Here our null hypothesis is that the claimed value is less than or equal to the
mean population value.
Right-tailed Test: In this test, our region of rejection is located to the extreme right of the
distribution. Here our null hypothesis is that the claimed value is greater than or equal to the
mean population value.
Two-tailed test: In this test, our region of rejection is located to both extremes of the
distribution. Here our null hypothesis is that the claimed value is equal to the mean population
value.
The F-Test helps to determine the overall significance of the regression. It is useful in
various situations, such as when a quality controller wants to determine whether the product’s
quality is deteriorating over time. In addition, it might be useful for an economist to determine
F-test in statistics helps to decide whether two populations’ variances are equal. This is
the variance ratio test because it calculates the ratio of variances. The goal of the test is to
determine whether the variance in two populations is equal. It was propounded by British
polymath R.A. Fisher and named to honor him. G.W. Snedecor later developed the test.
The following conditions are critical for using the F-test to compare the variances of two
populations:
2. Independent and random selection of sample items: the selection of the samples’
components should be independent and random.
3. More than unity: The variance ratio must be one or larger than one; it cannot be less than one.
When dividing variance estimates, smaller estimates divide the larger estimates of variances.
4. The additive property states that the total of different variance components will equal the total
variance, i.e., the total variance is the sum of the variance between samples and the variance
within samples.
Formula
1. Sample variances: The formula for calculating sample variances is as follows (an online F-
test calculator can make it easier):
2. Null hypothesis: After the formation of the test, the null hypothesis is either
a) Two samples were from the same group or
3. To compute the variance ratio, use the formula F = larger estimate of variance divided by
smaller estimate of variance. Whichever of S1² or S2² is larger, the numerator will always be
the larger value.
4. When calculating degrees of freedom, v1 is the degrees of freedom of the sample with the
larger variance and v2 is that of the sample with the smaller variance.
5. Table value of F: the critical value of F is available from the “F-Table” (F-test table) at the
determined significance level.
6. Analysis: This involves the comparison of the computed value and the tabulated value. For
various levels of significance, there are several F Tables (F-test tables).
(a) The variance ratio is insignificant if F ≤ F0.05. We can assume that the values are from
the same group or groups with similar variances.
(b) The null hypothesis is rejected, and the variance ratio is considered significant, if
F > F0.05.
10. Consider the example of the population in a village:
Village A B
Sample size 10 12
Test the equality of the sample variances at a significance level of 5% with the above-given
data.
F = S1² / S2² = 10.22/10 = 1.022
The degrees of freedom are v1 = (10-1) = 9 and v2 = (12-1) = 11, and the table value of F at
the 5% significance level is 2.90. Since the computed F (1.022) is less than the table value
(2.90), the variance ratio is not significant.
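The variance ratio procedure, applied to the village example, can be sketched in Python. The function name `variance_ratio_test` is hypothetical. The sketch assumes that 10.22 is the variance of village A and 10 that of village B, and it takes the critical value (2.90) from the F-table rather than computing it.

```python
def variance_ratio_test(var_a, n_a, var_b, n_b, f_critical):
    """Return (F, v1, v2, reject_null), with the larger variance in the numerator."""
    if var_a >= var_b:
        f_ratio, v1, v2 = var_a / var_b, n_a - 1, n_b - 1
    else:
        f_ratio, v1, v2 = var_b / var_a, n_b - 1, n_a - 1
    # Reject the null hypothesis only if F exceeds the table value.
    return f_ratio, v1, v2, f_ratio > f_critical


# Village A: n = 10, variance 10.22 (assumed); village B: n = 12, variance 10.
f_stat, v1, v2, reject_null = variance_ratio_test(10.22, 10, 10.0, 12, f_critical=2.90)
print(round(f_stat, 3), v1, v2, reject_null)
```

The result is F = 1.022 with (9, 11) degrees of freedom, and the null hypothesis of equal variances is not rejected.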
11. A teacher claims that the mean score of students in his class is greater than 82 with a
standard deviation of 20. If a sample of 81 students was selected with a mean score of 90 then
check if there is enough evidence to support this claim at a 0.05 significance level.
Solution: As the sample size is 81 and population standard deviation is known, this is
an example of a right-tailed one-sample z test.
H0 : μ = 82
H1 : μ > 82
x̄ = 90, μ = 82, n = 81, σ = 20
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{90 - 82}{20 / \sqrt{81}} = 3.6$$
As 3.6 > 1.645, the null hypothesis is rejected and it is concluded that there is enough
evidence to support the teacher's claim.
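The arithmetic of this solution can be checked with a few lines of Python. The variable names are hypothetical, and the one-sided p-value is an extra not computed in the solution above.

```python
from math import sqrt
from statistics import NormalDist

x_bar, mu0, sigma, n, alpha = 90, 82, 20, 81, 0.05

z = (x_bar - mu0) / (sigma / sqrt(n))      # (90 - 82) / (20 / 9)
z_critical = NormalDist().inv_cdf(1 - alpha)  # right-tailed critical value
p_value = 1 - NormalDist().cdf(z)          # area to the right of z
reject_null = z > z_critical

print(round(z, 2), round(z_critical, 3), reject_null)
```

The z statistic is 3.6 against a critical value of about 1.645, so the null hypothesis is rejected.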
Early in the HIV epidemic, there was poor knowledge of HIV transmission risks among health
care staff. A short training was developed to improve knowledge and attitudes around HIV
disease. Was the training effective in improving knowledge?
The raw data for this comparison is shown in the next table.
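Person | Pre-training score | Post-training score | Difference (post − pre)
1 | 17 | 22 | 5
2 | 17 | 21 | 4
3 | 15 | 21 | 6
4 | 19 | 26 | 7
5 | 18 | 20 | 2
6 | 14 | 14 | 0
7 | 27 | 31 | 4
8 | 20 | 18 | -2
9 | 12 | 22 | 10
10 | 21 | 20 | -1
11 | 20 | 27 | 7
12 | 24 | 23 | -1
13 | 17 | 15 | -2
14 | 17 | 24 | 7
15 | 17 | 24 | 7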
p-value= 0.003634
The strategy is to calculate the pre-/post- difference in knowledge score for each person and
determine whether the mean difference=0.
H0: μd = 0
H1: μd ≠ 0
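The strategy can be sketched as a paired t statistic in Python. The sketch assumes the second and third columns of the raw data are the pre-training and post-training scores, which is consistent with the differences listed. The critical value 2.145 (two-tailed, 5%, 14 degrees of freedom) is taken from a t-table, since the standard library has no t-distribution.

```python
from math import sqrt

pre = [17, 17, 15, 19, 18, 14, 27, 20, 12, 21, 20, 24, 17, 17, 17]
post = [22, 21, 21, 26, 20, 14, 31, 18, 22, 20, 27, 23, 15, 24, 24]

diffs = [after - before for before, after in zip(pre, post)]
n = len(diffs)
mean_d = sum(diffs) / n
sd_d = sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))  # sample SD of differences
t_stat = mean_d / (sd_d / sqrt(n))  # t = mean difference / its standard error

T_CRITICAL = 2.145  # from a t-table: two-tailed, alpha = 0.05, df = 14
reject_null = abs(t_stat) > T_CRITICAL

print(round(mean_d, 3), round(sd_d, 3), round(t_stat, 3), reject_null)
```

The mean difference is about 3.533 and t is about 3.486 on 14 degrees of freedom, so H0: μd = 0 is rejected, in line with the reported p-value of 0.003634.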