
Introduction to Statistics (4485)

Semester: Spring, 2023

ASSIGNMENT NO: 02

Name : Maqbool Ahmed


Registration No : 0000347172
Level: BS Pakistan Studies

Allama Iqbal Open University Islamabad


Q. 1 (a) Discuss the different measures of dispersion. Also
indicate their merits and demerits.

(b) Calculate variance and coefficient of variation from the following data:

Ans: (a) Discuss the different measures of dispersion. Also indicate their
merits and demerits.

Measures of dispersion, also known as measures of variability or spread, are
statistical measures that describe the extent to which data points in a dataset
deviate from the central tendency. They provide valuable insights into the
spread or distribution of data. Here are some commonly used measures of
dispersion:

Range:
Merits: It is the simplest measure of dispersion, calculated as the difference
between the maximum and minimum values in a dataset. It is easy to understand
and compute.
Demerits: It is sensitive to extreme values and does not consider the distribution
of data between the minimum and maximum values. Consequently, it may not
accurately represent the overall variability.

Interquartile Range (IQR):


Merits: It is a robust measure of dispersion that overcomes the limitations of the
range. It considers the spread of the middle 50% of the data, excluding outliers
or extreme values.
Demerits: It does not take into account the entire distribution of the data,
neglecting information about the lower and upper tails.

Variance:
Merits: It considers the squared deviations of each data point from the mean,
providing a measure of average dispersion. It is widely used in statistical
analysis and has important theoretical properties.
Demerits: The variance is not in the original units of the data, making it less
interpretable. Additionally, it amplifies the effect of outliers due to squaring the
deviations.

Standard Deviation:
Merits: It is the square root of the variance and is expressed in the original units
of the data. It is widely used, easily interpretable, and considered one of the
most important measures of dispersion.
Demerits: Like the variance, the standard deviation is sensitive to outliers due to
the squaring of deviations.

Mean Absolute Deviation (MAD):


Merits: It measures the average absolute deviation of each data point from the
mean, providing a robust measure of dispersion. It avoids squaring the
deviations, which reduces the influence of outliers.
Demerits: It can be computationally complex compared to the range, variance,
or standard deviation. MAD does not have the same theoretical properties as
variance or standard deviation.

Coefficient of Variation (CV):


Merits: It is a relative measure of dispersion that compares the standard
deviation to the mean, allowing for comparison between datasets with different
scales.
Demerits: It is not suitable for datasets with zero or very small means, as it leads
to undefined or inflated values.

It's important to choose an appropriate measure of dispersion based on the
nature of the data and the research objectives. Different measures have their
advantages and disadvantages, and the selection depends on the specific context
and requirements of the analysis.
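
To make these measures concrete, here is a minimal Python sketch (assuming NumPy is
available; the sample data are purely hypothetical) that computes each measure for a
small dataset:

import numpy as np

data = np.array([12, 15, 14, 10, 18, 20, 11, 16, 14, 30])   # hypothetical sample

data_range = data.max() - data.min()                  # Range
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1                                         # Interquartile range
variance = data.var(ddof=0)                           # Variance
std_dev = data.std(ddof=0)                            # Standard deviation
mad = np.mean(np.abs(data - data.mean()))             # Mean absolute deviation
cv = std_dev / data.mean() * 100                      # Coefficient of variation (%)

print(f"Range: {data_range}, IQR: {iqr}, Variance: {variance:.2f}")
print(f"Std dev: {std_dev:.2f}, MAD: {mad:.2f}, CV: {cv:.2f}%")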

(b) Calculate variance and coefficient of variation from the following data:

Groups   35-39   40-44   45-49   50-54   55-59   60-64   65-69
f        13      15      17      28      12      10      5

To calculate the variance and coefficient of variation, we first need to compute
the mean and the squared deviations for each group. Let's perform the
calculations step by step.

Step 1: Calculate the mean (average)

To find the mean, we multiply each group frequency (f) by its respective group
midpoint (the average of the lower and upper limits of the group) and sum up
these values. Then we divide by the total frequency (sum of all f values).

Group midpoint (x) is calculated by adding the lower and upper limits of each
group and dividing by 2.

Groups   35-39   40-44   45-49   50-54   55-59   60-64   65-69
f        13      15      17      28      12      10      5
x        37      42      47      52      57      62      67

The mean (μ) is calculated as follows:

μ = (Σ(f * x)) / Σf
= (13 * 37 + 15 * 42 + 17 * 47 + 28 * 52 + 12 * 57 + 10 * 62 + 5 * 67) / (13 +
15 + 17 + 28 + 12 + 10 + 5)
= (481 + 630 + 799 + 1456 + 684 + 620 + 335) / 100
= 5005 / 100
= 50.05

Step 2: Calculate the squared deviations

The squared deviation for each group is calculated by subtracting the mean from
the group midpoint (x) and squaring the result. Then, we multiply the squared
deviation by the frequency (f) for each group.

Weighted squared deviation: d^2 = (x - μ)^2 * f

Groups            35-39     40-44    45-49    50-54    55-59    60-64     65-69
f                 13        15       17       28       12       10        5
x                 37        42       47       52       57       62        67
d^2 = f(x - μ)^2  2213.93   972.04   158.14   106.47   579.63   1428.03   1436.51

Step 3: Calculate the variance

The variance (σ^2) is obtained by summing up all the squared deviations and
dividing by the total frequency.
σ^2 = Σd^2 / Σf
    = (2213.93 + 972.04 + 158.14 + 106.47 + 579.63 + 1428.03 + 1436.51) / (13 + 15
      + 17 + 28 + 12 + 10 + 5)
    = 6894.75 / 100
    = 68.95

Step 4: Calculate the coefficient of variation

The coefficient of variation (CV) is the ratio of the standard deviation to the
mean, expressed as a percentage.

CV = (σ / μ) * 100

First, let's calculate the standard deviation (σ), which is the square root of the
variance.

σ = √σ^2
  = √68.95
  ≈ 8.303

Now we can calculate the coefficient of variation.

CV = (σ / μ) * 100
   = (8.303 / 50.05) * 100
   ≈ 16.59%
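
The grouped-data calculation above can be reproduced with a short Python sketch
(a minimal illustration assuming NumPy is available; the midpoints and frequencies
are taken from the table above):

import numpy as np

midpoints = np.array([37, 42, 47, 52, 57, 62, 67])      # class midpoints x
freqs = np.array([13, 15, 17, 28, 12, 10, 5])           # frequencies f

n = freqs.sum()                                         # total frequency (100)
mean = (freqs * midpoints).sum() / n                    # grouped mean, 50.05
variance = (freqs * (midpoints - mean) ** 2).sum() / n  # grouped variance
std_dev = np.sqrt(variance)
cv = std_dev / mean * 100                               # coefficient of variation (%)

print(f"mean = {mean:.2f}, variance = {variance:.2f}, "
      f"std dev = {std_dev:.2f}, CV = {cv:.2f}%")       # ~50.05, 68.95, 8.30, 16.59%
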
Q. 2 (a) What is a linear regression model? Explain the
assumptions underlying the linear regression model.
(b) Define the term correlation. Find the correlation coefficient between
X and Y, given

X   78    89    97    69    59    79    68    61
Y   125   137   156   112   107   136   123   108

Ans: (a) What is a linear regression model? Explain the assumptions underlying
the linear regression model.

A linear regression model is a statistical model that aims to establish a linear
relationship between a dependent variable and one or more independent
variables. It assumes that there is a linear association between the independent
variables (also known as predictor variables, features, or regressors) and the
dependent variable (also known as the target variable or response variable). The
model attempts to find the best-fitting line, known as the regression line, that
minimizes the overall difference between the predicted values and the actual
values of the dependent variable.

Assumptions underlying the linear regression model:

Linearity:
The relationship between the independent variables and the dependent variable
is assumed to be linear. This means that the change in the dependent variable is
proportional to the change in the independent variables.

Independence:
The observations used to build the model are assumed to be independent of each
other. This assumption is important because if observations are not independent,
it can lead to biased or inefficient estimates.

Homoscedasticity:
The variance of the errors (residuals) should be constant across all levels of the
independent variables. In other words, the spread of the residuals should be
consistent throughout the range of the predictor variables.

No or little multicollinearity:
The independent variables should not be highly correlated with each other. High
multicollinearity can make it difficult to determine the individual effects of the
independent variables on the dependent variable.

Normality of residuals:
The residuals are assumed to follow a normal distribution. This assumption is
necessary for conducting hypothesis tests, constructing confidence intervals,
and obtaining reliable statistical measures such as p-values.

No endogeneity:
There should be no relationship between the errors (residuals) and the
independent variables. Endogeneity occurs when there is a two-way causal
relationship between the dependent variable and one or more of the independent
variables, which can lead to biased and inconsistent estimates.

It is important to note that these assumptions provide a foundation for the
classical linear regression model. In practice, violations of these assumptions
can be addressed through various techniques such as transformations of
variables, robust regression methods, or using alternative models if the
assumptions are severely violated.
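
As an illustrative aside (not part of the original question), the following minimal
Python sketch fits a simple least-squares line with NumPy on hypothetical data and
inspects the residuals, which is where assumptions such as zero-mean errors and
homoscedasticity would be checked in practice:

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)                                 # hypothetical predictor
y = 2.5 * x + 4.0 + rng.normal(scale=2.0, size=x.size)     # linear signal plus noise

slope, intercept = np.polyfit(x, y, deg=1)   # ordinary least squares fit of y = b0 + b1*x
residuals = y - (intercept + slope * x)

print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
print(f"residual mean = {residuals.mean():.3f}")   # should be close to 0
print(f"residual std  = {residuals.std(ddof=2):.3f}")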

(b) Define the term correlation. Find the correlation coefficient between
X and Y, given
X   78    89    97    69    59    79    68    61
Y   125   137   156   112   107   136   123   108

In statistics, correlation refers to the statistical relationship or association
between two variables. It measures the strength and direction of the linear
relationship between the variables. The correlation coefficient is a numerical
value that quantifies the degree of correlation between two variables.

To find the correlation coefficient between variables X and Y, we can use the
Pearson correlation coefficient formula. The Pearson correlation coefficient,
often denoted as "r," ranges from -1 to 1. A positive value indicates a positive
correlation, a negative value indicates a negative correlation, and a value close
to 0 suggests no significant linear correlation.

Let's calculate the correlation coefficient between X and Y using the given data:

X: 78, 89, 97, 69, 59, 79, 68, 61
Y: 125, 137, 156, 112, 107, 136, 123, 108

First, we need to calculate the means (average) of X and Y:

Mean of X (x̄ ):
x̄ = (78 + 89 + 97 + 69 + 59 + 79 + 68 + 61) / 8
= 600 / 8
= 75

Mean of Y (ȳ):
ȳ = (125 + 137 + 156 + 112 + 107 + 136 + 123 + 108) / 8
= 1004 / 8
= 125.5

Next, we calculate the deviations from the means for each value of X and Y:

Deviation of X (x - x̄ ):
78 - 75 = 3
89 - 75 = 14
97 - 75 = 22
69 - 75 = -6
59 - 75 = -16
79 - 75 = 4
68 - 75 = -7
61 - 75 = -14

Deviation of Y (y - ȳ):
125 - 125.5 = -0.5
137 - 125.5 = 11.5
156 - 125.5 = 30.5
112 - 125.5 = -13.5
107 - 125.5 = -18.5
136 - 125.5 = 10.5
123 - 125.5 = -2.5
108 - 125.5 = -17.5
Now, we multiply the deviations of X and Y for each data point:

(3)(-0.5) = -1.5
(14)(11.5) = 161
(22)(30.5) = 671
(-6)(-13.5) = 81
(-16)(-18.5) = 296
(4)(10.5) = 42
(-7)(-2.5) = 17.5
(-14)(-17.5) = 245

We sum up these products:

-1.5 + 161 + 671 + 81 + 296 + 42 + 17.5 + 245 = 1512

Next, we calculate the squared deviations of X and Y:

(3)^2 = 9
(14)^2 = 196
(22)^2 = 484
(-6)^2 = 36
(-16)^2 = 256
(4)^2 = 16
(-7)^2 = 49
(-14)^2 = 196

Sum of squared deviations of X:

9 + 196 + 484 + 36 + 256 + 16 + 49 + 196 = 1242

Sum of squared deviations of Y:

0.25 + 132.25 + 930.25 + 182.25 + 342.25 + 110.25 + 6.25 + 306.25 = 2010

Finally, the Pearson correlation coefficient is:

r = Σ(x - x̄)(y - ȳ) / √[Σ(x - x̄)^2 × Σ(y - ȳ)^2]
  = 1512 / √(1242 × 2010)
  = 1512 / √2,496,420
  ≈ 1512 / 1580.0
  ≈ 0.957

A correlation coefficient of approximately 0.96 indicates a strong positive linear
relationship between X and Y.
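
These hand calculations can be verified with a brief Python sketch (a minimal example
assuming NumPy is available):

import numpy as np

x = np.array([78, 89, 97, 69, 59, 79, 68, 61])
y = np.array([125, 137, 156, 112, 107, 136, 123, 108])

# Pearson correlation coefficient from its definition
dx, dy = x - x.mean(), y - y.mean()
r = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
print(f"r = {r:.3f}")                  # ~0.957

# Cross-check against NumPy's built-in correlation matrix
print(np.corrcoef(x, y)[0, 1])
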
Q. 3 (a) Define the terms: experiment, outcome, event, sample
space, simple and compound events, impossible events, sure events
and mutually exclusive events.

(b) How many dice must be thrown so that the probability of obtaining
at least one 6 is at least 0.99?

(c) From a group of 6 men and 8 women, 5 people are chosen at random.
Find the probability that there are more men chosen than women.

Ans: (a) Here are the definitions of the terms you mentioned:

1. Experiment: An experiment refers to a planned activity or process that
produces an observable outcome. It is a controlled procedure designed to gather
data or test a hypothesis.

2. Outcome: An outcome is a possible result or consequence that can occur as a
result of an experiment. It represents a specific observation or measurement that
can be observed or recorded.

3. Event: An event is a subset of the sample space, which is a collection of all
possible outcomes of an experiment. It represents a specific set of outcomes or a
combination of outcomes that we are interested in studying or analyzing.

4. Sample Space: The sample space is the set of all possible outcomes of an
experiment. It includes every possible result that can occur. It is denoted by the
symbol Ω or S.

5. Simple Event: A simple event is an event that consists of a single outcome or
a single element of the sample space. It cannot be further broken down into
smaller events.

6. Compound Event: A compound event is an event that consists of two or more
simple events. It is formed by combining simple events using logical operations
such as "and" (intersection), "or" (union), or "not" (complement).

7. Impossible Event: An impossible event is an event that cannot occur under
any circumstances. It has no elements in the sample space and is denoted by the
empty set (∅).

8. Sure Event: A sure event, also known as a certain event, is an event that is
guaranteed to occur. It includes the entire sample space and represents the
outcome that is certain to happen. It is denoted by the symbol Ω or S.

9. Mutually Exclusive Events: Mutually exclusive events are events that cannot
occur simultaneously. If one event happens, the other event(s) cannot occur at
the same time. In other words, the occurrence of one event excludes the
occurrence of the other event(s).

(b) How many dice must be thrown so that the probability of obtaining
at least one 6 is at least 0.99?

To calculate the number of dice required to obtain at least one 6 with a
probability of at least 0.99, we can use the concept of complementary
probability.
The probability of not rolling a 6 on a single die is 5/6, as there are 5 possible
outcomes (1, 2, 3, 4, or 5) out of 6 total outcomes.

If we throw n dice, the probability of not rolling a 6 on any of them is (5/6)^n.

The complementary probability, which is the probability of obtaining at least
one 6, is equal to 1 minus the probability of not rolling a 6:

1 - (5/6)^n ≥ 0.99

Simplifying the equation:

(5/6)^n ≤ 0.01

To solve for n, we take the logarithm of both sides. Because log(5/6) is negative,
dividing by it reverses the inequality:

n * log(5/6) ≤ log(0.01)

n ≥ log(0.01) / log(5/6)

Using a calculator, we can compute:

n ≥ 25.26

Since we cannot throw a fraction of a die, the minimum number of dice required
is 26 to ensure a probability of at least 0.99 of obtaining at least one 6.
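
A quick numerical check of this result, using only the Python standard library
(a minimal sketch):

import math

# Smallest n with 1 - (5/6)**n >= 0.99, i.e. (5/6)**n <= 0.01
n = math.ceil(math.log(0.01) / math.log(5 / 6))
print(n)                     # 26
print(1 - (5 / 6) ** n)      # ~0.991, so the 0.99 threshold is met
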
(c) From a group of 6 men and 8 women, 5 people are chosen at random.
Find the probability that there are more men chosen than women.
To find the probability that there are more men chosen than women from a
group of 6 men and 8 women when 5 people are chosen at random, we need to
consider the different possible scenarios.

There are three cases to consider:

Case 1: Selecting 3 men and 2 women.
Case 2: Selecting 4 men and 1 woman.
Case 3: Selecting all 5 men.

Let's calculate the probability for each case:

Case 1: Selecting 3 men and 2 women:


The number of ways to choose 3 men from 6 men is given by the combination
formula: C(6, 3) = 6! / (3! * (6 - 3)!) = 20.
Similarly, the number of ways to choose 2 women from 8 women is given by
the combination formula: C(8, 2) = 8! / (2! * (8 - 2)!) = 28.

The total number of ways to choose 5 people from the total group of 14 (6 men
+ 8 women) is given by the combination formula: C(14, 5) = 14! / (5! * (14 -
5)!) = 2002.

Therefore, the probability of selecting 3 men and 2 women is: (20 * 28) / 2002
= 560 / 2002 = 0.2797 (approximately).
Case 2: Selecting 4 men and 1 woman:
The number of ways to choose 4 men from 6 men is given by the combination
formula: C(6, 4) = 6! / (4! * (6 - 4)!) = 15.
Similarly, the number of ways to choose 1 woman from 8 women is given by
the combination formula: C(8, 1) = 8! / (1! * (8 - 1)!) = 8.

Therefore, the probability of selecting 4 men and 1 woman is: (15 * 8) / 2002 =
120 / 2002 = 0.0599 (approximately).

Case 3: Selecting all 5 men:


The number of ways to choose 5 men from 6 men is given by the combination
formula: C(6, 5) = 6! / (5! * (6 - 5)!) = 6.

Therefore, the probability of selecting all 5 men is: 6 / 2002 = 0.003
(approximately).

To find the probability that there are more men chosen than women, we sum up
the probabilities from all three cases:

0.2797 + 0.0599 + 0.003 = 0.3426 (approximately).

So, the probability that there are more men chosen than women is
approximately 0.3426 or 34.26%.
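
The combinatorial calculation can be checked with a short sketch using Python's
standard library:

from math import comb

men, women, chosen = 6, 8, 5
total = comb(men + women, chosen)      # C(14, 5) = 2002

# Cases with more men than women: 3, 4 or 5 men among the 5 chosen
favourable = sum(comb(men, m) * comb(women, chosen - m) for m in range(3, 6))
print(favourable, total, favourable / total)   # 686 2002 ~0.3427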

Q. 4 (a) Discuss the common types of sampling techniques used to gather
information and the advantages and disadvantages of each of these techniques.
(b) What is a sampling distribution? Describe the properties of the
sampling distribution of the means.

Ans: Sampling techniques are used in research and data collection to select a
subset of individuals or elements from a larger population. Each sampling
technique has its own advantages and disadvantages, which should be carefully
considered to ensure the reliability and validity of the collected information.
Here are some common types of sampling techniques and their respective pros
and cons:

Simple Random Sampling:


Advantages: Every individual or element in the population has an equal chance
of being selected, ensuring unbiased representation. It is easy to understand and
implement.
Disadvantages: It may not be suitable for large populations, as it can be
time-consuming and costly. It may also result in a sample that does not fully
represent the population's diversity.

Stratified Sampling:
Advantages: The population is divided into homogeneous groups (strata), and a
proportional number of individuals are selected from each stratum. It ensures
representation from all subgroups, resulting in more precise estimates for each
subgroup.
Disadvantages: Proper stratification requires accurate prior knowledge about the
population characteristics, which may not always be available. It can be
challenging to classify individuals into mutually exclusive strata.

Cluster Sampling:
Advantages: The population is divided into clusters (e.g., geographical areas),
and a random selection of clusters is made. It is cost-effective and practical
when it is difficult to obtain a complete list of the population.
Disadvantages: There is a risk of high within-cluster similarity, which may
reduce the representativeness. The selection of clusters and subsequent sampling
within clusters can introduce bias.

Systematic Sampling:
Advantages: Every nth individual is selected from a population list after a
random start. It is less time-consuming than simple random sampling and can
still provide a representative sample if the list is randomized.
Disadvantages: If there is a pattern or periodicity in the list, systematic sampling
may introduce bias. It may also miss certain population characteristics if they
are related to the periodicity.

Convenience Sampling:
Advantages: It is convenient and easily accessible, requiring minimal effort. It
can be useful for exploratory or qualitative research when generalizability is not
a primary concern.
Disadvantages: The sample may not be representative of the population, as
individuals are selected based on their availability or accessibility. There is a
high risk of selection bias, limiting the generalizability of the findings.

Snowball Sampling:
Advantages: It is useful when the population is hard to reach or hidden, such as
marginalized or stigmatized groups. Existing participants refer additional
participants, creating a network of connections.
Disadvantages: It may lead to biased samples, as the initial participants may not
accurately represent the target population. The sample size may also be small,
limiting the generalizability of the findings.
These are just a few examples of sampling techniques, and researchers should
choose the most appropriate method based on their research objectives,
available resources, and population characteristics. It is important to
acknowledge and address the limitations and potential biases associated with
each technique to ensure the reliability and validity of the collected information.
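
As a rough illustration of how a few of these designs differ in practice, the sketch
below (a minimal example assuming NumPy; the population and strata are hypothetical)
draws simple random, systematic, and proportionally stratified samples from a
population of 1,000 units:

import numpy as np

rng = np.random.default_rng(42)
population = np.arange(1000)          # hypothetical population of 1,000 units
n = 50                                # desired sample size

# Simple random sampling: every unit has an equal chance of selection
srs = rng.choice(population, size=n, replace=False)

# Systematic sampling: every k-th unit after a random start
k = population.size // n
start = rng.integers(k)
systematic = population[start::k][:n]

# Stratified sampling: proportional draws from two hypothetical strata
strata = [population[:600], population[600:]]
stratified = np.concatenate([
    rng.choice(units, size=int(n * units.size / population.size), replace=False)
    for units in strata
])

print(len(srs), len(systematic), len(stratified))   # 50 50 50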

(b) What is a sampling distribution? Describe the properties of the sampling
distribution of the means.

In statistics, a sampling distribution refers to the distribution of a sample
statistic, such as the mean or proportion, obtained from repeated random
sampling of a population. It provides valuable insights into the behavior of the
sample statistic and helps make inferences about the population parameter.

The sampling distribution of the means, also known as the sampling distribution
of the sample mean, specifically focuses on the distribution of sample means.
Here are some key properties of the sampling distribution of the means:

1. Central Limit Theorem: The sampling distribution of the means tends to
follow an approximately normal distribution, regardless of the shape of the
population distribution, as long as the sample size is sufficiently large (typically
n ≥ 30). This is known as the central limit theorem.

2. Mean: The mean of the sampling distribution of the means is equal to the
population mean. In other words, the average of all possible sample means is
the same as the mean of the population.

3. Standard Deviation: The standard deviation of the sampling distribution of
the means, often referred to as the standard error, is equal to the population
standard deviation divided by the square root of the sample size.
Mathematically, it can be represented as σ/√n, where σ is the population
standard deviation and n is the sample size. As the sample size increases, the
standard deviation of the sampling distribution decreases, indicating greater
precision in estimating the population mean.

4. Shape: As mentioned earlier, for large sample sizes, the sampling distribution
of the means approaches a normal distribution, even if the population
distribution is not normally distributed. For smaller sample sizes, the
distribution may still exhibit some skewness, but it becomes more symmetrical
and bell-shaped as the sample size increases.

5. Independence: The samples drawn for the calculation of sample means are
assumed to be independent of each other. This assumption is usually met when
samples are collected through simple random sampling or other randomization
methods.

The properties of the sampling distribution of the means play a crucial role in
statistical inference. They allow us to make statements about the likelihood of
observing certain sample means and help us estimate population parameters
with a certain level of confidence.
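
These properties can be seen directly in a small simulation (a minimal sketch
assuming NumPy; the skewed population is hypothetical): repeated sample means
cluster around the population mean, and their spread is close to σ/√n.

import numpy as np

rng = np.random.default_rng(1)
population = rng.exponential(scale=2.0, size=100_000)   # skewed, non-normal population
n = 30                                                  # sample size
reps = 2_000                                            # number of repeated samples

# Draw `reps` samples of size n and compute each sample mean
sample_means = rng.choice(population, size=(reps, n)).mean(axis=1)

print(f"population mean       = {population.mean():.3f}")
print(f"mean of sample means  = {sample_means.mean():.3f}")   # close to population mean
print(f"theoretical std error = {population.std() / np.sqrt(n):.3f}")
print(f"observed std of means = {sample_means.std():.3f}")    # close to sigma / sqrt(n)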
Q. 5 (a) Explain how you test the hypothesis on proportions?
(b) A thousand households are taken at random and divided into three
groups A, B and C, according to the total monthly income. The following
table shows the numbers in each group having a colour television receiver,
a black and white receiver, or no television at all.

Attributes         A     B     C
Colour TV          56    51    93
Black & White TV   118   207   375
None               26    42    32

Test the hypothesis that there is no association between total income
and television ownership.
Ans: (a) Explain how you test the hypothesis on proportions

Testing hypotheses about proportions involves determining whether a sample
proportion significantly differs from a hypothesized population proportion. This
type of hypothesis testing is commonly performed when dealing with
categorical data.

Here's a step-by-step explanation of how to test a hypothesis on proportions:

State the null and alternative hypotheses:


Begin by clearly stating the null hypothesis (H₀) and the alternative hypothesis
(H₁). The null hypothesis typically assumes that there is no significant
difference or effect, while the alternative hypothesis suggests that there is a
significant difference.

For example:
- H₀: The population proportion is equal to a specific value.
- H₁: The population proportion is not equal to the specific value.

Choose a significance level:


Select the desired significance level (α), which determines the threshold for
rejecting the null hypothesis. Commonly used significance levels are 0.05 (5%)
or 0.01 (1%).

Collect and analyze the data:


Obtain a random sample from the population of interest and gather the relevant
categorical data. Count the number of successes (e.g., favorable outcomes) and
calculate the sample proportion (p̂ ) by dividing the number of successes by the
sample size.

Verify the conditions:


Ensure that the conditions for performing the hypothesis test are met. The data
should be collected through a random sampling method, and the sample size
should be large enough to satisfy the requirements for approximating a normal
distribution (at least 10 successes and 10 failures).

Compute the test statistic:


Calculate the test statistic based on the sample proportion. The most common
test statistic for proportions is the z-score, which follows a standard normal
distribution under the null hypothesis.

The formula for the z-score is:


z = (p̂ - P₀) / sqrt[(P₀ * (1 - P₀)) / n]
Where p̂ is the sample proportion, P₀ is the hypothesized population proportion
under the null hypothesis, and n is the sample size.
Determine the critical region and p-value:
Based on the chosen significance level (α) and the alternative hypothesis, find
the critical value(s) from the appropriate statistical table or use software.
Alternatively, you can calculate the p-value, which represents the probability of
observing a test statistic as extreme or more extreme than the one obtained,
assuming the null hypothesis is true.

Make a decision:
Compare the test statistic (or p-value) to the critical value(s). If the test statistic
falls within the critical region or the p-value is smaller than the significance
level, reject the null hypothesis in favor of the alternative hypothesis.
Otherwise, fail to reject the null hypothesis.

Interpret the results:


Provide a conclusion based on the decision made in the previous step. If the null
hypothesis is rejected, it suggests that there is sufficient evidence to support the
alternative hypothesis. If the null hypothesis is not rejected, it indicates that
there is insufficient evidence to conclude a significant difference from the
hypothesized population proportion.

It's important to note that this is a general outline of the process, and the specific
calculations or statistical tests may vary slightly depending on the context or
any additional assumptions made.
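
The z-statistic described above is easy to compute directly. The sketch below is a
minimal illustration (the counts of 58 successes out of n = 100 and the hypothesized
proportion 0.5 are purely hypothetical), using SciPy only for the normal tail
probability:

import math
from scipy.stats import norm

successes, n, p0 = 58, 100, 0.5        # hypothetical data and H0: P0 = 0.5
p_hat = successes / n

z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = 2 * (1 - norm.cdf(abs(z)))   # two-sided p-value

print(f"z = {z:.3f}, p-value = {p_value:.3f}")
# Reject H0 at significance level alpha if p_value < alpha (e.g. 0.05)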

(b) A thousand households are taken at random and divided into three
groups A, B and C, according to the total monthly income.
The following table shows the numbers in each group having a colour television
receiver, a black and white receiver, or no television at all.

Attributes         A     B     C
Colour TV          56    51    93
Black & White TV   118   207   375
None               26    42    32

Test the hypothesis that there is no association between total income and
television ownership.
To test the hypothesis that there is no association between total income and
television ownership, we can use the chi-square test of independence. This test
assesses whether there is a significant relationship between two categorical
variables.

Let's set up the null and alternative hypotheses:

Null hypothesis (H0): There is no association between total income and
television ownership.
Alternative hypothesis (H1): There is an association between total income and
television ownership.

We can calculate the expected frequencies under the assumption of
independence. The expected frequency for each cell can be calculated as:

Expected frequency = (row total × column total) / grand total

Using the given data, we can set up the observed and expected frequency tables
as follows:

Observed Frequencies:

Attributes         A     B     C     Total
Colour TV          56    51    93    200
Black & White TV   118   207   375   700
None               26    42    32    100
Total              200   300   500   1000

Expected Frequencies:

Attributes         A     B     C     Total
Colour TV          40    60    100   200
Black & White TV   140   210   350   700
None               20    30    50    100
Total              200   300   500   1000

(For example, the expected frequency for Colour TV in group A is
(200 × 200) / 1000 = 40.)

Now, we can calculate the chi-square test statistic:

χ^2 = Σ((Observed - Expected)^2 / Expected)

Using the observed and expected frequencies, we can calculate the chi-square
statistic:

χ^2 = ((56-40)^2 / 40) + ((51-60)^2 / 60) + ((93-100)^2 / 100) + ((118-140)^2 / 140)
      + ((207-210)^2 / 210) + ((375-350)^2 / 350) + ((26-20)^2 / 20) + ((42-30)^2 / 30)
      + ((32-50)^2 / 50)
    = 6.40 + 1.35 + 0.49 + 3.46 + 0.04 + 1.79 + 1.80 + 4.80 + 6.48
    ≈ 26.61

After performing the calculations, we find the chi-square statistic to be
approximately 26.61.

Next, we need to determine the degrees of freedom for the test. The degrees of
freedom can be calculated using the formula:

Degrees of freedom = (Number of rows - 1) * (Number of columns - 1)

In this case, the degrees of freedom would be (3 - 1) * (3 - 1) = 4.

We can now consult a chi-square distribution table or use statistical software to
find the critical value for the chi-square test with a significance level (α) of your
choice (e.g., 0.05) and 4 degrees of freedom.

Finally, we compare the calculated chi-square statistic to the critical value. At
α = 0.05 with 4 degrees of freedom, the critical value is 9.488. Since the calculated
statistic (≈ 26.61) is greater than 9.488, we reject the null hypothesis and conclude
that there is an association between total income and television ownership. Had the
statistic been less than or equal to the critical value, we would have failed to
reject the null hypothesis.

Please note that the critical value and the final conclusion will depend on the
chosen significance level (α).
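
For reference, the same test can be run in a few lines with SciPy's chi-square test
of independence (a minimal sketch, assuming SciPy is available):

import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([
    [56, 51, 93],        # Colour TV
    [118, 207, 375],     # Black & White TV
    [26, 42, 32],        # No television
])

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, df = {dof}, p-value = {p_value:.4f}")
print(expected)          # expected frequencies under independence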
