Group Assignment 1 - Stat

The document discusses key concepts in probability and statistics including sample space, outcomes, events, mutually exclusive and independent events, probability and non-probability sampling, standard deviation vs standard error, and permutation and combinations. It also provides examples and explanations of these concepts through experiments involving coins, dice and other scenarios.

Group Assignment 1

1. Explanation

● Sample Space:
- The sample space refers to the set of all possible outcomes or results that
can occur in a particular experiment or situation. It essentially represents the
universe of potential events or scenarios that we are interested in studying or
analyzing.

● Outcome:
- An outcome is a specific result that can occur in an experiment. It is a single
element of the sample space that we observe or measure when conducting an
experiment or study. Outcomes collectively make up the sample space.

● Event:
- An event is a subset of the sample space, consisting of one or more
outcomes. It represents a particular situation or condition of interest within the
broader set of possible outcomes. Events can be simple (involving a single
outcome) or compound (involving multiple outcomes).

● Mutually Exclusive Event:


- Mutually exclusive events are events that cannot occur simultaneously. If one
of these events happens, the other(s) cannot occur at the same time. For
example, when rolling a die, getting a 3 and getting a 5 are mutually exclusive
events because both cannot happen on a single roll.

● Independent Event:
- Independent events are events where the occurrence or non-occurrence of
one event does not affect the probability of the other event(s) happening. In
other words, the outcome of one event does not provide any information about
the likelihood of the other event(s). Formally, events A and B are independent
if P(A ∩ B) = P(A)P(B).
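The product rule P(A ∩ B) = P(A)P(B) can be checked by brute force. A minimal sketch, assuming two fair six-sided dice; the events "first die is even" and "second die shows 6" are chosen here only for illustration:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (first die, second die) for two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event given as a predicate on an outcome."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def first_even(o):      # illustrative event A: first die is even
    return o[0] % 2 == 0

def second_six(o):      # illustrative event B: second die shows 6
    return o[1] == 6

p_a = prob(first_even)
p_b = prob(second_six)
p_ab = prob(lambda o: first_even(o) and second_six(o))

print(p_a, p_b, p_ab)                        # 1/2 1/6 1/12
print("independent:", p_ab == p_a * p_b)     # independent: True
```

By contrast, two mutually exclusive events with positive probabilities (such as rolling a 3 and rolling a 5 on one die) can never be independent, because P(A ∩ B) = 0 while P(A)P(B) > 0.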

● Sampling Frame:
- A sampling frame is a list or source from which a sample is drawn in a
research study or survey. It should cover the population or group under study and
provide a clear and accessible way to select individuals or items for inclusion in
the sample. A good sampling frame is essential for obtaining representative
results.

● Probability and Non-Probability Sampling:


- Probability Sampling:
- Probability sampling is a method of selecting a sample from a population in which
every individual or item in the population has a known, non-zero chance of being included in
the sample. Common techniques include simple random sampling, stratified sampling, and
cluster sampling. Probability sampling methods aim to ensure that the sample is
representative of the population, making it suitable for statistical analysis.

- Non-Probability Sampling:
- Non-probability sampling, on the other hand, involves methods where not every
member of the population has a known chance of being included in the sample.
Instead, the selection of individuals or items is based on convenience, judgment, or
availability. Examples of non-probability sampling methods include convenience
sampling, purposive sampling, and snowball sampling. While non-probability sampling
can be quicker and less costly, it may introduce bias into the sample and may not be
suitable for making statistical inferences about the population.
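Two of the probability sampling techniques named above can be sketched with Python's `random.sample`. This is a minimal illustration only: the 100-unit sampling frame, its two strata ("A" and "B"), the seed, and the sample size of 10 are all hypothetical.

```python
import random

rng = random.Random(42)  # fixed seed so the draw is reproducible

# Hypothetical sampling frame: 100 numbered units, 60 in stratum "A", 40 in "B".
frame = [(i, "A" if i < 60 else "B") for i in range(100)]

# Simple random sampling: each unit has the same known chance (10/100) of selection.
srs = rng.sample(frame, 10)

# Proportional stratified sampling: group the frame by stratum, then draw from
# each stratum in proportion to its size (6 from A, 4 from B).
strata = {}
for unit in frame:
    strata.setdefault(unit[1], []).append(unit)

stratified = []
for units in strata.values():
    k = round(10 * len(units) / len(frame))
    stratified.extend(rng.sample(units, k))

print(len(srs), len(stratified))                   # 10 10
print(sum(1 for u in stratified if u[1] == "A"))   # 6
```

In both cases every unit's chance of selection is known in advance, which is what separates these methods from convenience or snowball sampling.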

2. Reasons for sampling


Sampling is done for various reasons, including:
1. Cost and time efficiency.
2. Practicality and resource constraints.
3. Accuracy and precision.
4. Risk reduction and quality control.
5. Privacy and confidentiality.
6. Logistical challenges.
7. Statistical analysis and experimentation.
Sampling enables researchers to collect meaningful data efficiently from a specific
segment of the population.

3. Difference between standard deviation and standard error


The fundamental distinction between standard error and standard deviation lies in their
purpose and interpretation. Although both measure variability, they describe the
variability of different things and serve distinct roles in statistical analysis.

Standard deviation measures the extent to which individual data points deviate from
the mean, providing a sense of how data is dispersed or spread out. It serves as an
indicator of data's inherent variability and is employed primarily for descriptive
purposes.
On the other hand, standard error is a measure of the precision or reliability of a sample
statistic, such as the sample mean. It indicates how much the sample mean is
expected to vary from the population mean if we were to draw many random
samples of the same size. For the sample mean, the standard error is σ/√n, which is
estimated by s/√n when the population standard deviation σ is unknown, so it shrinks
as the sample size n grows. Standard error is crucial when making statistical inferences,
like constructing confidence intervals or performing hypothesis tests, as it informs us
about the likely variation in sample estimates.
In essence, standard deviation reveals the spread within a single dataset, while
standard error assesses the spread of sample statistics across multiple samples. This
contrast illustrates their distinct roles and implications in statistical analysis.
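A minimal sketch of the difference, using Python's `statistics` module on a small made-up sample (the data values are hypothetical): the sample standard deviation describes the spread of the individual values, while s/√n estimates the standard error of the sample mean.

```python
import math
import statistics

data = [4, 8, 6, 5, 3, 7, 9, 6]  # hypothetical sample, for illustration only

sd = statistics.stdev(data)          # sample standard deviation (n - 1 denominator)
se = sd / math.sqrt(len(data))       # estimated standard error of the sample mean

print(f"mean = {statistics.mean(data):.3f}, SD = {sd:.3f}, SE = {se:.3f}")
# mean = 6.000, SD = 2.000, SE = 0.707
```

Quadrupling the sample size would leave the SD roughly unchanged but halve the SE, which is why the SE, not the SD, drives the width of confidence intervals.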

4. Sample space
With replacement:
The sample space of the experiment where the second marble is drawn with
replacement is the set of all possible ordered pairs of marble colors, representing the
color of the marble drawn first, followed by the color of the marble drawn second.
There are 3×3=9 possible outcomes.
(Red drawn first)= (red, red), (red, green), (red, blue)
(Green drawn first)= (green, red), (green, green), (green, blue)
(Blue drawn first)= (blue, red), (blue, green), (blue, blue)
Without replacement:
The sample space of the experiment where the second marble is drawn without
replacement is also the set of all possible ordered pairs of marble colors, but with the
restriction that the second marble cannot be the same color as the first marble. This
reduces the number of possible outcomes to 3×2=6 possible outcomes.
(Red drawn first)= (red, green), (red, blue)
(Green drawn first)= (green, red), (green, blue)
(Blue drawn first)= (blue, red), (blue, green)
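Both sample spaces can be enumerated directly. A sketch assuming, as the listings above imply, an urn with exactly one red, one green and one blue marble: `itertools.product` gives ordered draws with replacement and `itertools.permutations` gives ordered draws without replacement.

```python
from itertools import permutations, product

colors = ["red", "green", "blue"]  # assumed: one marble of each color

with_replacement = list(product(colors, repeat=2))      # 3 x 3 ordered pairs
without_replacement = list(permutations(colors, 2))     # 3 x 2 ordered pairs

print(len(with_replacement), len(without_replacement))  # 9 6
print(without_replacement[:2])  # [('red', 'green'), ('red', 'blue')]
```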

5. Coin Toss Experiment


The sample space of the experiment of tossing a coin three times is the collection of all
possible sequences of three coin tosses. Since each toss has 2 possible results, there are
2×2×2 = 8 possible outcomes:
{HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}.
Event of having more heads than tails:
The event of having more heads than tails corresponds to the subset of the sample
space that contains all outcomes with two or three heads. This event has 4 elements,
as shown below: {HHH, HHT, HTH, THH}.
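A short sketch that enumerates the same sample space and filters it for the event "more heads than tails":

```python
from itertools import product

# All 2**3 = 8 sequences of three tosses, e.g. "HHT".
sample_space = ["".join(seq) for seq in product("HT", repeat=3)]

# Event: more heads than tails (i.e. two or three heads).
more_heads = [s for s in sample_space if s.count("H") > s.count("T")]

print(sample_space)
print(more_heads)  # ['HHH', 'HHT', 'HTH', 'THH']
```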

6. Dice Experiment (Set Theory)


Let x1 be the number that appears on the first die and x2 the number that appears on
the second die, with E = {(x1, x2) : x1 + x2 is odd} and F = {(x1, x2) : x1 = 1 or x2 = 1}
(at least one die lands on 1).

EF: EF = E ∩ F, the outcomes with an odd sum in which at least one die lands on 1:
EF = {(1,2), (1,4), (1,6), (2,1), (4,1), (6,1)}

E ∪ F: the outcomes with an odd sum, at least one 1, or both (18 + 11 - 6 = 23 outcomes):
E ∪ F = {(1,1), (1,2), (1,3), (1,4), (1,5), (1,6), (2,1), (2,3), (2,5), (3,1), (3,2), (3,4), (3,6),
(4,1), (4,3), (4,5), (5,1), (5,2), (5,4), (5,6), (6,1), (6,3), (6,5)}

FG: The event FG is the intersection of events F and G, which means that it occurs
when both events F and G occur. In this case, FG is the event that at least one die lands
on 1 and the sum of the dice is 7. This event has two outcomes: FG = {(1,6), (6,1)}.
EF^c: The event EF^c is the intersection of E with the complement of F, which means
that it occurs when E occurs and F does not. In this case, EF^c is the event that the sum
of the dice is odd and neither die lands on 1. This event has 18 - 6 = 12 outcomes:
EF^c = {(2,3), (2,5), (3,2), (3,4), (3,6), (4,3), (4,5), (5,2), (5,4), (5,6), (6,3), (6,5)}
EFG: The event EFG is the intersection of events E, F, and G, which means that it
occurs when all three events E, F, and G occur. In this case, EFG is the event that the
sum of the dice is odd, at least one die lands on 1, and the sum is 7. Because 7 is odd,
every outcome with sum 7 already lies in E, so EFG = FG = {(1,6), (6,1)}.
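Set operations on the 36 outcomes can be checked with Python sets. A sketch assuming E = "sum is odd", F = "at least one die lands on 1" and G = "sum is 7", as described in the text, and reading EF^c as E ∩ F^c:

```python
from itertools import product

S = set(product(range(1, 7), repeat=2))          # 36 outcomes (x1, x2)
E = {o for o in S if sum(o) % 2 == 1}            # sum is odd
F = {o for o in S if 1 in o}                     # at least one die shows 1
G = {o for o in S if sum(o) == 7}                # sum is 7

EF = E & F
E_or_F = E | F
FG = F & G
EFc = E - F          # E ∩ F^c: odd sum and no die shows 1
EFG = E & F & G

print(sorted(EF))
print(len(E_or_F), sorted(FG), len(EFc), sorted(EFG))
```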

7. Permutations and Combinations


a) Because each ranking corresponds to a particular ordered arrangement of the
10 people, the answer to this part is 10! = 3,628,800.
b) There are 4! possible rankings of the women among themselves and 6!
possible rankings of the men among themselves, so there are
(6!)(4!) = (720)(24) = 17,280 possible rankings in which the
women receive the top 4 scores. Hence, if all 10! rankings are equally likely, the
desired probability is
6!4!/10! = 17,280/3,628,800 = 1/210
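The counts can be verified with exact arithmetic. A minimal sketch, assuming 6 men and 4 women and that all rankings are equally likely:

```python
from fractions import Fraction
from math import factorial

total = factorial(10)                      # all rankings of 10 people
women_top = factorial(4) * factorial(6)    # women fill places 1-4, men places 5-10

print(total, women_top)                    # 3628800 17280
print(Fraction(women_top, total))          # 1/210
```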

8.
a. Each of the four components is either working (1) or failed (0), that is, two states.
Therefore, the total number of outcomes = 2^4 = 16.
b. Components 1 & 2 are both working (A): A = {(1,1,1,1), (1,1,1,0), (1,1,0,1), (1,1,0,0)}
Components 3 & 4 are both working (B): B = {(1,1,1,1), (1,0,1,1), (0,1,1,1), (0,0,1,1)}
A ∪ B = {(1,1,1,1), (1,1,1,0), (1,1,0,1), (1,1,0,0), (1,0,1,1), (0,1,1,1), (0,0,1,1)}
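A sketch that builds the 16 component states and the events A and B, assuming (as the sets above do) that 1 means working and 0 means failed:

```python
from itertools import product

# Each of the 4 components is either working (1) or failed (0).
S = list(product((1, 0), repeat=4))
A = {s for s in S if s[0] == 1 and s[1] == 1}   # components 1 and 2 both work
B = {s for s in S if s[2] == 1 and s[3] == 1}   # components 3 and 4 both work

print(len(S), len(A), len(B), len(A | B))   # 16 4 4 7
print(sorted(A | B, reverse=True))
```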

9.
a. Any of the 5 boys can occupy the 4th position, and the remaining 14 people can be
arranged in 14! ways:
n(E) = 5 × 14!, n(S) = 15!
P(4th position is a boy) = 5 × 14!/15! = 5/15 = 0.3333
b. n(E) = 5 × 14!, n(S) = 15!
P(12th position is a boy) = 5 × 14!/15! = 5/15 = 0.3333

c. There is 1 way for the 3rd position to be occupied by that particular boy, and 14! ways
to arrange the other people.

P(particular boy in 3rd position) = n(E)/n(S) = 1 × 14!/15! = 1/15 ≈ 0.067
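A sketch of the same counting argument with exact fractions, assuming (as n(S) = 15! and n(E) = 5 × 14! imply) 15 people in a random order, 5 of whom are boys:

```python
from fractions import Fraction
from math import factorial

n_people, n_boys = 15, 5

# A boy in a given position: choose which boy (5 ways), arrange the rest (14!).
p_boy_at_position = Fraction(n_boys * factorial(n_people - 1), factorial(n_people))

# One particular boy in position 3: 1 way to place him, 14! ways for the rest.
p_particular_boy = Fraction(factorial(n_people - 1), factorial(n_people))

print(p_boy_at_position, p_particular_boy)   # 1/3 1/15
```

The answer does not depend on which position is asked about, which is why parts a and b agree.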

10.
P(3 men & 2 women) = (choose 3 of 6 men)(choose 2 of 9 women) / (choose 5 from 15 total)

= C(6,3) × C(9,2) / C(15,5) = (20 × 36)/3003 = 720/3003 ≈ 0.2398
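A sketch using `math.comb`, assuming (as the denominator C(15, 5) and the factor 36 = C(9, 2) imply) a group of 6 men and 9 women from which 5 people are chosen at random:

```python
from math import comb

favourable = comb(6, 3) * comb(9, 2)   # 20 * 36 = 720
total = comb(15, 5)                    # 3003
print(favourable, total, round(favourable / total, 4))  # 720 3003 0.2398
```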

11.
To find the conditional probability of an event A given that another event B has occurred, we
can use the formula:


$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

where P(A∩B) is the probability of both events happening together, and P(B) is the probability
of the event that has occurred.

Let F be the event that the student is female, and C be the event that the student is majoring in
computer science. Then, we have:

a) The conditional probability that this student is female, given that the student is majoring in
computer science, is:


$$P(F \mid C) = \frac{P(F \cap C)}{P(C)}$$

We are given that P(F∩C)=0.02 and P(C)=0.05. Therefore,


$$P(F \mid C) = \frac{0.02}{0.05} = 0.4$$

This means that there is a 40% chance that the student is female, given that they are majoring
in computer science.
b) The conditional probability that this student is majoring in computer science, given that the
student is female, is:


$$P(C \mid F) = \frac{P(C \cap F)}{P(F)}$$

We are given that P(C∩F)=0.02 and P(F)=0.52. Therefore,


$$P(C \mid F) = \frac{0.02}{0.52} \approx 0.0385$$

This means that there is about a 3.85% chance that the student is majoring in computer
science, given that they are female.
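Both conditional probabilities follow from the same definition. A minimal sketch using the three probabilities given in the text:

```python
p_f_and_c = 0.02   # P(F ∩ C): female and majoring in computer science
p_c = 0.05         # P(C)
p_f = 0.52         # P(F)

p_f_given_c = p_f_and_c / p_c   # P(F | C)
p_c_given_f = p_f_and_c / p_f   # P(C | F)
print(round(p_f_given_c, 4), round(p_c_given_f, 4))   # 0.4 0.0385
```

The two answers differ because they divide the same joint probability by different marginal probabilities.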

12.
a. P(husband earns < 25,000) = (212 + 36)/500 = 248/500 = 0.496
b. P(wife earns > 25,000 | husband earns > 25,000) = 54/252 ≈ 0.214, where
252 = 500 - 248 is the number of husbands earning more than 25,000
c. Husbands earning < 25,000: 212 + 36 = 248
P(wife earns > 25,000 | husband earns < 25,000) = 36/248 ≈ 0.145
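A sketch of the three calculations using only the counts that appear in the text (212, 36, 54 and the total of 500 couples); the full survey table is not reproduced here:

```python
from fractions import Fraction

total = 500
husband_low = 212 + 36                  # husbands earning less than 25,000
husband_high = total - husband_low      # 252 husbands earning more than 25,000
wife_high_husband_high = 54
wife_high_husband_low = 36

p_a = Fraction(husband_low, total)                      # part a
p_b = Fraction(wife_high_husband_high, husband_high)    # part b (conditional)
p_c = Fraction(wife_high_husband_low, husband_low)      # part c (conditional)
print(float(p_a), round(float(p_b), 4), round(float(p_c), 4))  # 0.496 0.2143 0.1452
```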

13.
To answer this question, we need to use the binomial distribution, which models the number of
successes in a fixed number of independent trials. In this case, the success is the event that
the rainfall exceeds 50 inches in a year, and the number of trials is 4.

First, we need to find the probability of success in one trial, which is the probability
that a normal random variable with mean 40 and standard deviation 4 is greater than 50.
Since z = (50 - 40)/4 = 2.5, this probability is P(Z > 2.5) ≈ 0.0062.

Next, we need to find the probability that exactly 2 of the 4 trials are successful, which can be
done by using the binomial formula:

$$P(X = 2) = \binom{4}{2} p^{2} (1 - p)^{4-2}$$

where X is the number of successes, p is the probability of success, and $\binom{n}{k}$ is the binomial
coefficient. Plugging in the values, we get:

$$P(X = 2) = \binom{4}{2} (0.0062)^{2} (1 - 0.0062)^{4-2} = 6 \times 0.0000384 \times 0.9876 \approx 0.00023$$

Therefore, the probability that in 2 of the next 4 years the rainfall will exceed 50 inches is
approximately 0.00023, or about 0.023%.
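A sketch of the two-step calculation with the standard library, assuming (as the text does) normally distributed annual rainfall with mean 40 and standard deviation 4, independent from year to year:

```python
from math import comb
from statistics import NormalDist

rain = NormalDist(mu=40, sigma=4)       # annual rainfall in inches
p = 1 - rain.cdf(50)                    # P(rainfall > 50) = P(Z > 2.5)

# Binomial step: exactly 2 "wet" years out of 4 independent years.
p_two_of_four = comb(4, 2) * p**2 * (1 - p)**2
print(round(p, 4), round(p_two_of_four, 5))  # 0.0062 0.00023
```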

14.
1. Use the binomial probability formula: P(X = k) = (n choose k) p^k (1 - p)^(n - k)
2. Plug in the values:
- n = 10 (total voters)
- k = 7 (voters for Proposition A)
- p = 0.7 (probability of a single voter supporting Proposition A)
3. Calculate the binomial coefficient:
- C(10, 7) = 120
4. Calculate the probability:
- P(X = 7) = 120 × 0.7^7 × 0.3^(10 - 7)
- P(X = 7) ≈ 0.2668 or 26.68%
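The same binomial calculation as a short sketch, using the values n = 10, k = 7 and p = 0.7 from the steps above:

```python
from math import comb

n, k, p = 10, 7, 0.7
prob = comb(n, k) * p**k * (1 - p)**(n - k)
print(comb(n, k), round(prob, 4))   # 120 0.2668
```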

15.
● Poisson distribution with an average rate (λ) of 3 accidents per week.
Probability of no accidents using the Poisson probability formula:
P(X = k) = (e^(-λ) λ^k) / k!
- P(no accidents) = e^(-λ)
Plug in λ = 3:
- P(no accidents) = e^(-3)
Complement rule to find the probability of at least one accident:
- P(at least one accident) = 1 - P(no accidents)
Calculate this probability:
- P(at least one accident) = 1 - e^(-3) ≈ 1 - 0.0498 = 0.9502 or 95.02%
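A minimal sketch of the complement-rule calculation for a Poisson count with λ = 3:

```python
from math import exp

lam = 3                      # mean accidents per week
p_none = exp(-lam)           # P(X = 0) = e^(-3)
p_at_least_one = 1 - p_none  # complement rule
print(round(p_none, 4), round(p_at_least_one, 4))   # 0.0498 0.9502
```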
16.
a. Proportion of Days with Less than 3 Claims:

Use the Poisson distribution with an average rate (λ) of 5 claims per day.

- To find the proportion of days with less than 3 claims, calculate P(X < 3) where X is the
number of claims per day.

- Use the Poisson probability formula:


P(X = k) = (e^(-λ) λ^k) / k!

- For less than 3 claims, we need to calculate P(X = 0), P(X = 1), and P(X = 2), then add them up:

P(X < 3) = P(X = 0) + P(X = 1) + P(X = 2)

- Calculate P(X = 0), P(X = 1), and P(X = 2) using λ = 5:

P(X = 0) = (e^(-5) 5^0) / 0! = e^(-5) ≈ 0.0067

P(X = 1) = (e^(-5) 5^1) / 1! = 5 e^(-5) ≈ 0.0337

P(X = 2) = (e^(-5) 5^2) / 2! = 12.5 e^(-5) ≈ 0.0842

- Add these probabilities:

P(X < 3) = e^(-5) (1 + 5 + 12.5) = 18.5 e^(-5) ≈ 0.1247

So, the proportion of days with less than 3 claims is approximately 0.1247 or 12.47%.

b. Probability of 4 Claims in Exactly 3 of the Next 5 Days:

Treat each day as a trial whose "success" is a day with exactly 4 claims, and assume that the
numbers of claims on different days are independent. The number of such days among the
next 5 then follows a binomial distribution.

- First, find the probability that a single day has exactly 4 claims, using the Poisson formula
with λ = 5:

p = P(X = 4) = (e^(-5) 5^4) / 4! = (625/24) e^(-5) ≈ 0.1755

- Use the binomial probability formula, where Y is the number of days with exactly 4 claims:

P(Y = k) = (n choose k) p^k (1 - p)^(n - k)

- Here, n is 5 (5 days) and k is 3 (3 days with 4 claims each):

P(Y = 3) = (5 choose 3) 0.1755^3 (1 - 0.1755)^(5 - 3)

P(Y = 3) = 10 × 0.1755^3 × 0.8245^2 ≈ 0.0367

So, the probability of having exactly 4 claims in exactly 3 of the next 5 days is approximately
0.0367 or 3.67%.
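A sketch of both parts, assuming a Poisson(5) number of claims per day and independence between days; `poisson_pmf` is a small hypothetical helper written out here, not a library function:

```python
from math import comb, exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam."""
    return exp(-lam) * lam**k / factorial(k)

lam = 5  # mean claims per day

# (a) Proportion of days with fewer than 3 claims: P(X = 0) + P(X = 1) + P(X = 2).
p_less_than_3 = sum(poisson_pmf(k, lam) for k in range(3))

# (b) A "success" is a day with exactly 4 claims; days assumed independent.
p4 = poisson_pmf(4, lam)
p_three_of_five = comb(5, 3) * p4**3 * (1 - p4)**2

print(round(p_less_than_3, 4), round(p4, 4), round(p_three_of_five, 4))
# 0.1247 0.1755 0.0367
```

Part (b) combines the two distributions: the Poisson model gives the per-day success probability, and the binomial model counts successes across the 5 days.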

17.
The probability of a disk being defective is p = 0.01. Then, the probability of a disk
being non-defective is q = 1 - p = 0.99.

Step 1: Calculate the probability that there is at least one defective disk in a package of 10
disks (a package containing at least one defective disk is returned). Assuming disks are
defective independently, we can use the binomial distribution:

P(at least one defective disk) = 1 - P(no defective disks)

= 1 - (0.99)^10
= 1 - 0.9044
≈ 0.0956

Step 2: Calculate the probability that exactly one package is returned out of three.
The number of returned packages among three is binomial with n = 3 and success
probability P(one package returned) = 0.0956, so:

P(exactly one package returned) = (3 choose 1) P(one package returned) (1 - P(one package returned))^2

= 3 × 0.0956 × (1 - 0.0956)^2
≈ 0.2346

The proportion of packages that are returned is about 9.56%. The probability that exactly one
package is returned out of three is about 23.46%.
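A sketch of the two binomial steps, assuming, as the text does, that a package is returned when it contains at least one defective disk and that disks are defective independently with probability 0.01:

```python
p_defective = 0.01
n_disks = 10

# A package is returned if it contains at least one defective disk.
p_return = 1 - (1 - p_defective) ** n_disks

# Number of returned packages out of 3 is Binomial(3, p_return).
p_exactly_one = 3 * p_return * (1 - p_return) ** 2
print(round(p_return, 4), round(p_exactly_one, 4))   # 0.0956 0.2346
```

The binomial formula is applied once per level: first to disks within a package, then to packages within the purchase of three.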

18.
a) Find the mean and standard deviation for samples of size 36.

The mean of the sampling distribution of sample means is equal to the population
mean, which is 128. The standard deviation of the sampling distribution of sample
means is equal to the population standard deviation divided by the square root of
the sample size, which is 22 / sqrt(36) = 3.67.

b) Find the probability that the mean of a sample of size 36 will be within 10 units of
the population mean, that is, between 118 and 138.

We can use the central limit theorem to approximate the probability that the mean
of a sample of size 36 will be within 10 units of the population mean. The central
limit theorem states that the sampling distribution of sample means will be
approximately normal, even if the population distribution is not normal, as long as
the sample size is large enough.

To calculate the probability, we can first calculate the z-scores of the two values,
118 and 138, using the standard error 22/6 ≈ 3.667 as the standard deviation of the sample mean:

z1 = (118 - 128) / 3.667 ≈ -2.73

z2 = (138 - 128) / 3.667 ≈ 2.73
We can then use a standard normal table to find the probability that a standard
normal variable will be between -2.73 and 2.73. This probability is approximately
0.9936.

Therefore, the probability that the mean of a sample of size 36 will be within 10 units
of the population mean is approximately 99.4%.
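A sketch of the central-limit-theorem calculation with `statistics.NormalDist`, using μ = 128, σ = 22 and n = 36 from the text and treating the sample mean as approximately normal:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 128, 22, 36
se = sigma / sqrt(n)                      # 22 / 6 ≈ 3.667
xbar = NormalDist(mu, se)                 # CLT: sample mean is approx. normal
p = xbar.cdf(138) - xbar.cdf(118)         # P(118 < sample mean < 138)
print(round(se, 3), round((138 - mu) / se, 3), round(p, 4))  # 3.667 2.727 0.9936
```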

19.

Mean = 23.8, SD = 4.6

=> 23.8 + 0.5 = 24.3, 23.8 - 0.5 = 23.3

Using Φ for the standard normal cumulative distribution function:

Φ((24.3 - 23.8)/4.6) = Φ(0.11) = 0.5438

Φ((23.3 - 23.8)/4.6) = Φ(-0.11) = 1 - Φ(0.11) = 0.4562
Φ(0.11) - Φ(-0.11) = 0.5438 - 0.4562 = 0.0876
Probability = 0.0876
Expected number out of 1200 = 0.0876 × 1200 = 105.12
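A sketch of the same normal-interval calculation, using the mean 23.8, SD 4.6, the interval 23.3 to 24.3 and the multiplier 1200 exactly as they appear in the text (the original problem statement is not reproduced here):

```python
from statistics import NormalDist

X = NormalDist(mu=23.8, sigma=4.6)
p = X.cdf(24.3) - X.cdf(23.3)      # P(23.3 < X < 24.3)
expected = p * 1200                # expected count out of 1200
print(round(p, 4), round(expected, 1))
```

Because this evaluates Φ at the unrounded z ≈ 0.109 instead of the table value at 0.11, it gives about 0.0866 (about 104 out of 1200) rather than 0.0876; the difference comes only from rounding z.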

20.
To construct a 95% confidence interval for the population mean 𝝁, we will use the
following formula:

Confidence interval = sample mean ± z-score × standard error

where:

● sample mean is the average diameter of the simple random sample of 50
bolts, which is 5.11 mm
● z-score is the critical value from the standard normal distribution for a 95%
confidence interval, which is 1.96
● standard error is the standard deviation of the sampling distribution of
sample means, which can be calculated using the following formula:

standard error = population standard deviation / sqrt(sample size)


In this case, the population standard deviation is 0.1 mm and the sample size is 50, so
the standard error is:
standard error = 0.1 mm / sqrt(50) = 0.0141 mm
Therefore, the 95% confidence interval for the population mean 𝝁 is:
Confidence interval = 5.11 mm ± 1.96 × 0.01414 mm = 5.11 mm ± 0.0277 mm = (5.0823 mm, 5.1377 mm)
We can be 95% confident that the true population mean diameter of bolts produced by
this manufacturer is between 5.0823 mm and 5.1377 mm.
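A sketch of the known-σ interval, using the values from the text (x̄ = 5.11 mm, σ = 0.1 mm, n = 50) and taking the critical value from `NormalDist().inv_cdf` rather than a table:

```python
from math import sqrt
from statistics import NormalDist

xbar, sigma, n = 5.11, 0.1, 50
z = NormalDist().inv_cdf(0.975)          # ≈ 1.96 for 95% confidence
margin = z * sigma / sqrt(n)             # ≈ 0.0277 mm
print(round(xbar - margin, 4), round(xbar + margin, 4))   # 5.0823 5.1377
```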

21.
To construct a 95% confidence interval for the mean of the population sampled, we
will use the following formula:

Confidence interval = sample mean ± t-score × standard error

where:

● sample mean is the average sales of the random sample of 16 records, which
is 290 liters of diesel fuel
● t-score is the critical value from the t distribution with n - 1 = 15 degrees of
freedom for a 95% confidence interval, which is t0.025, 15 = 2.131 (the t distribution
is used instead of the standard normal because the population standard deviation
is unknown and the sample is small; this assumes the population is approximately normal)
● standard error is the standard deviation of the sampling distribution of
sample means, which can be calculated using the following formula:

standard error = population standard deviation / sqrt(sample size)


In this case, the population standard deviation is unknown, so we will estimate it
using the sample standard deviation, which is 12 liters. The sample size is 16, so
the standard error is:

standard error = 12 liters / sqrt(16) = 3 liters

Therefore, the 95% confidence interval for the mean of the population sampled is:

Confidence interval = 290 liters ± 2.131 × 3 liters = 290 liters ± 6.39 liters = (283.61 liters, 296.39 liters)

We can be 95% confident that the true population mean sales of diesel fuel is
between 283.61 liters and 296.39 liters.
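A sketch of the t-based interval for x̄ = 290, s = 12 and n = 16. Python's standard library has no t quantile function, so the critical value 2.131 (t with 15 degrees of freedom, 0.025 upper tail) is hard-coded from a t table; with SciPy installed, `scipy.stats.t.ppf(0.975, 15)` returns the same value.

```python
from math import sqrt

xbar, s, n = 290, 12, 16
t_crit = 2.131            # t_{0.025, 15} from a t table (stdlib has no t quantile)
se = s / sqrt(n)          # 12 / 4 = 3 liters
margin = t_crit * se      # ≈ 6.39 liters
print(round(xbar - margin, 2), round(xbar + margin, 2))   # 283.61 296.39
```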

22.
Population mean (μ) = 33,950, Sample size (n) = 50
Sample mean (x̄) = 34,076, Sample standard deviation (s) = 824
Significance level (α) = 0.01
Hypotheses: H0: μ = 33,950 vs. H1: μ > 33,950
t = (34,076 - 33,950) / (824/√50) = 126/116.53 ≈ 1.08
The critical value for a one-tailed t-test with α = 0.01 and 49 degrees of freedom (n - 1 = 50 - 1 = 49)
is 2.405. Since the test statistic (1.08) is less than the critical value, we fail to reject the null
hypothesis: at the 1% level, there is not enough evidence to conclude that the true population mean
is more than 33,950. Therefore, the sample does not show that the government's estimate is too low.
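A sketch of the one-sample t statistic computed from the summary values in the data lines above, taking s = 824 as stated there. The one-tailed critical value 2.405 for 49 degrees of freedom is hard-coded from a t table, since the standard library has no t quantile function.

```python
from math import sqrt

mu0, n = 33950, 50          # hypothesized mean and sample size
xbar, s = 34076, 824        # sample mean and sample standard deviation
t_stat = (xbar - mu0) / (s / sqrt(n))
t_crit = 2.405              # one-tailed t_{0.01, 49}, taken from a t table
print(round(t_stat, 2), t_stat > t_crit)   # 1.08 False
```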

23.
The test statistic is

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \sim t_{n-1}$$

where the critical region is $t > t_{\alpha/2,\,n-1}$ or $t < -t_{\alpha/2,\,n-1}$.

Therefore:

$$t = \frac{10.06 - 10}{0.246/\sqrt{10}} = \frac{0.06}{0.078} \approx 0.77$$
Using the t distribution with 10 - 1 = 9 degrees of freedom at α = 0.01, the critical value of t is
$t_{0.005,\,9} = 3.25$.
Because |t| = 0.77 < 3.25, we do not reject the null hypothesis.
Hence, there is no significant evidence at the 1% level that the average content of the lubricant differs from 10 liters.
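A sketch of the two-tailed one-sample t-test with x̄ = 10.06, μ0 = 10, s = 0.246 and n = 10; the critical value 3.25 is hard-coded from a t table:

```python
from math import sqrt

xbar, mu0, s, n = 10.06, 10, 0.246, 10
t_stat = (xbar - mu0) / (s / sqrt(n))
t_crit = 3.25             # two-tailed t_{0.005, 9}, taken from a t table
print(round(t_stat, 2), abs(t_stat) > t_crit)   # 0.77 False
```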

24.
a) Calculate a 95% confidence interval for the mean weight of all such runners:

To calculate the confidence interval, we can use the formula for a confidence interval for the
population mean when the population standard deviation (σ) is known:

$$\text{Confidence Interval} = \bar{X} \pm Z \frac{\sigma}{\sqrt{n}}$$

Where:
- X̄ is the sample mean (61.79 kg)
- Z is the critical value for a 95% confidence interval (approximately 1.96)
- σ is the population standard deviation (4.5 kg)
- n is the sample size (24)

Confidence Interval = 61.79 ± 1.96 × (4.5/√24) = 61.79 ± 1.96 × 0.919 ≈ 61.79 ± 1.80 = (59.99 kg, 63.59 kg)

b) Based on this confidence interval, does a test of H0: µ = 61.3 kg vs. HA: µ ≠ 61.3 kg reject
H0 at the 5% significance level?

The hypothesized mean (61.3 kg) falls within the confidence interval (59.99 kg to 63.59 kg), so
we fail to reject the null hypothesis (H0) at the 5% significance level.

c) Carry out the hypothesis test to verify your answer:


To formally test the hypothesis, we can perform a one-sample z-test.
H0: µ = 61.3 kg
HA: µ ≠ 61.3 kg
Test Statistic: Z = (61.79 - 61.3)/(4.5/√24) = 0.49/0.919 ≈ 0.533
Since the absolute value of Z is less than the critical value (1.96 for a 95% confidence
level), we fail to reject the null hypothesis.
So, the hypothesis test confirms that we fail to reject H0 at the 5% significance level,
consistent with our answer in part (b).
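A sketch that computes the interval and the z-test together, using x̄ = 61.79 kg, σ = 4.5 kg, n = 24 and μ0 = 61.3 kg from the text, to show that the two approaches agree:

```python
from math import sqrt
from statistics import NormalDist

xbar, sigma, n, mu0 = 61.79, 4.5, 24, 61.3
se = sigma / sqrt(n)
z_crit = NormalDist().inv_cdf(0.975)               # ≈ 1.96
ci = (xbar - z_crit * se, xbar + z_crit * se)      # 95% confidence interval
z = (xbar - mu0) / se                              # test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))       # two-sided p-value

print([round(v, 2) for v in ci])      # [59.99, 63.59]
print(round(z, 3), round(p_value, 3))
```

A two-sided test at level α rejects exactly when μ0 lies outside the corresponding (1 - α) confidence interval, which is why parts (b) and (c) must give the same answer.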

26.
First, we find the mean of both datasets:
Mean for dataset X:
x̄ = 13986/7 = 1998
Mean for dataset Y:
ȳ = 46290/7 ≈ 6612.86

Xi      Yi       Xi·Yi       Xi^2       Yi^2
2004    10950    21943800    4016016    119902500
2003    9400     18828200    4012009    88360000
2001    8990     17988990    4004001    80820100
1998    5800     11588400    3992004    33640000
1997    5850     11682450    3988009    34222500
1994    3800     7577200     3976036    14440000
1989    1500     2983500     3956121    2250000
ΣXi = 13986   ΣYi = 46290   ΣXi·Yi = 92592540   ΣXi^2 = 27944196   ΣYi^2 = 373635100

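The slope m:

$$m = \frac{(7 \times 92592540) - (13986 \times 46290)}{(7 \times 27944196) - 13986^2} = \frac{648147780 - 647411940}{195609372 - 195608196} = \frac{735840}{1176}$$

m ≈ 625.7143

The intercept b (computed with the unrounded slope m = 735840/1176, because the x values are near 2000 and rounding m would noticeably shift b):

$$b = \frac{46290 - (735840/1176) \times 13986}{7} = \frac{46290 - 8751240}{7}$$

b ≈ -1243564.29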

The equation of Linear Regression,

y = mx + b

a) y = 625.7143x - 1243564.29

b) The slope of the line, 625.71, means that a one-unit increase in the independent
variable x (the model year, so a car that is one year newer) is associated with an increase
of about 625.71 in the dependent variable (asking price). Equivalently, each additional
year of car age lowers the predicted asking price by about 625.71.

c)
d) To find the proportion of the variability of the asking price explained by the age of the
car, square the correlation coefficient (r) to get the coefficient of determination
(R-squared). In this case, R-squared = (0.987)^2 ≈ 0.9742, so approximately 97.42% of the
variability in the asking price is explained by the age of the car. This indicates that the age
of the car is a strong predictor of the asking price in this dataset.
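The least-squares sums, slope, intercept and r can be reproduced from the seven data pairs in the table. A minimal sketch in plain Python (no external libraries), treating x as the model year:

```python
from math import sqrt

x = [2004, 2003, 2001, 1998, 1997, 1994, 1989]   # model year
y = [10950, 9400, 8990, 5800, 5850, 3800, 1500]  # asking price

n = len(x)
sx, sy = sum(x), sum(y)
sxy = sum(a * b for a, b in zip(x, y))
sxx = sum(a * a for a in x)
syy = sum(b * b for b in y)

m = (n * sxy - sx * sy) / (n * sxx - sx**2)      # least-squares slope
b = (sy - m * sx) / n                            # intercept, using unrounded m
r = (n * sxy - sx * sy) / sqrt((n * sxx - sx**2) * (n * syy - sy**2))

print(round(m, 4), round(b, 2))       # 625.7143 -1243564.29
print(round(r, 3), round(r**2, 4))    # 0.987 0.9741
```

Centering x (for example, using years since 1989) would give the same slope and a far smaller, easier-to-interpret intercept.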


27.

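c) The linear regression line equation is:

$$m = \frac{(10 \times 3053536) - (6456 \times 4683)}{(10 \times 4285034) - 41679936} = \frac{301912}{1170404} \approx 0.258$$

$$b = \frac{4683 - 0.258 \times 6456}{10} = 301.7352$$

y = 0.258x + 301.7352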

This equation represents the relationship between the dependent variable (y) and the
independent variable (x). In this case, there is a positive linear relationship between the
two variables. For every one-unit increase in x (monthly income), y (consumption) is
expected to increase by 0.258 units, assuming all other factors remain constant.

However, it's important to note that the strength of this relationship may be weak, as
indicated by the low correlation coefficient (r) in the following question.

d) The correlation coefficient measures the strength and direction of the linear
relationship between two variables. In this case, r is 0.2435, which is a relatively low
value. A low correlation coefficient suggests that there may be only a weak linear
relationship between monthly income and consumption. It's important to remember
that correlation does not imply causation, and there could be other factors at play that
influence consumption.

e) To estimate the amount of consumption for a monthly income of $900, we can use
the linear regression equation:

y = 0.258x + 301.7352

Plug in x (monthly income) as 900:


y = 0.258 × 900 + 301.7352
y = 232.2 + 301.7352
y = 533.9352
So, with a monthly income of $900, the estimated consumption is approximately 534.
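The fit and the prediction can be reproduced from the summary sums used in part (c) (n = 10, Σx = 6456, Σy = 4683, Σxy = 3053536, Σx² = 4285034). A minimal sketch:

```python
n = 10
sum_x, sum_y = 6456, 4683
sum_xy, sum_x2 = 3053536, 4285034

m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)   # slope
b = (sum_y - m * sum_x) / n                                  # intercept
y_hat = m * 900 + b                                          # predicted consumption at x = 900

print(round(m, 4), round(b, 2), round(y_hat, 2))
```

Using the unrounded slope gives an intercept of about 301.76 and a prediction of about 533.92 at x = 900, which agrees with the hand calculation (about 534) up to the rounding of the slope to 0.258.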
