Assignment - 2 FM-217


ASSIGNMENT-2

Introduction to Data Science


FM-217
Plaksha University
DEADLINE: 20/11/2024, WEDNESDAY, 5:00 PM

QUESTIONS
1. A random sample of 50 students has an average score of 75 with a standard deviation
of 8. Construct a 95% confidence interval for the true mean score of the students.

2. Calculate the 99% confidence interval for a sample proportion of 0.65 with a sample
size of 150.

3. A factory produces widgets with a mean weight of 150 grams. A random sample
of 40 widgets shows a mean weight of 148 grams with a standard deviation of 5
grams. Find the 90% confidence interval for the true mean weight of the widgets.

4. Explain how the width of a confidence interval changes if you increase the sample
size, keeping the confidence level constant.

5. Interpret a 95% confidence interval of (10.5, 15.2) for the average number of hours
worked per week by part-time employees.

6. A survey claims that 60% of people prefer online shopping. To verify, a sample of
200 people is surveyed and 110 prefer online shopping. Conduct a z-test for the
sample proportion at a 5% significance level.

7. Test whether the mean height of a population is 170 cm, given a sample mean of
172 cm, a standard deviation of 8 cm, and a sample size of 50. Use a one-tailed
z-test at a significance level of 0.05.

8. Perform a z-test for the hypothesis: “The average life of a battery is greater than
300 hours,” using a sample of 40 batteries with a mean of 305 hours and a standard
deviation of 10 hours. Use a significance level of 0.01.

9. You are planning to perform a z-test for a population. When would you prefer
using a z-test for a binomial distribution and when would you prefer a z-test for a
normal distribution? Give an example of each scenario.

10. A factory wants to test if the proportion of defective items produced is less than 5%.
A sample of 500 items has 20 defective items. Test this claim at a 0.05 significance
level using a z-test for proportions.

11. A researcher wants to test if the average income of a city is different from $50,000.
A random sample of 30 people has a mean income of $52,000 with a standard
deviation of $4,500. Conduct a two-tailed t-test at a significance level of 0.05.

12. Test if the average lifespan of a species of plant is more than 12 months using a
sample of 25 plants with a mean lifespan of 13.2 months and a standard deviation
of 1.8 months. Conduct a one-tailed t-test at a 0.01 significance level.

13. Two groups of students receive different training methods. Group A (n=15) has a
mean score of 82, and Group B (n=20) has a mean score of 79. The pooled standard
deviation is 6. Test if there is a significant difference between the groups using a
two-tailed t-test.

14. Explain the assumptions necessary for performing a t-test. How do these assumptions
differ from those of a z-test?

15. Calculate the p-value for a t-test comparing the means of two samples: one with
n=10, mean=12, and standard deviation=3, and the other with n=12, mean=15,
and standard deviation=4.

16. You flip a coin 10 times and observe 8 heads. Calculate the exact p-value using
the binomial formula for testing if the coin is fair. Compare your result with a
significance level of 0.05 and interpret the conclusion.

17. A new treatment for a disease is expected to be effective in 70% of cases. In a
trial of 10 patients, 9 show improvement. Calculate the p-value using the binomial
distribution to determine if the treatment is significantly better than expected. Use
a one-tailed test with a significance level of 0.05.

18. A basketball player makes 7 out of 10 free throws in practice. The historical success
rate is 60%. Perform a binomial test to determine if the player’s performance
significantly differs from the historical rate. Calculate the p-value and determine if
it is significant at alpha = 0.05.

19. A quality control engineer tests 10 products and finds 8 to be defective, where the
acceptable defect rate is 50%. Calculate the p-value using the binomial formula
and determine whether the defect rate significantly exceeds the acceptable limit at
a 0.01 significance level.

20. A coin is suspected to be biased towards heads. You toss it 10 times and observe 9
heads. Using the binomial distribution, calculate the p-value for the test that the
probability of heads is greater than 0.5. Compare the result to a significance level
of 0.01 and conclude if there is sufficient evidence to support the bias.

21. A retailer collects data on customer preferences for four different brands. Perform
a chi-square goodness-of-fit test to determine if the brands are equally preferred,
given observed counts of [25, 30, 20, 25].

22. Test whether two categorical variables, gender (male, female) and choice of
beverage (coffee, tea), are independent using the chi-square test, given the following
contingency table: Male (Coffee=20, Tea=15), Female (Coffee=18, Tea=22).

23. Explain why a chi-square test for independence can be used to determine the
relationship between two categorical variables, and why it is important that no variable
encoding is needed.

24. A chi-square test statistic is calculated to be 7.81 for a test with 3 degrees of
freedom. Determine if this result is significant at a significance level of 0.05.

25. Describe a scenario in which a two-tailed chi-square test would be used and interpret
the result.

26. A study compares the mean productivity of three departments. The sample means
are 50, 55, and 53 with respective variances of 4, 5, and 6. Conduct a one-way
ANOVA at a 0.05 significance level to determine if there is a significant difference
among departments.

27. Explain what the null and alternative hypotheses would look like in a one-way ANOVA
and what a significant F-statistic would imply.

28. In an ANOVA test, the total sum of squares (SSTotal) is 200, the sum of squares
for the groups (SSG) is 60, and the sum of squares for error (SSE) is 140. Calculate
the F-statistic if there are 4 groups and 25 total observations. Interpret the result
in the context of ANOVA.

29. Discuss the assumptions that must be met for ANOVA to be a valid test.

30. A researcher performs an ANOVA and finds a significant difference between groups.
What test should be conducted next to determine which groups differ from each
other?

31. Perform an F-test to compare the variances of two samples, where sample A has
variance = 5, n = 20, and sample B has variance = 3, n = 25. Test if the variances
are significantly different at the 0.05 significance level.

32. Explain the null and alternative hypotheses for an F-test comparing the variances
of two populations, and describe when a one-tailed test would be used.

33. Calculate the critical value for an F-test given alpha = 0.05, degrees of freedom for
the numerator = 10, and degrees of freedom for the denominator = 15.

34. Given two datasets with variances of 12 and 7, explain how to determine if the
variances are significantly different using a two-tailed F-test.

35. Describe a scenario where an F-test would be used to compare two variances and
the implications of rejecting the null hypothesis.

36. Calculate the R-squared value given the total sum of squares (SST) = 200 and the
sum of squares of residuals (SSE) = 50.
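
Several of the calculations above reduce to short formulas. The following is a minimal Python sketch, not a model solution: the function names are hypothetical, only the standard library is used, and the interval assumes the large-sample normal (z) approximation rather than a t-interval.

```python
from math import comb, sqrt
from statistics import NormalDist


def mean_confidence_interval(mean, sd, n, level=0.95):
    """Large-sample z-interval for a mean: mean +/- z * sd / sqrt(n).

    Assumption: the normal approximation is acceptable for this n.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin


def binomial_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p), i.e. a right-tailed p-value."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def r_squared(sst, sse):
    """Coefficient of determination: 1 - SSE / SST."""
    return 1 - sse / sst


print(mean_confidence_interval(75, 8, 50, 0.95))  # inputs from question 1
print(binomial_upper_tail(9, 10, 0.5))            # inputs from question 20
print(r_squared(200, 50))                         # inputs from question 36
```

The same helpers can be reused for the other interval and binomial questions by changing the arguments; a two-tailed binomial test would need both tails summed, which this sketch does not do.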

Topic: One-Sample and Two-Sample Tests
37. Conduct a one-sample t-test to determine if the average salary of a group of
employees is significantly different from $60,000, given a sample mean of $62,000, standard
deviation of $5,000, and n = 25.

38. Explain the difference between a one-sample z-test and a two-sample z-test,
providing an example of each.

39. Perform a two-sample t-test for independent samples, where sample A has a mean
of 85, standard deviation of 10, and n = 30, and sample B has a mean of 80,
standard deviation of 8, and n = 25. Test if there is a significant difference between
the means at alpha = 0.05.

40. Given two independent samples with proportions of 0.4 and 0.35, and sample sizes
of 100 and 120 respectively, perform a two-sample z-test for proportions at the 0.05
significance level.

41. Discuss the conditions under which a one-sample chi-square test would be used and
provide an example scenario.
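
A two-sample z-test for proportions, as in question 40, could be set up along the following lines. This is a sketch under stated assumptions: the function name is hypothetical, the standard error uses the pooled proportion under the null hypothesis of equal proportions, and the test is two-tailed.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(p1, n1, p2, n2):
    """Two-tailed z-test of H0: the two population proportions are equal."""
    # Pooled proportion under H0, weighting each sample by its size.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-tailed p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p_value = two_proportion_z_test(0.40, 100, 0.35, 120)  # inputs from question 40
print(f"z = {z:.3f}, p-value = {p_value:.3f}")
```

The null hypothesis is rejected at the 0.05 level only if the printed p-value falls below 0.05.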

Topic: Linear Regression


42. Explain the meaning of an R-squared value of 0.85 in a linear regression model and
how it relates to the variability of the response variable.

43. Explain how to calculate the slope of a simple linear regression line using the formula
for covariance and variance.

44. A dataset has the following values: X = [1, 2, 3, 4], Y = [2, 4, 6, 8]. Calculate the
regression line equation using pen and paper.

45. Given a regression line Y = 5 − 0.5X, interpret the meaning of the slope and
intercept in the context of the relationship between X and Y.

46. Calculate the residuals for a simple linear regression model with observed values Y
= [3, 5, 7] and predicted values Ŷ = [2.8, 5.2, 7.1].

47. A researcher wants to determine if hours of exercise per week can predict resting
heart rate. Using a sample of 100 individuals, a simple linear regression analysis is
conducted, and the following equation is obtained: Ŷ = 80 − 1.5X, where Y is the
resting heart rate (in beats per minute) and X is the hours of exercise per week.

• Regression Interpretation: Interpret the slope of this regression equation
in terms of its effect on resting heart rate.
• Hypothesis Test: The average resting heart rate in the general population
is known to be 75 bpm, with a standard deviation of 10 bpm. Perform a
two-tailed z-test at a 0.05 significance level to test if the sample’s mean resting
heart rate significantly differs from the population mean.

48. In a study on academic performance, a simple linear regression is used to predict
students’ final exam scores (out of 100) based on the number of study hours. The
regression equation obtained is Ŷ = 50 + 4X, where X is the number of hours
studied.

• Regression Interpretation: Explain the meaning of the slope in terms of
its effect on the final exam score.
• Hypothesis Test: Test if the slope of 4 is significantly different from zero
using a two-tailed t-test at a 0.05 significance level. Assume the standard
error of the slope is 1.5. Formulate and interpret your hypotheses regarding
the relationship between study hours and exam scores.

49. A dietitian is studying the relationship between hours of physical activity and weight
loss over a month. Using a sample of 40 clients, a simple linear regression is
conducted, resulting in the equation Ŷ = 5 + 0.3X, where X represents the hours of
activity, and Y represents the weight loss in kg.

• Regression Interpretation: Interpret the slope of the regression model in
terms of weight loss per hour of activity. What does the slope suggest about
the relationship between physical activity and weight loss?
• Calculate SSG: Based on the slope and the given information, calculate the
SSG assuming that X̄ = 20 hours and the mean weight loss Ȳ = 11 kg across
the 40 clients.
• Hypothesis Test: Given that SSTotal = 500 and SSE = 200, calculate the
F-statistic for the regression model using the SSG you computed. Perform a
two-tailed ANOVA test at a significance level of 0.05. Determine if the model
significantly predicts weight loss.

50. A retail store performs a simple linear regression to predict whether a customer will
make a purchase (Yes = 1, No = 0) based on the amount of time spent browsing
(in minutes). The regression yields a predicted probability for each customer. A
threshold probability of 0.5 is used to classify customers as “Likely to Purchase” or
“Unlikely to Purchase.”

• Regression Interpretation: Explain what it means for a customer to be
classified as “Likely to Purchase” based on the regression prediction.
• Hypothesis Test: Using the classifications from the regression predictions,
construct a contingency table comparing the observed and predicted purchase
categories. Perform a two-tailed chi-square test at a 0.05 significance level
to evaluate if the model’s predictions significantly align with actual purchase
behavior.
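
The slope formula from question 43 (covariance of X and Y divided by the variance of X) and the residual calculation from question 46 can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and it is meant as a way to verify pen-and-paper work, not to replace it.

```python
def fit_line(xs, ys):
    """Least-squares line: slope = cov(X, Y) / var(X), intercept = y_bar - slope * x_bar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # The 1/(n-1) factors in the covariance and variance cancel in the ratio,
    # so the raw sums of cross-products and squared deviations are enough.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def residuals(observed, predicted):
    """Residual = observed value minus predicted value."""
    return [y - y_hat for y, y_hat in zip(observed, predicted)]


slope, intercept = fit_line([1, 2, 3, 4], [2, 4, 6, 8])  # data from question 44
print(slope, intercept)
print(residuals([3, 5, 7], [2.8, 5.2, 7.1]))             # data from question 46
```

Because of floating-point rounding, the printed residuals may differ from the hand-computed values in the last decimal places.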
