Sb-Midterm I
Statistics for Business (Trường Đại học Kinh tế Thành phố Hồ Chí Minh)
Student 4:
1. The closing price of a stock is an example of what type of data: nominal, ordinal,
interval, or ratio data?
Ans: Ratio data, because prices can be ranked, differences between them are meaningful, and there is a true zero (a price of 0 means no value).
2. From the following tree, find the probability that a randomly chosen person will not
get a vaccination and will not get the flu
Student 5:
1. What is the level of measurement for categorical data?
Ans: nominal and ordinal
2. People's weights are normally distributed with a mean of 60 kg and a standard deviation of
5 kg. What is the probability that a randomly picked person weighs more than 65 kg?
Ans: EXCEL => 1-NORM.DIST(65,60,5,TRUE) = 0.15866
Or: standardize, z = (65-60)/5 = 1, so P(Z > 1) = 1 - 0.8413 = 0.1587
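The same upper-tail probability can be checked outside Excel. This is a minimal sketch that assumes Python 3.8+ and uses `statistics.NormalDist` from the standard library; it computes the answer both directly and through the standardized z value.

```python
from statistics import NormalDist

# Weight X ~ N(mean = 60 kg, sd = 5 kg), as given in the question
weight = NormalDist(mu=60, sigma=5)

# P(X > 65) = 1 - P(X <= 65), the same as Excel 1-NORM.DIST(65,60,5,TRUE)
p_more_than_65 = 1 - weight.cdf(65)

# Standardized route: z = (65 - 60) / 5 = 1, then P(Z > 1) on N(0, 1)
z = (65 - 60) / 5
p_via_z = 1 - NormalDist().cdf(z)

print(round(p_more_than_65, 4), round(p_via_z, 4))
```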
Student 7:
1. Internet survey posted on a website is an example of which sampling method?
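Ans: Convenience sampling (a non-random, self-selected sample: only people who visit the website and choose to respond are included, so it is not simple random sampling)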
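2. On average there are 16 defects per 100 meters of electric wire. What is the
probability of finding exactly 2 defects in a randomly chosen 10-meter length of wire?
Ans: Poisson, because defects occur randomly at a constant average rate along a continuous length (this is not a hypergeometric draw from a finite population)
+ lambda = 16 * (10/100) = 1.6 defects per 10 meters
+ P(X = 2) = e^(-1.6) * 1.6^2 / 2!
EXCEL => POISSON.DIST(2,1.6,FALSE) = 0.2584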
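A minimal sketch of the Poisson calculation, assuming defects follow a Poisson process so the rate can be rescaled from 100 m to 10 m. `poisson_pmf` is a hypothetical helper written here for illustration, not a library function.

```python
import math

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for a Poisson random variable with mean lam (hypothetical helper)."""
    return math.exp(-lam) * lam**x / math.factorial(x)

# Rescale the rate: 16 defects per 100 m -> 1.6 defects per 10 m
lam_10m = 16 * 10 / 100
p_exactly_2 = poisson_pmf(2, lam_10m)
print(round(p_exactly_2, 4))  # 0.2584
```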
Student 8:
1. Which model best describes the number of births in a hospital until the first twins are
delivered?
Ans: Geometric distribution (it models the number of trials until the first success, here
the first twin birth)
1. Scores are normally distributed with a mean of 400 and standard deviation of 50. The
top 10 percent of the applicants would have a score of at least what value?
Ans: about 464 (x = mean + z * SD = 400 + 1.2816 * 50 = 464.1)
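"Top 10 percent" means the 90th percentile, so this is an inverse-normal lookup. A minimal sketch using Python's standard-library `statistics.NormalDist.inv_cdf` (Python 3.8+), which plays the role of Excel's NORM.INV:

```python
from statistics import NormalDist

scores = NormalDist(mu=400, sigma=50)

# Top 10% => the score with 90% of applicants below it
cutoff = scores.inv_cdf(0.90)
print(round(cutoff, 1))  # 464.1
```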
2.
Student 9:
1. What is the weakness of the mode?
Ans: It does not take all the values in the data set into consideration: the mode reflects
only the most frequently occurring value and ignores all the other values.
2. Can we find the probability of events A or B occurring by summing their probabilities?
Ans: Only if A and B are mutually exclusive; then P(A or B) = P(A) + P(B). If they are not
mutually exclusive, we must subtract the intersection:
P(A or B) = P(A) + P(B) - P(A and B)
Student 10:
1. Using a sample to make generalizations about an aspect of a population is called
what statistics?
Ans: Inferential statistics (it uses measurements from a sample to draw conclusions and make
generalizations about the larger population). Its two main types are estimation and
hypothesis testing.
2. Scores are normally distributed with a mean of 460 and standard deviation of 80.
What fraction of the applicants would you expect to have a score of 400 or above?
Ans: z = (400-460)/80 = -0.75, so P(X >= 400) = P(Z >= -0.75) = 0.7734
Student 11:
1. Your telephone area code is an example of a(n) ___ variable.
Ans: Nominal (the code is only a label: area codes have no natural order and arithmetic on them is meaningless)
2. The figure shows a standard normal N(0,1) distribution. Find the z value for the
shaded area
Ans: -1.75
Student 12:
1. The daily closing price of Sacombank stock over the past month is an example of
which data?
Ans: Time series data (observations of the same variable recorded at equal time intervals,
here daily, over the past month)
2. When you send out a resume, the probability of being called for an interview is .20.
What is the probability that you get your first interview within the first five resumes
that you send out?
Ans: Geometric distribution. "Within the first five" means P(X <= 5), not P(X = 5):
P(X <= 5) = 1 - (1-0.2)^5 = 1 - 0.32768 = 0.67232
(P(X = 5) = 0.2 * (1-0.2)^(5-1) = 0.08192 is only the probability that the first call comes exactly on the fifth resume.)
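A minimal sketch of the geometric model, assuming each resume independently leads to a call with probability 0.20. It contrasts "exactly on the fifth resume" with "within the first five", and checks the closed form 1 - (1-p)^5 against a direct sum of the first five probabilities.

```python
p = 0.20  # probability that a single resume leads to an interview call

# P(X = 5): no call on the first four resumes, then a call on the fifth
p_exactly_5 = p * (1 - p) ** 4

# P(X <= 5): the first call comes somewhere within the first five resumes
p_within_5 = 1 - (1 - p) ** 5

# Same quantity as a direct sum of P(X = k) for k = 1..5
p_within_5_sum = sum(p * (1 - p) ** (k - 1) for k in range(1, 6))

print(round(p_exactly_5, 5), round(p_within_5, 5))  # 0.08192 0.67232
```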
Student 14:
1. Could we eliminate sampling error by increasing the sample size?
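Ans: No. Increasing the sample size reduces sampling error (the sample statistic tends to get
closer to the population parameter), but it cannot eliminate it unless we survey the whole
population (a census). Note that a larger sample does not remove bias caused by a poor sampling method.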
2. The median is halfway between Q1 and Q3 on a box plot?
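Ans: Not necessarily. The median (Q2) always lies between Q1 and Q3, but it is exactly halfway only when the middle half of the data is symmetric; in a skewed distribution it is closer to one of the quartiles.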
Student 15:
1. What is the advantage of stratified sampling?
Ans: It divides the population into groups (strata) whose members share a common
characteristic and then draws a separate random sample from each stratum. This guarantees
that every subgroup is represented in the sample and gives better control over each
subgroup.
2. The lengths of fish caught in a certain river are normally distributed with a mean of
40 cm and a standard deviation of 5 cm. What proportion of fish caught will be between
30 and 50 cm in length?
Ans: EXCEL => =NORM.DIST(50,40,5,TRUE)-NORM.DIST(30,40,5,TRUE) = 0.9545
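The "between two values" probability is a difference of two cumulative probabilities, exactly as in the Excel formula. A minimal sketch with Python's standard-library `statistics.NormalDist` (Python 3.8+):

```python
from statistics import NormalDist

length = NormalDist(mu=40, sigma=5)

# P(30 < X < 50) = F(50) - F(30); here that is +/- 2 standard deviations
p_between = length.cdf(50) - length.cdf(30)
print(round(p_between, 4))  # 0.9545
```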
Student 17:
1. A manager chose two people from her team of eight to give an oral presentation
because she felt they were representative of the whole team’s views. What sampling
technique?
Ans: Judgment sampling, because the manager used her own judgment to pick people she
believed were typical of the whole team's views.
2. Scores are normally distributed with a mean of 460 and standard deviation of 80.
What fraction of the applicants would you expect to have a score of 400 or above?
Ans: Z = (400-460)/80 = -0.75; looking this up in the standard normal table gives P = 0.7734
Student 18:
1. What sampling technique is used when groups are defined by their geographical
location?
Ans: Cluster sampling (the population is divided into groups, or clusters, by a common
characteristic, here geographic location, and whole clusters are chosen at random)
2. If arrivals occur at a mean rate of 3.6 events per hour, what is the probability of
waiting more than 30 minutes for the next arrival?
Ans: Exponential: P(X > x) = e^(-lambda*x) = e^(-3.6*0.5) = e^(-1.8) = 0.1653 (30 minutes = 0.5 hour, so the units match the rate)
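A minimal sketch of the exponential waiting-time calculation. The key step is converting 30 minutes into hours so that it matches the rate of 3.6 arrivals per hour.

```python
import math

rate_per_hour = 3.6   # mean arrival rate (lambda)
wait_hours = 30 / 60  # 30 minutes expressed in hours

# Exponential waiting time: P(X > x) = exp(-lambda * x)
p_wait_longer = math.exp(-rate_per_hour * wait_hours)
print(round(p_wait_longer, 4))  # 0.1653
```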
Student 20:
1. Compared to dot plot, we lose some detail when we present data in a frequency
Student 21:
1. Log scales are common because most people are familiar with them. Is that correct?
Ans: Incorrect. Most people are not familiar with log scales, so they are used relatively
rarely; they mainly appear in financial time series or for data that grow rapidly (e.g.,
revenues of a start-up company).
2. Scores are normally distributed with a mean of 460 and a standard deviation of 80.
The top 5 percent of the applicants would have a score of at least what value?
Ans: x = 460 + 1.645 * 80 = 591.6
Student 22:
1. Could we use a column chart instead of a line chart for time series data?
Ans: Yes, but a line chart is usually preferred: with a large number of time periods the
columns become crowded and the pattern is hard to read.
2. Given the contingency table shown here, find P(A or M)
Student 29:
1. In a histogram, what does the height of a bar represent?
Ans: The frequency (or relative frequency) of observations in that class interval, which is
what the vertical axis shows.
2. In the standard normal distribution, the probability between z = 1.00 and z = 1.15 is
higher or lower than the probability between z = 2.00 and z = 2.15?
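Ans: The probability between z = 1.00 and z = 1.15 is higher than the probability
between z = 2.00 and z = 2.15, because the curve is taller near the center. Look at the graph, or look up both values in the z table and subtract: 0.8749 - 0.8413 = 0.0336 versus 0.9842 - 0.9772 = 0.0070.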
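Both intervals have the same width (0.15), so the comparison comes down to how tall the curve is over each one. A minimal sketch with Python's standard-library `statistics.NormalDist` (Python 3.8+) that replaces the table lookups:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal N(0, 1)

p_1 = z.cdf(1.15) - z.cdf(1.00)  # area between z = 1.00 and z = 1.15
p_2 = z.cdf(2.15) - z.cdf(2.00)  # area between z = 2.00 and z = 2.15

print(round(p_1, 4), round(p_2, 4), p_1 > p_2)
```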
Student 30:
1. A population consists of 200 observations. When the data are represented in a relative
frequency distribution, the relative frequency of a given interval is 0.15. What is the
frequency in this interval?
Ans: frequency = relative frequency * total = 0.15 * 200 = 30
Student 32:
1. One benefit of the box plot is that it clearly displays the standard deviation. Is that
correct?
Ans: Incorrect. A box plot displays the five-number summary (min, Q1, median, Q3, max);
it does not display the standard deviation.
2. Nam scored 85 in an exam (Q1 = 40 and Q3 = 60). Based on the fences, is this an
outlier? (85 lies above Q3, so check the upper fence.)
Ans: Interquartile range = Q3 - Q1 = 20
Upper inner fence = Q3 + 1.5*20 = 90
Since 85 < 90, the score is not an outlier.
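A minimal sketch of the fence check, assuming inner fences at Q1 - 1.5*IQR and Q3 + 1.5*IQR and outer fences at 3*IQR. `fence_check` and its labels ("outlier", "extreme outlier") are hypothetical names chosen for illustration; textbooks differ on the exact wording.

```python
def fence_check(value: float, q1: float, q3: float) -> str:
    """Classify a value using inner (1.5*IQR) and outer (3*IQR) fences."""
    iqr = q3 - q1
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_low, outer_high = q1 - 3 * iqr, q3 + 3 * iqr
    if value < outer_low or value > outer_high:
        return "extreme outlier"
    if value < inner_low or value > inner_high:
        return "outlier"
    return "not an outlier"

# Q1 = 40, Q3 = 60 -> IQR = 20, inner fences 10 and 90, outer fences -20 and 120
print(fence_check(85, q1=40, q3=60))  # not an outlier
```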
Student 33:
1. What shape of a distribution allows us to apply the Empirical Rule?
Ans: The Empirical Rule applies to a normal (bell-shaped) distribution.
(By contrast, Chebyshev's theorem applies to any distribution, whatever its shape.)
2. If P(A) = 0.50, P(B) = 0.30, and P(A and B) = 0.15, are A and B independent events?
Ans: Yes.
Independent events: P(A | B) = P(A)
P(A | B) = P(A and B)/P(B) = 0.15/0.30 = 0.50 = P(A)
(Equivalently, P(A) * P(B) = 0.50 * 0.30 = 0.15 = P(A and B).)
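A minimal sketch that applies both independence tests to the given probabilities. `math.isclose` is used instead of `==` because decimal values such as 0.15 are not stored exactly in floating point.

```python
import math

p_a, p_b, p_a_and_b = 0.50, 0.30, 0.15

# Test 1: the conditional probability equals the unconditional probability
p_a_given_b = p_a_and_b / p_b
# Test 2: the multiplication rule for independent events
product = p_a * p_b

independent = math.isclose(p_a_given_b, p_a) and math.isclose(product, p_a_and_b)
print(independent)  # True
```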
Student 34:
1. Given the data set 2, 5, 10, 6, 3, what is the median value?
Ans: 2, 3, 5, 6, 10 => 5
2. If A and B are mutually exclusive events, then P(A and B) = P(A) + P(B). Is that
correct?
Ans: No. For mutually exclusive events P(A and B) = 0, because the two events cannot occur simultaneously; the sum P(A) + P(B) gives P(A or B).
Student 35:
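1. If samples are from a normal distribution with mean = 100 and standard deviation = 10,
what percentage of the data lies within 90 to 110?
Ans: 90 to 110 is within 1 standard deviation of the mean, so P(-1 < Z < 1) = 0.6827 (about 68%, as the Empirical Rule says)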
2. If events A and B are mutually exclusive, then P(A) + P(B) = 0. Is that correct?
Ans: No. For mutually exclusive events it is P(A and B) that equals 0, because the two events cannot occur simultaneously; P(A) + P(B) equals P(A or B), not 0.
Student 36:
1. The Empirical Rule can be applied to any distribution, unlike Chebyshev’s theorem.
Is that correct?
Ans: Incorrect. It is the other way around: Chebyshev's theorem can be applied to any
distribution, whereas the Empirical Rule requires a bell-shaped (normal) distribution.
2.
Student 37:
1. If the standard deviations of two samples are the same, so are their coefficients of
variation. Is that correct?
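Ans: Incorrect. CV = (S/Mean) * 100%, so two samples with the same standard deviation but different means have different coefficients of variation.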
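A minimal sketch showing why the statement is false: two hypothetical samples (made up for illustration) with the same spread but different means. It uses `statistics.stdev` (sample standard deviation) and `statistics.mean` from Python's standard library.

```python
from statistics import mean, stdev

def coefficient_of_variation(data):
    """CV = (sample standard deviation / mean) * 100%."""
    return stdev(data) / mean(data) * 100

# Hypothetical samples: same standard deviation (2), different means (10 vs 100)
sample_1 = [8, 10, 12]
sample_2 = [98, 100, 102]

print(stdev(sample_1), stdev(sample_2))
print(coefficient_of_variation(sample_1), coefficient_of_variation(sample_2))
```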
2. On average, a major earthquake (Richter scale 6.0 or above) occurs three times a
decade in a certain California county. Find the probability that at least one major
earthquake will occur within the next decade.
Ans: Poisson (MegaStat), with lambda = 3 per decade:
P(X >= 1) = 1 - P(X = 0) = 1 - e^(-3) = 0.9502
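"At least one" is easiest through the complement. A minimal sketch, assuming earthquakes follow a Poisson process with a mean of 3 per decade:

```python
import math

lam = 3  # mean number of major earthquakes per decade

# P(X >= 1) = 1 - P(X = 0), and for a Poisson variable P(X = 0) = exp(-lam)
p_at_least_one = 1 - math.exp(-lam)
print(round(p_at_least_one, 4))  # 0.9502
```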
Approximation
Binomial: n independent trials (sampling with replacement) with a constant probability of success pi; gives the probability of x successes
Normal approximation to Binomial: mean = n*pi, SD = sqrt(n*pi*(1-pi)), used when n*pi >= 10 and n*(1-pi) >= 10
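A minimal sketch of the normal approximation rule. `normal_approx_params` is a hypothetical helper, and n = 100, pi = 0.2 are made-up example values; the function returns the approximating mean and SD together with the n*pi >= 10 and n*(1-pi) >= 10 check.

```python
import math

def normal_approx_params(n: int, pi: float):
    """Return (mean, sd, ok) for approximating Binomial(n, pi) by a normal curve.

    ok is True when the rule of thumb n*pi >= 10 and n*(1 - pi) >= 10 holds.
    """
    mu = n * pi
    sigma = math.sqrt(n * pi * (1 - pi))
    ok = n * pi >= 10 and n * (1 - pi) >= 10
    return mu, sigma, ok

# Hypothetical example: 100 trials with success probability 0.2
mu, sigma, ok = normal_approx_params(100, 0.2)
print(mu, sigma, ok)
```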