Saravana Somasundaram Home Work 1 Math 4/5/7600
Problem 1 a. Histogram The histogram below provides the following information. The shape of the distribution is skewed to the left. The distribution is unimodal.
[Figure: histogram of frequency (axis 0 to 70) by class interval: 0 - 20.0, 20.1 - 40.0, 40.1 - 60, 60.1 - 100, 100.1 - 120, 120.1 - 140, 140.1 - 160, 160.1 - 200]
b. Mean = 96.2788, Median = 101.61413, Mode = 80.05
c. These measures indicate that the distribution is skewed to the left, since the mean is smaller than the median. The histogram agrees with this conclusion.
d. Here the median is the center of the area and the mean is the center of balance. The mean and median are close to each other. Because there are no extreme outliers, the mean stays fairly close to the center of the data. Although the distribution is skewed to the left, the skew is mild, so the mean is a reasonable representative of the distance travelled by the buses.
Problem 2
[Figure: Treatment Times, normal quantile plot with 95% CI; horizontal axis C1, vertical axis Percent]
a. The 25th percentile for the treatment time is 14.25. This means 25% of the patients are treated within 14.25 minutes and 75% of the patients take longer than 14.25 minutes. b. The 90th percentile is 31.1; that is, 90 percent of the patients are treated within 31.1 minutes. The data support the health clinic's claim that they treat 90% of the patients in less than 40 minutes.
Problem 3 a. Stem-and-leaf plot (using SPSS)
[Figure: Donors Stem-and-Leaf Plot, with columns Frequency and Stem & Leaf]
The box represents the middle 50% of the sample. The lower 25% lies between the lower whisker and the box, while the upper 25% lies between the box and the upper whisker. The line in the box represents the median, and from the figure we can see that it is 320. The box is positioned more or less in the center, and the whiskers are about the same length. This implies there is no significant skewness. There are no extreme outliers in the box plot.
Problem 4 a. Mean = 57.52941, Median = 34, Mode = 0 and 4 b. Stem-and-leaf plot (using Minitab)
[Figure: Minitab stem-and-leaf plot]
Depth: 14 17 17 10 8 7 6 5 4 3 2 2 1 1
Stem: 0 0 0 0 0 1 1 1 1 1 2 2 2 2
From the above stem-and-leaf plot we can see that the distribution is heavily skewed to the right. In this case the mean might be misleading because of the extreme outliers. The median is a better measure of central tendency, since it is less likely to be affected by extreme outliers.
c. Range = 273, Standard deviation $s = 70.1955$
d. Range approximation of $s$: $s \approx \text{Range}/4 = 273/4 = 68.25$. The approximation is fairly close to our computed value, although it underestimates the standard deviation. (In case of error it is better to overestimate than to underestimate the standard deviation.)
e. $\bar{x} \pm s = 57.52941 \pm 70.1955$: 29 demands between failures, or 85.294%, fall in this interval. By the Empirical Rule about 68% should fall in this interval.
$\bar{x} \pm 2s = 57.52941 \pm 140.391$: 32 demands between failures, or 94.118%, fall in this interval. By the Empirical Rule about 95% should fall in this interval.
$\bar{x} \pm 3s = 57.52941 \pm 210.5865$: 33 demands between failures, or 97.059%, fall in this interval. By the Empirical Rule almost all should fall in this interval.
f. The Empirical Rule is less accurate in forecasting the percentage within one standard deviation of the mean. This could be because our distribution is heavily skewed to the right and contains extreme outliers. Also, the range approximation has underestimated the standard deviation. According to the Empirical Rule 95% should fall within $\bar{x} \pm 2s$, which is fairly accurate.
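The arithmetic in parts d and e can be re-checked with a short Python sketch. It uses only the summary statistics and interval counts quoted above. The sample size `n = 34` is an assumption: it is not stated in the text and is inferred from the reported percentages (29 observations corresponding to 85.294%). The names `interval` and `observed_percent` are hypothetical helpers introduced for illustration.

```python
# Summary statistics quoted in Problem 4.
mean = 57.52941
s = 70.1955
data_range = 273
n = 34  # assumption: inferred from 29 observations = 85.294%, not stated in the text

# Range approximation of the standard deviation: s is roughly Range / 4.
s_approx = data_range / 4

# Counts reported in part e for the intervals mean +/- k*s, k = 1, 2, 3.
observed_counts = {1: 29, 2: 32, 3: 33}

# Empirical Rule percentages (valid for mound-shaped distributions).
empirical_rule = {1: 68.0, 2: 95.0, 3: 99.7}


def interval(k):
    """Return the interval (mean - k*s, mean + k*s)."""
    return (mean - k * s, mean + k * s)


def observed_percent(k):
    """Percentage of the n observations reported inside mean +/- k*s."""
    return 100.0 * observed_counts[k] / n


print(f"range approximation of s: {s_approx} (computed s = {s})")
for k in (1, 2, 3):
    lo, hi = interval(k)
    print(f"k={k}: ({lo:.5f}, {hi:.5f})  "
          f"observed {observed_percent(k):.3f}%  "
          f"Empirical Rule {empirical_rule[k]}%")
```

The largest gap between the observed and Empirical Rule percentages is at k = 1, which matches the comment in part f.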
Problem 5 a. Scatter plot
[Figure: scatter plot of price vs. air flow]
The scatter plot of price vs. air flow shows a weak positive relationship between them. b. The correlation coefficient is given by $r = \frac{s_{xy}}{s_x s_y} = 0.41285$
c. Part b shows that there is a weak positive relationship between the price and the air flow. We can come to a similar conclusion from the scatter plot. d. From part a and part b we can see that there is only a weak relationship between the price and the air flow. Except for the fans that are priced at 15 and 20 dollars, all the others produce more or less similar performance. e. Fitted Line (Extra)
[Figure: Fitted Line Plot; axis label Air Flow]
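Problem 6 Given A1, A2 and A3 are different diseases and B1, B2, B3 and B4 are mutually exclusive symptom states. Then Bayes's formula is
$$P(A_i \mid B_j) = \frac{P(B_j \mid A_i)\,P(A_i)}{\sum_{k=1}^{3} P(B_j \mid A_k)\,P(A_k)}$$
Probability of A2 given B1 is
$$P(A_2 \mid B_1) = \frac{P(B_1 \mid A_2)\,P(A_2)}{P(B_1 \mid A_1)\,P(A_1) + P(B_1 \mid A_2)\,P(A_2) + P(B_1 \mid A_3)\,P(A_3)} = \frac{(.17)(.15)}{(.08)(.25) + (.17)(.15) + (.10)(.12)} = 0.4435$$
Probability of A2 given B2 is
$$P(A_2 \mid B_2) = \frac{P(B_2 \mid A_2)\,P(A_2)}{P(B_2 \mid A_1)\,P(A_1) + P(B_2 \mid A_2)\,P(A_2) + P(B_2 \mid A_3)\,P(A_3)}$$
Probability of A2 given B3 is
$$P(A_2 \mid B_3) = \frac{P(B_3 \mid A_2)\,P(A_2)}{P(B_3 \mid A_1)\,P(A_1) + P(B_3 \mid A_2)\,P(A_2) + P(B_3 \mid A_3)\,P(A_3)}$$
Probability of A2 given B4 is
$$P(A_2 \mid B_4) = \frac{P(B_4 \mid A_2)\,P(A_2)}{P(B_4 \mid A_1)\,P(A_1) + P(B_4 \mid A_2)\,P(A_2) + P(B_4 \mid A_3)\,P(A_3)}$$
Problem 7 a. An area 0.025 to the right of $z = 1.96$ (from Table 2 in the appendix) b. An area 0.05 to the left of $z = -1.645$ (from Table 2 in the appendix)
Problem 8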
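Referring back to Problem 6, the Bayes computation for the probability of A2 given B1 can be re-checked with a small Python sketch. The function name `bayes_posterior` is hypothetical. Reading the first factor of each product as P(B1 | A_k) and the second as P(A_k) is an assumption based on the order of the symbols in the formula. The numbers are the ones that appear in the computation.

```python
def bayes_posterior(likelihoods, priors, i):
    """Return P(A_i | B) given P(B | A_k) and P(A_k) for every k.

    Bayes's formula: the joint term for A_i divided by the sum of
    the joint terms over all A_k.
    """
    joint = [like * prior for like, prior in zip(likelihoods, priors)]
    return joint[i] / sum(joint)


# Factors as they appear in the Problem 6 computation for B1.
# Assumption: first factor is P(B1 | A_k), second factor is P(A_k).
likelihoods_b1 = [0.08, 0.17, 0.10]
priors = [0.25, 0.15, 0.12]

# Index 1 selects A2 (lists are zero-based).
p_a2_given_b1 = bayes_posterior(likelihoods_b1, priors, 1)
print(round(p_a2_given_b1, 4))  # 0.4435
```

The same function would apply to B2, B3 and B4 once the corresponding conditional probabilities are supplied.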
Problem 9 a. Quantitative Assessment: Mean = 30.74, Median = 18.25, Trimmed Mean = 19.96. The mean is much higher than the median, while the trimmed mean is close to the median. This implies that the distribution is heavily skewed to the right. b. Box Plot and Stem and Leaf: The box plot shows the presence of extreme outliers. Observing the stem and leaf plot and the graph we can conclude that the distribution is heavily skewed to the right.
[Figure: box plot of Body Mass (C1); axis from 100 to 500, with a point labeled 448.0]
Problem 10 Ten people are in the room. What is the probability that at least two of them share a birthday? Ignoring leap years, let us assume there are 365 days in a year. The probability that all 10 people have different birthdays is
$$Q_{10} = \left(1 - \frac{1}{365}\right)\left(1 - \frac{2}{365}\right)\cdots\left(1 - \frac{9}{365}\right) \approx 0.88305$$
Hence the probability that no two people in the room share a birthday is 0.88305. The probability that at least two people share a birthday is
$$P_{10} = 1 - Q_{10} = 1 - 0.88305 = 0.11695$$
Hence the probability of at least two people having the same birthday in a group of 10 is 11.695%. The following are some assumptions in solving the above problem.
1. Leap years are not included.
2. Specific days such as New Year's Day, weekends or specific seasons are not treated differently.
3. All 365 days are considered equally likely.
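The birthday calculation can be reproduced with a minimal Python sketch under the same assumptions (365 equally likely days, no leap years). The function names `prob_all_distinct` and `prob_shared_birthday` are hypothetical and introduced only for illustration.

```python
def prob_all_distinct(n, days=365):
    """Probability that n people all have different birthdays."""
    q = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        q *= (days - k) / days
    return q


def prob_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday."""
    return 1.0 - prob_all_distinct(n, days)


q10 = prob_all_distinct(10)
p10 = prob_shared_birthday(10)
print(f"all different: {q10:.5f}")
print(f"at least one shared birthday: {p10:.5f}")
```

For a group of 10 this gives about 0.883 for all different birthdays and about 0.117 for at least one shared birthday, in line with the result above.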