
Stats Notes by Warad

Uploaded by Prasanna warad
Copyright © All Rights Reserved
INTERQUARTILE RANGE

The difference between the third and first quartile values, Q₃ − Q₁, is the interquartile range. Because outliers fall into the bottom and top quartiles, they do not affect the interquartile range. For instance, set A: {0, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11} and set B: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11} both have 12 elements and a range of 11, but the data are distributed very differently within that range. The first data point in set A (the zero) is an outlier that skews the value of the range. The interquartile range of set A is 11 − 10 = 1, and that of set B is 8 − 2 = 6. This comparison shows that, except for the outlier, the data in set A are more closely spaced than the data in set B.

A straightforward way to display data dispersion by quartiles is a boxplot, also called a box-and-whisker plot. This visual depiction uses five values: L, the least number in the data; G, the greatest number; M, the median; Q₁, the first quartile; and Q₃, the third quartile. The interquartile range (which includes M) is drawn as a rectangular box, and straight lines (the whiskers) extend from the sides of the box to the least and greatest values, L and G. A number line is drawn below the boxplot to show the numerical values of these points.

Example: Draw a box-and-whisker plot for the set {7, 0, 3, 8, 1, 2, 7, 4, 8, 4, 5, 6, −3, 7, 3, 0}.

The first step is to arrange the data in ascending order: {−3, 0, 0, 1, 2, 3, 3, 4, 4, 5, 6, 7, 7, 7, 8, 8}. Next, identify the values needed for the box-and-whisker plot: L = −3 and G = 8. Since there are 16 elements in the set, M is the average of the 8th and 9th values, so M = 4, and taking the 4th and 12th values gives Q₁ = 1 and Q₃ = 7. (Conventions for quartiles vary; averaging the two middle values of each half, as is done for the median, gives Q₁ = 1.5 instead.)

[Figure: box-and-whisker plot marking L, Q₁, M, Q₃, and G above a number line from −5 to 10]

PERCENTILES

For large groups of numbers, the position of given data points is sometimes stated in percentiles rather than quartiles. The principle is the same as for quartiles, but there are 100 subdivisions instead of 4.
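The quartile values used in these notes can be reproduced with a percentile calculation. The sketch below assumes the nearest-rank rule (the p-th percentile is the k-th smallest value, with k = ⌈pn/100⌉), chosen here because it matches the numbers above: Q₁ = 1 and Q₃ = 7 for the boxplot example, and interquartile ranges of 1 for set A and 6 for set B. Other percentile rules can give slightly different quartiles. All function names are hypothetical.

```python
import math


def nearest_rank_percentile(data, p):
    """Return the p-th percentile of data (0 < p <= 100) by the nearest-rank rule.

    The result is the k-th smallest value, where k = ceil(p * n / 100).
    """
    if not data:
        raise ValueError("data must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(data)
    k = math.ceil(p * len(ordered) / 100)  # 1-based rank of the percentile
    return ordered[k - 1]


def median(data):
    """Middle value, or the average of the two middle values for an even count."""
    ordered = sorted(data)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(data):
    """The five values a box-and-whisker plot needs: L, Q1, M, Q3, G."""
    return {
        "L": min(data),
        "Q1": nearest_rank_percentile(data, 25),
        "M": median(data),
        "Q3": nearest_rank_percentile(data, 75),
        "G": max(data),
    }


def interquartile_range(data):
    """Q3 - Q1, using the nearest-rank quartiles above."""
    return nearest_rank_percentile(data, 75) - nearest_rank_percentile(data, 25)


data = [7, 0, 3, 8, 1, 2, 7, 4, 8, 4, 5, 6, -3, 7, 3, 0]
print(five_number_summary(data))  # {'L': -3, 'Q1': 1, 'M': 4.0, 'Q3': 7, 'G': 8}

set_a = [0, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11]
set_b = list(range(12))
print(interquartile_range(set_a), interquartile_range(set_b))  # 1 6
```

Because Q₁, the median, and Q₃ are just the 25th, 50th, and 75th percentiles, one percentile function covers all of them; the median is computed separately only so that even-sized sets use the average of the two middle values, as in the example.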
Converting quartiles to percentiles is easy: Q₁ is the same value as the 25th percentile, Q₂ (the median) is the same value as the 50th percentile, Q₃ is the same value as the 75th percentile, and Q₄ is the same as the 100th percentile.

STANDARD DEVIATION

Like the range and the interquartile range, standard deviation is a way to measure how spread out the values in a data set are. You probably won't have to calculate the standard deviation on Test Day, but you will need to understand how it behaves, so it's worthwhile to calculate it for a couple of lists of numbers to get a feel for it. Here's how standard deviation is calculated:

• Find the average of the data points.
• Find the difference between the average and each data point.
• Square each of the differences.
• Find the average of the squared differences.
• Take the square root of that average.

Example: Calculate the standard deviation of 1, 3, 8, 11, and 12.

First, find the average:

$$\frac{1 + 3 + 8 + 11 + 12}{5} = \frac{35}{5} = 7$$

Next, determine the differences between each term and 7: (1 − 7) = −6, (3 − 7) = −4, (8 − 7) = 1, (11 − 7) = 4, and (12 − 7) = 5. Square each difference and find the average of the squared differences:

$$\frac{(-6)^2 + (-4)^2 + 1^2 + 4^2 + 5^2}{5} = \frac{36 + 16 + 1 + 16 + 25}{5} = \frac{94}{5} = 18.8$$

The standard deviation is the square root of that average:

$$\sqrt{18.8} \approx 4.34$$

Note that the farther the data points are from the mean, the greater the standard deviation will be. Also note that two sets whose data points lie at the same distances from their respective means will have the same standard deviation. For example, the sets {2, 4, 6} and {8, 10, 12} have the same standard deviation.

FREQUENCY DISTRIBUTIONS

A frequency distribution is a description of how often certain data values occur in a set and is typically shown in a table or histogram. As an example, the table below displays the frequency distribution of singing voices in a choir in two ways.
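Returning to the standard deviation steps above, they can be sketched in Python as follows. The function name population_std_dev is hypothetical. It divides by the number of data points, exactly as the steps do, which makes it the population standard deviation.

```python
import math


def population_std_dev(values):
    """Standard deviation following the five steps in these notes (divides by n)."""
    if not values:
        raise ValueError("values must be non-empty")
    mean = sum(values) / len(values)            # step 1: average of the data points
    diffs = [x - mean for x in values]          # step 2: difference from the average
    squares = [d * d for d in diffs]            # step 3: square each difference
    mean_square = sum(squares) / len(squares)   # step 4: average of the squares
    return math.sqrt(mean_square)               # step 5: square root of that average


sd = population_std_dev([1, 3, 8, 11, 12])
print(round(sd, 2))  # 4.34

# Same distances from their respective means, so the same standard deviation.
print(population_std_dev([2, 4, 6]) == population_std_dev([8, 10, 12]))  # True
```

Python's standard library also offers statistics.pstdev for this same quantity; statistics.stdev divides by n − 1 instead (the sample standard deviation), so it returns a larger value for the same list.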
The Count column gives the number of singers for each vocal range; the Percent column shows each voice's share of the total choir. Counts can be converted to percentages by adding all the counts to get the total and then dividing each category's count by that total. For instance, this choir has 75 singers in total. If 15 of them are tenors, then tenors make up 15/75 = 0.20 = 20% of the singers.

Voice    | Count | Percent
Soprano  | 12    | 16%
Alto     | 18    | 24%
Tenor    | 15    | 20%
Baritone | 12    | 16%
Bass     | 18    | 24%

In a relative frequency distribution, also known as a probability distribution, the frequency with which given values occur is given in decimal form rather than as percentages. A variable whose value is chosen at random from a known distribution of data is called a random variable, written X. The table below is an example of a probability distribution of such a variable: 5% of the values in the distribution are 0, 10% are 1, 20% are 2, and so on. Stated differently, the probability that a randomly selected value is 0 is 0.05, the probability that it is 1 is 0.10, the probability that it is 2 is 0.20, and so on.

X | P(X)
0 | 0.05
1 | 0.10
2 | 0.20
3 | 0.30
4 | 0.25
5 | 0.10

Note that you can calculate the mean by using a weighted average approach (discussed earlier in this chapter): 0.05(0) + 0.10(1) + 0.20(2) + 0.30(3) + 0.25(4) + 0.10(5) = 0 + 0.10 + 0.40 + 0.90 + 1.00 + 0.50 = 2.90.

We mentioned above that frequency distributions can be shown as histograms. If the sample set of an experiment is large enough, as in the example below, the histogram begins to closely resemble a continuous curve.

[Figure: histogram of a large sample, with frequencies from 0 to 60 plotted over values from 1 to 41]

NORMAL DISTRIBUTION

There is a special kind of frequency distribution, called the normal distribution, that is closely tied to the concept of standard deviation.
Many natural data sets, such as the distribution of the heights of adult males in the United States, very closely approximate the normal distribution. This distribution is commonly referred to as a bell curve because of its shape. Only two parameters are needed to define any normal distribution: the mean and the standard deviation. In a normal distribution, the mean equals the median, and the data are symmetrically distributed around the mean, so the curve to the left of the mean is a mirror image of the curve to the right.

[Figure: Normal Distributions, showing three normal curves labeled 1, 2, and 3]

Normal curves 1 and 2 have the same mean, but curve 2 has a greater standard deviation; that is, curve 2 is much more spread out. Curve 3 has a greater mean value than either curve 1 or curve 2 but has a smaller standard deviation, so it is less spread out.

The graph below shows some important probability values that hold true for all normal distributions. The percentage of the area under any portion of a distribution curve equals the probability that a randomly selected value will fall within that portion's range.

[Figure: Areas of the Normal Distribution (SD = standard deviation). On each side of the mean, about 34.1% of the area lies within 1 SD, 13.6% between 1 and 2 SD, 2.1% between 2 and 3 SD, and 0.1% beyond 3 SD.]

Example: The lengths of boards cut at a sawmill are normally distributed with a mean of 96.00 inches and a standard deviation of 0.05 inch. What is the approximate probability that a randomly selected board will be longer than 96.10 inches?

A 96.10-inch board is 0.10 inch longer than the mean. The standard deviation of the board lengths is 0.05 inch, so a 96.10-inch board is 2 standard deviations above the mean. Because these board lengths are normally distributed, 50% are at or below the mean, about 34.1% lie between the mean and 1 standard deviation above it, and about 13.6% lie between 1 and 2 standard deviations above it. The probability that a random board is at most 96.10 inches long is therefore approximately 50% + 34.1% + 13.6% = 97.7%, so the probability that a board is longer is about 100% − 97.7% = 2.3%.
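The area values for the normal distribution and the sawmill answer can be checked with Python's statistics.NormalDist class (Python 3.8 and later), whose cdf method gives the probability that a value falls at or below a given point. The helper band_areas is a hypothetical name used only for this sketch.

```python
from statistics import NormalDist

standard = NormalDist()  # mean 0, standard deviation 1


def band_areas(max_sd=3):
    """Area between k and k + 1 SDs above the mean, for k = 0 .. max_sd - 1."""
    return [standard.cdf(k + 1) - standard.cdf(k) for k in range(max_sd)]


for k, area in enumerate(band_areas()):
    print(f"{k} to {k + 1} SD above the mean: {area:.1%}")
print(f"more than 3 SD above the mean: {1 - standard.cdf(3):.1%}")

# Sawmill example: mean 96.00 in, standard deviation 0.05 in.
boards = NormalDist(mu=96.00, sigma=0.05)
p_longer = 1 - boards.cdf(96.10)  # P(length > 96.10 in)
print(f"P(board longer than 96.10 in) = {p_longer:.1%}")
```

Run as written, this should print about 34.1%, 13.6%, 2.1%, and 0.1% for the bands and about 2.3% for the board, which is the same answer the area-adding method gives.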

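Returning to the frequency distributions earlier in these notes, the two calculations there (converting the choir counts to percentages, and finding the mean of the probability distribution of X as a weighted average) can be sketched as follows. The function names and the dictionary layout are illustrative choices, not something the notes prescribe.

```python
def to_percentages(counts):
    """Convert category counts to percentages of the total count."""
    total = sum(counts.values())
    return {name: 100 * count / total for name, count in counts.items()}


def distribution_mean(dist):
    """Mean of a discrete probability distribution: sum of value * probability."""
    if abs(sum(dist.values()) - 1) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in dist.items())


choir = {"Soprano": 12, "Alto": 18, "Tenor": 15, "Baritone": 12, "Bass": 18}
print(to_percentages(choir))
# {'Soprano': 16.0, 'Alto': 24.0, 'Tenor': 20.0, 'Baritone': 16.0, 'Bass': 24.0}

dist = {0: 0.05, 1: 0.10, 2: 0.20, 3: 0.30, 4: 0.25, 5: 0.10}
print(round(distribution_mean(dist), 2))  # 2.9
```

The weighted average works because each probability is the fraction of the distribution taking that value, so multiplying each value by its probability and adding is the same as averaging over all the values.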