Additional Mathematics Project Work 2013 (Form 5) : Statistics
Name: Gan Ming Jiang
Class: 5F
I/C No: 961114-12-6983
Teacher: Madam Teresa Tian Li Ken
Introduction
We students taking Additional Mathematics are required to carry out a project work while we are in Form 5. We are to complete only ONE task, based on statistics. The project can be done in groups or individually, and I gladly chose to do it individually. Upon completing the Additional Mathematics Project Work, we are expected to gain valuable experience and be able to:
- Apply and adapt a variety of problem-solving strategies to solve routine and non-routine problems.
- Experience classroom environments which are challenging, interesting and meaningful, and hence improve our thinking skills.
- Experience classroom environments where knowledge and skills are applied in meaningful ways to solve real-life problems.
- Experience classroom environments where expressing one's mathematical thinking, reasoning and communication is highly encouraged and expected.
- Experience classroom environments that stimulate and enhance effective learning.
- Acquire effective mathematical communication, both oral and written, and use the language of mathematics to express mathematical ideas correctly and precisely.
- Enhance the acquisition of mathematical knowledge and skills through problem solving in ways that increase interest and confidence.
- Prepare ourselves for the demands of our future undertakings and the workplace.
- Realize that mathematics is an important and powerful tool in solving real-life problems, and hence develop a positive attitude towards mathematics.
- Train ourselves not only to be independent learners but also to collaborate, to cooperate and to share knowledge in an engaging and healthy environment.
- Use technology, especially ICT, appropriately and effectively.
- Train ourselves to appreciate the intrinsic values of mathematics and to become more creative and innovative.
- Realize the importance and the beauty of mathematics.
We are expected to submit the project work within three weeks from the first day the task is administered to us.
By the 18th century, the term "statistics" designated the systematic collection of demographic and economic data by states. In the early 19th century, the meaning of "statistics" broadened to include the discipline concerned with the collection, summary and analysis of data. Today, statistics is widely employed in government, business and all the sciences. Electronic computers have expedited statistical computation and have allowed statisticians to develop "computer-intensive" methods.

The term "mathematical statistics" designates the mathematical theories of probability and statistical inference, which are used in statistical practice. The relation between statistics and probability theory, however, developed rather late. In the 19th century, statistics increasingly used probability theory, whose initial results were found in the 17th and 18th centuries, particularly in the analysis of games of chance (gambling). By 1800, astronomy used probability models and statistical theories, particularly the method of least squares, which was invented by Legendre and Gauss. Early probability theory and statistics were systematized and extended by Laplace; following Laplace, probability and statistics have been in continual development. In the 19th century, social scientists used statistical reasoning and probability models to advance the new sciences of experimental psychology and sociology, and physical scientists used them to advance the new sciences of thermodynamics and statistical mechanics. The development of statistical reasoning was closely associated with the development of inductive logic and the scientific method.

Statistics is not a field of mathematics but an autonomous mathematical science, like computer science or operations research. Unlike mathematics, statistics had its origins in public administration and maintains a special concern with demography and economics. Being concerned with the scientific method and inductive logic, statistical theory has a close association with the philosophy of science; with its emphasis on learning from data and making the best predictions, statistics overlaps greatly with decision science and microeconomics. With its concern with data, statistics also overlaps with information science and computer science.
Statistics today
During the 20th century, the creation of precise instruments for agricultural research, public health concerns (epidemiology, biostatistics, etc.), industrial quality control, and economic and social purposes (unemployment rate, econometrics, etc.) necessitated substantial advances in statistical practice. Today the use of statistics has broadened far beyond its origins. Individuals and organizations use statistics to understand data and make informed decisions throughout the natural and social sciences, medicine, business and other areas. Statistics is generally regarded not as a subfield of mathematics but rather as a distinct, allied field. Many universities maintain separate mathematics and statistics departments. Statistics is also taught in departments as diverse as psychology, education and public health.
PART 1
1. List the importance of data analysis in daily life.
The importance of data analysis in daily life:
- Structuring the findings from survey research or other means of data collection
- Breaking a macro picture into a micro one
- Acquiring meaningful insights from the data set
- Basing critical decisions on the findings
- Ruling out human bias through proper statistical treatment
2. (a) Specify three types of measure of central tendency and at least two types of measure of dispersion.
The Types of Measure of Central Tendency and of Measure of Dispersion
Central tendency describes the typical score on a variable, while dispersion describes how much variety there is in the scores. When describing the scores on a single variable, it is customary to report both the central tendency and the dispersion. Not every measure of central tendency or of dispersion can be used to describe the values of cases on every variable; which measures you can choose depends on the variable's level of measurement.
Mean
The mean is what in everyday conversation is called the average. It is calculated by adding the values of all the valid cases together and dividing by the number of valid cases.
The mean is an interval/ratio measure of central tendency. Its calculation requires that the attributes of the variable represent a numeric scale.
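For reference, the mean of ungrouped data and of grouped data (where each class is represented by its midpoint x and its frequency f) can be written as below. The symbols are the usual textbook notation and are assumed here, since the text states the rule only in words:

```latex
\bar{x} = \frac{\sum x}{N} \quad \text{(ungrouped data, } N \text{ values)}
\qquad\qquad
\bar{x} = \frac{\sum f x}{\sum f} \quad \text{(grouped data)}
```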
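Mode
The mode is the attribute of a variable that occurs most often in the data set.
For grouped data, the mode can be estimated from a histogram. Draw the bar of the modal class (the class with the highest frequency) and the bars of the two classes adjacent to it. Join the top right corner of the modal-class bar to the top right corner of the bar before it, and join the top left corner of the modal-class bar to the top left corner of the bar after it. The two lines cross, and the value on the horizontal axis directly below the point of intersection is the mode.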
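The crossing-lines construction is equivalent to the interpolation formula Mode = L + d1 / (d1 + d2) × c, where L is the lower boundary of the modal class, c is the class size, and d1 and d2 are how far the modal frequency exceeds the frequencies of the classes before and after it. That formula is the standard algebraic form of the construction, not something stated in the original. A minimal Python sketch, with a hypothetical helper `grouped_mode`, applied to the Physics frequency table from Part 2:

```python
def grouped_mode(lower_boundaries, frequencies, width):
    """Estimate the mode of grouped data as L + d1 / (d1 + d2) * c.

    This is where the two crossing lines of the histogram construction meet.
    """
    # Modal class = the class with the highest frequency (the first one if tied).
    i = max(range(len(frequencies)), key=lambda k: frequencies[k])
    f_modal = frequencies[i]
    # Frequencies of the neighbouring classes; a missing neighbour counts as 0.
    f_before = frequencies[i - 1] if i > 0 else 0
    f_after = frequencies[i + 1] if i + 1 < len(frequencies) else 0
    d1 = f_modal - f_before   # step up into the modal bar
    d2 = f_modal - f_after    # step down out of the modal bar
    return lower_boundaries[i] + d1 / (d1 + d2) * width


# Classes 0-9, 10-19, ..., 90-99 from the Part 2 frequency table.
lower = [-0.5 + 10 * k for k in range(10)]   # lower class boundaries: -0.5, 9.5, ..., 89.5
freq = [0, 2, 1, 5, 7, 11, 10, 7, 4, 0]
print(grouped_mode(lower, freq, 10))          # 57.5
```

The sketch assumes the modal class is strictly higher than at least one neighbour; if d1 + d2 were 0 the formula (and the drawing) would give no intersection.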
Median
The median is a measure of central tendency. It identifies the value of the middle case when the cases have been placed in order, or in line, from low to high. The middle of the line is as far from being extreme as you can get.
There are as many cases in line in front of the middle case as behind the middle case. The median is the attribute used by that middle case. When you know the value of the median, you know that at least half the cases had that value or a higher value, while at least half the cases had that value or a lower value.
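The "middle of the line" rule translates directly into code. In this sketch the function name `median` is hypothetical, and averaging the two middle cases when the number of cases is even is the usual convention, which the text above does not state. The sample values are a few marks taken from the Part 2 mark sheet:

```python
def median(values):
    """Return the median of ungrouped data."""
    ordered = sorted(values)          # place the cases in line from low to high
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd n: the single middle case
    # Even n: there is no single middle case, so average the two middle cases.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([77, 61, 56, 49, 57]))   # 57
print(median([77, 61, 56, 49]))       # 58.5
```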
Range
The distance between the minimum and the maximum is called the range. The larger the value of the range, the more dispersed the cases are on the variable; the smaller the value of the range, the less dispersed (the more concentrated) the cases are on the variable.
Interquartile Range
The interquartile range (IQR) is the distance between the 75th percentile and the 25th percentile. The IQR is essentially the range of the middle 50% of the data. Because it uses only the middle 50%, the IQR is largely unaffected by outliers or extreme values.
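The contrast between the range and the IQR can be seen by replacing the highest value with an extreme one. This sketch uses Python's `statistics.quantiles` (Python 3.8+) with `method="inclusive"`; that is only one of several quartile conventions, so other methods (including textbook position rules) can give slightly different Q1 and Q3. The helper name `spread` and the sample marks are illustrative:

```python
import statistics


def spread(values):
    """Return (range, interquartile range) for ungrouped data."""
    data_range = max(values) - min(values)
    # Quartiles by the 'inclusive' convention; other conventions differ slightly.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return data_range, q3 - q1


marks = [18, 29, 31, 36, 43, 50, 58, 67, 85]
print(spread(marks))           # (67, 27.0)

# Replace the top mark with an extreme value: the range jumps, the IQR does not move.
with_outlier = marks[:-1] + [850]
print(spread(with_outlier))    # (832, 27.0)
```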
Standard Deviation
The standard deviation tells you the approximate average distance of cases from the mean. This is easier to comprehend than the variance, which is the average squared distance of cases from the mean. The standard deviation is directly related to the variance: it is the square root of the variance. So if you know the value of the variance, you can easily find the standard deviation, and if you know the standard deviation, you can easily calculate the variance.
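In symbols, with x̄ the mean and N the number of cases, the variance and standard deviation can be written as below. This is the standard population form (dividing by N), assumed here because the text gives no formula. The grouped form uses class midpoints x and frequencies f, which is why the Part 2 tables list f·x² values:

```latex
\sigma^2 = \frac{\sum (x - \bar{x})^2}{N} = \frac{\sum x^2}{N} - \bar{x}^2,
\qquad \sigma = \sqrt{\sigma^2}
\qquad\qquad
\sigma = \sqrt{\frac{\sum f x^2}{\sum f} - \bar{x}^2} \quad \text{(grouped data)}
```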
(b) For each type of measure of central tendency stated in (a), give examples of their uses in daily life.
Uses of mean
The mean can be used to find the average mark obtained by a class. This average shows how many students are above average, how many are around average and how many are below average, so the teacher can help the average and below-average students to score better grades in future. In a factory, the mean wage helps the authorities to know whether the workers' welfare is maintained, and it also helps to compare the salaries of employees in different companies. In sales, the average sales in a district help the sales manager to plan how to increase sales in the future. The government uses the average income and expenses of its citizens to know whether the citizens' rights are maintained. A family finds the average of its expenses to balance its finances. The average production of agricultural commodities and industrial goods, and the average exports and imports, help a country to see its development.
Uses of median
The median is the middle value, so it lets us see both sides of the data: it divides the information into two equal parts, one with values less than the median and the other with values more than the median. The median is found after arranging the information in ascending or descending order. It is used to identify the students who score less or more than the middle mark, to describe the distribution of wages, and to find the middle value of the heights of players, of the points scored by players in a series of matches, of the ages of the students in a class, and so on. The median is also used in determining the poverty line.
Uses of mode
The mode is the most frequently occurring value, so it is used to find the most common value in situations such as the arrival of public transport, the games won by a team of players, and the needs of an infant. The mode is also seen in wages, in the number of telephone calls received per minute by the telephone department, in the number of visitors, in the number of patients visiting hospitals, in the most popular mode of travel, and so on.
PART 2
1. Get your class marks of any subject in one examination/test. Attach the mark sheet.
Mark sheet of 5F's Physics examination
No.  Name                                    Mark
1    Adriel Li Chi Tak                       77
2    Alan Adam Kwan                          61
3    Bryan Ladislaus                         56
4    Cheng Kang Kian                         49
5    Chin Pin Quan                           57
6    Chung Ci Ming                           51
7    Edwin Lau Kok Vui                       60
8    Gan Ming Jiang                          85
9    Lemuel Chin Jian Lip                    73
10   Liu Yik Hsiang                          83
11   Mervyn Fung Wai Ben                     67
12   Mohamad Ikhwan bin Mohamed Razak        75
13   Mohd Shafiq bin Abdullah Salleh         36
14   Muhammad Nur Reldwan Asmi               18
15   Muhammad Saiful Nizam Shudarsan         18
16   Nelson Chong Ngee Bow                   31
17   Neville Quinn James                     50
18   Seck Min Kuan                           62
19   Shim Cheng Hau                          84
20   Tleray Lister                           29
21   Wong Sai Yan                            76
22   Wong Sheng Yeng                         58
23   Yau Lye Man                             67
24   Adeline Goh Miling                      48
25   Amanda Wilson                           46
26   Ang Ping Ping                           67
27   Brenda Lee Su Nyin                      58
28   Chin Shu Shan                           48
29   Chong Kah Yinn                          51
30   Demi Chu Yoong Cheng                    69
31   Fong Xue Li                             72
32   Joan Lim Qiao En                        64
33   Leong Mei Tung                          84
34   Leong Sim Yee                           76
35   Lim Wern Yahn                           58
36   Nancy Chang                             33
37   Nur Afiqah binti Mohd Nizam             36
38   Nur Hijjah Syazalina binti Abdul Rahim  50
39   Nur Syahidah binti Abdul Rashid         49
40   Nur Syahirah binti Abdul Rashid         43
41   Nur Syaqira Hananie binti Zainal        37
42   Pang Yan                                55
43   Rohani Arpa                             50
44   Sally Lai Huey Lin                      71
45   Sharifah Mariani Habib Hamid            46
46   Sheena Liew Wei Shuan                   67
47   Shelle Tan Chew Ling                    63
     Total, Σx                               2664
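2. (a) Mean
$$\bar{x} = \frac{\sum x}{N} = \frac{2664}{47} = 56.68$$
(b) Median
There are 47 marks, so when they are arranged in ascending order the median is the 24th mark.
Median mark = 58
(c) Mode
Mode = 67 (the mark 67 occurs four times, more often than any other mark)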
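These three answers can be checked with Python's standard `statistics` module. A minimal sketch, with the 47 marks typed in from the mark sheet in the order they appear there:

```python
import statistics

# Physics marks of the 47 students, in mark-sheet order.
marks = [77, 61, 56, 49, 57, 51, 60, 85, 73, 83, 67, 75, 36, 18, 18, 31, 50, 62,
         84, 29, 76, 58, 67, 48, 46, 67, 58, 48, 51, 69, 72, 64, 84, 76, 58, 33,
         36, 50, 49, 43, 37, 55, 50, 71, 46, 67, 63]

total = sum(marks)
mean = total / len(marks)
print(len(marks), total)           # 47 2664
print(round(mean, 2))              # 56.68
print(statistics.median(marks))    # 58 (the 24th mark in ascending order)
print(statistics.mode(marks))      # 67 (occurs four times)
```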
3. Construct a frequency distribution table as in Table 1 which contains at least five class intervals of equal size. Choose a suitable class size.
Class interval (marks)  Midpoint, x  Frequency, f  fx²
0-9                     4.5          0             0
10-19                   14.5         2             420.5
20-29                   24.5         1             600.25
30-39                   34.5         5             5 951.25
40-49                   44.5         7             13 861.75
50-59                   54.5         11            32 672.75
60-69                   64.5         10            41 602.5
70-79                   74.5         7             38 851.75
80-89                   84.5         4             28 561
90-99                   94.5         0             0
Total                   495          47            162 521.75
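The frequency column can be reproduced from the raw marks with a short Python sketch. It assumes a class size of 10 starting at 0 (classes 0-9, 10-19, ..., 90-99), which is what the midpoints 4.5, 14.5, ... imply, and it also rebuilds the f·x² values:

```python
from collections import Counter

# The 47 Physics marks from the mark sheet.
marks = [77, 61, 56, 49, 57, 51, 60, 85, 73, 83, 67, 75, 36, 18, 18, 31, 50, 62,
         84, 29, 76, 58, 67, 48, 46, 67, 58, 48, 51, 69, 72, 64, 84, 76, 58, 33,
         36, 50, 49, 43, 37, 55, 50, 71, 46, 67, 63]

width = 10                                        # class size (assumed from the midpoints)
# Mark m falls in class m // 10: class 0 is 0-9, class 1 is 10-19, ..., class 9 is 90-99.
counts = Counter(m // width for m in marks)
frequencies = [counts[k] for k in range(10)]      # a Counter returns 0 for empty classes
midpoints = [width * k + 4.5 for k in range(10)]  # 4.5, 14.5, ..., 94.5
fx2 = [f * x * x for f, x in zip(frequencies, midpoints)]

print(frequencies)       # [0, 2, 1, 5, 7, 11, 10, 7, 4, 0]
print(sum(frequencies))  # 47
print(sum(fx2))          # 162521.75
```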
(ii) Mode
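The modal class is 50-59, with lower boundary L = 49.5 and class size c = 10. Here d_1 = 11 - 7 and d_2 = 11 - 10 are the differences between the modal frequency and the frequencies of the classes before and after it.
$$\text{Mode} = L + \left(\frac{d_1}{d_1 + d_2}\right)c = 49.5 + \left(\frac{11 - 7}{(11 - 7) + (11 - 10)}\right)(10) = 57.5$$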
Method 2: Calculator
1. Press the MODE button twice, then press button 1 to choose SD. This puts the calculator in Statistics Mode.
2. Key in the first data value, x, which is 4.5. Press the SHIFT button, followed by the , button.
3. Key in the frequency of the data value 4.5, which is 0.
4. Press the M+ button.
5. Key in the rest of the data by repeating steps 2, 3 and 4.
6. After keying in all the data, press the SHIFT button again.
7. Press button 2.
8. Choose xσn by pressing button 2 to obtain the standard deviation of the data.
9. Press the = button.
10. The answer shown on the calculator is 17.30.
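The calculator result can also be reproduced from the frequency table. This sketch uses the population formula (dividing by Σf), which is what the population standard deviation key chosen in step 8 computes; the helper name `grouped_mean_sd` is hypothetical:

```python
import math

# Frequency table from question 3: class midpoints x and frequencies f.
midpoints = [4.5, 14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
frequencies = [0, 2, 1, 5, 7, 11, 10, 7, 4, 0]


def grouped_mean_sd(xs, fs):
    """Mean and population standard deviation of grouped data."""
    n = sum(fs)
    sum_fx = sum(f * x for x, f in zip(xs, fs))
    sum_fx2 = sum(f * x * x for x, f in zip(xs, fs))
    mean = sum_fx / n
    sd = math.sqrt(sum_fx2 / n - mean ** 2)   # sqrt(sum fx^2 / sum f - mean^2)
    return mean, sd


mean, sd = grouped_mean_sd(midpoints, frequencies)
print(f"{mean:.2f} {sd:.2f}")   # 56.20 17.30
```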
(b) Based on your answers from 3(a) above, state the most appropriate measure of central tendency that reflects the performance of your class. Give your reasons.
Based on my answers from 3(a), the mean is the most suitable measure of central tendency because it reflects the central value around which the data seem to cluster. The mode is not suitable because the data do not seem to cluster about the mode. The median is also less suitable, as it only shows the middle value of the whole data set.
(c) A measure of dispersion is a measurement used to determine how far the values of data are spread out from their average value. Explain the advantages of using standard deviation compared to interquartile range as the better measure of dispersion.
Standard deviation (SD) is the most commonly used measure of dispersion. It is a measure of the spread of data about the mean. SD is the square root of the sum of squared deviations from the mean divided by the number of observations.
When the SD is calculated from a sample in order to estimate the SD of a larger population, n - 1 is used instead of n in the denominator, as this produces a more accurate estimate of the population SD. SD is a very useful measure of dispersion because, if the observations come from a normal distribution, approximately 68% of observations lie within mean ± 1 SD, 95% within mean ± 2 SD and 99.7% within mean ± 3 SD.
Another advantage of SD is that, together with the mean, it can be used to detect skewness. The interquartile range is defined as the difference between the 75th and 25th percentiles (also called the third and first quartiles). Hence the interquartile range describes the middle 50% of observations. If the interquartile range is large, the middle 50% of observations are spaced wide apart.
The main disadvantage of using the interquartile range as a measure of dispersion is that it is not amenable to mathematical manipulation. The standard deviation gives a measure of dispersion of the data about the mean. A direct analogy would be that of the interquartile range, which gives a measure of dispersion about the median. However, the standard deviation is generally more useful than the interquartile range as it includes all data in its calculation. The interquartile range is totally dependent on just two values and ignores all the other observations in the data.
On the other hand, because the standard deviation includes every value, its accuracy is reduced if extreme values are present in the data. Since the marks do not contain any extreme values, the standard deviation gives a better measure than the interquartile range.
4. (a) We can calculate the mean, median, mode and standard deviation using both grouped and ungrouped data. However, the answers from the two calculations differ. For example, the mean of the ungrouped data is 56.68, while the mean of the grouped data is 56.20; the two answers differ by 0.48. Ungrouped data gives a more accurate representation than grouped data. Grouped data has been classified and partly analysed, which means it is no longer raw. Grouped data is also less accurate because each class has to be treated as completely homogeneous: every value in a class is represented by the class midpoint. When grouped data is used for the calculations, the individual values are not used, which results in a small difference between the answers. In ungrouped data, each value is used as it is. Therefore, it is better to use ungrouped data instead of grouped data for an accurate answer.
(b) State the conditions when grouped data and ungrouped data are preferred. Grouped data is the data that has been arranged and divided into groups (into a frequency distribution), whereas ungrouped data is the data that is not organized and raw.
Although it is allowed to use both grouped data and ungrouped data in data analysis, there are several conditions when grouped data or ungrouped data are preferred.
First, when presenting a large series of data, grouped data is preferable because it is well arranged and easier to observe. Less time and effort is needed to calculate the mean, median, mode and standard deviation than when counting each value individually.
However, ungrouped data is preferable when finding results during an investigation, because every individual value is used, which is not the case with grouped data. Thus, it gives a more accurate result.
Lastly, grouped data is also preferable for classification and for determining the range, mostly when the number of data values is large. If the number of data values is small, or the data contain an extreme value that is hard to group, it is better to use ungrouped data.
1. Your teacher will add 3 marks for each student in your class for completing all their assignments. Make a conjecture for the new values of the following:
(a) Mean
New mean = 56.20 + 3 = 59.20
Check using the new class midpoints:
Old midpoint, x  New midpoint, x + 3  Frequency, f  f(x + 3)
4.5              7.5                  0             0
14.5             17.5                 2             35
24.5             27.5                 1             27.5
34.5             37.5                 5             187.5
44.5             47.5                 7             332.5
54.5             57.5                 11            632.5
64.5             67.5                 10            675
74.5             77.5                 7             542.5
84.5             87.5                 4             350
94.5             97.5                 0             0
Total                                 47            2 782.5
New mean = 2 782.5 / 47 = 59.20
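(b) Mode
Mode = 57.5
New mode = 57.5 + 3 = 60.5
Class interval (marks)  Frequency, f
3-12                    0
13-22                   2
23-32                   1
33-42                   5
43-52                   7
53-62                   11
63-72                   10
73-82                   7
83-92                   4
93-102                  0
New mode: the modal class is now 53-62, with lower boundary 52.5 and class size 10, so
$$\text{New mode} = 52.5 + \left(\frac{11 - 7}{(11 - 7) + (11 - 10)}\right)(10) = 60.5$$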
Class interval (marks)  Frequency, f  Cumulative frequency
3-12                    0             0
13-22                   2             2
23-32                   1             3
33-42                   5             8
43-52                   7             15
53-62                   11            26
63-72                   10            36
73-82                   7             43
83-92                   4             47
93-102                  0             47
Midpoint, x  Frequency, f  fx²
7.5          0             0
17.5         2             612.5
27.5         1             756.25
37.5         5             7 031.25
47.5         7             15 793.75
57.5         11            36 368.75
67.5         10            45 562.5
77.5         7             42 043.75
87.5         4             30 625
97.5         0             0
Total: 495   47            178 793.75
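The column totals above are enough to finish the standard deviation for the shifted marks. A minimal sketch, using Σf(x + 3) = 2 782.5 from (a) and Σfx² = 178 793.75 for the new midpoints, suggests that adding 3 marks to everybody shifts the mean by 3 but leaves the standard deviation at 17.30, since every deviation from the mean is unchanged:

```python
import math

# Column totals for the shifted table (every mark + 3), taken from the tables above.
n = 47                 # number of students (sum of f)
sum_fx = 2782.5        # total of f(x + 3)
sum_fx2 = 178_793.75   # total of f(x + 3)^2

new_mean = sum_fx / n
new_sd = math.sqrt(sum_fx2 / n - new_mean ** 2)   # population SD, as on the calculator
print(f"{new_mean:.2f} {new_sd:.2f}")             # 59.20 17.30
```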
2. A new student has just enrolled in your class. The student scored 97% in his/her former school. If the student's mark is taken into account in the analysis of your school examination or test, calculate the new mean and the new standard deviation.
Class interval (marks)  Midpoint, x  Frequency, f  fx²
3-12                    7.5          0             0
13-22                   17.5         2             612.5
23-32                   27.5         1             756.25
33-42                   37.5         5             7 031.25
43-52                   47.5         7             15 793.75
53-62                   57.5         11            36 368.75
63-72                   67.5         10            45 562.5
73-82                   77.5         7             42 043.75
83-92                   87.5         4             30 625
93-102                  97.5         1             9 506.25
Total                   495          48            188 300
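A sketch of the calculation for the 48 students, assuming (as the table does) that the new student's 97 falls in the class 93-102 with midpoint 97.5, and that the previous Σfx is the 2 782.5 computed with the shifted midpoints in question 1(a). The printed values are what this sketch computes, not figures stated in the original:

```python
import math

# Totals after the new student (class 93-102, midpoint 97.5) joins.
n = 47 + 1
sum_fx = 2782.5 + 1 * 97.5    # previous total of fx plus the new student's f * x
sum_fx2 = 188_300.0           # total of fx^2 from the table above

new_mean = sum_fx / n
new_sd = math.sqrt(sum_fx2 / n - new_mean ** 2)   # population SD
print(f"{new_mean:.2f} {new_sd:.2f}")             # 60.00 17.97
```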
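Further Exploration
1. The top 20% of the students in your class will be given awards by the subject teacher. Calculate the lowest mark for this group of students by using graphical and calculation methods.
Calculation:
20/100 × 48 students = 9.6 ≈ 10 students
48 students - 10 students = 38 students
So the top 20% are the 10 students with the highest marks, and the lowest mark in this group is the 39th mark in ascending order.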
Marks of the 48 students in ascending order:
18, 18, 29, 31, 33, 36, 36, 37, 43, 46, 46, 48, 48, 49, 49, 50, 50, 50, 51, 51, 55, 56, 57, 58, 58, 58, 60, 61, 62, 63, 64, 67, 67, 67, 67, 69, 71, 72, 73, 75, 76, 76, 77, 83, 84, 84, 85, 97
Top 20% awardees: 73, 75, 76, 76, 77, 83, 84, 84, 85, 97
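The calculation method can be sketched in a few lines of Python. This assumes 9.6 students is rounded to 10, as in the calculation above, and uses the 48 marks listed (the class plus the new student's 97):

```python
# The 48 marks listed above, lowest first.
marks = [18, 18, 29, 31, 33, 36, 36, 37, 43, 46, 46, 48, 48, 49, 49, 50, 50, 50,
         51, 51, 55, 56, 57, 58, 58, 58, 60, 61, 62, 63, 64, 67, 67, 67, 67, 69,
         71, 72, 73, 75, 76, 76, 77, 83, 84, 84, 85, 97]

n_top = round(0.20 * len(marks))     # 20% of 48 = 9.6, rounded to 10 students
ranked = sorted(marks)               # ascending order
top_group = ranked[-n_top:]          # the 10 highest marks
print(n_top, len(marks) - n_top)     # 10 38
print(top_group[0])                  # 73 (the 39th mark in ascending order)
```

Python's `round` sends exact halves to the nearest even number, which does not matter for 9.6 but would for a class size giving exactly x.5 students. The graphical method (reading the 80th percentile from an ogive) estimates from grouped data, so it may give a slightly different value.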
Graphical method:
[Figure: cumulative frequency graph (ogive); vertical axis: Cumulative Frequency]