Sta 101 Research Methods

The document outlines the course content for STA 101: Descriptive Statistics at Adamawa State University, covering definitions, scope, and roles of statistics, methods of data collection, and measures of central tendency. It emphasizes the importance of statistics across various disciplines and discusses the nature of statistical data, including qualitative and quantitative types. Additionally, it details methods for collecting data, such as interviews and questionnaires, and introduces key statistical terms and concepts.

Uploaded by

wushishi
ADAMAWA STATE UNIVERSITY, MUBI
FACULTY OF SCIENCE
DEPARTMENT OF MATHEMATICS

Course Code: STA 101
Course Title: Descriptive Statistics
Course Unit: Two (2)

Course Content:
* Definition of statistics.
* The scope and role of statistics.
* Nature of statistical data.
* Methods of data collection.
* Measures of central tendency (mean, median and mode).
* Tabular and graphical summary of data.
* Diagrams and charts.
* Graph of frequency distribution.
* Ogive and applications.
* Fundamentals of probability.
* Pascal's triangle and binomial expansion.
* The Gaussian curve and some of its properties.
* Use of Normal tables.
* Elements of regression and correlation analysis.
* Rank correlation.

INTRODUCTION
Statistics is not a new discipline; it is as old as human society itself. It has been used since the beginning of human life on earth, although its sphere of utility was once very restricted. Statistics is a word that is used in everyday life. It deals basically with the estimation of model parameters from data and with the testing of hypotheses about their values. However, one may ask: what is statistics, how does it work, how does it help to solve certain practical problems, and so on.
The word statistics can be used with two distinct meanings:
i. It can refer to facts and figures which can be put into numerical form.
ii. On the other hand, it can refer to statistical methods.

Department of Mathematics, STA 101 Lecture Notes for 2023/2024 Academic Session

Statistics Defined:
Statistics has been defined differently by different authors from time to time. The reasons for the variation in the definitions are as follows.
Firstly, in modern times the field of statistics has widened considerably. In ancient times it was confined to the affairs of state, but now it embraces almost every sphere of human activity. Hence a number of definitions which were limited to a very narrow field of enquiry have had to be replaced by more comprehensive and exhaustive definitions.
Secondly, statistics has been defined in two ways: some authors define it as statistical data (i.e., numerical statements of fact), while others define it as statistical methods (i.e., the complete body of principles and techniques used in collecting such data).
> Statistics can be defined as an area of science concerned with the design of experiments, the analysis of data and the making of inferences about a population from the information contained in a sample.
> Statistics is the science of making effective use of numerical data relating to groups of individuals or experiments. It deals with all aspects of this, including not only the collection, analysis and interpretation of such data, but also the planning of the collection of data in terms of the design of surveys and experiments.
> Statistics is a branch of science which deals with scientific methods of collecting, organising, summarising, presenting and analysing data in order to draw valid and logical conclusions.

SCOPE AND ROLE OF STATISTICS
Statistics has a major role to play in all stages of the scientific method. This is because it is involved with the definition and evaluation of hypotheses through the collection and analysis of data.
Statistics is almost unique among the major disciplines, in that the professional skills of a statistician can be applied in fields as diverse as medicine, natural sciences, agriculture and forestry, education, technology, industry, communications, insurance, marketing and management, as well as various aspects of government. Statistics is therefore a subject that appeals not only to mathematics students with an interest in probability and in the modelling of real-life situations based on real data, but also to students studying any field of application who have a reasonable background in mathematics.
In industry, statisticians are becoming increasingly involved in such areas as process control, quality assurance, industrial experimentation and product reliability. The design and analysis of clinical trials is a major statistical area, particularly in the screening of the safety of new drugs. Government generally is the major employer of statisticians, as statistics is heavily used in everyday life and by all government agencies. Opportunities in statistics are in fact limitless and by no means less important. In short, life would have been something different without statistics.
The role of statistics therefore is to act as a tool for the analysis of data arising from experiments or investigations in all fields of human endeavour.

NATURE OF STATISTICS
Statistics involves the collection, presentation, analysis and interpretation of numerical data. The facts dealt with must be capable of numerical expression. The statistician is concerned with developing and using procedures for design, analysis and inference-making that provide the best decision at minimum cost. All problems involving the use of statistical methods can be classified as belonging to either descriptive or inferential statistics.

Assignment:
Discuss briefly the importance of statistics in the following disciplines:
(1) Mathematics
(2) Economics
(3) Business and management
(4) Planning
(5) Accountancy and auditing
(6) Astronomy
(7) Industry
(8) Biology
(9) Sociology
(10) Medical sciences
(11) War
(12) Social sciences
(13) Physical sciences
(14) Psychology and education
(15) Insurance
(16) Education

NATURE OF STATISTICAL DATA
Statistical data are facts or figures collected from experimental units. The data can be either qualitative or quantitative.
> Qualitative data are facts or information that cannot be presented in numerical form, e.g. eye colour, gender, marital status, educational status, etc.
> Quantitative data are facts or information that can readily be presented in numerical form, e.g. age, height, weight, etc.

Sources of statistical data
Statistical data can be obtained from either primary or secondary sources, depending on the method and purpose of collecting the data. The distinction between primary and secondary data is a matter of degree or relativity only: the same set of data may be secondary in the hands of one user and primary in the hands of another. In general, data are primary to the source that collects and processes them for the first time and secondary to all sources that later use them.

Primary Data
Primary data are data expressly collected for a specific purpose, e.g. the data on mortality (death rates) and fertility (birth rates) in Nigeria collected by the National Population Commission, the data and figures on traffic flow collected by the FRSC, or information collected by interview, observation, etc. One great advantage of primary data is that the exact information required is obtained.
Primary data can be collected through any of the following sources:
- Questionnaire
- Interview
- Observation
- Experiment

Secondary Data
Secondary data are data that were collected for some other purpose, frequently for administrative reasons, e.g. primary data reproduced by the UN or a statistics office, or details of imports and exports compiled by the Customs and Excise Department.

Sources of Secondary Data
1. Books and journals
2. Report registers
3. Surveys
4. Maps, photographs and satellite images
5. Textbooks
6. Newspapers

METHODS OF DATA COLLECTION
Statistical data can be collected in the following ways:

A. Interview: This is an instrument used to elicit information from the respondent through verbal interaction between the interviewer and the interviewee (respondent). It involves a one-to-one chat between the researcher and the respondent.
Advantages of interview:
i. It gives the interviewer and the respondent an opportunity for face-to-face interaction.
ii. The respondent can respond freely, in the way he likes.
iii. Information which respondents would not want to commit to writing can be obtained.
iv. The recorded information is relatively reliable because it is recorded by the interviewer himself.
v. It is very useful for subjects who cannot fill in a questionnaire, e.g. people who cannot read or write.
Disadvantages of interview:
i. An interview consumes time and is expensive to conduct.
ii. Subjective information derived from an unstructured interview is sometimes difficult to analyse.

B. Questionnaire: This is a sequence of questions set out in written form in order to collect data on a specific subject. The questionnaire can be administered directly to the respondents, or it can be mailed to the respondents to be filled in and mailed back to the researcher. There are three types of questionnaire:
1. The structured (closed-form) questionnaire
2. The unstructured (open-form) questionnaire
3. The pictorial-form questionnaire
Advantages of questionnaire:
i. It is economical in terms of time.
ii. It can be used to elicit information on non-cognitive constructs like creativity, anxiety, kindness, etc.
iii. A greater percentage of people can be reached at a time.
iv. It can be administered to a variety of people.
Disadvantages of questionnaire:
i. Negative or incorrect answers may be given if the questions are too lengthy or if they touch on the respondent's personal life.
ii. There may be a low percentage return of the questionnaire, especially when it is not administered on the spot.
iii. An unclear questionnaire may lead to misunderstanding or wrong responses.
Guidelines for constructing a good questionnaire
1. Think of the attribute you want to measure from the respondents.
2. Construct enough items to actually measure that attribute.
3. Give enough instructions on how to complete the questionnaire.
4. Make sure the language of the questionnaire is clear, precise and unambiguous.
5. Avoid repetition of items in the questionnaire.
6. Avoid words that embarrass the respondent.
7. The questionnaire should not be too short or too long.
8. Consider the method of analysis before constructing the questionnaire.

C. Direct field measurement: This is a technique in which the researcher directly measures the phenomena under study.

D. Observation (participant or non-participant): Observation is a technique that involves watching people, events, situations or phenomena and obtaining first-hand information relating to particular aspects of such people, events, situations or phenomena.

Exercise: Distinguish between primary and secondary data and discuss the various methods of collecting primary data.

SOME STATISTICAL TERMS
Raw data: data collected in its original form.
Frequency: the number of times a value or group of values occurs.
Frequency distribution: the organisation of raw data in table form with classes and frequencies.
Categorical frequency distribution: a frequency distribution in which the data are only nominal or ordinal.
Ungrouped frequency distribution: a frequency distribution of numerical data in which the raw data are not grouped.
Grouped frequency distribution: a frequency distribution in which several numbers are grouped into one class.
Class limits: separate one class in a grouped frequency distribution from another. The limits can actually appear in the data, and there are gaps between the upper limit of one class and the lower limit of the next.
Class boundaries: separate one class in a grouped frequency distribution from another. The boundaries have one more decimal place than the raw data and therefore do not appear in the data. There is no gap between the upper boundary of one class and the lower boundary of the next class. The lower class boundary is found by subtracting 0.5 units from the lower class limit, and the upper class boundary is found by adding 0.5 units to the upper class limit.
Class width: the difference between the upper and lower class boundaries of any class. The class width is also the difference between the lower limits of any two consecutive classes, or between the upper limits of two consecutive classes. It is not the difference between the upper and lower limits of the same class.
Class mark (mid-point): the number in the middle of the class. It is obtained by adding the upper and lower class limits and dividing by two. It can also be found by adding the lower and upper class boundaries and dividing by two.
Cumulative frequency: the number of values less than the upper class boundary of the current class. This is a running total of the frequencies.
Relative frequency: the frequency divided by the total frequency. This gives the proportion of values falling in that class (multiply by 100 to obtain the percentage).
Cumulative relative frequency (relative cumulative frequency): the running total of the relative frequencies, or the cumulative frequency divided by the total frequency. It gives the proportion of the values which are less than the upper class boundary.
Histogram: a graph which displays the data by using vertical bars of various heights to represent frequencies. The horizontal axis can show the class boundaries, the class marks or the class limits.
Frequency polygon: a line graph. The frequency is placed along the vertical axis and the class mid-points are placed along the horizontal axis; these points are connected with lines.
Ogive: a frequency polygon of the cumulative frequency or the relative cumulative frequency. The vertical axis is the cumulative frequency or the relative cumulative frequency. The horizontal axis is the class boundaries. The graph always starts at zero at the lowest class boundary and ends at the total frequency (for a cumulative frequency) or 1.00 (for a relative cumulative frequency).
Pie chart: a graphical depiction of data as slices of a pie. The frequency determines the size of the slice. The number of degrees in any slice is the relative frequency multiplied by 360 degrees.
Pictograph: a graph that uses pictures to represent data.

MEASURES OF CENTRAL TENDENCY (AVERAGES)
A measure of central tendency, or measure of location, is used to determine the average of a set of numerical values; it is the value to be expected at a typical or middle data point. The following are the three measures of central tendency in common use:
i. Arithmetic mean
ii. Median
iii. Mode
We shall briefly discuss each of these measures in this section.

The Arithmetic Mean (simple mean)
The arithmetic mean is the best known and most reliable measure of central tendency.
It is the arithmetic average of a group of scores, obtained by adding all the scores in a distribution and dividing the sum by N (the total number of scores).
The arithmetic mean of a set of observations is their sum divided by the number of observations; e.g., the arithmetic mean $\bar{x}$ of the observations $x_1, x_2, \dots, x_n$ is given by
\[ \bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i \]
The mean in most cases is not an actual data value.
In the case of a frequency distribution $x_i \mid f_i$, $i = 1, 2, \dots, n$, where $f_i$ is the frequency of the variable $x_i$,
\[ \bar{x} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_n x_n}{f_1 + f_2 + \dots + f_n} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i, \qquad N = \sum_{i=1}^{n} f_i \]
In the case of a grouped or continuous frequency distribution, x is taken as the mid-point of the corresponding class.
Remark: the symbol $\Sigma$ is called sigma; it is a Greek letter used in mathematics to denote the sum of values.

Example 1
The ages in years of a random sample of six school children are 3, 8, 5, 12, 14 and 12. Find the average age of the school children.
Solution
\[ \bar{x} = \frac{3 + 8 + 5 + 12 + 14 + 12}{6} = \frac{54}{6} = 9 \text{ years} \]

Example 2
a) Find the arithmetic mean of the following frequency distribution:
x | 1 | 2 | 3 | 4 | 5 | 6 | 7
f | 5 | 9 | 12 | 17 | 14 | 10 | 6
b) Calculate the arithmetic mean of the marks from the following table:
Marks | 0-10 | 10-20 | 20-30 | 30-40 | 40-50 | 50-60
No. of students | 12 | 18 | 27 | 20 | 17 | 6

Solution
a) Computing the arithmetic mean using an ungrouped frequency distribution:
x | f | fx
1 | 5 | 5
2 | 9 | 18
3 | 12 | 36
4 | 17 | 68
5 | 14 | 70
6 | 10 | 60
7 | 6 | 42
Total | 73 | 299
\[ \bar{x} = \frac{1}{N}\sum f x = \frac{299}{73} \approx 4.10 \]
b) Computing the arithmetic mean using a grouped frequency distribution:
Marks | No. of students (f) | Mid-point (x) | fx
0-10 | 12 | 5 | 60
10-20 | 18 | 15 | 270
20-30 | 27 | 25 | 675
30-40 | 20 | 35 | 700
40-50 | 17 | 45 | 765
50-60 | 6 | 55 | 330
Total | 100 | | 2800
\[ \bar{x} = \frac{1}{N}\sum f x = \frac{2800}{100} = 28 \]
Note: If the values of x and f are large, then calculating the mean using the above formula will be time-consuming and tedious.
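The direct formula is simple to check by machine. The following is a minimal sketch, not part of the original notes: the function name `frequency_mean` is an assumption made for illustration, and the grouped table is handled by representing each class by its mid-point, as described above.

```python
def frequency_mean(values, freqs):
    """Return sum(f * x) / sum(f) for paired values and frequencies."""
    if len(values) != len(freqs):
        raise ValueError("values and freqs must have the same length")
    total = sum(freqs)
    if total <= 0:
        raise ValueError("total frequency must be positive")
    return sum(f * x for x, f in zip(values, freqs)) / total


# Example 1: raw observations, each counted once.
ages = [3, 8, 5, 12, 14, 12]
mean_age = sum(ages) / len(ages)

# Example 2(a): ungrouped frequency distribution.
x = [1, 2, 3, 4, 5, 6, 7]
f = [5, 9, 12, 17, 14, 10, 6]
mean_a = frequency_mean(x, f)

# Example 2(b): grouped data, each class represented by its mid-point.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
students = [12, 18, 27, 20, 17, 6]
midpoints = [(lower + upper) / 2 for lower, upper in classes]
mean_b = frequency_mean(midpoints, students)

print(mean_age)          # 9.0
print(round(mean_a, 4))  # 4.0959, i.e. 299/73
print(mean_b)            # 28.0
```

The same function serves both parts of Example 2 because a grouped table reduces to an ungrouped one once each class is replaced by its mid-point.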
However, the calculation can be reduced to a large extent by taking the deviations of the given values from an arbitrary point A, as explained below.
Let $d_i = x_i - A$. Then $f_i d_i = f_i(x_i - A) = f_i x_i - A f_i$.
Summing both sides over i from 1 to n, we get
\[ \sum_{i=1}^{n} f_i d_i = \sum_{i=1}^{n} f_i x_i - A\sum_{i=1}^{n} f_i = \sum_{i=1}^{n} f_i x_i - AN \]
\[ \Rightarrow \frac{1}{N}\sum_{i=1}^{n} f_i d_i = \frac{1}{N}\sum_{i=1}^{n} f_i x_i - A = \bar{x} - A, \]
where $\bar{x}$ is the arithmetic mean of the distribution, i.e.
\[ \bar{x} = A + \frac{1}{N}\sum_{i=1}^{n} f_i d_i \]
The above formula is much more convenient to apply.

Exercise 1
Below are scores for 25 students on a 4-point quiz. Find the mean of the scores.

Exercise 2
Given the data as classified in the table below, calculate the arithmetic mean.
Class | Frequency (f)
10-14 | 2
15-19 | 4
20-24 | 3
25-29 | 2
30-34 | 1
35-39 | 2

Exercise 3
Use the frequency distribution of marks given in the table below to find the mean mark of the students.
Mark | Class mark | Frequency
50-54 | 52 | 1
55-59 | 57 | 3
60-64 | 62 | 5
65-69 | 67 | 7
70-74 | 72 | 8
75-79 | 77 | 10
80-84 | 82 | 6
85-89 | 87 | 4
90-94 | 92 | 5
95-99 | 97 | 1

Example 3
Find the mean of the above data using an assumed mean.
Solution: Choosing 77 as the assumed mean, we complete the table as follows:
Class mark (x) | Deviation from assumed mean (d) | Frequency (f) | fd
52 | -25 | 1 | -25
57 | -20 | 3 | -60
62 | -15 | 5 | -75
67 | -10 | 7 | -70
72 | -5 | 8 | -40
77 | 0 | 10 | 0
82 | 5 | 6 | 30
87 | 10 | 4 | 40
92 | 15 | 5 | 75
97 | 20 | 1 | 20
Total | | 50 | -105
\[ \bar{x} = A + \frac{1}{N}\sum f d = 77 + \frac{-105}{50} = 77 - 2.1 = 74.9 \]

Merits and Demerits of Arithmetic Mean
Merits
1. It is rigidly defined.
2. It is easy to understand and compute.
3. It is based on all the observations.
4. It is amenable to algebraic treatment.
5. Among all the averages, the arithmetic mean is least affected by fluctuations of sampling (i.e. the arithmetic mean is a stable average).
Demerits
1. It cannot be determined by inspection, nor can it be located graphically.
2. It cannot be used when dealing with qualitative characteristics which cannot be measured quantitatively, such as intelligence, honesty, beauty, etc.
3. The arithmetic mean cannot be obtained if any of the values are missing.
4. It is affected by extreme values.
5. The arithmetic mean may lead to wrong conclusions if the details of the data from which it is computed are not given.
6. It cannot be calculated if an extreme class is open, e.g. below 10 or above 70.

THE MEDIAN (MD)
The median of a distribution is the value of the variable which divides it into two equal parts, i.e., it is the value such that the number of observations above it is equal to the number of observations below it.
- When a data set is ordered, it is called a data array.
- The median is defined to be the midpoint of the data array.

Median of ungrouped data
In the case of ungrouped data, if the number of observations is odd, the median is the middle value after the values have been arranged in ascending or descending order of magnitude. If the number of observations is even, there are two middle terms, and the median is obtained by taking the arithmetic mean of the two middle terms. In other words, when the observations are arranged in increasing or decreasing order of magnitude, the value which divides the ordered observations into two equal parts is the median of the data. It is denoted by MD.
Example: The median of the values 25, 20, 15, 35, 18 is 20, and the median of 8, 20, 50, 25, 15, 30 is $\frac{1}{2}(20 + 25) = 22.5$.

Median of ungrouped frequency distribution
In the case of a discrete frequency distribution, the median is obtained by considering the cumulative frequencies, using the following steps:
i. Find N/2, where $N = \sum f_i$.
ii. Find the (less than) cumulative frequency (c.f.) just greater than N/2.
iii. The corresponding value of x is the median.
Example: Obtain the median for the following frequency distribution:
x | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
f | 8 | 10 | 11 | 16 | 20 | 25 | 15 | 9 | 6
Solution: Computation of the median
x | f | c.f.
1 | 8 | 8
2 | 10 | 18
3 | 11 | 29
4 | 16 | 45
5 | 20 | 65
6 | 25 | 90
7 | 15 | 105
8 | 9 | 114
9 | 6 | 120
Total | N = 120 |
Here N = 120, so N/2 = 60. The cumulative frequency just greater than 60 is 65, and the value of x corresponding to 65 is 5. Therefore, the median is 5.
Exercise: A supermarket recorded the number of items sold per week over a one-year period. The data is given below:
No. of items sold | Frequency
1 | 4
2 | 9
3 | 6
4 | 2
5 | 3

Median for a Grouped Frequency Distribution
In the case of a continuous frequency distribution, the class corresponding to the c.f. just greater than N/2 is called the median class, and the value of the median is obtained by the following formula:
\[ \text{Median (MD)} = L_m + \frac{(N/2) - cf}{f} \times w \]
where
N is the sum of the frequencies,
cf is the cumulative frequency of the class preceding the median class,
f is the frequency of the median class,
w is the width of the median class,
$L_m$ is the lower boundary of the median class.
Example: Given the data in the table below, find the median of the distribution.
Class | Frequency
15.5-20.5 | 3
20.5-25.5 | 5
25.5-30.5 | 4
30.5-35.5 | 3
35.5-40.5 | 2
Solution: Form a cumulative frequency distribution table of the given data.
Class | Frequency | Cumulative frequency
15.5-20.5 | 3 | 3
20.5-25.5 | 5 | 8
25.5-30.5 | 4 | 12
30.5-35.5 | 3 | 15
35.5-40.5 | 2 | 17
N/2 = 17/2 = 8.5. The cumulative frequency just greater than 8.5 is 12, so the class containing the 9th value, 25.5-30.5, is the median class. Then N = 17, cf = 8, f = 4, w = 30.5 - 25.5 = 5 and $L_m$ = 25.5, so
\[ \text{Median (MD)} = L_m + \frac{(N/2) - cf}{f} \times w = 25.5 + \frac{8.5 - 8}{4} \times 5 = 25.5 + 0.625 = 26.125 \]
Exercise: The following are marks obtained by 17 students in the STA 101 examination. Find the median mark of the students.
Marks | 10-20 | 20-30 | 30-40 | 40-50 | 50-60

Merits and Demerits of Median
Merits
1. It is rigidly defined.
2. It is easily understood and easy to compute. In some cases it can be located by mere inspection.
3. It is not at all affected by extreme values.
4. The median can be calculated for distributions with open-end classes.
Demerits
1. In the case of an even number of observations, the median cannot be determined exactly; it is estimated by taking the mean of the two middle terms.
2. It is not based on all the observations (the median is insensitive).
3. It is not amenable to algebraic treatment.
4. When compared to the mean, the median is affected much more by fluctuations of sampling.

MODE
The mode is the value which occurs most frequently in a set of observations and around which the other items of the set cluster densely. Thus in the case of a discrete frequency distribution, the mode is the value of x corresponding to the maximum frequency. For example, in the following frequency distribution:
x | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
f | 4 | 9 | 16 | 25 | 22 | ... | 7 | 3
The value of x corresponding to the maximum frequency, 25, is 4. Hence the mode of the given frequency distribution is 4.
However, for a small data set, where arrangement can easily be done, the data can be arranged in ascending order of magnitude and the mode obtained by mere inspection of the arranged data.
Note:
i. For grouped data, the mode is the most commonly observed category (class).
ii. A data set can have more than one mode (e.g. bimodal).
iii. A data set is said to have no mode if all values occur with equal frequency.
Examples
1. To find the mode of the following data set: 8, 9, 9, 14, 8, 8, 10, 7, 6, 9, 7, 8, 10, 14, 11, 8, 14, 11. Ordering the data set in ascending order of magnitude gives 6, 7, 7, 8, 8, 8, 8, 8, 9, 9, 9, 10, 10, 11, 11, 14, 14, 14. Therefore, the mode of the data set is 8, because it appears with the highest frequency of occurrence.
2. Six strains of bacteria were tested to see how long they could remain alive outside their normal environmental conditions. The time in minutes is given below. Find the mode. Data set: 2, 3, 5, 7, 8, 10. Here, there is no mode, since each data value occurs with a frequency of one.
3. Find the mode of the data 18, 18, 18, 20, 22, 24, 24, 24, 26, 26. Here, there are two values having the same highest frequency, 18 and 24; hence the data is bimodal (two modes).
In the case of a large data set, we cannot easily pick the mode by inspection as illustrated in the above cases, so the mode can be computed using the following formula.
Mode of a grouped frequency distribution:
\[ \text{Mode (Mo)} = L_{mo} + \frac{w\,(f_1 - f_0)}{2f_1 - f_0 - f_2} \]
where $L_{mo}$ is the lower boundary of the modal class (the class with the highest frequency), w is its width, $f_1$ is its frequency, and $f_0$ and $f_2$ are the frequencies of the classes preceding and following it.

Relationship between the mean, median and mode
> When the frequency distribution is symmetrical, the mean, median and mode coincide.
> When the frequency distribution is skewed, the mean, median and mode do not coincide.
> If the frequency distribution is positively skewed (skewed to the right), the mean is greater than the median and the median is greater than the mode, i.e. (mean > median > mode).
> If the distribution is negatively skewed (skewed to the left), the mode is greater than the median and the median is greater than the mean, i.e. (mode > median > mean).
NOTE: the following relation holds approximately between the mean, median and mode of a moderately skewed distribution:
* mode = mean - 3(mean - median) = mean - 3 mean + 3 median
* mean - mode = 3(mean - median) = 3 mean - 3 median
* mean - median = 1/3 (mean - mode)
These can also be stated as:
* mode = mean - 3 mean + 3 median, or mode = 3 median - 2 mean.

ORGANISATION OF STATISTICAL DATA
Having collected and edited the data, the next thing is to organise it, i.e. to present it in a readily comprehensible condensed form which will highlight the important characteristics of the data, facilitate comparison and render it suitable for processing (statistical analysis) and interpretation.
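As a small illustration of condensing raw data, the sketch below tallies the data set from the first mode example into an ungrouped frequency table and then reads off the mode or modes. It is only a sketch: the helper names `frequency_table` and `modes` are assumptions made for illustration, and returning an empty list when every value is equally frequent simply mirrors the "no mode" convention stated in the notes.

```python
from collections import Counter


def frequency_table(data):
    """Return (value, frequency) pairs sorted by value."""
    return sorted(Counter(data).items())


def modes(data):
    """Return every value that has the highest frequency.

    An empty list is returned when all (two or more) distinct values
    occur equally often, following the 'no mode' convention.
    """
    counts = Counter(data)
    if not counts:
        return []
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


raw = [8, 9, 9, 14, 8, 8, 10, 7, 6, 9, 7, 8, 10, 14, 11, 8, 14, 11]

# Condensed form of the raw data: one row per distinct value.
for value, freq in frequency_table(raw):
    print(value, freq)

print(modes(raw))                                       # [8]
print(modes([2, 3, 5, 7, 8, 10]))                       # []
print(modes([18, 18, 18, 20, 22, 24, 24, 24, 26, 26]))  # [18, 24]
```

Eighteen raw values collapse into seven rows, which is the kind of condensed presentation this section goes on to describe.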
The presentation of data can be broadly classified into two:
(i) Tabular presentation
(ii) Diagrammatic or graphic presentation

Tabular Presentation of Data
Tabulation and classification are devices for presenting statistical data in a neat, concise, systematic and readily comprehensible and intelligible form.
> When data are collected in original form, they are called raw data.
> When raw data are organised into a frequency distribution, the frequency is the number of values in a specific class of the distribution.
A frequency distribution is the organisation of raw data in tabular form, using classes and frequencies.

Classes or Types of Frequency Distributions
1. Categorical frequency distributions: can be used for data that can be placed in specific categories, such as nominal- or ordinal-level data. Examples are political affiliation, religious affiliation, blood type, etc.
Blood type frequency distribution
Class | Frequency | Percent
A | 5 | 20
B | 7 | 28
O | 9 | 36
AB | 4 | 16
2. Ungrouped frequency distributions: used for data that can be enumerated and when the range of values in the data set is not large. Examples: number of miles travelled from home to campus, number of girls in a 4-child family, etc.
Number of miles travelled (example)
Class | Frequency
5 | 24
10 | 16
15 | 10
3. Grouped frequency distributions: used when the range of values in the data set is very large. The data must be grouped into classes that are more than one unit in width.

Some Basic Principles for Forming a Grouped Frequency Distribution
The following guidelines may be used for a good classification of frequency data.
* Types of classes: The classes should be clearly defined and should not lead to any ambiguity. They should be exhaustive and mutually exclusive (i.e. non-overlapping).
* Number of classes: The choice of the number of classes, or of the class intervals into which a frequency distribution is divided, depends primarily on:
i. the total frequency (i.e., the total number of observations)
ii. the nature of the data, i.e., the size or magnitude of the values of the variable
iii. the accuracy aimed at
iv. the ease of computation

Terms associated with a grouped frequency distribution
- Class limits represent the smallest and largest data values that can be included in a class. E.g., in an example on the lifetime of boat batteries whose first class is 24-30, the values 24 and 30 are the class limits: the lower class limit is 24 and the upper class limit is 30.
- The class boundaries are used to separate the classes so that there are no gaps in the frequency distribution.
- The class width of a class in a frequency distribution is found by subtracting the lower (or upper) class limit of one class from the lower (or upper) class limit of the next class.

Guidelines for constructing a frequency distribution
- There should be between 5 and 20 classes.
- The class width should be an odd number.
- The classes should be mutually exclusive.
- The classes must be continuous.
- The classes must be exhaustive.
- The classes must be equal in width.

Procedure for constructing a grouped frequency distribution
- Find the highest and lowest values.
- Find the range (i.e., highest value - lowest value).
- Select the number of classes desired.
- Find the width by dividing the range by the number of classes and rounding up.
- Select the starting point (usually the lowest value); add the width repeatedly to get the lower limits.
- Find the upper class limits.
- Find the class boundaries.
- Tally the data, then find the frequencies and the cumulative frequencies.

Example: Grouped Frequency Distribution
In a survey of 20 patients who smoked, the following data were obtained.
Each value represents the number of cigarettes the patient smoked per day. Construct a frequency distribution using six classes.
10 | 8 | 6 | 14 | 22
13 | 17 | 19 | 11 | 9
18 | 14 | 13 | 12 | 15
15 | 5 | 11 | 16 | 11
Solution:
Step 1: Identify the highest (H) and lowest (L) values: H = 22 and L = 5.
Step 2: Find the range: R = H - L = 22 - 5 = 17.
Step 3: Select the number of classes you desire, say 6.
Step 4: Find the class width by dividing the range by the number of classes. Width = 17/6 = 2.83. This value is rounded up to 3.
Step 5: Select a starting point for the lowest class limit. For convenience, this value is chosen to be 5; adding the width repeatedly gives the lower class limits 5, 8, 11, 14, 17 and 20.
Step 6: The upper class limits will be 7, 10, 13, 16, 19 and 22. For example, the upper limit for the first class is computed as 8 - 1 = 7, and so on.
Step 7: Find the class boundaries by subtracting 0.5 from each lower class limit and adding 0.5 to each upper class limit.
Step 8: Tally the data, write the corresponding numerical values for the tallies in the frequency column, and find the cumulative frequencies.
Class limits | Class boundaries | Tally | Frequency | Cumulative frequency
5-7 | 4.5-7.5 | // | 2 | 2
8-10 | 7.5-10.5 | /// | 3 | 5
11-13 | 10.5-13.5 | ///// / | 6 | 11
14-16 | 13.5-16.5 | ///// | 5 | 16
17-19 | 16.5-19.5 | /// | 3 | 19
20-22 | 19.5-22.5 | / | 1 | 20

Graphical Presentation of Statistical Data
The three most commonly used graphs in statistical analysis are:
1. The histogram.
2. The frequency polygon.
3. The cumulative frequency graph or ogive.

The Histogram: A bar graph that represents a frequency distribution of a quantitative variable. A histogram is made up of the following components:
1. A title, which identifies the population or sample of concern.
2. A vertical scale, which identifies the frequencies in the various classes.
3. A horizontal scale, which identifies the variable x. Values for the class boundaries or class midpoints may be labelled along the x-axis.
Use whichever method of labelling the axis best presents the variable.
Example: Draw a frequency histogram of the annual salaries for resort-club managers.
Annual Salary ($1000) | 15-25 | 25-35 | 35-45 | 45-55 | 55-65
Number of Managers | [values not legible in source]
Solution:
[Figure: Frequency histogram. Vertical axis: Number of Managers; horizontal axis: Annual Salary ($1000), with classes 15-25, 25-35, 35-45, 45-55 and 55-65.]
The Frequency Polygon
The frequency polygon is another device for the graphic presentation of a frequency distribution (continuous or discrete). In the case of a discrete frequency distribution, the frequency polygon is obtained by plotting the frequencies on the vertical axis (Y-axis) against the corresponding values of the variable on the horizontal axis (X-axis) and joining the points so obtained by straight lines.
Example: The following data show the number of accidents sustained by 313 drivers of a company over a period of 5 years. Use the data to draw a frequency polygon.
[Table: No. of accidents against No. of drivers; individual values not legible in source.]
[Figure: Frequency polygon of the number of drivers against the number of accidents.]
A frequency polygon for a frequency distribution having equal class intervals is formed by plotting (as points) the class frequencies above the mid-points of the classes to which they relate and joining these points using straight lines.
Note: The mid-point of a class is defined as the point lying mid-way between the two class boundaries. It is calculated as $\text{mid-point} = \dfrac{\text{LCB} + \text{UCB}}{2}$, where LCB and UCB are the lower and upper class boundaries of the class.
Example: The data below gives the frequency distribution of the weekly wages (in naira) of 100 workers in a factory.
Use the data to draw the histogram and frequency polygon of the distribution.
Weekly wages (₦) | 20-24 | 25-29 | 30-34 | 35-39 | 40-44 | 45-49 | 50-54 | 55-59 | 60-64
Number of workers | 4 | 5 | 12 | 23 | 31 | 10 | 8 | 5 | 2
Solution: All the classes are of equal magnitude, i.e., 5, but they are not continuous. The distribution therefore has to be converted into a continuous frequency distribution as below:
Weekly wages (₦) | 19.5-24.5 | 24.5-29.5 | 29.5-34.5 | 34.5-39.5 | 39.5-44.5 | 44.5-49.5 | 49.5-54.5 | 54.5-59.5 | 59.5-64.5
Number of workers | 4 | 5 | 12 | 23 | 31 | 10 | 8 | 5 | 2
This is obtained by subtracting 0.5 from the lower limit and adding 0.5 to the upper limit of each class interval. The histogram is drawn on these class boundaries. The frequency polygon is obtained by joining the mid-points of the tops of the rectangles by straight lines and extending the polygon both ways to the mid-points of the empty classes 14.5-19.5 and 64.5-69.5 on the X-axis.
[Figure: Histogram and frequency polygon of the weekly wages of the 100 workers.]
We should note that the plotted points for the frequency polygon are just the centres of the tops of the bars of the histogram.
Cumulative Frequency Graph or Ogive: A line graph of a cumulative frequency or cumulative relative frequency distribution. An ogive has the following components:
1. A title, which identifies the population or sample.
2. A vertical scale, which identifies either the cumulative frequencies or the cumulative relative frequencies.
3. A horizontal scale, which identifies the upper class boundaries. Until the upper boundary of a class has been reached, you cannot be sure you have accumulated all the data in that class. Therefore, the horizontal scale for an ogive is always based on the upper class boundaries.
Example: Construct a cumulative frequency graph (ogive) for this frequency distribution.
Marks | Frequency
50-54, 55-59, 60-64, 65-69, 70-74, 75-79, 80-84, 85-89, 90-94, 95-99 | [frequencies not legible in source]
Solution: To obtain a cumulative frequency distribution, the absolute frequencies are added successively as shown below:
Marks | Cumulative frequency
Less than 55 | 1
Less than 60 | 4
Less than 65 | 9
Less than 70 | 16
Less than 75 | 24
Less than 80 | 34
Less than 85 | 40
Less than 90 | [not legible in source]
Less than 95 | 45
Less than 100 | 50
Cumulative frequency table
A graph of the cumulative frequency distribution for any set of data is called an ogive or cumulative frequency curve. To obtain an ogive, the cumulative frequency of each class is plotted against the upper boundary of that class and the points are joined by a smooth curve. This curve is very useful for reading off the percentage of observations below or above a given value.
Thus, the ogive corresponding to the cumulative frequency table given above is shown below:
[Figure: Ogive (cumulative frequency curve) of the marks distribution.]
Note: Every ogive starts on the left with a cumulative relative frequency of zero at the lower class boundary of the first class and ends on the right with a cumulative relative frequency of 100% at the upper class boundary of the last class.
Pie graph (Diagram): A pie graph is a circle that is divided into sections or wedges according to the percentage of frequencies in each category of the distribution.
Steps for Construction of Pie Diagram
1. Express each of the component values as a percentage of the respective total.
2. Since the angle at the centre of the circle is 360°, the total magnitude of the various components is taken to be equal to 360° and each component part is expressed proportionately in degrees.
3. Draw a circle of appropriate radius, using an appropriate scale depending on the space available.
4. Having drawn the circle, draw any radius (preferably horizontal).
Now, with the radius as the base line, draw an angle at the centre (with the help of a protractor) equal to the degrees represented by the first component. The sector so obtained represents the proportion of the first component.
Different sectors representing the various component parts are distinguished from one another by using different shades, dottings, colours, etc., or labels either inside the sector or outside the circle.
Remarks: The degrees represented by the various component parts of a given magnitude can be obtained as follows:
$\text{Degree of any component part} = \dfrac{\text{Component value}}{\text{Total value}} \times 360^\circ$
Example: Draw a pie diagram to represent the following data of proposed expenditure by a State Government for the year 1997-1998.
Items | Agriculture & Rural Development | Industries & Urban Development | Health & Education | Miscellaneous
Proposed Expenditure (in millions) | 4,200 | 1,500 | 1,000 | 500
Solution: Calculations for Pie Chart
Items (1) | Proposed Expenditure (2) | Angle at the centre (3) = (2)/7,200 × 360°
Agriculture & Rural Development | 4,200 | 4,200/7,200 × 360° = 210°
Industries & Urban Development | 1,500 | 1,500/7,200 × 360° = 75°
Health & Education | 1,000 | 1,000/7,200 × 360° = 50°
Miscellaneous | 500 | 500/7,200 × 360° = 25°
Total | 7,200 | 360°
The pie diagram representing the proposed expenditure by the State Government is as given below.
[Figure: Pie diagram titled "PROPOSED EXPENDITURE" with sectors for Agriculture & Rural Development, Industries & Urban Development, Health & Education, and Miscellaneous.]
THE BINOMIAL THEOREM
The binomial theorem is an algebraic method of expanding a binomial expression $(y + x)^n$.
Theorem: If x and y are quantities and n is a positive integer, then we can expand $(y + x)^n$ in the form:
$$(y + x)^n = \sum_{k=0}^{n} \binom{n}{k} y^{n-k} x^{k}$$
We can demonstrate this result easily with two examples: n = 2 and n = 3.
Example 1: For n = 2, the theorem gives:
$$(y + x)^2 = \binom{2}{0} y^2 + \binom{2}{1} y x + \binom{2}{2} x^2 = y^2 + 2yx + x^2$$
Example 2: For n = 3, the theorem gives:
$$(y + x)^3 = \binom{3}{0} y^3 + \binom{3}{1} y^2 x + \binom{3}{2} y x^2 + \binom{3}{3} x^3 = y^3 + 3y^2x + 3yx^2 + x^3$$
which is also verified by evaluating
$$(y + x)^3 = (y + x)(y + x)(y + x) = y^3 + 3y^2x + 3yx^2 + x^3$$
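The two worked cases above can also be checked numerically. The following is a minimal Python sketch, not part of the original notes: it assumes the theorem in the form $(y + x)^n = \sum_{k=0}^{n} \binom{n}{k} y^{n-k} x^{k}$, and the function names `binomial_coefficients` and `binomial_expansion` are hypothetical names chosen for illustration. The coefficients come from the standard-library function `math.comb`.

```python
from math import comb


def binomial_coefficients(n):
    """Return the coefficients C(n, 0), ..., C(n, n) of (y + x)**n."""
    return [comb(n, k) for k in range(n + 1)]


def binomial_expansion(y, x, n):
    """Evaluate (y + x)**n term by term using the binomial theorem."""
    return sum(comb(n, k) * y ** (n - k) * x ** k for k in range(n + 1))


# Coefficients for the two worked cases n = 2 and n = 3.
print(binomial_coefficients(2))  # [1, 2, 1]
print(binomial_coefficients(3))  # [1, 3, 3, 1]

# The term-by-term sum agrees with direct evaluation of (y + x)**n.
y, x = 2, 5
print(binomial_expansion(y, x, 3), (y + x) ** 3)  # 343 343
```

The values y = 2 and x = 5 are arbitrary; any pair of numbers and any non-negative integer n should give the same agreement between the two sides.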
We are interested in the particular form of the theorem given by $y = 1 - p$ and $x = p$:
$$\big((1 - p) + p\big)^n = \sum_{k=0}^{n} \binom{n}{k} p^{k} (1 - p)^{n-k} = 1 \quad \text{(since } (1 - p + p)^n = 1^n = 1\text{)}$$
There are numerous applications and identities concerned with the theorem, but we limit ourselves to only a few of these that are directly applicable to the scope of this course.
Pascal's Triangle
The coefficients of the binomial expansion (i.e., the numbers pre-multiplying the x and y terms) form an interesting and useful pattern when looked at in isolation. We will expand $(x + y)^n$ for a few values of n, beginning with n = 0.
$(x + y)^0 = 1$
$(x + y)^1 = 1x + 1y$
$(x + y)^2 = 1x^2 + 2xy + 1y^2$
$(x + y)^3 = 1x^3 + 3x^2y + 3xy^2 + 1y^3$
$(x + y)^4 = 1x^4 + 4x^3y + 6x^2y^2 + 4xy^3 + 1y^4$
Writing the above coefficients in the form of a triangle gives the pattern shown below. Notice that any two adjacent values added together give the value immediately below them in the following row. This particular characteristic of the number triangle enables us immediately to write down the next row of the triangle, and the one after that, and so on.
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
Noting that each row always begins and ends with a 1, we have the next row as: 1, 5 (= 1 + 4), 10 (= 4 + 6), 10 (= 6 + 4), 5 (= 4 + 1), and 1. The next row is 1, 6, 15, 20, 15, 6 and 1. We can, of course, extend this process indefinitely. Notice, in particular, that each row of coefficients forms a symmetric pattern, so that, for instance,
$$(x + y)^3 = x^3 + 3x^2y + 3xy^2 + y^3 = y^3 + 3y^2x + 3yx^2 + x^3 = (y + x)^3$$
as we would expect.
The ease with which we can generate binomial coefficients using Pascal's Triangle enables us to write down binomial expansions fairly rapidly. For example, the row 1, 5, 10, 10, 5 and 1 represents the coefficients in the expansion of $(x + y)^5$, i.e.,
$$(x + y)^5 = x^5 + 5x^4y + 10x^3y^2 + 10x^2y^3 + 5xy^4 + y^5$$
or, alternatively:
$$(y + x)^5 = y^5 + 5y^4x + 10y^3x^2 + 10y^2x^3 + 5yx^4 + x^5$$
For a value of n as large as 20, say, Pascal's Triangle would grow rather large, and in this case we could revert to the coefficients given in the theorem, namely $\binom{n}{k}$.
NORMAL PROBABILITY DISTRIBUTIONS
The normal probability distribution is considered the single most important probability distribution. An unlimited number of continuous random variables have either a normal or an approximately normal distribution.
The normal probability distribution has a continuous random variable and it uses two functions: one function to determine the ordinates (y values) of the graph picturing the distribution and a second to determine the probabilities. The formula below expresses the ordinate (y value) that corresponds to each abscissa (x value).
$$y = f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^{2}} \quad \text{for all real } x$$
Note: Each different pair of values for the mean, $\mu$, and standard deviation, $\sigma$, will result in a different normal probability distribution function.
When a graph of all such points is drawn, the normal (bell-shaped) curve will appear as shown in the figure below:
[Figure: The normal (bell-shaped) probability distribution curve.]
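As a hedged illustration of the first of the two functions mentioned above (the one that gives the ordinates), the sketch below evaluates the usual normal density for a given mean and standard deviation. It is not taken from the notes: the function name `normal_ordinate` and the sample values of the mean (50) and standard deviation (10) are assumptions chosen purely for illustration.

```python
import math


def normal_ordinate(x, mu=0.0, sigma=1.0):
    """Height y of the normal curve at x for mean mu and std dev sigma.

    Uses y = exp(-0.5 * ((x - mu) / sigma)**2) / (sigma * sqrt(2 * pi)).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Illustrative parameters (assumed, not from the notes).
mu, sigma = 50.0, 10.0
for x in (30.0, 40.0, 50.0, 60.0, 70.0):
    print(f"x = {x:5.1f}  y = {normal_ordinate(x, mu, sigma):.5f}")
```

Plotting such ordinates against x traces the bell shape: the heights are largest at x equal to the mean and are equal for points the same distance on either side of it. Changing either parameter gives a different curve, in line with the note above.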
