
AU Stat Dept.

Probability and Statistics Chapter-1

Chapter 1
1. Basic concepts, methods of data collection and presentation
1.1. Introduction
1.1.1 Definition and Classification of Statistics
Definition: Statistics is the science of conducting studies to collect, organize, analyze and draw
conclusions from data.
In general, statistics can be defined in two senses.
1. In singular sense: It is a subject or science that deals with the methods of collection,
organization, analysis of data and interpretation of the results.
2. In plural sense: It is defined as a set (aggregate) of numerical data or a quantitative aspect of
facts.
1.1.2 Classification of Statistics
Statistics can be classified into two broad areas.
1. Descriptive Statistics: It is a part of statistics which can be used to organize and summarize
masses of data.
Frequency distributions, measures of central tendency such as the mean and median, and
measures of variation such as the range and standard deviation belong to this category of
statistics.
❖ Example: The average age of students in this class is 21.
2. Inferential Statistics: It is the part of statistics which is concerned with making
decisions, inferences (conclusions) and forecasts about the population based on sample
results.
❖ It includes estimation and test of hypothesis about the population.
Example: Drinking decaffeinated coffee can raise cholesterol levels by 7%.
Exercise: Classify each of the following statements as descriptive or inferential statistics.
Suppose that the heights of 6 randomly selected students from section 2 are the following:
160cm, 165cm, 175cm, 170cm, 180cm and 185cm.
1. The average height of the six students is 172.5cm.
2. The average height of students in this section is not less than 172.5cm.
3. About half of the six students are taller than 170cm.
4. The average height of students in section 2 is greater than that of section 1.
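The sample figures quoted in the exercise can be checked directly from the six heights. The short Python sketch below is only an illustration; the variable names are chosen here and are not part of the original notes.

```python
from statistics import mean

# Heights (cm) of the six sampled students, taken from the exercise above.
heights_cm = [160, 165, 175, 170, 180, 185]

# Both quantities summarize only the six observed students.
mean_height = mean(heights_cm)
taller_than_170 = sum(1 for h in heights_cm if h > 170)

print(f"Mean height of the sample: {mean_height} cm")
print(f"Students taller than 170 cm: {taller_than_170} of {len(heights_cm)}")
```

Any statement that goes beyond these six students, for example about the whole section, cannot be obtained by this kind of direct computation alone.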

By Getahun D. Lecture Notes Page 1



1.1.3 Stages in Statistical Investigation


According to the definition of statistics (in singular sense), there are 5 stages in statistical
investigation.
Stage 1: Collection of Data: It is the process of obtaining data.
Stage 2: Organization of Data: This includes
✓ Editing: checking the collected data for errors, omissions and inconsistencies.
✓ Classification: grouping the data according to their similarities and differences.
✓ Tabulation: arranging the data in rows and columns.
Stage 3: Presentation of Data: It is the process of displaying the data in an understandable way.
Example: charts, graphs and tables.
Stage 4: Analysis of Data: It is the process of extracting useful characteristics from the data.
Stage 5: Interpretation of Data (Inference): It is the process of drawing interpretations or
conclusions from sample data about the whole population.
It is the most difficult and risky stage, and it requires professionals in statistics.
1.1.4 Definition of Some Basic Terms
Data: any recorded, interrelated observations.
Population: the totality of all individuals of the phenomenon under study.
Sample: a part of the population selected in a statistical manner in order to study the population.
Parameter: a statistical value that describes a characteristic of the population, that is, a result
obtained from the population.
Statistic: a statistical value that describes a characteristic of the sample, that is, a result obtained
from the sample.
Census: the process of studying the entire population.
Example: A researcher wants to study the academic performance of first-year students at MTU. Because
of several constraints he cannot enumerate all the students, so he randomly selects 500 students
and finds their average GPA to be 2.58.
a. Identify the population. b. Identify the sample. c. Identify the statistic.
1.1.5 Uses, Applications and Limitation of Statistics
Uses of Statistics
a. It represents the facts in the form of numerical data.
b. It condenses and summarizes masses of data into a few presentable, understandable and precise
figures.
c. It facilitates comparison of data.


d. It helps in predicting future trends.
e. It helps in formulating policies.
Applications of Statistics
Statistics is used in almost all fields of human activity, and governmental bodies, private
business firms and research agencies use it as an indispensable tool. In particular, it is used in
areas such as engineering, economics and the natural sciences, for example:
➢ To compare the breaking strength of two types of materials.
➢ To determine the reliability of a product.
➢ To assess past trends and current status and to forecast future economic activities for a
firm, an industry or the economy as a whole.
➢ To determine manpower requirements and to support personnel selection, research,
financial analysis, distribution analysis and development.
➢ In public administration and in the social sciences, as in studies of poverty,
population, voting patterns, accidents, etc.
➢ In communicating information, drawing conclusions and inferences from data, and
guiding planning and decisions.
Limitations of Statistics
✓ It is not suited to the study of qualitative phenomena.
✓ Its results are true only on average; unlike the laws of physics, they do not state exact facts.
✓ It deals with a set (aggregate) of individuals, not a single individual.
✓ It can be easily misused.
✓ Statistical interpretation requires a high degree of skill and understanding of the
subject.
1.1.6 Types of Variables and Level of Measurements
Types of variables: There are two types of variables.
1. Qualitative (Categorical) Variables: are variables that can be placed into distinct categories
according to some characteristic. They are not numeric, so they cannot be counted or measured.
✓ Example: gender, religion, color, etc.
2. Quantitative Variables: are variables which are numerical in nature and can be measured or
counted.
✓ Example: height, weight, number of students, GPA, etc.
Quantitative variables can be further divided into discrete and continuous variables.
▪ Discrete variables: are variables whose values are determined by counting.
Example: number of students in the class.
▪ Continuous variables: are variables whose values are determined by measuring rather than
counting.
Example: height of a person.
Exercise: Are the following variables discrete or continuous?
a. The number of correct answers on a true/false test.
b. The duration of effectiveness of a pain medication.

Measurement Scales (Levels)


There are 4 types of measurement scales. These are:
1. Nominal Scale
2. Ordinal Scale
3. Interval Scale
4. Ratio Scale
1. Nominal Scale: When the possible categories of a variable have no natural order, the
measurement is called nominal scale.
We cannot apply any arithmetic operations or inequalities.
Example: Blood type (A, B, AB, O), sex (f, m), numbers assigned to regions (1, 2, 3, ...)
2. Ordinal Scale: When the possible categories of a variable have a natural order, the
measurement is called ordinal scale.
We can apply inequalities but we cannot apply arithmetic operations.
Example: Economic status (low, medium, high), education level (diploma, degree, master).
3. Interval Scale: It is a scale with an arbitrary zero point; zero does not indicate a total absence
of the quantity being measured.
We can apply inequalities.
We can also apply addition and subtraction, but multiplication and division are not meaningful.
Example: a) The temperature of a certain area may be $0^\circ\text{C}$. But this does not mean that there is no
heat at all. It simply indicates that it is very cold.
b) The temperatures of certain areas may be $63^\circ\text{F}$, $68^\circ\text{F}$, $110^\circ\text{F}$, $126^\circ\text{F}$ and $131^\circ\text{F}$.
We can say that $68^\circ\text{F} > 63^\circ\text{F}$, that is, $68^\circ\text{F}$ is warmer than $63^\circ\text{F}$.
We can also say that $68^\circ\text{F} - 63^\circ\text{F} = 131^\circ\text{F} - 126^\circ\text{F}$, since equal differences on the scale represent equal differences in temperature.
✓ But we cannot say that $126^\circ\text{F}$ is twice as hot as $63^\circ\text{F}$, even though $\frac{126}{63} = 2$.
✓ To show this, change the scale to degrees Celsius:
$126^\circ\text{F} \Rightarrow \frac{5}{9}(126 - 32) = 52.2^\circ\text{C}$
$63^\circ\text{F} \Rightarrow \frac{5}{9}(63 - 32) = 17.2^\circ\text{C}$
$\Rightarrow 52.2^\circ\text{C}$ is more than 3 times $17.2^\circ\text{C}$.
4. Ratio Scale: It is a scale with true zero point and zero shows a total absence of the quantity
being measured.
We can apply any mathematical operation and inequalities.
Example: weight $x = 40\,\text{kg}$, $y = 80\,\text{kg}$ $\Rightarrow$ $y$ is twice as heavy as $x$.
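The difference between the interval and ratio scales can be checked numerically. The Python sketch below repeats the temperature conversion from the interval-scale example and, as an added assumption that is not in the notes, converts the two weights from kilograms to grams to show that a ratio on a ratio scale does not depend on the unit.

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert a Fahrenheit temperature to Celsius: C = 5/9 * (F - 32)."""
    return 5 / 9 * (temp_f - 32)

# Interval scale: the ratio changes when the (arbitrary) zero point changes.
ratio_in_fahrenheit = 126 / 63
c_high = fahrenheit_to_celsius(126)
c_low = fahrenheit_to_celsius(63)
ratio_in_celsius = c_high / c_low

# Ratio scale: a true zero keeps the ratio the same in any unit.
# The conversion to grams is an illustrative assumption, not part of the notes.
GRAMS_PER_KG = 1000
ratio_in_kg = 80 / 40
ratio_in_g = (80 * GRAMS_PER_KG) / (40 * GRAMS_PER_KG)

print(f"126 F / 63 F = {ratio_in_fahrenheit}")
print(f"{c_high:.1f} C / {c_low:.1f} C = {ratio_in_celsius:.2f}")
print(f"80 kg / 40 kg = {ratio_in_kg}, in grams = {ratio_in_g}")
```

The temperature ratio is 2 in Fahrenheit but slightly more than 3 in Celsius, while the weight ratio stays at 2 in both units.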

1.2. METHOD OF DATA COLLECTION AND PRESENTATION


1.2.1 Source and Types of Data
There are two types of data:
a. Primary Data
Data collected by the investigator directly from the source.
Example: observe signs, measure characteristics, record symptoms and interview respondents, etc.
Two activities involved: planning and measuring.
✓ Identify source and elements of the data.
✓ Decide whether to consider sample or census.
✓ If sampling is preferred, decide on sample size, selection method, etc.
✓ Decide measurement procedure.
✓ Set up the necessary organizational structure.
b. Secondary Data
• Data gathered or compiled from published and unpublished sources or files.
Example: Hospital records, vital statistics and registers, etc.
• When our source is secondary data, check that:
✓ The type and objective of the original study suit the present situation.
✓ The purpose for which the data were collected is compatible with the present problem.
✓ The nature and classification of the data are appropriate to our problem.
✓ There are no biases or misreporting in the published data.
Note: Data which are primary for one investigator may be secondary for another.

1.2.2 Methods of Data Collection


There are three major methods of data collection.
1. Observation or measurement.
2. Interview with questionnaires.
a. Face to face interview.
b. Telephone interview.
c. Self-administered questionnaires returned by mail (mailed questionnaire).
3. The use of documentary sources

1. Observation or measurement (direct personal observation)

In this case data are obtained through direct observation or measurement. This requires
training and monitoring of the measurer to ensure the use of standard procedures.
It provides accurate information, but it is expensive and inconvenient.
Example: physical examinations, clinical measurements, laboratory tests, etc.
2. Interview with questionnaires: Here one drafts a detailed questionnaire. The
questionnaires can either be mailed to the respondents for filling in and returning, or be put in
the charge of enumerators who go around and fill them in after obtaining the desired information.
Questionnaires: are written documents which instruct the reader or listener to answer
the questions written on them.
Respondents (Interviewees): are the individuals who answer the questions on
the questionnaire.
Interviewers: are the individuals who record the responses given by the
respondents.
a) Face to Face Interviews (questionnaires in charge of enumerators)
The interviewer knows exactly who is responding to the questionnaire.
Advantages
❖ The interviewer can help the respondent if he/she has difficulty in understanding the
questions. The difficulty could be due to language, concentration or limited intellectual
capacity.
❖ There is more flexibility in presenting the items; they can range from closed to open.
❖ Skip patterns can be used.
❖ A skip pattern means skipping a question or a group of questions which is not applicable.
Disadvantages
❖ An untrained interviewer may distort the meaning of the questions.
❖ Attributes of the interviewer may affect the responses due to:
a) bias of the interviewer and b) his/her social or ethnic characteristics.
❖ It is costly in terms of time and money.
b) Telephone Interviews
Advantages
❖ It is less expensive in time and money compared with face-to-face interviews.
❖ The interviewer is able to help the respondent if he/she doesn’t understand the question
(as seen with face-to-face interview)
❖ Broad representative samples can be obtained among those who have telephone lines.
Disadvantages
❖ Under-representation of those groups which do not have telephones.
❖ Problems with telephone numbers that are not listed in the directory.
❖ The respondent may be substituted by another person.
c) Self-administered questionnaires returned by mail (mailed questionnaire)
Here the questionnaire is mailed to the respondents to be filled in. This is sometimes
known as self-enumeration.
Advantages
❖ These are the cheapest.
❖ There is no need for trained interviewers.
❖ There is no interviewer bias.
Disadvantages
❖ Low response rate.
❖ Incomplete questionnaires due to omissions or invalid responses.
❖ No assurance that the questionnaire was answered by the right person.
❖ Needs intense follow-up to get a high response rate.
3. The use of documentary sources
Extracting information from existing sources (e.g. Hospital records) is much less expensive
than the other two methods. It can be an important source of data.

Limitation: It is difficult to get the information needed when records are compiled in an
unstandardized manner.
1.2.3 METHODS OF DATA PRESENTATION

After the data have been collected and edited, the next important step is to organize them, that is,
to present them in a readily comprehensible, condensed form that helps in drawing inferences from
them. It is also necessary to separate like items from unlike ones.

✓ The presentation of data is broadly classified into the following three categories:
• Tabular presentation (frequency distribution),
• Diagrammatic presentation, and
• Graphical presentation.
1.2.3.1 Tabular Presentation of Data (Frequency Distribution)
Definitions:
Raw data: data recorded in their original form as collected (e.g. in a survey), whether they are counts
or measurements.
Frequency (f): is the number of observations (values) in a specific class of a distribution.
Frequency distribution (FD): is the organization of raw data in table form, using classes
and frequencies.
✓ Depending on the type of data, there are two basic types of frequency distributions:
➢ Qualitative (categorical) frequency distribution, and
➢ Quantitative frequency distribution, which is either
• an ungrouped frequency distribution, or
• a grouped frequency distribution.
NB: The main purpose of grouping is the summarization and condensation of masses of
data.
1). Categorical (Qualitative) frequency Distribution:
It is often constructed for some data sets that can be placed in a specific category such as nominal,
or ordinal data.
Example: A social worker collected the following data on marital status for 25 persons
(M = married, S = single, W = widowed, D = divorced). Construct a frequency
distribution for the data.

M S D W D
S S M M M
W D S M M
W D D S S
S W W D D
Solution: Since the data are qualitative (categorical), discrete classes can be used. There are four types
of marital status M, S, D, and W. These types will be used as the classes for the distribution.

Classes Frequency (f)


M 6
S 7
D 7
W 5
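The counting in this solution can be reproduced with a few lines of Python. This is a minimal sketch using the standard library's `collections.Counter`; the variable names are illustrative choices.

```python
from collections import Counter

# Marital-status codes from the example, entered row by row.
raw = """
M S D W D
S S M M M
W D S M M
W D D S S
S W W D D
""".split()

# Each distinct category is a class; its count is the class frequency.
frequency = Counter(raw)

for status in ["M", "S", "D", "W"]:
    print(status, frequency[status])
print("Total", sum(frequency.values()))
```

The four frequencies add up to 25, the number of persons surveyed, which is a quick check that no observation was missed.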

2). Quantitative frequency Distribution:


a ). Ungrouped frequency Distribution:
It is constructed for data sets in which the number of distinct values is small, typically
small data sets or data on a discrete variable.
Steps for constructing an ungrouped frequency distribution:
▪ Arrange the data in order of magnitude and then count the frequency of each value.
Example: A survey taken in a restaurant recorded the following numbers of cups of coffee
consumed with each meal. Construct an ungrouped frequency distribution for the data.

0 2 2 1 1 2
3 5 3 2 2 2
1 0 1 2 4 2
0 1 0 1 4 4
2 2 0 1 1 5

Solution: First arrange the data in order of magnitude (in ascending order) and then count the
frequencies. The distinct values for these data are 0, 1, 2, 3, 4 and 5, which is a small number of distinct values.
No. of cups Frequency (f)
0 5
1 8
2 10
3 2
4 3
5 2
Total 30

✓ Each individual value is presented separately, that is why it is named ungrouped frequency
distribution.
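The ungrouped frequency distribution for the coffee data can be sketched in Python as follows. The approach (counting each distinct value and sorting the values) mirrors the step described above; the variable names are illustrative.

```python
from collections import Counter

# Cups of coffee consumed with each meal, entered row by row from the example.
cups = [
    0, 2, 2, 1, 1, 2,
    3, 5, 3, 2, 2, 2,
    1, 0, 1, 2, 4, 2,
    0, 1, 0, 1, 4, 4,
    2, 2, 0, 1, 1, 5,
]

# Sorting the (value, frequency) pairs lists the distinct values in ascending order.
table = sorted(Counter(cups).items())

print("No. of cups  Frequency (f)")
for value, f in table:
    print(f"{value:<12} {f}")
print(f"{'Total':<12} {len(cups)}")
```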

b ). Grouped frequency Distribution:


When the number of distinct values in the data is too large, the data must be grouped into
classes. So, we divide the values into groups or class intervals, and then count the number of data
values falling in each class interval.

Class intervals (CI): are non-overlapping intervals such that each value in the set of
observations can be placed in one, and only one, of the intervals.

Steps for constructing Grouped frequency Distribution

1. First arrange the data in ascending order.


2. Find the range (R): $R = \text{Maximum} - \text{Minimum}$
3. Find the number of class intervals (k): It should be between 5 and 20, i.e. $5 \le k \le 20$, or
use Sturges' formula: $k = 1 + 3.322 \log_{10} n$,
where k is the number of class intervals desired and n is the total number of observations.
NB: k must be rounded to the nearest whole number.
4. Find the class width (w): It is the difference between the lower limits of two consecutive classes.
$w = \frac{R}{k}$, and it is always rounded up.

✓ When the data are given as
• whole numbers, w is always rounded up to the next whole number, e.g. $w = 4.13 \approx 5$;
• tenths, w is always rounded up to the next tenth, e.g. $w = 0.325 \approx 0.4$;
• hundredths, w is always rounded up to the next hundredth, e.g.
$w = 2.532 \approx 2.54$; $w = 0.981 \approx 0.99$.
5. Find the class limits (CL): they separate one class from another in a frequency distribution.
These are the extreme values of each class. They are called lower and upper class limits.
• Lower class limit (LCL): The LCL of the first class interval should be equal to
or smaller than the smallest observation in the data, i.e. $lcl_1 \le$ the smallest observation, so we may take $lcl_1 =$ the smallest observation.
Continue to add the class width to this lower limit to get the rest of the lower
limits, i.e. $lcl_{i+1} = lcl_i + w$, $i = 1, 2, \ldots, k - 1$.
• Upper class limit (UCL): To find the upper class limit of the first class, subtract
$u$ from the lower limit of the second class, i.e. $ucl_1 = lcl_2 - u$.
Then continue to add the class width to this upper limit to get the rest of the
upper class limits, i.e. $ucl_{i+1} = ucl_i + w$, $i = 1, 2, \ldots, k - 1$.

✓ Here $u$ is the unit of measurement, or the smallest difference between the two nearest
observations in the data. It is usually taken as 1, 0.1, 0.01, ... when the data are given as whole
numbers, tenths, hundredths, ... respectively.
6. Find the frequencies.
o Class boundaries (CB): separate one class from another, but leave no gap between
consecutive classes.
They are the exact (true) limits of a class. They are called lower and upper class
boundaries.
o Lower class boundary (LCB): The lcb is obtained by subtracting half the unit
of measurement from the lcl of the class, i.e.
$lcb_i = lcl_i - \frac{u}{2}$. Note: $lcb_{i+1} = lcb_i + w$.

o Upper class boundary (UCB): The ucb is obtained by adding half the unit of
measurement to the ucl of the class, i.e.
$ucb_i = ucl_i + \frac{u}{2}$. Note: $ucb_{i+1} = ucb_i + w$.

❖ Class marks (midpoints) (m): The class mark is the average of the lcl and ucl, or of the lcb and ucb:
$m_i = \frac{lcl_i + ucl_i}{2}$ or $m_i = \frac{lcb_i + ucb_i}{2}$. Note: $m_{i+1} = m_i + w$.

Modified frequency distribution


✓ Relative frequency (rf): $rf = \frac{f_i}{n}$
✓ Percentage relative frequency (%rf): $\%rf = \frac{f_i}{n} \times 100\%$
✓ Cumulative frequency: is the number of observations less than/more than or equal to a
specific value.
✓ Less than cumulative frequency (lcf): it is the total frequency of all values less than or equal
to the upper-class boundary of a given class.
✓ More than cumulative frequency (mcf): it is the total frequency of all values greater than
or equal to the lower-class boundary of a given class.
✓ Relative cumulative frequency (rcf): it is the cumulative frequency divided by the total
frequency.
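The steps for choosing the classes (range, number of classes, class width, class limits and class boundaries) can be sketched as a small Python function. This is only an illustration under stated assumptions: the function names `build_classes` and `class_boundaries` are made up here, k from Sturges' formula is rounded to the nearest whole number, the first lower limit is taken to be the smallest observation, and the data are assumed to be whole numbers (u = 1). The data are the 20 observations of the example that follows.

```python
import math

def build_classes(data, u=1):
    """Return (k, w, class limits) for data recorded to the unit u."""
    n = len(data)
    r = max(data) - min(data)                 # range
    k = round(1 + 3.322 * math.log10(n))      # Sturges' formula, nearest whole number (assumption)
    w = math.ceil(r / k / u) * u              # class width, rounded up to the unit
    limits = []
    lcl = min(data)                           # lcl_1 = smallest observation
    for _ in range(k):
        limits.append((lcl, lcl + w - u))     # ucl_i = lcl_(i+1) - u
        lcl += w                              # lcl_(i+1) = lcl_i + w
    return k, w, limits

def class_boundaries(limits, u=1):
    """lcb_i = lcl_i - u/2 and ucb_i = ucl_i + u/2."""
    return [(lcl - u / 2, ucl + u / 2) for lcl, ucl in limits]

data = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
        18, 17, 22, 38, 23, 21, 26, 34, 39, 27]
k, w, limits = build_classes(data)
print("k =", k, "w =", w)
print("class limits:", limits)
print("class boundaries:", class_boundaries(limits))
```

For these data the sketch gives k = 5 and w = 7, with class limits 6 to 12, 13 to 19, 20 to 26, 27 to 33 and 34 to 40. For data recorded in tenths or hundredths, floating-point rounding may need extra care, so the sketch is best read as covering the whole-number case.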
Example: Construct a grouped frequency distribution for the following data.
11 29 6 33 14 31 22 27 19 20
18 17 22 38 23 21 26 34 39 27
Solution:
Step 1: Arrange the data in ascending order:
6 11 14 17 18 19 20 21 22 22
23 26 27 27 29 31 33 34 38 39
Step 2: Find the range (R): $R = \text{Max} - \text{Min} = 39 - 6 = 33$.
Step 3: Select the number of classes desired using Sturges' formula:
$k = 1 + 3.322 \log_{10} n = 1 + 3.322 \log_{10}(20) = 5.32 \approx 5$ (rounding down).
Step 4: Find the class width: $w = \frac{R}{k} = \frac{33}{5} = 6.6 \approx 7$ (rounding up).

Step 5: Find the lower and the upper class limits.


Select the starting point; let it be the smallest observation.
▪ 6, 13, 20, 27, 34 are the lower class limits.
Find the upper class limits; e.g. the first upper class limit is $ucl_1 = 13 - u = 13 - 1 = 12$,
where $u = 1$ since the data are given as whole numbers.
▪ 12, 19, 26, 33, 40 are the upper class limits.
So, combining the lcl and ucl, one can construct the following classes.
Class limits
6 – 12
13 – 19
20 – 26
27 – 33
34 – 40

Step 6: Find the class boundaries.
E.g. for class 1: $lcb_1 = 6 - \frac{u}{2} = 6 - \frac{1}{2} = 5.5$
$ucb_1 = 12 + \frac{u}{2} = 12 + \frac{1}{2} = 12.5$

• Then continue adding $w$ to both boundaries to obtain the remaining boundaries. By doing so one
can obtain the following classes.

Class boundary
5.5 – 12.5
12.5 – 19.5
19.5 – 26.5
26.5 – 33.5
33.5 – 40.5
Step 7: Find the frequencies.

✓ The complete frequency distribution is given as follows:

Class limit | Class boundary | Class mark | f | Lcf | Mcf | rf | %rf | %rcf
6 – 12 | 5.5 – 12.5 | 9 | 2 | ≤ 12.5 (≤ 12) = 2 | ≥ 5.5 (≥ 6) = 20 | 0.10 | 10% | 10%
13 – 19 | 12.5 – 19.5 | 16 | 4 | ≤ 19.5 (≤ 19) = 6 | ≥ 12.5 (≥ 13) = 18 | 0.20 | 20% | 30%
20 – 26 | 19.5 – 26.5 | 23 | 6 | ≤ 26.5 (≤ 26) = 12 | ≥ 19.5 (≥ 20) = 14 | 0.30 | 30% | 60%
27 – 33 | 26.5 – 33.5 | 30 | 5 | ≤ 33.5 (≤ 33) = 17 | ≥ 26.5 (≥ 27) = 8 | 0.25 | 25% | 85%
34 – 40 | 33.5 – 40.5 | 37 | 3 | ≤ 40.5 (≤ 40) = 20 | ≥ 33.5 (≥ 34) = 3 | 0.15 | 15% | 100%
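The columns of this table can be recomputed from the raw data and the class limits. The Python sketch below is one possible way to do it; the variable names are illustrative, and the class limits are simply typed in from Step 5 rather than derived.

```python
from itertools import accumulate

data = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
        18, 17, 22, 38, 23, 21, 26, 34, 39, 27]
class_limits = [(6, 12), (13, 19), (20, 26), (27, 33), (34, 40)]
n = len(data)

# Frequency: number of observations between the lower and upper class limit.
freq = [sum(1 for x in data if lcl <= x <= ucl) for lcl, ucl in class_limits]
class_marks = [(lcl + ucl) / 2 for lcl, ucl in class_limits]

# Less-than cumulative frequency: running total of the frequencies.
lcf = list(accumulate(freq))
# More-than cumulative frequency: observations in this class and all later classes.
mcf = [n - c + f for c, f in zip(lcf, freq)]

rf = [f / n for f in freq]                 # relative frequency
pct_rcf = [100 * c / n for c in lcf]       # percentage relative cumulative frequency

for row in zip(class_limits, class_marks, freq, lcf, mcf, rf, pct_rcf):
    print(row)
```

The last less-than cumulative frequency and the first more-than cumulative frequency both equal n = 20, which is a useful check on the table.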

1.2.3.2 DIAGRAMMATIC PRESENTATION OF DATA

These are techniques for presenting data in visual displays using geometric figures and pictures.
Importance:
▪ They have greater attraction.
▪ They facilitate comparison.
▪ They are easy to understand.
✓ Diagrams are appropriate for presenting discrete data.
◼ The two most commonly used diagrammatic presentations for discrete as well as
qualitative data are:
• Bar charts and • Pie charts
1. Bar chart
There are three types of bar charts. These are:
I) Simple bar chart II) Component bar chart III) Multiple bar chart

a). Simple Bar chart:


It is a chart which is used to present data that has only one variable. It shows
changes in the totals of different categories.
Example: Construct a simple bar chart for the following table showing annual cases of
HIV patients reported in Ethiopia as of July 31, 1993.

Year of report 1986 1987 1988 1989 1990 1991 1992 1993
Cases 2 17 87 190 448 885 3256 2814

b). Component Bar chart


It is used to present data which have more than one variable. For each category the bars are
subdivided into components to allow comparison between parts. The bars represent the total value
of a variable, with each total broken into its component parts, and different colors or designs are
used for identification.
Example
Construct a component bar chart for the number of children who were vaccinated with DPT, Polio
and BCG antigens in Mizan-Aman General Hospital in 1979 E.C.

Antigen | Sex: Male | Sex: Female | Total
DPT | 250 | 300 | 550
Polio | 300 | 320 | 620
BCG | 200 | 210 | 410
c). Multiple Bar chart


✓ These are used to display data on more than one variable.
✓ They are used for comparing different variables at the same time.
Example: draw a multiple bar chart for the above vaccination data.

2. Pie-Chart
It is used to show the partitioning of a total data into its component parts using circles. The
circles should be divided into sectors proportional to the frequencies of the categories they
represent.
Steps to draw a pie chart
1. Convert frequencies into percentage relative frequency.
2. Draw a circle of any radius.
3. Convert percentage relative frequencies into degree measures.
angle of a sector $= \frac{360^\circ \times \%rf}{100\%}$
Example
Draw the pie chart for the following data. First construct a table providing the central angles.

Wards | Frequency | Percentage rf | Central angle (degrees)
Medical A | 55 | 27.5% | 99
Medical B | 30 | 15% | 54
Surgical A | 40 | 20% | 72
Surgical B | 25 | 12.5% | 45
Pediatrics | 50 | 25% | 90
Total | 200 | 100% | 360
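The percentage relative frequencies and central angles in this table follow from the two conversion steps above. A minimal Python sketch, with illustrative variable names, is:

```python
# Ward frequencies from the example.
ward_counts = {
    "Medical A": 55,
    "Medical B": 30,
    "Surgical A": 40,
    "Surgical B": 25,
    "Pediatrics": 50,
}
total = sum(ward_counts.values())

# Step 1: convert frequencies into percentage relative frequencies.
pct_rf = {ward: 100 * f / total for ward, f in ward_counts.items()}
# Step 3: convert percentages into central angles (degrees): 360 * %rf / 100.
angles = {ward: 360 * p / 100 for ward, p in pct_rf.items()}

for ward in ward_counts:
    print(f"{ward:<11} {ward_counts[ward]:>3} {pct_rf[ward]:>5}% {angles[ward]:>5} degrees")
print("Sum of angles:", sum(angles.values()))
```

The central angles must add up to 360 degrees, just as the percentages add up to 100%.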

1.2.3.3 Graphical presentation of data


a) Histogram
It presents a grouped frequency distribution of continuous data. It is drawn by marking the class
boundaries on the x-axis and the frequencies on the y-axis.
Example: Draw a histogram for the following grouped age data.

Class limit Class boundaries Mid point Frequency


15-19 14.5-19.5 17 2
20-24 19.5-24.5 22 8
25-29 24.5-29.5 27 6
30-34 29.5-34.5 32 12
35-39 34.5-39.5 37 7
40-44 39.5-44.5 42 6
45-49 44.5-49.5 47 4
50-54 49.5-54.5 52 3
55-59 54.5-59.5 57 1
60-64 59.5-64.5 62 1

[Figure: histogram of the grouped age data]

b) Frequency polygon
It is a multi-sided figure which is drawn by plotting the class marks (midpoints) on the x-axis and
the frequencies on the y-axis. The points are then connected with straight lines, and the lines are extended at
both ends so that they reach the horizontal axis at the midpoints of the adjacent empty classes. This allows the total area to
be enclosed.
Example: draw the frequency polygon for the following age data.

Class limit Mid point Frequency


15-19 17 2
20-24 22 8
25-29 27 6
30-34 32 12
35-39 37 7
40-44 42 6
45-49 47 4
50-54 52 3
55-59 57 1
60-64 62 1

Note: The total area under the frequency polygon is equal to the area under the histogram.

c) Ogives or cumulative frequency polygon (curve)


It is plotted with the class boundaries on the x-axis and the cumulative frequencies on
the y-axis. The points are then connected with straight lines.
✓ The curves obtained are called the “less than” and “more than” ogives (curves).
▪ Less than ogive: It is plotted with the "UCB" on the x-axis against the "lcf" on the y-axis.
▪ More than ogive: It is plotted with the "LCB" on the x-axis against the "mcf" on the y-axis.
Example: draw the less than and more than ogives for the following age data.

Class limit | Frequency | Less than cumulative frequency (lcf) | More than cumulative frequency (mcf)
23-26 | 3 | ≤ 26.5 (≤ 26) = 3 | ≥ 22.5 (≥ 23) = 20
27-30 | 4 | ≤ 30.5 (≤ 30) = 7 | ≥ 26.5 (≥ 27) = 17
31-34 | 3 | ≤ 34.5 (≤ 34) = 10 | ≥ 30.5 (≥ 31) = 13
35-38 | 5 | ≤ 38.5 (≤ 38) = 15 | ≥ 34.5 (≥ 35) = 10
39-42 | 5 | ≤ 42.5 (≤ 42) = 20 | ≥ 38.5 (≥ 39) = 5
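The points to be plotted for the two ogives can be computed from the class limits and frequencies. The Python sketch below only prepares the coordinates and does not draw the curves; the variable names are illustrative, and u = 1 is assumed because the ages are whole numbers.

```python
from itertools import accumulate

class_limits = [(23, 26), (27, 30), (31, 34), (35, 38), (39, 42)]
freq = [3, 4, 3, 5, 5]
u = 1                      # unit of measurement for whole-number data
n = sum(freq)

lcf = list(accumulate(freq))                     # less-than cumulative frequencies
mcf = [n - c + f for c, f in zip(lcf, freq)]     # more-than cumulative frequencies

# Less-than ogive: upper class boundary (x) against lcf (y).
less_than_points = [(ucl + u / 2, c) for (_, ucl), c in zip(class_limits, lcf)]
# More-than ogive: lower class boundary (x) against mcf (y).
more_than_points = [(lcl - u / 2, c) for (lcl, _), c in zip(class_limits, mcf)]

print("Less-than ogive points:", less_than_points)
print("More-than ogive points:", more_than_points)
```

Joining each list of points with straight lines gives the corresponding ogive.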
