
Chapter 2

This document discusses methods of data presentation in statistics. It describes two types of data: primary data collected directly by researchers, and secondary data previously collected. There are three main methods of data collection: questionnaires, direct investigation through measurement/observation, and using documentary sources. The document also discusses organizing data through classification, tabulation, and creating frequency distributions to summarize raw data in table form using classes and frequencies.

Uploaded by

kader Arefe
Copyright © All Rights Reserved

Lecture notes on Introduction to Statistics Chapter 2 METHODS OF DATA PRESENTATION

CHAPTER 2
2. METHODS OF DATA PRESENTATION
Once it is decided what type of study is to be made, it becomes necessary to collect
information about the study concerned, mostly in the form of data. To draw valid
conclusions from data, the information has to be collected in a systematic manner.
Whatever the quality of the sampling and analysis methods, a haphazardly collected dataset is
unlikely to produce valuable and generalizable information.

Types of Data:- There are two types (sources) of data.
1) Primary data
Primary data are first-hand information collected, compiled and published by an
organization for some purpose. They are the most original in character and have not
undergone any sort of statistical treatment. The term refers to data that are collected
by conducting a survey to meet the specific needs of the problem at hand.
Example: Population census reports are primary data because they are collected,
compiled and published by the population census organization.

2) Secondary data
Secondary data are second-hand information that has already been collected by
someone (an organization) for some purpose and is available for the present study.
Secondary data are not pure in character and have undergone some treatment at least
once. They are taken from already available published or unpublished sources.
2.1 Methods of collection
There are three major methods of data collection:
1. Self-administered questionnaire
2. Direct investigation: measurement (observation) of the subject and interviewing
(face-to-face, telephone, ...)
3. The use of documentary sources
1. Self-administered questionnaire
The questionnaire is the main data collection instrument in a formal sample survey. Before
examining the steps in designing a questionnaire, we need to review the types of questions
used in questionnaires. Depending on the amount of freedom given to the respondent in
offering responses, there are two basic types of questions that can be used in
questionnaires: open-ended questions and closed-ended questions.
The type of question to use is determined by the form of response wanted, the
nature of the respondents and their ability to answer the questions.
Open-ended questions:- allow the respondent to answer freely in his or her own
words.
Example: What do you think are the reasons for a high drop-out rate of village health
committee members?
Closed-ended questions:- A predetermined list of alternative responses is presented to the
respondent for checking the appropriate one(s). This implies that the respondent's answers
are restricted in some way to a limited range of alternatives.

Advantages of the self-administered (mail) questionnaire
It is the cheapest method and can be conducted by a single researcher.
Questionnaires can be sent to a wide geographical area.
There is no interviewer variability.
Disadvantages of the self-administered (mail) questionnaire
Low response rate.
There is no assurance that the questionnaires were answered by the right person.
A mail questionnaire is not suitable for an illiterate community.

2. Direct investigation
i) Measurement and/or observation
Data can be obtained through direct observation or measurement, which provides accurate
information but is expensive and inconvenient.
e.g. land area measurement, animal weight gain, physical examination, direct
observation of work.
ii) Interview
a) Face-to-Face interview
Advantage:-
Interviewers can observe the surroundings and can use nonverbal communication and
visual aids.
The interviewer can help the respondent if he/she has difficulty in understanding the
questions.
Respondent is likely to answer all the questions alone
Disadvantage:-
Cost is high
Interviewer bias is also high
Untrained interviewer may distort the meaning of the questions
b) Telephone Interview
Advantage:-
It is less expensive in time and money compared to face to face interviews
Relatively high response rate
It reaches people who would not open their doors to an interviewer but might be willing to
talk on the telephone.
Disadvantage:-
It is unrepresentative of groups that do not have telephones.
Unlisted telephone numbers are excluded from the study.
The respondent may be substituted by another person.
3. The use of documentary sources
This means extracting information from existing records.
It is much less expensive than the other two methods.
It is difficult to get the information needed when records are compiled in an unstandardized
manner.
Example: hospital records, professional institutes, official statistics, ...

2.2. Methods of Data Organization


This topic introduces tabular and graphical methods commonly used to summarize both
qualitative and quantitative data. Tabular and graphical summaries of data can be
found in annual reports, newspaper articles and research studies. Everyone is exposed
to these types of presentations, so it is important to understand how they are prepared and
how they should be interpreted.
Modern statistical software packages provide extensive capabilities for summarizing data
and preparing graphical presentations. MINITAB, SPSS and STATA are three packages
that are widely available.
2.2.1 Editing of Data.
After collecting the data from either a primary or a secondary source, the next step is
editing. Editing means examining the collected data to discover any errors and mistakes
before presenting it. It has to be decided beforehand what degree of accuracy is wanted
and what extent of error can be tolerated in the inquiry. The editing of secondary data is
simpler than that of primary data.
2.2.2. Classification of Data
The process of arranging data into homogeneous groups or classes according to some
common characteristics present in the data is called classification. For example, in the
process of sorting letters in a post office, the letters are classified according to regions
and further arranged according to zones, cities, etc.
Bases of Classification:- There are four important bases of classification:
(1) Qualitative Base:- When the data are classified according to some quality or
attributes such as sex, religion, literacy, intelligence etc…
(2) Quantitative Base:- When the data are classified by quantitative characteristics like
heights, weights, ages, income etc…
(3) Geographical Base:- When the data are classified by geographical regions or
location, like states, provinces, cities, countries etc…
(4) Chronological or Temporal Base:- When the data are classified or arranged by their
time of occurrence, such as years, months, weeks, days etc…
For Example: Time series data.
2.2.3 Tabulation of Data
The process of placing classified data into tabular form is known as tabulation. A table is
a systematic arrangement of statistical data in rows and columns. Rows are horizontal
arrangements whereas columns are vertical arrangements.

2.2.4 Frequency distribution.

Frequency distribution: is the organization of raw data in table form using classes and
frequencies.

Raw data: recorded information in its original collected form, whether it is counts or
measurements, is referred to as raw data.

Frequency: is the number of values in a specific class of the distribution.

There are three basic types of frequency distributions:

- Categorical frequency distribution
- Ungrouped frequency distribution
- Grouped frequency distribution

2.2.4.1. Categorical Frequency Distribution:- The categorical frequency distribution is
used for data which can be placed in specific categories, such as nominal or ordinal level
data. For example, data such as political affiliation, religious affiliation, or
major field of study would use a categorical frequency distribution.
The major components of a categorical frequency distribution are class, tally and
frequency.
Moreover, even though percentage is not normally a part of a frequency distribution, it will be
added since it is used in certain types of graphical presentations, such as the pie graph.
Example 2.1: Twenty-five army inductees were given a blood test to determine their
blood type.
The data set is given as follows:
A    B    B    AB   O
O    O    B    AB   B
B    B    O    A    O
A    O    O    O    AB
AB   A    O    B    A
Construct a frequency distribution for the data.
Solution:
(A)     (B)          (C)         (D)
Class   Tally        Frequency   Percent
A       /////        5           20
B       ///// //     7           28
O       ///// ////   9           36
AB      ////         4           16
Total                25          100

The categorical frequency distribution is used for data that can be placed in specific
categories, such as nominal or ordinal data, e.g. marital status.

Example 2: A social worker collected the following data on marital status for 25
persons (M = married, S = single, W = widowed, D = divorced).

M S D W D
S S M M M
W D S M M
W D D S S
S W W D D

Solution:

Since the data are categorical, discrete classes can be used. There are four types of marital
status: M, S, D, and W. These types will be used as the classes for the distribution. We follow
the procedure below to construct the frequency distribution.

Step 1: Make a table as shown.

Class   Tally   Frequency   Percent
(1)     (2)     (3)         (4)
M
S
D
W

Step 2: Tally the data and place the result in column (2).

Step 3: Count the tally and place the result in column (3).

Step 4: Find the percentage of values in each class by using

$\% = \frac{f}{n} \times 100$

where f = frequency of the class and n = total number of values.

Percentages are not normally a part of a frequency distribution, but they can be added since
they are used in certain types of diagrams such as pie charts.

Step 5: Find the totals for columns (3) and (4).

Combining all the steps, one can construct the following frequency distribution.

Class   Tally      Frequency   Percent
(1)     (2)        (3)         (4)
M       ///// /    6           24
S       ///// //   7           28
D       ///// //   7           28
W       /////      5           20
Total              25          100
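
The tally, count and percentage steps above can also be checked mechanically. The following is a minimal sketch in Python, assuming the 25 marital status codes from the example; the function name `categorical_distribution` is illustrative and not part of the notes.

```python
from collections import Counter

# Marital status codes for the 25 persons in the example
# (M = married, S = single, W = widowed, D = divorced).
data = """
M S D W D
S S M M M
W D S M M
W D D S S
S W W D D
""".split()

def categorical_distribution(values, classes):
    """Return {class: (frequency, percent)} with percent = f * 100 / n."""
    counts = Counter(values)
    n = len(values)
    return {c: (counts[c], counts[c] * 100 / n) for c in classes}

table = categorical_distribution(data, ["M", "S", "D", "W"])

# Print the distribution followed by the column totals (Step 5).
for cls, (f, pct) in table.items():
    print(f"{cls:<6}{f:>4}{pct:>8.0f}")
total_f = sum(f for f, _ in table.values())
total_pct = sum(p for _, p in table.values())
print(f"{'Total':<6}{total_f:>4}{total_pct:>8.0f}")
```

The percent column must add up to 100, which is a quick check that no class has been miscounted.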


2.2.4.2. Ungrouped Frequency Distribution

- It is a table of all the potential raw score values that could possibly occur in the data, along
with the number of times each actually occurred.

- It is often constructed for a small set of data on a discrete variable.

Steps in constructing an ungrouped frequency distribution:

- First find the smallest and largest raw scores in the collected data.
- Arrange the data in order of magnitude and count the frequency.
- To facilitate counting, one may include a column of tallies.

Example:

The following data represent the marks of 20 students.

80 76 90 85 80
70 60 62 70 85
65 60 63 74 75
76 70 70 80 85

Construct an ungrouped frequency distribution.


Solution:

Step 1: Find the range. Range = Max - Min = 90 - 60 = 30.
Step 2: Make a table as shown.
Step 3: Tally the data.
Step 4: Compute the frequency.

Mark   Tally   Frequency
60     //      2
62     /       1
63     /       1
65     /       1
70     ////    4
74     /       1
75     /       1
76     //      2
80     ///     3
85     ///     3
90     /       1

Each individual value is presented separately; that is why it is named an ungrouped
frequency distribution.
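
As an illustration only, the same table can be produced with a short Python sketch. It assumes the 20 marks from the example; `ungrouped_distribution` is a hypothetical helper name.

```python
from collections import Counter

# Marks of the 20 students from the example.
marks = [80, 76, 90, 85, 80,
         70, 60, 62, 70, 85,
         65, 60, 63, 74, 75,
         76, 70, 70, 80, 85]

def ungrouped_distribution(values):
    """Return (value, frequency) pairs in increasing order of value."""
    counts = Counter(values)
    return [(v, counts[v]) for v in sorted(counts)]

rows = ungrouped_distribution(marks)
data_range = max(marks) - min(marks)  # Step 1: Range = Max - Min

print("Range =", data_range)
for value, f in rows:
    # One '/' per observation stands in for the tally column.
    print(f"{value:>4}  {'/' * f:<6}{f:>3}")
```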


2.2.4.3. Grouped Frequency Distribution

- When the range of the data is large, the data must be grouped into classes that are more than
one unit in width.

Definitions:

- Grouped frequency distribution: a frequency distribution in which several numbers
are grouped in one class.
- Class limits: separate one class in a grouped frequency distribution from another.
The limits can actually appear in the data, and there are gaps between the upper limit of
one class and the lower limit of the next.
- Unit of measurement (U): the distance between two possible consecutive measures.
It is usually taken as 1, 0.1, 0.01, 0.001, ...
- Class boundaries: separate one class in a grouped frequency distribution from
another. The boundaries have one more decimal place than the raw data and
therefore do not appear in the data. There is no gap between the upper boundary of
one class and the lower boundary of the next class. The lower class boundary is found by
subtracting U/2 from the corresponding lower class limit, and the upper class
boundary is found by adding U/2 to the corresponding upper class limit.
- Class width: the difference between the upper and lower class boundaries of any
class. It is also the difference between the lower limits of any two consecutive classes
or the difference between any two consecutive class marks.
- Class mark (mid point): the average of the lower and upper class limits, or the
average of the upper and lower class boundaries.
- Cumulative frequency: the number of observations less than/more than or equal to
a specific value.
- Cumulative frequency above: the total frequency of all values greater than or
equal to the lower class boundary of a given class.
- Cumulative frequency below: the total frequency of all values less than or equal
to the upper class boundary of a given class.
- Cumulative frequency distribution (CFD): the tabular arrangement of class
intervals together with their corresponding cumulative frequencies. It can be of the more than
or less than type, depending on the type of cumulative frequency used.
- Relative frequency (rf): the frequency divided by the total frequency.
- Relative cumulative frequency (rcf): the cumulative frequency divided by the
total frequency.

Guidelines for classes

1. There should be between 5 and 20 classes.


2. The classes must be mutually exclusive. This means that no data value can fall
into two different classes
3. The classes must be all inclusive or exhaustive. This means that all data values
must be included.
4. The classes must be continuous. There are no gaps in a frequency distribution.


5. The classes must be equal in width. The exception here is the first or last class. It
is possible to have a "below ..." or "... and above" class. This is often used with
ages.

Steps for constructing Grouped frequency Distribution

1. Find the largest and smallest values


2. Compute the Range(R) = Maximum - Minimum
3. Select the number of classes desired, usually between 5 and 20, or use Sturges' rule
$k = 1 + 3.32 \log_{10} n$, where k is the number of classes desired and n is the total number of
observations.
4. Find the class width by dividing the range by the number of classes and rounding
up, not off: $w = \frac{R}{k}$.
5. Pick a suitable starting point less than or equal to the minimum value. The starting
point is called the lower limit of the first class. Continue to add the class width to
this lower limit to get the rest of the lower limits.
6. To find the upper limit of the first class, subtract U from the lower limit of the
second class. Then continue to add the class width to this upper limit to find the
rest of the upper limits.
7. Find the boundaries by subtracting U/2 units from the lower limits and adding U/2
units to the upper limits. The boundaries are also half-way between the upper
limit of one class and the lower limit of the next class. It may not be necessary to
find the boundaries.
8. Tally the data.
9. Find the frequencies.
10. Find the cumulative frequencies. Depending on what you're trying to accomplish,
it may not be necessary to find the cumulative frequencies.
11. If necessary, find the relative frequencies and/or relative cumulative frequencies

Example*:

Construct a frequency distribution for the following data.

11 29 6 33 14 31 22 27 19 20
18 17 22 38 23 21 26 34 39 27

Solutions:

Step 1: Find the highest and the lowest value H=39, L=6

Step 2: Find the range; R=H-L=39-6=33

Step 3: Select the number of classes desired using Sturges' formula;

$k = 1 + 3.32 \log_{10} n = 1 + 3.32 \log_{10}(20) = 5.32 \approx 6$ (rounding up)

Step 4: Find the class width; $w = R/k = 33/6 = 5.5 \approx 6$ (rounding up)
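
Steps 1 to 4 amount to a few lines of arithmetic. A minimal Python sketch, assuming base-10 logarithms in Sturges' rule and rounding up with `math.ceil`; the function names are illustrative only.

```python
import math

# The 20 observations from Example *.
data = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
        18, 17, 22, 38, 23, 21, 26, 34, 39, 27]

def sturges_classes(n):
    """Number of classes k = 1 + 3.32 * log10(n), rounded up."""
    return math.ceil(1 + 3.32 * math.log10(n))

def class_width(data_range, k):
    """Class width w = R / k, rounded up (not off)."""
    return math.ceil(data_range / k)

highest, lowest = max(data), min(data)   # Step 1
data_range = highest - lowest            # Step 2
k = sturges_classes(len(data))           # Step 3
w = class_width(data_range, k)           # Step 4

print(highest, lowest, data_range, k, w)
```

Note that `math.ceil` leaves an exact quotient unchanged, so if R/k happens to be a whole number the analyst still has to decide whether to widen the classes by one unit.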


Step 5: Select the starting point; let it be the minimum observation.

- 6, 12, 18, 24, 30, 36 are the lower class limits.

Step 6: Find the upper class limits; e.g. the first upper class limit = 12 - U = 12 - 1 = 11

- 11, 17, 23, 29, 35, 41 are the upper class limits.

So combining step 5 and step 6, one can construct the following classes.

Class limits
6 – 11
12 – 17
18 – 23
24 – 29
30 – 35
36 – 41
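
Steps 5 and 6 can be sketched as follows, assuming a starting point of 6, a class width of 6, six classes and a unit of measurement U = 1 as in the example; `class_limits` is a hypothetical helper name.

```python
def class_limits(start, width, k, unit=1):
    """Return k (lower limit, upper limit) pairs.

    Lower limits are obtained by repeatedly adding the class width to the
    starting point. Each upper limit is one unit below the next lower limit.
    """
    lowers = [start + i * width for i in range(k)]
    uppers = [low + width - unit for low in lowers]
    return list(zip(lowers, uppers))

limits = class_limits(start=6, width=6, k=6, unit=1)
for low, high in limits:
    print(f"{low} - {high}")
```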

Step 7: Find the class boundaries;

E.g. for class 1: Lower class boundary = 6 - U/2 = 5.5

Upper class boundary = 11 + U/2 = 11.5

- Then continue adding w to both boundaries to obtain the rest of the boundaries. By
doing so one can obtain the following classes.

Class boundary
5.5 – 11.5
11.5 – 17.5
17.5 – 23.5
23.5 – 29.5
29.5 – 35.5
35.5 – 41.5
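
The boundary rule (subtract U/2 from each lower limit, add U/2 to each upper limit) can be sketched in Python as below, assuming the class limits of the example and U = 1; the function name is illustrative.

```python
def class_boundaries(limits, unit=1):
    """Convert (lower limit, upper limit) pairs into class boundaries."""
    half = unit / 2
    return [(low - half, high + half) for low, high in limits]

limits = [(6, 11), (12, 17), (18, 23), (24, 29), (30, 35), (36, 41)]
boundaries = class_boundaries(limits, unit=1)

for low, high in boundaries:
    print(f"{low} - {high}")

# There is no gap between consecutive classes: each upper boundary
# equals the lower boundary of the next class.
no_gaps = all(boundaries[i][1] == boundaries[i + 1][0]
              for i in range(len(boundaries) - 1))
print("No gaps:", no_gaps)
```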

Step 8: tally the data.

Step 9: Write the numeric values for the tallies in the frequency column.

Step 10: Find cumulative frequency.

Step 11: Find relative frequency or/and relative cumulative frequency.


The complete frequency distribution follows:

Class     Class          Class   Tally      Freq.   Cf (less     Cf (more     rf     rcf (less
limit     boundary       mark                       than type)   than type)          than type)
6 – 11    5.5 – 11.5     8.5     //         2       2            20           0.10   0.10
12 – 17   11.5 – 17.5    14.5    //         2       4            18           0.10   0.20
18 – 23   17.5 – 23.5    20.5    ///// //   7       11           16           0.35   0.55
24 – 29   23.5 – 29.5    26.5    ////       4       15           9            0.20   0.75
30 – 35   29.5 – 35.5    32.5    ///        3       18           5            0.15   0.90
36 – 41   35.5 – 41.5    38.5    //         2       20           2            0.10   1.00
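
Steps 8 to 11 can be reproduced with a short Python sketch. It assumes the data and class limits of Example *; the dictionary keys and the function name `grouped_table` are illustrative choices, not notation from the notes.

```python
data = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
        18, 17, 22, 38, 23, 21, 26, 34, 39, 27]
limits = [(6, 11), (12, 17), (18, 23), (24, 29), (30, 35), (36, 41)]

def grouped_table(values, limits):
    """Build one row per class with frequency, cumulative and relative columns."""
    n = len(values)
    rows = []
    cf_less = 0   # running total for the "less than" type
    cf_more = n   # remaining total for the "more than" type
    for low, high in limits:
        f = sum(low <= v <= high for v in values)
        cf_less += f
        rows.append({
            "limits": (low, high),
            "mark": (low + high) / 2,   # class mark (mid point)
            "freq": f,
            "cf_less": cf_less,
            "cf_more": cf_more,
            "rf": f / n,
            "rcf_less": cf_less / n,
        })
        cf_more -= f
    return rows

rows = grouped_table(data, limits)
for r in rows:
    print(r["limits"], r["mark"], r["freq"], r["cf_less"], r["cf_more"],
          f"{r['rf']:.2f}", f"{r['rcf_less']:.2f}")
```

The last "less than" cumulative frequency and the first "more than" cumulative frequency should both equal n, which gives a simple consistency check on the table.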

2.3 Diagrammatic and Graphic Presentation of Data

These are techniques for presenting data in visual displays using geometric figures and pictures.

Importance:

- They have greater attraction.
- They facilitate comparison.
- They are easily understandable.

- Diagrams are appropriate for presenting discrete data.

- The three most commonly used diagrammatic presentations for discrete as well as qualitative
data are:

- Pie charts
- Pictograms
- Bar charts

Pie chart

- A pie chart is a circle that is divided into sections or wedges according to the percentage of
frequencies in each category of the distribution. The angle of the sector is obtained using:

$\text{Angle of sector} = \frac{\text{value of the part}}{\text{total}} \times 360^\circ$

Example: Draw a pie chart to represent the following population in a town.

          Men    Women   Girls   Boys
Number    2500   2000    4000    1500

Solutions:

Step 1: Find the percentage.

Step 2: Find the number of degrees for each class.

Step 3: Using a protractor and compass, draw each section and label it with its name and
corresponding percentage.

Class   Frequency   Percent   Degree
Men     2500        25        90
Women   2000        20        72
Girls   4000        40        144
Boys    1500        15        54
Total   10000       100       360
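
Steps 1 and 2 can be sketched in Python as below, assuming the population figures of the example; `pie_sectors` is a hypothetical helper name. Each sector angle is the class's share of the total multiplied by 360 degrees.

```python
population = {"Men": 2500, "Women": 2000, "Girls": 4000, "Boys": 1500}

def pie_sectors(freqs):
    """Return {class: (percent, degrees)} for a pie chart."""
    total = sum(freqs.values())
    return {name: (f * 100 / total, f * 360 / total)
            for name, f in freqs.items()}

sectors = pie_sectors(population)
for name, (percent, degrees) in sectors.items():
    print(f"{name:<6}{percent:>6.0f}{degrees:>6.0f}")

total_degrees = sum(d for _, d in sectors.values())
print("Total degrees:", total_degrees)
```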

[Figure: pie chart of the town's population. Men, 25%; Women, 20%; Girls, 40%; Boys, 15%]

Pictogram

- In these diagrams, we represent data by means of some picture symbols. We decide on a
suitable picture to represent a definite number of units in which the variable is measured.

Example: Draw a pictogram to represent the following population of a town.

Year         1989   1990   1991   1992
Population   2000   3000   5000   7000
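
A text-only sketch of such a pictogram is given below. The choice of one symbol per 1000 people is an assumption made for illustration, since the notes do not state the unit each picture represents; the symbol and the function name are likewise hypothetical.

```python
population_by_year = {1989: 2000, 1990: 3000, 1991: 5000, 1992: 7000}

# Assumption: each symbol stands for 1000 people.
UNIT = 1000

def pictogram_rows(data, unit, symbol="*"):
    """Return {year: string of symbols}, one symbol per `unit` people."""
    return {year: symbol * round(value / unit) for year, value in data.items()}

pictogram = pictogram_rows(population_by_year, UNIT)
for year, symbols in pictogram.items():
    print(year, symbols)
print(f"(each * represents {UNIT} people)")
```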

Bar Charts:

- A set of bars (thick lines or narrow rectangles) representing some magnitude over time or space.
- They are useful for comparing aggregates over time or space.
- Bars can be drawn either vertically or horizontally.
- There are different types of bar charts. The most common are:

- Simple bar chart
- Component or subdivided bar chart
- Multiple bar chart

Simple Bar Chart


- They are used to display data on one variable.
- They are thick lines (narrow rectangles) having the same breadth. The magnitude of a quantity
is represented by the height/length of the bar.
Example: The following data represent the sales by product, 1957-1959, of a given company for three
products A, B, C.

Product   Sales ($)   Sales ($)   Sales ($)
          in 1957     in 1958     in 1959
A         12          14          18
B         24          21          18
C         24          35          54

Solutions:

[Figure: simple bar chart "Sales by product in 1957"; x-axis: product (A, B, C), y-axis: sales in $ (0 to 30)]

Component Bar chart


- When there is a desire to show how a total (or aggregate) is divided into its component parts, we
use a component bar chart.
- The bars represent the total value of a variable, with each total broken into its component parts;
different colours or designs are used for identification.

Example:
Draw a component bar chart to represent the sales by product from 1957 to 1959.
Solutions:

[Figure: component bar chart "Sales by product 1957-1959"; x-axis: year of production (1957, 1958, 1959), y-axis: sales in $ (0 to 100); each bar is divided into Product A, Product B and Product C]
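
The heights needed to draw the component bar chart can be worked out from the sales table. A minimal Python sketch, assuming the sales figures given earlier and stacking the products in the order A, B, C; `stacked_segments` is an illustrative name.

```python
# Sales ($) by product and year, from the example table.
sales = {
    "A": {1957: 12, 1958: 14, 1959: 18},
    "B": {1957: 24, 1958: 21, 1959: 18},
    "C": {1957: 24, 1958: 35, 1959: 54},
}

def stacked_segments(sales, years):
    """For each year, return (product, bottom, top) for every bar component."""
    result = {}
    for year in years:
        base = 0
        segments = []
        for product, by_year in sales.items():
            top = base + by_year[year]
            segments.append((product, base, top))
            base = top   # the next component starts where this one ends
        result[year] = segments
    return result

segments = stacked_segments(sales, [1957, 1958, 1959])
for year, parts in segments.items():
    print(year, parts, "total =", parts[-1][2])
```

The top of the last component in each bar is the total sales for that year.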

Multiple Bar charts

- These are used to display data on more than one variable.
- They are used for comparing different variables at the same time.

Example:

Draw a multiple bar chart to represent the sales by product from 1957 to 1959.


Solutions:

[Figure: multiple bar chart "Sales by product 1957-1959"; x-axis: year of production (1957, 1958, 1959), y-axis: sales in $ (0 to 60); separate bars for Product A, Product B and Product C in each year]

Graphical Presentation of data


The histogram, frequency polygon and cumulative frequency graph (ogive) are the most
commonly applied graphical representations for continuous data.

Procedures for constructing statistical graphs:

- Draw and label the X and Y axes.
- Choose a suitable scale for the frequencies or cumulative frequencies and label it on the Y
axis.
- Represent the class boundaries for the histogram or ogive, or the mid points for the
frequency polygon, on the X axis.
- Plot the points.
- Draw the bars or lines to connect the points.

Histogram
A graph which displays the data by using vertical bars of various heights to represent
frequencies. Class boundaries are placed along the horizontal axis. Class marks and class limits
are sometimes used as the quantity on the X axis.


Example: Construct a histogram to represent the previous data (example *).
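
As a rough text-only sketch of this histogram, the bars can be described by their class boundaries and frequencies. The code assumes the boundaries and frequencies from the complete frequency distribution of Example *; rendering bars as rows of `#` characters is just a stand-in for a drawn chart.

```python
boundaries = [5.5, 11.5, 17.5, 23.5, 29.5, 35.5, 41.5]
frequencies = [2, 2, 7, 4, 3, 2]

def histogram_bars(boundaries, frequencies):
    """Return (left boundary, right boundary, height) for each bar."""
    return [(boundaries[i], boundaries[i + 1], f)
            for i, f in enumerate(frequencies)]

bars = histogram_bars(boundaries, frequencies)
for left, right, height in bars:
    # Adjacent bars share a boundary, so the histogram has no gaps.
    print(f"{left:>5} - {right:<5} {'#' * height}")
```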

Frequency Polygon:
- A line graph. The frequency is placed along the vertical axis and the class mid points are placed
along the horizontal axis. It is customary to add the next higher and lower class intervals with a
corresponding frequency of zero; this is to make it a complete polygon.
Example: Draw a frequency polygon for the above data (example *).
Solutions:
[Figure: frequency polygon; x-axis: class mid points (2.5, 8.5, 14.5, 20.5, 26.5, 32.5, 38.5, 44.5), y-axis: frequency (0 to 8)]
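
The points of this polygon can be listed with a short Python sketch, assuming the class marks and frequencies of Example * and equal class widths; `polygon_points` is an illustrative name. One zero-frequency class is added at each end to close the polygon.

```python
class_marks = [8.5, 14.5, 20.5, 26.5, 32.5, 38.5]
frequencies = [2, 2, 7, 4, 3, 2]

def polygon_points(marks, freqs):
    """Return (mid point, frequency) pairs, padded with a zero at each end."""
    width = marks[1] - marks[0]   # assumes all classes have equal width
    xs = [marks[0] - width] + marks + [marks[-1] + width]
    ys = [0] + freqs + [0]
    return list(zip(xs, ys))

points = polygon_points(class_marks, frequencies)
for x, y in points:
    print(x, y)
```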

Ogive (cumulative frequency polygon)


- A graph showing the cumulative frequency (less than or more than type) plotted against upper
or lower class boundaries respectively. That is class boundaries are plotted along the horizontal
axis and the corresponding cumulative frequencies are plotted along the vertical axis. The
points are joined by a free hand curve.

Example: Draw an ogive curve (less than type) for the above data (Example *).
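
The points to plot for the less than type ogive can be sketched as follows, assuming the class boundaries and frequencies of Example *. The curve starts at zero on the lowest class boundary and each later point pairs an upper class boundary with the cumulative frequency up to it; `ogive_less_than` is a hypothetical helper name.

```python
from itertools import accumulate

boundaries = [5.5, 11.5, 17.5, 23.5, 29.5, 35.5, 41.5]
frequencies = [2, 2, 7, 4, 3, 2]

def ogive_less_than(boundaries, freqs):
    """Return (boundary, cumulative frequency) points for a less than ogive."""
    cumulative = [0] + list(accumulate(freqs))
    return list(zip(boundaries, cumulative))

points = ogive_less_than(boundaries, frequencies)
for boundary, cf in points:
    print(boundary, cf)
```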
