Statchapter 2


Chapter Two

2. Methods of Data
Collection and Presentation
2.1. Sources of Data

• There are two sources of data.


• These are Primary sources and Secondary sources.
• Primary sources of data are objects or persons from which
we collect the figures used for first hand information.
• The data obtained from primary sources are measurements observed
or recorded as part of an original study or survey being
conducted.
• Secondary sources are either published or unpublished
materials or records.
• Secondary data can be literally defined as second-hand information, that is,
information that was gathered by someone
else (e.g., researchers, institutions, other NGOs, etc.).
Some of the sources of secondary data are:
• Government documents
• Official statistics
• Technical reports
• Scholarly journals
• Trade journals
• Review articles
• Reference books
• Research institutes
• Universities
• Hospitals
• Libraries
• Library search engines
Before using secondary data, the
investigator should examine:

• The type and objective of the situations.
• Whether the purpose for which the data were collected is
compatible with the present problem.
• Whether the nature and classification of the data are appropriate to
our problem.
• Whether there are biases or misreporting in the
published data.
• The reliability, homogeneity, and completeness of the data.
Collecting scientific primary data involves two
activities: planning and measuring

1. Planning:
• Identify the source and elements of the data.
• Decide whether to consider a sample or a census.
• If sampling is preferred, decide on the sample size, selection method, etc.
• Decide on the measurement procedure.
• Set up the necessary organizational structure.
2. Measuring: there are different options:
• Telephone interview
• Mail questionnaires
• Personal interview
• New product registration
• Laboratory experiment/experimental design
• Focus group discussion
Assignment: Please read about the advantages and disadvantages of the methods
of data collection.
2.2. Methods of Data Collection
1. Observation: involves recording the behavioral patterns of
people, objects and events in a systematic manner.
2. Questionnaire: a popular means of collecting data. A set
of questions is administered to respondents either
physically or through mail (email, postal, etc.).
3. Interviewing: interviews can be undertaken in person
(face to face) or via telephone (indirect method).
4. Extract from Records/Documentary Sources: a method
of collecting information (secondary data) from published
or unpublished sources.
2.3. Methods of Data
Presentation
• So far you know how to collect data. So what do we do with the
collected data next?
• Now you have to present the data you have collected.
• The collected data, also known as ‘raw data’, are always in an
unorganized form.
• They need to be organized and presented in a meaningful and readily
comprehensible form in order to facilitate further statistical analysis.
• The presentation of data is broadly classified into the following two
categories:
  • Tabular presentation
  • Diagrammatic and graphic presentation
2.3.1. Tabular presentation of
data
• Tables are important for summarizing large volumes of data in
a more understandable way.
Based on the characteristics they present, tables are:
i. Simple (one-way) table: a table which presents one
characteristic, for example age distribution.
ii. Two-way table: presents two characteristics in columns
and rows, for example age versus sex.
iii. Higher-order table: a table which presents more than two
characteristics in one table.
• In statistics we usually use a frequency distribution table for
different types of data.
• Frequency distribution: the organization of raw data in
table form, using classes and frequencies.
• Frequency: the number of values in a specific class of the
distribution.
• There are three basic types of frequency distributions:
  • Categorical frequency distribution
  • Ungrouped frequency distribution
  • Grouped frequency distribution
Categorical Frequency
Distribution
• Used for data which can be placed in specific
categories such as nominal or ordinal level data.
• For example: marital status, political affiliation,
religious affiliation, blood type …
Steps of constructing categorical frequency
distribution

Step 1: Confirm that the data are measured on a nominal or ordinal
scale.
Step 2: Make a table with the following columns:

A: Class | B: Tally | C: Frequency | D: Percent

Step 3: Put the distinct values of the data set in column A.
Step 4: Tally the data and place the result in column B.
Step 5: Count the tallies and place the results in column C.
Step 6: Find the percentage of values in each class by using the
formula
% = (f / n) × 100
where f is the frequency of the class and n is the total number of values.
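The counting and percentage steps above can be sketched in Python. This is only an illustration: the blood-type list below is hypothetical (it is not the data set of any example in this chapter), and the function name categorical_frequency is an assumption made for the sketch.

```python
from collections import Counter

# Hypothetical blood-type observations (illustrative only).
blood_types = ["A", "B", "O", "AB", "O", "O", "B", "A", "O", "AB"]

def categorical_frequency(values):
    """Return rows of (class, frequency, percent) for nominal or ordinal data."""
    n = len(values)
    counts = Counter(values)                # Steps 3-5: distinct classes and their counts
    rows = []
    for cls in sorted(counts):
        f = counts[cls]
        rows.append((cls, f, 100 * f / n))  # Step 6: percent = f / n * 100
    return rows

table = categorical_frequency(blood_types)
for cls, f, pct in table:
    print(f"{cls:<5}{f:>3}{pct:>8.1f}%")
```

The frequencies should always add up to n and the percentages to 100, which is a quick check on the tally.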
Example

Example 2.1: Twenty-five army inductees were given
a blood test to determine their blood type. The data
set is given as follows:

Construct a frequency distribution for the above data.


Ungrouped Frequency
Distribution
• Is a table of all the potential raw score values that could possibly
occur in the data, along with the number of times each actually
occurred.
• Is often constructed for a small data set or for data on a discrete variable.
• The major components of this type of frequency distribution are
class, tally, frequency, and cumulative frequency.
• Cumulative frequency (CF): shows how many values are
accumulated up to and including a specific class.
• Less than cumulative frequency (LCF): the total number of
observations below a specified class, including that class.
• More than cumulative frequency (MCF): the total number of
observations above a specified class, including that class.
Constructing ungrouped frequency distribution:
• First find the smallest and largest raw score in the
collected data.
• Arrange the data in order of magnitude and count
the frequency.
• To facilitate counting one may include a column of
tallies.
Example

Example 2.2: A demographer is interested in the number of
children a family may have. He/she took a sample of 30 families
and obtained the following observations.
Number of children in a sample of 30 families

Construct a frequency distribution for this data.


Solution
• These individual observations can be arranged in ascending or
descending order of magnitude:
2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5,
6, 7, 7, 8, 8, 8
• The frequency distribution of the number of children in the 30 families is as follows:
Class   Tally        Frequency   LCF   MCF
2       |||||        5           5     30
3       ||||| ||     7           12    25
4       ||||| |||    8           20    18
5       ||||         4           24    10
6       |            1           25    6
7       ||           2           27    5
8       |||          3           30    3
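The LCF and MCF columns of this table can be checked with a few lines of Python. This is a minimal sketch that uses the ordered observations listed in the solution; the variable names are chosen for illustration only.

```python
from collections import Counter
from itertools import accumulate

# Ordered observations from the solution of Example 2.2 (30 families).
children = [2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4,
            5, 5, 5, 5, 6, 7, 7, 8, 8, 8]

counts = Counter(children)
classes = sorted(counts)                   # distinct values: 2, 3, ..., 8
freq = [counts[c] for c in classes]

lcf = list(accumulate(freq))               # running total from the first class onward
mcf = list(accumulate(freq[::-1]))[::-1]   # running total from the last class backward

print("Class  Freq  LCF  MCF")
for c, f, lo, mo in zip(classes, freq, lcf, mcf):
    print(f"{c:<7}{f:<6}{lo:<5}{mo}")
```

The last LCF value and the first MCF value both equal the number of observations, 30.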
Grouped Frequency
Distribution
• Used when the range of the data is large, and for data from a
continuous variable.
• Several values are grouped into one class, so each class is more than
one unit in width.
Some basic terms that are most frequently
used
• Upper class limits: the largest numbers that can belong to
the different classes.
• Lower class limits: the smallest numbers that can belong
to the different classes.
• Class boundaries (true class limits): the numbers used to
separate classes, but without the gaps created by class limits.
• Class marks (midpoints): the midpoints of the classes.
• Class width: the difference between two consecutive lower
class limits or two consecutive lower class boundaries.
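To make these terms concrete, the sketch below computes them for a few hypothetical classes. It assumes the common convention that a class boundary lies half a unit of measurement beyond the class limit, and that a class mark is the average of the two limits; the class limits themselves are made up for illustration.

```python
# Hypothetical class limits (lower, upper) for whole-number data,
# so the unit of measurement U is assumed to be 1.
limits = [(6, 11), (12, 17), (18, 23)]
U = 1

# Assumed convention: boundaries sit U/2 beyond the limits, which closes the gaps.
boundaries = [(lo - U / 2, hi + U / 2) for lo, hi in limits]
# Class marks (midpoints): average of the lower and upper limit.
marks = [(lo + hi) / 2 for lo, hi in limits]
# Class width: difference between two consecutive lower class limits.
width = limits[1][0] - limits[0][0]

print(boundaries)
print(marks)
print(width)
```

Note that the upper boundary of one class equals the lower boundary of the next, so no value can fall between classes.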
Steps in constructing grouped frequency
distribution

Step 1: Find the highest and the lowest values.
Step 2: Find the range: R = highest value − lowest value.
Step 3: Select the number of classes desired. Select the number
of classes arbitrarily between 5 and 20, or use Sturges' rule. That
is, k = 1 + 3.322 log10(n), where k is the number of classes desired and n is the number
of observations.
Step 4: Find the class width (W) by dividing the range by the
number of classes: W = R / k.
Note that: Round the value of W up to the nearest whole
number if there is a remainder. For instance, 4.7≈5 and 4.12≈5.
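Steps 1 to 4 can be sketched as follows. The observations are hypothetical, the function names are illustrative, and rounding k to the nearest whole number is an assumption of this sketch (the text only specifies rounding for W).

```python
import math

def sturges_classes(n):
    """Number of classes k = 1 + 3.322 * log10(n), rounded to the nearest integer (assumed)."""
    return round(1 + 3.322 * math.log10(n))

def class_width(values, k):
    """Class width W = range / k, rounded up to the next whole number."""
    data_range = max(values) - min(values)   # Steps 1-2
    return math.ceil(data_range / k)         # Step 4 with the rounding-up note

# Hypothetical whole-number observations (illustrative only).
scores = [6, 9, 12, 14, 15, 18, 19, 21, 22, 25,
          27, 28, 30, 31, 33, 35, 36, 38, 39, 40]

k = sturges_classes(len(scores))
W = class_width(scores, k)
print(k, W)
```

Here the range is 34 and k is 5, so 34 / 5 = 6.8 is rounded up to a width of 7, in the same way that 4.12 is rounded up to 5 in the note.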
Step 5: Select the starting point as the lowest class limit. This
is usually the lowest score (observation). Add the width to that
score to get the lower class limit of the next class.
Keep adding until you achieve the number of desired classes
calculated in step 3.
Step 6: Find the upper class limits. Subtract the unit of
measurement (U) from the lower class limit of the second class
to get the upper class limit of the first class. Then add
the width to each upper class limit to get all upper class limits.
Step 7: Unit of measurement (U): the smallest
difference between consecutive observations, that is, the gap from
one possible value to the next.
Note that: U = 1 is the maximum value of the unit of measurement.
Step 8: Find the class boundaries.
Step 9: Tally the data and write the numerical values for the tallies in the
frequency column.
Step 10: Find the cumulative frequencies (LCF and MCF).
Step 11: Find the relative frequency and/or the relative cumulative frequency.

• A relative frequency distribution enables us to understand the
distribution of the data and to compare different sets of data.
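Steps 5 to 11 can be put together in one short routine. This is a sketch under stated assumptions rather than a definitive procedure: the data are hypothetical, the function name grouped_frequency is illustrative, k and W are taken as already chosen, and class boundaries are assumed to lie U/2 beyond the class limits.

```python
def grouped_frequency(values, k, W, U=1):
    """Build k classes of width W starting at the smallest value (Steps 5-11)."""
    n = len(values)
    start = min(values)
    rows = []
    for i in range(k):
        lower = start + i * W          # Step 5: lower class limits
        upper = lower + W - U          # Step 6: next lower limit minus U
        f = sum(lower <= v <= upper for v in values)   # Step 9: class frequency
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower - U / 2, upper + U / 2),  # Step 8 (assumed convention)
            "f": f,
        })
    running = 0
    for row in rows:                   # Step 10: less than cumulative frequency
        running += row["f"]
        row["lcf"] = running
    running = 0
    for row in reversed(rows):         # Step 10: more than cumulative frequency
        running += row["f"]
        row["mcf"] = running
    for row in rows:                   # Step 11: relative frequency
        row["rf"] = row["f"] / n
    return rows

# Hypothetical whole-number observations (illustrative only).
scores = [6, 9, 12, 14, 15, 18, 19, 21, 22, 25,
          27, 28, 30, 31, 33, 35, 36, 38, 39, 40]
table = grouped_frequency(scores, k=5, W=7)
for row in table:
    print(row)
```

Because W was rounded up, the k classes together cover every observation, so the frequencies add up to n and the relative frequencies add up to 1.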
Example

Example 2.3: Consider the following set of data and
construct the frequency distribution.
Solution
Diagrammatic Presentation of the Data

• We have discussed the techniques of classification and tabulation
that help us organize the collected data in a meaningful fashion.
• However, this way of presenting statistical data does not always
prove to be interesting to a layman.
• One of the most effective and interesting alternative ways in which
statistical data may be presented is through diagrams and graphs.
• The three most commonly used diagrammatic presentations for
discrete as well as qualitative data are:
• Pie charts
• Pictogram
• Bar charts
Pie Chart

• A pie chart can be used to compare the relation between
the whole and its components.
• A pie chart is a circular diagram in which the area of each
sector of the circle represents a component.
• To construct a pie chart (sector diagram), we draw
a circle with radius (square root of the total).
• The total angle of the circle is 360°.
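Since the whole circle corresponds to the total, the angle of each sector is the component's share of the total multiplied by 360 degrees. A minimal sketch, assuming a hypothetical budget (the amounts below are made up for illustration):

```python
# Hypothetical monthly budget by item, in any currency unit (illustrative only).
budget = {
    "Food": 600,
    "Clothing": 100,
    "House rent": 400,
    "Fuel and light": 100,
    "Miscellaneous": 300,
}

total = sum(budget.values())
# Sector angle = (component / total) * 360 degrees.
angles = {item: 360 * amount / total for item, amount in budget.items()}
# Percentage share = (component / total) * 100.
percents = {item: 100 * amount / total for item, amount in budget.items()}

for item in budget:
    print(f"{item:<15}{percents[item]:>6.1f}%{angles[item]:>8.1f} degrees")
```

The sector angles should add up to 360 degrees, which is a useful check before drawing the chart.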
Example
Example 2.6: The following table gives the details of the
monthly budget of a family. Represent these figures by a
suitable diagram.

[Figure: pie chart titled "Monthly budget of family", with sectors labeled food 40%, house rent 27%, miscellaneous 20%, clothing 7%, and fuel and light 7%.]
Bar Charts

• Bar charts (simple bar chart, multiple bar charts)
use vertical or horizontal bars to represent the
frequencies of a distribution.
• A simple bar chart is used to represent data involving
only one variable classified on a spatial, quantitative or
temporal basis.
• In a simple bar chart, we make bars of equal width but
variable length, i.e. the magnitude of a quantity is
represented by the height or length of the bars.
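The idea of equal-width bars whose length is proportional to the magnitude can be illustrated with a plain-text chart. This is only a sketch: the yearly figures are hypothetical and the function name bar_chart_lines is made up for the illustration.

```python
# Hypothetical yearly profits (illustrative only).
profits = {2019: 10, 2020: 12, 2021: 18, 2022: 25, 2023: 42}

def bar_chart_lines(data, max_length=40):
    """Return one text bar per category; bar length is proportional to the value."""
    largest = max(data.values())
    lines = []
    for label, value in data.items():
        length = round(value / largest * max_length)   # scale to the longest bar
        lines.append(f"{label} | {'#' * length} {value}")
    return lines

lines = bar_chart_lines(profits)
for line in lines:
    print(line)
```

Every bar occupies one text row (equal width), and only the number of # characters (the length) changes with the value.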
Example
• Draw a simple bar diagram to represent the profits of a bank for 5 years.
Multiple Bars
• When two or more interrelated series of data are depicted by a
bar diagram, the diagram is known as a multiple-bar
diagram.
• Suppose we have export and import figures for a few years.
• We can display them by two bars close to each other, one for exports
and the other for imports.
• Multiple-bar diagrams are suitable where some comparison is involved.
Graphical Presentation of
Data
• We often use graphical presentation for continuous data,
such as the results of a grouped
frequency distribution and continuous variables
distributed over time.
A. Histogram
B. Frequency Polygon
C. Ogive Graph
Procedures for constructing statistical graphs:

• Draw and label the X and Y axes.
• Choose a suitable scale for the frequencies or
cumulative frequencies and label it on the Y axis.
• Represent the class boundaries for the histogram or
ogive, or the midpoints for the frequency polygon, on
the X axis.
• Plot the points.
• Draw the bars or lines to connect the points.
Histogram

• A histogram is a special type of bar graph in which the
horizontal scale represents classes of data values
and the vertical scale represents frequencies.
• The heights of the bars correspond to the frequency
values, and the bars are drawn adjacent to each other (without
gaps).
• We can construct a histogram after we have first
completed a frequency distribution table for a data set.
Example
• Example 2.9: The histogram for the data in example 2.4
is shown in the figure.

[Figure: histogram. Horizontal axis: class boundaries (5.5, 11.5, 17.5, 23.5, 29.5, 35.5, 41.5). Vertical axis: frequency (0.0 to 7.0).]
Frequency Polygon

• A frequency polygon uses line segments connected
to points located directly above the class midpoint
values.
• The heights of the points correspond to the class
frequencies, and the line segments are extended to
the left and right so that the graph begins and ends
on the horizontal axis, at the points where
the previous and next midpoints would be located.
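The points of a frequency polygon, including the two extra points on the horizontal axis, can be generated as below. This is a minimal sketch: the midpoints and frequencies are hypothetical and assume equal class widths.

```python
# Hypothetical class midpoints and frequencies for classes of equal width (illustrative only).
midpoints = [8.5, 14.5, 20.5, 26.5, 32.5, 38.5]
frequencies = [2, 3, 5, 6, 4, 1]
width = midpoints[1] - midpoints[0]

# Add a zero-frequency point one class width before the first midpoint and one
# after the last, so the polygon begins and ends on the horizontal axis.
xs = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]
ys = [0] + frequencies + [0]
points = list(zip(xs, ys))

for x, y in points:
    print(x, y)
```

Joining consecutive points with straight line segments gives the polygon.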
Example

[Figure: frequency polygon. Horizontal axis: midpoints (2.5, 8.5, 14.5, 20.5, 26.5, 32.5, 38.5, 44.5). Vertical axis: "Com. frequency" (2.0 to 7.0).]
Ogive Graph
• An ogive is a line that depicts cumulative frequencies.
• Note that the ogive uses class boundaries along the horizontal
scale, and the graph begins with the lower boundary of the first
class and ends with the upper boundary of the last class.
• There are two types of ogive, namely the less than ogive and
the more than ogive.
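The points for both kinds of ogive can be derived from the class boundaries and frequencies. The sketch below uses hypothetical values and assumes the usual plotting convention: the less than ogive starts at zero at the first lower boundary, and the more than ogive ends at zero at the last upper boundary.

```python
from itertools import accumulate

# Hypothetical class boundaries and frequencies (illustrative only).
boundaries = [5.5, 11.5, 17.5, 23.5, 29.5, 35.5, 41.5]   # 6 classes give 7 boundaries
frequencies = [2, 3, 5, 6, 4, 1]

lcf = list(accumulate(frequencies))               # less than cumulative frequencies
mcf = list(accumulate(frequencies[::-1]))[::-1]   # more than cumulative frequencies

# Less than ogive: 0 at the first lower boundary, then LCF at each upper boundary.
less_than = list(zip(boundaries, [0] + lcf))
# More than ogive: MCF at each lower boundary, then 0 at the last upper boundary.
more_than = list(zip(boundaries, mcf + [0]))

print(less_than)
print(more_than)
```

The less than ogive rises from 0 to the total number of observations, while the more than ogive falls from the total to 0.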
End

Thank you!!!
Quiz (5%)
The investigator was interested in studying the marital status, which is
often grouped as Single(S), Married (M), Divorced (D), and Widowed
(W) of people in a certain town. The following data were obtained.
DSDDSWSDSSDDWMMSDDDWMSSWMDDMD
WDSSWDDSDSMWMDSDWDMSSDWWSSSWS
DMWSS
A. To which scale of measurement do these data belong?
B. Summarize the data by constructing the appropriate frequency
distribution
C. Present the data using the appropriate Graph/Diagrams.
