Data Collection and Presentation
Q + 59 + 37 = 102 giving Q = 6
ii) x = 76 + 12 + 14 = 102
y = 12 + 59 + 6 = 77
z = 37 + 14 + 6 = 57
CHAPTER TWO
Specific objectives
At the end of this topic the trainee should be able to:
❖ Discuss the basic considerations for data collection.
❖ Classify collected data into various categories.
❖ Tabulate collected data.
❖ Diagrammatically and graphically present data.
Introduction
a) Statistics
Statistics, viewed as a subject, is the process of collecting, tabulating and
analyzing numerical data from which significant conclusions are drawn.
Statistics may also be defined as numerical data which have been collected
from a given source and for a particular purpose, e.g. population statistics
from the Ministry of Planning or agricultural statistics from the Ministry of
Agriculture.
Statistics may also refer to values which have been obtained from
statistical calculations, e.g. the mean, mode, range, etc.
b) Application of statistics
1. Quality Control
There is usually a quality control department in every industry, which is
charged with the responsibility of ensuring that the products made meet
the customers' standards. For example, the Kenya Bureau of Standards (KEBS)
is one of the national institutions which, on behalf of the government,
inspects various products to ensure that they meet the customers'
specifications. KEBS, together with other quality control departments, has
developed quality control charts. These charts are used to check whether
products are up to standard or not.
3. Forecasting
Statistics is very important to business managers when predicting the
future of a business. For example, if a given business situation involves a
dependent variable and an independent variable, one can develop an equation
which can be used to predict the output under certain given conditions.
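As a minimal sketch of this idea, a least-squares line y = a + bx can be fitted to past observations and then used to predict the output for a new value of the independent variable. The figures below (advertising spend and sales) and the helper name fit_line are made up purely for illustration; they do not come from any real business or from this text.

```python
# Hypothetical past observations (illustrative figures only).
advertising = [1.0, 2.0, 3.0, 4.0, 5.0]   # independent variable x
sales = [3.0, 5.0, 7.0, 9.0, 11.0]        # dependent variable y


def fit_line(xs, ys):
    """Return (a, b) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


a, b = fit_line(advertising, sales)
forecast = a + b * 6.0  # predicted sales when advertising = 6
print(f"y = {a:.2f} + {b:.2f}x, forecast at x = 6: {forecast:.2f}")
```

With these made-up figures the fitted equation is y = 1 + 2x, so the forecast at x = 6 is 13.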
Statistics may be used to make efficient use of human resources. For
example, questionnaires may be given to workers to find out where the
management is weak.
Likewise, by compiling statistics on employees who resign, it may be found
useful to analyze such data to establish the causes of resignation, i.e.
whether it is due to frustration or by choice.
Introduction
A statistical investigation involves a number of stages:
❖ Definition of the problem or issue;
❖ Collection of relevant data;
❖ Classification and analysis of the collected data;
❖ Presentation of the results.
Even before the collection of data starts, then, there are some important
points to consider when planning a statistical investigation.
Preliminary considerations.
It is important to be aware of these issues as they impact on the data which
is to be collected and analyzed.
❖ Exact definition of the problem
This is necessary in order to ensure that nothing important is omitted
from the enquiry, and effort is not wasted by collecting irrelevant
data. The problem as originally put to the statistician is often of a
very general type and it needs to be specified precisely before work
can begin.
instrument will determine the accuracy of the results. The degree of
precision required in an estimate might affect the amount of data we
need to collect. In general, the more precisely we wish to estimate a
value, the more readings we need to take.
Primary data
Primary data are measurements observed and recorded as part of an
original study. The basic methods of obtaining primary data are:
• Questionnaires
• Observation
• Interviews
• Sampling
Primary data is data which is both original and has been obtained in order
to solve the specific problem in hand. Primary data is, therefore, raw data
and has to be classified and processed using appropriate statistical methods
in order to reach a solution to the problem.
Secondary sources
Secondary data can be obtained from journals, reports, government
publications, and publications of research organizations and trade and
professional bodies.
Secondary data must be used with the utmost care. Before using secondary
data the investigator should examine the following:
1. Whether the data are suitable for the purpose of investigation.
2. Whether the data are adequate for the purpose of investigation.
3. Whether the data are reliable.
Secondary data is any data other than primary data. Thus, it includes any
data which has been subject to the processes of classification or tabulation
or which has resulted from the application of statistical methods to primary
data, and all published statistics.
Internal data
Internal data refers to measurements that are the by-products of routine
business record keeping in areas such as accounting, finance, production,
personnel, quality control, sales and R&D.
Since internal data originate within the business, collecting the desired
information does not usually offer much difficulty. The particular procedure
depends largely upon the nature of the facts being collected and the form
in which they exist.
a. Questionnaire
As the name suggests, this method is distinguished by the fact that data are
collected by asking questions of people who are thought to have the desired
information.
A formal list of such questions is called a questionnaire.
A questionnaire is thus a device for securing answers to questions by using
a form which the respondent fills in.
b. Observation
The investigator observes the object or action in which he is interested.
Sometimes an individual makes the observation; on other occasions a
mechanical device observes and records the desired information.
The observation method does not automatically produce accurate data.
Physical difficulties in the observation situation or on the part of the
observer may result in errors.
Classification of Data
Classification is the grouping of related facts into different classes. Facts
in one class differ from those in another class with respect to some
characteristic, called a basis of classification.
Sorting facts on one basis of classification and then on another basis is
called cross-classification.
Rules of Classification
1. The number of classes should preferably be between 5 and 15.
2. As far as possible, one should avoid odd values of class intervals.
3. The starting point, i.e. the lower limit of the first class, should be
either zero, 5 or a multiple of 5.
4. To ensure continuity and to get correct class intervals, the exclusive
method of classification should be adopted.
5. Whenever possible, all classes should be of the same size.
Types of classification
❖ Geographical
❖ Chronological
❖ Qualitative
❖ Quantitative
Classification functions.
Classification of data is a function very similar to the task of sorting
letters in a post office. The letters collected in a post office are
sorted on a geographical basis, i.e. in accordance with their
destination, such as Nairobi, Mombasa, Kampala, etc. They are then put in
separate bags, each containing letters with a common characteristic, namely
the same destination.
Classification of statistical data is comparable to this sorting operation.
The process of classification gives prominence to the important information
gathered while dropping unnecessary details, facilitates comparison and
enables statistical treatment of the material collected.
the lowest to the highest. Then put a bar (vertical line) opposite the
particular value to which it relates.
To facilitate counting, blocks of five bars are prepared and some space is
left between each block.
Finally, we count the number of bars corresponding to each value of the
variable and enter it in the frequency column.
Example
Construct a frequency distribution from the following data.
23, 30, 20, 26, 30, 30, 20, 23, 40, 40, 26, 20, 23, 40, 28, 26, 30, 40, 28, 28, 30.
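The tallying procedure for this example can be sketched in a few lines of Python. This is only one possible way to do it: the helper name tally and the printed layout are choices made for this illustration, not part of the original method.

```python
from collections import Counter

# The 21 observations given in the example.
data = [23, 30, 20, 26, 30, 30, 20, 23, 40, 40, 26,
        20, 23, 40, 28, 26, 30, 40, 28, 28, 30]

# Count how many times each value occurs.
frequency = Counter(data)


def tally(count):
    """Return tally bars in blocks of five, with a space between blocks."""
    blocks, rest = divmod(count, 5)
    parts = ["|||||"] * blocks
    if rest:
        parts.append("|" * rest)
    return " ".join(parts)


# Values are listed from the lowest to the highest.
print(f"{'Value':<8}{'Tally':<10}{'Frequency'}")
for value in sorted(frequency):
    print(f"{value:<8}{tally(frequency[value]):<10}{frequency[value]}")
print(f"{'Total':<18}{sum(frequency.values())}")
```

The values 20, 23, 26 and 28 each occur 3 times, 30 occurs 5 times and 40 occurs 4 times, giving a total frequency of 21.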
(b) Class interval. It is the span of a class, i.e. the difference between
the upper limit and the lower limit. For example, in the class 20-40 the
class interval is 20 (i.e. 40 - 20). The size of the class interval is
determined by the number of classes and the total range of the data.
(c) Class frequency. This is the number of observations corresponding to
a particular class. It is also known as the frequency of that class. If we
add together the frequencies of all the individual classes, we obtain the
total frequency.
(d) Class mid-point. It is the value lying halfway between the lower and
the upper class limits of a class interval.
The mid-point of a class is ascertained as follows:
Mid-point of a class = (upper limit of the class + lower limit of the class) / 2
There are two methods of classifying data according to class
interval, namely:
- Exclusive method.
- Inclusive method.
(a) Exclusive method.
When the class intervals are so fixed that the upper limit of one class is
the lower limit of the next class, it is known as the exclusive method of
classification. For example, with the classes 20-25 and 25-30, a value of
exactly 25 is placed in the class 25-30, because the upper limit of a class
is excluded from that class.
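A minimal sketch of the exclusive method, applied to the 21 observations from the frequency distribution example earlier in this chapter. The starting point of 20, the class width of 5 and the function name exclusive_classes are assumptions made for this illustration, chosen to follow the rules of classification given above.

```python
# Observations from the frequency distribution example.
data = [23, 30, 20, 26, 30, 30, 20, 23, 40, 40, 26,
        20, 23, 40, 28, 26, 30, 40, 28, 28, 30]


def exclusive_classes(values, start, width, n_classes):
    """Count values per class; a class includes its lower limit
    and excludes its upper limit."""
    counts = [0] * n_classes
    for v in values:
        index = int((v - start) // width)
        # Values outside the chosen classes are ignored in this sketch.
        if 0 <= index < n_classes:
            counts[index] += 1
    return [(start + i * width, start + (i + 1) * width, counts[i])
            for i in range(n_classes)]


table = exclusive_classes(data, start=20, width=5, n_classes=5)
for lower, upper, count in table:
    print(f"{lower}-{upper}: {count}")
```

Under these assumptions the class frequencies are 6, 6, 5, 0 and 4 for the classes 20-25, 25-30, 30-35, 35-40 and 40-45. Note that the value 30 falls in the class 30-35 and the value 40 falls in the class 40-45.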
DATA TABULATION.
- A table is a systematic arrangement of statistical data in columns and
rows.
- Rows are horizontal arrangements whereas columns are vertical
ones.
- The purpose of a table is to simplify the presentation and to
facilitate comparisons.
- The simplification results from the clear-cut and systematic
arrangement, which enables the reader to quickly locate desired
information.
Parts of a table.
- The various parts of a table may vary from case to case depending
upon the given data, but a good table must contain at least the
following parts:
(a) Table number
(b) Title of the table
(c) Caption
(d) Stub
(e) Body of the table
(f) Headnote
(g) Footnote
(f) Headnote. It is a brief explanatory statement applying to all or a
major part of the material in the table. It is placed below the title,
centred and enclosed in brackets.
(g) Footnote. Footnotes are placed directly below the body of the table.
Types of diagrams.
- One dimensional diagrams, e.g. bar diagrams.
- Two dimensional diagrams, e.g. squares.
- Pictograms and cartograms.
- Bars may be either horizontal or vertical. Vertical bars should be
preferred because they give a better look and also facilitate
comparison.
- While constructing a bar diagram it is desirable to write the
respective figure at the end of each bar so that the reader can know
the precise value without looking at the scale.
When such diagrams are prepared, the length of each bar is kept equal to
100 and segments are cut in these bars to represent the components of an
aggregate.
Illustration.
Using the data below , construct the following charts.
(a) Simple bar chart.
(b) Component bar chart.
(c) Multiple bar charts.
(d) Percentage component bar chart.
Solution
(a) Component bar chart
(b) Simple bar chart
TWO DIMENSIONAL DIAGRAMS.
As distinguished from one dimensional diagrams, in which only the length
of the bars is taken into account, in two dimensional diagrams the length
as well as the width of the bars is considered.
Thus the area of the bar represents the given data. Two dimensional
diagrams are also known as surface diagrams. Examples of these are:
- Rectangles
- Squares
- Circles
Angle of the sector for a component = (x / Total) × 360°,
where x is the value of the component.
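A minimal sketch of the sector calculation for a circle (pie) diagram. The three components A, B and C and their values are hypothetical, chosen only to illustrate the arithmetic; they are not data from this chapter.

```python
# Hypothetical component values (illustrative only, not from the text).
components = {"A": 50, "B": 30, "C": 20}

total = sum(components.values())

# Each component's share of the total, scaled to the 360 degrees of a circle.
angles = {name: value / total * 360 for name, value in components.items()}

for name, angle in angles.items():
    print(f"{name}: {angle:.1f} degrees")
```

With these made-up values the sectors measure 180, 108 and 72 degrees, which together make up the full 360 degrees.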
Pie chart of the above.
Tabulation of Data
One of the simplest and most revealing devices for summarizing data and
presenting them in a meaningful fashion is the statistical table.
Types of tabulation
Tabulation may be classified as:
1. Simple tabulation
2. Complex or matrix tabulation
In simple tabulation only one characteristic is shown, hence this type of
table is also known as a one-way table. This can be illustrated as follows:
Table 1: Distribution of workers by workshop
In complex tabulation two or more characteristics are shown, i.e. two or
more aspects of a problem are dealt with at the same time. Such tables are
sometimes called two-way tables. They show two characteristics and are
formed when either the stub or the caption is divided into two coordinate
parts.
Presentation of data
After the data have been collected, the next step is to present them in a
suitable form.
Charting data
One of the most convincing and appealing ways in which data may be
presented is through charts.
A chart can take the shape of either a diagram or a graph.
Types of diagrams
❖ One dimensional diagrams, e.g. bar graphs
❖ Two dimensional diagrams, e.g. rectangles, squares
❖ Pictograms and cartograms (circles)
One dimensional diagram
The bar diagram is the most common type of diagram used in practice.
Pictograms
Pictograms, also known as picture grams, are very popular for presenting
statistical data. They are not abstract presentations such as lines or bars.
When pictograms are used, data are represented through a pictorial symbol
that is carefully selected.
Illustration
The following table gives the production of tea in India by a leading
company.
Solution
To represent the above data by a pictogram we will use the symbol of a
star (shown below as ◊).
Pictogram
Year    Production of tea
2004    ◊◊◊◊
2005    ◊◊◊◊◊◊
2006    ◊◊◊◊◊◊
2007    ◊◊◊◊◊◊◊
2008    ◊◊◊◊◊◊◊
Merits of diagrams
❖ Pictograms have a greater attraction thus stimulate interest in the
information being presented
❖ Facts portrayed in pictorial form are generally remembered longer.
Limitations
❖ They are difficult to construct
Graphs
A large variety of graphs are used in practice. Graphs can be divided under
the following headings
❖ Time series.
❖ Z- Charts.
❖ Scatter graphs
❖ Semi- logarithmic graphs
❖ Graphs of frequency distribution
Histogram
A histogram is a set of vertical bars whose areas are proportional to the
frequencies represented. The histogram is the most widely used graphical
presentation of a frequency distribution.
Illustration
Represent the following data by a histogram
Marks     Number of students
0-10      8
10-20     12
20-30     22
30-40     35
40-50     40
Solution
Draw a histogram
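Since the drawing itself depends on the tools available, the sketch below only computes the bar heights for the marks data and prints a rough text version of the histogram. It assumes that the height of each bar is the frequency divided by the class width, so that the area of the bar equals the frequency; the function name bar_heights is a label chosen for this illustration.

```python
# (lower limit, upper limit, frequency) for each class in the illustration.
classes = [(0, 10, 8), (10, 20, 12), (20, 30, 22), (30, 40, 35), (40, 50, 40)]


def bar_heights(rows):
    """Height of each bar = frequency / class width, so area = frequency."""
    return [freq / (upper - lower) for lower, upper, freq in rows]


heights = bar_heights(classes)

# Rough text histogram: one '#' per student. With equal class widths the
# bar lengths are proportional to the frequencies.
for lower, upper, freq in classes:
    label = f"{lower}-{upper}"
    print(f"{label:<6} {'#' * freq} ({freq})")
```

Because all five classes here have the same width of 10, the heights are simply proportional to the frequencies. Dividing by the class width only changes the picture when the class intervals are unequal.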
Frequency polygon
A frequency polygon is a graph of a frequency distribution. It has more than
four sides.
It is particularly effective in comparing two or more frequency distributions.
Illustration
The daily profits (in thousand rupees) of 100 shops are distributed as
follows:
Daily profit    No. of shops
0-50            12
50-100          18
100-150         21
150-200         20
200-250         17
Solution
The frequency polygon of the above data is shown below.
Draw graph
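The points to be joined in the frequency polygon can be worked out as in the sketch below, using only the five classes listed in the table. It plots each frequency against the class mid-point and, following a common convention that is assumed here rather than taken from the text, closes the polygon with a zero-frequency class at each end. The function name polygon_points is a label chosen for this illustration.

```python
# (lower limit, upper limit, number of shops) for each listed class.
classes = [(0, 50, 12), (50, 100, 18), (100, 150, 21),
           (150, 200, 20), (200, 250, 17)]


def polygon_points(rows):
    """Return (mid-point, frequency) pairs, closed with a zero-frequency
    point one class width before the first and after the last mid-point."""
    width = rows[0][1] - rows[0][0]  # assumes all classes have equal width
    points = [((lower + upper) / 2, freq) for lower, upper, freq in rows]
    first_mid = points[0][0]
    last_mid = points[-1][0]
    return [(first_mid - width, 0)] + points + [(last_mid + width, 0)]


points = polygon_points(classes)
for mid, freq in points:
    print(f"mid-point {mid:>6.1f}  frequency {freq}")
```

Joining these points in order with straight lines gives the polygon. The mid-points of the listed classes are 25, 75, 125, 175 and 225.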
Smoothed frequency curve
The following points should be kept in mind while smoothing a frequency
graph:
❖ Only frequency distributions based on samples should be smoothed.
❖ Only continuous series should be smoothed.
❖ The total area under the curve should be equal to the area under the
original histogram or polygon.
Illustration
Draw a histogram, frequency polygon and frequency curve representing the
following information.
Length of service (in years)    Number of employees
5-10                            5
10-15                           12
15-20                           25
20-25                           48
25-30                           32
30-35                           6
Histogram, frequency polygon and curve
Draw
Cumulative frequency curve / ogives
Sometimes one needs to know the answers to questions like "How many
workers of a factory earn more than Rs 1500 per month?"
There are two methods of constructing a cumulative frequency curve,
namely:
❖ The less than method
❖ The more than method
Less than method: in the less than method we start with the upper limits of
the classes and go on adding the frequencies. When these frequencies are
plotted we get a rising curve.
More than method: in the more than method we start with the lower limits
of the classes and from the total frequency we subtract the frequency of
each class in turn. When these frequencies are plotted we get a declining
curve.
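Both methods can be sketched in Python using the length of service data from the earlier illustration. The variable names are choices made for this example. The less than values are paired with the upper class limits and the more than values with the lower class limits.

```python
from itertools import accumulate

# (lower limit, upper limit, number of employees) from the earlier illustration.
classes = [(5, 10, 5), (10, 15, 12), (15, 20, 25),
           (20, 25, 48), (25, 30, 32), (30, 35, 6)]

frequencies = [freq for _, _, freq in classes]
total = sum(frequencies)

# Less than method: running total of frequencies against each upper limit.
upper_limits = [upper for _, upper, _ in classes]
less_than = list(zip(upper_limits, accumulate(frequencies)))

# More than method: total frequency minus the frequencies of all classes
# lying below each lower limit.
lower_limits = [lower for lower, _, _ in classes]
below = accumulate([0] + frequencies[:-1])
more_than = [(lower, total - b) for lower, b in zip(lower_limits, below)]

for limit, cum in less_than:
    print(f"Less than {limit}: {cum}")
for limit, cum in more_than:
    print(f"More than {limit}: {cum}")
```

The less than cumulative frequencies rise from 5 to 128 (5, 17, 42, 90, 122, 128), while the more than cumulative frequencies fall from 128 to 6 (128, 123, 111, 86, 38, 6).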
Cumulative frequencies given by the less than method:
Yearly profit (Rs)    Frequency
Less than 850         21
Less than 900         50
Less than 950         69
Less than 1000        108
Less than 1050        151
Less than 1100        245
Less than 1150        318
Less than 1200        422
Less than 1250        467
Less than 1300        494
Less than 1350        542
Less than 1400        563
Less than 1450        575
Less than 1500        580
Cumulative frequencies given by the more than method:
Yearly profit (Rs)    Frequency
More than 800         580
More than 850         559
More than 900         530
More than 950         511
More than 1000        472
More than 1050        429
More than 1100        335
More than 1150        262
More than 1200        158
More than 1250        113
More than 1300        86
More than 1350        38
More than 1400        17
More than 1450        5
More than 1500        0