Data-Collection
Objectives: At the end of this chapter, the student would be able to:
3. Organize and present the gathered data using appropriate tables and graphs.
Collection of Data refers to the process of gathering numerical information through methods such as
interviews, questionnaires, experiments, observation, and documentary analysis. Data should be properly
collected so that an investigator may be able to answer the questions under consideration with a reasonable
degree of confidence. Data are also collections of any number of related observations on one or
more variables. They include statistical facts, historical facts, principles, opinions, and items from various
sources, such as scores, ages, I.Q., income, intelligence test scores, aptitude tests, personality trait ratings,
and others.
Data are the facts and figures that are collected, analyzed, and summarized for presentation and
interpretation. Data may be classified as either quantitative or qualitative. Quantitative data measure
either how much or how many of something, and Qualitative data provide labels, or names, for
categories of like items. For example, suppose that a particular study is interested in characteristics such
as age, gender, marital status, and annual income for a sample of 100 individuals. These characteristics
would be called the variables of the study, and data values for each of the variables would be associated
with each individual.
Sample survey methods are used to collect data from observational studies, and experimental design
methods are used to collect data from experimental studies. The area of descriptive statistics is
concerned primarily with methods of presenting and interpreting data using graphs, tables, and
numerical summaries. Whenever statisticians use data from a sample (i.e., a subset of the population)
to make statements about a population, they are performing statistical inference. Estimation and
hypothesis testing are procedures used to make statistical inferences.
Fields such as health care, biology, chemistry, physics, education, engineering, business, and economics
make extensive use of statistical inference. Methods of probability were developed initially for the
analysis of gambling games. Probability plays a key role in statistical inference; it is used to provide
measures of the quality and precision of the inferences. Many of the methods of statistical inference are
described in this article. Some of these methods are used primarily for single-variable studies, while
others, such as regression and correlation analysis, are used to make inferences about relationships
among two or more variables.
1. Interview method
This is done through personal communication with the individual you want to interview.
2. Questionnaire method
This is done by sending questionnaires to the persons from whom you would like to get information.
3. Registration method
This is done utilizing existing records.
4. Observation method
5. Experiment method
1. Raw Data (Ungrouped Data) are collected data that have not been organized numerically. An array is an
arrangement of raw data in ascending or descending order of magnitude.
2. Categorical Data are observations that are put in the same or different classes, the classes possessing
qualitative differences.
3. Ranked Data are observations that show their relative position based on some characteristic, without
necessarily yielding a numerical value for that characteristic.
4. Quantitative Data are numerical data, such as commodity stocks, prices, costs, and profits, which are
analyzed in relation to consumption, supply, and demand.
5. Discrete Data consist of either a finite number of values or a countable number of values. They are
characterized by gaps in which no real values may be obtained, and they are made up of items whose
values have been obtained by counting. Examples: number of books, school enrollment, etc.
6. Continuous Data arise from the measurement of a continuous variable. Examples: weights of children,
school achievement, I.Q., heights of children.
One way of presenting data is in textual form, in which the data are incorporated in the paragraphs of
the discussion. Many people cannot easily understand or comprehend data set in tabular form unless a
preliminary explanation is made.
Another presentation of data is in tabular form, a way of classifying related numerical facts in horizontal
arrays (rows) and vertical arrays (columns). Tabulation is the process of condensing classified data and
arranging them in a table, so that the data can be understood more readily and comparisons can be made
more easily. The most commonly
used tabular summary of data for a single variable is a frequency distribution. A frequency distribution
shows the number of data values in each of several non-overlapping classes. Another tabular summary,
called a relative frequency distribution, shows the fraction, or percentage, of data values in each class.
The most common tabular summary of data for two variables is a cross tabulation, a two-variable
analogue of a frequency distribution.
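As a rough illustration of the two tabular summaries described above, the following Python sketch counts the data values in each class and then converts the counts into fractions. The marital-status responses and the function names are hypothetical and serve only as an example.

```python
from collections import Counter

# Hypothetical marital-status responses for 10 individuals (illustrative data only).
responses = ["single", "married", "married", "single", "widowed",
             "married", "single", "married", "divorced", "married"]

def frequency_distribution(values):
    """Count how many data values fall in each class."""
    return dict(Counter(values))

def relative_frequency_distribution(values):
    """Fraction of data values in each class; the fractions sum to 1."""
    n = len(values)
    return {label: count / n for label, count in Counter(values).items()}

freq = frequency_distribution(responses)
rel_freq = relative_frequency_distribution(responses)

# Print class label, frequency, and relative frequency as a percentage.
for label in freq:
    print(f"{label:<10}{freq[label]:>3}{rel_freq[label]:>8.0%}")
```

Because the classes are non-overlapping, every response is counted exactly once, so the frequencies add up to the number of responses.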
Constructing a frequency distribution for a quantitative variable requires more care in defining the
classes and the division points between adjacent classes. A frequency distribution would show the
number of data values in each of these classes, and a relative frequency distribution would show the
fraction of data values in each.
A cross tabulation is a two-way table with the rows of the table representing the classes of one variable
and the columns of the table representing the classes of another variable. To construct a cross tabulation
using the variables gender and age, gender could be shown with two rows, male and female, and age
could be shown with six columns corresponding to the age classes 20-29, 30-39, 40-49, 50-59, 60-69, and
70-79.
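The gender-by-age cross tabulation described above could be built along the following lines. This is only a sketch: the sample records are invented, and the helper names age_class and cross_tabulate are assumptions made for the example.

```python
from collections import Counter

AGE_CLASSES = ["20-29", "30-39", "40-49", "50-59", "60-69", "70-79"]
GENDERS = ["male", "female"]

def age_class(age):
    """Map an age in years to its class label, e.g. 34 -> '30-39'."""
    if not 20 <= age <= 79:
        raise ValueError(f"age {age} is outside the classes 20-29 to 70-79")
    lower = (age // 10) * 10
    return f"{lower}-{lower + 9}"

def cross_tabulate(records):
    """Return {gender: [count per age class]} for (gender, age) records."""
    counts = Counter((gender, age_class(age)) for gender, age in records)
    return {g: [counts[(g, c)] for c in AGE_CLASSES] for g in GENDERS}

# Hypothetical sample (illustrative data only).
sample = [("male", 23), ("female", 34), ("female", 38), ("male", 45),
          ("male", 61), ("female", 72), ("female", 29), ("male", 36)]

table = cross_tabulate(sample)

# Two rows (gender) by six columns (age classes).
print(f"{'':<8}" + "".join(f"{c:>7}" for c in AGE_CLASSES))
for gender, row in table.items():
    print(f"{gender:<8}" + "".join(f"{n:>7}" for n in row))
```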
What are the Parts of Statistical Table?
1. Table heading
2. Stub
3. Box head
4. Body
[Figure: layout of a statistical table, with labels for the table number, title, master caption, stub, and row captions]
Analyzing rows and columns. This simple example began with a discussion of the row-points in the table
shown above. However, one may rather be interested in the column totals, in which case one could plot
the column points in a small-dimensional space, which satisfactorily reproduces the similarity (and
distances) between the relative frequencies for the columns, across the rows, in the table shown above. In
fact it is customary to simultaneously plot the column points and the row points in a single graph, to
summarize the information contained in a two-way table.
Another method of presenting data is by using graphs or charts, the most commonly used being
line diagrams, bar charts, pie diagrams, pictorial graphs, and statistical maps. A graph turns data or the
results of statistical analysis into a diagram that is easily understood at a glance. To appeal to a
person’s sense of sight, as much data as possible is conveyed in a condensed, quick, and accurate
manner by putting the data in diagrammatic form. Graphs are also commonly used in quality control
activities.
A number of graphical methods are available for describing data. A bar graph is a graphical device for
depicting qualitative data that have been summarized in a frequency distribution. Labels for the
categories of the qualitative variable are shown on the horizontal axis of the graph. A bar above each
label is constructed such that the height of each bar is proportional to the number of data values in the
category.
a. Be clear on purpose of drawing the graph. When drawing the graph, the most
important thing is to be clear about the purpose of drawing it. Then parallel with the
purpose, collect the information and the data.
b. Arrange the data and information into graph form. It is difficult to hold the interest
of or to convince anyone with raw information and data. It is therefore essential to
process the data, for example by taking averages, so that comparisons can be made.
c. Select the graph. Examine each graph’s advantages and disadvantages: match these
to the purpose of usage before deciding on the graph to be used.
d. Decide on the graph title. Reaffirm the purpose of drawing the graph as in step “a”,
then, bearing that purpose in mind, decide on a graph title that will: 1) be concise; 2) convey
the facts at a glance in an easily understandable manner; 3) hold the interest of the
person; 4) attract the attention of the person. Add a subtitle if the main title is
insufficient to explain the content of the graph.
e. Decide the composition and color shades. In attempting to draw a good graph, do
not overdo it by placing too much emphasis on composition and color shades, or the
graph could fail in its purpose.
f. Draw a draft for the graph. Using free hand, try to draw a draft for the graph.
Examine the graph size, scale, units and the total balance of the completed graph.
g. Draw the graph. When preparations on steps (a-f) are completed, you are ready for
the actual drawing. Bear these points in mind as you proceed.
1. Be definite on the base line (the line where the scale is zero).
2. When drawing the scale lines, they must be lighter than the base line.
3. In a graph where there are different units for entry, use double scales.
4. In a graph where there are many lines, bars or sectors, put in an index to identify
each one. Make the main lines bolder or create the differences by changing the
colors.
5. The numeric values of the scales for the X and Y axes should be such that the
coordinate values in the graphs are easily understandable.
8. When there are many lines, bars or sections, rank them in the order of
importance.
9. It is a must to put in the explanation for scale units, scale numbers and the index.
The Histogram
A histogram is the most common graphical presentation of quantitative data that have been
summarized in a frequency distribution. The values of the quantitative variable are shown on the
horizontal axis. A rectangle is drawn above each class such that the base of the rectangle is equal to the
width of the class interval and its height is proportional to the number of data values in the class.
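To make the idea of a histogram concrete, here is a minimal text-only sketch in Python. The classes and frequencies are hypothetical, and the bars are drawn horizontally (one row per class) only because that is simpler to print; in a drawn histogram the classes would sit on the horizontal axis as described above.

```python
def text_histogram(class_limits, frequencies, scale=1):
    """Return one line per class; bar length is proportional to the class frequency."""
    lines = []
    for (lower, upper), freq in zip(class_limits, frequencies):
        bar = "#" * (freq * scale)
        lines.append(f"{lower:>3}-{upper:<3} | {bar} {freq}")
    return lines

# Hypothetical classes of equal width with their frequencies (illustrative only).
limits = [(60, 69), (70, 79), (80, 89), (90, 99)]
counts = [2, 7, 5, 1]

for line in text_histogram(limits, counts):
    print(line)
```

Since all classes here have the same width, comparing bar lengths is the same as comparing class frequencies.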
a. Standard Histogram. The right and left sides of the peak are symmetrical. This is a
histogram where there is consistency in the work process.
c. Cliff-like Histogram. This shape is seen when items that fall outside a certain
specification have been removed from the total. In this chart, all items below a
certain specification value have been removed, although some of them, when their
measurements are rechecked, turn out to be within the specification.
d. High Plateau Histogram. When the differences in the average values are very small,
there will be no peaks but a flat top. When this shape is obtained, search for the
factor that differentiates the average values, and divide (stratify) the data into
separate histograms.
e. Bi-Modal Histogram. A bi-modal histogram occurs when different sets of data with
different average values are placed into one graph.
f. Isolated Island Histogram. This histogram occurs when there is a mistake in the process,
sampling, data collection method, or measurement method. The cause must then be
investigated by looking back into past daily reports and records.
Bar Graph. A graphic representation of frequencies as vertical or horizontal bars, similar to a
histogram. It is one of the most common and widely used graphical devices.
Bar Chart. The bar chart is a graph for the comparison of independent elements. Generally, the vertical
axis indicates the size of the numeric quantity (degree, number of cases, defect ratio, cost, etc.), while
the horizontal axis indicates the characteristic values (deficit items, defect cause, etc.). A bar chart is used
for the comparison of quantities, so the information should be read by comparing the proportions of the
bar lengths. Bar charts cannot indicate any change in a time series, but they can be very effective graphs
for comparing quantities at a specific time.
a. The highest value and the lowest value are easily found.
b. Where there is a small difference between the items compared, which would be
difficult to detect in numeric form, the difference is easily detected here.
d. The comparison data (previous months, year, etc.) can be collectively shown.
1. Draw the vertical axis and the horizontal axis. The vertical axis is generally
drawn on the left-hand side so that the two axes form an L shape. Draw the axes in
bold lines.
a. Put the largest value on the top of the vertical axis and move downwards in the
order.
c. To see the changes for easy understanding, do not place the base point as zero.
g. Space intervals between one bar and another bar should be in the ratio of 2:1 of
the column width.
k. The item name should be written in the center of the bar column.
3. Putting in hatching.
a. When there is no necessity to differentiate between the items, the hatching can
be similar.
b. Where there is necessity to differentiate between the items, use different types
of hatching.
c. If the bars are crowded, oblique lines placed in alternating directions may make
the bars appear bent, so be careful with these.
4. In the case of numbers with extreme range differences, use a wavy line to indicate this.
5. When comparison is done for figures with small differences, shorten the mid section
by putting a wavy line to it.
b. Are there any mistakes in writing down the numeric values of the scales?
a. Write the title larger than the scale numbers and the items.
Pie Chart (Circle Graph) can provide a fast and easy presentation of nominal data divided into a few
categories. Whether the data are population figures, sales turnover, productivity levels, budget
figures, defect totals, accident occurrence cases, etc., the area of each slice lets us intuitively grasp the
composition ratio of each category in the pie chart. A donut chart is a form of pie chart in which a
concentric circle is drawn so that the data name, statistics, etc. can be entered into the graph. The graph
resembles a donut, hence the name. A pie chart enables one to grasp at a glance the composition ratio of
each category, such as by characteristic elements.
A pie chart is another graphical device for summarizing qualitative data. The size of each slice of the pie
is proportional to the number of data values in the corresponding class.
How to Draw a Pie Chart
1. Draw a circle, placing a line from the center to the right in the horizontal position.
This will be the base line.
2. Take the angle for each item; draw the division line.
b. Starting from the item with the biggest percentage, entering the item in a
clockwise direction is the usual practice.
a. Write the item name in a horizontal position with the percentage below.
c. Should the pie be small, use the indicator lines to denote the wordings outside
the pie.
4. In a donut chart, the center of the donut will be for the statistics.
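The angle for each item in the steps above follows from its share of the total: the item's percentage of 360 degrees. The Python sketch below works this out for a set of hypothetical defect counts and lists the items from the biggest percentage down, as suggested for clockwise entry. The function name pie_angles is an assumption made for the example.

```python
def pie_angles(values):
    """Return {item: (percentage, angle in degrees)}, largest share first."""
    total = sum(values.values())
    ordered = sorted(values.items(), key=lambda item: item[1], reverse=True)
    return {name: (100 * v / total, 360 * v / total) for name, v in ordered}

# Hypothetical defect counts by cause (illustrative data only).
defects = {"stains": 15, "scratches": 45, "others": 10, "dents": 30}

for name, (percent, angle) in pie_angles(defects).items():
    print(f"{name:<10}{percent:>6.1f}%{angle:>8.1f} degrees")
```

The angles always add up to 360 degrees, which is a quick check before drawing the division lines.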
Line Graph is a graph suitable for plotting changes over time.
1. Draw the vertical axis and the horizontal axis, marking in the scales respectively.
4. For data values that are far above the rest, put a wavy line across the vertical axis
to cut it, thereby shortening the scale midway.
5. When plotting different sets of data, it is best to change the line style to
differentiate them.
2. It can be used for the control of periodic changes in quality, cost, delivery time, etc.
In concrete terms the characteristic of the achievement value to be controlled
should be plotted on the graph, and when it is compared to the standard value or
the objective value, the problem can be detected early and preventive action can be
taken.
3. To analyze the level and to have a grasp of the points movement in an abnormal
process. It is also useful to analyze improvement activities that must be carried out.
4. To detect the problem areas in the present state. When the present state is shown in
the graph, it would show whether the present level is normal or whether it is
necessary to carry out improvement. It also picks up the problem areas where the
action must be taken.
6. For preparation of report. The data obtained from workplace experiment or market
survey is rearranged into a line graph which summarizes a large quantity of
information into a concise and easily understood form. This can be widely used in
the arrangement of information.
Scatter Diagram. A scatter diagram examines the relationship between one set of data and another, and
the level of that relationship. The objective of drawing a scatter diagram is to match two sets of data
and to examine the distribution pattern.
Situations:
b. Relationship between the number of years of sales experience and sales figures.
d. Relationship between monthly sales turnover of any product and gross profit.
The most important point to note in reading scatter diagrams is to check whether the data are
stratified or not. When the data are stratified, a correlation may appear where there seemed to be none,
and where there seemed to be a correlation, none may exist.
When judging the correlation in this manner, it is important to note the range of the data and to
carefully read the graph.
A scatter diagram will not teach why there is a correlation, so it is vital to examine the two sets of data
technically.
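The text does not say how the level of a relationship is measured. One common choice, used here as an assumption, is Pearson's correlation coefficient, which is close to 1 or -1 for a strong linear relationship and close to 0 when there is none. The sketch below computes it for hypothetical pairs of sales experience and sales figures.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired data (between -1 and 1).

    Assumes neither sequence is constant; a constant sequence would give a zero divisor.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 pairs")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical pairs: years of sales experience and sales figures (illustrative only).
years = [1, 2, 3, 4, 5, 6]
sales = [12, 15, 19, 22, 24, 30]

r = pearson_r(years, sales)
print(f"r = {r:.3f}")
```

A coefficient like this summarizes only the pattern of the plotted points. As noted above, it does not explain why the two sets of data move together, and it can change when the data are stratified.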
What to note in scatter diagram?
To ensure the effective application of a scatter diagram, check the following points:
c. Are the results obtained from the scatter diagram being applied to the next action?
a. To examine the relationships between two sets of data, collect the corresponding
group of data.
b. Decide which set of data goes on the x-axis and which on the y-axis. In the case of a
factor and a characteristic, these will be X and Y respectively. Also determine the
largest value and smallest value of each.
c. Draw the horizontal axis and the vertical axis, putting X on the horizontal axis. Start
each scale from the smallest value of X and Y, and make the two scales as closely identical
in length as possible.
d. Plot a point where the X and Y values intersect. If the data values overlap, make
concentric circles.
e. Lastly, enter the data sampling method, collection period, objective, product name,
manufacturing group, operator’s name, and the date manufactured.
Frequency Curve refers to the graphic representation of the number of scores in each interval of the
distribution.
Frequency Distribution is the tabular arrangement of the given data by using categories or classes and
their corresponding frequencies.
Frequency Polygon is a line graph of class frequencies plotted against class marks. It is made by
connecting the midpoints of the rectangular tops in the histogram, or simply joining the plotted points
for the class marks and their corresponding frequencies. This kind of graphical presentation can also
accommodate categories of wide range, but is more useful for data such as ordinal and interval because
it stresses continuity along a scale.
Expected Frequency is the theoretical frequency for a cell in a contingency table or multinomial table,
computed on the basis of some hypothesis.
Observed Frequency is the actual frequency count of any observation and recorded in a cell of a
contingency table or multinomial table.
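The definition above leaves the hypothesis open. A frequently used one, taken here as an assumption, is that the row variable and the column variable are independent, in which case the expected frequency of a cell is its row total times its column total divided by the grand total. The observed counts in this sketch are hypothetical.

```python
def expected_frequencies(observed):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Hypothetical observed frequencies: 2 rows (e.g. gender) by 3 columns (e.g. answer categories).
observed = [[20, 30, 10],
            [10, 20, 10]]

expected = expected_frequencies(observed)

# Show each row of observed counts next to its expected counts.
for obs_row, exp_row in zip(observed, expected):
    print(obs_row, [round(e, 1) for e in exp_row])
```

The expected frequencies keep the same row and column totals as the observed table; only the way the counts are spread over the cells differs.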
1. The less than cumulative frequency distribution whose sum of frequencies for each
class interval is less than the upper class boundary of the interval they correspond
to.
2. The greater than cumulative frequency distribution whose sum of frequencies for
each class interval is greater than the lower class boundary of the interval they
correspond to.
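Both cumulative distributions are running totals of the class frequencies, taken from opposite ends. A minimal sketch, assuming hypothetical frequencies listed from the lowest class to the highest:

```python
from itertools import accumulate

def less_than_cf(frequencies):
    """Running totals from the lowest class upward (the 'less than' distribution)."""
    return list(accumulate(frequencies))

def greater_than_cf(frequencies):
    """Running totals from the highest class downward (the 'greater than' distribution)."""
    return list(accumulate(reversed(frequencies)))[::-1]

# Hypothetical class frequencies, ordered from the lowest class to the highest.
freqs = [2, 5, 8, 4, 1]

print(less_than_cf(freqs))     # [2, 7, 15, 19, 20]
print(greater_than_cf(freqs))  # [20, 18, 13, 5, 1]
```

The last "less than" value and the first "greater than" value both equal the total number of observations.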
2. Find the number of class intervals or categories desired. The ideal number of class
intervals is somewhere between 5 and 15.
3. Find the approximate size of the class interval by dividing the range by the desired
number of class intervals.
4. Write the interval starting with the lowest lower limit as determined by the
researcher’s choice. The upper limit is determined by adding the size of the class
interval minus 1 to the lower limit.
5. Find the class frequencies for each class interval by referring to the tally column.
85 90 88 86 80
74 81 85 81 90
82 87 84 89 72
70 77 71 85 78
76 74 70 73 74
89 83 90 74 90
78 88 85 81 89
86 91 84 90 88
76 75 83 70 80
75 79 86 80 76
Solution:
91 88 84 80 75
90 87 84 79 74
90 87 83 79 74
90 86 83 78 74
90 86 82 78 74
90 86 81 77 73
89 85 81 76 72
89 85 81 76 71
88 85 80 76 70
88 85 80 75 70
Range = 91 – 70
Range = 21
Number of Classes = (Range / Class Size) + 1
Number of Classes = (21 / 3) + 1
Number of Classes = 8
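The computation above can be checked with a few lines of Python. The rule used for the number of classes (range divided by the class size, plus one) is an assumption; it is chosen because, with a class size of 3, it reproduces the 8 classes shown. The function names are also illustrative.

```python
def score_range(highest, lowest):
    """Range = highest score minus lowest score."""
    return highest - lowest

def number_of_classes(data_range, class_size):
    """Assumed rule: whole-number part of range / class size, plus one."""
    return data_range // class_size + 1

def approximate_class_size(data_range, desired_classes):
    """Range divided by the desired number of class intervals, rounded."""
    return round(data_range / desired_classes)

data_range = score_range(91, 70)
print(data_range)                             # 21
print(number_of_classes(data_range, 3))       # 8
print(approximate_class_size(data_range, 8))  # 3
```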
Note:
a. If the series contains fewer than 50 cases, 10 classes or fewer are just enough. The usual class
intervals are 3, 5, and 10.
a. Divide the highest value or score by the class interval or size. Take note of the remainder.
91 / 3 = 30, Remainder 1
Step 5: You can write the class limits or intervals in either descending or ascending order. Note that the
lower and upper limits of every class interval are included in the class size.
Midpoint = (Lower Limit + Upper Limit) / 2
Midpoint = (90 + 92) / 2
Midpoint = 91
Class Limits/   Midpoint   Class          Tally        Frequency   Cf<   Cf>   Cf<%   Cf>%
Intervals                  Boundaries
90 – 92         91         89.5-92.5      IIIII-I      6           50    6     100    12
87 – 89         88         86.5-89.5      IIIII-II     7           44    13    88     26
84 – 86         85         83.5-86.5      IIIII-IIII   9           37    22    74     44
81 – 83         82         80.5-83.5      IIIII-I      6           28    28    56     56
78 – 80         79         77.5-80.5      IIIII-II     7           22    35    44     70
75 – 77         76         74.5-77.5      IIIII-I      6           15    41    30     82
72 – 74         73         71.5-74.5      IIIII-I      6           9     47    18     94
69 – 71         70         68.5-71.5      III          3           3     50    6      100
N = 50
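The frequency and cumulative frequency columns can be recomputed from the scores as a check. The sketch below uses the scores as listed in the sorted array of the Solution, since the frequencies in the table agree with that listing. The lowest class limit of 69 and the function name build_table are assumptions made for the example.

```python
from collections import Counter
from itertools import accumulate

# Scores as listed in the sorted array of the Solution (50 values).
scores = [
    91, 88, 84, 80, 75, 90, 87, 84, 79, 74,
    90, 87, 83, 79, 74, 90, 86, 83, 78, 74,
    90, 86, 82, 78, 74, 90, 86, 81, 77, 73,
    89, 85, 81, 76, 72, 89, 85, 81, 76, 71,
    88, 85, 80, 76, 70, 88, 85, 80, 75, 70,
]

CLASS_SIZE = 3
LOWEST_LIMIT = 69  # assumed lower limit of the lowest class (69-71)

def build_table(values, lowest_limit, class_size):
    """Return rows (lower, upper, f, cf<, cf>, cf<%, cf>%), highest class first."""
    n = len(values)
    counts = Counter((v - lowest_limit) // class_size for v in values)
    n_classes = max(counts) + 1
    freqs = [counts[i] for i in range(n_classes)]  # lowest class first
    cf_less = list(accumulate(freqs))
    cf_greater = list(accumulate(reversed(freqs)))[::-1]
    rows = []
    for i in reversed(range(n_classes)):
        lower = lowest_limit + i * class_size
        rows.append((lower, lower + class_size - 1, freqs[i], cf_less[i],
                     cf_greater[i], 100 * cf_less[i] / n, 100 * cf_greater[i] / n))
    return rows

table = build_table(scores, LOWEST_LIMIT, CLASS_SIZE)
for lower, upper, f, cl, cg, pl, pg in table:
    print(f"{lower}-{upper}  f={f:<2} cf<={cl:<2} cf>={cg:<2} cf<%={pl:<5.0f} cf>%={pg:.0f}")
```

Each score is assigned to a class by integer division of its distance from the lowest limit by the class size, which plays the role of the tally column.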
Class Interval is the distance between the upper and lower limits of a step of test scores in a grouped
frequency distribution.
Class Boundaries are values obtained from a frequency distribution by increasing the upper class limits
and decreasing the lower class limits by the same amount so that there are no gaps between
consecutive classes. These are carried out to one more decimal place than the recorded observation.
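For whole-number scores, the definition above amounts to moving each class limit outward by half a unit. A short sketch, assuming a unit of measurement of 1 and illustrative function names:

```python
def class_boundaries(lower_limit, upper_limit, unit=1):
    """Extend each limit by half the unit of measurement so adjacent classes meet."""
    half = unit / 2
    return lower_limit - half, upper_limit + half

def class_midpoint(lower_limit, upper_limit):
    """Midpoint (class mark) = average of the lower and upper class limits."""
    return (lower_limit + upper_limit) / 2

for lower, upper in [(87, 89), (84, 86), (81, 83)]:
    print(lower, upper, class_boundaries(lower, upper), class_midpoint(lower, upper))
```

The upper boundary of one class equals the lower boundary of the next higher class, which is what removes the gaps between consecutive classes.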
Class Frequency is the number of observations belonging to a class interval, or the number of items
within the category.
1. Documentary Sources. This source may be taken from primary or secondary information.
2. Field Sources. These include living persons who have sufficient knowledge about social
conditions or have been in intimate contact with the subject over a considerable period of
time.