Lecture - MODULE 1 Lesson 2 & 3


Lesson 2

Collection of Data
To ensure the accuracy of data, you must know the right sources and
methods of collecting them, because any statistical investigation must be
based on accurate data.
Two types of data
1. Primary data
Refers to information gathered directly from an original source or
based on direct or first-hand experience.
So long as they are acquired systematically, primary data have a greater degree of precision.
They are more life-like, interesting, and relevant because the researchers are directly
involved in the process.
Note:
When primary data are not available, the researcher has to be content with secondary
data.
Examples:
1. first-person accounts
2. autobiographies
3. diaries
Advantages of primary data over secondary data:
1. Primary data frequently give detailed definitions of terms and accurate statistical
units used in the survey.
2. Primary data lend more relevance to the researcher's study because of his direct
participation in the project.
3. Primary data are more reliable because of their first-hand nature.
2. Secondary data
Refers to information taken from published or unpublished data that
were previously gathered by other individuals or agencies.
Example:
1. Published books
2. Newspapers
3. Magazines
4. Biographies
5. Business reports
Methods used in the collection of data:
1. The direct or interview method
A person-to-person exchange between the interviewer and the interviewee.
Provides consistent and more precise information since clarification may be given by
the interviewee. Questions may be repeated or modified to suit each interviewee's level of
understanding. However, this method is time-consuming and expensive, and it has limited field coverage.
2. The indirect or questionnaire method
Written responses are given to prepared questions.
A questionnaire is a list of questions which are intended to elicit answers to the
problems of a study. Questions may be mailed or hand carried.
Inexpensive and can cover a wide area in a shorter span of time.
Informers may feel a greater sense of freedom to express views and opinions
because their anonymity is maintained.
There is, however, a strong probability of non-response, especially when questionnaires
are mailed. Questions that are not easily understood will also probably not be answered.
3. The Registration Method
The gathering of information is enforced by certain laws.
Example: Registration of
1. Births
2. Deaths
3. Vehicles
4. Marriages
5. Licenses
Advantage:
Information is kept systematized and made available to all because of the
requirement of the law.
4. The observation Method
The investigator observes the behavior of persons or organizations and their
outcomes. It is usually used when the subjects cannot talk or write. This method makes
possible the recording of behavior at the appropriate time and situation.
5. The experiment method
Used when the objective is to determine the cause-and-effect relationship of
certain phenomena under controlled conditions. This method is usually used by scientific
researchers.
Table 2.1
Sample Size for Specified Margins of Error

Population   Sample size (n) per margin of error (e) of
(N)          ±1%      ±2%      ±3%      ±4%      ±5%      ±10%
500          *        *        *        *        222      83
1,500        *        *        638      441      316      94
2,500        *        1,250    769      500      345      96
3,000        *        1,364    811      517      353      97
4,000        *        1,538    870      541      364      98
5,000        *        1,667    909      556      370      98
6,000        *        1,765    938      566      375      98
7,000        *        1,842    959      574      378      99
8,000        *        1,905    976      580      381      99
9,000        *        1,957    989      584      383      99
10,000       5,000    2,000    1,000    588      385      99
50,000       8,333    2,381    1,087    617      397      100
* In these cases, the assumption of normal approximation is poor and the sample size
formula does not apply.
Applying the formula, you can solve for the sample size as seen in the table.
n = N / (1 + Ne^2)
Where:
n = sample size
N = population size
e = desired margin of error (percent allowance for non-precision because of the
use of the sample instead of the population)
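The formula can be checked against the table with a short Python sketch. The function name sample_size is only illustrative, and the margin of error is assumed to be entered as a proportion (0.05 for ±5%).

```python
def sample_size(population: int, margin_of_error: float) -> float:
    """Return n = N / (1 + N * e^2).

    margin_of_error is assumed to be a proportion, e.g. 0.05 for 5%.
    """
    return population / (1 + population * margin_of_error ** 2)


# Compare a few results with the entries of Table 2.1.
for N, e in [(500, 0.05), (1500, 0.03), (10000, 0.01)]:
    print(N, e, round(sample_size(N, e)))
```

Rounding each result to the nearest whole number gives 222, 638 and 5,000, the same values that appear in the table for these cases.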
Sampling Techniques
As mentioned, it is not necessary for the researcher to examine every member of
the population to get data or information about the population. Cost and time constraints
will prohibit one from undertaking a study of the entire population. What the
researcher needs to do is to draw sample units systematically or at random. If sampling
is done in this way, the researcher can validly infer conclusions about the entire
population from the sample.
A. PROBABILITY SAMPLING
All elements in the population have an equal chance of being selected as a
respondent.
I. Random Sampling
When we say picking things at random it means picking things without bias or
any predetermined choice.
Random sampling is the method of selecting a sample of size n from a universe of size N
such that each member of the population has an equal chance of being included in the
sample and all possible combinations of size n have an equal chance of being
selected as the sample.
A prerequisite for the randomness of the selection is a complete listing of the
population. Thus, prior to the actual picking of sample units, the complete listing or
enumeration of the population has to be undertaken. This phase provides the
researcher with the list from which he would randomly pick his sample units.
Several ways of drawing sample units at random
1. Lottery sampling
Usually carried out by assigning numbers to each member of the population.
Example:
Write down the name of each member of the population on a piece of paper, then roll
the pieces of paper and place them in a box or container drum. The box or container drum must be
shaken thoroughly to prevent some pieces of paper from sinking to the bottom, where
they would have less chance of being drawn. The required number of sample units is
picked from the rolled papers in the box or container drum.
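The lottery procedure can be imitated on a computer. The sketch below is only an illustration: the names in the list are made up, the function name lottery_sample is hypothetical, and Python's standard random module takes the place of the box or drum.

```python
import random


def lottery_sample(population, n, seed=None):
    """Draw n members without replacement; each member is equally likely."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(list(population), n)


# Hypothetical listing of a population of 50 members.
names = [f"Member {i}" for i in range(1, 51)]
print(lottery_sample(names, 5, seed=1))
```

Because the draw is without replacement, no member can be picked twice, just as a piece of paper taken out of the box cannot be drawn again.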
2. Table of Random Numbers
The selection of each member of the population is left entirely to chance, and
every member of the population has an equal chance of being chosen.
Table of Random Numbers
613238 946267 983341 473358
990065 028290 796267 759112
067217 252131 492824 556579
655118 613844 329285 543481
253755 019182 240271 039218
715080 381570 434045 003326
482228 615413 827812 066524
011007 596369 691115 132954
515444 626165 381865 857213
852784 486456 377015 656580
II. Systematic Sampling
Uses prior knowledge of the individuals comprising a universe, with the end in
view of increasing the precision and representativeness of the sample. Sample units are
obtained by drawing every, say, 4th or 7th or 10th item on a list.
This method involves selecting every nth element of a series representing the
population. A complete listing is required in this method.
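A minimal Python sketch of drawing every nth item from a list is shown below. It assumes the interval, called k in the code, is the population size divided by the desired sample size, and that the starting point is picked at random within the first k items. These two choices are a common convention rather than something stated in this lesson.

```python
import random


def systematic_sample(listing, n, seed=None):
    """Pick every k-th item from a complete listing, after a random start."""
    k = len(listing) // n  # sampling interval (assumed: N divided by n)
    if k < 1:
        raise ValueError("sample size cannot exceed the population size")
    start = random.Random(seed).randrange(k)  # random start among first k items
    return [listing[start + i * k] for i in range(n)]


# Hypothetical listing of 100 numbered members; draw 10 of them.
population = list(range(1, 101))
print(systematic_sample(population, 10, seed=0))
```

With 100 members and a sample of 10, the interval is 10, so the chosen members are always exactly 10 places apart on the list.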
III. Stratified sampling
The population is first divided into groups based on homogeneity in order
to avoid the possibility of drawing samples whose members come only from one
stratum.
a. Stratified proportional sampling
The distribution of sampling units is proportionate to the total
number of units in each stratum. The larger the stratum, the more
sample units are drawn from it; the smaller the stratum, the fewer sample units.
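Proportional allocation can be sketched in Python as follows. The strata and their sizes are invented for illustration, and the function name proportional_allocation is hypothetical. Each stratum receives the share of the sample that matches its share of the population.

```python
def proportional_allocation(strata_sizes, n):
    """Allocate a sample of size n in proportion to each stratum's size."""
    total = sum(strata_sizes.values())
    # Rounding may make the allocations differ slightly from n in other cases.
    return {name: round(n * size / total) for name, size in strata_sizes.items()}


# Hypothetical population of 1,000 students divided into four strata.
strata = {"Freshmen": 400, "Sophomores": 300, "Juniors": 200, "Seniors": 100}
print(proportional_allocation(strata, 100))
```

Here the largest stratum (400 of 1,000) gets 40 of the 100 sample units and the smallest (100 of 1,000) gets 10. The units within each stratum would then be drawn at random.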

IV. Cluster Sampling
Referred to as an area sample because it is frequently applied on a
geographical basis. This is useful in selecting the sample when blocks in a
community or city are occupied by heterogeneous groups.
V. Multi-stage Sampling
Uses several stages or phases in getting the sample from the
general population. Selection of the sample is still done at random. Useful
in conducting nationwide surveys or any survey involving a large
universe.
B. Non-Random Sampling or Non-Probability Sampling
Not all members of the population are given equal chances to be chosen. Certain
elements in the population are deliberately left out in the choice of the sample for varied
reasons.
1. Purposive sampling
Based on certain criteria laid down by the researcher. People who satisfy
the criteria are interviewed.
2. Quota Sampling
Relatively quick and inexpensive method to operate. Each interviewer is
given definite instructions about the section of the public he is to question, but the
final choice of the actual persons is left to his own convenience or preference,
and is not predetermined by some carefully operated randomizing plan. Each
interviewer then proceeds to fill the prescribed quota.
3. Convenience Sampling
A researcher might want to find out whether the production of fish balls
conforms to the minimum standards of health and safety. There are hundreds of
ambulant peddlers of this product. Thus it is impossible for the researcher to
make a complete list, much less to interview all the producers and test all their
products. So what the researcher does is simply get samples of the product, say,
from the fish ball peddler near the school or near the residence.
Lesson 3
Presentation of Data
Raw data gathered from either a primary or a secondary source, after applying the
different methods of collecting data, should be organized and presented in a
summarized form. Collected data must be organized in order to show significant
characteristics.
Three forms in which data can be presented:
1. Textual
This combines text and numerical facts and is presented in paragraph form
to explain the summary of data gathered. In the presentation of the text, the
writer can emphasize the importance of some figures or call attention to the
relevance of other figures, since many persons cannot easily understand
data set in a tabular form unless a preliminary explanation of the data is
made. It usually discusses the highlights of the data.
2. Tabular
This uses a statistical table that shows the data in a more concise and
systematic manner. Data are presented in rows and columns in which a class or
subclass is assigned to a particular column or row and the numerical figures for
the various classifications are written in their respective cells. The table facilitates the
analysis of relationships in the data.
Tabulation is the process of condensing classified data and arranging
them in a table. Data can more readily be understood and comparisons may
more easily be made.
Data collected must first be classified before we can tabulate and interpret
them.
Classification is the process of putting together similar items from the
mass of data we have collected based on their shared characteristics.
Frequency distribution refers to the tabular arrangement of data by classes or
categories together with their corresponding class frequencies.
Class frequency refers to the number of observations belonging to a class interval or the
number of items within a category.
Class interval is a grouping or category defined by a lower limit and upper limit.
Steps in Constructing a Frequency Distribution:
1. Determine the range by getting the difference between the highest and lowest values
in the set of data.
2. Determine the number of class intervals or categories desired.
3. Determine the approximate size of the class interval by dividing the range by the
desired number of class intervals.
4. Write the class intervals starting with the lowest lower limit as determined by the
researcher’s choice. The upper limit is determined by adding the lower limit and the size
of the class interval minus 1. Subsequent classes shall be obtained in the same
manner.
5. Determine the class frequencies for each class interval by referring to the tally
column.
6. Compute the class mark by adding the lower and upper limits of the class interval,
then dividing the sum by 2. The class mark is the representative value of the
corresponding interval.
Class boundaries (true class limits) are more precise expressions of the class limits,
obtained by extending each limit by 0.5. Each boundary is situated midway between the
upper limit of one interval and the lower limit of the next interval; for example, the
boundary between 5 - 9 and 10 - 14 is 9.5.
Example:
Given the raw data below, the scores of students in a 40-item quiz, prepare a
frequency distribution table following the steps above.
16 11 18 5
6 22 15 10
30 26 23 16
20 19 23 25
31 29 36 27
- Arrange the data in ascending or descending order so that it is easy to
count or tally the number of items in a particular class interval. We call this
arrangement a data array.
5 6 10 11 15
16 16 18 19 20
22 23 23 25 26
27 29 30 31 36
Determine the range by getting the difference between the highest and lowest values in
the set of data.
Highest score = 36
Lowest score = 5
Range = 36 - 5 = 31
Determine the number of class intervals or categories desired.
Number of class intervals or categories desired = 6
Determine the approximate size of the class interval by dividing the range by the
desired number of class intervals.
Class interval size = 31 / 6 ≈ 5.17, rounded off to the nearest whole number
i = 5
Because the size was rounded down, covering the scores from 5 to 36 takes seven class
intervals rather than exactly six.
Write the class intervals starting with the lowest lower limit as determined by the
researcher’s choice. The upper limit is determined by adding the lower limit and the
size of the class interval minus 1. Subsequent classes shall be obtained in the same
manner.
- The lower limit of the first class interval could be the lowest score or any multiple of
the class interval size, but the first interval should contain the lowest score.
So lower limit = 5
Upper limit = lower limit + class interval size - 1
Upper limit = 5 + 5 - 1 = 9
Determine the class frequencies for each class interval by referring to the tally column.
Compute the class mark by adding the lower and upper limits of the class interval,
then dividing the sum by 2. The class mark is the representative value of the
corresponding interval.
Class Mark or Midpoint = (Lower limit + Upper limit) / 2
= (5 + 9) / 2 = 7
Advantages of Tabular Presentation:
1. It provides the reader a good grasp of the meaning of the quantitative
relationship of the data presented in the report.
2. The systematic arrangement of columns and rows makes the table easily
understood by the reader.
3. The rows and columns facilitate comparison.
4. It gives a vivid picture of the whole document, thus decision making will be easier.
5. It saves time for the reader to analyze and interpret data.

Table 1

Class Interval   Frequency   Class Mark
5 - 9            2           7
10 - 14          2           12
15 - 19          5           17
20 - 24          4           22
25 - 29          4           27
30 - 34          2           32
35 - 39          1           37
N                20
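The steps of the worked example can be reproduced with a short Python sketch. The function name frequency_distribution is only illustrative. The scores are the 20 quiz scores given above, the first lower limit is the lowest score, and the class size is the range divided by the desired six classes, rounded to the nearest whole number.

```python
scores = [16, 11, 18, 5, 6, 22, 15, 10, 30, 26,
          23, 16, 20, 19, 23, 25, 31, 29, 36, 27]


def frequency_distribution(data, lower_limit, size):
    """Return rows of (lower limit, upper limit, frequency, class mark)."""
    rows = []
    lower = lower_limit
    while lower <= max(data):
        upper = lower + size - 1                         # upper = lower + size - 1
        freq = sum(lower <= x <= upper for x in data)    # tally the scores
        rows.append((lower, upper, freq, (lower + upper) / 2))
        lower += size
    return rows


value_range = max(scores) - min(scores)   # 36 - 5 = 31
size = round(value_range / 6)             # 31 / 6 is about 5.17, rounded to 5
table = frequency_distribution(scores, min(scores), size)
for lower, upper, freq, mark in table:
    print(f"{lower} - {upper}\t{freq}\t{mark:g}")
print("N", sum(row[2] for row in table))
```

The printed rows agree with the class intervals, frequencies and class marks in Table 1, and the frequencies add up to N = 20.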

A complete frequency distribution table contains the following:

Table 1

Class      Frequency   Class   Cumulative    Cumulative    Relative
Interval               Mark    frequency <   frequency >   frequency (%)
5 - 9      2           7       2             20            (2/20) x 100% = 10
10 - 14    2           12      4             18            10
15 - 19    5           17      9             16            25
20 - 24    4           22      13            11            20
25 - 29    4           27      17            7             20
30 - 34    2           32      19            3             10
35 - 39    1           37      20            1             5
N          20                                              100

Relative frequency (%) = (frequency of the class interval / total number of cases or items) x 100%
The less-than cumulative frequency of the first class interval is the frequency of the
first class interval.
The less-than cumulative frequency of the second class interval is the less-than
cumulative frequency of the first class interval plus the frequency of the second class
interval:
2 + 2 = 4
Do the same with the succeeding class intervals. The less-than cumulative frequency
of the last class interval should equal the total number of cases or items.
For the greater-than cumulative frequency:
- The greater-than cumulative frequency of the first class interval is the total number
of cases or items.
- The greater-than cumulative frequency of the second class interval is the difference
between the greater-than cumulative frequency of the first class interval and the
frequency of the first class interval.
Greater-than cumulative frequency of the first class interval = 20
Frequency of the first class interval = 2
Therefore the greater-than cumulative frequency of the second class interval is
20 - 2 = 18. Do the same with the succeeding class intervals, each time subtracting the
frequency of the preceding class interval (for example, 18 - 2 = 16 for the third). The
greater-than cumulative frequency of the last class interval should equal the frequency
of the last class interval.
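The three added columns of the complete table can be computed from the class frequencies alone. The Python sketch below uses the frequencies of the worked example, and the variable names are only illustrative. Each greater-than cumulative frequency is taken as the total minus the frequencies of all preceding classes, which reproduces the column shown in the table.

```python
from itertools import accumulate

frequencies = [2, 2, 5, 4, 4, 2, 1]   # class frequencies from the example
total = sum(frequencies)              # N = 20

# Less-than cumulative frequency: running total from the first class down.
cf_less = list(accumulate(frequencies))

# Greater-than cumulative frequency: total minus all preceding frequencies.
cf_greater = [total - (c - f) for c, f in zip(cf_less, frequencies)]

# Relative frequency in percent: class frequency over total, times 100.
relative = [f * 100 / total for f in frequencies]

print(cf_less)
print(cf_greater)
print(relative)
```

The first list ends at 20, the second starts at 20 and ends at 1, and the percentages add up to 100, as in the complete table.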
3. Graphical
The most interesting and most effective means of organizing and
presenting statistical data. Data are presented in visual form. The important
relationships in the data can be seen easily by merely looking at colorful,
creatively designed figures.
Graphs are nothing else but pictures of numerical data.
Presentation of numerical data should be done in a clear and appealing way so that
readers can readily understand what the graph intends to convey.
Advantages of graphical presentation:
1. Graphs enable students, readers and busy executives to easily grasp the essential
facts that numerical data intend to convey.
2. They can easily attract attention and are more readily understood. It is easier to go
through graphs than through quantitative data.
3. Graphs simplify concepts that would otherwise have been expressed in so many
words.
Kinds of graphs/charts or diagrams:
1. Circle graph or Pie Chart
Represents relationships of the different components of a particular data. It is an
ideal graph if you want to show the partition of a whole.

2. Line Graphs
3. Bar Graph
4. Scatter Point Diagram
5. Pictogram
Pictograms are charts in which icons represent numbers to make the data more interesting
and easier to understand. A key is often included to indicate what each icon represents.
All icons must be of the same size, but a fraction of an icon can be used to show the
respective fraction of that amount.
