MODULE 5
Methods of Data Collection
There are different methods of collecting data, each suited to different research needs:
1. Literature Sources
Data can be collected from existing literature, such as books, journals, reports,
and articles. This type of data is secondary and involves reviewing published
research to gather relevant information.
2. Surveys
Surveys use questionnaires to collect data from a group of respondents.
They are a structured method of data collection and can gather both
quantitative and qualitative data. Surveys are cost-effective and can be
administered online or through paper forms.
3. Interviews
Interviews involve a researcher asking a participant about their views or
experiences regarding a topic. Interviews can be structured (pre-set questions),
semi-structured, or unstructured (more conversational), depending on the level
of flexibility desired in the responses.
4. Observations
Observational data collection involves watching people or events in a specific
setting to gather insights. Observations can be participant (where the researcher
is involved in the activity) or non-participant (where the researcher remains a
bystander).
5. Documents and Records
Existing records, such as official documents, reports, or historical records, can be
valuable sources of data. This method is secondary and involves analyzing
existing written materials that are relevant to the research topic.
6. Experiments
Experiments involve controlled conditions where a researcher manipulates one
variable to observe its effect on another. This method is typically used in
quantitative research and is particularly common in scientific and medical
research.
Sampling Techniques
Sampling refers to selecting a subset of individuals from a larger population for the
purpose of conducting research. This allows researchers to make generalizations about
a population without studying everyone.
• Population: The entire group that the researcher is interested in studying.
• Sample: A smaller, manageable group chosen from the population.
There are two main types of sampling:
1. Probability Sampling
In probability sampling, every member of the population has a known, non-zero
chance of being selected (an equal chance in simple random sampling). Selection
is random, leading to more reliable and generalizable results. Types of probability sampling include:
o Simple Random Sampling: Each individual has an equal chance of being
selected.
o Stratified Sampling: The population is divided into subgroups (strata), and
random samples are taken from each subgroup.
o Cluster Sampling: The population is divided into clusters, and entire clusters
are randomly selected.
o Systematic Sampling: A starting point is chosen at random, and then
every nth person is selected.
o Multi-Stage Sampling: A combination of various sampling techniques is
used.
2. Non-Probability Sampling
In non-probability sampling, individuals are selected based on researcher
judgment or ease of access rather than at random, so not all members of the
population have a known chance of being chosen. This can introduce bias. Types of non-probability sampling include:
o Convenience Sampling: Selecting individuals who are easiest to access.
o Purposive Sampling: Selecting individuals based on specific characteristics
relevant to the research.
o Quota Sampling: Ensuring that certain characteristics are represented in
the sample.
o Snowball Sampling: Existing participants refer others who meet the study
criteria.
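Three of the probability sampling techniques above can be sketched in Python as follows. This is only an illustration: the population of 100 ID numbers, the odd/even strata, and the function names are assumptions made for the example, not part of the module.

```python
import random

def simple_random_sample(population, n, seed=0):
    """Pick n members so that each one is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, step, seed=0):
    """Pick a random start among the first `step` members, then every step-th member."""
    rng = random.Random(seed)
    start = rng.randrange(step)
    return population[start::step]

def stratified_sample(strata, n_per_stratum, seed=0):
    """Draw a simple random sample of the same size from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical population: ID numbers 1 to 100.
population = list(range(1, 101))
# Hypothetical strata: odd and even ID numbers.
strata = {"odd": population[0::2], "even": population[1::2]}

print(simple_random_sample(population, 10))
print(systematic_sample(population, 10))
print(stratified_sample(strata, 5))
```

The fixed seed only makes the output repeatable; in a real study the seed would not be fixed in advance.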
Slovin's Formula
Slovin’s Formula helps determine the sample size when the population size is known but
the exact variability or characteristics of the population are not available. The formula is
used to calculate the sample size needed to achieve a certain level of accuracy.
The formula is:
\[ n = \frac{N}{1 + N \times e^2} \]
Where:
• n = sample size
• N = total population size
• e = margin of error (often 0.05 for a 5% margin)
For example, if a population has 1,000 people and the margin of error is 5% (e = 0.05),
then n = 1,000 / (1 + 1,000 × 0.05^2) = 1,000 / 3.5 ≈ 285.7, so about 286 people
should be sampled.
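The calculation can be checked with a short Python sketch. The function name is chosen here for illustration, and rounding the result up to a whole person is an assumption of this sketch rather than part of the formula itself.

```python
import math

def slovin_sample_size(population_size, margin_of_error):
    """Slovin's formula n = N / (1 + N * e^2), rounded up to a whole person."""
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(n)

# Population of 1,000 with a 5% margin of error.
print(slovin_sample_size(1000, 0.05))  # prints 286
```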
Presentation of Data
After data is collected, it is organized and displayed in a way that makes it easier to
analyze and interpret. There are several common methods used to present data:
Tabular Forms
Data can be presented in tables, where it is organized into rows and columns.
This method allows researchers to see data clearly and compare different
variables.
Line Graphs
Line graphs show trends over time or other continuous variables. They are ideal
for showing how one variable changes in relation to another. Multiple line graphs
can be used to compare different trends.
Pie Charts
Pie charts display data as a circle, divided into slices that represent the
percentage or proportion of different categories. Pie charts are useful for
showing how parts make up a whole.
Bar Charts
Bar charts use bars to represent data. The length of the bars is proportional to the
value they represent. Bar charts are useful for comparing different categories or
groups.
MODULE 6
Raw Data- Raw data (sometimes called source data or atomic data) is data that has
not been processed for use. A distinction is sometimes made between data and
information to the effect that information is the end product of data processing.
Example: The scores of 10 students in a quiz:
10, 13, 11, 15, 9, 16, 12, 10, 19, 15 (the scores were not yet arranged in any manner).
Array- In mathematics, an array is an arrangement of numbers or symbols in rows and
columns. In statistics, it is a group of numbers in rows and columns with the smallest at
the beginning and the rest in order of size up to the largest at the end or vice versa.
Example: 9, 10, 10, 11, 12, 13, 15, 15, 16, 19 (the scores were arranged from lowest to
highest).
Ungrouped Data- Ungrouped data, which is also known as raw data, is data that has
not been placed in any group or category after collection. Data is categorized in
numbers or characteristics; therefore, the data which has not been put in any of the
categories is ungrouped.
Example: In a census, the individual ages recorded for every resident of an area are
ungrouped data. They remain ungrouped until they are sorted into categories, for
example to count how many women above the age of 45 live in that area.
Grouped Data- Grouped data is the type of data which is classified into groups after
collection. These data are presented on a frequency distribution table. The table
consists of columns and rows where the data are grouped into classes with a constant
class interval.
Example: The number of people in different age groups, such as 20-29, 30-39, 40-49, etc.
Frequency Distribution Table- A frequency distribution table is a way of organizing data
so that it makes more sense. It tells you how often something happened. The frequency
of an observation tells you the number of times the observation occurs in the data.
Example: For the following list of numbers:
1, 2, 3, 4, 6, 9, 9, 8, 5, 1, 1, 9, 9, 0, 6, 9.
The frequency of the number 9 is 5 (because it occurs 5 times).
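As a quick check, the frequencies for this list can be counted in Python with `collections.Counter` from the standard library. This is a minimal sketch; the variable names are illustrative.

```python
from collections import Counter

numbers = [1, 2, 3, 4, 6, 9, 9, 8, 5, 1, 1, 9, 9, 0, 6, 9]
frequency = Counter(numbers)  # maps each value to the number of times it occurs

# Print a small value/frequency table in ascending order of value.
for value in sorted(frequency):
    print(value, frequency[value])
```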
How to Make a Frequency Distribution Table
Part 1: Choosing Classes
Step 1: Choose between 5 and 20 classes.
• For example, for the list of IQs, we chose 5 classes.
• Note: Make sure you have a few items in each class (for example, if you have 20
items, choose 5 classes, not 20 classes).
Part 2: Sorting the Data
Step 2: Subtract the minimum data value from the maximum data value to find the
range.
• For example:
o Maximum value = 154, Minimum value = 118
o 154 - 118 = 36. This is the range (the difference between the highest and
the lowest scores).
Step 3: Divide the range by the number of classes you chose in Step 1.
• 36 ÷ 5 = 7.2
Step 4: Round the result from Step 3 up to the nearest whole number to get the class
width.
• Rounded up, 7.2 becomes 8 (class width). Rounding down to 7 would give 5 classes
that cover only 118 to 152 and would leave out the maximum value, 154.
Step 5: Write down the lowest data value as the lower limit of the first class.
• The lowest value in the IQ list is 118.
Step 6: Add the class width from Step 4 to the first value to get the next lower class limit.
• 118 + 8 = 126.
Step 7: Repeat Step 6 for the other lower class limits until you have the number of
classes you chose in Step 1.
• For 5 classes:
o 118
o 126 (118 + 8)
o 134 (126 + 8)
o 142 (134 + 8)
o 150 (142 + 8).
Step 8: Write down the upper class limits. These are the highest values that can be in
each class; for whole-number data, each is one less than the next lower class limit.
• Example:
o For the class 118-125, the upper class limit is 125.
o For the class 126-133, the upper class limit is 133.
o And so on.
Part 3: Finishing the Table Up
Step 9: Add a second column for the frequency of each class, and label the columns
with appropriate headings (for example, "IQ" and "Frequency").
Step 10: Count the number of items in each class and put the total in the second
column.
• Example:
o The data points 118, 123, 124, and 124 all fall in the class 118-125 and
count toward its frequency.
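Steps 2 to 10 can be sketched in Python as shown below. The list of IQ scores is hypothetical: only the minimum (118), the maximum (154), and the first four values come from the text, and the function name is illustrative. The sketch assumes whole-number data and adds one safeguard that the steps do not mention: if the rounded width still leaves the maximum outside the last class, the width is increased by one.

```python
import math

def frequency_table(data, num_classes):
    """Build (lower limit, upper limit, frequency) rows for whole-number data."""
    low, high = min(data), max(data)
    data_range = high - low                          # Step 2: range
    width = math.ceil(data_range / num_classes)      # Steps 3-4: divide, round up
    if low + num_classes * width - 1 < high:         # safeguard (assumption):
        width += 1                                   # make sure the maximum fits
    rows = []
    for i in range(num_classes):
        lower = low + i * width                      # Steps 5-7: lower class limits
        upper = lower + width - 1                    # Step 8: upper class limit
        count = sum(lower <= x <= upper for x in data)  # Step 10: count the items
        rows.append((lower, upper, count))
    return rows

# Hypothetical IQ scores for illustration only.
iq_scores = [118, 123, 124, 124, 127, 128, 129, 130,
             133, 136, 138, 141, 142, 149, 150, 154]

print("IQ       Frequency")                          # Step 9: column headings
for lower, upper, count in frequency_table(iq_scores, 5):
    print(f"{lower}-{upper}  {count}")
```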
IMPORTANT TERMS
Class Limit- Class limits correspond to a class interval, where the class interval has a
minimum and maximum value. The minimum value is called the lower class limit (LCL),
and the maximum value is the upper class limit (UCL).
Example:
In the class interval 10 - 20:
• Lower class limit = 10
• Upper class limit = 20
Class Boundaries- Class boundaries are the numbers that separate each class. They lie
halfway across the gap between the upper class limit of one class and the lower class
limit of the next class.
Example:
For the classes 10-20 and 21-31, the gap is 21 - 20 = 1, so the boundaries are:
• Lower boundary for 10-20 = 9.5 (one half less than the lower class limit)
• Upper boundary for 10-20 = 20.5 (one half more than the upper class limit)
• Lower boundary for 21-31 = 20.5
• Upper boundary for 21-31 = 31.5
Class Mark (Midpoint)- The class mark is the number in the middle of a class interval. It
can be calculated by adding the lower and upper class limits and dividing by two, or
by adding the lower and upper class boundaries and dividing by two.
Class Size- Class size is the difference between the true upper limit and the true lower
limit of a class interval. In the inclusive form, the true limits are found by subtracting 0.5
from the lower limit and adding 0.5 to the upper limit.
Example:
For the class interval 10-20 (inclusive form):
• True lower limit = 10 - 0.5 = 9.5
• True upper limit = 20 + 0.5 = 20.5
• Class Size = 20.5 - 9.5 = 11
Class Frequency- Class frequency refers to the number of observations (data points) in
each class interval. The total number of observations in the entire data set is denoted
by n.
Example:
If the class interval 10-20 has 5 observations, then the class frequency for this interval is 5.
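The terms above can be tied together in a small Python sketch for the inclusive class 10-20. The function names and the `gap` parameter (the distance between one class's upper limit and the next class's lower limit, assumed to be 1 for whole-number data) are illustrative assumptions.

```python
def class_boundaries(lower_limit, upper_limit, gap=1):
    """Return (lower boundary, upper boundary) of an inclusive class."""
    half = gap / 2
    return lower_limit - half, upper_limit + half

def class_mark(lower_limit, upper_limit):
    """Midpoint of the class: average of the two class limits."""
    return (lower_limit + upper_limit) / 2

def class_size(lower_limit, upper_limit, gap=1):
    """Difference between the upper and lower class boundaries."""
    lower_boundary, upper_boundary = class_boundaries(lower_limit, upper_limit, gap)
    return upper_boundary - lower_boundary

print(class_boundaries(10, 20))  # (9.5, 20.5)
print(class_mark(10, 20))        # 15.0
print(class_size(10, 20))        # 11.0
```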
Graphical Representation of Data
1. Histogram
A Histogram is a graphical display of data using bars of different heights. Each
bar in the histogram represents a range (or class interval), and the height of the bar
shows how many data points fall into that range. Taller bars indicate that more data
points are in that range.
• Purpose of a Histogram: It helps display the shape and spread of continuous
sample data, giving a clear view of how data is distributed across the ranges.
2. Frequency Polygon
A Frequency Polygon is a graph that uses lines to join the midpoints of each class
interval or bin. The heights of the points represent the frequencies of the respective class
intervals.
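A rough text-only sketch of both graphs is given below. The age classes follow the grouped-data example (20-29, 30-39, 40-49), but the frequencies are hypothetical. A plotting library would normally be used to draw the graphs; this sketch only shows what each graph encodes.

```python
# Hypothetical grouped data: (lower limit, upper limit, frequency) per class.
classes = [(20, 29, 3), (30, 39, 5), (40, 49, 2)]

# Histogram: one bar per class, with bar length equal to the class frequency.
for lower, upper, freq in classes:
    print(f"{lower}-{upper} | {'#' * freq}")

# Frequency polygon: the points to be joined are (class midpoint, frequency).
polygon_points = [((lower + upper) / 2, freq) for lower, upper, freq in classes]
print(polygon_points)
```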
A. THE MEAN
The mean is the most commonly used measure of central tendency. It represents
the average of a data set.
1. Arithmetic Mean (Ungrouped Data)
2. Weighted Mean (Ungrouped Data)
• Definition: Assigns different weights to values based on their importance.
3. Mean for Grouped Data
• Steps:
1. Find the midpoint (m) of each class interval.
2. Multiply each midpoint by its frequency (f).
3. Divide the sum of f · m by the total frequency (N).
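The three means can be sketched in Python using their standard definitions: the sum of the values divided by their count, the sum of value × weight divided by the sum of the weights, and the sum of f · m divided by N. The quiz scores come from the Raw Data example in this module; the grades, weights, and age classes are hypothetical, and the function names are illustrative.

```python
def arithmetic_mean(values):
    """Sum of the values divided by how many there are."""
    return sum(values) / len(values)

def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def grouped_mean(classes):
    """classes holds (lower limit, upper limit, frequency) for each class."""
    total_fm = sum(((lower + upper) / 2) * f for lower, upper, f in classes)
    total_f = sum(f for _, _, f in classes)
    return total_fm / total_f

quiz_scores = [10, 13, 11, 15, 9, 16, 12, 10, 19, 15]  # from the Raw Data example
print(arithmetic_mean(quiz_scores))                     # 13.0

# Hypothetical grades of 80 and 90 with weights 1 and 3.
print(weighted_mean([80, 90], [1, 3]))                  # 87.5

# Hypothetical age classes with frequencies 3, 5, and 2.
print(grouped_mean([(20, 29, 3), (30, 39, 5), (40, 49, 2)]))  # 33.5
```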
B. THE MEDIAN
The median is the middle value when the data is arranged in ascending or
descending order.
• Ungrouped Data:
o Odd number of observations: Median = middle value.
o Even number of observations: Median = average of two middle values.
• Example:
For 5, 5, 6, 7, 8: Median = 6 (middle value).
For 5, 6, 7, 8: Median = (6 + 7) / 2 = 6.5.
• Grouped Data:
C. THE MODE
The mode is the most frequent value in a data set.
• Ungrouped Data: Simply count the frequency of each value.
o Example:
For 5, 5, 6, 7, 8: Mode = 5 (most frequent).
• Grouped Data:
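The ungrouped median and mode examples in sections B and C can be verified with Python's standard `statistics` module. This is a minimal sketch and covers only the ungrouped case.

```python
import statistics

odd_count = [5, 5, 6, 7, 8]
even_count = [5, 6, 7, 8]

print(statistics.median(odd_count))   # 6   (middle value)
print(statistics.median(even_count))  # 6.5 (average of 6 and 7)
print(statistics.mode(odd_count))     # 5   (most frequent value)
```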
The four statement types below (A, E, I, and O) are part of categorical logic. They
classify statements based on their quantity (universal or particular) and quality
(affirmative or negative).
1. Universal Affirmative (A)
• Definition: A statement that asserts something about all members of a group.
• Form: "All S are P" (where S = subject, P = predicate).
• Example:
o All dogs are mammals.
o Every student in the class passed the exam.
These statements apply to the entire category (dogs, students).
2. Universal Negative (E)
• Definition: A statement that denies something about all members of a group.
• Form: "No S are P."
• Example:
o No cats are reptiles.
o None of the books on this shelf are fiction.
These statements exclude the entire category (cats, books).
3. Particular Affirmative (I)
• Definition: A statement that asserts something about some members of a group.
• Form: "Some S are P."
• Example:
o Some birds can fly.
o A few students enjoy mathematics.
These statements apply to part of the group (birds, students).
4. Particular Negative (O)
• Definition: A statement that denies something about some members of a group.
• Form: "Some S are not P."
• Example:
o Some flowers are not red.
o A few employees are not punctual.
These statements exclude part of the group (flowers, employees).
Key Differences
• Quantity:
o Universal = Entire group ("All" or "No").
o Particular = Some members ("Some" or "A few").
• Quality:
o Affirmative = Asserts inclusion.
o Negative = Denies inclusion.
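One way to make the four forms concrete is to treat S and P as sets, as in the Python sketch below. The set members are hypothetical, and the sketch assumes the modern reading in which "All S are P" means S is a subset of P (so it counts as true when S is empty).

```python
def all_are(s, p):
    """A: All S are P (every member of S is in P)."""
    return s <= p

def none_are(s, p):
    """E: No S are P (S and P share no members)."""
    return s.isdisjoint(p)

def some_are(s, p):
    """I: Some S are P (at least one member of S is in P)."""
    return not s.isdisjoint(p)

def some_are_not(s, p):
    """O: Some S are not P (at least one member of S is outside P)."""
    return bool(s - p)

# Hypothetical members, for illustration only.
dogs = {"beagle", "poodle"}
mammals = {"beagle", "poodle", "cat"}
reptiles = {"gecko"}

print(all_are(dogs, mammals))       # True
print(none_are(dogs, reptiles))     # True
print(some_are(mammals, dogs))      # True
print(some_are_not(mammals, dogs))  # True, because "cat" is not a dog
```

In this reading, A and O are opposites of each other, as are E and I: exactly one statement of each pair is true for any given S and P.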