RM 4
Data:
Data refers to raw, unprocessed facts, figures, or observations collected from the real world. Data is often presented
in the form of numbers, symbols, text, or signals. Data by itself does not have meaning or context until it is
interpreted or analyzed.
Key Characteristics of Data:
❖ Raw: Data is unorganized and unprocessed. It may consist of numbers, text, symbols, images, sounds, or other
formats.
❖ Unstructured or Structured:
✓ Structured data: Data that is organized and stored in a specific format, like in databases or spreadsheets (e.g.,
names, addresses, financial figures).
✓ Unstructured data: Data that doesn’t follow a specific format or organization, like emails, social media posts,
or videos.
❖ Without Context: Data alone does not have inherent meaning. It needs to be processed or interpreted to become
meaningful.
❖ Collected through Observation or Measurement: Data comes from observations, experiments, surveys,
sensors, or other sources.
Information: Information is data that has been processed, organized, or structured in a way that adds meaning, context,
and usefulness. It is the result of analyzing and interpreting raw data, transforming it into a form that can help with
decision-making, problem-solving, or understanding a situation.
Types of Data
Primary Data: Primary data is data that is collected directly by the researcher for the specific purpose of the study.
It is first-hand and original. For example: survey responses, observations from an experiment, and interviews
conducted by the researcher.
Key Characteristics of Primary Data:
➢ Original Source: Collected directly by the researcher from first-hand sources.
➢ Tailored for Specific Research: The data is collected specifically to address the research question or
hypothesis.
➢ Up-to-Date: Since it is freshly gathered, primary data is current and relevant to the researcher's needs.
➢ Control over Data Collection: The researcher has full control over the process of gathering data, including
the design of the data collection methods.
➢ Resource Intensive: Collecting primary data often requires significant time, effort, and financial resources.
Secondary Data: Secondary data refers to data that has already been collected, processed, and published by
someone else for purposes other than the current research project. This type of data is often used to support research
or validate findings.
Qualitative data: Qualitative data is descriptive and non-numerical. This type of data is often gathered through
interviews, observations, or open-ended surveys and is typically used in social sciences, humanities, and market
research to understand experiences, opinions, or behaviors.
Quantitative data: Quantitative data refers to numerical data that can be measured, counted, and expressed in
numbers. Quantitative data allows researchers to identify patterns, make predictions, and generalize results across
populations because of its precise, measurable nature.
Cross-sectional data: Cross-sectional data refers to data collected at a single point in time across multiple subjects,
such as individuals, organizations, or countries.
Key Characteristics of Cross-Sectional Data:
➢ Single Time Frame: Data is collected at one specific point in time, rather than over an extended period.
➢ Multiple Entities: Cross-sectional data typically involves multiple subjects allowing for comparisons.
➢ Descriptive Analysis: It is often used for descriptive analysis, providing insights into the characteristics of a
population at a given moment.
Time series data: Time series data refers to a sequence of observations collected at successive points in time,
typically at equal intervals. This type of data is used to analyze trends, patterns, and fluctuations over time, making
it valuable for forecasting and understanding temporal dynamics in various fields, including economics, finance,
environmental science, and social sciences.
Panel data: Panel data (also known as longitudinal data) refers to a dataset that combines both cross-sectional and
time series data. It consists of observations on multiple subjects (such as individuals, firms, countries, etc.) collected
at multiple points in time. For example: Data on GDP, unemployment rates, and inflation for various countries over
several years.
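The relationship between the three data types can be sketched in a few lines of Python. This is only an illustration: the country labels, the field name `gdp_growth`, and every figure are invented placeholders, not real statistics.

```python
# All values below are invented placeholders that only show the shape of the data.
panel = [
    {"country": "A", "year": 2020, "gdp_growth": 2.0},
    {"country": "A", "year": 2021, "gdp_growth": 3.0},
    {"country": "B", "year": 2020, "gdp_growth": 1.0},
    {"country": "B", "year": 2021, "gdp_growth": 4.0},
]

# Cross-sectional slice: several subjects observed at one point in time.
cross_section = [row for row in panel if row["year"] == 2021]

# Time series slice: one subject observed at successive points in time.
time_series = sorted(
    (row for row in panel if row["country"] == "A"),
    key=lambda row: row["year"],
)

print(cross_section)
print(time_series)
```

Fixing the year gives a cross-section, fixing the subject gives a time series, and keeping both dimensions gives the panel.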
Unstructured Data: Unstructured data lacks a predefined structure and is more difficult to organize, often involving
text, images, videos, or other formats. Example: Customer reviews written in free-text form.
Big Data: Big Data refers to extremely large and complex datasets that traditional data processing software cannot
efficiently handle. Big Data has gained prominence due to the exponential increase in data generation, driven by
advancements in technology and the rise of the internet.
Observation: Observation is a fundamental research method used to collect data and gather insights about
behaviors, events, or phenomena in a systematic manner. Observational methods are commonly used in various
fields, including social sciences, psychology, education, and natural sciences.
Interview: Interviews are a qualitative research method used to gather in-depth information from individuals
through direct interaction. This method involves asking questions to obtain insights, opinions, experiences, and
perspectives on a specific topic.
Types of Interviews
➢ Structured Interviews: In structured interviews, researchers ask a predefined set of questions in a specific
order. This approach ensures consistency across interviews, making it easier to compare responses.
➢ Semi-Structured Interviews: Semi-structured interviews combine predetermined questions with the flexibility
to explore topics in more depth as they arise during the conversation.
➢ Unstructured Interviews: Unstructured interviews are informal and do not follow a specific format or order.
This approach can lead to unexpected insights.
➢ Focus Group Interviews: Focus groups involve guided discussions with a small group of participants. A
moderator facilitates the conversation, encouraging participants to share their thoughts and interact with each
other.
Advantages of Interviews:
➢ In-Depth Information: Interviews allow researchers to gather rich, detailed information and insights that may
not be captured through surveys or other methods.
➢ Flexibility: The interviewer can adapt questions and explore topics in more depth based on participant responses,
allowing for a more comprehensive understanding.
➢ Non-Verbal Cues: Interviews provide an opportunity to observe non-verbal communication, such as body
language and facial expressions, which can enhance understanding.
➢ Building Rapport: Interviews can create a comfortable environment for participants, encouraging them to share
more openly and honestly.
➢ Clarification: Interviewers can clarify questions or concepts as needed, ensuring that participants fully
understand what is being asked.
Disadvantages of Interviews
➢ Time-Consuming: Conducting interviews can be labor-intensive and time-consuming, both for researchers and
participants.
➢ Subjectivity: Interviewer bias can influence the way questions are asked or interpreted, potentially affecting the
responses.
➢ Limited Generalizability: Findings from interviews may not be generalizable to larger populations due to the
typically small sample size.
➢ Data Analysis: Analyzing qualitative data from interviews can be complex and requires careful coding and
interpretation.
➢ Interviewer Influence: The presence and demeanor of the interviewer can impact participant responses, leading
to potential biases.
Questionnaire method:
The Questionnaire Method is a popular data collection technique where a set of written questions is used to gather
information from respondents.
Types of Questionnaires
➢ Structured Questionnaires: Consist of predefined questions with fixed response options (e.g., multiple-choice,
Likert scales, yes/no questions). These are easy to administer and analyze. Example: A survey asking
respondents to rate their satisfaction with a product on a scale of 1 to 5.
➢ Unstructured Questionnaires: Open-ended questions that allow respondents to answer in their own words.
These provide more in-depth qualitative data but are harder to analyze systematically. Example: A survey asking
respondents to describe their experience using a product.
➢ Semi-Structured Questionnaires: A combination of structured and unstructured questions. Some questions
have fixed responses, while others are open-ended, allowing for a balance of quantitative and qualitative data.
Example: A questionnaire with multiple-choice questions about demographics, followed by open-ended
questions about opinions.
Inferential statistics: Inferential statistics involves using data from a sample to draw conclusions or make
inferences about a larger population. It provides tools to analyze data, test hypotheses, and make predictions. Unlike
descriptive statistics, which summarizes data, inferential statistics uses probability theory to make generalizations.
The main techniques in inferential statistics are point estimation, hypothesis testing, linear regression analysis, and
multiple regression analysis.
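As a rough illustration of point estimation, the sketch below estimates a population mean from a sample and attaches an approximate 95% interval. The sample values are hypothetical, and the interval uses a normal approximation, which is an assumption; for a sample this small, a t-based interval would be somewhat wider.

```python
import math
import statistics

# Hypothetical sample of 10 measurements (invented for illustration).
sample = [52, 48, 50, 47, 53, 51, 49, 50, 46, 54]

# Point estimate of the population mean.
point_estimate = statistics.mean(sample)

# Standard error of the mean, based on the sample standard deviation.
std_error = statistics.stdev(sample) / math.sqrt(len(sample))

# Normal-approximation 95% interval (an assumption; see the note above).
z = statistics.NormalDist().inv_cdf(0.975)
interval = (point_estimate - z * std_error, point_estimate + z * std_error)

print(point_estimate, interval)
```

The point estimate summarizes the sample, while the interval expresses how far the population mean might plausibly lie from it.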
Problem #01
Let us suppose that we want a sample of size n = 30 to be drawn from a population of size N = 8000 which is divided
into three strata of size N1= 4000, N2= 2400 and N3= 1600. Adopting proportional allocation, we shall get the
sample sizes as under for the different strata:
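Solution: Under proportional allocation each stratum receives n_i = n × N_i / N, so that

\[ n = \frac{n \times N_1}{N} + \frac{n \times N_2}{N} + \frac{n \times N_3}{N} = \frac{30 \times 4000}{8000} + \frac{30 \times 2400}{8000} + \frac{30 \times 1600}{8000} = 15 + 9 + 6 = 30 \]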
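The proportional allocation for Problem #01 can be checked numerically. The following is a minimal sketch, assuming each stratum receives n × Ni / N units; the function name `proportional_allocation` is illustrative and not part of the original notes.

```python
def proportional_allocation(n, strata_sizes):
    """Split a sample of size n across strata in proportion to stratum size."""
    total = sum(strata_sizes)  # N, the population size
    # n_i = n * N_i / N for each stratum
    return [n * size / total for size in strata_sizes]


allocation = proportional_allocation(30, [4000, 2400, 1600])
print(allocation)  # [15.0, 9.0, 6.0]
```

The three stratum samples add up to the required total of 30.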
Problem #03
A population is divided into three strata so that N1 = 5000, N2 = 2000 and N3 = 3000. Respective standard deviations
are: σ1 =15, σ2 =18 and σ3=5. How should a sample of size n = 84 be allocated to the three strata, if we want
optimum allocation using disproportionate sampling design?
Solution: Using the disproportionate sampling design for optimum allocation, the sample sizes for different strata
will be determined as under:
\[ n_i = \frac{n \cdot N_i \cdot \sigma_i}{N_1 \sigma_1 + N_2 \sigma_2 + N_3 \sigma_3} \]

Sample size for the stratum with N1 = 5000:

\[ n_1 = \frac{84 \times 5000 \times 15}{(5000 \times 15) + (2000 \times 18) + (3000 \times 5)} = 50 \]

Sample size for the stratum with N2 = 2000:

\[ n_2 = \frac{84 \times 2000 \times 18}{(5000 \times 15) + (2000 \times 18) + (3000 \times 5)} = 24 \]

Sample size for the stratum with N3 = 3000:

\[ n_3 = \frac{84 \times 3000 \times 5}{(5000 \times 15) + (2000 \times 18) + (3000 \times 5)} = 10 \]
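The optimum allocation for Problem #03 can be reproduced with a short script. This is a sketch under the assumption that each stratum's share is proportional to Ni × σi, as in the solution; the function name `optimum_allocation` is illustrative.

```python
def optimum_allocation(n, strata_sizes, std_devs):
    """Split a sample of size n across strata in proportion to N_i * sigma_i."""
    weights = [size * sd for size, sd in zip(strata_sizes, std_devs)]
    total_weight = sum(weights)  # N1*s1 + N2*s2 + N3*s3
    return [n * weight / total_weight for weight in weights]


allocation = optimum_allocation(84, [5000, 2000, 3000], [15, 18, 5])
print(allocation)  # [50.0, 24.0, 10.0]
```

Although the third stratum is larger than the second, it receives fewer units because its standard deviation is much smaller.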
Problem #04
The following are the number of departmental stores in 15 cities: 35, 17, 10, 32, 70, 28, 26, 19, 26, 66, 37, 44, 33,
29 and 28. If we want to select a sample of 10 stores, using cities as clusters and selecting within clusters proportional
to size, how many stores from each city should be chosen? (Use a starting point of 10).
Solution:
The 15 cities together have 500 departmental stores, from which a sample of 10 stores is to be selected, so the
appropriate sampling interval is 500/10 = 50. Starting from 10, we successively add increments of 50 until 10
numbers have been selected. The numbers thus obtained are 10, 60, 110, 160, 210, 260, 310, 360, 410 and 460,
which are shown in the last column of the table against the corresponding cumulative totals. From this we can say
that two stores should be selected randomly from city number 5 and one each from city numbers 1, 3, 7, 9, 10, 11,
12 and 14. This sample of 10 stores is a sample with probability proportional to size.
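The selection in Problem #04 can be traced step by step in code. The sketch below assumes the procedure described in the solution: build cumulative totals of stores by city, generate the selected numbers from the starting point and the sampling interval, and assign each number to the first city whose cumulative total reaches it. The function name `pps_systematic` is illustrative.

```python
from itertools import accumulate


def pps_systematic(sizes, sample_size, start):
    """Return the selected numbers and how many units each cluster contributes."""
    total = sum(sizes)
    interval = total // sample_size  # assumes the total divides evenly
    picks = [start + k * interval for k in range(sample_size)]
    cumulative = list(accumulate(sizes))
    counts = [0] * len(sizes)
    for pick in picks:
        # Find the first cluster whose cumulative total reaches the selected number.
        for index, upper in enumerate(cumulative):
            if pick <= upper:
                counts[index] += 1
                break
    return picks, counts


stores = [35, 17, 10, 32, 70, 28, 26, 19, 26, 66, 37, 44, 33, 29, 28]
picks, counts = pps_systematic(stores, sample_size=10, start=10)
print(picks)
print({city: count for city, count in enumerate(counts, start=1) if count})
```

Larger cities span a wider range of the cumulative totals, so they are more likely to contain one or more of the selected numbers.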
State the meaning of Universe, Population, and Sample with an appropriate example.
Universe: The universe (also known as the target population or theoretical population) is the broadest group or
collection of elements that a researcher wants to study. It includes all potential subjects or elements that fit the
criteria of interest. For example: In a study about the reading habits of students, the universe could be all students
worldwide.
Population: The population is a specific group of individuals or items within the universe that meet the criteria of
interest for the research. It is more narrowly defined than the universe and often focuses on a specific geographical,
temporal, or contextual subset. For example: Continuing from the earlier example, the population might be all
students in higher education institutions in Bangladesh.
Sample: The sample is a smaller subset of the population that is actually studied or surveyed. Researchers collect
data from the sample because it is usually impractical or impossible to study the entire population. The sample
should be representative of the population to allow for generalization of results. For example: In the same study, the
sample could be 500 students from 20 higher education institutions across different regions of Bangladesh.
Pilot Survey
A pilot survey is a small-scale preliminary study conducted before the main survey to test the feasibility, design,
methodology, and effectiveness of the research instruments, such as questionnaires or interviews. The purpose of a
pilot survey is to identify potential issues and make necessary adjustments before conducting the full-scale research.
Key Features of a Pilot Survey
➢ Small Sample Size: Pilot surveys are conducted with a limited number of participants, often representing the
target population. The small sample allows researchers to identify issues without expending too many resources.
➢ Testing Research Instruments: The primary goal is to test the survey questionnaire, interview guide, or other
data collection tools for clarity, relevance, and effectiveness. Researchers check if the questions are clear, if they
yield the required information, and if the respondents understand them properly.
➢ Identifying Flaws: Researchers look for issues such as ambiguous or confusing questions, response bias, and
technical difficulties (e.g., problems with online survey platforms). Pilot surveys also identify logistical
problems, like the time required to complete the survey or how to access the target population.
➢ Preliminary Data Analysis: The data from a pilot survey are usually analyzed to identify potential challenges
in data collection and processing. Although the sample is small, it can give researchers insights into patterns or
trends. This helps researchers refine or modify their survey design.
➢ Testing Operational Procedures: It checks the operational side of the survey, including participant recruitment,
distribution methods, and response rates. Researchers can test how well they are able to administer the survey
under real-world conditions.
Why do researchers conduct a pilot survey?
➢ Validate Survey Design: To ensure that the survey questions are appropriately worded and relevant to the
research objectives.
➢ Improve Clarity: To identify ambiguous questions or questions that could be misinterpreted by respondents.
➢ Check Timing: To ensure that the survey doesn’t take too long to complete, which could result in participant
fatigue or low response rates.
➢ Refine Sampling Methods: To test whether the proposed sampling method is effective in reaching the intended
audience.
➢ Assess Data Collection Tools: To test whether the survey tools (online platforms, paper forms, etc.) are
functioning properly and collecting data as intended.
➢ Minimize Errors: To catch potential sources of error early, such as technical problems with online surveys,
incorrect question formats, or incomplete instructions.
➢ Adjust Survey Flow: To determine whether the order of questions is logical and easy to follow for participants.
Data Editing
Editing of data is the process of reviewing and adjusting raw data to ensure that it is accurate, complete, consistent,
and ready for analysis. The editing process ensures that the data is clean, reliable, and suitable for drawing
meaningful conclusions.
Objectives of Data Editing
➢ Ensure Accuracy: Correcting data entry errors, such as typos, misreported figures, or coding mistakes.
➢ Improve Completeness: Filling in missing data where possible or addressing incomplete responses.
➢ Enhance Consistency: Ensuring that the data follows the same format throughout, such as having consistent
units (e.g., all weights in kilograms, dates in a standard format).
➢ Eliminate Irrelevant Data: Removing data that is not relevant to the research objectives or analysis (e.g.,
outliers or unintentional duplications).
➢ Facilitate Analysis: Making the data suitable for statistical analysis or other methods of examination, ensuring
it meets the required standards for the chosen techniques.
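Several of these objectives can be illustrated with a small editing pass. The sketch below is only one possible approach: the records, field names, and the rule of treating a repeated `id` as a duplicate are all assumptions made for the example.

```python
# Hypothetical raw survey records (invented for illustration).
raw = [
    {"id": 1, "weight": "70 kg", "city": " Dhaka "},
    {"id": 2, "weight": "154 lb", "city": "dhaka"},
    {"id": 2, "weight": "154 lb", "city": "dhaka"},  # accidental duplicate
    {"id": 3, "weight": None, "city": "Khulna"},     # missing value
]

LB_TO_KG = 0.45359237  # kilograms per pound


def edit_record(record):
    """Return a cleaned copy: weight in kilograms, city name normalized."""
    weight_kg = None
    if record["weight"] is not None:
        value, unit = record["weight"].split()
        factor = LB_TO_KG if unit == "lb" else 1.0
        weight_kg = round(float(value) * factor, 1)  # consistent units
    city = record["city"].strip().title()            # consistent format
    return {"id": record["id"], "weight_kg": weight_kg, "city": city}


seen, cleaned = set(), []
for record in raw:
    if record["id"] in seen:  # assumed rule: a repeated id is a duplicate
        continue
    seen.add(record["id"])
    cleaned.append(edit_record(record))

# Flag incomplete records for follow-up rather than silently dropping them.
incomplete = [record["id"] for record in cleaned if record["weight_kg"] is None]
print(cleaned)
print(incomplete)
```

Flagging the incomplete record keeps the decision about how to handle missing data with the researcher.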
Coding of Data
Coding of data refers to the process of converting categorical or non-numerical data (such as responses from surveys,
interviews, or observations) into numerical values to facilitate statistical analysis. This is especially useful for
variables that are not inherently numerical, like gender, occupation, or educational level, so that the data can be
analyzed using statistical techniques like regression, correlation, or factor analysis.
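A minimal sketch of coding is shown below. The codebook, the category labels, and the choice of -9 as a code for unmapped answers are assumptions made for illustration; a real study would define its own codebook.

```python
# Hypothetical codebook for educational level (labels and codes are assumptions).
EDUCATION_CODES = {"primary": 1, "secondary": 2, "bachelor": 3, "master": 4}

# Hypothetical free-text responses from a questionnaire.
responses = ["secondary", "Master", " primary", "bachelor", "unknown"]


def code_response(answer, codebook, missing_code=-9):
    """Map a text answer to its numeric code; unmapped answers get missing_code."""
    return codebook.get(answer.strip().lower(), missing_code)


coded = [code_response(answer, EDUCATION_CODES) for answer in responses]
print(coded)  # [2, 4, 1, 3, -9]
```

Normalizing case and spacing before the lookup keeps equivalent answers under the same code, and a dedicated missing code makes unmapped answers easy to find later.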