RM 4
Data:
Data refers to raw, unprocessed facts, figures, or observations collected from the real world. Data is often
presented in the form of numbers, symbols, text, or signals. Data by itself does not have meaning or context until it
is interpreted or analyzed.
Key Characteristics of Data:
Raw: Data is unorganized and unprocessed. It may consist of numbers, text, symbols, images, sounds, or other
formats.
Unstructured or Structured:
Structured data: Data that is organized and stored in a specific format, like in databases or spreadsheets
(e.g., names, addresses, financial figures).
Unstructured data: Data that doesn’t follow a specific format or organization, like emails, social media
posts, or videos.
Without Context: Data alone does not have inherent meaning. It needs to be processed or interpreted to
become meaningful.
Collected through Observation or Measurement: Data comes from observations, experiments, surveys,
sensors, or other sources.
Information: Information is data that has been processed, organized, or structured in a way that adds meaning,
context, and usefulness. It is the result of analyzing and interpreting raw data, transforming it into a form that can
help with decision-making, problem-solving, or understanding a situation.
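The step from data to information can be sketched in a few lines of Python. The readings below are hypothetical temperature values invented for illustration, and `summarize` is an assumed helper name; the point is only that the raw list carries no meaning until it is processed into a summary.

```python
from statistics import mean

# Raw data: hypothetical daily temperature readings in degrees Celsius.
raw_readings = [31.2, 29.8, 33.5, 30.1, 32.4]


def summarize(readings):
    """Process raw readings into information: a summary that adds context."""
    return {
        "count": len(readings),
        "average": round(mean(readings), 2),
        "highest": max(readings),
        "lowest": min(readings),
    }


info = summarize(raw_readings)
print(info)
```

The list `raw_readings` is data; the dictionary `info` is information, because it answers questions such as "how hot was it on average?".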
Types of Data
Primary Data: Primary data is data that is collected directly by the researcher for the specific purpose of the
study. It is first-hand and original. For example: survey responses, observations from an experiment, and interviews
conducted by the researcher.
Key Characteristics of Primary Data:
Original Source: Collected directly by the researcher from first-hand sources.
Tailored for Specific Research: The data is collected specifically to address the research question or
hypothesis.
Up-to-Date: Since it is freshly gathered, primary data is current and relevant to the researcher's needs.
Control over Data Collection: The researcher has full control over the process of gathering data, including
the design of the data collection methods.
Resource Intensive: Collecting primary data often requires significant time, effort, and financial resources.
Secondary Data: Secondary data refers to data that has already been collected, processed, and published by
someone else for purposes other than the current research project. This type of data is often used to support
research or validate findings.
Qualitative data: Qualitative data is descriptive and non-numerical. This type of data is often gathered through
interviews, observations, or open-ended surveys and is typically used in social sciences, humanities, and market
research to understand experiences, opinions, or behaviors.
Quantitative data: Quantitative data refers to numerical data that can be measured, counted, and expressed in
numbers. Quantitative data allows researchers to identify patterns, make predictions, and generalize results across
populations because of its precise, measurable nature.
Cross-sectional data: Cross-sectional data refers to data collected at a single point in time across multiple
subjects, such as individuals, organizations, or countries.
Time series data: Time series data refers to a sequence of observations collected at successive points in time,
typically at equal intervals. This type of data is used to analyze trends, patterns, and fluctuations over time, making
it valuable for forecasting and understanding temporal dynamics in various fields, including economics, finance,
environmental science, and social sciences.
Panel data: Panel data (also known as longitudinal data) refers to a dataset that combines both cross-sectional
and time series data. It consists of observations on multiple subjects (such as individuals, firms, countries, etc.)
collected at multiple points in time. For example: Data on GDP, unemployment rates, and inflation for various
countries over several years.
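The relationship between the three types can be illustrated with a small Python sketch. The countries "A", "B", "C" and all growth figures are made up for illustration, and the helper names `cross_section` and `time_series` are assumptions. Fixing the year in a panel gives cross-sectional data; fixing the subject gives time series data.

```python
# Hypothetical panel: one row per (country, year). Values are illustrative only.
panel = [
    {"country": "A", "year": 2020, "gdp_growth": 1.0},
    {"country": "A", "year": 2021, "gdp_growth": 2.5},
    {"country": "A", "year": 2022, "gdp_growth": 3.1},
    {"country": "B", "year": 2020, "gdp_growth": -0.4},
    {"country": "B", "year": 2021, "gdp_growth": 1.8},
    {"country": "B", "year": 2022, "gdp_growth": 2.2},
    {"country": "C", "year": 2020, "gdp_growth": 0.7},
    {"country": "C", "year": 2021, "gdp_growth": 4.0},
    {"country": "C", "year": 2022, "gdp_growth": 3.6},
]


def cross_section(rows, year):
    """Cross-sectional data: many subjects at a single point in time."""
    return [r for r in rows if r["year"] == year]


def time_series(rows, country):
    """Time series data: one subject at successive points in time."""
    selected = [r for r in rows if r["country"] == country]
    return sorted(selected, key=lambda r: r["year"])


print(cross_section(panel, 2021))
print(time_series(panel, "B"))
```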
Structured Data: Structured data is highly organized and easily searchable in a fixed format, often stored in
databases or spreadsheets. Example: Employee data in an Excel sheet (e.g., names, employee IDs, salaries).
Unstructured Data: Unstructured data lacks a predefined structure and is more difficult to organize, often
involving text, images, videos, or other formats. Example: Customer reviews written in free-text form.
Big Data: Big Data refers to extremely large and complex datasets that traditional data processing software cannot
efficiently handle. Big Data has gained prominence due to the exponential increase in data generation, driven by
advancements in technology and the rise of the internet.
Observation: Observation is a fundamental research method used to collect data and gather insights about
behaviors, events, or phenomena in a systematic manner. Observational methods are commonly used in various
fields, including social sciences, psychology, education, and natural sciences.
Interview: Interviews are a qualitative research method used to gather in-depth information from individuals
through direct interaction. This method involves asking questions to obtain insights, opinions, experiences, and
perspectives on a specific topic.
Types of Interviews
Structured Interviews: In structured interviews, researchers ask a predefined set of questions in a specific
order. This approach ensures consistency across interviews, making it easier to compare responses.
Semi-Structured Interviews: Semi-structured interviews combine predetermined questions with the flexibility
to explore topics in more depth as they arise during the conversation.
Unstructured Interviews: Unstructured interviews are informal and do not follow a specific format or order.
This approach can lead to unexpected insights.
Focus Group Interviews: Focus groups involve guided discussions with a small group of participants. A
moderator facilitates the conversation, encouraging participants to share their thoughts and interact with each
other.
Advantages of Interviews:
In-Depth Information: Interviews allow researchers to gather rich, detailed information and insights that may
not be captured through surveys or other methods.
Flexibility: The interviewer can adapt questions and explore topics in more depth based on participant
responses, allowing for a more comprehensive understanding.
Non-Verbal Cues: Interviews provide an opportunity to observe non-verbal communication, such as body
language and facial expressions, which can enhance understanding.
Building Rapport: Interviews can create a comfortable environment for participants, encouraging them to
share more openly and honestly.
Clarification: Interviewers can clarify questions or concepts as needed, ensuring that participants fully
understand what is being asked.
Disadvantages of Interviews
Time-Consuming: Conducting interviews can be labor-intensive and time-consuming, both for researchers and
participants.
Subjectivity: Interviewer bias can influence the way questions are asked or interpreted, potentially affecting
the responses.
Limited Generalizability: Findings from interviews may not be generalizable to larger populations due to the
typically small sample size.
Data Analysis: Analyzing qualitative data from interviews can be complex and requires careful coding and
interpretation.
Interviewer Influence: The presence and demeanor of the interviewer can impact participant responses,
leading to potential biases.
Questionnaire method:
The Questionnaire Method is a popular data collection technique where a set of written questions is used to gather
information from respondents.
Types of Questionnaires
Structured Questionnaires: Consist of predefined questions with fixed response options (e.g., multiple-
choice, Likert scales, yes/no questions). These are easy to administer and analyze. Example: A survey asking
respondents to rate their satisfaction with a product on a scale of 1 to 5.
Unstructured Questionnaires: Open-ended questions that allow respondents to answer in their own words.
These provide more in-depth qualitative data but are harder to analyze systematically. Example: A survey
asking respondents to describe their experience using a product.
Semi-Structured Questionnaires: A combination of structured and unstructured questions. Some questions
have fixed responses, while others are open-ended, allowing for a balance of quantitative and qualitative data.
Example: A questionnaire with multiple-choice questions about demographics, followed by open-ended
questions about opinions.
Inferential statistics: Inferential statistics involves using data from a sample to draw conclusions or make
inferences about a larger population. It provides tools to analyze data, test hypotheses, and make predictions.
Unlike descriptive statistics, which summarizes data, inferential statistics uses probability theory to make
generalizations.
The main techniques in inferential statistics are point estimation, hypothesis testing, linear regression analysis,
and multiple regression analysis.
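As a rough illustration of point estimation, the sketch below estimates a population mean from a sample and attaches an approximate 95% confidence interval. The sample scores are hypothetical, `mean_with_interval` is an assumed helper name, and the interval uses a normal approximation; for a sample this small a t-distribution would usually be preferred, so treat this as a sketch rather than a recommended procedure.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_with_interval(sample, confidence=0.95):
    """Return the sample mean and a normal-approximation confidence interval."""
    n = len(sample)
    point_estimate = mean(sample)              # point estimate of the population mean
    standard_error = stdev(sample) / sqrt(n)   # sample standard deviation / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * standard_error
    return point_estimate, (point_estimate - margin, point_estimate + margin)


# Hypothetical exam scores from a sample of 10 students.
sample_scores = [62, 71, 68, 75, 66, 70, 73, 64, 69, 72]
estimate, (low, high) = mean_with_interval(sample_scores)
print(estimate, low, high)
```

The single number `estimate` is the point estimate; the interval `(low, high)` expresses the uncertainty that comes from observing only a sample.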
Problem #01
Let us suppose that we want a sample of size n = 30 to be drawn from a population of size N = 8000 which is
divided into three strata of size N1= 4000, N2= 2400 and N3= 1600. Adopting proportional allocation, we shall
get the sample sizes as under for the different strata:
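Solution: $n = \frac{n \times N_1}{N} + \frac{n \times N_2}{N} + \frac{n \times N_3}{N} = \frac{30 \times 4000}{8000} + \frac{30 \times 2400}{8000} + \frac{30 \times 1600}{8000} = 15 + 9 + 6 = 30$
So the strata of sizes 4000, 2400, and 1600 contribute 15, 9, and 6 sample units respectively.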
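Proportional allocation for Problem #01 can be checked with a short Python sketch. The function name `proportional_allocation` is an assumption made for illustration. In this problem the shares come out as whole numbers; in general they may need rounding, which this sketch does not handle.

```python
def proportional_allocation(sample_size, strata_sizes):
    """Allocate the sample to each stratum in proportion to its size: n * N_i / N."""
    population = sum(strata_sizes)
    return [sample_size * size / population for size in strata_sizes]


# Problem #01: n = 30, strata of size 4000, 2400 and 1600 (N = 8000).
allocation = proportional_allocation(30, [4000, 2400, 1600])
print(allocation)  # [15.0, 9.0, 6.0]
```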
Problem #04
The following are the number of departmental stores in 15 cities: 35, 17, 10, 32, 70, 28, 26, 19, 26, 66, 37, 44, 33,
29 and 28. If we want to select a sample of 10 stores, using cities as clusters and selecting within clusters
proportional to size, how many stores from each city should be chosen? (Use a starting point of 10).
Solution:
Since the 15 cities together have 500 departmental stores, from which we have to select a sample of 10 stores,
the appropriate sampling interval is 500/10 = 50. Since the starting point is 10, we successively add
increments of 50 until 10 numbers have been selected. The numbers thus obtained are 10, 60, 110, 160, 210, 260,
310, 360, 410, and 460, which are shown in the last column of the table against the corresponding cumulative
totals. From this we can say that two stores should be selected randomly from city 5 and one each from
cities 1, 3, 7, 9, 10, 11, 12, and 14. This sample of 10 stores is a sample with probability proportional to
size.
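The selection in Problem #04 can be reproduced with a short Python sketch. The function name `pps_systematic` is an assumption, and the sketch assumes the total number of stores divides evenly by the sample size, as it does here. Each selected number is matched to the first city whose cumulative total reaches it.

```python
from bisect import bisect_left
from collections import Counter
from itertools import accumulate


def pps_systematic(sizes, sample_size, start):
    """Systematic selection with probability proportional to size."""
    cumulative = list(accumulate(sizes))        # running totals of stores per city
    interval = cumulative[-1] // sample_size    # sampling interval (500 // 10 = 50)
    selected = [start + i * interval for i in range(sample_size)]
    # bisect_left gives the index of the first cumulative total >= the number;
    # adding 1 turns the 0-based index into a city number.
    cities = [bisect_left(cumulative, number) + 1 for number in selected]
    return selected, Counter(cities)


stores = [35, 17, 10, 32, 70, 28, 26, 19, 26, 66, 37, 44, 33, 29, 28]
selected, stores_per_city = pps_systematic(stores, sample_size=10, start=10)
print(selected)
print(dict(stores_per_city))
```

City 5 is selected twice because, with 70 stores, it spans more than one sampling interval.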
State the meaning of Universe, Population, and Sample with an appropriate example.
Universe: The universe (also known as the target population or theoretical population) is the broadest group or
collection of elements that a researcher wants to study. It includes all potential subjects or elements that fit the
criteria of interest. For example: In a study about the reading habits of students, the universe could be all students
worldwide.
Population: The population is a specific group of individuals or items within the universe that meet the criteria of
interest for the research. It is more narrowly defined than the universe and often focuses on a specific
geographical, temporal, or contextual subset. For example: Continuing from the earlier example, the population
might be all students in higher education institutions in Bangladesh.
Sample: The sample is a smaller subset of the population that is actually studied or surveyed. Researchers collect
data from the sample because it is usually impractical or impossible to study the entire population. The sample
should be representative of the population to allow for generalization of results. For example: In the same study,
the sample could be 500 higher education students from 20 higher education institutions across different
divisions of Bangladesh.
Pilot Survey
A pilot survey is a small-scale preliminary study conducted before the main survey to test the feasibility, design,
methodology, and effectiveness of the research instruments, such as questionnaires or interviews. The purpose of a
pilot survey is to identify potential issues and make necessary adjustments before conducting the full-scale
research.
Key Features of a Pilot Survey
Small Sample Size: Pilot surveys are conducted with a limited number of participants, often representing the
target population. The small sample allows researchers to identify issues without expending too many
resources.
Testing Research Instruments: The primary goal is to test the survey questionnaire, interview guide, or other
data collection tools for clarity, relevance, and effectiveness. Researchers check if the questions are clear, if
they yield the required information, and if the respondents understand them properly.
Identifying Flaws: Researchers look for issues such as ambiguous or confusing questions, response bias, and
technical difficulties (e.g., problems with online survey platforms). Pilot surveys also identify logistical
problems, like the time required to complete the survey or how to access the target population.
Preliminary Data Analysis: The data from a pilot survey are usually analyzed to identify potential challenges
in data collection and processing. Although the sample is small, it can give researchers insights into patterns or
trends. This helps researchers refine or modify their survey design.
Testing Operational Procedures: It checks the operational side of the survey, including participant
recruitment, distribution methods, and response rates. Researchers can test how well they are able to
administer the survey under real-world conditions.
Data Editing
Editing of data is the process of reviewing and adjusting raw data to ensure that it is accurate, complete,
consistent, and ready for analysis. The editing process ensures that the data is clean, reliable, and suitable for
drawing meaningful conclusions.
Objectives of Data Editing
Ensure Accuracy: Correcting data entry errors, such as typos, misreported figures, or coding mistakes.
Improve Completeness: Filling in missing data where possible or addressing incomplete responses.
Enhance Consistency: Ensuring that the data follows the same format throughout, such as having consistent
units (e.g., all weights in kilograms, dates in a standard format).
Eliminate Irrelevant Data: Removing data that is not relevant to the research objectives or analysis (e.g.,
outliers or unintentional duplications).
Facilitate Analysis: Making the data suitable for statistical analysis or other methods of examination, ensuring
it meets the required standards for the chosen techniques.
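Some of these objectives can be sketched in Python. The records, field names, and helper functions below are hypothetical and exist only for illustration: duplicates are dropped, weights are converted to a single unit, and missing values are flagged rather than silently filled.

```python
# Hypothetical raw survey records with a duplicate, mixed units, and a missing age.
raw_records = [
    {"id": 1, "weight": "70 kg", "age": 34},
    {"id": 2, "weight": "154 lb", "age": None},
    {"id": 2, "weight": "154 lb", "age": None},  # unintentional duplicate
    {"id": 3, "weight": "65 kg", "age": 29},
]

LB_TO_KG = 0.45359237  # exact definition of the pound in kilograms


def to_kilograms(text):
    """Consistency: express every weight in kilograms."""
    value, unit = text.split()
    value = float(value)
    return round(value * LB_TO_KG, 1) if unit == "lb" else value


def edit_records(records):
    """Drop duplicate ids and standardize units; missing ages stay as None."""
    seen, cleaned = set(), []
    for record in records:
        if record["id"] in seen:
            continue  # skip repeated records
        seen.add(record["id"])
        cleaned.append({
            "id": record["id"],
            "weight_kg": to_kilograms(record["weight"]),
            "age": record["age"],
        })
    return cleaned


cleaned = edit_records(raw_records)
# Completeness: list the records that still need follow-up.
missing_age_ids = [r["id"] for r in cleaned if r["age"] is None]
print(cleaned)
print(missing_age_ids)
```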
Coding of Data
Coding of data refers to the process of converting categorical or non-numerical data (such as responses from
surveys, interviews, or observations) into numerical values to facilitate statistical analysis. This is especially useful
for variables that are not inherently numerical, like gender, occupation, or educational level, so that the data can be
analyzed using statistical techniques like regression, correlation, or factor analysis.
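A minimal Python sketch of coding is shown below. The codebook, the responses, and the choice of -1 for unrecognized answers are all assumptions made for illustration; a real study would define its own codebook.

```python
# Hypothetical codebook mapping education levels to numerical codes.
EDUCATION_CODES = {"primary": 1, "secondary": 2, "bachelor": 3, "master": 4}


def encode(responses, codebook, missing_code=-1):
    """Convert text responses to codes; unknown answers get missing_code."""
    return [codebook.get(r.strip().lower(), missing_code) for r in responses]


responses = ["Secondary", "master", " Primary ", "Doctorate"]
codes = encode(responses, EDUCATION_CODES)
print(codes)  # [2, 4, 1, -1]
```

Normalizing case and whitespace before the lookup keeps "Secondary" and "secondary" from being coded differently.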