Research method lecture notes
Descriptive statistics
Descriptive statistics involves summarizing, organizing, and presenting data in a meaningful way. Unlike inferential statistics, which involves making predictions or generalizations about a larger population, descriptive statistics simply outline the characteristics of a dataset. It plays a vital role in research, business, economics, and the sciences because it lays the groundwork for further exploration and informed decision-making. Its applications extend across multiple disciplines, including finance, psychology, social sciences, and engineering.
Measures of Central Tendency
These measures summarize data by identifying a central point within a dataset. The three main measures are the mean, median, and mode.
Mean: Also known as the average, it is calculated by summing all values in a dataset and dividing by the total number of observations. The mean provides a general representation of the data but can be affected by extreme values (outliers). It is widely used in economic and scientific analysis.
Median: The median is the middle value when the data are arranged in ascending order. If there is an even number of values, the median is the average of the two middle values. It is less sensitive to outliers than the mean, making it a more reliable measure when dealing with skewed distributions. For instance, median income is often reported instead of mean income because a few very high earners would otherwise distort the average.
Mode: The mode is the most frequently occurring value in a dataset. A dataset may have no mode, one mode (unimodal), or multiple modes (bimodal or multimodal). The mode is particularly useful in categorical data analysis, such as determining the most common response in a survey.
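As an illustration, all three measures can be computed with Python's built-in statistics module. The household income figures below are invented for the example and chosen so that one outlier pulls the mean above the median:

```python
import statistics

# Hypothetical monthly incomes (in $1000s) for nine households.
incomes = [2.5, 3.0, 3.0, 3.5, 4.0, 4.0, 4.0, 5.0, 20.0]

mean = statistics.mean(incomes)      # pulled upward by the 20.0 outlier
median = statistics.median(incomes)  # middle value, robust to the outlier
mode = statistics.mode(incomes)      # most frequent value

print(mean, median, mode)  # mean ~5.44, median 4.0, mode 4.0
```

Note how the single extreme value (20.0) raises the mean well above the median, which is why the median is preferred for skewed data such as incomes.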
Measures of Dispersion
These measures provide insight into the spread or variability of data points within a dataset. The main measures are the range, variance, standard deviation, and interquartile range.
Range: The range is the difference between the maximum and minimum values in a dataset. While easy to compute, it does not provide information about data distribution. For instance, if two datasets have the same range but different distributions, the range alone cannot distinguish between them.
Variance: Variance is a statistical measure that quantifies how much the values in a
dataset deviate from the mean (average). It indicates the spread or dispersion of data
points. A higher variance means greater spread (data points are far from the mean),
while a lower variance means the data points are closer to the mean.
Standard Deviation: The standard deviation is the square root of the variance and represents the average distance of data points from the mean. A higher standard deviation indicates greater dispersion. Standard deviation is commonly used in quality control, finance, and risk assessment.
Interquartile Range (IQR): The IQR is the range between the first quartile (Q1) and the third quartile (Q3). It eliminates the influence of extreme values and better represents the spread of the middle 50% of the data.
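The dispersion measures can also be computed with the statistics module. The exam scores below are hypothetical; note that statistics.quantiles uses the "exclusive" interpolation method by default, so the quartiles may differ slightly from hand calculations that use another convention:

```python
import statistics

# Hypothetical exam scores for ten students.
scores = [55, 60, 62, 65, 68, 70, 72, 75, 80, 93]

data_range = max(scores) - min(scores)          # 38
variance = statistics.pvariance(scores)        # population variance
std_dev = statistics.pstdev(scores)            # square root of the variance
q1, q2, q3 = statistics.quantiles(scores, n=4)  # quartiles (exclusive method)
iqr = q3 - q1

print(data_range, variance, std_dev, iqr)
```

Use statistics.variance and statistics.stdev instead when the data is a sample rather than the whole population.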
Measures of Shape
These measures describe the shape and symmetry of a dataset. The two key components are:
Skewness: Skewness measures the asymmetry of a distribution. Positive skewness indicates that data is concentrated on the left with a long right tail, while negative skewness indicates concentration on the right with a long left tail.
Kurtosis: Kurtosis measures the heaviness of a distribution's tails. High kurtosis indicates heavy tails (outliers), while low kurtosis suggests light tails. It helps in identifying whether a dataset has extreme values that may influence statistical conclusions.
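Python's standard library has no skewness or kurtosis functions, so here is a minimal sketch computing them directly from the moment definitions (third and fourth standardized moments); the sample data is made up:

```python
import statistics

def skewness(data):
    # Third standardized moment: positive -> long right tail.
    n = len(data)
    m = statistics.fmean(data)
    s = statistics.pstdev(data)
    return sum((x - m) ** 3 for x in data) / (n * s ** 3)

def kurtosis(data):
    # Fourth standardized moment; equals 3 for a normal distribution.
    n = len(data)
    m = statistics.fmean(data)
    s = statistics.pstdev(data)
    return sum((x - m) ** 4 for x in data) / (n * s ** 4)

right_skewed = [1, 2, 2, 3, 3, 3, 4, 15]
print(skewness(right_skewed))  # positive: mass on the left, tail to the right
```

Libraries such as SciPy provide these measures (with various bias corrections), but the plain moment formulas above suffice to see how the sign of skewness reflects the direction of the tail.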
Applications of Descriptive Statistics
Descriptive statistics are widely used in various fields, including business, healthcare, education, and government. Key applications include:
1. Data Summarization
Descriptive statistics help summarize large datasets in a meaningful way. Instead of analyzing thousands of data points, measures like mean, median, and mode provide quick insights into the data’s key characteristics. For example, a hospital can use descriptive statistics to summarize patient data such as average length of stay.
2. Comparison of Data
By using statistical measures such as mean, standard deviation, and quartiles, researchers can compare different datasets effectively. For example, comparing the average income across different regions helps in economic analysis. Businesses use descriptive statistics to evaluate performance across products, branches, or markets.
3. Trend Analysis
Descriptive statistics allow for the identification of trends in datasets. Businesses use statistics to track sales performance over time, while epidemiologists monitor disease outbreaks using statistical trends. Trend analysis is essential in stock market prediction, climate change studies, and public health planning.
4. Decision Making
Descriptive statistics provide a foundation for making informed decisions by presenting relevant data insights. For example, insurance companies rely on statistical data to assess risks and set premiums.
5. Data Visualization
Descriptive statistics facilitate data visualization through tables, graphs, and charts such as
histograms, pie charts, and box plots. These visual representations make it easier to interpret
complex datasets. Businesses use graphical summaries to present sales reports and market trends
to stakeholders.
Time Series
A time series is a sequence of data points recorded at successive time intervals. It represents observations collected over time, typically at regular intervals such as daily, monthly, quarterly, or annually. Time series data can be used to analyze trends, patterns, and seasonal variations, and to forecast future values.
The main components of a time series are:
1. Trend – The long-term direction in which the data moves.
2. Seasonal Variations – Regular fluctuations that repeat at fixed intervals, such as annually.
3. Cyclic Variations – Fluctuations that occur over longer periods without a fixed pattern.
4. Irregular Variations – Random, unpredictable fluctuations.
Time series analysis helps in forecasting future values, identifying relationships between variables over time, and detecting seasonal or cyclical patterns.
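A common first step in trend analysis is smoothing the series with a moving average. The sketch below uses invented quarterly sales figures; each smoothed value is the mean of a four-quarter window, which irons out short-term fluctuation so the underlying trend is easier to see:

```python
# Hypothetical quarterly sales figures.
sales = [120, 150, 110, 180, 140, 170, 130, 200]

def moving_average(series, window):
    # One smoothed value per full window of consecutive observations.
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]

trend = moving_average(sales, 4)
print(trend)  # [140.0, 145.0, 150.0, 155.0, 160.0] -> steady upward trend
```

Even though the raw series bounces up and down each quarter, the four-quarter averages rise steadily, revealing the trend component.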
Index Numbers
Index numbers are statistical measures used to express changes in economic data over time,
allowing for comparisons between different periods. They help track variations in prices,
production, income, or other economic indicators by converting complex data into a simplified
numerical form.
The main types are:
1. Price Index – Measures changes in the prices of goods and services over time.
2. Quantity Index – Measures changes in the volume or quantity of goods produced, sold, or consumed.
3. Value Index – Reflects changes in both price and quantity to measure total revenue or expenditure trends.
Index numbers are essential tools in economics and statistics for understanding relative changes across time periods.
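The core idea can be sketched with a simple price index: each period's price is expressed relative to a base period and scaled to 100. The prices and years below are hypothetical:

```python
# Hypothetical average price of a commodity by year.
prices = {2020: 50.0, 2021: 55.0, 2022: 60.0, 2023: 66.0}
base_year = 2020

# Index = (current price * 100) / base price; the base year reads 100.
index = {year: price * 100 / prices[base_year] for year, price in prices.items()}
print(index)  # {2020: 100.0, 2021: 110.0, 2022: 120.0, 2023: 132.0}
```

Reading the result: a 2023 index of 132 means prices are 32% above the 2020 base level. Weighted indices (e.g. Laspeyres or Paasche) extend this by combining many goods with quantity weights.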
SAMPLING
Sampling is the process of selecting a subset of individuals from a larger population for analysis, making data collection more manageable. Randomness ensures that each member has an equal chance of being selected, which reduces bias.
Sample design refers to the framework or strategy used to select a subset of individuals, items,
or observations from a larger population for study. It determines how the sample will be chosen,
ensuring that it accurately represents the entire population. A well-structured sample design
improves the reliability and validity of research findings while minimizing bias and errors.
Key Components of Sample Design:
Sampling Errors
Sampling errors occur when a sample is not representative of the population. This can happen by chance or through flaws in the sampling process. The main types are:
1. Random Sampling Error: This occurs when a sample is randomly selected but, by chance, still doesn't accurately represent the population.
2. Bias Sampling Error: This occurs when the sampling process is biased, resulting in a sample that systematically differs from the population.
Common causes of sampling errors include:
1. Small Sample Size: A small sample size can lead to sampling errors.
2. Poor Sampling Method: Using a poor sampling method, such as convenience sampling, can produce an unrepresentative sample.
Non-Sampling Errors
Non-sampling errors occur when there are errors in the data collection process, data processing, or data analysis. Examples include:
1. Measurement Error: This occurs when responses are recorded or measured inaccurately.
2. Data Entry Error: This occurs when data is entered incorrectly into a database or spreadsheet.
Common causes of non-sampling errors include:
1. Poor Data Collection Methods: Using poor data collection methods, such as poorly designed questionnaires, can lead to non-sampling errors.
2. Human Error: Human error, such as data entry mistakes, can lead to non-sampling errors.
3. Technical Issues: Technical issues, such as software glitches, can lead to non-sampling errors.
Probability Sampling
Probability sampling is a sampling method in which every individual or unit in the population has a known chance of being selected. Its key characteristics are:
1. Random selection: Individuals or units are chosen by a random mechanism.
2. Known probability: Every individual or unit has a known chance of being selected.
3. Equal opportunity: Every member of the population has an equal opportunity to be included in the sample.
The main types of probability sampling are:
1. Simple Random Sampling: Every individual or unit is selected randomly from the population, without replacement.
2. Systematic Random Sampling: Every nth individual or unit is selected from the population, starting from a randomly chosen point.
3. Stratified Random Sampling: The population is divided into strata, and a random sample is drawn from each stratum.
4. Cluster Random Sampling: The population is divided into clusters, and a random sample of clusters is selected.
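The first two schemes can be sketched with Python's random module; the population here is just hypothetical ID numbers 1–100, and the seed is fixed only to make the example reproducible:

```python
import random

random.seed(42)  # fixed seed so the example is reproducible
population = list(range(1, 101))  # hypothetical unit IDs 1..100

# Simple random sampling: 10 units drawn without replacement.
simple = random.sample(population, k=10)

# Systematic sampling: every nth unit, starting from a random position.
n = 10
start = random.randrange(n)
systematic = population[start::n]

print(sorted(simple))
print(systematic)  # 10 IDs spaced exactly n apart
```

Stratified sampling would apply random.sample within each stratum separately; cluster sampling would apply it to a list of clusters rather than individuals.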
Advantages of probability sampling:
1. Representative sample: Probability sampling ensures that the sample is representative of the population.
2. Unbiased estimates: It allows unbiased estimation of population parameters.
3. Reliable results: Probability sampling provides reliable results, as the sample is selected randomly from the population.
Its main drawback is that it can be time-consuming and costly, particularly for large populations.
Non-Probability Sampling
Non-probability sampling is a sampling method in which individuals or units are selected based on non-random criteria, such as convenience, judgment, or quota. This method does not ensure that every member of the population has an equal chance of being selected.
The main types of non-probability sampling are:
1. Convenience Sampling: Selecting individuals or units that are easily accessible or convenient to sample.
2. Judgment (Purposive) Sampling: Selecting individuals or units based on the researcher's judgment about who will be most informative.
3. Quota Sampling: Selecting individuals or units to meet specific quotas or criteria, such as age, sex, or income.
Advantages of non-probability sampling:
1. Time-efficient: Non-probability sampling can be faster and more efficient than probability sampling.
2. Cost-effective: It is usually cheaper to carry out than probability sampling.
3. Practical: It is useful when a complete list of the population is unavailable.
4. Expertise: Non-probability sampling can be useful when selecting experts or individuals with specific characteristics.
Disadvantages of non-probability sampling:
1. Non-representative sample: The sample may not be representative of the population.
2. Bias: Non-probability sampling can introduce bias into the sample, as the selection criteria may favor certain groups.
3. Limited generalizability: Non-probability sampling may limit the generalizability of the results to the broader population.
4. Lack of precision: Non-probability sampling may not provide precise estimates of population parameters.
DATA
Data refers to raw facts and figures collected for analysis and decision-making. It can be in various forms, such as numbers, text, images, audio, and video. Data is used
to describe, analyze, and visualize information, helping individuals and organizations make
informed decisions.
Types of data
Primary data is original, raw data collected directly from the source, typically through surveys, questionnaires, interviews, observations, or experiments.
Secondary data is existing, pre-collected data that has been previously gathered, analyzed, and
published by others. It is secondhand information that can be used to answer research questions
or test hypotheses.
4. General: Collected for a broader purpose, not specific to the researcher's question.
4. Advantages: Provides in-depth understanding, captures nuanced information, and allows for
exploration.
behaviors.
relationships.
5. Disadvantages: May not capture changes over time, can be influenced by external factors.
trends.
analysis of changes.
engagement.
5. Disadvantages: Can be challenging to collect and analyze, requires specialized skills and
software.
Questionnaires
A questionnaire is a set of written questions used to collect information from respondents. It is a popular method of collecting primary data, especially in social sciences, marketing, and healthcare research.
Types of Questionnaires
1. Structured Questionnaires: Questions are pre-determined and respondents are asked to select from a fixed set of answer options.
2. Unstructured Questionnaires: Questions are open-ended and respondents are free to provide
detailed answers.
Advantages
1. Cost-Effective: Questionnaires are a relatively inexpensive data collection method.
2. Time-Efficient: Respondents can complete questionnaires at their own pace, making it a time-
efficient method.
3. Wide Reach: Questionnaires can be distributed to a large number of respondents at once.
4. Anonymity: Respondents can remain anonymous, which can increase the likelihood of honest responses.
Disadvantages
1. Response Rate: The response rate may be low, especially if the questionnaire is lengthy or
complex.
2. Bias: Respondents may provide biased answers, especially if they have a vested interest in the
outcome.
3. Lack of Depth: Questionnaires may not provide in-depth answers, especially if the questions
are closed-ended.
4. Data Quality: The quality of data collected may be poor if the questionnaire is not well-designed.
Administration of Questionnaires
Questionnaires can be administered in person, by mail, by telephone, or online.
Analysis of Questionnaire Data
1. Descriptive Statistics: Use descriptive statistics to summarize responses.
2. Inferential Statistics: Use inferential statistics to draw conclusions about the population.
3. Content Analysis: Use content analysis to examine open-ended responses.
4. Thematic Analysis: Use thematic analysis to identify patterns and themes in the data.
Guidelines for Designing a Questionnaire
I. Clear Objectives
1. Define the purpose: Clearly define the purpose of the questionnaire and what you want to
achieve.
2. Specific goals: Identify specific goals and objectives that the questionnaire should fulfill.
II. Simple Language
1. Avoid jargon: Avoid using technical terms or jargon that respondents may not understand.
2. Simple vocabulary: Use simple vocabulary and phrases that are easy to understand.
3. Avoid ambiguity: Avoid ambiguous or unclear language that may confuse respondents.
III. Relevant and Concise Questions
1. Relevant questions: Ask only relevant questions that are related to the purpose of the questionnaire.
2. Concise questions: Keep questions concise and to the point, avoiding unnecessary words or
phrases.
3. Avoid leading questions: Avoid leading questions that may influence respondents' answers.
IV. Logical Structure
1. Introduction: Start with an introduction that explains the purpose of the questionnaire.
2. Warm-up questions: Begin with warm-up questions that are easy to answer and help respondents feel at ease.
3. Transition questions: Use transition questions to move from one topic to another.
4. Conclusion: End with a conclusion that thanks respondents for their time and participation.
V. Question Types
1. Open-ended questions: Use open-ended questions to gather detailed, qualitative information.
2. Closed-ended questions: Use closed-ended questions where respondents choose from a fixed set of answers.
3. Multiple-choice questions: Provide respondents with a range of options.
4. Rating scales: Use rating scales to measure respondents' attitudes or opinions.
VI. Avoid Bias
1. Avoid leading questions: Avoid leading questions that may influence respondents' answers.
2. Avoid loaded questions: Avoid loaded questions that may contain emotionally charged
language.
VII. Pilot Testing
1. Pilot test: Pilot test the questionnaire with a small group of respondents to identify any issues
or problems.
2. Revise and refine: Revise and refine the questionnaire based on feedback from pilot testers.
IX. Instructions
1. Clear instructions: Provide clear instructions on how to complete the questionnaire.
2. Guidance: Provide guidance on any technical terms or concepts used in the questionnaire.
X. Feedback Mechanism
1. Feedback mechanism: Provide a feedback mechanism for respondents to provide comments or
suggestions.
2. Respondent feedback: Use respondent feedback to improve the questionnaire and make it
more effective.
Interviews
Definition
An interview is a method of collecting primary data through direct questioning of respondents.
Types of Interviews
1. Structured Interviews: Pre-determined questions are asked in a fixed order to collect standardized information.
2. Unstructured Interviews: Open-ended questions and free-flowing conversation are used to collect in-depth information.
Advantages
1. In-Depth Information: Interviews can provide in-depth and detailed information on a specific
topic or issue.
2. Flexibility: Interviews allow the researcher to adapt questions during data collection.
3. Personal Interaction: Interviews allow for personal interaction between the researcher and the respondent, which helps build rapport.
Disadvantages
1. Time-Consuming: Interviews can be time-consuming to conduct, transcribe, and analyze.
2. Costly: Interviews can be expensive, especially when conducted face-to-face.
3. Bias: Interviews can be biased if the researcher's questions or demeanor influence the respondent's answers.
Observation
Observation is a method of collecting primary data by watching and recording behavior, events, or phenomena as they occur.
Types of Observation
1. Participant Observation: The researcher participates in the activity or event being observed.
2. Non-Participant Observation: The researcher observes without taking part in the activity or event.
3. Structured Observation: A set of pre-determined criteria are used to observe and record
behavior or events.
4. Unstructured Observation: An open-ended approach is used to observe and record behavior or
events.
Advantages
1. Rich Data: Observation can provide rich and detailed data on behavior, events, or phenomena.
2. Natural Setting: Observation can be conducted in a natural setting, which can provide a more accurate picture of real behavior.
3. Real-Time Data: Observation allows real-time data collection rather than relying on recall.
Disadvantages
1. Time-Consuming: Observation can be time-consuming and labor-intensive.
2. Observer Bias: The researcher's expectations or interpretations may bias what is recorded.
3. Reactivity: Observation can be reactive, as the presence of the researcher may influence the behavior being observed.
Documentary Secondary Data
Definition
Documentary secondary data refers to existing data that is contained in documents, such as books, journals, reports, and archival records. Common sources include:
1. Books and Reports: Published books and reports provide established information on a topic.
2. Academic Journals: Academic journals publish research articles, reviews, and other scholarly material.
3. Government Publications: Official statistics and government reports provide authoritative data.
4. Newspaper and Magazine Articles: Newspaper and magazine articles can provide information on current events and public opinion.
5. Archival Records: Archival records, such as letters, diaries, and other historical documents, offer insights into past events.
Types of documentary secondary data:
1. Published Documents: Published documents, such as books, articles, and reports, that are widely available.
2. Unpublished Documents: Unpublished documents, such as letters, diaries, and other archival materials, that are not widely circulated.
3. Official Documents: Official documents, such as government records and institutional reports.
4. Personal Documents: Personal documents, such as letters, diaries, and autobiographies, that provide firsthand accounts.
Advantages of documentary secondary data:
1. Easily Accessible: Many documents are readily available in libraries, archives, or online.
2. Cost-Effective: Using existing documents is cheaper than collecting new data.
3. Time-Saving: Documentary secondary data can save time, as it provides existing information without new data collection.
4. Established Validity: Documentary secondary data has already been validated by the original authors or researchers.
Disadvantages of documentary secondary data:
1. Limited Control: Researchers have limited control over the data collection process and
methodology.
2. Potential Bias: Documentary secondary data may be biased due to the original authors'
perspectives or methodologies.
3. Outdated Information: Documentary secondary data may be outdated, which can limit its relevance.
4. Limited Depth: Documentary secondary data may lack depth or detail, which can limit its usefulness for in-depth analysis.
Advantages of secondary data:
1. Cost-Effective: Secondary data is cheaper to obtain than collecting primary data.
2. Time-Saving: The data already exists, so no new collection is needed.
3. Wide Coverage: Secondary data can cover large populations and long time periods.
4. Established Validity: Secondary data has already been validated by the original researchers or authors.
5. Accessibility: Much secondary data is publicly available.
6. Comparability: Secondary data can be compared across different studies, populations, or time periods.
Disadvantages of secondary data:
1. Limited Control: Researchers have limited control over the data collection process and
methodology.
2. Potential Bias: Secondary data may be biased due to the original researchers' perspectives or
methodologies.
3. Outdated Information: Secondary data may be outdated, which can limit its relevance and
usefulness.
4. Limited Depth: Secondary data may lack depth or detail, which can limit its usefulness for in-depth analysis.
5. Inconsistent Quality: Secondary data can vary in quality, which can affect the accuracy and reliability of findings.
6. Difficulty in Verification: Secondary data can be difficult to verify, especially if the original researchers did not document their methods and data.
7. Lack of Relevance: Secondary data may not match the researcher's exact question or population.
8. Dependence on Original Research: Secondary data is dependent on the quality and accuracy of the original research.
Survey-Based Secondary Data
Survey-based secondary data refers to existing data that was collected through surveys, but is reused by researchers other than those who originally collected it. Examples include:
1. Questionnaire responses
2. Interview transcripts
3. Opinion polls
4. Attitude surveys
Common sources include national censuses, government reports, and research organizations.
Survey-based secondary data can be quantitative (numerical) or qualitative (text-based), and can be re-analyzed to answer new research questions.
Advantages of survey-based secondary data:
1. Time-saving: Secondary data is already collected, so you don't need to spend time and resources gathering it.
2. Cost-effective: Reusing existing survey data is cheaper than running a new survey.
3. Wide coverage: Secondary data can provide a broad coverage of topics, industries, or
populations.
4. Established reliability: Secondary data has already been collected and analyzed, so its reliability is often documented.
5. Comparability: Secondary data can be compared across different studies, industries, or time
periods.
Disadvantages of survey-based secondary data:
1. Limited control: You have limited control over the data collection process, which can affect
data quality.
2. Lack of relevance: Secondary data may not be directly relevant to your research question or
objectives.
3. Outdated data: Secondary data may be outdated, which can affect its accuracy and relevance.
4. Bias and errors: Secondary data may contain biases or errors, which can affect research
findings.
5. Limited depth: Secondary data may lack depth and detail, which can limit its usefulness for in-
depth analysis.
Overall, secondary data can be a valuable resource for research, but it's essential to carefully evaluate its quality, relevance, and limitations before use.
Data Presentation
Data presentation refers to the process of communicating data insights and findings in a clear, concise, and meaningful way. It involves using various techniques and tools to present data in a form that is easy to understand and interpret. Its goals are to:
1. Communicate complex data insights: Clearly convey complex data findings to both technical and non-technical audiences.
2. Support decision-making: Provide actionable insights that inform business decisions, policy development, or further research.
3. Identify trends and patterns: Highlight important trends, patterns, and correlations within the
data.
4. Tell a story: Use data to tell a compelling story that resonates with the audience.
Common data presentation techniques include:
1. Tables and charts: Use tables, bar charts, line graphs, and other visualizations to display data.
2. Graphs and plots: Utilize scatter plots, histograms, and other graphical representations to show distributions and relationships.
3. Infographics: Combine data visualizations, images, and text to create engaging and
informative graphics.
4. Reports and dashboards: Compile data insights into comprehensive reports or interactive
dashboards.
5. Presentations and storytelling: Use narrative techniques to present data insights and findings in an engaging way.
Tabulation
Tabulation is the process of organizing and presenting data in a systematic and structured format, typically in a table or spreadsheet. It involves arranging data into rows and columns to facilitate comparison and analysis. Its purposes are to:
1. Simplify complex data: Break down large datasets into manageable and understandable parts.
2. Identify patterns and trends: Reveal relationships and correlations within the data.
3. Enable comparison: Allow data to be compared across categories, groups, or time periods.
4. Support analysis and decision-making: Provide a clear and concise format for data analysis
and interpretation.
Types of tabulation:
1. Simple tabulation: Involves creating tables that present data on a single variable.
2. Cross-tabulation: Involves creating tables that show relationships between two or more
variables.
3. Multiple tabulation: Involves creating tables that show relationships between multiple
variables.
Two-Way Table
1. Two variables: A two-way table, also known as a contingency table, displays the relationship between two categorical variables.
2. Rows and columns: One variable is represented by the rows, and the other variable is represented by the columns.
3. Cell contents: Each cell contains the frequency or value associated with the combination of the two variables.
4. Example: A table showing the relationship between gender (male/female) and education level.
Three-Way Table
1. Three variables: A three-way table displays the relationship between three variables.
2. Multiple layers: A three-way table can be thought of as multiple two-way tables stacked on top
of each other.
3. Pages or layers: Each layer or page represents one level of the third variable.
4. Cell contents: Each cell contains the frequency or value associated with the combination of the
three variables.
5. Example: A table showing the relationship between gender (male/female), education level, and a third variable such as employment status.
In summary, the main difference between two-way and three-way tables is the number of variables being analyzed. Two-way tables examine the relationship between two variables, while three-way tables add a third variable as an extra layer.
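A two-way table can be built directly from raw records by counting (row, column) pairs. The gender/education records below are invented for illustration:

```python
from collections import Counter

# Hypothetical survey records: (gender, education level) per respondent.
records = [
    ("male", "secondary"), ("female", "tertiary"), ("female", "secondary"),
    ("male", "tertiary"), ("female", "tertiary"), ("male", "secondary"),
    ("female", "secondary"), ("male", "primary"),
]

# Each cell of the two-way table is the frequency of one (row, column) pair.
table = Counter(records)

rows = sorted({g for g, _ in records})
cols = sorted({e for _, e in records})
print("        ", cols)
for g in rows:
    print(f"{g:8}", [table[(g, e)] for e in cols])
```

A three-way table would simply count triples instead of pairs, with one two-way layer per level of the third variable.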
Textual Form
1. Narrative description: Data is presented in sentences and paragraphs rather than tables or figures.
2. Qualitative data: Often used for qualitative data, such as opinions, attitudes, or open-ended responses.
3. Summary and analysis: Textual form provides a summary and analysis of the data, highlighting key findings in narrative form.
Percentage Form
1. Numerical data: Data is presented in numerical form, with percentages used to show
proportions or rates.
2. Quantitative data: Often used for quantitative data, such as frequencies, proportions, or rates.
3. Comparison and trend analysis: Percentage form facilitates comparison and trend analysis, since proportions are easy to compare across groups.
4. Example: A table showing the percentage of customers who rated a product as "satisfactory"
or "unsatisfactory".
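Converting raw counts to percentage form is a one-line calculation; the rating counts below are hypothetical:

```python
# Hypothetical customer ratings converted to percentage form.
ratings = {"satisfactory": 45, "unsatisfactory": 15}
total = sum(ratings.values())  # 60 respondents in all

# Percentage = count / total * 100, so the values sum to 100.
percentages = {rating: count / total * 100 for rating, count in ratings.items()}
print(percentages)  # {'satisfactory': 75.0, 'unsatisfactory': 25.0}
```

Expressed as percentages, the same counts become directly comparable with results from surveys of a different size, which is the main advantage of percentage form.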