Updated Module 5 Data Analysis
DATA ANALYSIS
Syllabus for Module 5
Basic data analysis: Descriptive Statistics, Univariate and Bivariate, Parametric
& Non-Parametric Tests; Null & Alternative Hypothesis, Error in Testing of
Hypothesis, Critical Region, Degrees of Freedom, One Tailed & Two Tailed Tests,
Standard Error; Procedure for Testing of Hypothesis. Parametric tests,
Non-parametric tests (conditions for applicability, practical applicability,
implementation and statistical inference of all the above tests)
Meaning of Analysis
In the context of Business Research, analysis refers to the systematic examination
and interpretation of data to extract meaningful insights that can inform decision-
making.
Data analysis
Data analysis is the application of reasoning to understand the data
that have been gathered. In its simplest form, analysis may involve
determining consistent patterns and summarizing the relevant details
revealed in the investigation. The appropriate analytical technique for
data analysis will be determined by management’s information
requirements, the characteristics of the research design, and the
nature of the data gathered. Statistical analysis may range from
portraying a simple frequency distribution to more complex
multivariate analysis approaches, such as multiple regression.
Later chapters will discuss three general categories of statistical
analysis: univariate analysis, bivariate analysis, and multivariate
analysis.
Key Aspects of Analysis
Data Collection: Gathering relevant data through surveys, experiments, observations, or existing databases.
Data Cleaning: Preparing the data by removing inconsistencies, errors, and irrelevant information to ensure accuracy.
Descriptive Analysis: Summarizing the basic features of the data, providing simple summaries about the sample and the measures (e.g., mean, median, mode).
Inferential Analysis: Drawing conclusions about a population based on sample data, using statistical methods to test hypotheses (e.g., t-tests, ANOVA).
Predictive Analysis: Using historical data to make predictions about future outcomes or trends through methods like regression analysis.
Diagnostic Analysis: Examining data to understand the causes of past outcomes, often by identifying patterns or correlations.
Prescriptive Analysis: Providing recommendations based on the analysis, often through simulation or optimization techniques.
Visualization: Creating charts, graphs, and dashboards to represent data visually, making it easier to understand and communicate findings.
TYPES OF ANALYSIS
Descriptive Analysis: Summarizes and describes the main features of a dataset. It provides simple statistics (mean, median, mode) and visualizations (charts, graphs).
Inferential Analysis: Makes inferences about a population based on a sample. It uses statistical tests (e.g., t-tests, ANOVA) to draw conclusions and test hypotheses.
Predictive Analysis: Uses historical data to predict future outcomes. It employs techniques like regression analysis and machine learning models.
Diagnostic Analysis: Examines data to understand the causes of past outcomes. It identifies patterns and relationships to explain why certain events occurred.
Prescriptive Analysis: Provides recommendations for actions based on data insights. It often uses optimization and simulation techniques to suggest the best course of action.
Exploratory Data Analysis (EDA): Involves analyzing datasets to discover patterns, spot anomalies, and test hypotheses, often using visual methods.
Text Analysis: Analyzes textual data to extract meaningful information, often using natural language processing techniques.
Time Series Analysis: Analyzes data points collected or recorded at specific time intervals to identify trends, seasonal patterns, or cycles over time.
Comparative Analysis: Compares different datasets or variables to highlight differences and similarities, often to assess performance or outcomes.
Multivariate Analysis: Analyzes multiple variables simultaneously to understand complex relationships and effects among them (e.g., factor analysis, cluster analysis).
Importance of Analysis in Business Research
Informed Decision-Making: Helps managers and stakeholders make data-driven decisions.
Identifying Trends: Uncovers market trends, customer preferences, and potential areas for growth.
Risk Management: Identifies potential risks and provides insights for mitigation strategies.
Application of Statistical Tools in Analysis
Descriptive Statistics: Summarizes and describes the main features of a dataset using measures like mean, median, mode, and standard deviation. Useful for initial data exploration.
Inferential Statistics: Allows conclusions about a population based on sample data. Techniques include t-tests, chi-square tests, and ANOVA. Used in hypothesis testing and confidence interval estimation.
Regression Analysis: Analyzes relationships between variables. Used for prediction and forecasting, helping to understand how changes in one variable affect another.
Correlation Analysis: Measures the strength and direction of relationships between variables. Commonly used in market research to identify associations between factors.
ANOVA (Analysis of Variance): Compares means across multiple groups to determine if at least one group mean is different. Useful in experiments and product testing.
Chi-Square Test: Assesses the association between categorical variables. Often used in survey analysis to understand relationships in responses.
Factor Analysis: Identifies underlying relationships between variables by grouping them into factors. Used in market research to reduce data complexity.
Cluster Analysis: Segments data into distinct groups based on similarity. Used in customer segmentation and targeting in marketing.
Time Series Analysis: Analyzes data points collected over time to identify trends and seasonal patterns. Common in sales forecasting and economic analysis.
Exposure to Software Packages Used in Data Analysis
SPSS
Description: A widely used tool for statistical analysis, especially in social sciences and business research.
Key Features: User-friendly interface, extensive statistical tests, data visualization.
Applications: Survey analysis, market research, social science studies.
Tableau
Description: A data visualization tool that helps users create interactive and shareable dashboards.
Key Features: Drag-and-drop interface, real-time data analysis, wide range of visualization options.
Applications: Business intelligence, performance monitoring, data storytelling.
SAS
Description: A software suite used for advanced analytics, business intelligence, and data management.
Key Features: Robust analytics capabilities, extensive data management features, enterprise-level support.
Applications: Clinical trials, predictive analytics, data mining.
MATLAB
Description: A high-level language and interactive environment for numerical computation, visualization, and programming.
Key Features: Strong matrix operations, extensive toolboxes for specialized analysis.
Applications: Engineering, scientific research, algorithm development.
Stata
Description: A statistical software package used for data analysis, manipulation, and professional graphics.
Key Features: User-friendly interface, comprehensive statistical functions, good for panel data.
Applications: Economics, biostatistics, social science research.
Microsoft Power BI
Description: A business analytics tool that enables users to visualize data and share insights across the organization.
Key Features: Real-time data visualization, easy integration with other Microsoft products, interactive dashboards.
Applications: Business performance analysis, reporting, data exploration.
KNIME
Description: An open-source platform for data analytics, reporting, and integration.
Key Features: Visual programming interface, extensive data manipulation capabilities, supports various data sources.
Applications: Data preprocessing, machine learning, data mining.
Descriptive Statistics
Descriptive statistics provide a concise summary of the main features of a
dataset, offering a simple overview without drawing any conclusions about
the data's meaning or the hypothesis being tested. They are used to
describe or summarize data in a meaningful manner.
Importance of Descriptive Statistics:
Provides insights into the data distribution, variability, and central tendency.
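As a small illustration of central tendency and variability, the sketch below uses Python's standard `statistics` module. The `sales` figures are hypothetical values made up for this example, not data from the module.

```python
import statistics

# Hypothetical sample: monthly sales (in units) for ten stores.
sales = [12, 15, 15, 18, 20, 22, 15, 30, 25, 18]

mean_sales = statistics.mean(sales)      # central tendency: arithmetic average
median_sales = statistics.median(sales)  # central tendency: middle value
mode_sales = statistics.mode(sales)      # central tendency: most frequent value
stdev_sales = statistics.stdev(sales)    # variability: sample standard deviation
value_range = max(sales) - min(sales)    # variability: gap between extremes

print("mean:", mean_sales)
print("median:", median_sales)
print("mode:", mode_sales)
print("sample standard deviation:", round(stdev_sales, 2))
print("range:", value_range)
```

Here `statistics.stdev` divides by n - 1 (the sample formula); `statistics.pstdev` would be the choice if the ten stores were the whole population.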
Significance of Descriptive Statistics in Business Research Data Analysis
Descriptive statistics are crucial in business research as they:
Summarize Data: Provide a summary of the main aspects of the data.
Simplify Data: Make data easier to interpret and understand.
Enable Comparison: Facilitate comparison between different data sets.
Identify Patterns and Trends: Reveal trends, patterns, or anomalies in the data.
Measures of Shape
Skewness: Indicates the data's asymmetry around the mean. Positive skew indicates a tail on the right, and negative skew indicates a tail on the left.
Kurtosis: Describes the "tailedness" of the distribution.
Measures of Relationship
Correlation Coefficient: Measures the linear relationship between two variables.
Graphical Methods
Purpose: Visualize the data distribution.
Types: Histograms, Bar Charts, Pie Charts.
Example: Histogram: Majority of employees in 25-35 age bracket.
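The shape measures above (skewness and kurtosis) can be sketched with moment-based formulas. This is one common convention (population moments, with kurtosis reported as "excess" kurtosis so that a normal distribution scores 0); statistical packages may apply slightly different sample corrections. The function names and data are illustrative assumptions.

```python
def central_moment(data, k):
    """k-th central moment: average of (x - mean)^k over the data."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

def skewness(data):
    """Moment-based skewness: positive means a longer right tail."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """Moment-based kurtosis minus 3, so a normal distribution gives 0."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

# Hypothetical data sets chosen to show the sign of the skew.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 2, 10]

print("skewness (symmetric):", skewness(symmetric))
print("skewness (right-tailed):", round(skewness(right_tailed), 3))
print("excess kurtosis (symmetric):", round(excess_kurtosis(symmetric), 3))
```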
Bivariate analysis
Bivariate analysis is a statistical method used to determine the
relationship between two variables. It provides a simple and
straightforward way to study the correlation and potential causation
between two factors. By comparing two sets of data, researchers can
understand the strength and direction of the relationship, if any.
Types of Bivariate Relationships:
No Relationship: Changes in one variable do not predict changes in the other.
Example
A retail store wants to understand the buying behavior of its customers.
They have data on customer age and the amount spent on their last
visit.
No Correlation: A coefficient close to 0 indicates little to no linear relationship.
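To make the correlation coefficient concrete, here is a minimal sketch that computes Pearson's r for the retail example (customer age vs. amount spent). The `ages` and `spend` values are hypothetical, and `pearson_r` is a helper written for this illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical customer data: age (years) and amount spent on last visit.
ages = [22, 28, 35, 41, 50, 63]
spend = [40, 55, 60, 80, 75, 95]

r = pearson_r(ages, spend)
print("Pearson's r:", round(r, 3))
```

A value near +1 or -1 signals a strong linear relationship and a value near 0 signals little or none; the coefficient says nothing by itself about causation.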
Causation vs. Correlation:
Cross-tabulations:
• Used for categorical data.
• Shows the frequency distribution
of the categories of two variables.
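A cross-tabulation can be sketched by counting how often each pair of categories occurs. The survey responses below are hypothetical, and `cross_tabulate` is an illustrative helper built on the standard library's `collections.Counter`.

```python
from collections import Counter

# Hypothetical survey responses: (gender, preferred product).
responses = [
    ("Female", "A"), ("Female", "A"), ("Female", "B"), ("Female", "A"),
    ("Male", "A"), ("Male", "B"), ("Male", "B"), ("Male", "A"),
]

def cross_tabulate(pairs):
    """Count occurrences of each (row category, column category) pair."""
    return Counter(pairs)

table = cross_tabulate(responses)
rows = sorted({r for r, _ in responses})
cols = sorted({c for _, c in responses})

# Print the frequency table with one row per gender, one column per product.
print("Gender".ljust(8) + "".join(c.rjust(4) for c in cols))
for r in rows:
    print(r.ljust(8) + "".join(str(table[(r, c)]).rjust(4) for c in cols))
```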
Examples:
Scatter Plot
Example: Advertising Spend vs. Sales Revenue
Description: Upward trend suggests positive relationship between advertising spend and sales revenue.
Pearson's Correlation
Example: Attendance vs. Final Grades
Description: Pearson's r value of 0.7 indicates strong positive correlation between attendance and final grades.
Cross-tabulation
Example: Gender vs. Product Preference
Description: Higher percentage of females prefer Product A, while males are evenly split between products.
Limitations:
Linear Regression
Description: Examines relationship between two continuous variables.
Example: Studying relationship between years of experience and salary in a company.
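The experience-vs-salary example can be sketched with ordinary least squares for a single predictor. The data points are hypothetical (salary in thousands), and `fit_line` is an illustrative helper, not a library function.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: years of experience and salary (in thousands).
years = [1, 2, 3, 4, 5]
salary = [32, 35, 41, 44, 48]

slope, intercept = fit_line(years, salary)
predicted = intercept + slope * 6  # predicted salary at six years

print("slope:", round(slope, 2))
print("intercept:", round(intercept, 2))
print("predicted salary at 6 years:", round(predicted, 2))
```

The slope is read as the average change in salary for each additional year of experience within the observed range; predicting far outside that range is less reliable.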
Non-Parametric Tests:
Tests that don't make stringent assumptions about the population parameters. They are often used for ordinal or
nominal data.
Assumptions: Fewer assumptions compared to parametric tests. No need for normal distribution or equal
variance.
Popular Non-Parametric Tests:
Product Testing:
• H₀: The new product design does not increase sales.
• H₁: The new product design increases sales.
Employee Training:
• H₀: The new training program has no effect on employee
productivity.
• H₁: The new training program improves employee productivity.
Errors in Testing of Hypothesis:
When testing hypotheses, there's always a risk of making incorrect conclusions.
These mistakes fall into two main categories:
Type I Error (α): Rejecting the null hypothesis when it is actually true. α is the probability of committing a Type I error, commonly set at 0.05 (a 5% chance of wrongly rejecting H₀).
Type II Error (β): Failing to reject the null hypothesis when it is actually false. β is the probability of committing a Type II error.
Power (1 - β): Probability of correctly rejecting the null hypothesis when the alternative is true. Power increases as the chance of a Type II error decreases.
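The meaning of α can be illustrated with a small simulation: when H₀ is true and we test at α = 0.05, roughly 5% of samples should still lead to a (wrong) rejection. The sketch below assumes a two-tailed z-test with known standard deviation 1; the function name, sample size, number of trials, and seed are all illustrative choices.

```python
import random
from statistics import NormalDist

def type_i_error_rate(trials=2000, n=30, alpha=0.05, seed=42):
    """Share of samples that reject H0: mean = 0 even though H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    standard_error = 1 / n ** 0.5                 # sigma = 1 is assumed known
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population where H0 really holds (mean 0).
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = (sum(sample) / n) / standard_error
        if abs(z) > z_crit:
            rejections += 1  # a Type I error: rejecting a true H0
    return rejections / trials

rate = type_i_error_rate()
print("observed Type I error rate:", rate)
```

The observed rate fluctuates around α from run to run; lowering α reduces Type I errors but, for a fixed sample size, makes Type II errors more likely.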
Examples in Business:
Stakeholder Communication: Explaining error risks to stakeholders sets the right expectations and builds trust when presenting findings.
Conclusion:
Directionality:
One Tailed: Assumes the effect is only in a specific direction (either positive or negative).
Two Tailed: Does not assume a specific direction; the effect can be either positive or negative.
Critical Region:
One Tailed: The critical region is entirely in one tail, either the right or the left.
Two Tailed: α is divided between the two tails, with α/2 in each (0.025 in each tail if α = 0.05).
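The effect of splitting α between the tails shows up in the critical values. The sketch below uses the standard normal distribution (a z-test) for simplicity; `critical_z` is an illustrative helper built on `statistics.NormalDist`.

```python
from statistics import NormalDist

def critical_z(alpha, tails):
    """Critical z-value (as a positive number) for a one- or two-tailed test."""
    if tails == 1:
        return NormalDist().inv_cdf(1 - alpha)      # all of alpha in one tail
    if tails == 2:
        return NormalDist().inv_cdf(1 - alpha / 2)  # alpha/2 in each tail
    raise ValueError("tails must be 1 or 2")

print("one-tailed, alpha = 0.05:", round(critical_z(0.05, 1), 3))
print("two-tailed, alpha = 0.05:", round(critical_z(0.05, 2), 3))
```

At the same α, the two-tailed cutoff is further from zero, so a two-tailed test needs a larger test statistic to reject H₀ than a one-tailed test in the predicted direction.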
Example:
A company has developed a new drug they believe increases recovery speed from the flu.
They want to test if the new drug's mean recovery time differs from the old drug's mean
recovery time of 5 days.
Two Tailed Test Hypothesis:
• H0: μ = 5 (The mean recovery time is equal to 5 days)
• Ha: μ ≠ 5 (The mean recovery time is not equal to 5 days)
One Tailed Test Hypothesis (if they believe the new drug is faster):
• H0: μ ≥ 5 (The mean recovery time is 5 days or more)
• Ha: μ < 5 (The mean recovery time is less than 5 days)
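A minimal sketch of how both versions of the drug test could be carried out with a one-sample t-test. The recovery times are hypothetical values for ten patients, and the critical values are taken from a standard t-table for 9 degrees of freedom at α = 0.05, since Python's standard library has no t-distribution.

```python
import math
import statistics

# Hypothetical recovery times (days) for ten patients on the new drug.
recovery_days = [4.2, 4.8, 5.1, 4.5, 4.9, 4.3, 5.0, 4.6, 4.4, 4.7]
mu_0 = 5.0  # mean recovery time under H0 (the old drug)

def one_sample_t(sample, mu_0):
    """t = (sample mean - hypothesized mean) / standard error of the mean."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu_0) / standard_error

t_value = one_sample_t(recovery_days, mu_0)

# Critical values from a t-table, df = n - 1 = 9, alpha = 0.05.
T_CRIT_TWO_TAILED = 2.262  # alpha/2 = 0.025 in each tail
T_CRIT_ONE_TAILED = 1.833  # all of alpha in the left tail

reject_two_tailed = abs(t_value) > T_CRIT_TWO_TAILED  # Ha: mu != 5
reject_one_tailed = t_value < -T_CRIT_ONE_TAILED      # Ha: mu < 5

print("t-value:", round(t_value, 3))
print("reject H0 (two-tailed):", reject_two_tailed)
print("reject H0 (one-tailed, left):", reject_one_tailed)
```

The one-tailed rule only looks at the left tail because the alternative claims a shorter recovery time; a large positive t-value would never lead to rejection under that alternative.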
Procedure for Testing of Hypothesis
Hypothesis Testing
The Hypothesis Testing process involves making an initial assumption, collecting
evidence, and then statistically determining if the evidence supports or contradicts
the initial assumption or hypothesis.
Importance of Hypothesis Testing:
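3. Test Statistic: Choose a statistical test. Example: One-sample t-test (assuming normal distribution).
6. Decision: Compare computed t-value to critical t-value. Example: Reject H₀ if computed t-value > critical t-value (for a two-tailed test, compare the absolute value of the computed t-value).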
Parametric tests are generally more powerful and yield more accurate results
when their assumptions are met.
ANOVA (Analysis of Variance)
Purpose: Compare means of three or more groups.
Description: Assesses the impact of one or more factors by comparing means at different levels of the factor(s).
Wilcoxon Signed-Rank Test
Purpose: Compare two related samples or repeated measures.
Description: Non-parametric version of paired sample t-test.
Kruskal-Wallis H Test
Purpose: Compare distributions of more than two independent samples.
Description: Non-parametric alternative to ANOVA.
Spearman's Rank Correlation Coefficient (Spearman's rho)
Purpose: Measure strength and direction of association between two ranked variables.
Description: Non-parametric version of Pearson's correlation coefficient.
Chi-Squared Test of Independence
Purpose: Test association between two categorical variables.
Description: Determines if frequencies observed align with frequencies expected by chance.
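As an illustration of the chi-squared test of independence, the sketch below compares observed frequencies with the frequencies expected if the two variables were unrelated. The 2x2 table (gender by product preference) is hypothetical, the critical value 3.841 comes from a chi-squared table for 1 degree of freedom at α = 0.05, and no continuity correction is applied.

```python
def chi_square_statistic(observed):
    """Return (chi-squared statistic, degrees of freedom) for a frequency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

# Hypothetical counts: rows = Female, Male; columns = Product A, Product B.
observed = [[30, 20],
            [25, 25]]

chi2, df = chi_square_statistic(observed)
CHI2_CRIT = 3.841  # table value for df = 1, alpha = 0.05

print("chi-squared:", round(chi2, 3), "df:", df)
print("reject H0 of independence:", chi2 > CHI2_CRIT)
```

With these counts the statistic falls below the critical value, so the sample gives no evidence of an association between gender and product preference.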
Example:
Test effect of new product on sales: New product has a significant effect on sales. Two-tailed test (either positive or negative effect).
Test effectiveness of new training program: New training program reduces employee turnover. One-tailed test (negative effect).
The choice between parametric and non-parametric tests is
influenced by the nature of the data in business research.