Topic 6 Understanding Causality and Regression
• We can loosen a constraint but that typically requires other scarce resources.
• A decision on which objects to control or change (i.e. managerial
intervention) typically precedes any decision on how to control or change
them.
• Understanding causality is crucial to making effective interventions.
[Diagram labels: Intervention = running an advertising campaign; Objective is to increase this]
Causal modelling
• Consider a decision on the purchase of new equipment:
◦ Quality has two levels: high or low
◦ High quality equipment can perform more tasks, hence increases
productivity, but the parts are more expensive.
◦ Maintenance cost: the greater the quality of the equipment, the more
expensive the parts, hence the higher the maintenance cost.
• Diagram legend: an arrow means a causal relationship; + suggests a
positive relationship.
Foundations for causal graphs
• Causal graphs are directed acyclic
graphs (DAGs). They have
◦ A set of vertices (or nodes) representing
variables in the model
◦ A set of edges (or links) representing the
connections between variables.
◦ Directed paths between nodes: each arrow
points from a cause to its effect.
◦ No cycles: following the arrows never leads
back to the starting node.
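A causal graph of this kind can be held as a list of signed, directed edges and checked for cycles. The sketch below is illustrative only: the node names are one reading of the equipment example, and `is_acyclic` is a hypothetical helper, not part of any library.

```python
from collections import defaultdict

# Edges as (cause, effect, sign). Node names are an assumed reading of the
# equipment example; "+" marks a positive relationship.
EDGES = [
    ("Quality", "Productivity", "+"),
    ("Quality", "Parts cost", "+"),
    ("Parts cost", "Maintenance cost", "+"),
]


def is_acyclic(edges):
    """Return True if the directed graph has no cycle (depth-first search)."""
    graph = defaultdict(list)
    nodes = set()
    for cause, effect, _sign in edges:
        graph[cause].append(effect)
        nodes.update((cause, effect))

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {node: WHITE for node in nodes}

    def visit(node):
        state[node] = GREY
        for nxt in graph[node]:
            if state[nxt] == GREY:  # arrow back into the current path
                return False
            if state[nxt] == WHITE and not visit(nxt):
                return False
        state[node] = BLACK
        return True

    return all(visit(node) for node in nodes if state[node] == WHITE)


print(is_acyclic(EDGES))
# A two-way link (a feedback loop) breaks the DAG property.
print(is_acyclic(EDGES + [("Maintenance cost", "Quality", "+")]))
```

A feedback loop therefore has to be unrolled over time (or one direction dropped) before the graph qualifies as a DAG.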
Feedback loops & time dimensions
• Consider a relationship between joy and
physical exercise.
◦ Is there any causal relationship between them?
◦ If yes, which variable is cause and which is
effect?
• How about the Number of Customers visiting the shop and Volume?
• But how about the impact of advertising and number of customers on sales?
• Also how about the effects of advertising and media hype on sales?
Causal model for assessing market value
• Putting all the elements together
gives our causal model for
situational assessment.
• Note that there is no (business)
goal / objective in terms of
optimisation or decision
making.
• Rather, it assesses how causal
factors affect the market value.
Causal modelling for Interventions
• Example 2: instead of doing a situational assessment, you are now
asked to decide how much to spend on advertising for these
products.
◦ You need to set an objective, e.g. high market share (the proportion of sales
through your retailers to the total number sold).
◦ So the decision variable is “Advertise”.
◦ Simplify the intervention decision to two options: (1) run an advertising campaign or (2) do not.
◦ Further simplify by assuming that you will know the price at the time you set “Advertise”.
Influence diagram
• Often, a rectangle represents a strategic
option (i.e. decision point, choice variable,
value directly controlled by a strategic agent –
decision making agent).
• A hexagon represents an objective (e.g.
profit, value, market share, etc.). Decisions are
made to optimise the objective.
• A circle represents probabilistic variables:
chance variables, uncertain
quantities, environmental factors and other
elements outside the direct control of strategic
agents.
[Diagram annotations]
• [+] More Advertising leads to a greater certainty of a larger number of unit sales.
• [+] High Sales lead to greater certainty of a high Market Share.
• [-] Higher Price leads to a greater certainty of a smaller number of unit sales.
• This is only an informational link: you know price when deciding on Advertise.
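To make the roles of the three node types concrete, here is a minimal sketch that evaluates the Advertise decision for each known price level. Every probability is hypothetical and chosen only to respect the signs in the diagram; the function names are illustrative, and advertising cost is ignored because the stated objective is market share.

```python
# Chance node "Sales": P(high sales | advertise, price). Hypothetical numbers.
# More advertising raises the probability; a higher price lowers it.
P_HIGH_SALES = {
    (True, "low"): 0.8,
    (True, "high"): 0.5,
    (False, "low"): 0.6,
    (False, "high"): 0.3,
}

# Objective node "Market Share": P(high market share | sales level).
# Hypothetical numbers; high sales make a high share more likely.
P_HIGH_SHARE = {"high": 0.9, "low": 0.2}


def prob_high_share(advertise, price):
    """Probability of a high market share for one decision and one price."""
    p_sales = P_HIGH_SALES[(advertise, price)]
    return p_sales * P_HIGH_SHARE["high"] + (1 - p_sales) * P_HIGH_SHARE["low"]


def best_decision(price):
    """Informational link: price is known when choosing Advertise."""
    return max((True, False), key=lambda advertise: prob_high_share(advertise, price))


# One decision per observed price level.
policy = {price: best_decision(price) for price in ("low", "high")}
print(policy)
```

With a cost attached to advertising and a profit objective instead of market share, the preferred decision could differ by price level.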
From causal diagrams to mathematical equations
• The simplest form of empirical model is a
regression model, as below.
[Figure: scatter plot of Happiness score against Log GDP per capita]
We can add a trendline in Excel.
A linear relationship:
Happiness score is the dependent variable
Log GDP per capita is the independent variable
Excel Trendline Tool
• Right click on the data series
and choose Add Trendline
from the pop-up menu
• Check the boxes Display
Equation on chart and
Display R-squared value on
chart
Simple linear regression using least squares
• Simple linear regression model
$Y = \beta_0 + \beta_1 X + \varepsilon$ (8.1)
• We estimate the parameters (βs) from the sample data
$\hat{Y} = b_0 + b_1 X$ (8.2)
• Once estimated, we can
◦ Assess/explain if X is an important factor explaining Y,
◦ “predict” the value of Y given a specific value of X:
$\hat{Y}_i = b_0 + b_1 X_i$
Least squares regression
• Residuals are the observed errors
associated with estimating the value of the
dependent variable using the regression line:
$e_i = Y_i - \hat{Y}_i$ (8.3)
• The best-fitting line minimizes the sum of
squares of the residuals.
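The least-squares line has a closed-form solution, sketched below in plain Python. The data are hypothetical and `fit_line` is an illustrative helper; Excel's trendline and Regression tool are expected to give the same intercept and slope for the same data.

```python
def fit_line(x, y):
    """Return (b0, b1) minimising the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: co-variation of X and Y divided by the variation of X.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    # The fitted line passes through the point of means.
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical sample (not taken from the happiness data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

b0, b1 = fit_line(x, y)
fitted = [b0 + b1 * xi for xi in x]                   # Y-hat_i = b0 + b1 * X_i
residuals = [yi - fi for yi, fi in zip(y, fitted)]    # e_i = Y_i - Y-hat_i
sse = sum(e ** 2 for e in residuals)                  # quantity being minimised

print(b0, b1, sse)
```

Any other choice of intercept and slope gives a larger sum of squared residuals on the same data.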
Simple Linear Regression with Excel
• Using the Analysis ToolPak:
◦ Data > Data Analysis > Regression
Results: Regression Statistics (metrics)
• Multiple R:
◦ absolute value of the sample correlation coefficient r
(in simple linear regression)
◦ r varies from -1 to +1
◦ r is negative if the slope is negative
• R Square:
◦ coefficient of determination
◦ varies from 0 (no fit) to 1 (perfect fit)
• Adjusted R Square:
◦ R Square adjusted for sample size
and number of X variables.
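These three metrics can be reproduced from the observed and fitted values. The sketch below assumes least-squares fitted values from a model with an intercept; the data and the helper name `regression_metrics` are hypothetical.

```python
import math


def regression_metrics(y, fitted, k):
    """Return Multiple R, R Square and Adjusted R Square.

    y      : observed values of the dependent variable
    fitted : least-squares fitted values (model assumed to include an intercept)
    k      : number of X variables in the model
    """
    n = len(y)
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained variation
    r_square = 1 - sse / sst
    # Penalise R Square for the number of X variables relative to sample size.
    adjusted = 1 - (1 - r_square) * (n - 1) / (n - k - 1)
    return {
        "multiple_r": math.sqrt(r_square),
        "r_square": r_square,
        "adjusted_r_square": adjusted,
    }


# Hypothetical observed values and fitted values from a one-variable model.
y = [2.0, 4.1, 5.9, 8.2, 9.8]
fitted = [2.06, 4.03, 6.00, 7.97, 9.94]

metrics = regression_metrics(y, fitted, k=1)
print(metrics)
```

Adjusted R Square is always below R Square whenever the fit is not perfect, and the gap widens as k grows relative to n.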
[Figure: residual plot]
[Figure: causal diagram for Happiness. Causes shown: Wealth (+), Freedom (+), Social support (+), Perception of corruption (-), Health, Generosity, Others]
Happiness (Cantril Ladder)
[Figure: scatter plots of happiness (Life Ladder) against each variable below, with fitted values]
• Social support: “If you were in trouble, do you have relatives or friends
you can count on to help you whenever you need them, or not?”
• Healthy life expectancy at birth: the time series of healthy life
expectancy at birth
• Freedom to make life choices: “… you do with your life?”
• Generosity: “… donated money to a charity in the past month?” on GDP per capita
• Perceptions of corruption: the average of binary answers to two GWP
questions:
◦ “Is corruption widespread throughout the government or not?” and
◦ “Is corruption widespread within businesses or not?”
◦ Where data for government corruption are missing, the perception of
business corruption is used as the overall corruption-perception measure.
Multiple regression
[Annotations on the regression output]
• Larger adjusted R-squared
• F-test with p-value of 0
• Every X variable has a two-tailed t-test, with its p-value
reported in this column
ANOVA for Multiple Regression
• ANOVA tests for significance of the entire model. That is, it computes
an F-statistic testing the hypotheses:
$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$
$H_1$: at least one $\beta_j$ is not 0
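For reference, the F-statistic for this test is usually written as below. The symbols are introduced here as assumptions, since the slides do not define them: SSR is the regression (explained) sum of squares, SSE the residual sum of squares, n the number of observations and k the number of X variables.

```latex
F = \frac{SSR / k}{SSE / (n - k - 1)}
  = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)}
```

A large F (small p-value) leads to rejecting $H_0$, meaning at least one X variable helps explain Y.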
Interpreting the coefficients
• 𝛼_2=0.175, p-value=0.074
If wealth (log of GDP per capita) increases by 1 unit, holding all the other
independent variables constant, the value of happiness will increase by 0.175,
significant at the 10% level.
• 𝛼_3=3.55, p-value=0.000
If social support increases by 1 unit, holding all the other independent variables
constant, the value of happiness will increase by 3.55, significant at the 1% level.
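The same reading can be applied to changes other than one unit. The sketch below uses the two coefficients and p-values quoted above; the helper names and the 0.1 change in social support are illustrative assumptions.

```python
# (coefficient, p-value) pairs quoted in the interpretation above.
COEFFICIENTS = {
    "wealth (log GDP per capita)": (0.175, 0.074),
    "social support": (3.55, 0.000),
}


def expected_change(coefficient, delta_x):
    """Change in predicted happiness when one X changes by delta_x,
    holding all the other independent variables constant."""
    return coefficient * delta_x


def significance_level(p_value):
    """Smallest conventional level (1%, 5%, 10%) at which the coefficient
    is significant, or None if it is not significant at 10%."""
    for level in (0.01, 0.05, 0.10):
        if p_value < level:
            return level
    return None


for name, (coef, p_value) in COEFFICIENTS.items():
    print(name, expected_change(coef, 1.0), significance_level(p_value))

# Hypothetical scenario: social support rises by 0.1 rather than a full unit.
print(expected_change(3.55, 0.1))
```

Scaling matters when comparing coefficients: a full unit of log GDP per capita and a full unit of social support are very different sizes of change.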
Should I include a new Xi variable?
• Some argue that a good regression model should include only
significant independent variables.
◦ But it is not always clear exactly what will happen when we add or remove variables
from a model: variables that are (or are not) significant in one model may (or may
not) be significant in another.
◦ We should not drop all insignificant variables at one time.
◦ We should take a more structured approach.
Should I include a new Xi variable?
• Using adjusted R-square
◦ Adding an independent variable to a regression model never decreases, and
usually increases, the value of R-square.
◦ Adjusted R-square reflects both the number of Xi variables and sample size.
◦ Adjusted R-square may either increase or decrease when an Xi variable is added
or dropped.
◦ An increase in adjusted R-square indicates the model has improved.
◦ But some prefer models that are simpler (i.e. having fewer Xi variables) when there
are only minor differences in the adjusted R-square scores.
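The trade-off in these bullets can be written as a simple comparison rule. This is a sketch under stated assumptions: the R-square values are hypothetical, and the `min_gain` tolerance is an arbitrary illustration of "minor differences", not a standard threshold.

```python
def adjusted_r_square(r_square, n, k):
    """Adjust R-square for sample size n and number of X variables k."""
    return 1 - (1 - r_square) * (n - 1) / (n - k - 1)


def prefer_larger_model(r2_small, k_small, r2_large, k_large, n, min_gain=0.005):
    """Keep the extra variable(s) only if adjusted R-square improves by
    more than min_gain (a hypothetical tolerance)."""
    gain = adjusted_r_square(r2_large, n, k_large) - adjusted_r_square(r2_small, n, k_small)
    return gain > min_gain


# Hypothetical case 1: R-square barely rises, so adjusted R-square falls.
print(prefer_larger_model(0.700, 3, 0.702, 4, n=30))

# Hypothetical case 2: the new variable explains much more variation.
print(prefer_larger_model(0.700, 3, 0.800, 4, n=30))
```

In the first case the raw R-square still goes up, which is why it cannot be used on its own to justify adding a variable.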
Systematic Model Building Approach
1. Consider causal graphs
2. Descriptive analysis and checking for outliers in both Y and X variables
3. Correlation matrix of all available variables
4. Construct a model with all available independent variables and examine the value of
coefficients and p-values for each coefficient.
5. If a variable's p-value > 10%, consider removing it and run step 4 again. You should check
adjusted R-square again.
6. Once the majority (or all) of the X variables are statistically significant and the signs of the
coefficients are consistent with expectations, then you are closer to a good model.
7. Check all assumptions (covered next week)
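Step 3 above can be sketched in plain Python as follows. The variable names and values are hypothetical, and `pearson` and `correlation_matrix` are illustrative helpers; in Excel the Correlation tool in the Analysis ToolPak serves the same purpose.

```python
import math


def pearson(a, b):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_matrix(data):
    """Return a nested dict: matrix[row][column] = correlation."""
    names = list(data)
    return {r: {c: pearson(data[r], data[c]) for c in names} for r in names}


# Hypothetical observations for five countries.
data = {
    "happiness": [4.5, 5.2, 6.1, 6.8, 7.3],
    "log_gdp": [7.5, 8.4, 9.2, 10.1, 10.8],
    "social_support": [0.62, 0.75, 0.70, 0.88, 0.93],
}

matrix = correlation_matrix(data)
for row, values in matrix.items():
    print(row, {col: round(v, 3) for col, v in values.items()})
```

High correlations between X variables are worth noting before step 4, because they can make individual coefficients and p-values unstable.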