
Topic 6 Linear Regression

Vincent Hoang (2022), Lecture 9, 10


Camm et al. (2016), Chapter 7
Outline
• Understanding causality
• Linear regression
Recall Topic 4: measures of association
• Correlation: two variables are said to have a strong statistical
relationship with one another if they appear to move together.
◦ Positive or negative relationships (direction of the relationship)
◦ Strong, moderate or weak relationships (strength of the relationship)
• If cor(x, y) is positive or negative (regardless of the strength of the
correlation), can we conclude that …
◦ x causes y? or
◦ y causes x? or
◦ something else causes both? or
◦ anything else?
Dependence or correlation?
• Dependence:
◦ Variables are dependent on each other if the value of one variable gives
information about the distribution of the other.
◦ What are the key statistics of a distribution, for example the normal distribution?

• Is a statistical correlation always meaningful, especially for prediction purposes (i.e. predictive analytics)?
• Remember that “correlation does not imply causation”
Causality
• Causality describes a relationship between two
(or more) things (phenomena, events, variables,
etc.) in which a change in one causes a change
in another.
• In this diagram, A causes B under certain conditions.
◦ So, if we observe an effect, we can infer that there is a cause prior to the effect.
◦ If there is a cause, the effect will not necessarily come about.
◦ But if a cause and all the other required conditions are present, it is very likely that the cause will produce its effect(s).
Causal thinking & business decision making
• Two related scenarios
1 Situational assessment
◦ Consider any business situation (i.e. business problem that needs to be
solved)
◦ We would like to assess that situation, then we often ask “how did that
happen?”
◦ Often used in Root Cause Analysis
2 Interventions
Advanced analytics & root cause analysis
• The machine learning model can be trained to
◦ analyse the equipment’s data output under regular “healthy” operating conditions,
◦ detect “anomalies” (i.e. any pattern of deviation from “healthy” conditions),
◦ predict the “behavioural” pattern of the anomaly, and
◦ send an alert if the predicted values exceed the “normal” threshold.

• Applications: early detection of safety issues and machine failures, more efficient electricity consumption, predicting quality deviations, adjusting processes to prevent material waste, etc.
Source: https://medium.datadriveninvestor.com/root-cause-analysis-in-the-age-of-industry-4-0-9516af5fb1d0
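A minimal Python sketch of the threshold-alert logic described above; the sensor values, the "normal" band, and the ±3 standard deviation rule are illustrative assumptions rather than part of the source material:

```python
import numpy as np

# Hypothetical sensor readings under "healthy" conditions and new observations.
rng = np.random.default_rng(0)
healthy = rng.normal(loc=50.0, scale=2.0, size=500)   # training data
new_readings = np.array([50.1, 49.7, 57.3, 61.2])     # incoming data

# Learn a simple "normal" band from the healthy data (mean +/- 3 standard deviations).
mean, std = healthy.mean(), healthy.std()
lower, upper = mean - 3 * std, mean + 3 * std

# Flag anomalies and "send" an alert when a reading falls outside the band.
for value in new_readings:
    if not (lower <= value <= upper):
        print(f"ALERT: reading {value:.1f} outside normal range [{lower:.1f}, {upper:.1f}]")
```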
Causality & interventions
• Important business decisions involve the use of limited (scarce) resources.
• The trade-off in the form of a resource-allocation decision:
◦ Should resources (time, equipment, land, …) be devoted to project A or project B?

• We can loosen a constraint but that typically requires other scarce resources.
• A decision on which objects to control or change (i.e. the managerial intervention)
typically precedes any decision on how to control or change them.
• Understanding causality is crucial to making effective interventions.
[Diagram: Intervention = running an advertising campaign; Objective = the quantity to be increased.]
Causal modelling
• Consider a decision on the purchase of new equipment. (In the diagram, an arrow (→) indicates a causal relationship and a “+” suggests a positive relationship.)
◦ Quality has two levels: high or low.
◦ High-quality equipment can perform more tasks, hence increasing production productivity, but its parts are more expensive.
◦ Maintenance cost: the greater the quality of the equipment, the more expensive the parts, hence the higher the maintenance cost.
Foundations for causal graphs
• Causal graphs are directed acyclic graphs (DAGs). They have
◦ A set of vertices (or nodes) representing variables in the model
◦ A set of edges (or links) representing the connections between variables
◦ Directed paths between nodes: an arrow shows the direction from a cause to its effect
◦ There are no cycles in a DAG.
Feedback loops & time dimensions
• Consider a relationship between joy and
physical exercise.
◦ Is there any causal relationship between them?
◦ If yes, which variable is cause and which is
effect?

• We can convert cycles into directed acyclic graphs in which we have a time dimension.
◦ At period 0: joy is a cause leading to more
exercise
◦ At period 1: feedback from exercise (period 0) to
joy (period 1)
Structures in causal graphs
• There are three building blocks:
◦ Chain: one variable (X) causes another (Y), which causes another (Z).
◦ Fork: one variable (X) causes two other variables (Y & Z); X is a common cause of both Y and Z.
◦ Collider: two variables (X, Y) cause a third (Z); Z is a common effect of both X and Y.
Chain
• Example: X learning efforts, Y employability, Z chance of getting a
job.
◦ Y depends on X for its value (hence X and Y are dependent)
◦ Z depends on Y for its value (hence Y and Z are dependent)
◦ Z depends on Y, which depends on X;
◦ hence X and Z are also dependent: the dependence of X and Z arises because Y is able to change.
◦ What if we hold Y constant (fixed)? Then changes in X are no longer linked to changes in Z.
Therefore, statistically we say that X and Z are conditionally independent given Y (illustrated in the sketch below).
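A small simulation sketch of the chain X → Y → Z; the linear relationships and coefficients are made up purely to illustrate conditional independence:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x = rng.normal(size=n)                 # X: e.g. learning effort
y = 2 * x + rng.normal(size=n)         # Y caused by X (employability)
z = 1.5 * y + rng.normal(size=n)       # Z caused by Y (chance of getting a job)

print("corr(X, Z) overall:", round(np.corrcoef(x, z)[0, 1], 3))

# Condition on Y by looking only at observations where Y is near a fixed value.
mask = np.abs(y - 1.0) < 0.05
print("corr(X, Z) given Y near 1:", round(np.corrcoef(x[mask], z[mask])[0, 1], 3))
```

The overall correlation is strong, while the correlation within the narrow Y band is close to zero, matching the conditional-independence claim above.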
Fork
• Example: X is temperature, Y sales of ice cream and Z sales of fan.
◦ Y depends on X for its value (X and Y are dependent)
◦ Z depends on X for its value (X and Z are dependent)
◦ We can still say that, statistically, Y and Z are dependent because changes in Y reflect changes in X, which lead to changes in Z.
◦ If you calculate correlation values, what would you expect?
◦ Again, correlation does not imply causation.
◦ It is easy to see that if X is held fixed, changes in Y are no longer linked to changes in Z.
Collider
• X is competence (at work), Y is networking, Z is promotion (at work).
• Both X and Y are causes of Z.
• X and Z (similarly Y and Z) are dependent.
• X and Y are independent: they neither cause the other nor have a common cause.
◦ However, statistically we can see that if we hold Z fixed and X changes, then Y must also change in a certain way. Why?
◦ Hence we say X and Y are conditionally dependent given Z.
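A similar simulation sketch for the collider; the variables and the promotion rule are invented for illustration, but they show why conditioning on Z induces dependence between X and Y:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
competence = rng.normal(size=n)              # X
networking = rng.normal(size=n)              # Y, generated independently of X
# Z (promotion) depends on both X and Y plus noise: a collider.
promoted = competence + networking + rng.normal(scale=0.5, size=n) > 1.0

print("corr(X, Y) overall:", round(np.corrcoef(competence, networking)[0, 1], 3))

# Condition on the collider: look only at people who were promoted.
print("corr(X, Y) among the promoted:",
      round(np.corrcoef(competence[promoted], networking[promoted])[0, 1], 3))
```

Overall the correlation is near zero, but among the promoted it turns clearly negative: high competence with low networking (or vice versa) is enough to clear the promotion threshold.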
Observed associations
• We can observe associations between two variables in the data.
• However, these associations can arise through two mechanisms:
◦ Causal associations
◦ Non-causal associations
• So again, correlation (association) does not imply causation.
Draw assumptions before
making conclusions!
• Consider 3 variables, how
many possible causal
models?
• Statistical association does
not imply causation.
• Hence, it is better to use
knowledge to draw
assumptions (causal graphs)
prior to making conclusions
regarding causality.
Causal modelling for market volume
• Suppose you are asked to make an assessment of the size of the
market for laptop computers.
• The following variables are relevant:
◦ Price: average price per unit
◦ Advertising: the amount of money spent on advertising products
◦ Number of Customers visiting the shop
◦ Media Hype: whether independent media sources report on or display related
products
◦ Market Volume: the total amount of goods sold for your product category
Price & Volume
• The causal relationship between Price & Volume?

• How about the Number of Customers visiting the shop and Volume?

• Any relationship between Price and Number of Customers?


Advertising & Volume
• Do you expect that higher advertising expenditure will lead to higher sales
(market volume)?

• But how about the impact of advertising and the number of customers on sales?
• Also, how about the effects of advertising and media hype on sales?
Causal model for assessing market volume
• Now we put all elements
together, this is our causal
model for situational
assessment.
• Note that there is no
(business) goal / objective
in terms of optimisation or
decision making.
• Rather it assesses how
causal factors affect the
market volume.
Causal modelling for Interventions
• Example 2: instead of doing a situational assessment, you are now
asked to decide how much to spend on advertising for these
products.
◦ You need to set an objective, e.g. high market share (the proportion of sales
through your retailers to the total number sold).
◦ So the decision variable is “Advertise”.
◦ Simplify the intervention decision: (1) run an advertising campaign or (2) do not run one.
◦ Further, assume that you will know the price at the time you set “Advertise”.
Influence diagram
• Often, rectangle shape refers to strategic
option (i.e. decision point, choice variable,
value directly controlled by a strategic
agent – decision making agent)
• Hexagon shape refers to objective (e.g.
profit, value, market share, etc.). Decisions
are made to optimise the objective.
• Circle shape refers to probabilistic
variables that are chance variables,
uncertain quantities, environmental factors
and other elements outside the direct
control of strategic agents.
[Diagram annotations:]
◦ [+] More Advertising leads to a greater certainty of a larger number of unit sales.
◦ [+] Higher Sales lead to a greater certainty of a high Market Share.
◦ [−] Higher Price leads to a greater certainty of a smaller number of unit sales.
◦ The link from Price to Advertise is only an informational link: you know the price when deciding on Advertise.
From causal diagrams to mathematical
equations
• The simplest form of empirical model would be a regression model such as
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_5 x_5 + \varepsilon$
where $y$ is the dependent variable, $x_1, \dots, x_5$ are the independent variables, the $\beta$s are the coefficients with respect to each independent variable $x$, and $\varepsilon$ is the error term.
• This equation fails to capture the actual relationships among the independent variables ($x_1$ to $x_5$).
Shortcomings
• Consider X1, X2 and X4: the associations among these variables are clear, hence we say that this model suffers from a multicollinearity problem.
• Also, we cannot use standard
significance tests to reliably
determine which independent
variables exert the most
influence.
A solution (not discussed further in this unit)
• It is possible to use a structural equation model (SEM) (stepwise regression) via a two-stage regression.
• Stage 1
• Stage 2: using the estimated value of the independent variable obtained from the stage 1 regression.
Summaries
• Causal relationships are crucial for (1) situational assessments and (2)
interventions, as part of business analytics.
• If there is a cause-and-effect relationship between two variables x and y, there is
statistical association.
• But (statistical) correlation/association does not necessarily imply causation.
• Causal thinking and graphs are very useful because
◦ They capture both causality and statistical association
◦ They assist with both situational assessment and intervention tasks in business analytics
◦ From a managerial perspective, they allow identification of relevant stakeholders (agents, people,
departments, etc.) involved in analytics projects as well as resource allocation.
Analytics & Happiness
• What values do business analytics deliver?
◦ Happiness/satisfaction matters in every corner of our lives: overall life, work,
school, business, etc.
◦ Overall aims are to increase satisfaction.
◦ Situational analysis informs interventions: how?

• Our use of the happiness case study is to illustrate regression analysis.
Your satisfaction (happiness) matters!
• Discuss the following questions from your own experience and
knowledge
◦ What makes you happy = what are the causes of your own happiness?
◦ What makes you sad = what are the causes of your own sadness?

• Draw a causal graph (with directed paths)
Happiness
and Income

Source: World Happiness Report 2024


Life Satisfaction & Income across Countries in 2023
[Scatter plot: Life Ladder (y-axis) against Log GDP per capita (x-axis). Let’s plot the data; we can add a trendline in Excel. Trendline: y = 0.8263x + 4.8509, R² = 0.6127.]

A linear relationship:
The happiness score is the dependent variable; Log GDP per capita is the independent variable.
Excel Trendline Tool
• Right-click on the data series and choose Add Trendline from the pop-up menu.
• Check the boxes Display Equation on chart and Display R-squared value on chart.
Simple linear regression using least squares
• Simple linear regression model:
$Y = \beta_0 + \beta_1 X + \varepsilon$   (8.1)
• We estimate the parameters (the βs) from the sample data:
$\hat{Y} = b_0 + b_1 X$   (8.2)
• Once estimated, we can
◦ assess/explain whether X is an important factor explaining Y,
◦ “predict” the value of Y given a specific value of X:
$\hat{Y}_i = b_0 + b_1 X_i$
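A minimal sketch of estimating $b_0$ and $b_1$ outside Excel, using numpy with made-up data standing in for the happiness example:

```python
import numpy as np

# Made-up data: log GDP per capita (X) and happiness scores (Y).
x = np.array([7.5, 8.2, 9.0, 9.4, 10.1, 10.8, 11.2])
y = np.array([4.0, 4.6, 5.2, 5.6, 6.3, 6.9, 7.1])

# np.polyfit with degree 1 returns the least-squares slope and intercept.
b1, b0 = np.polyfit(x, y, deg=1)
print(f"Y_hat = {b0:.3f} + {b1:.3f} * X")

# Predict Y for a given X, e.g. X = 9.392.
print("Prediction at X = 9.392:", round(b0 + b1 * 9.392, 3))
```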
Least squares regression
• Residuals are the observed errors associated with estimating the value of the dependent variable using the regression line:
$e_i = Y_i - \hat{Y}_i$   (8.3)
• The best-fitting line minimizes the sum of squares of the residuals.
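For reference, minimizing the sum of squared residuals yields the standard closed-form estimates (a textbook result, not shown on the slide):
$b_1 = \dfrac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2}, \qquad b_0 = \bar{Y} - b_1 \bar{X}$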
Simple Linear Regression with Excel
• Using Analysis Toolpak:
◦ Data > Data Analysis > Regression
Results: Regression Statistics (metrics)
• Multiple R:
◦ sample correlation coefficient
◦ varies from 0 to 1

• R Square:
◦ coefficient of determination
◦ varies from 0 (no fit) to 1 (perfect fit)

• Adjusted R Square:
◦ R Square adjusted for sample size and the number of X variables

• Standard Error: the variability between observed and predicted Y values
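The adjustment mentioned above follows the usual formula, where $n$ is the sample size and $k$ the number of X variables:
$\bar{R}^2 = 1 - (1 - R^2)\,\dfrac{n - 1}{n - k - 1}$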
Results: Regression Statistics (metrics)
• df: degrees of freedom

• SS: sum of squares

• MS: mean square (SS divided by df)
Interpreting Regression Statistics
• R Square = 0.613 means that 61.3% of the variation in the happiness level is explained by the model, in this case by the log of income per capita.
• The remaining 38.7% (100% − 61.3%) is UNEXPLAINED.
• Adjusted R Square is often used.
F-test (Analysis of Variance)
• ANOVA conducts an F-test to determine whether variation in Y is due to varying levels of X.
• ANOVA is used to test for significance of regression:
◦ H0: population slope coefficient = 0
◦ H1: population slope coefficient ≠ 0

• Excel reports the p-value (Significance F).


• Rejecting H0 indicates that X explains variation in Y.
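For reference, the F-statistic behind the Significance F value is the ratio of explained to unexplained variance ($k$ = number of X variables, $n$ = sample size):
$F = \dfrac{MSR}{MSE} = \dfrac{SSR / k}{SSE / (n - k - 1)}$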
Interpreting Coefficients
• Intercept: often not important
• Log GDP per capita: 3 elements
$Y = \beta_0 + \beta_1 X + \varepsilon$   (8.1)
◦ Direction of the relationship: a positive value
◦ Magnitude of the relationship: 0.742, meaning that for each one-point increase in Log GDP per capita, the happiness level increases by 0.742
◦ Statistical strength of the relationship:
Interpreting Coefficients
• Log GDP per capita:
◦ The statistical strength of the relationship can be assessed using hypothesis testing.
$Y = \beta_0 + \beta_1 X + \varepsilon$   (8.1)
Testing Hypotheses for Regression Coefficients
• We would like to test whether the coefficient on log(GDP) is statistically different from zero.
• If the coefficient (β1) = 0, what does this mean?
• If the coefficient (β1) ≠ 0, what does this mean? (You should also consider one-tailed tests.)
◦ Test statistic: $t = \dfrac{b_1 - 0}{\text{standard error of } b_1}$   (8.8)
◦ P-value approach
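A short sketch of computing the t statistic and its two-tailed p-value by hand; the standard error and sample size below are assumed for illustration and are not taken from the Excel output:

```python
from scipy import stats

b1 = 0.742      # estimated slope (from the example output)
se_b1 = 0.052   # illustrative standard error (assumed for this sketch)
n = 140         # illustrative sample size (assumed)

t_stat = (b1 - 0) / se_b1
p_two_tailed = 2 * stats.t.sf(abs(t_stat), df=n - 2)
print(f"t = {t_stat:.2f}, two-tailed p-value = {p_two_tailed:.4f}")
```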
Interpreting Coefficients
• Log GDP per capita:
◦ We can use the p-value to assess a two-tailed test.
$Y = \beta_0 + \beta_1 X + \varepsilon$   (8.1)
◦ H0: β1 = 0 vs H1: β1 ≠ 0
◦ In this example, the p-value is nearly zero (< 5%), hence there is sufficient evidence to conclude that the true β1 is not zero. This means that there exists a relationship between the happiness level and the log of GDP per capita.
◦ We can also conduct a one-tailed test.
Confidence Intervals for Regression Coefficient
• Confidence intervals (Lower
95% and Upper 95% values in
the output) provide information
about the unknown values of
the true regression
coefficients, accounting for
sampling error.
• For this example, a 95%
confidence interval for the
income variable is
[0.638;0.845].
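These bounds follow the usual construction around the estimate, where $t_{\alpha/2,\,n-2}$ is the critical t value:
$b_1 \pm t_{\alpha/2,\,n-2}\,\mathrm{SE}(b_1)$
(For this example, 0.742 plus or minus roughly 1.98 times an implied standard error of about 0.052 gives approximately [0.638, 0.845].)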
Prediction
• Intercept: often not important

• If you know the value of Log GDP per capita (e.g. 7), you can predict the value of the happiness level.
$Y = \beta_0 + \beta_1 X + \varepsilon$   (8.1)

• Predicted happiness level for Vietnam = −1.411 + 0.742 × 9.392 = 5.558
Confidence Intervals & Prediction
• Although we predicted for Vietnam: −1.411 + 0.742 × 9.392 = 5.558,
• if the true population parameters are at the extremes of the confidence intervals, the estimate might be as low as −1.411 + 0.638 × 9.392 = 4.581 or as high as −1.411 + 0.845 × 9.392 = 6.525.
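A tiny sketch reproducing the point prediction and the coefficient-bound range above:

```python
b0 = -1.411
b1_point, b1_low, b1_high = 0.742, 0.638, 0.845
x_vietnam = 9.392  # log GDP per capita

predict = lambda b1: b0 + b1 * x_vietnam
print("point prediction:", round(predict(b1_point), 3))  # about 5.558
print("lower bound     :", round(predict(b1_low), 3))    # about 4.581
print("upper bound     :", round(predict(b1_high), 3))   # about 6.525
```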
Residual analysis
• Residual = Actual Y value − Predicted Y value

• Standard residual = residual / standard deviation (of the residuals)

• Outliers: standard residuals outside ±2 or ±3 indicate potential outliers.
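A minimal sketch of flagging potential outliers from standard residuals; the actual and predicted values are illustrative stand-ins for the regression output:

```python
import numpy as np

# Illustrative actual and predicted values (stand-ins for the regression output).
y_actual = np.array([5.6, 4.9, 6.1, 7.2, 3.0, 6.4])
y_predicted = np.array([5.4, 5.1, 6.0, 6.6, 5.2, 6.3])

residuals = y_actual - y_predicted
standard_residuals = residuals / residuals.std(ddof=1)

# Flag observations whose standard residual is outside +/- 2.
for i, z in enumerate(standard_residuals):
    if abs(z) > 2:
        print(f"Observation {i}: standard residual {z:.2f} -> potential outlier")
```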


Residual Outputs – Residual Plot

[Residual plot: residuals (y-axis) plotted against Log GDP per capita (x-axis).]
What are the drawbacks of simple linear
regression models?
• Consider the case of happiness:
Multiple linear regression
• Consider the case study of happiness (at national level)
• What are possible “causes” and/or “factors” that can explain
variations in the Happiness level across countries?
Multiple linear regression
• A linear regression model with more than one independent variable is
called a multiple linear regression model.
Y   0  1 X 1   2 X 2     k X k   8.10 
◦ Y is the dependent variable, Xi are the independent (explanatory) variables;
◦ βi are the regression coefficients for the independent variables, ε the error term.

• We estimated the particle regression coefficients bi

Yˆ  b0  b1 X 1  b2 X 2    bk X k 8.11
A causal graph
(We are ignoring here possible causal and statistical relationships among the independent variables.)

[Causal graph: Wealth, Freedom, Social support, Health, Generosity, Perception of corruption, and Others all point to Happiness; most links are marked positive (+), with Perception of corruption marked negative (−).]
[Scatter plots: Happiness (Cantril Ladder) against each explanatory variable. Variable definitions:]
◦ Social support: “If you were in trouble, do you have relatives or friends you can count on to help you whenever you need them, or not?”
◦ Health: the time series of healthy life expectancy at birth.
◦ Freedom: “Are you satisfied or dissatisfied with your freedom to choose what you do with your life?”
◦ Generosity: the national average of GWP responses to the question, “Have you donated money to a charity in the past month?”, on GDP per capita.
◦ Perception of corruption: the average of binary answers to two GWP questions, “Is corruption widespread throughout the government or not?” and “Is corruption widespread within businesses or not?” Where data for government corruption are missing, the perception of business corruption is used as the overall corruption-perception measure.
Multiple regression

[Regression output highlights:]
• Larger adjusted R-squared
• F-test with a p-value of 0
• Every X variable has a two-tailed t-test (p-value) reported in its column
ANOVA for Multiple Regression
• ANOVA tests for significance of the entire model. That is, it computes
an F-statistic testing the hypotheses:
H 0 : 1   2     k  0
H1 : at least one  j is not 0

P-value = 0.000
Reject H0
Interpreting the coefficients
• α₂ = 0.175, p-value = 0.074
◦ If wealth (log of GDP per capita) increases by 1 unit, holding all the other independent variables constant, the value of happiness will increase by 0.175, significant at the 10% level.
◦ Equivalently, a 1% increase in GDP per capita will increase the happiness score by about 0.175/100 (0.00175), significant at the 10% level.

• α₃ = 3.55, p-value = 0.000
◦ If social support increases by 1 unit, holding all the other independent variables constant, the value of happiness will increase by 3.55, significant at the 1% level.
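The “1%” reading above uses the standard log-level approximation (a worked restatement of the arithmetic, not extra output): holding the other variables constant,
$\Delta \hat{Y} \approx \alpha_2 \,\Delta \ln(\text{GDP}) = \alpha_2 \,\dfrac{\Delta \text{GDP}}{\text{GDP}}$
so a 1% rise in GDP per capita changes predicted happiness by about $\alpha_2 / 100 = 0.175/100 = 0.00175$.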
Should I include a new Xi variable?
• Some argue that a good regression model should include only
significant independent variables.
◦ But it is not always clear exactly what will happen when we add or remove variables from a model: variables that are (or are not) significant in one model may (or may not) be significant in another.
◦ Do not drop all insignificant variables at one time;
◦ take a more structured approach instead.
Should I include a new Xi variable?
• Using adjusted R-square
◦ Adding an independent variable to a regression model often increases the value of R-square.
◦ Adjusted R-square reflects both the number of Xi variables and the sample size.
◦ Adjusted R-square may either increase or decrease when an Xi variable is added or dropped.
◦ An increase in adjusted R-square indicates the model has improved.
◦ But some prefer simpler models (i.e. with fewer Xi variables) when there are only minor differences in the adjusted R-square scores.
Systematic Model Building Approach
1. Consider causal graphs
2. Descriptive analysis & checking for outliers in both Y and X variables
3. Correlation matrix of all available variables
4. Construct a model with all available independent variables and examine the value of
coefficients and p-values for each coefficient.
5. If a coefficient’s p-value is greater than 10%, consider removing that variable and run step 4 again; check the adjusted R-square again as well (see the sketch after this list).
6. Once the majority (or all) of the X variables are statistically significant and the signs of the coefficients are consistent with expectations, you are closer to a good model.
7. Check all assumptions (next week’s learning)
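A rough sketch of the backward-elimination loop implied by steps 4 and 5, assuming a pandas DataFrame df with a dependent-variable column and candidate X columns; this is a simplification of the systematic approach, not a replacement for judgement about causal graphs and expected signs:

```python
import pandas as pd
import statsmodels.api as sm

def backward_eliminate(df: pd.DataFrame, y_col: str, x_cols: list, p_threshold: float = 0.10):
    """Drop the least significant X variable one at a time until all p-values <= threshold."""
    x_cols = list(x_cols)
    while x_cols:
        X = sm.add_constant(df[x_cols])
        model = sm.OLS(df[y_col], X).fit()
        pvalues = model.pvalues.drop("const")      # ignore the intercept
        worst = pvalues.idxmax()
        if pvalues[worst] <= p_threshold:
            return model                           # all remaining X variables are significant
        x_cols.remove(worst)                       # step 5: drop and re-run step 4
    return None

# Usage (hypothetical column names):
# final_model = backward_eliminate(df, "happiness", ["log_gdp", "social_support", "freedom"])
```

After each drop, the adjusted R-square and the signs of the remaining coefficients should still be inspected, as in steps 5 and 6.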
