Topic 6 Understanding Causality and Regression

Topic 6 Linear Regression

Vincent Hoang (2022), Lecture 9, 10


Camm et al. (2016), Chapter 7
Recall Topic 4: measures of association
• Correlation: two variables are said to have a strong statistical relationship
with one another if they appear to move together.
◦ Positive or negative relationships (direction of the relationship)
◦ Strong, moderate or weak relationships (strength of the relationship)

• If cor(x,y) is positive or negative (regardless of the strength of the correlation),
can we conclude …
◦ x causes y? or
◦ y causes x? or
◦ something else causes both? or
◦ anything else?
Dependence or correlation?
• Dependence:
◦ Variables are dependent on each other if the value of one variable gives
information about the distribution of the other.
◦ What are key statistics of a distribution? For example normal distribution?

• Is statistical correlation always meaningful, especially for
prediction purposes? (i.e. predictive analytics)
• Remember that “correlation does not imply causation”
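A quick simulation makes this concrete. In the made-up example below (all variable names and coefficients are invented for illustration), hot weather drives both ice-cream sales and electricity use, so the two series are strongly correlated even though neither causes the other:

```python
import random

random.seed(42)

# A hidden common cause: daily temperature
temperature = [random.gauss(25, 5) for _ in range(10_000)]

# Two variables that both depend on temperature but not on each other
ice_cream = [0.8 * t + random.gauss(0, 1) for t in temperature]
electricity = [1.5 * t + random.gauss(0, 1) for t in temperature]

def corr(x, y):
    """Pearson correlation coefficient, computed from first principles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

r = corr(ice_cream, electricity)
print(f"cor(ice cream, electricity) = {r:.2f}")  # strong, yet neither causes the other
```

The correlation is near 1 purely because of the shared cause, which is exactly why correlation alone cannot establish causation.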
Causality
• Causality describes a relationship between two (or
more) things (phenomena, events, variables, etc.) in
which a change in one causes a change in another.
• In this diagram, A causes B under certain
conditions.
◦ So, if we observe an effect, we can necessarily infer that there is a cause prior
to the effect.
◦ If there is a cause, the effect will not necessarily come about.
◦ But if a cause and all the other required conditions are in place, it is very
likely that the cause will produce its effect(s).
Causal thinking & business decision making
• Two related scenarios
1 Situational assessment
◦ Consider any business situation (i.e. business problem that needs to be
solved)
◦ We would like to assess that situation, then we often ask “how did that
happen?”
◦ Often used in Root Cause Analysis
2 Interventions
Advanced analytics & root cause analysis
• A machine learning model can be trained to
◦ analyse the equipment’s data output under regular “healthy” operating
conditions,
◦ detect “anomalies” (i.e. any pattern of deviation from “healthy” conditions),
◦ predict the “behavioural” pattern of the anomaly, and
◦ send an alert if the predicted values exceed the “normal” threshold.

• Applications: early detection of safety issues, machine failures, more
efficient electricity consumption, predicting quality deviations,
adjusting processes to prevent material waste, etc.
Source: https://medium.datadriveninvestor.com/root-cause-analysis-in-the-age-of-industry-4-0-9516af5fb1d0
Causality & interventions
• Important business decisions involve the use of limited (scarce) resources.
• The trade-off in the form of a resource-allocation decision:
◦ Should resources (time, equipment, land, …) be devoted to project A or project B?

• We can loosen a constraint but that typically requires other scarce resources.
• A decision on which objects to control or change (i.e. a managerial
intervention) typically precedes any decision on how to control or change
them.
• Understanding causality is crucial to making effective interventions.
[Diagram: the intervention is running an advertising campaign; the objective is the outcome we want to increase]
Causal modelling
• Consider a decision on the purchase of new equipment. (In the diagram, an
arrow → denotes a causal relationship; + suggests a positive relationship.)
◦ Quality has two levels: high or low.
◦ High-quality equipment can perform more tasks, hence increases production
productivity, but its parts are more expensive.
◦ Maintenance cost: the greater the quality of the equipment, the more
expensive the parts, hence the higher the maintenance cost.
Foundations for causal graphs
• Causal graphs are directed acyclic
graphs (DAGs). They have
◦ a set of vertices (or nodes) representing
the variables in the model,
◦ a set of edges (or links) representing the
connections between variables,
◦ directed paths between nodes: an arrow
points from a cause to its effect.
◦ There are no cycles in a DAG.
Feedback loops & time dimensions
• Consider a relationship between joy and
physical exercise.
◦ Is there any causal relationship between them?
◦ If yes, which variable is cause and which is
effect?

• We can convert cycles into directed acyclic
graphs by adding a time dimension.
◦ At period 0: joy is a cause leading to more
exercise.
◦ At period 1: feedback from exercise (period 0) to
joy (period 1).
Structures in causal graphs
• There are three building blocks:
◦ Chain: one variable (X) causes another (Y), which causes another (Z).
◦ Fork: one variable (X) causes two other variables (Y & Z). X is a common
cause of both Y and Z.
◦ Collider: two variables (X, Y) cause a third (Z). Z is a common effect of
both X & Y.
Chain
• Example: X learning efforts, Y employability, Z chance of getting a
job.
◦ Y depends on X for its value (hence X and Y are dependent)
◦ Z depends on Y for its value (hence Y and Z are dependent)
◦ Z depends on Y, which depends on X,
◦ hence X and Z are also dependent: the dependence of X and Z arises because Y is free to change.
◦ What if we hold Y constant (fixed)? Then changes in X are not linked to changes in Z.
Therefore, statistically, we say that X and Z are conditionally independent given Y.
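This conditional independence can be checked numerically. The sketch below (simulated data with invented coefficients) builds the chain effort → employability → job chance from the example and compares the raw correlation of X and Z with their correlation after Y's linear influence is regressed out:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Chain from the example: learning effort -> employability -> chance of a job
effort = rng.normal(size=n)                            # X
employability = 2.0 * effort + rng.normal(size=n)      # Y depends on X
job_chance = 1.5 * employability + rng.normal(size=n)  # Z depends only on Y

def residual(a, b):
    """Residual of regressing a on b (removes b's linear influence)."""
    slope, intercept = np.polyfit(b, a, 1)
    return a - (intercept + slope * b)

r_marginal = np.corrcoef(effort, job_chance)[0, 1]
r_given_y = np.corrcoef(residual(effort, employability),
                        residual(job_chance, employability))[0, 1]

print(f"cor(X, Z)     = {r_marginal:.2f}")  # clearly non-zero
print(f"cor(X, Z | Y) = {r_given_y:.2f}")   # close to zero
```

Holding Y fixed (here via residuals) breaks the X–Z association, exactly as the chain structure predicts.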
Fork
• Example: X is temperature, Y sales of ice cream and Z sales of fan.
◦ Y depends on X for its value (X and Y are dependent)
◦ Z depends on X for its value (X and Z are dependent)
◦ We can still say that (statistically) Y and Z are (statistically) dependent because
changes in Y reflect changes in X which lead to changes in Z.
◦ If you calculate correlation values, what would you expect?
◦ Again correlation does not imply causation.
◦ It is easy to see that if we hold X fixed, changes in Y are no longer linked to
changes in Z.
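The same numerical check applies to the fork. In the sketch below (simulated data, invented coefficients), temperature drives both ice-cream and fan sales; the two sales series are correlated, but conditioning on temperature makes the association vanish:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Fork from the example: temperature is a common cause of both sales series
temperature = rng.normal(size=n)                    # X
ice_cream = 1.2 * temperature + rng.normal(size=n)  # Y
fan_sales = 0.9 * temperature + rng.normal(size=n)  # Z

def residual(a, b):
    """Residual of regressing a on b (removes b's linear influence)."""
    slope, intercept = np.polyfit(b, a, 1)
    return a - (intercept + slope * b)

r_marginal = np.corrcoef(ice_cream, fan_sales)[0, 1]
r_given_x = np.corrcoef(residual(ice_cream, temperature),
                        residual(fan_sales, temperature))[0, 1]

print(f"cor(Y, Z)     = {r_marginal:.2f}")  # positive, but not causal
print(f"cor(Y, Z | X) = {r_given_x:.2f}")   # close to zero once X is held fixed
```

The positive correlation between the two sales series is entirely spurious: it disappears once the common cause is held fixed.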
Collider
• X is competence (at work), Y is Networking , Z: Promotion (at work)
• Both X and Y are causes of Z
• X and Z (similarly Y and Z) are dependent
• X and Y are independent: they neither cause the other nor have a
common cause.
◦ However, statistically we can see that if we hold Z fixed, then if X changes, Y
must also change in a certain way. Why?
◦ Hence we say X and Y are conditionally dependent given Z.
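The collider behaves in the opposite way, which the sketch below demonstrates (the promotion threshold and the sample sizes are invented for illustration): competence and networking are generated independently, yet among people who were promoted they become negatively associated.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

# Collider: X and Y independently cause Z
competence = rng.normal(size=n)               # X
networking = rng.normal(size=n)               # Y, independent of X
promoted = competence + networking > 1.0      # Z: promotion needs a high combined score

r_all = np.corrcoef(competence, networking)[0, 1]
r_promoted = np.corrcoef(competence[promoted], networking[promoted])[0, 1]

print(f"cor(X, Y) overall        = {r_all:.2f}")       # near zero: independent
print(f"cor(X, Y) among promoted = {r_promoted:.2f}")  # negative: dependent given Z
```

Why? Within the promoted group, anyone with low competence must have had high networking to clear the promotion bar, so the two variables become negatively related once we condition on the common effect.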
Observed associations
• We can observe associations between two variables in the data.
• However, these associations arise from two mechanisms
◦ Causal associations
◦ Non-causal associations

• So again, correlation (association) does not imply causation.


Draw assumptions before
making conclusions!
• Consider 3 variables: how
many possible causal models are there?
• Statistical association does
not imply causation.
• Hence, it is better to use
knowledge to draw
assumptions (causal graphs)
prior to making conclusions
regarding causality.
Causal modelling for market volume
• Suppose you are asked to make an assessment of the size of the
market for laptop computers.
• The following variables are relevant:
◦ Price: average price per unit
◦ Advertising: the amount of money spent on advertising products
◦ Number of Customers visiting the shop
◦ Media Hype: whether independent media sources report on or display related
products
◦ Market Volume: the total amount of goods sold for your product category
Price & Volume
• The causal relationship between Price & Volume?

• How about the Number of Customers visiting the shop and Volume?

• Any relationship between Price and Number of Customers?


Advertising & Volume
• Do you expect that higher advertising expenditure will lead to higher sales
(market volume)?

• But how about the impact of advertising and number of customers on sales?

• Also how about the effects of advertising and media hype on sales?
Causal model for assessing market volume
• Now we put all elements
together, this is our causal
model for situational
assessment.
• Note that there is no (business)
goal / objective in terms of
optimisation or decision
making.
• Rather, it assesses how causal
factors affect the market volume.
Causal modelling for Interventions
• Example 2: instead of doing a situational assessment, you are now
asked to decide how much to spend on advertising for these
products.
◦ You need to set an objective, e.g. high market share (the proportion of sales
through your retailers to the total number sold).
◦ So the decision variable is “Advertise”.
◦ Simplify the intervention decision: (1) run an advertising campaign or (2) do not.
◦ Further simplify that you will know the price at the time you set “Advertise”.
Influence diagram
• Often, rectangle shape refers to strategic
option (i.e. decision point, choice variable,
value directly controlled by a strategic agent –
decision making agent)
• Hexagon shape refers to the objective (e.g.
profit, value, market share, etc.). Decisions are
made to optimise the objective.
• Circle shape refers to probabilistic variables
that are chance variables, uncertain
quantities, environmental factors and other
elements outside the direct control of strategic
agents.
[+] More advertising leads to a greater certainty of a larger number of unit sales.
[+] Higher sales lead to a greater certainty of a high market share.
[-] Higher price leads to a greater certainty of a smaller number of unit sales.
The link from Price to Advertise is only an informational link: you know the price when deciding on Advertise.
From causal diagrams to mathematical equations
• The simplest form of empirical model would be a regression model of the form
Y = β₀ + β₁x₁ + ⋯ + β₆x₆ + ε,
where Y is the dependent variable, the β coefficients correspond to the
independent variables x₁ … x₆, and ε is called the error term.
• This equation fails to capture the actual relationships among the
independent variables (x₁ … x₆).
Shortcomings
• Consider X1, X2 and X4:
associations among these
variables are clear, hence we call
that this model suffers from
multicollinearity problem.
• Also, we cannot use standard
significance tests to reliably
determine which independent
variables exert the most influence.
A solution (not discussed further in this unit)
• It is possible to use a structural equation model (SEM), estimated
via a two-stage regression.
• Stage 1: regress the problematic independent variable on its causes.
• Stage 2: use the estimated value of that independent variable, obtained
from the stage-1 regression, in the main equation.
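A minimal numerical sketch of the two-stage idea, assuming a simple simulated structure (the variables, coefficients, and data are all invented and are not the unit's worked example): stage 1 regresses the troublesome independent variable on its own cause, and stage 2 replaces it with its fitted values.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000

# Simulated causal structure: w causes x, and x causes y (true effect of x on y = 2.0)
w = rng.normal(size=n)
x = 1.5 * w + rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)

# Stage 1: regress x on its cause w, keep the fitted values
A1 = np.column_stack([np.ones(n), w])
b1, *_ = np.linalg.lstsq(A1, x, rcond=None)
x_hat = A1 @ b1

# Stage 2: regress y on the stage-1 fitted values of x
A2 = np.column_stack([np.ones(n), x_hat])
b2, *_ = np.linalg.lstsq(A2, y, rcond=None)

print(f"stage-2 slope estimate: {b2[1]:.2f}")  # close to the true effect of 2.0
```

The stage-2 slope recovers the structural effect of x on y because only the part of x explained by its cause w is used.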
Summaries
• Causal relationships are crucial for (1) situational assessments and (2)
interventions, as part of business analytics.
• If there is a cause-and-effect relationship between two variables x and y, there is
statistical association.
• But (statistical) correlation/association does not necessarily imply causation.
• Causal thinking and graphs are very useful because
◦ They capture both causality and statistical association
◦ They assist with both situational assessment and intervention tasks in business analytics
◦ From a managerial perspective, they allow identification of the relevant stakeholders (agents,
people, departments, etc.) involved in analytics projects, as well as resource allocation.
Analytics & Happiness
• What value does business analytics deliver?
◦ Happiness/satisfaction matters in every corner of our lives: overall life, work,
school, business, etc.
◦ The overall aim is to increase satisfaction.
◦ Situational analysis informs interventions: how?

• We use the happiness case study to illustrate regression analysis.


Your satisfaction (happiness) matters!
• Discuss the following questions from your own experience and
knowledge
◦ What makes you happy = what are the causes of your own happiness?
◦ What makes you sad = what are the causes of your own sadness?

• Draw a causal graph (with directed paths)


Happiness
and Income

Source: World Happiness Report 2024


Life Satisfaction & Income across Countries in 2023
• Let’s plot the data. We can add a trendline in Excel.
[Scatter plot: Life Ladder (vertical axis) against Log GDP per capita (horizontal axis, roughly 3.000–8.000), with the fitted trendline f(x) = 0.826x + 4.851 and R² = 0.613]
• A linear relationship: the Happiness score is the dependent variable;
Log GDP per capita is the independent variable.
Excel Trendline Tool
• Right-click on the data series and choose Add Trendline from the pop-up menu.
• Check the boxes Display Equation on chart and Display R-squared value on chart.
Simple linear regression using least-square
• Simple linear regression model:
Y = β₀ + β₁X + ε   (8.1)
• We estimate the parameters (βs) from the sample data:
Ŷ = b₀ + b₁X   (8.2)
• Once estimated, we can
◦ assess/explain whether X is an important factor explaining Y,
◦ “predict” the value of Y given a specific value of X: Ŷᵢ = b₀ + b₁Xᵢ
Least square regression
• Residuals are the observed errors associated with estimating the value of the
dependent variable using the regression line:
eᵢ = Yᵢ − Ŷᵢ   (8.3)
• The best-fitting line minimizes the sum of
squares of the residuals.
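The least-squares line can be computed directly from these formulas. A small sketch with made-up (x, y) observations, not the happiness dataset:

```python
import numpy as np

# Made-up (x, y) observations for illustration
x = np.array([3.2, 4.1, 5.0, 5.8, 6.7, 7.5, 8.3])
y = np.array([3.9, 4.6, 5.2, 5.7, 6.5, 6.9, 7.8])

# Least-squares estimates: b1 = S_xy / S_xx, b0 = ybar - b1 * xbar
xbar, ybar = x.mean(), y.mean()
b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b0 = ybar - b1 * xbar

y_hat = b0 + b1 * x    # fitted values, equation (8.2)
residuals = y - y_hat  # e_i = Y_i - Yhat_i, equation (8.3)

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"sum of residuals = {residuals.sum():.6f}")  # ~0 by construction
```

The residuals sum to (essentially) zero because the least-squares line always passes through (x̄, ȳ).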
Simple Linear Regression with Excel
• Using Analysis Toolpak:
◦ Data > Data Analysis > Regression
Results: Regression Statistics (metrics)
• Multiple R:
◦ sample correlation coefficient
◦ varies from -1 to + 1
◦ negative if slope is negative

• R Square:
◦ coefficient of determination
◦ varies from 0 (no fit) to 1 (perfect fit)

• Adjusted R Square:
◦ Adjusted R square for sample size
and number of X variables.

• Standard error: variability between
observed and predicted Y values
Interpreting Regression Statistics
• R square = 0.613 means that
61.3% of the variation in the
happiness level is explained
by the model, in this case by
the log value of income per
capita.
• The remaining 38.7% (100% −
61.3%) is UNEXPLAINED.
• Adjusted R-square is often
used.
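R² and adjusted R² follow directly from the residuals. A sketch using the least-squares formulas on illustrative data (not the World Happiness figures):

```python
import numpy as np

# Illustrative, roughly linear data
x = np.array([3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0])
y = np.array([4.2, 4.9, 5.1, 6.0, 6.3, 7.1, 7.2, 8.1])

# Fit by least squares
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

# R^2 = 1 - SSE/SST: share of the variation in y explained by the model
sse = np.sum((y - y_hat) ** 2)
sst = np.sum((y - y.mean()) ** 2)
r2 = 1 - sse / sst

# Adjusted R^2 penalises for sample size n and number of predictors k
n, k = len(y), 1
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")
```

Adjusted R² is always at most R², and the gap widens as predictors are added or the sample shrinks.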
F-test (Analysis of Variance)
• ANOVA conducts an F - test to determine whether variation in Y is
due to varying levels of X.
• ANOVA is used to test for significance of regression:
◦ H0: population slope coefficient = 0
◦ H1: population slope coefficient ≠ 0

• Excel reports the p-value (Significance F).
• Rejecting H0 indicates that X explains variation in Y.
Interpreting Coefficients
• Intercept: often not important
• Log GDP per capita: 3
Y   0  1 X   8.1
elements
◦ Direction of the relationship: positive
value
◦ The magnitude of the relationship:
0.742, meaning that for each one-point
increase in the Log GDP per capita, the
happiness level increase by 0.742.
◦ Statistical strength of the relationship:
Interpreting Coefficients
• Log GDP per capita:
◦ The statistical strength of the relationship
can be assessed using hypothesis testing
(model: Y = β₀ + β₁X + ε).
Testing Hypotheses for Regression Coefficients
• We would like to test if the coefficient on log(GDP) is statistically
significantly different from zero.
• If the coefficient (β₁) = 0, what does this mean?
• If the coefficient (β₁) ≠ 0, what does this mean? (You should also consider
one-tailed tests.)
◦ Test statistic: t = (b₁ − 0) / standard error(b₁)   (8.8)
◦ P-value approach
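The test statistic in (8.8) can be computed from the regression fit. A sketch with simulated data (the true slope and noise level are invented); for large samples, |t| greater than about 2 corresponds to a two-tailed p-value below 5%:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200

x = rng.normal(7, 1, size=n)               # stand-in for log GDP per capita
y = 4.8 + 0.8 * x + rng.normal(0, 0.7, n)  # true slope 0.8 (made up)

# Least-squares slope and its standard error
sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()
residuals = y - (b0 + b1 * x)
s = np.sqrt(np.sum(residuals ** 2) / (n - 2))  # residual standard error, n-2 dof
se_b1 = s / np.sqrt(sxx)

t_stat = (b1 - 0) / se_b1  # equation (8.8): testing H0: beta1 = 0
print(f"b1 = {b1:.3f}, se = {se_b1:.3f}, t = {t_stat:.1f}")
```

Here the t statistic is far above 2, so H0: β₁ = 0 would be rejected, mirroring the near-zero p-value in the happiness regression.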
Interpreting Coefficients
• Log GDP per capita:
◦ We can use the p-value to assess a two-tailed
test (model: Y = β₀ + β₁X + ε).
◦ H0: β₁ = 0 vs H1: β₁ ≠ 0
◦ In this example, the p-value is nearly zero (<
5%), hence there is sufficient evidence to
conclude that the true β₁ is not zero. This means
that there exists a relationship between the
happiness level and the log of GDP per capita.
◦ We can also conduct a one-tailed test.
Confidence Intervals for Regression Coefficient
• Confidence intervals (Lower
95% and Upper 95% values in
the output) provide information
about the unknown values of
the true regression
coefficients, accounting for
sampling error.
• For this example, a 95%
confidence interval for the
income variable is
[0.638;0.845].
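For large samples, the 95% confidence interval is approximately b₁ ± 1.96 × se(b₁). The sketch below back-solves an implied standard error from the reported interval [0.638, 0.845] (an approximation introduced for illustration; real regression output reports the standard error directly):

```python
# Reported coefficient and an implied standard error (approximate)
b1 = 0.742
se_b1 = (0.845 - 0.638) / (2 * 1.96)  # half-width of the 95% CI divided by t* ~ 1.96

lower = b1 - 1.96 * se_b1
upper = b1 + 1.96 * se_b1
print(f"95% CI for the income slope: [{lower:.3f}, {upper:.3f}]")
```

Reconstructing the interval this way recovers the reported bounds to within rounding, confirming the ± t* × se arithmetic.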
Prediction
• Intercept: often not important
• If you know the value of Log GDP per
capita (model: Y = β₀ + β₁X + ε), you can
predict the value of the happiness level.
• Predicted happiness level for Vietnam = −1.411 +
0.742 × 9.392 = 5.558
Confidence Intervals & Prediction
• Although we predicted 5.558 for Vietnam
(−1.411 + 0.742 × 9.392),
• if the true population parameters are at
the extremes of the confidence interval,
the estimate might be as low as
−1.411 + 0.638 × 9.392 = 4.581
or as high as
−1.411 + 0.845 × 9.392 = 6.525
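These figures are just the regression equation evaluated at Vietnam's log GDP per capita with the point estimate and the two interval endpoints; a quick arithmetic check:

```python
log_gdp_vietnam = 9.392
intercept = -1.411

# Evaluate the fitted equation at the point estimate and the 95% CI endpoints
for slope, label in [(0.742, "point estimate"),
                     (0.638, "lower 95%"),
                     (0.845, "upper 95%")]:
    predicted = intercept + slope * log_gdp_vietnam
    print(f"{label}: {predicted:.3f}")
```
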
Residual analysis
• Residual = Actual Y value − Predicted Y value
• Standardized residual = residual / standard deviation of the residuals

• Outliers: standardized residuals outside ±2 or ±3 are potential outliers.

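Standardized residuals and the ±2/±3 outlier screen can be computed directly. A sketch on made-up data with one deliberately planted outlier:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100

x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 0.5, n)
y[10] += 5.0  # plant one clear outlier at index 10

# Fit by least squares and compute residuals
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
residuals = y - (b0 + b1 * x)

# Standardized residual = residual / standard deviation of the residuals
std_residuals = residuals / residuals.std(ddof=2)
outliers = np.where(np.abs(std_residuals) > 3)[0]
print(f"potential outliers at indices: {outliers}")
```

The planted observation stands out well beyond ±3 standardized residuals, which is exactly the pattern a residual plot makes visible.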
Residual Outputs – Residual Plot
[Residual plot: Residuals (vertical axis, roughly −3 to 3) against Log GDP per capita (horizontal axis, 6.000–12.000)]

What are drawbacks of simple linear
regression models?
• Consider the case of happiness:
Multiple linear regression
• Consider the case study of happiness (at national level)
• What are possible “causes” and/or “factors” that can explain
variations in the Happiness level across countries?
Multiple linear regression
• A linear regression model with more than one independent variable is
called a multiple linear regression model:
Y = β₀ + β₁X₁ + β₂X₂ + ⋯ + βₖXₖ + ε   (8.10)
◦ Y is the dependent variable; the Xᵢ are the independent (explanatory) variables;
◦ the βᵢ are the regression coefficients for the independent variables; ε is the error term.
• We estimate the partial regression coefficients bᵢ:
Ŷ = b₀ + b₁X₁ + b₂X₂ + ⋯ + bₖXₖ   (8.11)
A causal graph
• We are ignoring here possible causal and statistical relationships among the independent variables.
[Diagram: Wealth (+), Freedom (+), Social support (+), Health (+), Generosity (+), Perception of corruption (−), and Others → Happiness]
Happiness (Cantril Ladder) and its correlates
• Social support: “If you were in trouble, do you have relatives or friends you can count on to help you whenever you need them, or not?”
• Healthy life expectancy: the time series of healthy life expectancy at birth.
• Freedom: “Are you satisfied or dissatisfied with your freedom to choose what you do with your life?”
• Generosity: the residual of regressing the national average of GWP responses to the question “Have you donated money to a charity in the past month?” on GDP per capita.
• Perceptions of corruption: the average of binary answers to two GWP questions, “Is corruption widespread throughout the government or not?” and “Is corruption widespread within businesses or not?” Where data for government corruption are missing, the perception of business corruption is used as the overall corruption-perception measure.
[Each panel: scatter of Life Ladder against the variable, with fitted values]
Multiple regression
• Larger adjusted R-squared.
• F-test with a p-value of 0.
• Every X variable has a two-tailed t-test reported in the p-value column.
ANOVA for Multiple Regression
• ANOVA tests for the significance of the entire model. That is, it computes
an F-statistic testing the hypotheses:
H₀: β₁ = β₂ = ⋯ = βₖ = 0
H₁: at least one βⱼ is not 0
Interpreting the coefficients
• b(wealth) = 0.175, p-value = 0.074:
if wealth (log of GDP per capita) increases by 1 unit, holding all the other
independent variables constant, the value of happiness will increase by 0.175;
significant at the 10% level.
• b(social support) = 3.55, p-value = 0.000:
if social support increases by 1 unit, holding all the other independent variables
constant, the value of happiness will increase by 3.55; significant at the 1% level.
Should I include a new Xi variable?
• Some argue that a good regression model should include only
significant independent variables.
◦ But it is not always clear exactly what will happen when we add or remove variables
from a model: variables that are (or are not) significant in one model may (or may
not) be significant in another.
◦ Do not drop all insignificant variables at once;
◦ take a more structured approach instead.
Should I include a new Xi variable?
• Using adjusted R-square
◦ Adding an independent variable to a regression model often increases the value of
R-square.
◦ Adjusted R-square reflects both the number of Xᵢ variables and the sample size.
◦ Adjusted R-square may either increase or decrease when an Xᵢ variable is added
or dropped.
◦ An increase in adjusted R-square indicates the model has improved.
◦ But some prefer simpler models (i.e. with fewer Xᵢ variables) when there are only
minor differences in the adjusted R-square scores.
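This trade-off can be demonstrated by adding a pure-noise variable to a regression (simulated data, invented coefficients): R² never decreases when a variable is added, while adjusted R² may move either way.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 60

x1 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(0, 1, n)
noise = rng.normal(size=n)  # an irrelevant candidate variable

def r2_and_adj(X, y):
    """Fit OLS and return (R^2, adjusted R^2)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = np.sum((y - X @ b) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    r2 = 1 - sse / sst
    k = X.shape[1] - 1  # number of predictors, excluding the intercept
    adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return r2, adj

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([np.ones(n), x1, noise])

# R^2 never decreases when a column is added; adjusted R^2 may rise or fall
print("without noise:", [round(v, 4) for v in r2_and_adj(X_small, y)])
print("with noise:   ", [round(v, 4) for v in r2_and_adj(X_big, y)])
```

Because adjusted R² charges a price for each extra predictor, it is the better yardstick when comparing models with different numbers of X variables.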
Systematic Model Building Approach
1. Consider causal graphs.
2. Descriptive analysis and checking for outliers in both the Y and X variables.
3. Correlation matrix of all available variables.
4. Construct a model with all available independent variables and examine the value of the
coefficients and the p-value for each coefficient.
5. If a p-value > 10%, consider removing that variable and run step 4 again. You should also
check the adjusted R-square again.
6. Once the majority (or all) of the X variables are statistically significant and the signs of the
coefficients are consistent with expectations, you are closer to a good model.
7. Check all assumptions (next week’s learning).
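Steps 4–6 can be sketched as a simple backward-elimination loop (simulated data; the variable names, the 1.7 cutoff as a rough 10% two-tailed threshold, and the stopping rule are illustrative assumptions, not a substitute for judgement or the causal graph in step 1):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 200

# Two genuine predictors and one irrelevant one (all simulated)
X_all = rng.normal(size=(n, 3))
y = 1.0 + 2.0 * X_all[:, 0] + 1.5 * X_all[:, 1] + rng.normal(0, 1, n)
names = ["x1", "x2", "x3"]

def fit(Xc, y):
    """OLS fit; returns coefficients and their t statistics (intercept first)."""
    X = np.column_stack([np.ones(len(y)), Xc])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    s2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return b, b / se

cols = [0, 1, 2]
while len(cols) > 1:
    b, t = fit(X_all[:, cols], y)
    weakest = int(np.argmin(np.abs(t[1:])))  # ignore the intercept's t
    if abs(t[1:][weakest]) >= 1.7:           # roughly a 10% two-tailed cutoff
        break                                # all remaining variables look significant
    print("dropping", names[cols[weakest]])
    cols.pop(weakest)

print("kept:", [names[c] for c in cols])
```

With most random draws the irrelevant x3 is dropped while the two genuine predictors survive; in practice each removal should also be checked against the adjusted R-square, as step 5 says.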
