Lecture Notes 2.2 Quantitative Reasoning: Estimation
Estimation Techniques
Presented by Dr Sourav Sen Gupta
© 2021 Nanyang Technological University, Singapore. All Rights Reserved.
Case Study
Are you paying what is expected for your new house?
Quantitative Reasoning
Suppose you find a 1710 sq. ft., 5-year-old, good-quality (7 of 10) house at $208,500. Does this mean you landed a good deal?
(Write down what you think.)
Desired insights on the problem
A. Does the price of the house depend on these features at all?
B. Is the quoted price reasonable given the features of the house?
Steps to obtain the desired insights
• How to frame concrete numerical questions?
• How to identify tools and data for analysis?
• How to build models to analyse the data?
• How to analyse the results you obtain?
Identify Your Data
What type of data is relevant?
• Binary: Is this a good deal or a bad deal? (Single house deal: YES/NO)
• Continuous: What is the final sale price? (Single house deal: $208,500)
Formulate Your Question
What if we estimate Price naively?
• Generic estimate = Mean (Price) = 182517
• How wrong can this estimate be in general?
• What is your confidence in this estimate?
Surveyed Houses
Features: Age, Area, Quality
Response: Price (to be estimated)
Number of data samples = 500 houses
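The naive estimate is simply the mean of the observed prices, and the standard deviation measures how far a typical house sits from it. A minimal Python sketch, assuming only the ten sample prices listed on the Surveyed Houses slide are available (the lecture's 182517 comes from all 500 houses, which are not reproduced here):

```python
import statistics

# Assumption: only the ten sample prices shown on the "Surveyed Houses"
# slide; the lecture's mean (182517) and SD (78603) use all 500 houses.
prices = [118000, 129900, 200000, 307000, 143000,
          250000, 140000, 223500, 181500, 208500]

# Naive estimate: predict the same number (the mean) for every house.
naive_estimate = statistics.mean(prices)

# Sample standard deviation: how far a typical price sits from that estimate.
spread = statistics.stdev(prices)

# Error of the naive estimate for the house in question ($208,500 quoted).
error_for_quoted_house = 208500 - naive_estimate

print(f"naive estimate     = {naive_estimate:.0f}")
print(f"standard deviation = {spread:.0f}")
print(f"error for the $208,500 house = {error_for_quoted_house:.0f}")
```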
Let’s Go With the Naïve Estimate…
Considering only the response: Generic estimate = Mean (Price) = 182517
[Figure: histogram of Price for the surveyed houses, with counts (0 to 40) per 10,000 price bin from <30000 up to 430000-439999; the mean 182517 is marked, together with a band of width 2 SD = 2 × 78603 labelled 68%.]
Surveyed Houses (sample rows)
Age   Area   Quality   Price
69    1077   5         118000
77    1774   7         129900
36    2090   7         200000
3     1694   8         307000
16    1362   5         143000
8     2198   8         250000
91    1717   7         140000
?     1786   7         223500
31    1262   6         181500
5     1710   7         208500
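The 68% label is the normal-distribution rule of thumb: about 68% of values lie within 1 SD of the mean, so the band of width 2 SD runs from mean − SD to mean + SD. A sketch using the slide's mean and SD; the coverage line assumes prices are roughly bell-shaped, so the actual share of the 500 houses inside this band may differ:

```python
import math

# Summary statistics quoted on the slide (all 500 surveyed houses).
mean_price = 182517
sd_price = 78603

# Band of total width 2 SD centred on the mean: mean - 1 SD to mean + 1 SD.
low, high = mean_price - sd_price, mean_price + sd_price

# Normal-theory coverage of that band: P(|Z| < 1) = erf(1 / sqrt(2)).
coverage = math.erf(1 / math.sqrt(2))

print(f"naive interval: {low} to {high} (width {high - low})")
print(f"normal-theory coverage: {coverage:.1%}")
```

The interval spans more than 150,000, which is one way to read "how wrong can this estimate be".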
Let’s Use Another Feature
Considering Area versus Price: how strongly related are these variables?
[Figure: scatter plot of Price (vertical axis, up to 600000) against Area for the surveyed houses.]
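One common way to answer "how strongly related?" is the Pearson correlation coefficient, the standard choice behind a CORR value such as the one on the Age slide. A minimal Python sketch, assuming only the ten sample rows from the Surveyed Houses slide rather than all 500 houses, so its value will not match a correlation computed on the full survey (`pearson_r` is an illustrative name):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Assumption: the ten sample rows shown on the slides (Area in sq. ft.).
area = [1077, 1774, 2090, 1694, 1362, 2198, 1717, 1786, 1262, 1710]
price = [118000, 129900, 200000, 307000, 143000,
         250000, 140000, 223500, 181500, 208500]

r = pearson_r(area, price)
print(f"CORR(Area, Price) on 10 rows = {r:.2f}")
```

Values near +1 or −1 mean a strong straight-line relationship; values near 0 mean little linear relationship.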
Let’s Try Every Feature
Considering Quality versus Price: how strongly related are these variables?
[Figure: scatter plot of Price (vertical axis, up to 600000) against Quality for the surveyed houses.]
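Quality is an ordinal score out of 10, so besides a correlation it helps to look at the average price at each quality level. A sketch on the ten sample rows only (an assumption; the lecture's plot uses 500 houses):

```python
from collections import defaultdict

# Assumption: the ten sample rows shown on the slides.
quality = [5, 7, 7, 8, 5, 8, 7, 7, 6, 7]
price = [118000, 129900, 200000, 307000, 143000,
         250000, 140000, 223500, 181500, 208500]

# Group prices by quality level.
by_level = defaultdict(list)
for q, p in zip(quality, price):
    by_level[q].append(p)

# Average price per quality level, in increasing order of quality.
level_means = {q: sum(ps) / len(ps) for q, ps in sorted(by_level.items())}

for q, m in level_means.items():
    print(f"quality {q}: n={len(by_level[q])}, mean price = {m:.0f}")
```

On these ten rows quality 7 averages slightly below quality 6 (180,380 vs 181,500), a reminder that a handful of rows gives noisy level averages.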
Let’s Try Every Feature
Considering Age versus Price: how strongly related are these variables?
[Figure: scatter plot of Price (vertical axis, 0 to 600000) against Age (horizontal axis, 0 to 140 years) for the surveyed houses.]
CORR = −0.55
Sample rows:
Age   Area   Quality   Price
8     2198   8         250000
16    1362   5         143000
3     1694   8         307000
36    2090   7         200000
77    1774   7         129900
69    1077   5         118000
Let’s Try a Better Estimate …
Estimate Price using Area:
Estimate of price = a × Area + b (linear model)
[Figure: scatter plot of Price (vertical axis, up to 600000) against Area for the surveyed houses.]
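The coefficients a and b of this line can be found by ordinary least squares, which has a closed form for a single feature. A minimal sketch, assuming the ten sample rows stand in for the full survey (`fit_line` is an illustrative name, not from the lecture):

```python
def fit_line(xs, ys):
    """Ordinary least squares for y ~ a*x + b, using the closed form."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sxy / sxx      # slope: extra dollars per extra sq. ft.
    b = my - a * mx    # intercept: the line passes through (mean x, mean y)
    return a, b

# Assumption: the ten sample rows shown on the slides.
area = [1077, 1774, 2090, 1694, 1362, 2198, 1717, 1786, 1262, 1710]
price = [118000, 129900, 200000, 307000, 143000,
         250000, 140000, 223500, 181500, 208500]

a, b = fit_line(area, price)
print(f"Price ~ {a:.2f} * Area + {b:.0f}")
print(f"estimate for 1710 sq. ft.: {a * 1710 + b:.0f}")
```

Unlike the naive mean, this estimate changes with the size of the house.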
Linear Model for Estimate
Core concept: Regression
Price = a × Area + b × Quality + c × Age + d
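With several features the same least-squares idea becomes a matrix problem. The sketch below uses NumPy's `numpy.linalg.lstsq` on the seven rows whose Age, Area, Quality and Price all appear on the slides. With so few rows the fitted coefficients will differ from the lecture's values; the point is only the mechanics:

```python
import numpy as np

# Assumption: only the seven rows whose Age, Area, Quality and Price all
# appear on the slides; the lecture's coefficients come from 500 houses.
rows = np.array([
    # Age, Area, Quality, Price
    [8, 2198, 8, 250000],
    [16, 1362, 5, 143000],
    [3, 1694, 8, 307000],
    [36, 2090, 7, 200000],
    [77, 1774, 7, 129900],
    [69, 1077, 5, 118000],
    [5, 1710, 7, 208500],
], dtype=float)
age, area, quality, price = rows.T

# Design matrix in the order of Price = a*Area + b*Quality + c*Age + d;
# the column of ones carries the intercept d.
X = np.column_stack([area, quality, age, np.ones(len(price))])

# Least squares: choose (a, b, c, d) minimising the sum of squared errors.
coef, _, rank, _ = np.linalg.lstsq(X, price, rcond=None)
a, b, c, d = coef

fitted = X @ coef
sse_model = float(np.sum((price - fitted) ** 2))
sse_mean = float(np.sum((price - price.mean()) ** 2))

print(f"a={a:.2f}  b={b:.2f}  c={c:.2f}  d={d:.2f}")
print(f"SSE: regression {sse_model:.3g} vs naive mean {sse_mean:.3g}")
```

Because the model includes the intercept d, its squared error on the fitted data can never exceed that of the naive mean estimate, which is what the last line compares.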
Result of Our Analysis
What we did is Regression Analysis:
Price = a × Area + b × Quality + c × Age + d
a = 68.88630627    b = 24687.45549
c = −527.6842017   d = −54195.25107
Age   Area   Quality   Price
5     1710   7         208500
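Plugging the case-study house into the fitted model gives the estimate to compare with the quoted $208,500. A short sketch using the slide's coefficients as given (`estimate_price` is an illustrative name, not from the lecture):

```python
# Coefficients as reported on the slide.
a, b, c, d = 68.88630627, 24687.45549, -527.6842017, -54195.25107

def estimate_price(area, quality, age):
    """Price = a*Area + b*Quality + c*Age + d, the model from the slide."""
    return a * area + b * quality + c * age + d

quoted = 208500
predicted = estimate_price(area=1710, quality=7, age=5)

print(f"model estimate: {predicted:.2f}")
print(f"quoted minus estimate: {quoted - predicted:.2f}")
```

This evaluates to about 233,774, so the quoted price sits roughly 25,274 below the model's estimate; whether that gap matters depends on the model's margin of error.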
Analysis of Our Result
What we did is Regression Analysis:
Price = a × Area + b × Quality + c × Age + d
a = 68.88630627    b = 24687.45549
c = −527.6842017   d = −54195.25107
Treat the Standard Error (SE) as the "margin of error" to determine the confidence interval for the estimate: the interval of width 2 × SE (estimate ± 1 SE) carries about 68% confidence, assuming roughly normally distributed errors.
The Big Picture
Real-life application: Predicting or estimating the price of a house.
Data collection: Collect samples of multiple houses with their features.
Did we reduce the uncertainty? Yes, compared with the naive mean estimate, the margin of error in price estimation was reduced by around half.
Pause and Ponder
Which country is the happiest one in the entire world?
Acknowledgements