Lecture Notes 2.2 Quantitative - Reasoning Estimation

CC0002 Navigating the Digital World

Module 2: Quantitative Reasoning Techniques

Estimation Techniques
Presented by Dr Sourav Sen Gupta

© 2021 Nanyang Technological University, Singapore. All Rights Reserved.
Case Study
Are you paying what is expected for your new house?
• How do you know for sure?
• Can it be over- or under-priced?
• How do you estimate the price?

Quantitative Reasoning
Suppose you find a 1710 sq. ft., 5-year-old, good quality (7 of 10) house at $208,500. Does this mean you landed a good deal?
(Write down what you think.)

Desired insights on the problem
A. Does the price of the house at all depend on these features?
B. Is the quoted price reasonable given the features of the house?
Steps to obtain the desired insights
• How to frame concrete numerical questions?
• How to identify tools and data for analysis?
• How to build models to analyse the data?
• How to analyse the results you obtain?

Identify Your Data
What type of data is relevant?
• Binary : Is this a good deal or a bad deal? (Single house deal : YES/NO)
• Continuous : What is the final sale price? (Single house deal : $208,500)
How much data do you need?
• Is it sufficient to have a single data point? (Single house : 208500)
• Is it required to have a million data points? (Multiple houses : 208500, 181500, 223500, 140000, …)
Do you want an estimation?
• Which features do you need for a house? (Features : Age, Area, Quality)
• Is it possible to get data for all features? (Response : Price, to be estimated)

Formulate Your Question
What if we estimate Price naively?
• Generic estimate = Mean (Price) = 182517
• How wrong can this estimate be in general?
• What is your confidence in this estimate?
Which feature is the strongest?
• Does age determine the price of a house?
• Or does area have more effect on the price?
• Or is it quality that affects the price most?
Suddenly, things look more complicated!

Surveyed Houses
Features : Age, Area, Quality
Response : Price (to be estimated)
Number of data samples = 500 houses

Age   Area   Quality   Price
5     1710   7         208500
31    1262   6         181500
7     1786   7         223500
91    1717   7         140000
8     2198   8         250000
16    1362   5         143000
3     1694   8         307000
36    2090   7         200000
77    1774   7         129900
69    1077   5         118000

Let’s Go With the Naïve Estimate…
Considering only the response
Generic estimate = Mean (Price) = 182517
[Figure: histogram of Price for the surveyed houses in bins of 10000, from below 30000 up to 550000-559999 (vertical axis ticks 0 to 40). The mean, 182517, is marked, and an interval of 2 SD = 2 x 78603 is labelled 68%.]
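As a rough illustration of the naïve estimate, the sketch below computes the mean and standard deviation of the ten prices listed in the Surveyed Houses table and counts how many of them fall within one SD of the mean. It uses only those ten rows, not the full 500-house survey, so its numbers will differ from 182517 and 78603; the function name `naive_estimate` is an assumption made for this example.

```python
from statistics import mean, stdev

# Prices of the ten surveyed houses listed in these notes (not the full 500).
prices = [208500, 181500, 223500, 140000, 250000,
          143000, 307000, 200000, 129900, 118000]


def naive_estimate(values):
    """Return (mean, sample standard deviation) of the observed prices."""
    return mean(values), stdev(values)


estimate, sd = naive_estimate(prices)

# Interval of total width 2 SD, centred on the mean.
low, high = estimate - sd, estimate + sd

# Fraction of the ten prices that fall inside that interval.
share = sum(low <= p <= high for p in prices) / len(prices)

print(f"naive estimate = {estimate:.0f}")
print(f"SD = {sd:.0f}, interval = [{low:.0f}, {high:.0f}]")
print(f"share of prices inside the interval = {share:.0%}")
```

The estimate ignores every feature of the house, which is why the interval around it is so wide.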
Let’s Use Another Feature
Considering Area versus Price
How strongly related are these variables?
CORR = 0.76
[Figure: scatter plot of Price (vertical axis, 0 to 600000) against Area (horizontal axis, 0 to 4000) for the surveyed houses.]

Let’s Try Every Feature
Considering Quality versus Price
How strongly related are these variables?
CORR = 0.81
[Figure: scatter plot of Price (vertical axis, 0 to 600000) against Quality (horizontal axis, 0 to 12) for the surveyed houses.]

Let’s Try Every Feature
Considering Age versus Price
How strongly related are these variables?
CORR = −0.55
[Figure: scatter plot of Price (vertical axis, 0 to 600000) against Age (horizontal axis, 0 to 140) for the surveyed houses.]
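The CORR values on these slides can be read as Pearson correlation coefficients, which is an assumption here since the notes do not name the measure. The sketch below computes that coefficient for each feature against Price using only the ten rows listed in the Surveyed Houses table, so the results will not match the 500-house values (0.76, 0.81, −0.55); the helper name `pearson` is hypothetical.

```python
from math import sqrt

# (age, area, quality, price) for the ten surveyed houses listed in these notes.
houses = [
    (5, 1710, 7, 208500), (31, 1262, 6, 181500), (7, 1786, 7, 223500),
    (91, 1717, 7, 140000), (8, 2198, 8, 250000), (16, 1362, 5, 143000),
    (3, 1694, 8, 307000), (36, 2090, 7, 200000), (77, 1774, 7, 129900),
    (69, 1077, 5, 118000),
]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


price = [h[3] for h in houses]
for name, column in (("Age", 0), ("Area", 1), ("Quality", 2)):
    feature = [h[column] for h in houses]
    print(f"CORR({name}, Price) = {pearson(feature, price):+.2f}")
```

A value near +1 or −1 indicates a strong linear relationship; the sign shows whether Price rises or falls as the feature increases.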

Let’s Try a Better Estimate …
Estimate Price using Area
Estimate of price = a × Area + b (linear model)
[Figure: scatter plot of Price (vertical axis, 0 to 600000) against Area (horizontal axis, 0 to 4000), annotated with 2 SE = 2 x 51503.]
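One way to obtain a and b is ordinary least squares, sketched below on the ten rows listed in the Surveyed Houses table. The notes do not say how SE is defined; this sketch assumes the residual standard error with n − 2 degrees of freedom, and the function names are hypothetical. Because it uses ten rows rather than 500, it will not reproduce SE = 51503.

```python
from math import sqrt

# Area and price of the ten surveyed houses listed in these notes.
areas = [1710, 1262, 1786, 1717, 2198, 1362, 1694, 2090, 1774, 1077]
prices = [208500, 181500, 223500, 140000, 250000,
          143000, 307000, 200000, 129900, 118000]


def fit_line(xs, ys):
    """Least-squares slope a and intercept b for y ~ a * x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - a * mean_x
    return a, b


def standard_error(xs, ys, a, b):
    """Residual standard error, assuming n - 2 degrees of freedom."""
    residuals = [y - (a * x + b) for x, y in zip(xs, ys)]
    return sqrt(sum(r * r for r in residuals) / (len(xs) - 2))


a, b = fit_line(areas, prices)
se = standard_error(areas, prices, a, b)
print(f"Estimate of price = {a:.2f} x Area + {b:.2f}")
print(f"SE = {se:.0f}")
```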

Linear Model for Estimate
Core concept : Regression
Price = a × Area + b × Quality + c × Age + d
Mechanism: Analytic or Algorithmic
Model trained on samples produces coefficients
a = 68.88630627 b = 24687.45549
c = −527.6842017 d = −54195.25107
Accuracy: Standard Error
Average error in estimating price = 36512
Are all features important? Study p-values.
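The "analytic" mechanism can be illustrated by solving the least-squares normal equations directly. The sketch below does this in plain Python for the ten rows listed in the Surveyed Houses table; it is not the procedure used to produce the coefficients above, which came from all 500 houses, so the fitted values will differ. The helper names (`solve`, `design_row`, `fit`, `standard_error`) and the choice of n − 4 degrees of freedom for the standard error are assumptions of this example, and p-values are not computed.

```python
from math import sqrt

# (age, area, quality, price) for the ten surveyed houses listed in these notes.
houses = [
    (5, 1710, 7, 208500), (31, 1262, 6, 181500), (7, 1786, 7, 223500),
    (91, 1717, 7, 140000), (8, 2198, 8, 250000), (16, 1362, 5, 143000),
    (3, 1694, 8, 307000), (36, 2090, 7, 200000), (77, 1774, 7, 129900),
    (69, 1077, 5, 118000),
]


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def design_row(age, area, quality):
    """Inputs in the order of the model: a * Area + b * Quality + c * Age + d."""
    return [float(area), float(quality), float(age), 1.0]


def fit(rows):
    """Least-squares coefficients [a, b, c, d] from the normal equations."""
    X = [design_row(age, area, quality) for age, area, quality, _ in rows]
    y = [float(price) for *_, price in rows]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def standard_error(rows, coeffs):
    """Residual standard error, assuming n - 4 degrees of freedom."""
    sse = 0.0
    for age, area, quality, price in rows:
        inputs = design_row(age, area, quality)
        predicted = sum(c * v for c, v in zip(coeffs, inputs))
        sse += (price - predicted) ** 2
    return sqrt(sse / (len(rows) - len(coeffs)))


a, b, c, d = fit(houses)
print(f"a = {a:.2f}  b = {b:.2f}  c = {c:.2f}  d = {d:.2f}")
print(f"SE = {standard_error(houses, [a, b, c, d]):.0f}")
```

In practice a statistics package would do this fit and also report the p-value of each coefficient.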

Result of Our Analysis
What we did is Regression Analysis
Price = a × Area + b × Quality + c × Age + d
a = 68.88630627 b = 24687.45549
c = −527.6842017 d = −54195.25107
Test on a single data sample
Price estimate
= 68.88630627 × 1710 + 24687.45549 × 7 − 527.6842017 × 5 − 54195.25107 = 233774
Actual price = 208500
Error in estimation = 25274 (12%)
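The single-sample test can be checked with a few lines of code. The sketch below plugs the coefficients from the slide into the linear model for the 1710 sq. ft., quality 7, 5-year-old house; the constant and function names are chosen for this example only.

```python
# Coefficients as given in the notes (trained on the 500 surveyed houses).
A_AREA = 68.88630627
B_QUALITY = 24687.45549
C_AGE = -527.6842017
D_INTERCEPT = -54195.25107


def estimate_price(area, quality, age):
    """Price = a * Area + b * Quality + c * Age + d."""
    return A_AREA * area + B_QUALITY * quality + C_AGE * age + D_INTERCEPT


estimate = estimate_price(area=1710, quality=7, age=5)
actual = 208500
error = estimate - actual

print(f"Price estimate = {estimate:.0f}")                      # 233774
print(f"Error in estimation = {error:.0f} ({error / actual:.0%})")  # 25274 (12%)
```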

Analysis of Our Result
What we did is Regression Analysis
Price = a × Area + b × Quality + c × Age + d
a = 68.88630627 b = 24687.45549
c = −527.6842017 d = −54195.25107
Prediction or estimation
Given the features, substitute them into the linear model.
Treat Standard Error as the “margin of error” to determine the confidence interval for estimation, where the interval of width 2 × SE (estimate ± SE) has about 68% confidence.

The Big Picture
Real-life application: Predicting or estimating the price of a house.
Data collection: Collect samples of multiple houses with their features.
Model building: Assuming the price is linearly dependent on all these features, build the linear model.
Did we reduce the uncertainty? Yes, the margin of error in price estimation reduced by around half (from 78603 to 36512).
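To make the comparison concrete, the sketch below builds the two 68% intervals from the figures quoted in these notes (mean 182517 with SD 78603, and the regression estimate 233774 with SE 36512) and checks whether the actual price of 208500 falls inside each. The names are hypothetical, and treating ± one SD or SE as a 68% interval assumes the errors are roughly normally distributed.

```python
MEAN_PRICE, SD_PRICE = 182517, 78603        # naive estimate and its SD
MODEL_ESTIMATE, MODEL_SE = 233774, 36512    # regression estimate and its SE
ACTUAL_PRICE = 208500


def interval(center, margin):
    """Interval of total width 2 * margin, centred on the estimate."""
    return center - margin, center + margin


naive = interval(MEAN_PRICE, SD_PRICE)      # (103914, 261120)
model = interval(MODEL_ESTIMATE, MODEL_SE)  # (197262, 270286)
ratio = MODEL_SE / SD_PRICE

for label, (low, high) in (("naive", naive), ("regression", model)):
    inside = low <= ACTUAL_PRICE <= high
    print(f"{label}: [{low}, {high}] contains actual price: {inside}")
print(f"margin of error is {ratio:.0%} of the naive margin")  # 46%
```

The actual price lies inside both intervals, but the regression interval is less than half as wide, so the quoted price can be judged against a much tighter range.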

Pause and Ponder
Which country is the happiest one in the entire world?
• Which features does it depend on?
• How can you estimate “happiness”?
• Is there a significant margin of error?

Acknowledgements

Arranged in order of appearance

• metamorworks. (2020). Artificial intelligence concept [Photograph]. iStockphoto LP. https://www.istockphoto.com/photo/ai-concept-deep-learning-gui-gm1223789411-359601344
• ProjectManhattan. (2014). Houses in Singapore [Photograph]. Wikimedia Commons. https://commons.wikimedia.org/wiki/File:Terraced_houses_at_Serangoon_Terrace,_Singapore.jpg. CC BY-SA 3.0.
• avanti_photo. (n.d.). Globe [Photograph]. Envato Elements Pty Ltd. https://elements.envato.com/world-atlas-globe-map-7EEWC35
