Assign4 Gam

This document describes building an additive model to predict workers' wages using their age, year, and education level. An additive model was created using a generalized additive model (GAM) with 6 splines for age and year and 5 splines for education. Partial dependency plots show the influence of each feature on wages. The model was validated by comparing actual and predicted test set values and analyzing residuals and correlation, with low R2 scores indicating the model could be improved by adding more features.

Uploaded by

Chelsi Gondalia

Building an Additive Model

Chelsi Gondalia
10/25/2021
In this study, we are working with a dataset that contains workers' wages and some of their
demographic data, such as age, year, marital status, education, and several others. Overall, we have 3000
records and 9 features. The objective of this study is to build an additive model to predict the wages of
workers.
Our model focuses on three features: age, year, and education. The theoretical representation of
this model is given by Equation (1).
$$ \text{wage}_i = \beta_0 + f_1(\text{year}_i) + f_2(\text{age}_i) + f_3(\text{education}_i) + \varepsilon_i \tag{1} $$
The dataset was split into train and test sets, with the test set containing 25% of the total data. The
train set was then used to build our additive model in Python using the function GAM. For age and
year, we used n_splines=6, meaning that each of the fitted smoothing functions was built from 6 spline
basis functions. For education, we used n_splines=5. The results of the GAM can be interpreted with the
help of the partial dependency plots shown in Figure 1. The spline functions are what give the curves
in Figure 1 their smooth shape. Each plot visualizes how a feature (on the x-axis)
influences our response variable, wage (on the y-axis). The dotted lines around each solid curve
represent the 95% confidence intervals. For the feature “year”, the wage increases overall, with one
peculiar drop between 2007 and 2008. For the feature “age”, the wage increases steeply until
~48 years and then begins to decline. The feature “education” seems to have a fairly linear relationship
with wage.
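The 75/25 split described above can be sketched in plain Python. This is only an illustration: the helper name `train_test_split`, the seed, and the synthetic `records` are assumptions made for the example and are not the study's data or code.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and hold out test_fraction of them as the test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # The first n_test shuffled rows form the test set, the rest the train set.
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical stand-in for the 3000 records: (year, age, education, wage) tuples.
records = [(2003 + i % 7, 18 + i % 60, i % 5, 50.0 + i % 100) for i in range(3000)]
train, test = train_test_split(records)
print(len(train), len(test))  # 2250 750
```

Fixing the seed keeps the split reproducible, so validation results can be compared across runs.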

Figure 1. Partial dependency plots with confidence intervals.
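To make the additive structure of Equation (1) concrete, here is a minimal backfitting sketch. It is not the GAM function used in this study: each f_j is estimated as a table of per-level means instead of a spline, and the functions `fit_additive` and `predict`, as well as the toy data, are hypothetical. The fitted tables play the same role as the curves in the partial dependency plots.

```python
from collections import defaultdict

def fit_additive(X, y, n_iter=20):
    """Fit y ~ b0 + sum_j f_j(x_j) by backfitting, with each f_j stored as a
    table of per-level means (a crude stand-in for a spline smoother)."""
    n, p = len(y), len(X[0])
    b0 = sum(y) / n
    f = [defaultdict(float) for _ in range(p)]  # f[j][level] -> contribution
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: remove the intercept and every other term.
            sums, counts = defaultdict(float), defaultdict(int)
            for xi, yi in zip(X, y):
                r = yi - b0 - sum(f[k][xi[k]] for k in range(p) if k != j)
                sums[xi[j]] += r
                counts[xi[j]] += 1
            means = {v: sums[v] / counts[v] for v in sums}
            # Centre the term so that the intercept stays identifiable.
            centre = sum(means[xi[j]] for xi in X) / n
            f[j] = defaultdict(float, {v: m - centre for v, m in means.items()})
    return b0, f

def predict(b0, f, x):
    """Add the intercept and each feature's fitted contribution."""
    return b0 + sum(f[j][x[j]] for j in range(len(x)))

# Hypothetical toy data on a full grid of (year, education) levels
# with a known additive truth; not the study's dataset.
year_effect = {2003: 0.0, 2004: 2.0, 2005: 3.0}
edu_effect = {1: 0.0, 2: 10.0, 3: 25.0}
X = [(yr, ed) for yr in year_effect for ed in edu_effect]
y = [80.0 + year_effect[yr] + edu_effect[ed] for yr, ed in X]

b0, f = fit_additive(X, y)
# f[1] is the fitted partial effect of education, centred around zero.
print({level: round(f[1][level], 2) for level in sorted(edu_effect)})
# {1: -11.67, 2: -1.67, 3: 13.33}
```

Because each term is centred, the values in `f[1]` are read relative to the average wage, which is also how the y-axis of a partial dependency plot is usually interpreted.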

Now that we have a fair understanding of our additive model and its features, it must be validated. The test
set predictions and actual values are plotted in Figure 2 for comparison. It is evident that our model is
poor at predicting wages beyond 175. To further validate the model, the residual distribution, which
appears reasonably normal apart from a trailing right tail, is plotted in Figure 3.

Figure 2. Comparing actual test set values to GAM predictions.

Figure 3. Distribution of residuals.
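A residual check of this kind can be sketched as follows. The numbers are made up for illustration and are not the study's results; the helpers `residuals` and `skewness` are hypothetical. A positive skewness is one simple numeric sign of the trailing right tail that arises when high wages are under-predicted.

```python
def residuals(actual, predicted):
    """Residual = actual value minus model prediction."""
    return [a - p for a, p in zip(actual, predicted)]

def skewness(values):
    """Population skewness: third central moment over the cubed standard deviation."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical values: the one very high wage is badly under-predicted.
actual = [70.0, 85.0, 95.0, 110.0, 120.0, 260.0]
predicted = [75.0, 88.0, 92.0, 108.0, 125.0, 150.0]

res = residuals(actual, predicted)
print(res)                 # [-5.0, -3.0, 3.0, 2.0, -5.0, 110.0]
print(skewness(res) > 0)   # True: the residuals have a long right tail
```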

Lastly, we want to check the correlation between our actual data points and the GAM predictions.
This comparison for both the test and the train sets can be viewed in Table 1. As we can see, the
R² scores are quite low (about 0.3), indicating that the model explains only a small share of the
variation in wages. We could improve this model by adding
more features.
Table 1. R² score for test and train sets.

        Test set    Train set
R²      0.32        0.29
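For reference, an R² score can be computed as 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below assumes this standard definition (the coefficient of determination); the report does not state which implementation produced Table 1, and the function name `r2_score` and the sample values are illustrative only.

```python
def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

actual = [1.0, 2.0, 3.0, 4.0]
print(r2_score(actual, actual))      # 1.0 (perfect predictions)
print(r2_score(actual, [2.5] * 4))   # 0.0 (always predicting the mean)
```

On this scale, a score near 0.3 means the model does only moderately better than predicting the average wage for every worker.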
