Assign4 Gam
Chelsi Gondalia
10/25/2021
In this study, we work with a dataset that contains the wages of workers and some of their
demographic data, such as age, year, marital status, and education. Overall, we have 3000
records and 9 features. The objective of this study is to build an additive model to predict the wages of
workers.
In our model, we focus on three features: age, year, and education. The theoretical representation of
this model is given by Equation (1).
$$\text{wage}_i = \beta_0 + f_1(\text{year}_i) + f_2(\text{age}_i) + f_3(\text{education}_i) + \varepsilon_i \tag{1}$$
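To make the additive structure of Equation (1) concrete, the following minimal Python sketch adds an intercept to three one-feature contributions. The helper name additive_prediction, the intercept value, and the three shape functions are hypothetical placeholders chosen for illustration. They are not estimated from the wage data.

```python
def additive_prediction(intercept, shape_functions, row):
    """Return the intercept plus the sum of the one-feature contributions."""
    return intercept + sum(f(row[name]) for name, f in shape_functions.items())


# Placeholder shapes for illustration only; they are NOT fitted to the data.
shape_functions = {
    "year": lambda year: 1.0 * (year - 2006),
    "age": lambda age: -0.05 * (age - 48) ** 2,
    "education": lambda level: 10.0 * level,
}

# One hypothetical worker; education is assumed to be integer-coded.
worker = {"year": 2006, "age": 48, "education": 3}
prediction = additive_prediction(100.0, shape_functions, worker)
print(prediction)  # 130.0
```

Each function receives only its own feature, so the effect of one feature can be read off without reference to the others.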
The dataset was split into training and test sets, with the test set containing 25% of the total data. The
training set was then used to build our additive model in Python using the function GAM. For age and
year, we used n_splines=6. This means that each of the fitted smoothing functions is built from 6 spline
basis functions (the setting counts basis functions, not knots). For education, we used n_splines=5. The
results of the GAM can be interpreted with the help of the partial dependence plots shown in Figure 1.
The spline terms are what make the curves in Figure 1 smooth. Each plot visualizes how one feature (on
the x-axis) influences our response variable, wage (on the y-axis). The dotted lines around each of the
solid curves represent the 95% confidence intervals. For the feature “year”, the wage increases overall,
with one peculiar drop between 2007 and 2008. For the feature “age”, the wage increases with a steep
slope until ~48 years and then begins to decline. The feature “education” seems to have a fairly linear
relationship with wage.
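The workflow described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the exact code used in the study. It assumes the model was fitted with the pyGAM library (here its LinearGAM class, although the report only says "GAM"), that the columns of the feature matrix are ordered year, age, education, and that education is integer-coded. The helper names and the random seed are hypothetical.

```python
import random


def split_indices(n_rows, test_fraction=0.25, seed=42):
    """Shuffle row indices and hold out `test_fraction` of them for testing."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]


def fit_wage_gam(X_train, y_train):
    """Fit the additive model; X_train columns are year, age, education.

    Assumes pyGAM is installed and X_train, y_train are NumPy arrays.
    """
    from pygam import LinearGAM, s  # third-party, imported only when called

    # One spline term per feature, with the n_splines values from the text.
    terms = s(0, n_splines=6) + s(1, n_splines=6) + s(2, n_splines=5)
    return LinearGAM(terms).fit(X_train, y_train)


def partial_dependence_curves(gam, n_terms=3):
    """Return (feature grid, effect, 95% interval) for each spline term."""
    curves = []
    for i in range(n_terms):
        grid = gam.generate_X_grid(term=i)
        effect, interval = gam.partial_dependence(term=i, X=grid, width=0.95)
        curves.append((grid[:, i], effect, interval))
    return curves


# With 3000 records and a 25% test fraction, 750 rows are held out.
train_idx, test_idx = split_indices(3000)
print(len(train_idx), len(test_idx))  # 2250 750
```

With NumPy arrays X and y, the model could then be fitted as fit_wage_gam(X[train_idx], y[train_idx]), and the returned curves plotted one panel per feature to obtain plots of the kind shown in Figure 1.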