Machine Learning Project Presentation
Project
Samuel Odulaja
Background
● Kaggle Dataset
○ Contains around 1400 house prices and associated predictors
● 79 explanatory variables describing aspects of residential homes in Ames, Iowa
● Using advanced regression techniques, predict the final price of each home
Concatenate training and testing features
● Concatenated the training and test features so that we don’t have to impute missing values, transform features, etc. separately for each set
○ The same preprocessing is applied to both the training and test sets
● Removed houses with ground living area greater than 4,500 sq. ft. from the training set
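The two steps above can be sketched with pandas roughly as follows. This is a minimal illustration, not the code used in the project: the toy data frames are made up, the column names `GrLivArea`, `LotFrontage`, and `SalePrice` are assumed to match the Kaggle files, and median imputation is only a stand-in for whatever imputation was actually applied.

```python
import pandas as pd

# Tiny stand-in frames; column names are assumed to follow the Kaggle Ames data.
train = pd.DataFrame({
    "GrLivArea": [1500, 5000, 2000],
    "LotFrontage": [60.0, None, 80.0],
    "SalePrice": [200000, 180000, 250000],
})
test = pd.DataFrame({
    "GrLivArea": [1200, 1800],
    "LotFrontage": [None, 70.0],
})

# Drop training houses with ground living area above 4,500 sq. ft.
train = train[train["GrLivArea"] <= 4500].reset_index(drop=True)

# Separate the target, then stack training and test features into one frame.
y_train = train["SalePrice"]
n_train = len(train)
features = pd.concat([train.drop(columns="SalePrice"), test], ignore_index=True)

# Impute once on the combined frame (median fill is an illustrative choice).
features["LotFrontage"] = features["LotFrontage"].fillna(
    features["LotFrontage"].median()
)

# Split back into training and test features for modelling.
X_train = features.iloc[:n_train]
X_test = features.iloc[n_train:]
```

Keeping `n_train` around is what makes the split back into training and test rows safe after the shared preprocessing.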
SalePrice Distribution
SalePrice Transformation
● Fitted a linear model to the training data and removed examples with a studentized residual greater than 3
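A rough sketch of this residual filter, using only NumPy, is shown below. It assumes an ordinary least squares fit with an intercept and internally studentized residuals, and it filters on the absolute value of the residual; the slide does not say which variant or sign convention was used, so treat both as assumptions. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def studentized_residuals(X, y):
    """Internally studentized residuals of an OLS fit with an intercept."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    # Leverages: diagonal of the hat matrix H = X1 @ pinv(X1).
    leverage = np.einsum("ij,ji->i", X1, np.linalg.pinv(X1))
    n, p = X1.shape
    sigma2 = resid @ resid / (n - p)  # residual variance estimate
    return resid / np.sqrt(sigma2 * (1.0 - leverage))

# Synthetic data standing in for the training set (illustration only).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + rng.normal(scale=0.1, size=50)
y[25] += 10.0  # plant one obvious outlier

r = studentized_residuals(x.reshape(-1, 1), y)
keep = np.abs(r) <= 3  # assumption: threshold applied to the absolute value
x_clean, y_clean = x[keep], y[keep]
```

Only the planted point is dropped here; the remaining 49 examples have studentized residuals well inside the threshold.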
Define random search
● Overall, the models did well, with Gradient Boosting (GBM) performing best.
○ Ridge: 0.0778
○ Lasso: 0.0796
○ SVR: 0.0712
○ LGBM: 0.0640
○ GBM: 0.0436
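For reference, a random search over hyperparameters can be set up with scikit-learn's `RandomizedSearchCV` along these lines. This is a sketch under assumptions: the slides do not give the search space, the number of iterations, the number of folds, or the scoring metric, so the values below (a log-uniform range for Ridge's `alpha`, 20 draws, 5 folds, RMSE scoring) and the synthetic data are purely illustrative.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV

# Synthetic regression data standing in for the processed features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=200)

# Sample alpha from a log-uniform distribution instead of a fixed grid.
search = RandomizedSearchCV(
    estimator=Ridge(),
    param_distributions={"alpha": loguniform(1e-3, 1e2)},
    n_iter=20,                               # number of random draws (assumed)
    cv=5,                                    # 5-fold cross-validation (assumed)
    scoring="neg_root_mean_squared_error",   # assumed metric
    random_state=0,
)
search.fit(X, y)

best_alpha = search.best_params_["alpha"]
cv_rmse = -search.best_score_  # flip sign: scikit-learn maximizes scores
```

The same pattern applies to the other models by swapping the estimator and its parameter distributions.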
Creating Predictions and RMSE
● Stored the predictions of the base learners and the stacked ensemble in a list
● Combined the predictions as a weighted average, with a weight of 0.13 for each base learner and 0.35 for the stacked ensemble
● RMSE: 0.3848232
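The blending step and the RMSE calculation can be sketched as below. It assumes five base learners, so that 5 × 0.13 + 0.35 = 1.0 and the weights sum to one; the function names `blend` and `rmse` and the toy numbers are hypothetical and do not reproduce the score reported above.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def blend(base_preds, stacked_pred, base_weight=0.13, stacked_weight=0.35):
    """Weighted average: each base learner gets base_weight, the stack gets stacked_weight."""
    base_preds = np.asarray(base_preds, dtype=float)  # shape (n_models, n_samples)
    stacked_pred = np.asarray(stacked_pred, dtype=float)
    return base_weight * base_preds.sum(axis=0) + stacked_weight * stacked_pred

# Toy targets and predictions (illustration only).
y_true = np.array([12.0, 11.5, 12.3])
base_preds = [y_true.copy() for _ in range(5)]  # five base learners (assumed)
stacked_pred = y_true.copy()

blended = blend(base_preds, stacked_pred)
score = rmse(y_true, blended)
```

Because the weights sum to one, blending predictions that all agree returns the same values, which is a quick sanity check for the weighting.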
Conclusions