tutorial-time-series-forecasting-with-xgboost
February 5, 2025
2 Data
The data we will be using is hourly power consumption data from PJM. Energy consumption has
some unique characteristics. It will be interesting to see how an XGBoost model picks them up.
We are using the PJM East (PJME) series, which has data from 2002-2018 for the entire eastern region.
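As a minimal loading sketch, assume the data arrives as a CSV with a Datetime timestamp column and a PJME_MW load column in megawatts. This file layout, the helper name load_pjme, and the sample values are assumptions for illustration, not the notebook's own loading cell:

```python
import io

import pandas as pd


def load_pjme(path_or_buffer) -> pd.DataFrame:
    """Read hourly PJM East load into a DataFrame indexed by timestamp.

    Hypothetical helper: assumes a CSV with 'Datetime' and 'PJME_MW' columns.
    """
    df = pd.read_csv(path_or_buffer, index_col="Datetime", parse_dates=["Datetime"])
    # Sort so that time-based slicing and splitting behave predictably
    return df.sort_index()


# Made-up rows, deliberately out of order, standing in for the real file
sample = io.StringIO(
    "Datetime,PJME_MW\n"
    "2002-01-01 02:00:00,29000.0\n"
    "2002-01-01 01:00:00,30000.0\n"
)
pjme = load_pjme(sample)
print(pjme)
```

With a real file, the path would simply replace the in-memory buffer.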
3 Train/Test Split
Cut off the data at January 1, 2015: everything up to that point is used for training, and everything after it is held out as our validation (test) set.
[ ]: # Split chronologically: rows up to and including the cutoff train the model, later rows test it
split_date = '01-Jan-2015'
pjme_train = pjme.loc[pjme.index <= split_date].copy()
pjme_test = pjme.loc[pjme.index > split_date].copy()
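The key property of this split is that it is chronological: the model never trains on rows from after the cutoff, which a random shuffle would allow. A self-contained sketch of the same idea, using a hypothetical chronological_split helper checked on a few synthetic hourly rows around the cutoff:

```python
import pandas as pd


def chronological_split(df: pd.DataFrame, split_date: str):
    """Split a time-indexed frame into train (on or before split_date) and test (after it).

    Hypothetical helper mirroring the notebook cell; no future rows leak into training.
    """
    cutoff = pd.Timestamp(split_date)
    train = df.loc[df.index <= cutoff].copy()
    test = df.loc[df.index > cutoff].copy()
    return train, test


# Synthetic hourly rows straddling midnight on 2015-01-01 (made-up values)
idx = pd.date_range("2014-12-31 22:00", periods=5, freq="60min")
demo = pd.DataFrame({"PJME_MW": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=idx)
train, test = chronological_split(demo, "2015-01-01")
print(train.index.max(), test.index.min())
```

Note that the midnight row of January 1 lands in the training set because the comparison is "<=".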
[ ]: _ = pjme_test \
.rename(columns={'PJME_MW': 'TEST SET'}) \
.join(pjme_train.rename(columns={'PJME_MW': 'TRAINING SET'}), how='outer') \
.plot(figsize=(15,5), title='PJM East', style='.')
    X = df[['hour','dayofweek','quarter','month','year',
            'dayofyear','dayofmonth','weekofyear']]
    if label:
        y = df[label]
        return X, y
    return X
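Only the tail of the feature-engineering function survives here. A self-contained sketch that derives the same eight calendar features from a DatetimeIndex, under the hypothetical name make_time_features (the notebook's own function header is not shown), might look like this:

```python
import pandas as pd

FEATURES = ["hour", "dayofweek", "quarter", "month", "year",
            "dayofyear", "dayofmonth", "weekofyear"]


def make_time_features(df, label=None):
    """Derive calendar features from df's DatetimeIndex (hypothetical helper name)."""
    idx = df.index
    out = pd.DataFrame(index=idx)
    out["hour"] = idx.hour
    out["dayofweek"] = idx.dayofweek  # Monday=0 through Sunday=6
    out["quarter"] = idx.quarter
    out["month"] = idx.month
    out["year"] = idx.year
    out["dayofyear"] = idx.dayofyear
    out["dayofmonth"] = idx.day
    # .weekofyear was removed in pandas 2.0; isocalendar() returns the ISO week instead
    out["weekofyear"] = idx.isocalendar().week.to_numpy(dtype="int64")
    X = out[FEATURES]
    if label:
        return X, df[label]
    return X


# Two made-up hourly rows on July 4th, 2016 (a Monday)
demo = pd.DataFrame({"PJME_MW": [1.0, 2.0]},
                    index=pd.date_range("2016-07-04 13:00", periods=2, freq="60min"))
X, y = make_time_features(demo, label="PJME_MW")
print(X)
```

Building the features in a new frame, rather than adding columns to the input, leaves the caller's DataFrame unchanged.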
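By default, xgboost's plot_importance ranks features by weight, the number of times each feature is used to split the data across all trees.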
[ ]: _ = plot_importance(reg, height=0.9)
[ ]: _ = pjme_all[['PJME_MW','MW_Prediction']].plot(figsize=(15, 5))
[ ]: f, ax = plt.subplots(1)
f.set_figheight(5)
f.set_figwidth(15)
_ = pjme_all[['MW_Prediction','PJME_MW']].plot(ax=ax,
style=['-','.'])
ax.set_ylim(0, 60000)
ax.set_xbound(lower='07-01-2015', upper='07-08-2015')
plot = plt.suptitle('First Week of July Forecast vs Actuals')
Our MAPE on the test set is 8.9%.
[ ]: mean_squared_error(y_true=pjme_test['PJME_MW'],
y_pred=pjme_test['MW_Prediction'])
[ ]: mean_absolute_error(y_true=pjme_test['PJME_MW'],
y_pred=pjme_test['MW_Prediction'])
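I like using mean absolute percentage error (MAPE) because it gives an easy-to-interpret percentage showing
how far off the predictions are. MAPE isn't included in older versions of sklearn, so we need a custom function. (scikit-learn 0.24 and later do provide sklearn.metrics.mean_absolute_percentage_error, but it returns a fraction, e.g. 0.089, rather than a percentage.)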
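The custom function itself is not shown in this notebook. A minimal sketch consistent with how it is called, assuming it takes y_true and y_pred and returns a percentage, computes MAPE = (100 / n) * sum(|y_true - y_pred| / |y_true|). It is undefined when an actual value is zero, so it assumes y_true contains no zeros:

```python
import numpy as np


def mean_absolute_percentage_error(y_true, y_pred):
    """MAPE as a percentage: mean(|y_true - y_pred| / |y_true|) * 100.

    Sketch of the custom helper; assumes y_true contains no zeros.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)


# Each prediction is 10% off its actual value, so MAPE is about 10
print(mean_absolute_percentage_error([100, 200], [110, 180]))
```

Because it converts its inputs with np.asarray, the same function accepts pandas Series such as pjme_test['PJME_MW'] directly.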
[ ]: mean_absolute_percentage_error(y_true=pjme_test['PJME_MW'],
y_pred=pjme_test['MW_Prediction'])
Notice anything about the over-forecasted days?
• #1 worst day: July 4th, 2016, a holiday.
• #3 worst day: December 25, 2015, Christmas.
• #5 worst day: July 4th, 2016, a holiday.
It looks like our model may benefit from adding a holiday indicator.
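A holiday indicator could be sketched with pandas' built-in USFederalHolidayCalendar, assuming US federal holidays are a reasonable proxy for days with unusual load. The helper add_holiday_flag and the column name is_holiday are hypothetical:

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar


def add_holiday_flag(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a hypothetical 'is_holiday' column (1 on US federal holidays)."""
    out = df.copy()
    holidays = USFederalHolidayCalendar().holidays(
        start=out.index.min().normalize(), end=out.index.max().normalize()
    )
    # normalize() drops the time of day, so every hour of a holiday gets flagged
    out["is_holiday"] = out.index.normalize().isin(holidays).astype(int)
    return out


# Made-up hourly rows spanning midnight into July 4th, 2016
idx = pd.date_range("2016-07-03 22:00", periods=4, freq="60min")
demo = add_holiday_flag(pd.DataFrame({"PJME_MW": [0.0] * 4}, index=idx))
print(demo["is_holiday"].tolist())
```

One design caveat: this calendar flags the observed date, so a July 4th falling on a Saturday is marked on Friday, July 3rd, which may not be when demand actually changes.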
The best-predicted days are mostly in October (few holidays and mild weather), with some in
early May as well.
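The notebook does not show how the over-forecasted, worst, and best days were ranked. One plausible sketch, assuming hourly errors are averaged per calendar day and using the PJME_MW and MW_Prediction columns from the plots (the helper daily_errors and the synthetic data are hypothetical):

```python
import pandas as pd


def daily_errors(df, actual="PJME_MW", predicted="MW_Prediction"):
    """Average hourly forecast errors into one row per calendar day (hypothetical helper)."""
    err = pd.DataFrame(index=df.index)
    # error < 0 means the model predicted more load than actually occurred (over-forecast)
    err["error"] = df[actual] - df[predicted]
    err["abs_error"] = err["error"].abs()
    return err.groupby(err.index.normalize()).mean()


# Two made-up days: the first badly over-forecast, the second nearly exact
idx = pd.date_range("2016-07-04", periods=48, freq="60min")
demo = pd.DataFrame({"PJME_MW": [100.0] * 48,
                     "MW_Prediction": [120.0] * 24 + [101.0] * 24}, index=idx)
daily = daily_errors(demo)
over_forecast = daily.sort_values("error").head(5)  # most negative mean error first
worst = daily.sort_values("abs_error", ascending=False).head(5)
best = daily.sort_values("abs_error").head(5)
print(over_forecast, worst, best, sep="\n\n")
```

Ranking by mean absolute error rather than the absolute value of the mean error keeps hours that over- and under-forecast from cancelling each other out.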
[ ]: f, ax = plt.subplots(1)
f.set_figheight(5)
f.set_figwidth(10)
_ = pjme_all[['MW_Prediction','PJME_MW']].plot(ax=ax,
style=['-','.'])
ax.set_ylim(0, 60000)
ax.set_xbound(lower='10-03-2016', upper='10-04-2016')
plot = plt.suptitle('Oct 3, 2016 - Best Predicted Day')
[ ]: f, ax = plt.subplots(1)
f.set_figheight(5)
f.set_figwidth(10)
_ = pjme_all[['MW_Prediction','PJME_MW']].plot(ax=ax,
style=['-','.'])
ax.set_ylim(0, 60000)
ax.set_xbound(lower='08-13-2016', upper='08-14-2016')
plot = plt.suptitle('Aug 13, 2016 - Worst Predicted Day')
11 Up next?
• Add lag variables.
• Add holiday indicators.
• Add a weather data source.
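Lag variables give the model the load observed a fixed time earlier. A sketch under stated assumptions: hourly data in a PJME_MW column, lags of 24 and 168 hours (one day and one week) chosen only for illustration, and a hypothetical add_lag_features helper:

```python
import pandas as pd


def add_lag_features(df: pd.DataFrame, target: str = "PJME_MW",
                     lags_hours=(24, 168)) -> pd.DataFrame:
    """Add columns holding the target's value a fixed number of hours earlier (hypothetical)."""
    out = df.copy()
    # Keep one row per timestamp so the reindex below has unique labels to look up
    series = out[target][~out.index.duplicated(keep="first")]
    for h in lags_hours:
        # shift(freq=...) moves timestamps forward by h hours, so reindexing to the original
        # index yields the value observed h hours before each row (NaN where none exists)
        lagged = series.shift(freq=pd.Timedelta(hours=h))
        out[f"lag_{h}h"] = lagged.reindex(out.index).to_numpy()
    return out


# 200 made-up hourly values 0.0, 1.0, 2.0, ...
idx = pd.date_range("2016-01-01", periods=200, freq="60min")
demo = add_lag_features(pd.DataFrame({"PJME_MW": [float(i) for i in range(200)]}, index=idx))
print(demo.iloc[166:170])
```

Shifting by time rather than by row count means a missing hour produces a NaN instead of silently pulling in the wrong value. When forecasting a whole test period at once, any lag shorter than the forecast horizon needs actual values that would not yet be known at prediction time.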