Linear Regression Analysis of Energy Consumption Data
We get a lot of questions along the lines of "how do I do this using degree days?" It's very
common for the answers to involve linear regression analysis.
There are many textbooks and online resources that explain what linear regression analysis
is, but the theory can get a little heavy going. So we wrote this short article to explain just
the basics of regression analysis using energy consumption figures and degree days.
Most buildings follow a weekly routine, which means that weekly energy-consumption data
is typically a good option for regression analysis. Although the occupancy of the building
and the heating patterns might vary throughout the week, the patterns are usually fairly
consistent from one week to the next.
Monthly data is usually OK too, but it's rarely as good as weekly data, because the days of
the week don't line up with calendar months (e.g. one month might have 5 weekends, the next
might have 4). Unless your building is heated and used in the same way on weekends as it is
on weekdays, these calendar mismatches will cause inaccuracies in your calculations.
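To get a feel for the size of this calendar mismatch, here is a small illustrative Python sketch that counts the Saturdays and Sundays in each month of a year. The function name and the choice of 2023 are arbitrary assumptions made for the example.

```python
import calendar
from datetime import date

def weekend_days_per_month(year):
    """Return {month number: count of Saturdays and Sundays} for one year."""
    counts = {}
    for month in range(1, 13):
        days_in_month = calendar.monthrange(year, month)[1]
        counts[month] = sum(
            1
            for day in range(1, days_in_month + 1)
            if date(year, month, day).weekday() >= 5  # 5 = Saturday, 6 = Sunday
        )
    return counts

counts = weekend_days_per_month(2023)  # 2023 chosen only as an example year
for month, n in counts.items():
    print(calendar.month_abbr[month], n)
```

Any month contains between 8 and 10 weekend days, so a building that is used differently at weekends can see its monthly consumption shift for reasons that have nothing to do with the weather.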
If you have detailed interval energy consumption data (typically readings taken automatically
every 60 minutes or less), you can use our Energy Lens software to turn it into daily, weekly,
or monthly kWh data.
You might not have much choice about what energy-consumption data you use for your
correlation. But you should try to get data for at least a few periods of measured energy
consumption. If you've got daily, weekly, or monthly data, try to cover at least one full
heating or cooling season. If you've got annual data, try to cover at least a few years of
consumption.
If you're using meter readings provided by your utility or energy supplier, make sure not to
use any estimated meter readings. Estimated readings are no use at all for this analysis!
You will need one degree-day figure for each period of measured energy consumption. If
your periods of measured energy consumption are irregular, you'll need to get daily degree
days and sum them together to make a total for each period.
Our Degree Days.net tool enables you to access data in a variety of timescales, including
daily data (which you can sum into figures matching any period spanning multiple days).
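If you prefer to do the summing outside a spreadsheet, the idea can be sketched in a few lines of Python. This is only an illustration: the function name, the data layout (a dictionary of daily values keyed by date), and the sample numbers are all made up for the example.

```python
from datetime import date, timedelta

def sum_degree_days(daily_dd, period_starts):
    """Sum daily degree days into one total per consumption period.

    daily_dd: dict mapping date -> degree days for that day.
    period_starts: sorted list of period start dates. The final entry marks
    the end of the last period (exclusive), so N+1 dates describe N periods.
    """
    totals = []
    for start, end in zip(period_starts, period_starts[1:]):
        total = 0.0
        day = start
        while day < end:
            total += daily_dd[day]  # raises KeyError if a day is missing
            day += timedelta(days=1)
        totals.append(total)
    return totals

# Made-up daily HDD values for 1-10 January 2024 (illustration only).
daily_hdd = {date(2024, 1, d): float(d) for d in range(1, 11)}

# Irregular periods: 1-3 Jan, 4-7 Jan, and 8-10 Jan.
starts = [date(2024, 1, 1), date(2024, 1, 4), date(2024, 1, 8), date(2024, 1, 11)]
period_hdd = sum_degree_days(daily_hdd, starts)
print(period_hdd)
```

Letting a missing day raise an error is a deliberate choice in this sketch: a silent gap in the daily data would understate the degree days for that period.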
1. Start of period (you don't need this for the correlation, but it'll probably be useful for
keeping track of the data).
2. Degree days (either HDD or CDD, depending on whether you're investigating heating
or cooling energy consumption).
3. kWh (or BTU, or litres of oil, or whatever units your records of energy consumption
have).
You can then use the second and third columns of data to plot an X-Y scatter chart of HDD
(or CDD) against energy consumption.
2. For the "Type" select "Linear" (we're doing linear regression analysis).
3. Check the boxes to "Display equation on chart" and "Display R-squared value on
chart". (In Excel 2003 and below you'll need to click the "Options" tab to find these
checkboxes.)
The "x" corresponds to the degree days (HDD or CDD, whichever you used).
The figure that multiplies the x (3.317 in the example chart above) represents the
gradient of the trend line.
The constant at the end (53.505 in the example chart above) is the intercept. It
represents the point at which the trend line crosses the y axis. In theory this should
represent the "baseload energy consumption" (see here for more on this). Generally you
would expect this to be zero or positive (more on this later).
Most importantly the equation enables you to estimate kWh from degree days. By plugging
a known HDD or CDD figure into the equation you can calculate the predicted energy
consumption for the period that the HDD/CDD covered. You can then compare the predicted
energy consumption with the actual energy consumption for that period. You would typically
do this to see whether the energy efficiency has got better or worse than it was in the period
that you did the original regression analysis for.
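As a rough illustration of that comparison, here is a minimal Python sketch using the gradient and intercept from the example chart (3.317 and 53.505). The 100 HDD and 420 kWh figures for the period being checked are invented for the example, as are the function names.

```python
GRADIENT = 3.317    # kWh per degree day, from the example trend line
INTERCEPT = 53.505  # kWh per period (baseload), from the example trend line

def predicted_kwh(degree_days):
    """Estimate energy consumption for one period from its degree days."""
    return GRADIENT * degree_days + INTERCEPT

def percent_difference(actual_kwh, degree_days):
    """How far actual consumption is above (+) or below (-) the prediction."""
    expected = predicted_kwh(degree_days)
    return 100.0 * (actual_kwh - expected) / expected

# Hypothetical period: 100 HDD and 420 kWh of metered consumption.
expected = predicted_kwh(100.0)
diff = percent_difference(420.0, 100.0)
print(f"predicted {expected:.1f} kWh, actual is {diff:+.1f}% relative to that")
```

A positive difference suggests the building used more energy than the original regression would predict for that weather, and a negative difference suggests it used less. The periods need to be the same length as those used in the original regression for the intercept to apply.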
The example chart above shows a pretty good correlation. In this instance you don't need the
R² value to see that - it's clear from looking at the chart... But R² is useful for assessing the
strength of a correlation objectively.
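For readers curious about what sits behind the trend line and its R-squared value, here is a minimal sketch of ordinary least-squares regression in plain Python. It is intended to give the same kind of figures as a spreadsheet's linear trend line, but the function name and the sample data are made up for illustration.

```python
def linear_fit(x, y):
    """Least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r_squared), where r_squared is the square
    of the correlation coefficient between x and y.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain some variation")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

# Made-up example: degree days (x) and kWh (y) for four periods.
hdd = [10.0, 20.0, 30.0, 40.0]
kwh = [90.0, 115.0, 155.0, 180.0]
slope, intercept, r_squared = linear_fit(hdd, kwh)
print(f"y = {slope:.3f}x + {intercept:.3f}, R-squared = {r_squared:.4f}")
```

An R-squared close to 1 means the points sit close to the trend line, and a value close to 0 means degree days explain very little of the variation in consumption.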
The optimal base temperature varies from building to building. It's difficult to estimate the
correct base temperature accurately for any particular building using logic alone, so it can be
helpful to make a rough estimate and then try correlating kWh with degree days calculated to
various base temperatures around that point. R² gives a way to compare the strength of the
different correlations.
In theory, the base temperature that produces the highest R² should be the optimal base
temperature of the building. However, it doesn't always work out quite so perfectly, because
of the other factors that make degree-day-based analysis less than perfect. Nonetheless,
testing various base temperatures can give you a useful indication. It shouldn't replace your
intuition of what the base temperature should be, approximately, but it can help you to decide
on what exact number to use. Generally speaking, the better your correlations (the higher
your R² values), the more faith you can reasonably place in the numbers.
INTERCEPT: this gives you the intercept of the trend line - in this case it should
represent the baseload energy consumption.
The screenshot below shows a spreadsheet containing one set of energy-consumption data,
multiple sets of degree days (all with different base temperatures), and gradient, intercept,
and R² values for each energy/degree-days correlation:
Using Excel's SLOPE, INTERCEPT, and RSQ functions to compare correlations with
different base temperatures
The above spreadsheet can be a little overwhelming on first glance, but it's clearer when
broken down into steps:
1. Make a rough estimate of the building's base temperature (you can find some basic
information and guidance here).
2. Select your estimated base temperature on Degree Days.net, check the "Include base
temperatures nearby" box, and generate and download the degree days. You'll notice
that the data has been calculated to a range of base temperatures either side of your
estimate. It'll look a lot like the spreadsheet above, but without the blue cells, which are
the ones we added in ourselves (of course you don't need to make them blue unless you
want to).
3. Add your energy-consumption figures to the right of the last column of degree days.
(The energy-consumption periods need to match the periods that you generated the
degree days for.)
4. Under the last column of degree days, use the SLOPE formula, selecting your energy-
consumption values as the known_y's (these are the values in column "O" in the
screenshot above).
5. After selecting the known_y's, hit F4 to make Excel insert $ symbols in front of the
row and column references. The $ symbols "fix" the referenced cells so that you can
copy the formula without the referenced cells changing. (Strictly speaking you only need
the $ symbols in front of the column references, to fix the column on the energy data,
but in this instance it doesn't hurt to have them in front of the row references as well.)
6. Next, for the known_x's part of the formula, select the degree days from the last
column of degree-day data (column "N" in the screenshot above). You don't want any $
symbols here.
7. You should have a formula something like the one you can see at the top of the image
above.
8. Now, take the cell that you entered the formula into, and copy and paste it across the
row, filling the cells under each column of degree-day data.
9. Do similarly in the two rows underneath, using the INTERCEPT and RSQ functions
instead of SLOPE. You need to specify the known_y's and known_x's in the same way
for all three functions.
10. You should now be able to see the gradient, intercept (baseload), and R² value for
each base temperature's correlation. You can use these values to help you decide upon
the base temperature to use for the building.
If you're new to using $ symbols and functions/formulas in Excel, it's likely that you found
some of the steps above a little confusing. It's well worth taking the time to learn how those
features of Excel work - they make it possible to do all sorts of things in seconds that would
take minutes or hours otherwise.
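The same comparison can also be sketched outside Excel. The Python below is an illustration only: it fits one line per base temperature and lists the gradient, intercept, and R-squared for each, much as the SLOPE, INTERCEPT, and RSQ rows do in the spreadsheet. The function name, base temperatures, and all of the figures are invented for the example.

```python
def fit(x, y):
    """Return (gradient, intercept, r_squared) for a least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    gradient = sxy / sxx
    return gradient, my - gradient * mx, (sxy * sxy) / (sxx * syy)

# Made-up kWh for five periods (the "known_y's").
kwh = [250.0, 208.0, 160.0, 105.0, 64.0]

# Made-up HDD for the same five periods at three base temperatures
# (each list plays the role of one degree-day column, the "known_x's").
hdd_by_base = {
    14.5: [60.0, 45.0, 30.0, 12.0, 2.0],
    15.5: [67.0, 52.0, 37.0, 18.0, 5.0],
    16.5: [74.0, 59.0, 44.0, 25.0, 10.0],
}

results = {base: fit(hdd, kwh) for base, hdd in hdd_by_base.items()}
for base, (gradient, intercept, r_squared) in sorted(results.items()):
    print(f"base {base}: gradient {gradient:.3f}, "
          f"intercept {intercept:.2f}, R-squared {r_squared:.5f}")

# The base temperature with the highest R-squared is a candidate, not a verdict.
best_base = max(results, key=lambda base: results[base][2])
print("highest R-squared at base", best_base)
```

In this invented data the R-squared values are all very close together, which is typical: treat the highest one as an indication to weigh against your own estimate of the base temperature and the intercepts, rather than as a definitive answer.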
Things get messier when the heating or cooling consumption that you are analyzing is
metered together with other energy uses. If those other energy uses are significant, you
should expect a significant baseload energy consumption (positive intercept). If you have a
good idea of how much energy those other energy uses consume, you can compare your
predicted figure with the intercept values to look for the base temperatures with intercepts
that fit your expectations.
Do bear in mind that "baseload" is a fuzzy concept, and it often varies throughout the year
(meaning it's not really a "baseload" at all). Analysis is less precise when heating and cooling
aren't metered separately from each other and from everything else. Do what you can with the
figures you have, but don't be surprised if the numbers don't line up as neatly as you hope.
Also take a look at the R² values. In theory, the base temperature of the building should be
the base temperature with the highest R² value. That's in theory though... In reality the
various inaccuracies in degree-day-based analysis tend to muddy things up, and can cause
misleading figures. Use the R² values as an indication rather than an absolute, especially if
your correlations aren't strong (i.e. low R² values across the board).
If the numbers indicate that the optimal base temperature might be higher or lower than the
range that you have tested, you should probably download more degree days with
higher/lower base temperatures so that you can include them in your analysis.
However, the above method doesn't work properly for irregular periods of consumption, like
those gathered from records of oil deliveries...
When records of energy consumption cover periods of various lengths, the baseload energy
consumption depends on the lengths in question. If the baseload is 20 kWh for a 1 week
period, it will be 40 kWh for a 2 week period, and 60 kWh for a 3 week period.
(If the above statement doesn't make sense, you might be confusing kWh with kW... Many
people do! If you're in any doubt, take a look at our article on kW and kWh - it explains both
units in detail.)
Because different months have different lengths, using a constant figure for baseload kWh
causes slight inaccuracies in correlations of monthly energy consumption with monthly
degree days. The more irregular your consumption records, the greater these inaccuracies
become.
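A quick bit of arithmetic shows the effect. This small Python sketch takes the 20 kWh per week baseload from the example above and scales it to periods of different lengths. The function name is made up, and the sketch assumes the baseload really is constant from day to day.

```python
def baseload_for_period(baseload_kwh_per_day, days):
    """Baseload kWh for a period, assuming a constant daily baseload."""
    return baseload_kwh_per_day * days

daily_baseload = 20.0 / 7  # 20 kWh per week, expressed per day

for days in (7, 14, 21, 28, 31):
    kwh = baseload_for_period(daily_baseload, days)
    print(f"{days:2d} days: {kwh:.1f} kWh")
```

The 28-day and 31-day figures differ by roughly a tenth, which is the kind of variation that a single constant intercept cannot represent.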
Fortunately there's another approach that works just as well for irregular data as it does for
regular data:
To explain this, let's consider the example data that we used previously. Before, we
simply correlated the kWh with the HDD, but the improved method involves a correlation of
the kWh per day with the HDD per day. Here's how we might arrange this data in a
spreadsheet:
Let's explain each column in turn:
A. The start date of each period, as before. We've added on an extra date at the bottom
because it makes it easier for us to calculate the number of days in the last period (see
the explanation for column D).
D. The number of days in each period. This is easy to calculate in Excel... To calculate
the number of days in the period in row 2, take the start date of the period in row 3, and
subtract the start date of the period in row 2. The screenshot above shows the formula at
the top so you can see what's going on. Once you've calculated the first value, you can
copy the formula right down the column to calculate the number of days in each period.
E. The average HDD-per-day for each period. Calculate this by dividing the HDD by the
number of days.
F. The average kWh-per-day for each period. Calculate this by dividing the kWh by the
number of days.
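The column calculations above can be sketched in Python as follows. This is an illustration with invented dates and figures: as in column A, the list of start dates has one extra date at the end so that the length of the last period can be calculated.

```python
from datetime import date

def per_day_figures(start_dates, hdd, kwh):
    """Return a list of (days, hdd_per_day, kwh_per_day), one per period.

    start_dates must contain one more entry than hdd and kwh: the extra
    final date marks the end of the last period.
    """
    if len(start_dates) != len(hdd) + 1 or len(hdd) != len(kwh):
        raise ValueError("need N+1 start dates for N periods of HDD and kWh")
    rows = []
    for i in range(len(hdd)):
        days = (start_dates[i + 1] - start_dates[i]).days  # column D
        rows.append((days, hdd[i] / days, kwh[i] / days))   # columns E and F
    return rows

# Made-up irregular periods of 14, 21, and 7 days.
starts = [date(2024, 1, 1), date(2024, 1, 15), date(2024, 2, 5), date(2024, 2, 12)]
hdd = [140.0, 189.0, 56.0]
kwh = [490.0, 651.0, 203.0]

rows = per_day_figures(starts, hdd, kwh)
for days, hdd_per_day, kwh_per_day in rows:
    print(days, hdd_per_day, kwh_per_day)
```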
Once we have the figures we can create a scatter chart, just like before, except with HDD-
per-day and kWh-per-day instead of HDD and kWh:
Like before, we can also add a trendline and the equation of that trendline (see the chart
above).
The figure that multiplies the x (3.3333 in the example chart above) represents the
gradient of the trend line.
The constant at the end (1.6813 in the example chart above) represents the point at
which the trend line crosses the y axis. In theory this should now be equivalent to the
average baseload energy consumption per day.
This equation is very similar to the one described earlier, and you can apply it similarly. Just
remember that x, y, and the constant, are per-day figures. Once you've calculated the energy
consumption per day from an HDD-per-day figure, you can of course multiply it by the
number of days in the period to work out the predicted kWh over the whole of the period.
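Here is that calculation as a minimal Python sketch, using the gradient and constant from the example chart (3.3333 and 1.6813). The period of 140 HDD over 14 days is invented for the example, as is the function name.

```python
GRADIENT_PER_DAY = 3.3333  # kWh per day, per HDD-per-day (example trend line)
BASELOAD_PER_DAY = 1.6813  # kWh per day (example trend line constant)

def predicted_kwh_for_period(total_hdd, days):
    """Predict total kWh for a period of any length from its total HDD."""
    hdd_per_day = total_hdd / days
    kwh_per_day = GRADIENT_PER_DAY * hdd_per_day + BASELOAD_PER_DAY
    return kwh_per_day * days  # scale the per-day estimate back up

# Hypothetical period: 140 HDD spread over 14 days.
estimate = predicted_kwh_for_period(140.0, 14)
print(f"predicted consumption: {estimate:.1f} kWh")
```

Multiplying back up by the number of days means the baseload part of the estimate grows with the length of the period, which is exactly what the simpler method could not do.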
3e) Follow the remaining steps described above, but using the calculated HDD-per-
day figures and kWh-per-day figures instead of the original HDD and kWh figures.
The screenshot below shows one way in which you could organize the data. It might be a
little neater to put the HDD/day and kWh/day figures to the right of the original figures, but
putting them below makes it easier to fit them all in a screenshot:
Using Excel's SLOPE, INTERCEPT, and RSQ functions to compare correlations of
kWh-per-day with HDD-per-day across a range of base temperatures
In Excel, the trick to calculating HDD/day figures across all periods and base temperatures is
to:
Calculate the HDD/day for the first period and the lowest base temperature, using
carefully-placed $ references so that the formula will copy well across multiple columns
(base temperatures) and rows (periods).
In the example above, cell B22 contained the formula "= B8 / $P8". This meant that the
"Days" column P was fixed in the formula, but everything else in the formula was relative. So
copying the formula across the base temperatures and down the periods worked as desired.
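Outside Excel, the equivalent of copying that formula across and down is to divide every column of HDD by the same list of period lengths. The Python sketch below is an illustration only, with made-up names and figures: the shared list of days plays the role of the fixed "Days" column.

```python
def per_day_table(hdd_by_base, days):
    """Divide each base temperature's HDD column by the shared days column."""
    table = {}
    for base, column in hdd_by_base.items():
        if len(column) != len(days):
            raise ValueError(f"base {base}: expected {len(days)} periods")
        table[base] = [hdd / d for hdd, d in zip(column, days)]
    return table

# Made-up HDD totals for three periods at two base temperatures.
hdd_by_base = {
    14.5: [140.0, 189.0, 56.0],
    15.5: [154.0, 210.0, 63.0],
}
days = [14, 21, 7]  # length of each period, shared by every column

hdd_per_day = per_day_table(hdd_by_base, days)
for base, column in hdd_per_day.items():
    print(base, column)
```

The kWh/day figures only need calculating once, because the energy data is the same for every base temperature.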
Provided you understood the previous examples, you should hopefully find this slightly-
modified method pretty straightforward. But please let us know if anything is unclear. We
appreciate that these instructions might be a little intimidating to anyone unfamiliar with
Excel formulas and so on, but we're trying to make them as accessible as possible!