0% found this document useful (0 votes)
110 views13 pages

Linear Regression Analysis of Energy Consumption Data

Uploaded by

vinaleema
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
110 views13 pages

Linear Regression Analysis of Energy Consumption Data

Uploaded by

vinaleema
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 13

Linear Regression Analysis of Energy

Consumption Data
We get a lot of questions along the lines of "how do I do this using degree days?" It's very
common for the answers to involve linear regression analysis.

There are many text books and online resources that explain what linear regression analysis
is... But the theory can get a little heavy going... So we wrote this short article to explain just
the basics of regression analysis using energy consumption figures and degree days.

First things first, do you know what a degree day is?


Before diving into regression analysis, it rather helps if you understand what a degree day
actually is... This article provides a good introduction.

Getting the raw data


To do linear regression analysis, you need to correlate energy-consumption data with degree-
day data:

Getting the energy consumption data


You might have detailed interval data from a smart meter but more likely you'll have weekly
or monthly records of energy consumption that you've collected yourself, or energy bills from
a utility or energy supplier.

Most buildings follow a weekly routine, which means that weekly energy-consumption data
is typically a good option for regression analysis. Although the occupancy of the building
and the heating patterns might vary throughout the week, the patterns are usually fairly
consistent from one week to the next.

Monthly data is usually OK too, but it's rarely as good as weekly data, because the days of
the week don't line up with calendar months (e.g. one month might have 5 weekends, the next
might have 4). Unless your building is heated and used in the same way on weekends as it is
on weekdays, these calendar mismatches will cause inaccuracies in your calculations.

If you have detailed interval energy consumption data (typically readings taken automatically
every 60 minutes or less), you can use our Energy Lens software to turn it into daily, weekly,
or monthly kWh data.
You might not have much choice about what energy-consumption data you use for your
correlation. But you should try to get data for at least a few periods of measured energy
consumption. If you've got daily, weekly, or monthly data, try to cover at least one full
heating or cooling season. If you've got annual data, try to cover at least a few years of
consumption.

If you're using meter readings provided by your utility or energy supplier, make sure not to
use any estimated meter readings. Estimated readings are no use at all for this analysis!

Getting the degree-day data


If you're investigating heating energy consumption you'll want heating degree days; if you're
investigating cooling energy consumption you'll want cooling degree days.

You will need one degree-day figure for each period of measured energy consumption. If
your periods of measured energy consumption are irregular, you'll need to get daily degree
days and sum them together to make a total for each period.

Our Degree Days.net tool enables you to access data in a variety of timescales, including
daily data (which you can sum into figures matching any period spanning multiple days).

Correlating energy usage with degree days


Above we explained how to get the two sets of data (energy consumption and degree days).
Next you need to correlate these two sets of data...

The method that follows is ideal if your periods of measured energy


consumption are all the same length (like weeks, which are all 7 days
long)... But there's a slightly more complicated method that we'll
introduce shortly that works slightly better for monthly data and
that is usually necessary for regression analysis of irregular periods
of consumption.

Our explanation of the improved method is based on this one, so


please do go through this method first.
You might want to start by making 3 columns in Excel (or whatever spreadsheet software
package you have):

1. Start of period (you don't need this for the correlation, but it'll probably be useful for
keeping track of the data).

2. Degree days (either HDD or CDD, depending on whether you're investigating heating
or cooling energy consumption).

3. kWh (or BTU, or litres of oil, or whatever units your records of energy consumption
have).

For an example, see the screenshot to the right.

You can then use the second and third columns of data to plot an X-Y scatter chart of HDD
(or CDD) against energy consumption.

Enhancing the scatter chart


Once you've made the basic scatter chart, there are a few important extras that you'll probably
want: a trend line, an equation, and an R2 value. To get these using Excel:

1. Right-click one of the data points and select "Add Trendline...".

2. For the "Type" select "Linear" (we're doing linear regression analysis).

3. Check the boxes to "Display equation on chart" and "Display R-squared value on
chart". (In Excel 2003 and below you'll need to click the "Options" tab to find these
checkboxes.)

You should end up with something like the chart below:


What does the equation mean?
 The "y" corresponds to the kWh (or BTU, or whatever your units of energy
consumption are).

 The "x" corresponds to the degree days (HDD or CDD, whichever you used).

 The figure that multiplies the x (3.317 in the example chart above) represents the
gradient of the trend line.

 The constant at the end (53.505 in the example chart above) is the intercept. It
represents the point at which the trend line crosses the y axis. In theory this should
represent the "baseload energy consumption" (see here for more on this). Generally you
would expect this to be zero or positive (more on this later).

Most importantly the equation enables you to estimate kWh from degree days. By plugging
a known HDD or CDD figure into the equation you can calculate the predicted energy
consumption for the period that the HDD/CDD covered. You can then compare the predicted
energy consumption with the actual energy consumption for that period. You would typically
do this to see whether the energy efficiency has got better or worse than it was in the period
that you did the original regression analysis for.

What does the R2 show?


The R2 value is basically a measure of how good the correlation is. The closer the R2 value is
to 1, the better the correlation. A good correlation between degree days and energy
consumption indicates that the methodology is sound (the main pitfalls in degree-day analysis
have been avoided or corrected for), and that the heating/cooling system is working well (the
"control" of the system is good). In other words, the higher the R2, the better.
Generally speaking, an R2 of 0.75 indicates an reasonable correlation between energy
consumption and degree days. 0.9 or above is very good. An R2 much below 0.7 or so is
likely an indication that the heating control is either very poor, or that the analysis
methodology needs to be improved (e.g. wrong base temperature, irregular building
occupancy that hasn't been corrected for, heating/cooling metered together with other energy
consumption that varies considerably throughout the year).

The example chart above shows a pretty good correlation. In this instance you don't need the
R2 value to see that - it's clear from looking at the chart... But R2 is useful for assessing the
strength of a correlation objectively.

Using formulae to calculate the gradient, intercept, and R2, to help


determine the optimal base temperature
As explained in this article, the base temperature of the degree days makes a big difference to
how well they correlate with the energy consumption of any particular building.

The optimal base temperature varies from building to building. It's difficult to estimate the
correct base temperature accurately for any particular building using logic alone, so it can be
helpful to make a rough estimate and then try correlating kWh with degree days calculated to
various base temperatures around that point. R2 gives a way to compare the strength of the
different correlations.

In theory, the base temperature that produces the highest R2 should be the optimal base
temperature of the building. However, it doesn't always work out quite so perfectly, because
of the other factors that make degree-day-based analysis less than perfect. Nonetheless,
testing various base temperatures can give you a useful indication. It shouldn't replace your
intuition of what the base temperature should be, approximately, but it can help you to decide
on what exact number to use. Generally speaking, the better your correlations (the higher
your R2 values), the more faith you can reasonably place in the numbers.

The 3 relevant Excel functions are:

 SLOPE: this gives you the gradient of the trend line.

 INTERCEPT: this gives you the intercept of the trend line - in this case it should
represent the baseload energy consumption.

 RSQ: this gives you the R2 value.


What's great about these functions is that you can quickly apply them to multiple correlations,
using degree days with a range of base temperatures. Its a question of copying functions
across a spreadsheet rather than creating lots of individual charts.

The screenshot below shows a spreadsheet containing one set of energy-consumption data,
multiple sets of degree days (all with different base temperatures), and gradient, intercept,
and R2 values for each energy/degree-days correlation:

Using Excel's SLOPE, INTERCEPT, and RSQ functions to compare correlations with
different base temperatures

The above spreadsheet can be a little overwhelming on first glance, but it's clearer when
broken down into steps:

1. Make a rough estimate of the building's base temperature (you can find some basic
information and guidance here).

2. Select your estimated base temperature on Degree Days.net, check the "Include base
temperatures nearby" box, and generate and download the degree days. You'll notice
that the data has been calculated to a range of base temperatures either side of your
estimate. It'll look a lot like the spreadsheet above, but without the blue cells, which are
the ones we added in ourselves (of course you don't need to make them blue unless you
want to).
3. Add your energy-consumption figures to the right of the last column of degree days.
(The energy-consumption periods need to match the periods that you generated the
degree days for.)

4. Under the last column of degree days, use the SLOPE formula, selecting your energy-
consumption values as the known_y's (these are the values in column "O" in the
screenshot above).

5. After selecting the known_y's, hit F4 to make Excel insert $ symbols in front of the
row and column references. The $ symbols "fix" the referenced cells so that you can
copy the formula without the referenced cells changing. (Strictly speaking you only need
the $ symbols in front of the column references, to fix the column on the energy data,
but in this instance it doesn't hurt to have them in front of the row references as well.)

6. Next, for the known_x's part of the formula, select the degree days from the last
column of degree-day data (column "N" in the screenshot above). You don't want any $
symbols here.

7. You should have a formula something like the one you can see at the top of the image
above.

8. Now, take the cell that you entered the formula into, and copy and paste it across the
row, filling the cells under each column of degree-day data.

9. Do similarly in the two rows underneath, using the INTERCEPT and RSQ functions
instead of SLOPE. You need to specify the known_y's and known_x's in the same way
for all three functions.

10. You should now be able to see the gradient, intercept (baseload), and R2 value for
each base temperature's correlation. You can use these values to help you decide upon
the base temperature to use for the building.

If you're new to using $ symbols and functions/formulas in Excel, it's likely that you found
some of the steps above a little confusing. It's well worth taking the time to learn how those
features of Excel work - they make it possible to do all sorts of things in seconds that would
take minutes or hours otherwise.

Interpreting the intercept (baseload) and R2 values


The intercept (baseload) should be roughly zero or positive. For base temperatures that are
too high, you may see a negative intercept (so you can take that as an indication that you
should be looking towards lower base temperatures).

Things get more messy when the heating or cooling consumption that you are analyzing is
metered together with other energy uses. If those other energy uses are significant, you
should expect a significant baseload energy consumption (positive intercept). If you have a
good idea of how much energy those other energy uses consume, you can compare your
predicted figure with the intercept values to look for the base temperatures with intercepts
that fit your expectations.

Do bear in mind that "baseload" is a fuzzy concept, and it often varies throughout the year
(meaning it's not really a "baseload" at all). Analysis is less precise when heating and cooling
aren't metered separately from each other and from everything else. Do what you can with the
figures you have, but don't be surprised if the numbers don't line up as neatly as you hope.

Also take a look at the R2 values. In theory, the base temperature of the building should be
the base temperature with the highest R2 value. That's in theory though... In reality the
various inaccuracies in degree-day-based analysis tend to muddy things up, and can cause
misleading figures. Use the R2 values as an indication rather than an absolute, especially if
your correlations aren't strong (i.e. low R2 values across the board).

If the numbers indicate that the optimal base temperature might be higher or lower than the
range that you have tested, you should probably download more degree days with
higher/lower base temperatures so that you can include them in your analysis.

Improved correlation method for irregular data


Correlating energy usage with degree days, as described above, works well when all the
energy-consumption records cover identical periods of time. It's ideal for linear regression
analysis of daily or weekly data.

However, the above method doesn't work properly for irregular periods of consumption, like
those gathered from records of oil deliveries...

What's wrong with correlating energy usage with degree days


covering irregular periods?
The problem is with the baseload energy consumption. The method above assumes that the
baseload is a constant number, but this assumption only makes sense if the periods of
consumption are all the same length.

When records of energy consumption cover periods of various lengths, the baseload energy
consumption depends on the lengths in question. If the baseload is 20 kWh for a 1 week
period, it will be 40 kWh for a 2 week period, and 60 kWh for a 3 week period.

Baseload energy consumption can't be expressed as a constant unless


the length of the period is also a constant.

(If the above statement doesn't make sense, you might be confusing kWh with kW... Many
people do! If you're in any doubt, take a look at our article on kW and kWh - it explains both
units in detail.)

Because different months have different lengths, using a constant figure for baseload kWh
causes slight inaccuracies in correlations of monthly energy consumption with monthly
degree days. The more irregular your consumption records, the greater these inaccuracies
become.

Fortunately there's another approach that works just as well for irregular data as it does for
regular data:

The slightly-more-complicated solution that works well for


irregular data
Instead of correlating energy consumption with degree days,
correlate energy consumption per day with degree days
per day.

To explain this, let's consider the example data that we used previously... Previously we
simply correlated the kWh with the HDD, but the improved method involves a correlation of
the kWh per day with the HDD per day... Here's how we might arrange this data in a
spreadsheet:
Let's explain each column in turn:

A. The start date of each period, as before. We've added on an extra date at the bottom
because it makes it easier for us to calculate the number of days in the last period (see
the explanation for column D).

B. The HDD of each period, as before.

C. The kWh of each period, as before.

D. The number of days in each period. This is easy to calculate in Excel... To calculate
the number of days in the period in row 2, take the start date of the period in row 3, and
subtract the start date of the period in row 2. The screenshot above shows the formula at
the top so you can see what's going on. Once you've calculated the first value, you can
copy the formula right down the column to calculate the number of days in each period.

E. The average HDD-per-day for each period. Calculate this by dividing the HDD by the
number of days.

F. The average kWh-per-day for each period. Calculate this by dividing the kWh by the
number of days.

Once we have the figures we can create a scatter chart, just like before, except with HDD-
per-day and kWh-per-day instead of HDD and kWh:
Like before, we can also add a trendline and the equation of that trendline (see the chart
above).

What does the equation mean now?


 The "y" corresponds to the average kWh-per-day (or average BTU per day, or average
whatever-your-units-of-energy-consumption-are per day). As before, this includes both
weather-dependent and non-weather-dependent (baseload) energy consumption.

 The "x" corresponds to the average degree days per day.

 The figure that multiplies the x (3.3333 in the example chart above) represents the
gradient of the trend line.

 The constant at the end (1.6813 in the example chart above) represents the point at
which the trend line crosses the y axis. In theory this should now be equivalent to the
average baseload energy consumption per day.

This equation is very similar to the one described earlier, and you can apply it similarly. Just
remember that x, y, and the constant, are per-day figures. Once you've calculated the energy
consumption per day from an HDD-per-day figure, you can of course multiply it by the
number of days in the period to work out the predicted kWh over the whole of the period.

Using this modified method to help determine the optimal base


temperature
We have already explained the process for investigating the effect of base temperature on the
regression analysis (see here). We used kWh figures and HDD figures above, but it's easy to
apply the same approach using kWh-per-day figures and HDD-per-day figures. We need to
insert a few additional steps between steps 3 and 4 above:
 3b) Calculate the number of days in each of our recorded periods (each month in our
example).

 3c) Translate the HDD figures into HDD-per-day figures.

 3d) Translate the kWh figures into kWh-per-day figures.

 3e) Follow the remaining steps described above, but using the calculated HDD-per-
day figures and kWh-per-day figures instead of the original HDD and kWh figures.

The screenshot below shows one way in which you could organize the data. It might be a
little neater to put the HHD/day and kWh/day figures to the right of the original figures, but
putting them below makes it easier to fit them all in a screenshot:
Using Excel's SLOPE, INTERCEPT, and RSQ functions to compare correlations of
kWh-per-day with HDD-per-day across a range of base temperatures

In Excel, the trick to calculating HHD/day figures across all periods and base temperatures is
to:

 Calculate the HHD/day for the first period and the lowest base temperature, using
carefully-placed $ references so that the formula will copy well across multiple columns
(base temperatures) and rows (periods).

 Copy that cell across all base temperatures and periods.

In the example above, cell B22 contained the formula "= B8 / $P8". This meant that the
"Days" column P was fixed in the formula, but everything else in the formula was relative. So
copying the formula across the base temperatures and down the periods worked as desired.

Provided you understood the previous examples, you should hopefully find this slightly-
modified method pretty straightforward. But please let us know if anything is unclear. We
appreciate that these instructions might be a little intimidating to anyone unfamiliar with
Excel formulas and so on, but we're trying to make them as accessible as possible!

Specify and Generate Your Degree Days Now

You might also like