Building Predictive Models For Dynamic Line Rating Using Data - Github

This document is a master's thesis that builds predictive models for dynamic line ratings using data science techniques. The thesis aims to predict dynamic line rating capacity on two days ahead and one day ahead using machine learning models. It analyzes weather and transmission line data to calculate dynamic line ratings historically and for forecasted time periods. Models were built using techniques like polynomial regression, generalized additive models, artificial neural networks, and support vector machines. The best performing models achieved error rates of 9% for one day ahead predictions and 11% for two days ahead, meeting the project requirements. The models and analysis can help increase renewable energy integration and transmission efficiency.

DEGREE PROJECT IN ELECTRICAL ENGINEERING,
SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2016

Building predictive models for
dynamic line rating using data
science techniques

NICOLAE DOBAN

KTH
SCHOOL OF ARCHITECTURE AND THE BUILT ENVIRONMENT
TRITA-EE 2016:059

www.kth.se
ABSTRACT

The traditional power systems are statically rated and sometimes renewable
energy sources (RES) are curtailed in order not to exceed this static rating. The RES
are intermittent, so their output at specific time periods throughout the day is
difficult to predict. Dynamic Line Rating (DLR) technology can overcome this
constraint by leveraging the available weather data and technical parameters of the
transmission line.
The main goal of the thesis is to present prediction models of DLR capacity
for one day ahead and two days ahead. The models are
evaluated based on their error rate profiles. DLR provides the capability to up-rate the
line(s) according to the environmental conditions and always has a much higher
profile than the static rating. By implementing DLR a power utility can increase the
efficiency of the power system, decrease RES curtailment and optimize the
integration of RES within the grid.
DLR depends mainly on the weather parameters; specifically, it reaches its
highest profile at high wind speeds and low ambient temperatures.
This is especially profitable for wind energy producers, who can both
produce more (until pitch control sets in) and transmit more over the same line(s)
during high-wind-speed periods, thus increasing the energy efficiency.
The DLR was calculated by employing modern Data Science and Machine
Learning tools and techniques and leveraged historical weather and transmission line
data provided by SMHI and Vattenfall respectively. An initial phase of Exploratory
Data Analysis (EDA) was developed to understand data patterns and relationships
between different variables, as well as to determine the most predictive variables for
DLR. All the predictive models and data processing routines were built in open source
R and are available on GitHub.
There were three types of models built: for historical data, for one day-ahead
and for two days-ahead time-horizons. The models built for both time-horizons
registered a low error rate profile of 9% (for day-ahead) and 11% (for two days-
ahead). As expected, the predictive models built on historical data were more accurate
with an error as low as 2%-3%.
In conclusion, the implemented models met the requirement set by
Vattenfall of a maximum error of 20% and they can be applied in the control room for
that specific line. Moreover, predictive models can also be built for other lines if the
required data is available. Therefore, the findings and outcomes of this Master Thesis
project can be reproduced for other power lines and geographic locations in order to
achieve a more efficient power system and an increased share of RES in the energy
mix.
Keywords. Dynamic Line Rating, Data Science, Exploratory Data
Analysis, Predictive Modeling, Energy Efficiency, Renewable Energy
Sources, Power system planning and operations, Reproducible
Contents
Abstract
Keywords.
1 Introduction
1.1 Background
1.2 Naum pilot project [11]
1.3 DLR impacts
1.4 Purpose of the Master Thesis
1.5 Goals and Objectives of the Master Thesis
1.6 Delimitation of study
2 Literature Research
2.1 Overview
2.2 The papers researched
3 Method and Material
3.1 Data Visualization
3.2 Exploratory data analysis
3.3 DLR calculation according to IEEE standard
3.4 DLR calculation for summer period
3.5 Data collected from Vattenfall
3.6 Data collected from SMHI
3.7 Data Cleansing, Curation and Logic
3.8 Predictive analysis
Polynomial regression
General Additive Model (GAM)
Artificial Neural Network (ANN)
Support vector machines (SVM)
4 Visualization of EDA
4.1 Temporal variation
4.2 Scatterplots
4.3 Correlation analysis
4.4 Time-resolution comparison
4.5 Conclusions on EDA
5 Predictive Modeling
5.1 Preamble
5.2 Building and validating the predictive models
5.3 Predictive Models for historical observations
5.4 Predictive Models considering the forecasted data
EDA for forecasted data
Predictive Models for day and two-day ahead in advance
5.5 Synthesis of Predictive Modeling
6 Discussions
6.1 Predicting NormalDLR from forecasted data
6.2 Predicting NormalDLR from Historical data
6.3 Improving models’ precision
7 Conclusions
7.1 Outcomes
7.2 Reproducibility
8 Suggestions on Future Work
9 References
1 INTRODUCTION
1.1 BACKGROUND
Sweden has some of the best geographical conditions worldwide for leveraging renewable energy, and its
minimum annual renewable energy share for electricity production is approximately 45% [1] [2] (48% according to [3]).
However, IEA’s statistics show that since 2005 the renewable share of electricity production has averaged
55% [4]. In the past, hydro-power was the main contributor but now the government wants to fulfill the country's wind
energy potential [5]. However, adding the wind turbines’ intermittent capacity will influence the system's operation, with the risk
of overloading the lines.

Since the first electrical power line and system were built, the electrical power sector has grown enormously,
until recently with little regard for the environmental impact of this growth. Thus the electrical power
sector, along with other industries, has contributed to significant environmental changes. One of those
changes is global warming.

There are many initiatives worldwide to address the issue of global warming. For example, in Europe the European
Commission set the 2020 goals to limit the temperature rise to 2 °C (relative to the year 1990) by limiting the
atmospheric CO2 concentration to 450 ppm. The goals are to reduce EU greenhouse gas emissions by 20% from 1990
levels, to raise the share of EU energy consumption produced from renewable energy sources (RES) to 20%, and to
improve the EU's energy efficiency by 20%. To achieve these goals, the EU must focus more on incentivizing and
investing in RES research and development. [6] [7]

Evidently, countries will be required to install more RES capacity that should integrate seamlessly with the current
energy technologies (energy production, transmission & distribution, consumption and storage equipment). Also, once
installed, the RES units must operate at their full potential because otherwise the energy they could produce
would be wasted. Wind turbines are not installed in cities due to their low public acceptance, while PV
panels can very often be found on the rooftops of city buildings and in the countryside. Actors once known only as
“consumers” have emerged as a new actor in the power market called “prosumers”, a combination of the words
“producer” and “consumer”. The prosumers represent sparsely distributed energy production and consumption units, i.e.
they inject and absorb power at different nodes of the network, in various geographical locations, which in turn
increases the load on the lines. The emergence of this new actor in the power market imposed several changes on the
power flow and the power network’s operation. Namely, connecting RES units to the traditional power networks made the
power flow bidirectional, and that in turn created the need for adaptive algorithms for protection schemes, as
well as re-adjustments of the lines' capacities.

Additional challenges for power systems have to do with connecting RES to the power grid. RES pose problems in the
planning, operation and optimization of the power system. A thorough and comprehensive analysis of the RES connection
must be undertaken, and several considerations must be taken into account. The intermittency of RES
dictates the need for designing and sizing an appropriate storage system, so that the power system does not collapse under
cloudy or windless weather conditions. Also, optimized system operation between the different energy production,
transmission & distribution, storage and consumption technologies must be assured at all times with high reliability,
efficiency and safety levels.

Other challenges caused by the intermittent nature of RES relate to optimal power planning and operation, which
influence the overall power market. There are efforts channeled towards predicting the power production profiles of
wind and PV, but the inaccuracy of weather forecasts makes the RES prediction unreliable. Furthermore, congested
power lines force the curtailment of some of the extra energy produced by RES at some points in time, and
that clearly contradicts the 2020 goals. RES curtailment occurs at specific points in time when the energy produced
from RES exceeds the momentary system load, so that the operator has no option other than to curtail it,
i.e. to reduce the energy output from RES. Hence, additional transmission capacity must be
installed in order to transmit and distribute the additional energy coming from the intermittent RES units.

Traditional power systems are built for the worst case scenarios, either very high or very low temperatures. [8] Given the
intrinsic mechanical and physical properties of the conductors and of the equipment's elements and components, the power
systems are over-designed to withstand those worst case conditions. The worst case scenario gives the input
parameters used to calculate a static rating. Such standard input parameters can be: an air temperature of 25-35 °C, 0.5 m/s of
perpendicular wind and 1000 W/m² of solar irradiation. [8] These parameters are time-dependent. Accordingly, the
rating of a line is a seasonal quasi-constant parameter for most power systems, changing only between the
different seasons: winter, summer and spring/fall. Thus, the traditionally rated, over-designed power systems operate
within the safe operational range or, sometimes, far below it in order not to jeopardize the network's stability and
reliability.

Dynamic Line Rating (DLR) represents a real-time or forecasted change in the line’s/system’s capacity according to its
physical and mechanical properties and to weather and temporal parameters. DLR can be used to optimize the power flow from
RES, so that their curtailment is minimal or avoided entirely because of the increased line capacity. Also, given the
intermittent character of RES, DLR can help to increase the grid’s capacity factor in sunny or windy weather, since
wind has a larger influence on the DLR than solar radiation [8]. That would lead to an increased share of subsidized
RES in the energy mix, thus lowering the overall electricity prices. Also, power systems using DLR would be better
managed and would perform better in peak-load periods, allowing for optimized operation with higher efficiency and
reliability and lower grid operating costs. Additionally, with an increased RES share in the energy mix, the fossil fuels’
share will decrease, allowing for a “cleaner” energy generation mix with fewer emissions.
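
The effect of a higher line rating on curtailment can be illustrated with a minimal sketch. It assumes, as a simplification, that the operator curtails exactly the generation that exceeds the line rating in each time step. The function name `curtailed_energy` and all numbers are hypothetical and are not taken from the thesis data.

```python
def curtailed_energy(generation_mw, rating_mw, step_hours=1.0):
    """Return the energy (MWh) that cannot be transmitted.

    Assumption: in every time step the operator curtails whatever
    generation exceeds the line rating valid for that step.
    """
    return sum(
        max(0.0, gen - rating) * step_hours
        for gen, rating in zip(generation_mw, rating_mw)
    )


# Hypothetical hourly wind generation in MW (illustrative values only).
generation = [40.0, 55.0, 70.0, 80.0, 65.0, 45.0]

# A static rating is constant; a dynamic rating follows the weather.
static_rating = [60.0] * len(generation)
dynamic_rating = [62.0, 70.0, 85.0, 95.0, 80.0, 66.0]

static_loss = curtailed_energy(generation, static_rating)
dynamic_loss = curtailed_energy(generation, dynamic_rating)

print(f"Curtailed with static rating:  {static_loss} MWh")
print(f"Curtailed with dynamic rating: {dynamic_loss} MWh")
```

In this made-up example the dynamic rating stays above the generation in every hour, so nothing is curtailed, while the static rating forces curtailment in the three windiest hours.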

Another consequence of introducing DLR is that increasing the system’s capacity would make it possible to defer the
expansion of the current power systems, bringing significant capital and operational savings to power utilities. Avoiding
capital expenditures is also important considering that RES lead to overall lower electricity prices, so a positive ROI
(Return On Investment) for any capacity expansion project will be more difficult to obtain in this environment. Even
if there were plans to connect more RES units to the grid (with the goal of decreasing CO2 levels and electricity
prices), the capital expenses would also have to account for the new equipment, generation units, regulatory costs, etc.,
which could increase the payback time of the investment, unlike implementing DLR technology, which does not require
any grid expansion. Additionally, the power utilities would avoid the decrease in the system’s reliability that would be
introduced by any grid expansion, which would entail the installation of new lines, transformers and other system equipment.
The system relies on the operational reliability of its components, so the more components there are in the system,
the more likely it is to fail: a system with N components will be more reliable than a system with N+1
components.
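
The N versus N+1 argument can be made concrete with a short sketch. It assumes a series system, i.e. one that works only if every component works, with independent component failures; both are modeling assumptions added here for illustration. The reliability value 0.99 is hypothetical.

```python
from math import prod


def series_reliability(component_reliabilities):
    """Reliability of a series system with independent components.

    Assumption: the system works only if all components work, so the
    system reliability is the product of the component reliabilities.
    """
    return prod(component_reliabilities)


# Hypothetical grid section with N = 10 components, each 99% reliable.
existing = [0.99] * 10
# The same section after an expansion that adds one more component.
expanded = existing + [0.99]

r_existing = series_reliability(existing)
r_expanded = series_reliability(expanded)

print(f"N components:   {r_existing:.4f}")
print(f"N+1 components: {r_expanded:.4f}")
```

Because every component reliability is below 1, each added factor lowers the product, which is the sense in which the system with N components is more reliable than the one with N+1.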

The current on a line is determined by production and consumption and is limited by the line capacity. The line
capacity is influenced by both physical and mechanical parameters (temperature and sag), for which the line was
(over-)designed to withstand the worst weather conditions (highest and lowest temperature).

Past and present practices (with some exceptions) show that transmission systems (lines, cables and equipment) are
operated with static ratings, but in reality the rating is not constant: it depends on the environmental
conditions, which influence the transmission system's performance. More specifically, the rating is determined by the
highest conductor temperature that causes no deterioration and by the lowest height of the conductor above the ground (sag). An
increase in conductor temperature causes the conductor to expand and consequently to approach the ground (sag). Also, the
phenomena of creep and load cause line damage. Therefore, by design, the power lines and poles are built in such a way
that the sag will not exceed the limit under the highest conductor temperature. The air temperature around the
conductor influences the current that can flow through the conductor, and the current in turn determines the conductor
temperature and line sag. Consequently, the colder the air, the larger the current that can flow through the line.
The extraction of heat from the conductor is influenced mostly by wind direction and speed and less by solar irradiation and air
temperature. [8]
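
The dependence of the permissible current on the weather can be sketched with a steady-state heat balance of the conductor, in which convective and radiative cooling balance solar heating and Joule heating: q_c + q_r = q_s + I²R. The sketch below is a simplification and not the IEEE-738 procedure: the convection term uses a single, hypothetical heat transfer coefficient `h_conv` in place of the standard's wind-speed and wind-angle correlations, and all conductor data are illustrative.

```python
import math

STEFAN_BOLTZMANN = 5.670374419e-8  # W/(m^2 K^4)


def ampacity(t_conductor_c, t_air_c, h_conv, diameter_m, r_ohm_per_m,
             emissivity=0.8, absorptivity=0.8, solar_w_m2=1000.0):
    """Steady-state current (A) from q_c + q_r = q_s + I^2 * R.

    Assumptions (not from the thesis): convection is modeled with one
    lumped coefficient h_conv in W/(m^2 K), and the sun shines
    perpendicular to the conductor axis.
    """
    surface = math.pi * diameter_m  # surface area per metre of line
    q_conv = h_conv * surface * (t_conductor_c - t_air_c)
    t_c_kelvin = t_conductor_c + 273.15
    t_a_kelvin = t_air_c + 273.15
    q_rad = emissivity * STEFAN_BOLTZMANN * surface * (
        t_c_kelvin ** 4 - t_a_kelvin ** 4)
    q_sun = absorptivity * solar_w_m2 * diameter_m  # projected area
    net_cooling = q_conv + q_rad - q_sun
    if net_cooling <= 0.0:
        return 0.0  # no thermal headroom left for Joule heating
    return math.sqrt(net_cooling / r_ohm_per_m)


# Hypothetical conductor: 20 mm diameter, 0.1 mOhm/m, 75 degC limit.
warm_calm = ampacity(75.0, 30.0, h_conv=10.0,
                     diameter_m=0.02, r_ohm_per_m=1e-4)
cold_windy = ampacity(75.0, 0.0, h_conv=40.0,
                      diameter_m=0.02, r_ohm_per_m=1e-4)

print(f"Warm, calm day:  {warm_calm:.0f} A")
print(f"Cold, windy day: {cold_windy:.0f} A")
```

Under these assumptions the cold, windy case allows a clearly larger current for the same maximum conductor temperature, which is the qualitative behavior described above. A larger `h_conv` stands in for stronger wind cooling.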

Dynamic Line Rating (DLR) represents the possibility of up-rating the existing power lines based on real-time or
predicted weather conditions, with the aim of increasing the power transmitted and therefore the efficiency. Static ratings
represent a low percentage of the lines' limit capacity, and DLR helps TSOs and DSOs to increase this percentage
without jeopardizing the reliable and efficient operation of the grid.
Temperatures in Sweden can drop well below 0 °C [9], which makes the country a very good candidate for
implementing DLR. Since the line capacity is determined by its temperature and consequently by its sag, a line can have a
higher rating in a colder environment. Also, the wind speed and direction will influence the rating: a cold wind
perpendicular to the line will cool down the conductor. Studies illustrate an exponential dependency between the radial
temperature difference and the wind attack angle: the larger the wind attack angle, the bigger the radial temperature
difference [10]. Hence, a low ambient temperature with a perpendicular wind attack angle will enable transporting a larger
capacity on lines. Also, since low temperatures yield a higher line capacity, it would be possible to transport more
energy produced by the wind turbines (thus increasing their load factors) given wind speeds equal to or above the wind
turbines’ thresholds. In the present project it was found that the wind speed is not highly correlated with the air
temperature (Figure 13).

It is important to note that the line’s lifetime is not affected by transmitting additional capacity as long as the limits are not
violated, so those should be monitored and controlled by specialized equipment. The primary limit is the conductor's
thermal threshold: it should not be exceeded, and the natural convection cooling of the conductor helps in this
respect. [8]
Two time horizons were considered for making the predictions: one day ahead and two days ahead. Both the
day-ahead and the two-day-ahead predictions would allow the power market actors to calculate their supply, demand,
end-of-day electricity price and quantity based on the predicted and updated merit order. This in turn would hedge the
energy consumption risk, and the predicted power could also be used as an ancillary service. A larger predicted power
output from the wind turbines would decrease the emissions that would otherwise have been produced by fossil fuel
technologies. Knowing the power flows on a line one and two days in advance could increase its operational reliability
and the overall system stability and security.
The wind turbines’ techno-economic aspects and the power market’s financial aspects could be taken into consideration more
properly given the day-ahead and the two-day-ahead DLR predictions. And of course, increasing the line rating would
increase the efficiency of the line and, to a lesser extent, of the entire power system (because there is only one line
monitored), given that the transported power increases on the same power line. The increased system efficiency would
avoid or postpone the capital and operational expenses of new lines and components.
1.2 NAUM PILOT PROJECT [11]
Vattenfall, through its Naum pilot project, wanted to assess the performance and the potential impacts of deploying
DLR on a 44 kV overhead line which connected a large installed wind capacity of 80 MW to the power grid. Specifically,
Vattenfall wanted to assess the DLR potential during high-wind-speed periods, since these have the largest impact
on the conductor temperature and large amounts of energy can be produced by the wind turbines in those time intervals.

Further wind turbine installations would pose congestion threats; to avoid them, the power grid infrastructure
in that region would have to be upgraded, which would result in larger investment, operational and maintenance costs. The
deployment of wind turbines is also constrained by facilities located in the neighborhood, such as airports.

Vattenfall found that installing additional wind turbines would exceed the present line's capacity in the high-wind-speed
time intervals with decreased load. This would not be the case if the wind-cooling effect were taken into account. Moreover,
the wind-cooling effect influences the conductor temperature, and thus directly the line rating, continuously, whereas
the line's capacity would be exceeded only rarely on an annual basis.

In order to successfully assess the potential of DLR on an overhead line, Vattenfall has set a number of prerequisites:
• The sensor should be able to connect to the power lines without shutting these lines down
• The sensor must measure the weather characteristics and compute the DLR
• The communication infrastructure must securely and reliably transmit the datastreams of the measured and
computed parameters to Vattenfall's control center.

Therefore, Vattenfall installed the line sensor and weather station sensor (USi, http://www.usi-power.com/) at the
hottest point on the line, where the solar radiation is at its maximum and the wind impact is the lowest, i.e. where the line is
subjected to the maximum thermal, and therefore mechanical, stress. Additionally, the sensor's connection to the line does
not compromise the line's operational reliability and, moreover, it is guaranteed by the low-voltage link, and the
communication system is available.

The line sensor and weather station collect the measured and the calculated parameters and transmit them to the
control room. The measured parameters are the weather parameters (solar radiation, air temperature, wind speed and
direction) as well as the operational ones (conductor temperature and load current). The calculated parameter is
the DLR itself, calculated using the IEEE-738 standard. [12]

Analysis of the data from the Naum project showed that the nominal rating of the overhead line was exceeded
several times thanks to the wind-cooling contribution. The highest ratio between the dynamic and the nominal rating was
registered to be 1.8. Although the DLR is not yet operationally deployed, there are DLR tests scheduled.
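
The ratio between the dynamic and the nominal rating can be computed from a series of DLR values as in the following minimal sketch. The function name `rating_ratios` and the sample values are hypothetical; they are not the Naum measurements and were chosen only so that the peak ratio matches the reported 1.8.

```python
def rating_ratios(dynamic_amps, nominal_amps):
    """Return the ratio of each dynamic rating to the nominal rating."""
    if nominal_amps <= 0:
        raise ValueError("nominal rating must be positive")
    return [value / nominal_amps for value in dynamic_amps]


# Hypothetical DLR samples (A) and nominal rating (A), for illustration.
dynamic = [420.0, 510.0, 630.0, 720.0, 380.0]
nominal = 400.0

ratios = rating_ratios(dynamic, nominal)
peak_ratio = max(ratios)
share_above_nominal = sum(r > 1.0 for r in ratios) / len(ratios)

print(f"Peak dynamic/nominal ratio: {peak_ratio:.2f}")
print(f"Share of samples above nominal: {share_above_nominal:.0%}")
```

The same two summary figures, the peak ratio and the share of time the dynamic rating lies above the nominal one, are a compact way to describe how much headroom DLR offers on a given line.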
1.3 DLR IMPACTS
DLR without doubt brings great possibilities and opportunities to the table, but why is now the right moment to
apply it to transmission and distribution systems? Because the traditional systems had rather constant,
unidirectional power flows. Nowadays, prosumers, distributed generation (DG) and intermittent green energy units
have changed the power flows and capacities to a great extent. The established factors (circuit type and consumption
profiles), in combination with the new ones (renewable energy share and prosumers' activity),
dictate the power flow evolution. [8]
The newly established infrastructure makes it possible to increase the grid's capacity to match a certain level of consumption.
Previously, the system had to provide a high capacity every day, but nowadays the situation has changed. It must operate
under various flow scenarios, with renewable energy units often connected far from the grid points where power is concentrated.
Due to the expected increase in energy demand and citizens' reluctance to have generation units in their proximity,
power system planning has become more difficult. Additionally, obtaining authorizations for upgrading
transmission lines is very time-consuming. Therefore, DLR is seen as an optimal solution for these challenges. [8] [13]
These challenges apply to networks that connect renewable [14] decentralized units to higher voltage grids, and
to the enhanced power flows between neighboring country districts, which must cover greater areas in order to achieve
an economic production scheme. [8]
When one is confronted with the problem of connecting offshore and onshore wind farms to the grid, DLR can be particularly
helpful. In addition, DLR can keep wind generation curtailment to a minimum, which in turn can multiply
the grid's capacity by a factor of two and postpone the installation of new (wind) capacity, i.e. DLR will allow
the grid to absorb more wind power from the same number of wind turbines without the need to install new ones.
[8]
Control technologies make it possible to steer the power flows, i.e. TSOs and DSOs can optimize the
fluctuating capacities to match the varying consumption patterns. On the other hand, control technologies are quite
expensive, but the payback time can be drastically reduced by using DLR, given the increased capacity that can be achieved
with it. [8]
The financial outcomes [14] of DLR are quite considerable: the Twenties project within the 7th Framework Programme for
Research and Technological Development, sponsored by the EU, concluded that the revenues can be as high as 250 million
euros for a 10 million euro capital investment given an increase in border power exchange of 20%, an increase that is
easily obtainable with DLR and FACTS technologies. [8]
An indispensable characteristic that DLR must have is the ability to predict during the day or day(s) in advance, so
that the power markets can operate in an optimal manner, i.e. knowing the line capacities in advance, the TSOs
and DSOs would schedule their assets’ operations accordingly (dispatch their energy generation units accordingly).
Furthermore, knowing the power flows on the lines would give TSOs and DSOs information related to grid
configuration, possible failures and import/export contingencies. Although the tendency is to provide real-time (RT)
power system monitoring and operation, the biggest part of system management relies on decisions made hours or day(s)
in advance. If the forecasting on which DLR is contingent is not available and/or is of poor quality for various reasons
(inaccurate method, parameters' uncertainties, etc.), building DLR prediction models becomes difficult and it
is only possible to deliver real-time results. [8]
DLR can also aid advanced protection systems. It has been observed that triggering relays prematurely might result
in a chain reaction that leads to outages. With the development of novel adaptive protection systems in Smart
Grids, dynamic ratings alone can be used to provide a higher level of safety than static ratings. Since DLR operates
with RT line and equipment parameters, the outage frequency would be greatly decreased. [8]

All of this would incentivize Vattenfall to implement these models within its distribution and transmission grid,
supplying them with data from the sensors installed on its lines. The result is expected to achieve several important
outcomes:
• Decrease the curtailment of RES, which would
– Decrease the CO2 emissions within the energy mix
– Increase the RES share and decrease the fossil fuels’ share in the energy mix and
– Decrease the electricity prices (given that the RES are subsidized)
• Increase the grid utilization factor, which would
– Increase the transmission and distribution systems’ efficiencies
– Abandon or postpone the plans of expanding the power grid with more lines, equipment, eventual
maintenance and labor, which would account for larger investment and variable costs.

As can be seen above, many advantages can be achieved if DLR predictive models are deployed
throughout the electricity system(s). However, one must also assess the weather parameters before considering
implementing predictive DLR. One must also research similar projects, taking into account whether
predictive DLR methods have been implemented in locations similar to the desired one.

1.4 PURPOSE OF THE MASTER THESIS


The purpose of the thesis is to build predictive models for DLR in order to assess the potential of applying dynamic ratings on the overhead line during periods of high wind speed. The time horizons considered for the predictive models are day-ahead and two-day-ahead. The reason for these horizons is to provide Vattenfall with information on the power lines' flows so that it can operate and dispatch its assets in an optimal manner.
The thesis project's scope aligns well with the development of the Naum pilot project. The measured and calculated data collected and transmitted to the Vattenfall datacenter provide the material for the later data analysis steps. Additionally, this thesis aims to show a large increase in the power grid transmission capacity using DLR thanks to the beneficial weather parameters. The increased grid utilization would relieve the congestion problem and make possible an increase of installed wind power generation units at the site where the sensor is located. A higher share of renewables (in this case, wind) will affect the merit order of the power market, decrease the fossil fuels' share in the energy mix, and decrease both the emissions and the electricity price (the latter given that renewables are subsidized).
Vattenfall would like to leverage Sweden's wind potential by installing more wind turbines in appropriate locations, but this is not possible with the present static ratings, which do not take into account the wind-cooling effect. Specifically, the thesis project will design a solution to predict the DLR (given the variables monitored by the line sensor and weather station) on the line where the power sensor is installed.
This solution will be based on an analytic predictive model that leverages several sources of data, such as meteorological forecasts and data from sensors installed on the line connected to a wind farm. This data will enable daily monitoring and will feed the model the parameters of interest (ambient & conductor temperatures, wind speed & direction, solar intensity, line rating); eventually, the simulations will assess whether system operation is reliable. The final analytic model should be robust and simple enough to be deployable in an operational environment.
Considerable emphasis is placed on day-ahead and two-day-ahead prediction of DLR. The prediction must be operationally reliable, with a maximum error rate of 20%, the limit required by Vattenfall. The error rate is measured by the mean absolute percentage error (MAPE). The day-ahead DLR predictions would optimize the power market flows by allowing a larger line capacity, which would make it possible to transport more wind power during high wind speed periods, and a higher RES share in the energy mix (which directly decreases the consumer electricity price, given the subsidized RES). The two-day-ahead DLR predictions would optimize the power flows on a larger time scale as well as allow smoother operation of the power flows and maintenance scheduling.
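The MAPE criterion can be made concrete with a short sketch. The function name `mape` and the sample ratings below are illustrative assumptions, not values from the project data; only the 20% limit comes from the requirement stated above.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Observations whose actual value is zero are skipped, because the
    percentage error is undefined for them.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted) if a != 0]
    if not errors:
        raise ValueError("no non-zero actual values to evaluate")
    return 100.0 * sum(errors) / len(errors)


# Hypothetical measured and predicted line ratings in amperes.
measured = [1500.0, 1800.0, 2100.0]
forecast = [1400.0, 1900.0, 2000.0]
error_rate = mape(measured, forecast)
meets_requirement = error_rate <= 20.0  # 20% limit required by Vattenfall
```

Because the error is relative to the measured rating, the same absolute deviation in amperes weighs more heavily when the rating itself is low.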
1.5 GOALS AND OBJECTIVES OF THE MASTER THESIS
The solution will have two components (or modules).
• Predictive: Connecting forecast data (temperature, wind direction & speed) from sources such as the Swedish Meteorological and Hydrological Institute (SMHI) with historical seasonal line rating data, solar intensity and wind power output forecasts to model and predict the future line performance (to make sure the line will operate in the admissible range and to notify when it risks a failure).
• Benchmarking and evaluation: Different models/algorithms will be compared on several criteria, such as the data needed (expensive/inexpensive), the model's speed of execution, the reliability and accuracy of predictions, and ease of understanding.

1.6 DELIMITATION OF STUDY
In order to achieve plausible and legitimate results, the thesis project was delimited to specific aspects such as:
• Geographical boundaries
Since the available data was collected from only one location, the study is undertaken in a strictly defined geographical location in the south-west of Sweden.
• Expected outcome of the project
The thesis is delimited to increasing the line's capacity without jeopardizing the system's reliability. Therefore, no economic analysis of implementing DLR on a system level is performed. Also, no power flow or grid design calculations/simulations were performed.

2 LITERATURE RESEARCH
2.1 OVERVIEW
After a thorough literature search, a number of scientific papers were selected that would help identify the key contributors to DLR and the methods of calculating and/or determining it. Most of the articles do not have a large number of citations, so I filtered them by applying several criteria: year of publication, number of citations, and the article's content and relevance to my problem. The earliest treatment of DLR was found in the early 1980s, in reports on the Dynamic Thermal Rating of lines. These used algorithms that are now largely obsolete. Some of the papers were read but, as a subjective choice, not taken into account since they would not have helped me achieve my goals. On the other hand, the newer articles are valuable given the novelty of the techniques and models used. Articles that covered topics other than DLR operational reliability (such as design, installation and field experience, protection schemes, or the economic aspects of the power grid when using DLR) were not researched thoroughly. The last criterion was to focus on papers which deal with transmission and distribution systems.
2.2 THE PAPERS RESEARCHED
A paper from 2006 proposes a particular DLR algorithm together with a Monte Carlo weather simulation model. The authors use Monte Carlo simulation to model the wind direction and speed, ambient temperature, solar radiation and the DLR data. These results are used to solve the heat balance equation, which in turn determines the conductor temperature. So, in fact, the algorithm predicts the conductor temperature by Monte Carlo simulation. The resulting DLR values are benchmarked against the IEEE standard ones as well as against a direct DLR calculation, and the values are quite similar on an hourly basis: the average percentage error is below 10-15%. [15]
The next paper discusses the implementation of an Artificial Neural Network (ANN) model: a dedicated sliding-window online learning algorithm with an echo state network. This algorithm is compared against the IEEE DLR standard and performs accurately: the prediction error is almost zero on the test dataset. The parameters analyzed in the algorithm are conductor current and heat capacity, ambient temperature, wind speed and solar radiation. [16]
A paper from 2012 compares three types of dynamic thermal line rating predictive algorithms: CIGRE, Partial Least Squares (PLS) regression and ANN. The required parameters are wind data, solar irradiation and ambient temperature. The paper compares the models against several criteria: automatic or manual model development, time and coefficient calibration, precision, user-friendliness and others. The results show a smaller error for the ANN model, but the prediction error of all models increases with the prediction horizon, i.e. the smallest error is registered for the t+1 time step compared to t+2, t+3, etc. [17]
A Belgian study reports an algorithm which calculates the DLR for a 150 kV overhead line taking into account the vibration frequency of the conductor. The paper also compares predictive models that consider only weather data with models that consider weather data together with the mechanical characteristics of the line (sag and tension). According to the article, the mechanical parameters penalize the DLR calculated using only the weather data, which is quite an interesting finding. The model achieves current values almost five times as large as the static line rating. The algorithm can forecast up to 60 hours ahead and has a confidence interval of 98%. [18]
Another paper compares the PLS and CIGRE models on Northern Ireland's power system, specifically a 110 kV/650 A line. The model analyzes the prediction errors at five special points. The accuracy of the PLS model is far greater than that of the CIGRE standard: on average, the CIGRE error is 6.4 larger than the PLS one. Additionally, on an hourly basis, the PLS ampacity profile was found to be higher than CIGRE's in every time slot. [19]
The next paper takes as reference the German 220 kV and 380 kV power system. It reports the contribution of the weather data to increasing the DLR profile. The predictive model is built using the CIGRE standard, which is based on the average ambient temperature, calculated iteratively. The algorithm is complex and meticulous and also considers power grid topology, optimal economic dispatch, environmental condition forecasts and others. The results of the paper are notable: little to no renewable energy is curtailed, DLR shows a huge potential over static line rating (SLR) (as in the other papers), and power is distributed and transmitted optimally between six zones in Germany. [13]
The last paper researched dealt with N-1 secure operation given that DLR is implemented. In particular, the DLR calculations relied only on a probability function of forecasted weather data which considered a time and a spatial coordinate system. The results were then classified into different scenarios so that any user could draw tangible conclusions. The authors modelled the thermal behaviour of the line using the IEEE standard and assessed the importance of each weather parameter for the conductor temperature (as in other papers above). The paper analyzed different aspects of DLR implementation, such as the potential increase in grid utilization and the costs of operating with N-1 components within a Monte Carlo simulation. An interesting finding regarding the predictions is that the prediction error increases with the number of time steps ahead for which the DLR values are calculated (just as in [17]): the prediction error for t+3 will be larger than for t+1. Another interesting fact is that the scenarios with the most parameters can achieve the largest DLR values. [13] On the other hand, the interpretability and user-friendliness of these methods, which can play a huge role in their eventual implementation, are not thoroughly discussed in this paper.
The paper [13] also elaborated on the economic benefits of implementing DLR and on the increased grid utilization factor. Thanks to the additional capacity, the operational costs are reduced, but security must also be enhanced, which can be achieved if a compromise is found between cost and risk. The SLR scenario has the lowest failure risk but the highest operational cost, whereas the DLR case has a higher frequency of uncertain events and a much smaller operational cost. Consequently, if one wants to increase the transmission network utilization without increasing the operational risk, one should be aware of the online weather parameters. [13]
Papers were also found and researched that reported the influence of mechanical properties (tension and sag) on DLR and their correlations with it, but since not enough data related to sag and tension was collected from Vattenfall AB, the findings of these articles are not included in the current report.
Table 1 below summarizes the researched papers.
| Ref. | Purpose of study | Variables | Predictive models | Time-ahead prediction | Error rate |
|---|---|---|---|---|---|
| [15] | Prediction of Dynamic Line Rating Based on Assessment Risk by Time Series Weather Model | Conductor temperature, wind direction, solar radiation, ambient temperature, wind speed | Own method with time-series and Monte Carlo simulation of weather parameters | +1 hour | 0.09% - 6% |
| [16] | Real-Time Dynamic Thermal Rating Evaluation of Overhead Power Lines based on Online Adaptation of Echo State Networks | Conductor current, ambient temperature, wind speed, air heat capacity, solar radiation | Artificial Neural Network (ANN). Sliding window (SW) online learning algorithm. Echo state network (ESN) | Real-time (RT) | Normalized mean squared error (NMSE): 0 – 0.8 |
| [17] | Modelling and Prediction Techniques for Dynamic Overhead Line Rating | Wind speed, conductor temperature, wind direction, solar radiation, current | ANN, Partial Least Squares (PLS) | Not known exactly. Known: t+1, t+2, t+3 | Mean Squared Error (MSE): 0.8 – 2.5 |
| [18] | Dynamic line rating and ampacity forecasting - the keys to optimize power line assets with the integration of RES | Wind speed and other weather data, line sag | Own algorithm | Real-time, +4 hours, +48 hours and maximum +60 hours | Relative error: 0% - 70% |
| [19] | Experimentally validated partial least squares model | Ambient temperature, wind speed, wind direction, solar radiation, line current and conductor temperature | PLS | 5 minutes and RT | MSE: 0.6 - 14 |
| [13] | Impacts Of Dynamic Line Rating On Power Dispatch Performance And Grid Integration Of Renewable Energy Sources | Conductor temperature, ambient temperature, solar radiation, wind angle | Own algorithm of economic optimal power dispatch | A lab simulation. Time resolution: 15 minutes | Average percentage error: 0.014% |

Table 1. Summary of the researched papers


After a thorough literature research, it was concluded that several weather parameters are of great importance when calculating DLR: Wind Speed, Ambient Temperature, Solar Radiation and Wind Direction. The Conductor Temperature is influenced by them as well as by the line's Current Intensity. Low ambient temperature and solar radiation, high wind speed and wind perpendicular to the line cool the conductor, providing greater potential for DLR. On the other hand, high ambient temperature and solar radiation together with low wind speed increase the conductor temperature and decrease the DLR.

3 METHOD AND MATERIAL
3.1 DATA VISUALIZATION
There are several approaches used to visualize and explore the data. A boxplot is a useful graphical representation of a dataset. It is built from quartiles, i.e. three points that divide the dataset into four equal-sized sub-groups. The first quartile is the median of the lower half of the dataset (the values between the minimum and the median). The second quartile is the median of the dataset, and the third quartile is the median of the upper half (the values between the median and the maximum). The lines extending outside the box are called whiskers. They show how the data spread above the upper quartile and below the lower one, and thus additionally indicate the range of the dataset. An example of a boxplot is shown in Figure 1.
The points above the upper whisker and below the lower whisker are called outliers. An outlier is a measurement, calculation or observation that lies far away from the rest of the observations. Its presence can have different causes, such as the volatile character of the measured quantity or computational or measurement error(s). Outliers can occur by chance in any dataset, but they often point to an underlying or hidden experimental/computational error. Another reason outliers can appear is that the dataset may have a heavy-tailed distribution (a probability distribution whose tails are not exponentially bounded). Both causes can occur in a data analysis, and adequately robust models must be chosen in order to account for the outliers' influence.
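The quartiles and outliers described above can be computed with the Python standard library. This is a minimal sketch: the 1.5 x IQR whisker rule is a common plotting convention assumed here (the thesis does not state which rule its plots use), and the sample wind speeds are hypothetical apart from the 171 m/s spike, which mirrors the measurement error reported later for the Vattenfall data.

```python
import statistics


def boxplot_summary(values, whisker=1.5):
    """Return quartiles, whisker fences and outliers for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range: the height of the box
    lower_fence = q1 - whisker * iqr
    upper_fence = q3 + whisker * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
    }


# Hypothetical minute-based wind speeds (m/s) with one erroneous spike.
wind_speed = [3.1, 3.2, 3.0, 2.9, 3.3, 171.0, 3.1]
summary = boxplot_summary(wind_speed)
```

With these inputs the spike is the only value outside the fences, so it is the only point that would be drawn as an outlier.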

Figure 1. An example of vertical boxplot (here the variation of NormalDLR in November)

Scatterplots are used to visualize the dependency of one variable on another in Cartesian coordinates. A scatterplot consists mainly of two axes, the x-axis and the y-axis, although scatterplots with two y-axes and one x-axis can be found as well. The scatterplots in the current project mainly illustrate the dependency of NormalDLR on the rest of the variables, given its importance to the project. NormalDLR is the name used by the line sensor installation for the DLR calculated using the IEEE standard [12].
3.2 EXPLORATORY DATA ANALYSIS
Exploratory Data Analysis (EDA) is a method of looking at data in both graphical and non-graphical ways. These methods are undertaken in order to:
o identify important parameters/variables
o discover known or hidden patterns
o get a general view of the dataset
o recognize deviations and inconsistencies
o evaluate fundamental hypotheses
o build highly interpretable models
o identify the variables that explain the model the most
o set bounds on the dataset for its later comprehensive analysis

Thus, EDA is performed with the aim of acquiring an overview of the datasets' main features. EDA may or may not employ statistical analysis, but first of all it is performed in order to gain insight into the data that the usual models and/or hypotheses may not reveal by themselves.
EDA is a very useful approach for an initial analysis simply because it is much easier to grasp data by looking at plots and statistical graphics than by going through hundreds of thousands of observations of more than ten variables (as in the case of this project). Usually, people find it laborious and tiresome to look at endless tables and spreadsheets. Instead, EDA handles this large amount of tedious work rather quickly and shows the data in an interpretable and understandable way by focusing on important findings and partially hiding the less important results. Additionally, EDA sets the ground rules for how the data analysis should be undertaken.
In the current thesis project, EDA was done using boxplots, correlation tables and graphics as well as scatterplots of the most important variables on the minute-based datasets. Boxplots were built along the time series, and the scatterplots depict the evolution of NormalDLR and Current versus weather data and conductor temperature.
EDA is first performed on the initial dataset, without the updated dataset that includes the DLR calculated according to CIGRE. The reason is to keep the analyses of the DLR calculated with different standards separate. The updated dataset covers calculations by both standards and would give different results from the initial dataset, which contains only IEEE standard calculations. The analysis of the updated dataset (with the CIGRE standard) is found in the last section of the current paragraph.
3.3 DLR CALCULATION ACCORDING TO IEEE STANDARD
Vattenfall AB uses the IEEE standard to calculate NormalDLR. The standard's formulas are given below. The general formula is a thermal balance expressed via heating and cooling fluxes:
● Heating from Joule losses from the current flowing through the conductor, Imax²·R(Tc)
● Heating from solar radiation, qs
● Cooling by convection, qc
● Radiative cooling, qr

$$I_{max} = \sqrt{\frac{q_c + q_r - q_s}{R(T_c)}}$$
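As a minimal sketch of the heat balance above, the function below evaluates the rating from the three heat fluxes and the conductor resistance. The function name and the numeric inputs are hypothetical; the heat terms and the resistance are assumed to be expressed per unit length of conductor in consistent units.

```python
import math


def max_current(q_c, q_r, q_s, resistance):
    """Steady-state rating from the thermal balance I^2 * R + q_s = q_c + q_r.

    q_c, q_r, q_s: convective cooling, radiative cooling and solar heating.
    resistance: conductor resistance at the conductor temperature.
    """
    if resistance <= 0:
        raise ValueError("resistance must be positive")
    net_cooling = q_c + q_r - q_s
    if net_cooling <= 0:
        # Solar heating alone already exceeds the cooling: no current allowed.
        return 0.0
    return math.sqrt(net_cooling / resistance)


# Hypothetical values, for illustration only.
rating = max_current(q_c=80.0, q_r=20.0, q_s=10.0, resistance=1e-4)
```

The guard for non-positive net cooling is a design choice of this sketch: it returns a zero rating instead of failing on the square root of a negative number.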
The standard then gives a more elaborate formula which makes explicit the parameters hidden in the formula above. Imax is the current rating, and the DLR is the larger of the two values inside the brace.

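$$DLR = \max \begin{cases} \dfrac{\sqrt{\left[1.01 + 0.0371 \left(\dfrac{D \, \rho_f \, v}{\mu_f}\right)^{0.52}\right] k_f \, K_{angle} \, \Delta T}}{\sqrt{R(T_c)}} \\[3ex] \dfrac{\sqrt{\left[0.0119 \left(\dfrac{D \, \rho_f \, v}{\mu_f}\right)^{0.6}\right] k_f \, K_{angle} \, \Delta T}}{\sqrt{R(T_c)}} \end{cases}$$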

The top equation in the brace applies at low wind speeds, whereas the bottom one applies at high wind speeds. In general, however, the higher of the two DLR values is used at any wind speed.
The parameters of the formulas are:
qc, qr, qs – convective cooling, radiative cooling and solar heating per unit length of conductor (W/m)
R(Tc) – conductor resistance per unit length at conductor temperature Tc (Ω/m)
ΔT = Tc – Ta; difference between conductor and air temperatures (°C)
D – conductor diameter (m)
ρf – air density at Tf, where Tf = 0.5*(Tc + Ta) (kg/m³)
v – wind speed (m/s)
μf – air dynamic viscosity at Tf from thermodynamic tables (kg/(m*s))
kf – air thermal conductivity at Tf from thermodynamic tables (W/(m*K))
Kangle – wind direction factor, which depends on the angle between the conductor axis and the incoming air
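The two-branch expression can be sketched as follows. The code mirrors the formula as printed in this section, which shows only the convective term under the root; the coefficients are copied from it, and their validity depends on the unit system of the edition of the standard being used. The function names and all numeric inputs are hypothetical.

```python
import math


def convective_terms(D, rho_f, v, mu_f, k_f, K_angle, dT):
    """Return the (low-wind, high-wind) convective terms of the printed formula."""
    flow_ratio = D * rho_f * v / mu_f  # the group D * rho_f * v / mu_f
    low_wind = (1.01 + 0.0371 * flow_ratio ** 0.52) * k_f * K_angle * dT
    high_wind = 0.0119 * flow_ratio ** 0.6 * k_f * K_angle * dT
    return low_wind, high_wind


def dlr_from_convection(D, rho_f, v, mu_f, k_f, K_angle, dT, R):
    """Rating from the larger of the two convective terms, as in the brace."""
    if dT <= 0 or R <= 0:
        raise ValueError("dT and R must be positive")
    low_wind, high_wind = convective_terms(D, rho_f, v, mu_f, k_f, K_angle, dT)
    return math.sqrt(max(low_wind, high_wind) / R)


# Hypothetical inputs, for illustration only.
inputs = dict(D=0.03, rho_f=1.2, mu_f=1.8e-5, k_f=0.026, K_angle=1.0, dT=40.0, R=1e-4)
rating_calm = dlr_from_convection(v=1.0, **inputs)
rating_windy = dlr_from_convection(v=5.0, **inputs)
```

Taking the maximum before the square root is equivalent to taking the maximum of the two rooted branches, because the square root is monotonic.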

The air density has a non-linear dependence on the air temperature, and it also depends on air pressure and air humidity.

$$\rho = \frac{\dfrac{p}{T \cdot R_a} \cdot (1 + x)}{1 + \dfrac{x \cdot R_w}{R_a}}$$
where
ρ is the air density (kg/m³)
T = absolute air temperature (K)
Ra = 286.9 - the individual gas constant of air (J/(kg K))
Rw = 461.5 - the individual gas constant of water vapor (J/(kg K))
x = specific humidity or humidity ratio (kg/kg)
p = pressure in the humid air (Pa)
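The density formula translates directly into code. In this sketch the function name and the example inputs (standard sea-level pressure and 20 °C) are assumptions for illustration; the gas constants are the ones listed above.

```python
R_AIR = 286.9    # individual gas constant of air, J/(kg K)
R_WATER = 461.5  # individual gas constant of water vapor, J/(kg K)


def humid_air_density(pressure_pa, temperature_k, humidity_ratio):
    """Density of humid air (kg/m^3) from pressure (Pa), absolute
    temperature (K) and humidity ratio (kg of water per kg of dry air)."""
    dry_term = pressure_pa / (R_AIR * temperature_k)
    correction = (1.0 + humidity_ratio) / (1.0 + humidity_ratio * R_WATER / R_AIR)
    return dry_term * correction


# Illustrative inputs: 101325 Pa and 293.15 K (20 degrees Celsius).
rho_dry = humid_air_density(101325.0, 293.15, 0.0)
rho_humid = humid_air_density(101325.0, 293.15, 0.01)
```

Since Rw is larger than Ra, the correction factor is below one for any positive humidity ratio, so humid air comes out slightly less dense than dry air at the same pressure and temperature.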
On the other hand, air thermal conductivity and air dynamic viscosity depend approximately linearly on the air temperature, so a linear model can be fitted, which I did when calculating the DLR for the summer period. Kangle is a trigonometric function of the angle between the conductor axis and the wind direction.
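For reference, IEEE Std 738 writes the wind direction factor as K_angle = 1.194 − cos(φ) + 0.194·cos(2φ) + 0.368·sin(2φ), where φ is the angle between the wind direction and the conductor axis. The sketch below assumes that the sensor installation uses this same expression, which is not stated in the text; the function name is hypothetical.

```python
import math


def wind_direction_factor(phi_degrees):
    """Wind direction factor for an angle phi (degrees) between the wind
    direction and the conductor axis, using the IEEE Std 738 expression."""
    phi = math.radians(phi_degrees)
    return (1.194 - math.cos(phi)
            + 0.194 * math.cos(2.0 * phi)
            + 0.368 * math.sin(2.0 * phi))


k_perpendicular = wind_direction_factor(90.0)  # wind across the line
k_parallel = wind_direction_factor(0.0)        # wind along the line
```

The factor equals 1 for perpendicular wind and drops to 0.388 for wind parallel to the conductor, which is why perpendicular wind cools the line most effectively.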
3.4 DLR CALCULATION FOR SUMMER PERIOD
The power sensor acquired by Vattenfall carried out all the measurements and DLR calculations; Vattenfall therefore did not calculate the DLR itself. A DLR calculation for the summer period was desired, since it would have made it possible to track patterns and variations across seasons and months. Efforts were made to calculate the DLR for the summer period according to the IEEE standard and to validate my results against the calculations available from Vattenfall (for fall, winter and the beginning of spring), but the results obtained were inconsistent with the Vattenfall ones. The reason lies in the complex DLR calculation process, which takes into account a great number of variables in diverse forms. Furthermore, the IEEE standard was developed over a period of time by an engineering team proficient in the technical problems of line thermal balance, whereas the current project's scope was to employ machine learning techniques in order to forecast DLR. The computational complexity ranges from the air parameters to the solar radiation heating, which accounts for:
• Number of the day
• Local hour angle of the object
• Sidereal time
• Ascension time
• Solar azimuth angle
• Declination
• Solar hour, etc.

In conclusion, as advised by the supervisors, the calculation of the DLR for the summer period according to the IEEE standard was abandoned and the focus was set on the next steps of the analysis.
3.5 DATA COLLECTED FROM VATTENFALL
Vattenfall's Naum DLR set-up consists of a power line sensor and a weather station installed on a 140 kV line, which collect the following data: wind speed (m/s), wind direction (°), solar irradiation (W/m²), air temperature (°C), current intensity (A), conductor temperature (°C), line sag and conductor tension. The conductor's parameters are: 140 kV, 910 FeAl (Orre), construction temperature 50°C, standard rating 1297 A (at 50°C conductor temperature, 10°C ambient temperature and 0.6 m/s wind).
The large dataset collected in Excel format from Vattenfall AB contained around 400,000 observations of more than 10 variables. The data had a fine, minute-based granularity and covered the period July, 1st 2014 – March, 31st 2015. The measured variables are: Date, Time, Solar Radiation, Wind Direction, Ambient Temperature, Wind Speed, Conductor Temperature and Current Intensity. The calculated variables are CTM DLR and Normal DLR. CTM DLR is produced by a specific DLR calculation algorithm, whereas Normal DLR is calculated by Vattenfall AB using the IEEE-738 standard. The rest of the dataset covers Tension, Sag, Inclinometer and Low Point, but Vattenfall reported persistent errors in these measurements, and therefore they are not modelled further. However, the DLR calculations (Normal and CTM) are available only from September, 15th 2014. Thus, the complete dataset consists of 6.5 months of minute-based observations.
An additional dataset (air pressure and humidity) was collected at a later stage in order to calculate the DLR according to the IEEE standard, which is discussed in a later section.
Fortunately, the Naum location has a rather large DLR potential thanks to its "DLR-friendly" weather conditions:
● Average Solar Radiation of 76.25 W/m²
● Average Ambient Temperature of 7.979°C
● Average Wind Speed of 1.984 m/s
● Average Wind Direction of 159.423°

Naturally, the weather conditions in the north of Sweden would be more DLR-friendly with respect to air temperature, but the problem must also account for other aspects, such as the wind power potential in different Swedish regions as well as the electricity prices and demand in those regions. The same reasoning applies to the coastal regions of Sweden, given their higher offshore wind power potential.

3.6 DATA COLLECTED FROM SMHI


The historical forecast weather data was collected from SMHI (http://www.smhi.se/). This data consisted of wind data (from the Måseskär weather station), global radiation (from the Nordkoster Sol weather station) and air temperature (from the Rörastrand weather station). The wind data was given as two vector components expressing the wind speed in two directions. The global radiation data was a cumulative metric, whereas the air temperature was given in Kelvin. The wind and air temperature data was forecast every six hours (0, 6, 12, 18) for six separate time horizons (+24, +30, +36, +42, +48). The global radiation data was forecast every six hours for seven separate time horizons (+18, +24, +30, +36, +42, +48). The forecast data covers the period January, 1st 2014 – March, 31st 2015.
SMHI also provided instructions on how to handle and transform the data. After the appropriate transformations had been performed, plausible wind speed values were obtained, as opposed to the wind direction values, which appeared to be erroneous. The global radiation also showed erroneous values after the transformations; this was communicated to SMHI, who acknowledged a mistake in the unit of measure. However, changing the unit of measure did not solve the problem of erroneous global radiation values. The air temperature was the easiest to transform since it was given in Kelvin. The +24H and +48H predictive models will be built using the forecast wind speed and air temperature values collected from SMHI. Wind speed has the largest impact on DLR, whereas air temperature influences the DLR to a smaller extent. The wind speed and air temperature weather stations are located 35.2 km and 22.91 km, respectively, from the line sensor and weather station.
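The transformations mentioned above can be sketched as follows. This is not SMHI's documented procedure: the sketch assumes the two wind components are the eastward (u) and northward (v) components in m/s and uses the common meteorological convention of reporting the direction the wind blows from. The function names are hypothetical.

```python
import math


def wind_speed(u, v):
    """Wind speed (m/s) from eastward (u) and northward (v) components."""
    return math.hypot(u, v)


def wind_direction_from(u, v):
    """Direction the wind blows from, in degrees clockwise from north.
    Assumes u points east and v points north."""
    return math.degrees(math.atan2(-u, -v)) % 360.0


def kelvin_to_celsius(temperature_k):
    """Convert an air temperature from Kelvin to degrees Celsius."""
    return temperature_k - 273.15


speed = wind_speed(3.0, 4.0)                 # components of 3 and 4 m/s
direction = wind_direction_from(-5.0, 0.0)   # air moving westward
temperature_c = kelvin_to_celsius(281.13)
```

If the components in the source files follow a different axis convention (for example a rotated model grid), the direction formula would need adjusting, which is one possible explanation for implausible direction values alongside plausible speeds.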

3.7 DATA CLEANSING, CURATION AND LOGIC
The Excel file received from Vattenfall AB consisted of 274 days of minute-based weather observations. Apart from weather data, it covered 198 days of minute-based NormalDLR values calculated with the IEEE standard, because NormalDLR has been calculated and recorded only since September, 15th 2014.
After collecting the values, data curation was undertaken. The missing values were interpolated linearly and cubically so that not a single observation would be lost. There were also several errors in the measured parameters:
● Wind speed of 171 m/s (July, 9th 2014) at time=15:23, given that it was 3.163 m/s one minute before and 3.148 m/s one minute after.
● A negative wind speed observation (January, 21st 2015 at time=5:53), which was excluded.
● Ambient temperature of 61.757°C (August, 3rd 2014) at time=17:54, given that it was 24.793°C one minute before and 24.677°C one minute after.
The erroneous data was interpolated, and the summary of the new dataset can be seen in Table 2. Observations with implausible values were either interpolated or averaged, so a dedicated logic scheme for each parameter was applied to the entire dataset. For Solar Radiation, I set a logic that sets very small and negative values to zero. The reason for assigning zero to these observations is to decrease the calculation time. The same logic was used for Wind Direction, with the additional condition of not exceeding 360°. The historical air temperature profiles in the region where the line sensor and weather station are installed were analyzed, yielding lower and upper boundaries of -45°C and +40°C. Observations outside this range were interpolated linearly. The same logic as for solar radiation was chosen for negative and very small wind speed values. Also, based on an analysis of the wind speed in the region over the past years, a 10 m/s threshold was chosen as the upper boundary for wind speed.
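The cleaning logic described above can be sketched in plain Python. The helper names and the `eps` cut-off for "very small" values are hypothetical, and treating wind speeds above the 10 m/s boundary by interpolation (rather than capping) is an assumption; the bounds and the 171 m/s example come from the text.

```python
def floor_small_values(values, eps=0.0):
    """Set negative values, and values not larger than eps, to zero."""
    return [0.0 if v <= eps else v for v in values]


def interpolate_out_of_range(values, lower, upper):
    """Replace missing (None) or out-of-range entries by linear interpolation
    between the nearest valid neighbours; edges copy the nearest valid value."""
    valid = [i for i, v in enumerate(values)
             if v is not None and lower <= v <= upper]
    if not valid:
        raise ValueError("no valid observations to interpolate from")
    valid_set = set(valid)
    cleaned = list(values)
    for i in range(len(values)):
        if i in valid_set:
            continue
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None:
            cleaned[i] = values[right]
        elif right is None:
            cleaned[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            cleaned[i] = values[left] + weight * (values[right] - values[left])
    return cleaned


# The 171 m/s spike from July, 9th 2014 with its two neighbouring minutes.
wind = interpolate_out_of_range([3.163, 171.0, 3.148], lower=0.0, upper=10.0)
# Hypothetical night-time solar radiation readings with sensor noise.
solar = floor_small_values([-0.4, 0.0, 12.5])
```

For a single isolated error between two valid minutes, the interpolated value is simply the average of the neighbours.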
| Parameter | Unit | Range |
|---|---|---|
| Date | | Sep., 1st 2014 – March, 31st 2015 |
| Time | | 0.00 – 23.59 |
| Solar Radiation | W/m² | 0 – 875.56 |
| Wind Direction | ° | 0 – 358.96 |
| Ambient Temperature | °C | -10.50 – 21.30 |
| Wind Speed | m/s | 0 – 9.884 |
| Conductor Temperature | °C | -8.07 – 37.41 |
| Current | A | 0 – 1377.9 |
| Humidity | % | 0 – 95.20 |
| Air Pressure | mbar | 901.4 – 1027.3 |
| NormalDLR | A | 801 – 3515 |

Table 2. Numerical ranges and units of each variable


The weather conditions lead to an Average Conductor Temperature of 12.557°C. The average values were calculated over the 9-month dataset (July, 2014 - March, 2015). Since these are averages, the deviations from the mean can be significant for some parameters, so the DLR potential must be assessed for every day, hour or even minute.
The Conductor Temperature logic must signal, in graded steps, an anomaly, an alarm and a danger. A logic was implemented for 45°C and 48°C, since the normal operating conductor temperature is 50°C. The algorithms for Conductor Temperature must be fast in order to alert the staff immediately so that they can reduce the thermal stress and keep the conductor thermally stable. The chosen thresholds for Conductor Temperature are -20°C, 45°C and 48°C.
The Humidity and Barometric Pressure observations contained many errors. According to them, there was at times a vacuum around the line sensor and weather station, and sometimes negative humidity. Humidity values that were negative or above one hundred were interpolated linearly. The lowest pressure registered on Earth outside tornadoes was 870 mbar, in the Pacific Ocean in 1979. Thus, the lower limit is set to 900 mbar, and all values below it are replaced with the mean value of the respective day. Values larger than 1520 mbar (50% larger than the normal pressure) are also replaced with the mean value of the respective day.
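A minimal sketch of the pressure rule follows. The function name and the record layout (day label, pressure in mbar) are hypothetical, and computing the daily mean from the in-range readings only is an assumption; the text says only that the mean value of the respective day is used.

```python
from collections import defaultdict


def replace_with_daily_mean(records, lower=900.0, upper=1520.0):
    """records: list of (day, pressure_mbar) tuples.

    Out-of-range readings are replaced by the mean of the in-range readings
    of the same day; if a day has no in-range reading, None is stored."""
    valid_by_day = defaultdict(list)
    for day, value in records:
        if lower <= value <= upper:
            valid_by_day[day].append(value)

    cleaned = []
    for day, value in records:
        if lower <= value <= upper:
            cleaned.append((day, value))
        elif valid_by_day[day]:
            daily_mean = sum(valid_by_day[day]) / len(valid_by_day[day])
            cleaned.append((day, daily_mean))
        else:
            cleaned.append((day, None))
    return cleaned


# Hypothetical readings: one "vacuum" value on the first day.
pressure = replace_with_daily_mean([
    ("2014-10-01", 1010.0),
    ("2014-10-01", 0.0),
    ("2014-10-01", 1020.0),
])
```

Excluding the erroneous readings from the mean keeps a single vacuum-like value from dragging the replacement below the plausible range.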
Every logic threshold for each parameter was agreed upon with Vattenfall AB. In addition, a signal should be triggered whenever a parameter exceeds a specific value, to alert the staff to a measuring device malfunction which could jeopardize normal operation.
3.8 PREDICTIVE ANALYSIS
When building predictive models, one has to take several factors into account. Firstly, predictive models are classified into supervised and unsupervised learning models. In supervised learning the user tells the machine/algorithm what the output should be, whereas in unsupervised learning the machine/algorithm "decides"/calculates the output by itself, which can result in several clusters. NormalDLR is a known, continuous target variable and hence a supervised learning algorithm has to be used. Secondly, one has to consider the nature of the predicted output: is the dependency between the output and the input(s) linear or non-linear? In order to build the predictive models for NormalDLR, it is clear that non-linear models have to be considered; therefore, several non-linear models are examined in this section. Potentially the most important criterion when choosing a supervised learning algorithm is the trade-off between accuracy and interpretability of the models. In general, the higher the accuracy of a model, the less interpretable and understandable it is for the general public, and vice versa (Figure 3).
The training dataset runs from September 15, 2014 to March 31, 2015 and the test dataset is the month of April 2015. April was
chosen as the test dataset in order to run the models on “new” values of the variables. “New” here refers to values of the
predictors that the models have not “seen” before in a particular combination; for instance, the month of March has a particular
set of values which differs from the one in February. The “new” set of April values therefore illustrates the performance of the
models on combinations of variable values “unseen” in the training set. Training was achieved by tuning each parameter of each
model in order to reduce the error rate. The models were cross-validated (on ten folds) using the training dataset.
Cross-validation here means training and testing the models on the training dataset and then computing the average error rate.
Every time a model was built, the training dataset was divided into ten folds: nine folds for building the model and one for
testing/validating its results. This intermediate step of testing and validating is intended to give more accurate predictive
models, which means a lower error on the subsequent test dataset.
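The ten-fold procedure described above can be sketched in a few lines. This is an illustrative Python version, not the R code used in the thesis; the helper names, the random assignment of observations to folds and the trivial mean predictor are assumptions made for the example.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and split them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(xs, ys, fit, error, k=10, seed=0):
    """Build the model on k-1 folds, validate on the held-out fold,
    and return the error averaged over all k folds."""
    fold_errors = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        predictions = [model(xs[i]) for i in held_out]
        fold_errors.append(error([ys[i] for i in held_out], predictions))
    return sum(fold_errors) / k


def fit_mean(train_x, train_y):
    """Hypothetical baseline model: always predict the training mean."""
    average = sum(train_y) / len(train_y)
    return lambda x: average


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


xs = list(range(50))
ys = [5.0] * 50
cv_error = cross_validate(xs, ys, fit_mean, mean_absolute_error)
```

In practice `fit` would be replaced by the model under tuning, and the averaged error would be compared across candidate tuning parameters.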
POLYNOMIAL REGRESSION
In statistics, polynomial regression tries to find a relationship between the input variable x and the output variable y
using a polynomial of nth degree, i.e. it fits a non-linear model (a polynomial) through the points of the dataset. In our case,
there is a non-linear relationship between DLR and the weather data (wind speed, solar radiation, ambient temperature and wind
direction), conductor temperature and current. Hence, it is a multiple polynomial regression, because several input variables
describe one output.
A simple formula to describe polynomial regression:
$$y = f(x_1, x_2, \dots, x_n)$$
$$y = a x_1^{m} + b x_2^{n} + c x_3^{p} + \dots$$
Where:
a, b, c, … = coefficients/constants
x_1, x_2, x_3, … = input variables
m, n, p = powers
y = output parameter (parameter to predict)

The coefficients depend on the nature of the problem; sometimes the coefficients are set to “1” and the powers of the input
variables are adjusted to give the best fit. The input variables, in our case, are the weather data (wind speed, solar
radiation, ambient temperature and wind direction), conductor temperature and current. The powers are chosen so that
the overall polynomial describes the data with the lowest error. The parameter to predict is NormalDLR. An
advanced version of the simple polynomial regression model is the multivariate adaptive regression spline (MARS).
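To make the fitting step concrete, the sketch below fits a polynomial by ordinary least squares. It is a simplified illustration with a single input variable and fixed integer powers, whereas the thesis uses several inputs and R libraries; the function names are hypothetical.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares fit of y = c0 + c1*x + ... + cd*x^d.
    Solves the normal equations by Gaussian elimination."""
    n = degree + 1
    # Normal equations A c = b with A[j][k] = sum(x^(j+k)), b[j] = sum(y * x^j)
    a = [[float(sum(x ** (j + k) for x in xs)) for k in range(n)] for j in range(n)]
    b = [float(sum(y * x ** j for x, y in zip(xs, ys))) for j in range(n)]
    # Forward elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution
    coeffs = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(a[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (b[row] - tail) / a[row][row]
    return coeffs


def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** j for j, c in enumerate(coeffs))


# Synthetic data generated from y = 1 + 2x + 3x^2
xs = [0, 1, 2, 3, 4]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]
coeffs = fit_polynomial(xs, ys, degree=2)
```

With several input variables the same idea applies: each power of each input becomes one column of the least-squares problem.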

The polynomial regression models used in this project, under their abbreviated names, are: lm (a linear model which can be
converted into a polynomial, non-linear regression model), gcvEarth, bagEarthGCV and bagEarth [20] (open-source libraries for
multivariate adaptive regression splines built and used in R). The latter three are advanced MARS models with
different tuning parameters. Cubist is another regression-type model, a rule-based algorithm. It has its own
tuning parameters and resembles a decision-tree algorithm for regression problems.
GENERALIZED ADDITIVE MODEL (GAM)
GAM has a similar approach to polynomial regression, but in this case each input variable enters the model through its own function.
An example of a GAM model is:
$$y = a_1 f_1(x_1) + \dots + a_n f_n(x_n)$$

Conductor temperature can play the role of one of the functions f, since it depends on Current and the weather parameters.
Likewise, ambient temperature depends on Solar Radiation and possibly on wind speed, so it can be expressed as a function
of those, and so on.

ARTIFICIAL NEURAL NETWORK (ANN)


ANNs are non-linear statistical tools which can model complex dependencies between input variables (input layer) and
output variables (output layer). Their approach is more abstract than that of the models discussed before. ANNs
are modeled as an interconnected system of “neurons” (gray spheres in the picture below) which compute values
from the input variables; given their adaptive and flexible nature, they can find patterns automatically and implement
other machine learning techniques. The ANN model used in this thesis project is called brnn, which
stands for “Bayesian regularization for feed-forward neural networks”, and its tuning parameter is the number of
neurons.

Figure 2. A typical structure of an ANN [36]

SUPPORT VECTOR MACHINES (SVM)


SVM can be considered a subclass of the multivariate polynomial regression problem. SVM resembles ANN in the
sense that it tries to model the data and to find patterns in it. Although initially developed for classification,
several versions were later released for regression problems as well. The fit of the model
is assessed by an objective function that minimizes the sum of squared errors between the true and predicted values
of the target variable. A more thorough explanation of support vector machines is outside the scope of the current thesis
project; more information can be found online at [21, 22, 23]. The model used in this thesis project is called
“svmLinear”.
Figure 3 compares various predictive models with respect to accuracy and interpretability. Regression models show low accuracy
but high interpretability. A much higher accuracy is attributed to more complex predictive models such as ANN and SVM. GAM models
are located somewhere in the middle: their regression-like structure makes them more interpretable than opaque models like ANN,
while their accuracy is improved relative to plain regression. The number of layers and the self-learning capabilities of ANN
models are the reason ANNs are usually referred to as a “black box”.

Figure 3. Comparison of various predictive models

4 VISUALIZATION OF EDA

4.1 TEMPORAL VARIATION


During the exploratory analysis, the dataset was complemented with temporal variables, so that the final version of the
dataset consisted of weather data and time periods (seasons, months, days, types of days, hours). This splitting should
provide greater insight into the data and help find patterns. Figure 4 illustrates the monthly variations of several parameters:
DLR, Wind Speed (m/s), Solar Radiation (W/m²), Ambient and Conductor Temperatures (°C) and Wind Direction (°).
One can observe the increase of DLR from summer to fall; this is due to the increase of wind speed and the decrease of
solar radiation and ambient temperature, which lower the conductor temperature and allow a larger current to be
transmitted. One would also expect an increased DLR profile from November to December given regular
historical temperature behavior. In contrast, the DLR is larger in November than in December, and this coincides with a
slight increase of wind speed, a decrease in solar radiation and a large decrease of ambient temperature. Some boxplots
do not depict the variation clearly; all the detailed boxplots can be found in the Annex. For example, Figure 5
illustrates in detail the monthly variation of NormalDLR.
The points connected by the broken line represent the mean values of NormalDLR in the respective months. The outliers
are depicted with black dots. This boxplot shows each month’s mean, median, maximum and minimum
observations as well as the fluctuations throughout the period of seven months. The steady increase of NormalDLR
from September 2014 until November 2014 can be due to the decrease of ambient temperature and solar radiation. The
wind speed is almost the same in the three months. From December 2014 to February 2015 the solar radiation
generally increased on average, whereas the ambient temperature fluctuated.
This accounts for the small slopes in NormalDLR in those months. Even though there were beneficial conditions for
NormalDLR to increase (decrease of ambient temperature, solar radiation and conductor temperature), it decreased
from November to December. The mean wind speed had decreased, however, which had a lowering effect on NormalDLR. The
summaries of every month’s observations and analysis can be found in the Annex.

Figure 4. Monthly Boxplots of multiple parameters

Figure 5. Monthly variation of NormalDLR in boxplot

Figure 6. Seasonal variations

Figure 7. Monthly variation of Current in boxplot
Analyzing the above boxplots, one can conclude that there is no pattern in the DLR over the months: DLR should have
increased during winter, but the figure illustrates the opposite; hence, the distribution is random. Thus, there is
no need to split the dataset by the month variable; the actually important variables are the weather ones.
The Current boxplot visualization is also worth noting (see Figure 6). Here we can see the tremendous difference
between the monthly variations of NormalDLR and Current, and the potential of using NormalDLR values. We can also see that
there are fewer outliers in the Current profile than in the NormalDLR one. The reason might be that the latter is
calculated, and some errors might have been introduced during the calculations or when measuring the parameters
required to calculate NormalDLR with the IEEE standard. Recall that NormalDLR is calculated using the
IEEE standard, which requires the average temperature as input and involves a sophisticated computational
algorithm featuring thermodynamic formulas and parameters such as the thermal conductivity and dynamic
viscosity of air at a given temperature.
The above boxplots can be summarized as seasonal variations, from which future patterns can be visualized. Figure 7 shows the
aggregated seasonal fluctuations. Here one can identify a seasonal pattern: due to the seasonal variations of the weather
parameters, the DLR fluctuates accordingly, being large in winter and smaller in fall and spring.
The next boxplots depict the aggregated values and profiles throughout the analyzed period for a specific day, hour and
type of day.


Figure 8. Variations during week days aggregated throughout the analyzed period
Figure 10. The hourly aggregated profiles throughout the analyzed period

Figure 9. The hourly aggregated Current profile throughout the analyzed period

One can conclude that there is no particular pattern in the variation of any plotted parameter throughout the days of the
week (Figure 8). Therefore, there is no need to take the “Days” variable into account. The same conclusion is valid for
the hourly and type-of-day profiles (Figure 9 and Figure 11). Some variations can be difficult to read (because of too many
outliers), so the detailed boxplots can be found in the Annex.

Figure 10 shows that the Current follows the load pattern and is thereby validated. The aggregated Current profile is the result
of combining the current curves from all the months and dividing them into 24 hourly time slots.

Figure 11. Parameters’ variations as a function of type of day
Figure 11 illustrates the parameters’ variations as a function of the type of day. As in the case of splitting the data by
month, this figure does not add relevant information and therefore the “type of day” variable need not be used in the
following modeling.
In every month the NormalDLR is much higher than the measured current. In conclusion, the EDA has
shown that only a few variables have a strong influence on the profile of NormalDLR: the weather parameters
(ambient temperature, wind speed, wind direction, solar radiation), current intensity and conductor temperature. Given the
weak dependency between NormalDLR and the temporal and additional variables (type of the day, day of the week,
hour of the day, month and season), NormalDLR is influenced the least by them, and building the predictive models
on them would not lead to plausible results.
4.2 SCATTERPLOTS
In order to identify which variables influence the DLR the most, the DLR was plotted against the weather
data and the conductor temperature. According to the literature research, the wind speed should have the largest impact on DLR.
Wind direction, ambient temperature and solar radiation have a lower impact.
Charts were built on linear and logarithmic scales to discover hidden patterns; they can be found in the Annex. In the
case of negative temperatures, errors were generated, but this did not hinder the visualization of the plots. The plots
for the entire analyzed period as well as for each month are available in the Annex (Scatterplots).
Below one can see the dependencies and influences of the weather data and conductor temperature on DLR and
current. DLR and measured current are visibly distinguished in order to show once again the
unexploited potential and to map both against the different parameters: DLR is depicted in blue and the
measured current in green. As can be seen in the figures below, the measured current and DLR share several
observations, but it is not certain that they did so at the same time; judging by the boxplots above, we can conclude
that the shared observations occurred at different timestamps but at similar weather data and conductor temperatures.
Additionally, the number of shared observations is not large.
Figure 12 illustrates the evolution of DLR and current with respect to the weather parameters and conductor temperature. As
can be seen, there is an almost exponential dependency between DLR and Wind Speed.

Figure 12. DLR and current versus various parameters
The next plot shows DLR versus the Ambient Temperature. The plot features a large amount of noise, and there is also an
unexpected drop of DLR when the temperature is negative. The DLR peaks around 5 °C and not at lower temperatures
as expected. Above 10 °C the DLR and measured current drop as expected. However, the expected behavior of DLR is an
exponential decrease with increasing temperature.
The plot in the top right illustrates the DLR and measured current versus conductor temperature. As expected, DLR
drops with increasing conductor temperature. The current, however, seems to increase with increasing conductor
temperature, but it is still much below the average DLR value and even further below the maximum DLR value. The
reason for the current increasing with the conductor temperature may be that Vattenfall AB was running individual tests
on uprating the line.
The next plot shows the DLR capacity versus Solar Radiation. As expected, the DLR decreases with increasing solar
radiation. The current also seems to be inversely related to solar radiation. The linear scale is much more
relevant in this case.
The parameter least bound to DLR and measured current is the wind direction (bottom right), although some peaks
can be identified around 60°-70°. The red dots represent the observations where DLR is smaller than the measured
current. These observations can be seen only in this case. The literature study suggests that the maximum DLR potential
is achieved if the wind direction is around 90°. However, no prominent peak can be seen around 90°, but there is a
peak around 270°, which corresponds to 90° plus π radians. A specific control coefficient must be applied to the wind
direction variable at the stage of building predictive models.
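One possible way to treat 90° and 270° as equivalent, as the observation above suggests, is to fold the wind direction into the acute angle between the wind and the conductor. The sketch below is only an illustration of that idea and not the control coefficient used in the thesis; the function name and the `line_axis_deg` parameter (the orientation of the line, taken here as the 0° reference) are assumptions.

```python
def attack_angle(wind_direction_deg, line_axis_deg=0.0):
    """Fold a wind direction (degrees) into the 0-90 degree angle
    between the wind and the conductor axis.
    line_axis_deg is a hypothetical parameter for the line orientation."""
    relative = (wind_direction_deg - line_axis_deg) % 180.0
    return 180.0 - relative if relative > 90.0 else relative


# Perpendicular wind from either side of the line gives the same angle
perpendicular_a = attack_angle(90.0)
perpendicular_b = attack_angle(270.0)
parallel = attack_angle(180.0)
```

Using the folded angle as a predictor, instead of the raw 0°-360° direction, would make the peaks around 90° and 270° appear at the same value of the input variable.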
4.3 CORRELATION ANALYSIS
Correlation is a statistical measure of how weakly or strongly paired variables are related. In
order to identify the correlations between variables, i.e. to discover which variables interact the most and the least
with each other, a correlation study was undertaken. The correlation study seeks to find the relationships between
variables and the extent to which they correlate with each other. However, a correlation is not always a dependency, and
therefore causality must be established: there should be a physical meaning and proof of the effect of variable X on
the evolution of variable Y.


Figure 13. Correlation table of the entire dataset


The exhaustive correlation study was again performed on the entire minute-based dataset and on each month separately.
The results can be seen below.

Figure 14. Correlogram of the entire dataset


The correlogram depicted in Figure 14 is read as follows: red represents a negative correlation and blue
a positive correlation between the paired variables. A positive correlation between two variables
means that they vary in the same direction, and vice versa for a negative correlation. The filling of the
circles stands for the correlation coefficient between the paired variables. The correlogram study
was inspired by [24].
Both the table and the correlogram show the correlation coefficients between pairs of variables. The diagonal contains
values equal to “1”, because a variable is perfectly correlated with itself. The negative numbers in the table,
as well as the red squares and circles in the correlogram, show that the paired variables are inversely
related, i.e. when one increases the other decreases and the other way around. The positive numbers
and the blue circles show a positive correlation. Data analysis textbooks give guidance on the significance of correlation ranges:
pairs with coefficients less than 0.4 (in absolute value) are considered not correlated, pairs with coefficients of 0.4 - 0.6
are somewhat correlated and pairs with coefficients larger than 0.6 are strongly correlated [25]. However, these limits
can vary to some extent given the nature and characteristics of the data analysis problem concerned.
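The coefficient and the thresholds above can be illustrated with a short sketch. It assumes the Pearson correlation coefficient (the thesis does not state which coefficient was used), and the function names and sample values are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


def correlation_strength(r):
    """Classify a coefficient using the 0.4 and 0.6 limits quoted above."""
    magnitude = abs(r)
    if magnitude < 0.4:
        return "not correlated"
    if magnitude <= 0.6:
        return "somewhat correlated"
    return "strongly correlated"


# Made-up sample values, only to show the calls
wind_speed = [1.0, 2.0, 3.0, 4.0]
dlr = [400.0, 520.0, 610.0, 760.0]
r = pearson(wind_speed, dlr)
label = correlation_strength(r)
```

The sign of `r` carries the direction of the relationship (the red or blue color in the correlogram), while the classification uses only its absolute value.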
As can be seen from the correlogram, DLR is highly and positively correlated with wind speed (as expected) and
negatively correlated with conductor temperature and, to a lesser degree, with ambient temperature. There is also a significant
negative correlation between wind direction and DLR. As expected, only a small correlation is identified between DLR and solar
radiation. Solar radiation is positively correlated with ambient temperature, and both are correlated with the
conductor temperature, which makes physical sense.
A monthly correlation study was undertaken in order to identify the correlated pairs in each month. Each month’s
correlation tables and correlograms can be found in the Annex (Correlations). It is interesting to note the month of
September, which had DLR data only from September 15 onwards. The conclusion can be only one: we must use the
dataset which features the most observations in order to explore it more thoroughly and make correct assumptions,
which would lead to a better predictive model.
After the correlation study one can form clusters of parameters to include in the predictive model. Furthermore, it is
possible to form different clusters of variables for different months, e.g. September would be determined by wind speed,
ambient temperature and solar radiation, and December by others or maybe the same ones.

The possible clusters of variables (which are strongly or somewhat correlated with NormalDLR) to be used in different
months are:
● September - wind speed, solar radiation, conductor temperature
● October - wind speed, conductor temperature, wind direction and ambient temperature. Notice that there is no
solar radiation because it decreased quite dramatically on average from September to October.
● November - wind direction, ambient temperature, wind speed, conductor temperature
● December - wind speed, wind direction, conductor temperature. Notice, almost null correlation between DLR
and solar radiation
● January - wind speed, ambient temperature
● February - wind speed, conductor temperature

The choice is somewhat objective with respect to the limits stated above, but various combinations are possible, such
as identifying the variables most correlated with conductor temperature and then building the models upon the correlation of
DLR with conductor temperature. This approach decreases the model’s interpretability but is expected to
increase its accuracy. Therefore, the models will be based on 3-4 variables which predict the Conductor
Temperature, which in turn predicts the NormalDLR.
The initial version of the predictive modeling is illustrated in the flowchart below (Figure 15). This scheme illustrates the
approach to be taken in order to predict the NormalDLR, but it might be changed if the prediction of
NormalDLR turns out to be complex and dependent on multiple factors.

Figure 15. The flowchart used for predicting the NormalDLR

4.4 TIME-RESOLUTION COMPARISON


The entire dataset has a minute-based granularity, but for the sake of decreasing computational time and cost one might
use a different time resolution. In this section I investigated how much information would be lost by using an hourly-based
dataset. The boxplots illustrating minute-based (on the left) and hourly-based NormalDLR observations can be seen
below in Figure 16 (the rest of the time-resolution boxplot comparisons can be found in the Annex (Time resolution
comparison)). The hourly-based observations were analyzed from two different points of view: the boxplot in the middle
represents the aggregated hourly-mean values, whereas the one on the right illustrates the observations
extracted/sampled from each hour.

Figure 16. Hourly NormalDLR profile in different time-resolutions

Analyzing the above boxplots, one can conclude that there is only a slight difference in variation between the different time
resolutions. Hence, the predictive models could be built using hourly-based data, which would decrease the
computational time and cost as stated before, but it would also affect the models’ precision.
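The two hourly views compared in this section, the hourly mean and the value sampled from each hour, can be derived from a minute-based series as sketched below. This is an illustrative Python version; the function names, the choice of the first minute as the sampled value and the synthetic input are assumptions.

```python
def hourly_mean(minute_values):
    """Aggregate a minute-based series into hourly mean values."""
    hours = [minute_values[i:i + 60] for i in range(0, len(minute_values), 60)]
    return [sum(hour) / len(hour) for hour in hours]


def hourly_sample(minute_values, offset=0):
    """Keep one observation per hour (the minute at the given offset)."""
    return minute_values[offset::60]


# Two hours of synthetic minute data: 0, 1, ..., 119
minute_series = list(range(120))
means = hourly_mean(minute_series)
samples = hourly_sample(minute_series)
```

Both views reduce the number of observations by a factor of 60; the mean smooths out short fluctuations, while sampling keeps actual measured values.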

4.5 CONCLUSIONS ON EDA


After performing an exhaustive EDA on the dataset, the following findings were registered:
• Expected temporal variations were found for several variables, such as current, air temperature and
solar radiation. The variable NormalDLR, in contrast, showed a random variation over time.
• The scatterplots revealed that NormalDLR depends strongly on the wind speed and less on the air
temperature and solar radiation (as was also found during the literature research).
• The additional temporal variables did not suggest any dependencies between NormalDLR and type of day,
season, month, day or hour.
• Nevertheless, with the intention of achieving low error rates for the predictive models for NormalDLR, the
models will be built using some of the temporal data. Temporal data will be also used to build the predictive
models for wind speed, current intensity and conductor temperature in order to reduce the error rates.
• The correlation analysis revealed which variables have the most influence on the NormalDLR as well
as which variables correlate strongly and weakly with each other.
• The fluctuations of the minute-based, hour-based and hour-sampled values were found to be similar, so hour-based
values will be used to build the predictive models. This will considerably decrease the computational power and
the building time required by the predictive models.

5 PREDICTIVE MODELING
5.1 PREAMBLE
The purpose of the thesis is to build predictive models for DLR in order to assess the potential of applying dynamic
ratings on the overhead line during high-wind-speed periods. The business and operational impacts are
listed and explained in 1.3. In this chapter, predictive models are built that use historical and
forecasted observations. Specifically, models are built to predict the weather and power-line related parameters. The
process and approach of building the predictive models are shown in descriptive flowcharts, and
detailed explanations are documented in the respective paragraphs. Finally, the performance of the predictive models
is analyzed and visualized so that the best-performing models are chosen at each step
of the flowchart’s process.
5.2 BUILDING AND VALIDATING THE PREDICTIVE MODELS
This section discusses building the predictive models for historical as well as forecasted observations. It
introduces different aspects related to the models and the data, the problems that appeared during the process and how
they were overcome.

After discussing with Vattenfall officials the possibility of predicting the Current Intensity values on a day-ahead
basis, it was concluded that this kind of prediction is done for other lines but not for the one on which the line sensor
and weather station are installed. However, Vattenfall has communicated that the current depends to some extent
on several factors, such as:
• industrial load which is connected to the line (not correlated with ambient temperature)
• residential load profile (partially correlated with ambient temperature)
• leakage flow, which depends mostly on the load pattern in the nearby cities (which is partially weather
correlated and, specifically, dependent on the air temperature)
• wind power production of 500 MW (wind correlated).

Therefore, Current prediction becomes a far more complex problem and falls outside the scope and time frame of the
current thesis project. Moreover, past forecast Current values for other lines are not stored by Vattenfall, so a
correlation study between them and (for example) the line on which the line sensor and weather station are installed
cannot be undertaken. Such a study could have established a dependency between the line with the sensor and other
line(s), which would have made it possible to predict Current intensity values more accurately by taking other
lines’ load current values into account.

Nonetheless, the Current may be predicted using one of the above-mentioned predictive models built on weather and,
possibly, temporal data (type of the day, day of the week, hour of the day, day of the year), since a high
correlation was observed between Current and hour of the day. However, the error rate of these models is expected to be large.
The four groups of models mentioned in section 3.8 each have their own parameters that the user has to set up, tune
and balance in order to achieve the best predictive performance (e.g. mean absolute percentage error, MAPE) and a
better fit and generalization of the model ((normalized) regression error characteristic, (NA)REC). NAREC is a
measure that quantifies how well the predicted values match the observed ones (it ranges from 0.00 to
1.00). MAPE is the average absolute percentage error relative to the measured value. Another indicator for
predictive models is the R² value, which quantifies how well a statistical model fits the given dataset. Out of 30 predictive
models, the top 8 models were chosen for a more in-depth evaluation, with the goal of tuning their parameters
further to achieve optimal performance and usability in terms of accuracy, precision, generalizability, interpretability,
business/technological/operational sense, ease of implementation and speed of execution. Additionally, the top 8
models were chosen in such a way that all three groups of models would be represented.
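For reference, MAPE and R² can be computed as sketched below. The MAPE follows the definition given above; for R² the sketch assumes the common coefficient-of-determination form, 1 minus the ratio of residual to total sum of squares, which may differ from the variant reported by the R tooling used in the thesis. The sample values are made up.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, relative to the measured values."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_residual / ss_total


# Made-up measured and predicted values, only to show the calls
measured = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
error_percent = mape(measured, predicted)
fit_quality = r_squared(measured, predicted)
```

MAPE is undefined when a measured value is zero, so it suits quantities such as NormalDLR that stay well above zero.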
Despite the conclusions reached in the EDA section, the predictive models were also built using temporal data in
order to correlate the past forecast ambient temperature and wind speed parameters with the measured values. The
temporal variables should increase the accuracy of the models by giving them more input data. There is a clear
correlation between time and the ambient temperature and wind speed, as depicted later in this section. The temporal
variables are also used when modeling the Current, Conductor Temperature and NormalDLR, for higher accuracy.

5.3 PREDICTIVE MODELS FOR HISTORICAL OBSERVATIONS
The present paragraph demonstrates the higher accuracy of the predictive models on the initial simulated dataset: one-hour
resolution with the measured weather (solar radiation, wind speed, wind direction, ambient temperature), current and
conductor temperature values. The training dataset runs from September 15, 2014 to March 31, 2015 and the test
dataset is the month of April 2015. The following plots and figures are based solely on the test dataset. The
predictive modeling follows the flowchart depicted in Figure 17. In this case, there are no forecasted weather
parameters, only historical measured observations.

Figure 17. The flowchart of predicting the NormalDLR using the historical observations

As shown in Figure 17, the measured observations of the weather parameters at the line sensor and weather station location
were combined with the time parameters to build the predictive models for the load Current. The time
parameters were included to increase the accuracy of the predictions. With the predicted values of the load Current available,
the predictive models for Conductor Temperature were built using the historical weather observations as well as the
time parameters (to improve the predictions). With the predicted Conductor Temperature values, the
predictive models for NormalDLR were built using the weather and time parameters as well as the already
predicted/calculated load Current and Conductor Temperature. The predictive models built for
NormalDLR therefore combine the predictions of the previously predicted parameters. The models were
built on the training set, whereas their accuracy was assessed on the test dataset (month of
April, 2015).
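The chained structure described above, where each predicted quantity becomes an input of the next model, can be sketched as follows. The three models are passed in as plain functions; the function name, the feature names and the toy models in the example are hypothetical placeholders, not the models trained in the thesis.

```python
def predict_normal_dlr(weather, time_features, current_model,
                       temperature_model, dlr_model):
    """Chain the three predictions: Current, then Conductor Temperature,
    then NormalDLR. Each model is a callable taking a feature dict."""
    features = {**weather, **time_features}
    # Step 1: load Current from weather and time parameters
    features["current"] = current_model(features)
    # Step 2: Conductor Temperature, also using the predicted Current
    features["conductor_temp"] = temperature_model(features)
    # Step 3: NormalDLR from all of the above
    return dlr_model(features)


# Toy stand-in models, only to make the example executable
toy_current = lambda f: 2 * f["wind_speed"]
toy_temperature = lambda f: f["ambient_temp"] + 0.1 * f["current"]
toy_dlr = lambda f: f["current"] + f["conductor_temp"]

result = predict_normal_dlr(
    {"wind_speed": 5, "ambient_temp": 10},
    {"hour": 3},
    toy_current, toy_temperature, toy_dlr,
)
```

A consequence of this design is that errors of the upstream models propagate into the NormalDLR prediction, so the accuracy of each step matters.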
This paragraph shows the importance of each model’s variables. One of the largest variable importances is that
of Wind Direction, which ranges from 50% to 60% across the models; only one model (cubist) features a 95%
variable importance for Wind Direction. However, as shown in Figure 18 and Figure 20, the cubist model is not the most
accurate; it might be biased by its intrinsic algorithm and affected by overfitting. Figure 19 shows the performance of
the predictive models on the test data. The “Formula” model stands for the regression model.

Figure 19. Performance of the Conductor Temperature predictive models on the test dataset

Figure 18. Predicted Conductor Temperature values on the test dataset


The goodness-of-fit indicator (R²) is quite large for all predictive models in the case of
Conductor Temperature (Figure 18). Figure 20 illustrates a wider distribution of R² for the models of NormalDLR. The
quite large R² coefficients are due to the fact that the models were built on the historical observations. One can notice
much larger building times in this case; this is because the models had a large number of observations (they were built
on one-hour time resolution). However, once the models are built, running them does not take much time.
Figure 19 shows how well the various predictive models match the real values of the Conductor Temperature in the
test dataset. One can notice a rather good fit for the polynomial regression models (bagEarthGCV,
svmLinear) and a less accurate one for the ANN (brnn). This is also confirmed by the R² values from Figure 18.
The NormalDLR predictive models have a rather large goodness of fit (R2) and low MAPE values (Figure 20). The
predictive models for NormalDLR feature a wide range of variable importance, depending on the chosen model:
• Solar Radiation: 0% - 20%
• Wind Direction: 12% - 95%
• Wind Speed: 14% - 100%
o Note: All models registered a 100% variable importance for Wind Speed except the linear model (Formula),
which registered 14%. This might be the cause of its poor accuracy depicted in Figure 3.
• Ambient temperature: 0% - 62%
• Predicted Current: 0% - 2%
• Predicted Conductor Temperature: 0% - 39%
The variable importance coefficients were obtained by invoking one of R’s functions, and they represent how
many times a variable was used during the models’ iterations.
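As an illustration of how such percentages can arise, the sketch below rescales raw usage counts so that the most-used variable reads 100%. That the reported coefficients are scaled relative to the top variable is an assumption (the text only states that an R function was invoked); the function name and the counts are hypothetical.

```python
def scale_importance(raw):
    """Rescale raw importance scores so that the top variable is 100%."""
    top = max(raw.values())
    return {name: 100.0 * value / top for name, value in raw.items()}


# Made-up usage counts, not values from the thesis
usage_counts = {"WindSpeed": 40, "AmbientTemp": 25, "SolarRadiation": 8}
scaled = scale_importance(usage_counts)
# scaled["WindSpeed"] is 100.0; the others are expressed relative to it
```

Under this convention a value of 100% does not mean that a variable explains everything, only that no other variable was used more often in that model.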
Therefore, one can conclude that Wind Speed has the largest impact on the NormalDLR, which was also found
during the literature research. One can also see that the predictions of several models
overlap each other closely because of their rather similar MAPE and R2 values (Figure 20).

Figure 21. Performance of the NormalDLR predictive models on the test dataset

Figure 20. Predicted NormalDLR values on the test dataset


Figure 20 shows the error rate profiles (MAPE) of the predictive models on the test dataset in percent, so that the
error distribution of the models can be assessed in detail. This is consistent with Figure 21, which shows a smaller MAPE
for the Polynomial Regression model (bagEarthGCV).
Figure 21 depicts how closely several predictive models match the test dataset. The predictions of several models
overlap to a large extent. Therefore, the models built during the training
session achieved a high performance, which is also confirmed by Figure 20.
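The two indicators used for benchmarking can be computed as in the minimal sketch below. The R2 definition used here (one minus the ratio of residual to total sum of squares) is an assumption; some R tooling reports the squared correlation instead, which can give different values. The example numbers are made up.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent, relative to the actual values."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


actual = [100.0, 200.0, 300.0, 400.0]      # made-up NormalDLR values
predicted = [110.0, 190.0, 300.0, 380.0]   # made-up predictions
error_rate = mape(actual, predicted)       # about 5 (percent)
fit = r_squared(actual, predicted)         # about 0.988
```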
Another way of visualizing Figure 21 is to plot the real values of NormalDLR against the predicted values. Figure 23
depicts the predicted versus the calculated values of NormalDLR for the best-performing models on the test dataset.

Figure 23. Error rate profiles on the test dataset

Figure 22. Real observations of NormalDLR versus the predicted ones


Figure 22 is a graphical visualization of the table from Figure 20 in terms of MAPE. In this case, however, MAPE is
broken down so that the general error distribution profile can be investigated in more detail. The x-axis represents the
relative error rate in percent for each model and the y-axis represents the relative distribution of the
models’ error rates. Figure 23 purposefully illustrates some of the best predictive models for NormalDLR as well as the
model which achieved the largest MAPE and a lower R2 value (Figure 20), in order to investigate the accuracy differences
between the most and least accurate models.
A benchmark of the predictive models is shown in Figure 24, which is another way of visualizing the error rate
profiles on the test dataset (Figure 22). It illustrates the distribution of the error rates of the predictive models for
NormalDLR run on the test dataset of April, 2015.
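An error rate distribution of this kind can be obtained by binning the per-observation percentage errors. The sketch below is one possible way to do it; the bin width, the number of bins and the sample values are assumptions chosen for illustration.

```python
def error_distribution(actual, predicted, bin_width=5.0, n_bins=6):
    """Share of observations per absolute-percentage-error bin.
    Bin i covers [i * bin_width, (i + 1) * bin_width); the last bin is open-ended."""
    shares = [0.0] * n_bins
    for a, p in zip(actual, predicted):
        ape = 100.0 * abs((a - p) / a)
        index = min(int(ape // bin_width), n_bins - 1)
        shares[index] += 1.0 / len(actual)
    return shares


# Made-up values: errors of 2%, 7%, 7% and 40%
shares = error_distribution([100.0] * 4, [102.0, 107.0, 93.0, 140.0])
# shares -> [0.25, 0.5, 0.0, 0.0, 0.0, 0.25]
```

Two models with the same MAPE can have different profiles: one with many moderate errors, another with mostly small errors and a few very large ones.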

Figure 24. Error rate distribution across the predictive models for NormalDLR.

5.4 PREDICTIVE MODELS CONSIDERING THE FORECASTED DATA


In paragraph 5.3 the predictive models were built solely on the historical dataset and they achieved rather low error
rates for the predicted parameters. However, in order to build predictive models for real business and technical
applications, datasets which include forecasted observations must be used. Therefore, this paragraph uses the
prognosis of several weather parameters and tries to predict the other parameters from the weather and time
variables.
Predictive models were built for Wind Speed, Ambient Temperature, Current, Conductor
Temperature and, finally, NormalDLR. The predictive models for air temperature and wind speed were built to
map the air temperature and wind speed forecasts to the real observations of the same parameters, respectively. In
this way, given the air temperature (or wind speed) forecast from SMHI for the respective location, these models
can predict the air temperature (or wind speed) at the sensor location. These two
predicted parameters, air temperature and wind speed, are used as input parameters to the later predictive
models, the last one being the NormalDLR model. The performance of each model is explained and depicted below. A
training dataset from Sep. 15th, 2014 to Feb. 28th, 2015 (676 observations) and a test dataset of March, 2015
(116 observations) were defined. The training and test datasets were averaged into 6-hour values because the prognosed
weather datasets were provided in this format. An attempt was made to build the predictive models for
every hour, but the interpolation this required led to too large prediction errors. The predictive models were
built for +24H and +48H in advance for five parameters: Wind Speed, Ambient Temperature, Current, Conductor
Temperature and NormalDLR.
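The averaging of hourly measurements into 6-hour values can be sketched as follows. The alignment of the blocks to 00, 06, 12 and 18 h is an assumption, as are the function name and the sample data.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def six_hour_means(timestamps, values):
    """Average hourly values into blocks starting at 00, 06, 12 and 18 h."""
    blocks = defaultdict(list)
    for ts, value in zip(timestamps, values):
        start = ts.replace(hour=ts.hour - ts.hour % 6, minute=0, second=0, microsecond=0)
        blocks[start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(blocks.items())}


# Twelve made-up hourly readings starting at midnight
hours = [datetime(2015, 3, 1) + timedelta(hours=h) for h in range(12)]
means = six_hour_means(hours, list(range(12)))
# means maps 2015-03-01 00:00 -> 2.5 and 2015-03-01 06:00 -> 8.5
```

Averaging the measurements down to the forecast resolution avoids interpolating the forecast up to hourly values, which the text reports as a source of large errors.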
The predictive modeling algorithm is shown in Figure 25, which depicts the steps taken by each predictive
model. In order to calculate and predict the NormalDLR of the line, one has to solve a heat-balance problem on the
line. The heat balance equation consists of two major contributors: heating (due to Joule losses and the environment,
such as ambient temperature and solar radiation) and cooling (convective cooling due to the wind, and radiative cooling).
Therefore, the predictive models for ambient temperature and wind speed were built first; they were correlated
and validated with the respective observations measured at the line sensor and weather station location. Then, with the
predicted ambient temperature and wind speed values, the predictive models were built for the load Current parameter,
combined with the time parameter (on which the Current proved to depend). Next, with the predicted
values for ambient temperature, wind speed and load current, the predictive models were built for Conductor
Temperature. Lastly, the NormalDLR predictive models were built using all the previously predicted
parameters.
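The heat balance mentioned above can be illustrated with the steady-state form used in standards-based ampacity calculations such as IEEE Std 738: Joule heating plus solar heating equals convective plus radiative cooling. The sketch below solves that balance for the current. It is not the thesis code; the function name, argument names and numbers are hypothetical, and the ambient temperature enters implicitly through the cooling terms.

```python
import math


def steady_state_ampacity(q_convective, q_radiative, q_solar, resistance):
    """Current (A) at which Joule plus solar heating equals convective plus
    radiative cooling: I^2 * R + q_s = q_c + q_r.
    Heat terms in W/m; resistance at the limiting conductor temperature in ohm/m."""
    net_cooling = q_convective + q_radiative - q_solar
    if net_cooling <= 0:
        return 0.0  # the environment alone already heats the line to its limit
    return math.sqrt(net_cooling / resistance)


# Made-up heat terms: 80 W/m convective, 20 W/m radiative, 10 W/m solar, 1e-4 ohm/m
rating = steady_state_ampacity(80.0, 20.0, 10.0, 1e-4)   # roughly 949 A
```

Because the convective term grows with wind speed, a higher wind speed raises the rating, which is why the wind speed prediction matters so much in the chain.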
EDA FOR FORECASTED DATA
Before building the predictive models for Ambient Temperature (AmbTemp) and for Wind Speed (WindSpeed) an
Exploratory Data Analysis (EDA) must be undertaken. A simple EDA was performed on the forecasted observations of
Ambient Temperature and Wind Speed in the time-period of September, 15th 2014 – March, 31st 2015.

Figure 25. Flowchart of the predictive models


The predictive models for air temperature and wind speed were built in order to have lower error rates for the final
models, the NormalDLR models. On the left side of Figure 26 one can observe the prognosed (+24H and +48H)
vs. measured values of Ambient Temperature, while the right side illustrates the same for the Wind Speed. Recall
that the wind station of Måseskär and the air temperature station of Rörastrand are located at a distance of 35.2 km and
22.91 km, respectively, from the line sensor and weather station.

Figure 26. Exploratory Data Analysis on prognosed and measured observations

The black line represents the true measured values of the Ambient Temperature and Wind Speed
parameters, respectively. One can observe that, in the case of Ambient Temperature, the prognosed values do not differ
to a large extent from the measured ones. A larger difference can be found in the case of the Wind Speed parameter. As
informed by Vattenfall and as found during the literature research, the wind speed (and direction) is highly dependent on
the air tunnel, spatial and time dimensions. These factors can be the causes of such a considerable difference between
the prognosed and measured wind speed values. Moreover, the wind weather station is located at a great distance from
the line sensor and weather station, which also adds to the forecast error.
As can be seen from the right side of Figure 26, the wind speed forecast from SMHI correlates badly with
the historical values from the line sensor and weather station. If one were instead to use a persistence method to
produce forecast data for the wind speed, i.e. to assume that today’s wind speed will be tomorrow’s
wind speed, then large error rates would be introduced in calculating the NormalDLR. As can be seen in Chapter 4,
the wind speed at the sensor location fluctuates quite considerably along temporal variables (more wind speed
fluctuations along temporal variables can be identified in the Annex). Moreover, as has been identified in the
literature, the NormalDLR is influenced most by the wind speed. Therefore, using a constant wind speed
parameter over time would introduce large error rates.
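The persistence method mentioned above can be sketched as follows, assuming a 6-hour resolution (four values per day). The wind speeds are made up and only show how a fluctuating series penalizes this baseline.

```python
def persistence_forecast(series, steps_per_day=4):
    """Forecast each value as the observation one day earlier
    (4 steps at a 6-hour resolution). Returns forecasts for series[steps_per_day:]."""
    return series[:-steps_per_day]


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


wind = [2.0, 6.0, 9.0, 4.0, 7.0, 1.0, 3.0, 8.0]   # made-up wind speeds in m/s, two days
forecast = persistence_forecast(wind)             # day-2 forecast = day-1 observations
actual = wind[4:]
error = mean_absolute_error(actual, forecast)     # 5.0 m/s for this made-up series
```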

From the predictive analysis point of view, the Ambient Temperature can be modeled as a simple linear problem,
whereas the Wind Speed parameter does not show a linear dependency, so more complex models must be
investigated. One would expect a higher error rate for the values prognosed 48 hours in advance than for those prognosed
24 hours in advance. However, a reliable conclusion cannot be drawn from Figure 26; this issue is discussed in the next
paragraph.
PREDICTIVE MODELS FOR DAY AND TWO-DAY AHEAD IN ADVANCE
Following paragraph 5.3, the predictions now rely solely on the weather forecast observations. This makes it
possible, in the future, to connect and deploy predictive capability on the monitored overhead line: the predictive
models must rely on the forecasted weather and technical parameters taken as reference. If one can predict the weather
and load current (for example), predictive models can readily be built for conductor temperature and maximum line
current (DLR).
The following models are the same models discussed in 3.8, together with some advanced versions of them. In the case
of ambient temperature and wind speed, the models try to match, as accurately as they can, the prognosed values to the
values actually measured at two different locations separated by the distances mentioned above. In the case of Current,
the models try to match the real values using the temporal parameters and the already predicted air temperature and wind
speed.
The air temperature and the wind speed forecasts are available at two different geographical locations far from the
power sensor. The air temperature (and wind speed) predictive models were built in order to translate the
air temperature (and wind speed) forecast to the power sensor location. The air temperature and wind speed forecasts
from SMHI served as input parameters, while the output parameters were the power sensor’s historical observations.
Using the raw forecasts of the weather variables directly as input parameters in the later models would result in large
error rates.
When building the model for the ambient temperature, it was decided to use a multivariate regression model
consisting only of the prognosed +24H and +48H ambient temperature, because a strong correlation was found between the
prognosed and the measured values, and because more sophisticated models which also took the temporal parameters
into account did not register a higher accuracy. Figure 27 illustrates the performance of the
linear models on the test dataset for both +24H and +48H in advance. The black line represents the real measured
values, whereas the blue and green curves show the predicted ambient temperature profiles for the time
horizons +24 hours and +48 hours, respectively.
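A linear mapping from the forecast at the weather station to the measurement at the sensor can be fitted by ordinary least squares, as sketched below for a single horizon. Using one predictor per horizon is a simplifying assumption; the names and temperatures are made up.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


forecast_temp = [0.0, 2.0, 4.0, 6.0]   # made-up +24H forecasts at the station, degC
measured_temp = [1.0, 2.0, 3.0, 4.0]   # made-up measurements at the sensor, degC
intercept, slope = fit_line(forecast_temp, measured_temp)

# Translate a new forecast to the sensor location
sensor_estimate = intercept + slope * 5.0
```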

Figure 27. Predicted Ambient Temperature values on the test dataset +24H and +48H in advance.
Both time-horizons show a rather acceptable accuracy of the linear models, with some minor mismatches.
Predicting the ambient temperature +24H in advance yielded a MAPE of 62.95%, whereas for the +48H
time-horizon it was 72.46%. Some of the extremely large errors are attributed to air temperatures close to zero degrees
Celsius: since the MAPE is calculated relative to the true (measured) value of the air temperature, a small absolute
error divided by a near-zero measurement gives a very large percentage error.
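The effect of near-zero reference values can be seen in a small made-up example in which every prediction is off by exactly 1 °C:

```python
def absolute_percentage_errors(actual, predicted):
    """Per-observation absolute percentage error, relative to the actual value."""
    return [100.0 * abs((a - p) / a) for a, p in zip(actual, predicted)]


measured = [10.0, 5.0, 0.1]    # made-up temperatures in degC
predicted = [9.0, 4.0, 1.1]    # each prediction misses by 1 degC
errors = absolute_percentage_errors(measured, predicted)
mape = sum(errors) / len(errors)
# errors are about 10%, 20% and 1000%, so the MAPE is dominated by the 0.1 degC point
```

A large MAPE for temperature therefore does not by itself imply a poor fit in absolute terms.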

Figure 28. Fitted model’s predictions versus real measured values of the Ambient Temperature

Comparing the time-horizons, the error profile distribution agrees with [26], which states that the
t+1 predictions will be more accurate than the t+2 ones.
Figure 28 illustrates the measured ambient temperature values versus the predicted ones for both time-horizons. One
can observe the greater accuracy of the +24H model over the +48H one, as well as where the two intersect.
Predicting the wind speed was one of the greatest challenges given the distance span and the large variation in three
dimensions: temporal, vertical and horizontal (also confirmed by [18]). The next graphs illustrate the
performance of the most accurate models for the same time-horizons when trying to predict the Wind Speed. For
both time-horizons the model chosen was cubist. Even though it has lower R2 and higher MAPE values compared to
other models, it predicts higher wind speed values (without exceeding the range of the measured values), and the wind
speed affects the profile of the NormalDLR the most. A high wind speed profile has two main advantages: it increases the
DLR profile and maximizes the wind power output of the turbines (until pitch control). The large MAPE and low R2 (Figure
30) values can be attributed to the above-mentioned fact: the prognosis of the wind speed is a difficult task because it
varies a lot in several dimensions, namely time and space (horizontally and vertically). Here one can also observe that,
with some exceptions, each predictive model has a lower error rate for the +24H time-horizon than for the +48H
one.
Figure 29 illustrates how well both models match the real measured values of the wind speed. As in the case of the
ambient temperature, the models were trained and tested 24 hours and 48 hours in advance. One can notice
a rather poor performance of the models in both time-horizons. This is because the wind speed is very
difficult to forecast, which makes it even more difficult to predict and model it +24H and +48H in advance taking into
account only the historical wind profile and temporal data. Indeed, the variable importance for both models shows a
100% correlation between the past prognosed and the real measured wind speed values. As found in other research articles
and also during the thesis project research, the wind speed has the largest influence on the NormalDLR and therefore
must be predicted with a high accuracy. However, the error rate, as shown in Figure 30, exceeds 50% in both models.

Figure 30. Predicted vs. Measured Wind Speed values on the test dataset +24H and +48H in advance

Figure 29. Performance of the Wind Speed predictive models on the test dataset. Left: +24H. Right: +48H
Predicting the Current was another challenge since, as discussed in 5.2, it did not have any strong and well-defined
pattern. Nonetheless, predictive models were built to fit the predicted values to the real ones. The models to predict
the current were built on the temporal dataset as well as on the already predicted ambient temperature and wind speed
values. One can observe, in Figure 31, a relative overlap of the predictions for both time-horizons, which might be due to
the correlation between the time and Current parameters. Also, the predicted values do not exceed the range of the true
values, which indicates a rather acceptable performance of the models.
One can notice the varying building times of the models. Some of the models required more time to be built, whereas
others were much faster and achieved much higher accuracy. The models’ building time depends on the number of
observations, and in this case the building times are rather low because the observations were based on a
6-hour time-resolution, the format in which the prognosed weather data was acquired. Also, it is believed to
be more feasible to use a 6-hour time-resolution instead of an hourly one. Furthermore, once the models are built, running
them on the test or on future datasets does not impose any time constraints.
The model chosen for the +24H time-horizon is svmLinear and for the +48H one is brnn, because of the highest R2 and
lowest MAPE values for the respective time-horizons (Figure 31). As expected, the most important variable when predicting
the Current is the hour of the day: it was used every time the models were being trained and built. The next most
important variables are the type of the day, which was used 25% of the time, and the predicted Ambient Temperature and
day of the week, both of which were used around 12% of the time.
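The temporal variables named above can be derived from a timestamp as in the sketch below. Encoding the type of the day as weekday versus weekend is an assumption (public holidays, for instance, are ignored), and the feature names are hypothetical.

```python
from datetime import datetime


def temporal_features(ts):
    """Hour of day, day of week (Monday = 0) and a weekday/weekend day type."""
    day_of_week = ts.weekday()
    return {
        "hour": ts.hour,
        "day_of_week": day_of_week,
        "day_type": "weekend" if day_of_week >= 5 else "weekday",
    }


features = temporal_features(datetime(2015, 3, 7, 18))   # a Saturday evening
# features -> {"hour": 18, "day_of_week": 5, "day_type": "weekend"}
```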

Figure 32. Performance of the Current predictive models on the test dataset. Left: +24H. Right: +48H

Figure 31. Predicted vs. Measured Current values on the test dataset +24H and +48H in advance

The next predicted variable is the Conductor Temperature. The linear model (lm) was chosen for the +48H time-horizon
because of its rather high R2 and smallest MAPE values. The model chosen for the +24H time-horizon
is gam because of its larger R2 value. Figure 33 depicts a rather erroneous profile, which is also confirmed by the
MAPE column in Figure 34.

Figure 33. Predicted vs. Measured Conductor Temperature values on the test dataset +24H and +48H in advance
The conductor temperature models feature different variable importance profiles. The predictive model for +24H
in advance features a 100% variable importance for both the predicted Wind Speed and the temporal data (date and
time). The next most important variables are the predicted ambient temperature (~39%) and current values (~32%). The
predictive model for +48H in advance features a different variable importance profile: predicted ambient
temperature ~20%, temporal data ~10%, predicted wind speed ~7%, predicted current ~2.3%.
The above-mentioned variable importance profiles are also supported by the research articles discussed in the literature
research stage. All of the above variables contribute with different weights to the heat balance of the line discussed at
the beginning of the chapter.

Figure 34. Performance of the Conductor Temperature predictive models on the test dataset. Left: +24H. Right: +48H
The last predicted output, and the one initially sought, is the NormalDLR. Figure 36 illustrates how well
the models match the measured values for the same time-horizons. The model chosen for the +24H time-horizon is brnn
because of its lowest MAPE and rather high R2 values, whereas the svmLinear model was chosen for the +48H time-horizon
because of its highest R2 and rather low MAPE values. However, any predictive model must be chosen according to the
criteria mentioned above: interpretability, ease of understanding and use, accuracy, time and computational cost.

Figure 35. Performance of the NormalDLR predictive models on the test dataset. Left: +24H. Right: +48H

Figure 35 shows the performance profiles of the models for both horizons, +24H and +48H. It is worth noting that all the
models achieved a rather low MAPE value of approximately 10% (with two exceptions), which is below the error rate
of 20% required by Vattenfall. Of course, MAPE is the average error rate, and at some points in time there can be a
much larger percentage error between the measured and the predicted value, as Figure 36 clearly shows.
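One possible selection rule consistent with the discussion above is to take the model with the lowest MAPE, break ties by the highest R2, and then check the result against the 20% limit. The thesis selection also weighed other criteria, so this is only a sketch; the model names and scores are made up and are not the thesis results.

```python
def choose_model(scores, max_mape=20.0):
    """Pick the model with the lowest MAPE (ties broken by the highest R2)
    and report whether it satisfies the error-rate limit."""
    name = min(scores, key=lambda m: (scores[m]["mape"], -scores[m]["r2"]))
    return name, scores[name]["mape"] <= max_mape


scores = {   # hypothetical values for illustration only
    "model_a": {"mape": 12.0, "r2": 0.70},
    "model_b": {"mape": 10.5, "r2": 0.66},
    "model_c": {"mape": 10.5, "r2": 0.72},
}
best, within_limit = choose_model(scores)   # ("model_c", True)
```

Since MAPE is an average, a model that passes this check can still exceed the limit at individual points in time.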


Figure 36. Predicted vs. Measured NormalDLR values on the test dataset +24H and +48H in advance
The predictive models feature different variable importance values for each time-horizon, depending on the models
themselves. Furthermore, it is possible to extract variable importance coefficients for each of the parameters predicted
above. Approximate variable importance coefficients for the NormalDLR predictive models are as follows:
• Predicted Wind Speed: 7% - 100%
• Predicted Ambient Temperature: 9% - 100%
• Date and Time: 3% - 63%
• Hour of the day: 0% - 27%
• Day of the week: 0% - 23%
• Predicted Conductor Temperature: 0% - 41%
• Predicted Current: 0% - 24%

Therefore, one can observe that DLR is a technical problem that can be solved using the heat-balance approach, where
line cooling is driven by the wind speed and ambient temperature, and the heating of the line is due to the load current
and, again, the ambient temperature. The time parameters influence the DLR to some extent, depending on the
predictive model.
The current paragraph did not take into account two weather parameters, Solar Radiation and Wind Direction, because it
was found quite difficult to forecast them and consequently to use them in predicting the three parameters (Current,
Conductor Temperature and NormalDLR). Moreover, as previously stated, SMHI has communicated that the Solar Radiation
had different units of measure. Also, since the wind varies significantly in three dimensions, as previously stated
(time and space, horizontally and vertically), it was not possible to predict either the wind speed or
the wind direction with a high accuracy. Additionally, after having discussed my findings about the wind direction
problem with SMHI and Vattenfall, it was concluded that the wind direction indeed varies considerably over a large
distance span and, of course, in time.
Therefore, it was shown that the predictive models built with the historically measured values perform much better than
the ones built with the forecasted ones. Another reason might be that the latter were built using a six-hour
time-resolution whereas the former used an hourly time-resolution. More conclusions follow in the next chapter.
5.5 SYNTHESIS OF PREDICTIVE MODELING
In the current chapter the predictive modeling approach is shown. Specifically, the predictive modeling approach is
applied to two different datasets: one with the historical observations of weather and power-line related parameters
and another with the forecasted observations of weather and power-line related parameters. The training and test
datasets differ for each of the datasets because the forecasted observations of air temperature and wind speed were
available from Sep. 15th, 2014 to Mar. 31st, 2015. Therefore, the test dataset in “5.3 Predictive Models for historical
observations” is the month of April, 2015, while the test dataset in “5.4 Predictive Models considering the forecasted
data” is the month of March, 2015.
The final predictive models (for NormalDLR) are built using the predictions obtained from the models built earlier in
the process, following the flowcharts shown in Figure 17 and Figure 25. The same logic is applied when building the models
for conductor temperature and so on. The air temperature and wind speed models are built first because they are
considered the first parameters to affect the thermal balance of the power line, i.e. the drivers of the power line’s
thermal stability.
The models built in 5.3 achieve a high accuracy and goodness of fit because:
• They are built on the actual observations of the weather and power line-related parameters.
• They are built on hourly observations, giving larger training and testing datasets.

The models built in 5.4 are less accurate than the ones in 5.3 because:
• They are built on the forecasted weather and power line-related parameters given the one- and two-day ahead
forecast of weather parameters.
• The training and testing datasets consist of fewer observations (compared with the ones in 5.3) because the
weather forecast was available on a six-hour time-resolution and the later parameters in the flowchart (Figure
25) were also built for a six-hour time-resolution.
• The weather forecast itself introduces a forecast error.
• The models built for forecasting the line current intensity in 5.4 have large error rates because, as shown in
the second paragraph of 5.2, the load current is difficult to forecast for a number of reasons. However, since the
variable importance of the Current in the predictive models built in 5.4 is rather low compared to other variables, the
error rates of the NormalDLR models are not that large. Large error rates also result when predicting the conductor
temperature and wind speed (the latter is explained in EDA for forecasted data).

6 DISCUSSIONS
In this section the findings of the thesis are summarized and discussed, together with a summary of both prediction
approaches. Paragraphs 5.3 and 5.4 discussed different approaches to building and validating the predictive
models. The test datasets provided a reality check for the predictive models. Some of the models that
performed well during the training session did not show the same performance on the test dataset, and
therefore the models that performed well on the test dataset were chosen. Both approaches were found to be successful
in meeting the requirements set by Vattenfall.
Predictive modeling is a viable and superior alternative to the legacy IEEE and CIGRE based DLR standards
calculations. Therefore, a robust, accurate, low-cost, user-friendly and easy-to-understand solution can be implemented
on the line where the power line sensor and weather station are installed. This would consequently lead to some
advantageous impacts:
1. The line’s ampacity/capacity will increase tremendously. This in turn will reduce congestion on the
power lines and therefore mitigate bottlenecks.
2. The share of RES will increase, leading to less RES curtailment and cheaper electricity prices.
3. Financial resources will be saved by avoiding power system expansions such as building additional lines and
installing new equipment. The associated ancillary, operating and maintenance costs will also be avoided.
4. The capacity factor of the grid/line will increase, optimizing the system efficiency by utilizing the same line with
a greater rating.

In order to maximize the benefit of using DLR at a higher operational level, it must be combined with
technologies that can influence the power flows, such as demand side management (DSM; at the distribution level),
renewable energy control, phase shifting transformers, FACTS and/or HVDC for long lines. [8]
An important characteristic of this thesis project and its results is that it relies on open-source, low-maintenance and
relatively easy-to-understand software and programming language. The project used the open-source software R
and the graphical user interface RStudio, both of which are free and highly valued in the online community
[27, 28, 29, 30] by academics, statisticians, mathematicians, engineers, data scientists and so on [31, 32, 33]. Moreover,
there are many packages developed by the online community (professors, researchers, etc.) that can be downloaded
from “The Comprehensive R Archive Network” [27]. The online resources of stackoverflow [34] and GitHub
[35] make it easy to contact any package developer and solve any problem related to programming in the R language.
This thesis project was performed on actual operational measured variables, which makes it more applicable to
real-world conditions than the virtual simulations carried out in a number of papers in the literature research library.
Moreover, more available data would have enlarged the training and testing datasets, which could
potentially improve the models’ performance but could require more computational time.
6.1 PREDICTING NORMALDLR FROM FORECASTED DATA
Paragraph 5.4 discussed the flowchart, the various data sources, the predictive models (introduced in 3.8) and
their outcomes. Given the forecasted ambient temperature and wind speed on two different time-horizons (day-ahead
and two-day-ahead) and historical data about the line’s Current intensities, Conductor Temperature and NormalDLR,
predictive models were built, trained and tested for each of them. The ambient temperature model was built on the
forecasted data; the wind speed models were built on forecasted as well as temporal data. The Current models
were built on the already predicted ambient temperature and wind speed as well as on the temporal parameters. The
Conductor Temperature models had the same structure as the Current models, with the predicted
Current values added. Lastly, the NormalDLR models have the same structure as the Conductor Temperature ones plus
the already predicted Conductor Temperature values. The predictive models built before the NormalDLR ones are
important and must be built in order to have all the parameters in the power line thermal stability balance equation.
Failing to predict one of the parameters could, on the one hand, introduce large errors in predicting
the NormalDLR, which would in turn jeopardize the operational stability and reliability of the line and therefore of
the power system. On the other hand, forecasting lower values for NormalDLR (because one or
more parameters in the thermal stability equation were not predicted) would fail to leverage the full potential of DLR,
in terms of weather conditions and power system operation (mostly, wind generation capacity), in a given time-frame.
In paragraph 5.4 it was shown that the predictive models for each parameter featured a wide range of goodness-of-fit
and error rate profiles. Furthermore, all the predicted parameter values were benchmarked against the real measured
and calculated (NormalDLR) observations. The models were thus validated on the test dataset and each model’s
error rate was recorded.
The limit of a 20% error rate set by Vattenfall was not exceeded in the case of NormalDLR. With an approximate error
rate of 11% for the most accurate models, the models’ error rates are far below the limit requested by Vattenfall.
Therefore, the predictive models meet the requirements for implementing them on the specific line.
Additionally, the predicted DLR values in both models are lower than the ones calculated by Vattenfall and at the same
time much higher than the measured Current values. They therefore impose less thermal and mechanical stress on the
line/system, which leads to a more reliable operation of the power system and at the same time to a higher electrical
efficiency of the line (given that the transmitted energy is much higher than the load current).
6.2 PREDICTING NORMALDLR FROM HISTORICAL DATA
Unlike paragraph 5.4, paragraph 5.3 did not use any forecasted data streams. It used only the historical datasets to
build the models for the Current, Conductor Temperature and NormalDLR parameters. The models built for Current and their
performances are not reported because they show a similarly poor performance to the ones from paragraph
5.4.
In this paragraph it was shown the greater predictive models’ performances on the test dataset compared to the ones
from paragraph 5.4. There are several reasons that account for this:
• In paragraph 5.3, the time resolution is hourly and therefore there is a large number of observations. Given a larger number of observations, the models were trained more rigorously and yielded different variable-importance coefficients from the ones in paragraph 5.4.
• Paragraph 5.3 operated only with historical measured datasets, whereas paragraph 5.4 used both real and forecasted values. Using weather parameters forecasted on different time horizons and for different geographical locations influenced the models’ goodness of fit and their error-rate profiles. A predictive model trained on measured historical data can generally be expected to perform better than one built using forecasted data, because forecast errors propagate into the model inputs.

Therefore, the models built in paragraph 5.3 are considerably better than the ones recorded in paragraph 5.4 and can also be implemented on the line where the line sensor and weather station are installed.
6.3 IMPROVING MODELS’ PRECISIONS
The predictive models performed quite well using the forecasted weather data, so they can be readily implemented. However, if Vattenfall wants to increase the accuracy of the models even further, it might consider enabling local forecasting of the weather parameters. A local (at the power sensor) weather forecasting capability would improve the precision of the NormalDLR models, which could then approach the performance of the models from 5.3. A more accurate model would make fuller use of the weather conditions and of the power system’s operational potential (especially in terms of wind generation capacity).
However, the capital/investment and operational/maintenance expenses must be accounted for when considering installing local weather forecasting capabilities. A thorough study should be undertaken to assess these long-run costs, the net present value of installing such capabilities, and the outcomes of the improved model precision. More accurate NormalDLR predictions would in turn allow harnessing more wind power, increase the line’s/system’s efficiency, optimize the power system’s flows, and decrease emissions and end-user electricity prices, which would benefit both end-users and the environment.
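As a rough illustration of the net present value assessment mentioned above, the sketch below discounts a stream of yearly net benefits against an up-front investment. The function name, the discount rate and all monetary figures are hypothetical placeholders; a real study would have to estimate them.

```python
def net_present_value(initial_investment, annual_net_benefits, discount_rate):
    """Discount yearly net benefits and subtract the up-front investment.

    annual_net_benefits[k] is the benefit minus the operation and
    maintenance cost in year k + 1 (hypothetical inputs).
    """
    npv = -initial_investment
    for year, benefit in enumerate(annual_net_benefits, start=1):
        npv += benefit / (1.0 + discount_rate) ** year
    return npv


# Hypothetical figures in arbitrary currency units, for illustration only.
investment = 100.0
yearly_net_benefits = [30.0, 30.0, 30.0, 30.0, 30.0]
rate = 0.05

npv = net_present_value(investment, yearly_net_benefits, rate)
print(f"NPV over {len(yearly_net_benefits)} years: {npv:.2f}")
```

A positive result would suggest that the improved model precision pays for the local forecasting capability over the chosen horizon; a negative one would suggest that the existing forecasts are the more economical choice.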
It is important to pinpoint the upper limit of the NormalDLR which, if exceeded, could result in failure of the power line, which in turn could cause severe damage to the power system and to society. From the power system standpoint, this would result in a local or general blackout and possibly in equipment failure. From society’s standpoint, this failure could result in loss of electricity in hospitals, at industrial sites, schools and so on.
The error rates for NormalDLR with the forecasted data are much lower than the ones for Conductor Temperature. The reason is that, as reported in 4.3, Conductor Temperature correlates strongly with Air Temperature, Solar Radiation and Current, and less with Wind Speed; these parameters make up the thermal equilibrium across the line and, of course, the sensor. The Current itself, as reported in 5.4, was quite difficult to model and to predict for various reasons. NormalDLR, on the other hand, features lower error rates because it depends strongly on Wind Speed and less on Conductor Temperature, as reported in 4.3.

7 CONCLUSIONS
Periods of high wind speed represent the largest wind power production potential, increasing the wind turbines’ load factor and efficiency. This power would displace some of the conventionally produced power and would therefore drive emissions and end-user prices down. The older IEEE and CIGRE standards can be used to calculate the DLR of a line/system (and therefore increase the line’s/system’s transmission capacity and efficiency), but the advanced statistical and machine learning techniques and models used in this project add accuracy, speed and robustness to the process of calculating DLR one and two days ahead. The older standards calculate the DLR from trigonometric equations that take into account the azimuth, the zenith and the day number of the year, whereas the predictive models calculate the DLR by finding historical patterns and trends between DLR and the input parameters (predictors), which represent the terms in the thermal balance equation of the power line. In this way, the thermal balance of the power line is calculated quickly and in a straightforward way (depending on the models’ interpretability) using the models already built from the parameters of the line’s heat-balance equation. The predictive models built during this project therefore represent a viable and superior alternative to the older standards for calculating DLR a specific time in advance.
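The heat-balance relation referred to above can be made concrete with a short sketch. It assumes the steady-state form commonly given in IEEE Std 738, q_c + q_r = q_s + I²·R(T_c), solved for the current I. The function name, argument names and sample values are hypothetical; the individual heat terms are taken as inputs here rather than computed from weather data.

```python
import math


def steady_state_rating(q_convective, q_radiative, q_solar, resistance_per_m):
    """Solve q_c + q_r = q_s + I^2 * R(T_c) for the current I in amperes.

    Heat terms are in W/m; resistance_per_m is the conductor AC resistance
    in ohm/m at the maximum allowed conductor temperature.
    """
    if resistance_per_m <= 0:
        raise ValueError("resistance must be positive")
    net_cooling = q_convective + q_radiative - q_solar
    if net_cooling <= 0:
        # Solar heating alone already meets or exceeds the cooling capacity.
        return 0.0
    return math.sqrt(net_cooling / resistance_per_m)


# Hypothetical heat terms for a windy hour, for illustration only.
rating = steady_state_rating(
    q_convective=80.0, q_radiative=30.0, q_solar=10.0, resistance_per_m=1e-4
)
print(f"Steady-state rating: {rating:.0f} A")
```

The convective term grows with wind speed, which is consistent with the strong dependence of NormalDLR on Wind Speed reported in 4.3. The predictive models learn this relation from data instead of evaluating each term explicitly.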
Larger transmission capacity means that more power is transmitted through the same power line, and hence higher power system efficiency (since more power is transported using the same system). Not only can wind power be harnessed to a greater extent and consumed by end-users (at a lower price, given the subsidized renewables), but increased system capacity also means postponing or canceling altogether the plans for expanding the power system by designing, building, installing and maintaining new power system equipment such as transformers, lines, surge arresters and other protection systems, substations, insulation and so on. Furthermore, with more equipment in the power system, the system’s reliability decreases (because there are more apparatus that can fail at any particular time). DLR therefore represents not only an economically viable solution for increasing the system’s capacity but also a reliable one (as long as the maximum conductor temperature is not exceeded). In the current project, it was demonstrated that the DLR predictions on both time horizons never exceeded the historical values.
The approach of building and running the DLR models can also be applied to other power lines (including, but not limited to, those connected to wind farms/turbines), provided that enough data about the heat-balance equation terms is available. Applying the predictive models to more power lines would further increase the wind power potential harnessed (driving prices even lower) as well as the power system’s transmission capacity and efficiency. Increased line capacity would also mean less line congestion, which would affect the electricity prices between zones.
The ability to calculate the DLR in advance would optimize power market operation on both the SPOT and futures markets. It would also empower market actors to operate accordingly, knowing the market’s merit order in advance with increased precision (given the increased transmission capacity and the wind power output forecasts from wind producers).
7.1 OUTCOMES
The expected outcomes of implementing the predictive models on the specified power line range from the technical to
the economic standpoints:
• Increase system capacity
– increase grid utilization factor
– reduce lines’ congestion
– avoid/postpone building new infrastructure
• Increase RES share in the energy mix
– more RES generation units
– less/no curtailed RES
– decrease GHG emissions
– lower consumer prices
• Optimal electricity market operation
– knowing in advance the supply, demand, price and quantity that would be traded on the SPOT market, based on the predicted and updated merit order
– hedge the energy consumption risk
– some of the predicted power could also be used as an ancillary service

7.2 REPRODUCIBILITY
A significant advantage of the predictive models is that they can be reproduced for any location, provided that the datasets are available. The models will of course feature different parameter ranges, but the approach and the class of models will be similar. Furthermore, any energy-producing and/or transmission and distribution company can implement these models on its power lines, contingent on the availability of the datasets. Obviously, each line/site must be treated individually because each one features its own range of parameters, such as dimensions, available datasets and their time resolution, ratings and so forth.
In conclusion, the power system’s capacity factor can be greatly increased if the predictive models are implemented on most (if not all) power lines. However, DLR also imposes several constraints on the grid’s equipment (power/current/voltage transformers, reactive and capacitive units). All of these must be considered holistically, as their mean time to failure (MTTF) and mean time to repair (MTTR) strongly affect the power system’s reliability, security and availability.
All the data analysis scripts implemented in this project can be found at the following link:
https://github.com/nickdoban/Dynamic-Line-Rating-DLR

8 SUGGESTIONS ON FUTURE WORK
The data made available by Vattenfall and SMHI led to the building of predictive models for two time horizons, +24H and +48H. The predictive models met the requirements set by Vattenfall, namely to be robust, accurate, fast and easy to understand.
However, in order to improve the accuracy of the predictive models even further, the following steps should be considered:
• Installing local weather forecasting capacity near or within the line sensor. Paragraph 5.3 showed that if the weather data is collected in close proximity to the line, the predictive models are much more accurate and can have a much higher goodness of fit.
• Ensemble modeling, a more complex type of predictive modeling that was not discussed in the current thesis project. An ensemble model acts as an “umbrella” for several predictive models, each with its own tuning parameters. The accuracy and time of the final (ensemble) model(s) can therefore be greatly improved.
• Continuous learning. The predictive capability might be configured to allow continuous learning of the predictive models, so that the models can be rerun (as chosen) several times a month or year in order to retrain them and improve the performance of the updated models. Given the rather short build time of the predictive models, retraining could be scheduled once a week, month, season, year and so forth.

The bullet points above are suggestions for improving the predictive models; they are not mandatory tasks.
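The ensemble idea suggested above can be sketched as a weighted average over the predictions of several base models. This is only one simple form of ensembling and was not implemented in the project; the base models below are hypothetical stand-ins for fitted models, and the feature name and coefficients are invented for illustration.

```python
def ensemble_predict(models, weights, features):
    """Combine base-model predictions into one weighted-average prediction."""
    if not models or len(models) != len(weights):
        raise ValueError("need one weight per model")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(w * m(features) for m, w in zip(models, weights))
    return weighted_sum / total_weight


# Hypothetical stand-ins for two fitted NormalDLR models.
def linear_model(features):
    return 600.0 + 50.0 * features["wind_speed"]


def conservative_model(features):
    return 550.0 + 45.0 * features["wind_speed"]


sample = {"wind_speed": 4.0}  # hypothetical forecast, m/s
prediction = ensemble_predict([linear_model, conservative_model], [1.0, 1.0], sample)
print(f"Ensemble NormalDLR prediction: {prediction:.0f} A")
```

The weights could, for example, be chosen from each model's error rate on a validation set, so that more accurate models contribute more to the final prediction.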
Additionally, as stated in the previous chapter, these types of predictive models can be implemented at any site and on any power line of any power system. The only requirement is the availability of the data needed to build, train and run the predictive models and to assess their performance. In this way, some of the power lines (if not all of them) can be equipped with predictive capability, making the power system more efficient and leading to the outcomes mentioned in paragraph 7.1.

9 REFERENCES

[1] Swedish Energy Agency, "Energy in Sweden - facts and figures 2012," Swedish Energy Agency,
2012.
[2] International Energy Agency, "OIL & GAS SECURITY. Emergency Response of IEA
Countries," International Energy Agency, pp. 1-19, 2012.
[3] Swedish Energy Agency, "Energy in Sweden 2013," Swedish Energy Agency, p. 41, 2013.
[4] International Energy Agency, "Sweden: Electricity and heat for 2012," International Energy
Agency, 2012. [Online]. Available:
http://www.iea.org/statistics/statisticssearch/report/?country=SWEDEN&product=electricityandheat&year=2012.
[Accessed 16 January 2015].
[5] NORDSTJERNAN, "Millions breaking wind day records," NORDSTJERNAN, [Online].
Available: http://www.nordstjernan.com/news/nordic/2300/. [Accessed 16 January 2015].
[6] EU Action, "Climate Action," European Commission, 24 September 2015. [Online]. Available:
http://ec.europa.eu/clima/policies/index_en.htm. [Accessed 24 March 2015].
[7] P. Watkiss, T. Downing, C. Handley and R. Butterfield, "The Impacts and Costs of Climate
Change," European Commission DG Environment, 2005.
[8] P. Schell, "Dynamic Line Rating (DLR): A Safe, Quick, and Economic Way to," in Renewable
Energy Integration: Practical Management of Variability, Uncertainty, and Flexibility in Power Grids,
Academic Press: Elsevier, June 2014, pp. 405-411.
[9] "Climate of the World: Sweden - Weather UK," WeatherOnline Ltd. - Meteorological Services,
[Online]. Available: http://www.weatheronline.co.uk/reports/climate/Sweden.htm. [Accessed
16 January 2015].
[10] B. M. Weedy, "Dynamic Current Rating of Overhead Lines," Electric Power Systems Research, no.
16, pp. 11-15, 1989.
[11] A. Bergström, U. Axelsson and V. Neimane, "Vattenfall Goes Real Time," T&D World
Magazine, 25 November 2014. [Online]. Available:
http://tdworld.com/substations/vattenfall-goes-real-time-0. [Accessed 24 February 2015].
[12] IEEE, "738-2006 - IEEE Standard for Calculating the Current-Temperature of Bare Overhead
Conductors," no. Jan. 30 2007, pp. c1 - 59, 2007.
[13] B. Xu, A. Ulbig and G. Andersson, "Impacts of Dynamic Line Rating on Power Dispatch
Performance and Grid Integration of Renewable Energy Sources," Innovative Smart Grid
Technologies Europe (ISGT EUROPE), 2013 4th IEEE/PES, pp. 1-5, 6-9 Oct. 2013.
[14] C. J. Wallnerström, Y. Huang and L. Söder, "Impact From Dynamic Line Rating on Wind
Power Integration," IEEE TRANSACTIONS ON SMART GRID, vol. 6, no. 1, pp. 343-350,
2015.

[15] D. M. Kim, J. M. Cho, H. S. Lee, H. S. Jung and J. O. Kim, "Prediction of Dynamic Line
Rating Based on Assessment Risk by Time Series Weather Model," in Probabilistic Methods
Applied to Power Systems, 2006. PMAPS 2006, Stockholm, 2006.
[16] Y. Yang, R. Harley, D. Divan and T. Habetler, "Real-Time Dynamic Thermal Rating
Evaluation of Overhead Power Lines based on Online Adaptation of Echo State Networks,"
Energy Conversion Congress and Exposition (ECCE), 2010 IEEE, pp. 3638 - 3645, 2010.
[17] J. Fu, J. Morrow and S. M. Abdelkader, "Modelling and Prediction Techniques for Dynamic
Overhead Line Rating," Power and Energy Society General Meeting, 2012 IEEE, pp. 1-7, 2012.
[18] H. M. Nguyen, J. L. Lilien and P. Schell, "Dynamic line rating and ampacity forecasting as the
keys to optimise power line assets with the integration of res. The European project Twenties
Demonstration inside Central Western Europe," in Electricity Distribution (CIRED 2013), 22nd
International Conference and Exhibition on, Stockholm, 2013.
[19] D. Morrow, J. Fu and S. M. Abdelkad, "Experimentally validated partial least squares model,"
Renewable Power Generation, IET , vol. 8, no. 3, pp. 260-268, April 2014.
[20] Topepo, "Multivariate Adaptive Regression Splines Models," [Online]. Available:
http://topepo.github.io/caret/Multivariate_Adaptive_Regression_Splines.html. [Accessed 10
April 2015].
[21] P. Paisitkriangkrai, "Linear Regression and Support Vector Regression," 24 October 2012.
[Online]. Available: http://cs.adelaide.edu.au/~chhshen/teaching/ML_SVR.pdf. [Accessed 7
April 2016].
[22] Support Vector Machines for Regression, "Support Vector Machines for Regression," [Online].
Available: http://www.svms.org/regression/. [Accessed 7 April 2016].
[23] Support Vector Machine Regression, "Support Vector Machine Regression," [Online].
Available: http://kernelsvm.tripod.com/. [Accessed 7 April 2016].
[24] C. Sandels, J. Widen, L. Nordstrom and E. Andersso, "Predicting Electricity Consumption in a
Swedish Office Building from Weather and Occupancy: Data Analytic Approach," pp. 1-13,
2015.
[25] M. North, Data Mining for the Masses, 2012.
[26] M. A. Bucher, M. Vrakopoulou and G. Andersson, "Probabilistic N-1 security assessment
incorporating Dynamic Line Ratings," Power and Energy Society General Meeting (PES), 2013
IEEE, pp. 1-5.
[27] CRAN, "Comprehensive R Archive Network," [Online]. Available: https://cran.r-project.org/.
[28] R-bloggers, "R-bloggers," [Online]. Available: http://www.r-bloggers.com/.
[29] R-statistics blog, "R-statistics blog," [Online]. Available: http://www.r-statistics.com/.
[30] inside-R, "inside-R," [Online]. Available: http://www.inside-r.org/blogs.
[31] Brigham Young University, Department of Statistics, "Brigham Young University, Department
of Statistics," [Online]. Available: http://statistics.byu.edu/content/r-and-rstudio.
[32] George R. Brown School of Engineering, "George R. Brown School of Engineering," [Online].
Available: https://statistics.rice.edu/feed/FacultyDisplay.aspx?FID=3340.
[33] Newcastle University, "School of Mathematics and Statistics," [Online]. Available:
http://www.ncl.ac.uk/maths/students/teaching/installingr/.
[34] stackoverflow, "stackoverflow," [Online]. Available: http://stackoverflow.com/.
[35] GitHub, "GitHub," [Online]. Available: https://github.com/.
[36] University of Wisconsin-Madison, "Computer Sciences User Pages," [Online]. Available:
http://pages.cs.wisc.edu/. [Accessed 24 March 2015].
[37] The Comprehensive R Archive Network, "The Comprehensive R Archive Network," [Online].
Available: https://cran.r-project.org/.

