1. Introduction
To reduce the consumption of high-carbon-emission fossil fuels, the renewable energy sector has witnessed rapid growth. Photovoltaics (PVs) and wind turbines (WTs) have been extensively constructed and integrated into the distribution network in the form of distributed generation (DG), gradually transforming the conventional passive distribution network into a complex active distribution network [1,2,3].
Due to the significant influence of natural environmental factors such as solar radiation, PVs exhibit instability, marked by pronounced randomness and fluctuation [4]. Similarly, WTs are constrained by natural factors like wind speed, resulting in strong randomness and uncertainty [5]. The substantial integration of DG and the fluctuating loads introduce various sources of uncertainty into the distribution network system, significantly affecting its stable operation. This elevates the risk of voltage violations at network nodes, posing a considerable challenge to voltage control throughout the entire grid.
Hence, scholars have embarked on research pertaining to voltage violation risk assessment in distribution networks with DG [6,7,8]. However, the majority of these risk assessment studies rely primarily on historical DG data to determine "static risks" of voltage violations at network nodes. They do not take into account load variations at different times or the influence of external environmental factors on DG. For instance, load levels are typically lower late at night than in the afternoon, and DG output can differ significantly under varying environmental conditions. These factors can lead to substantial variations in the voltage violation risk at the same node at different times on the same day or at the same time on different days. Consequently, it becomes imperative to propose a dynamic risk assessment methodology for voltage violations in distribution networks with DG, which can offer information for the coordination and control of network node voltages. Given the complex operating conditions in active distribution networks, deterministic load flow analysis methods often yield imprecise results, necessitating the use of probabilistic load flow (PLF) for analytical computations.
PLF analysis methods can be broadly categorized into three main types: the Monte Carlo Simulation (MCS) [9,10], the Point Estimate Method (PEM) [11,12], and analytical methods [13]. The MCS simulates a large number of random variable samples, achieving high accuracy but at considerable computational cost. The PEM approximates the moments of the output variables based on the characteristics of the input random variables, but its computational efficiency is closely tied to the number of variables. Analytical methods rely on relationships among input random variables to calculate the probability statistical characteristics of output random variables. The key challenge of analytical methods lies in handling complex convolution calculations. Given the intricacies of convolution operations, computational efficiency can be enhanced by substituting cumulant operations for convolution calculations.
Currently, there are three main issues with using the probabilistic load flow based on the cumulant method (PLFCM) for dynamic risk assessment of voltage violations in distribution networks with DG.
The first issue pertains to the accuracy of the probability models for random variables such as load and DG, which directly affect the results of load flow calculations. There are currently two main approaches to describing probability models for random variables: parameter-based methods [14] and non-parameter-based methods [15]. Parameter-based methods begin by assuming that the research subject follows a certain probability distribution based on empirical observations. Then, they calculate the relevant parameters from actual samples to obtain a complete probability density function (pdf). However, due to the randomness of DG and the volatility of load, common probability density forms such as the Beta distribution and the Weibull distribution often fail to accurately reflect the actual probability distribution, leading to significant errors. Non-parameter-based methods, on the other hand, do not require prior assumptions about the underlying probability distribution model or prior knowledge. Instead, they directly analyze the probability distribution based on actual data. Kernel density estimation (KDE) is the most widely used non-parametric method [16]. However, a notable characteristic of random variables like load and DG is that they are bounded. Traditional KDE methods, when applied to bounded data, can exhibit boundary effects that subsequently impact the results of probabilistic load flow. Additionally, another prominent feature of DG is its susceptibility to external environmental factors. As is well known, wind speed and global horizontal radiation are the most important factors affecting the output of WTs and PVs, respectively. How to achieve dynamic probability density estimation for DG under the influence of these factors remains an unresolved challenge.
The second issue pertains to the prerequisite that input variables must be mutually independent for the application of PLFCM. Due to the consistent characteristics of renewable energy sources in the same or neighboring areas, unit outputs often exhibit similar trends. The existence of correlations between input variables makes it impractical to directly employ this method for probabilistic load flow, necessitating the transformation of correlated outputs into independent ones. Presently, two widely used methods for achieving this are the Orthogonal inverse transform [17] and the Nataf inverse transform [18]. The Orthogonal inverse transform is straightforward and widely applied, while the Nataf inverse transform uses the correlation coefficient matrix, taking into account changes in equivalent correlation coefficients before and after transformation. However, both of these methods rely on the Pearson correlation coefficient and cannot capture nonlinear relationships among DGs. Moreover, they require variables to adhere to specific distribution types to achieve higher accuracy. Furthermore, the most challenging aspect of the Nataf inverse transform is determining the equivalent correlation coefficients. Although some scholars [19] have provided empirical formulas for a certain number of equivalent correlation coefficients for the reference of other researchers, they only cover a limited range of common probability distributions. Addressing how to achieve a more precise and widely applicable independent transformation for random variables is an issue that requires resolution.
The third issue is the establishment of a reasonable risk assessment framework, which is a key factor in achieving voltage violation risk assessment in distribution networks with DG. Risk is a comprehensive measure that combines the probability of an event occurring with the severity of its consequences [20]. When conducting risk assessments, both aspects should be taken into consideration. Traditional risk assessments often focus solely on the probability of a voltage violation occurring, without considering the severity of its consequences. Voltage violation events with a low probability but significant impact should receive more attention than those with a higher probability but less significant consequences. How to accurately portray the severity of losses resulting from risk events is a currently unresolved issue.
To address the aforementioned issues, this paper proposes a dynamic risk assessment method for voltage violations in distribution networks with DG. Firstly, a pdf estimation method is proposed, based on boundary kernel density estimation (BKDE), to overcome the problem of errors at boundary points when handling bounded data. Moreover, considering the significant impact of wind speed and global horizontal radiation on DG, conditional density is introduced to enable the dynamic probability density estimation of DG based on numerical weather prediction (NWP). Secondly, an independent transformation method based on the Rosenblatt inverse transform is proposed. This method accurately characterizes the correlations between DG variables, achieving an independent transformation of variables, and laying the foundation for subsequent PLFCM. Thirdly, a voltage violation severity index based on the utility function is proposed. By combining this index, an integrated risk assessment framework is constructed on the basis of the probability of voltage violations. This framework quantitatively analyzes the dynamic risk of voltage violations in distribution networks with DG at both the node and system levels. Finally, simulation tests on the IEEE-33 system validate the rationality and effectiveness of the proposed method.
2. Pdf Estimation Based on BKDE
KDE is a data-driven, non-parametric method for estimating the pdf and has the advantage of being unaffected by the choice of prior models [21]. It is commonly used for data fitting when it is challenging to directly obtain the underlying pdf. This method is suitable for analyzing the probability characteristics of load and DG. Hence, in this paper, the KDE method is employed for estimation.
Given a set of independently and identically distributed data X1, …, XN with an unknown density function f(x), the KDE is calculated as follows:

f̂(x) = (1/(Nh))·∑_{i=1}^{N} K((x − Xi)/h)    (1)

where K(∙) represents the kernel function; N represents the sample size; h represents the bandwidth; x represents the point at which the kernel density estimate is being calculated; and Xi represents the ith sample.
The choice of the kernel function can influence the results of KDE. Generally, the direction of the sample Xi relative to the density point x does not affect the estimation, and the closer Xi lies to x, the more weight it should be assigned. Therefore, a unimodal kernel function centered at 0 is typically chosen. In this paper, the Epanechnikov kernel function is used:

K(z) = (3/4)·(1 − z²)·I(|z| ≤ 1)    (2)

where I(∙) represents the indicator function, which takes the value 1 when z satisfies the condition and 0 otherwise. Setting z = (x − Xi)/h and substituting (2) into (1), we obtain the complete formula for the KDE.
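For illustration, a minimal Python sketch of the plain KDE defined by (1) and (2) could look as follows; the function names and vectorized layout are our own and not taken from the paper:

```python
import numpy as np

def epanechnikov(z):
    """Epanechnikov kernel of Eq. (2): 0.75*(1 - z^2) for |z| <= 1, else 0."""
    return 0.75 * (1.0 - z**2) * (np.abs(z) <= 1.0)

def kde(x, samples, h):
    """Plain KDE of Eq. (1): f_hat(x) = (1/(N*h)) * sum_i K((x - X_i)/h)."""
    x = np.atleast_1d(x).astype(float)
    samples = np.asarray(samples, dtype=float)
    z = (x[:, None] - samples[None, :]) / h   # shape (len(x), N)
    return epanechnikov(z).sum(axis=1) / (samples.size * h)
```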
2.1. pdf Estimation for Load
Assuming there are NL load nodes in the distribution network, the load data for the distribution network at time i, denoted as P_L^i, can be expressed as follows:

P_L^i = [P_{L,1}^i, P_{L,2}^i, …, P_{L,NL}^i]    (3)

where P_{L,k}^i represents the local load power measurement for node k at time i. Additionally, if historical load power data are available for a total of T time instants, the historical load data PL can be described as follows:

P_L = [P_L^1; P_L^2; …; P_L^T]    (4)

In (4), the data for each load column can be decomposed, based on daily patterns, into two components: the basic data with a daily cycle and the random fluctuation data. Therefore, when performing PLFCM, the pdf for node k at time t can be calculated using the power data set for different days at the same time t, denoted as S_{L,k}^t:

S_{L,k}^t = {P_{L,k}^t, P_{L,k}^{t+Td}, P_{L,k}^{t+2Td}, …}    (5)

where Td represents the number of monitoring sample points within one day.
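As a small illustration of how the sample set in (5) can be assembled, the following sketch slices a load history assumed to be stored as a T-by-NL array with rows ordered in time (the array layout and names are assumptions, not the paper's):

```python
import numpy as np

def same_time_samples(PL, k, t, Td):
    """Collect the measurements of load node k taken at time-of-day t on every
    recorded day, i.e. rows t, t+Td, t+2*Td, ... of the T-by-NL history PL."""
    PL = np.asarray(PL, dtype=float)
    return PL[t::Td, k]
```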
However, a significant characteristic of load is that it is bounded, and its power consumption is always greater than zero. Therefore, when performing KDE, it is necessary to consider the impact of boundary effects.
The expression for the bias of the KDE is as follows:

Bias[f̂(x)] = (h²/2)·μ₂(K)·f″(x) + o(h²)    (6)

where μ₂(K) = ∫ u²K(u) du, and K(u) is the kernel function shown in (2).
It can be observed that this bias diminishes as the bandwidth h decreases, approaching 0 at a rate of h². However, when the pdf has a boundary at 0, the above expression is no longer applicable. Instead, it is replaced by:

Bias[f̂(x)] = (a₀(x) − 1)·f(x) − h·a₁(x)·f′(x) + O(h²)    (7)

where a_i(x) = ∫_{−1}^{x/h} u^i K(u) du. It can be observed that when x ≥ h, a₀(x) = 1 and a₁(x) = 0, leading to no difference in bias from (6). However, when 0 ≤ x < h, a₀(x) falls below 1 and a₁(x) is non-zero, implying that bias is always present at the boundary.
In this case, when another kernel function L(x) is used to estimate f(x) once again, the corresponding bias is:

Bias[f̂_L(x)] = (b₀(x) − 1)·f(x) − h·b₁(x)·f′(x) + O(h²)    (8)

where b₀ and b₁ are defined analogously to a₀ and a₁ from (7), except that (8) is specific to L(x). By performing a linear combination of (7) and (8), we can obtain:

f̂_B(x) = [b₁(x)·f̂_K(x) − a₁(x)·f̂_L(x)] / [a₀(x)·b₁(x) − a₁(x)·b₀(x)]    (9)

where f̂_K(x) and f̂_L(x) denote the estimates obtained with the kernels K and L, respectively.
Equation (9) can be equivalently viewed as an estimation of f(x) using a new kernel function. This new kernel function is as follows:

K_B(x) = [b₁·K(x) − a₁·L(x)] / [a₀·b₁ − a₁·b₀]    (10)

In particular, if L(x) is taken as x·K(x), K_B(x) takes a simple form:

K_B(x) = (a₂ − a₁·x)·K(x) / (a₀·a₂ − a₁²)    (11)
When x ≥ h, the new kernel function KB(x) is the same as K(x), but when 0 ≤ x < h (i.e., at the boundary), it adjusts the original kernel function. When x < 0, the estimated value is taken as 0.
Based on (1), (2), and (11), we can derive the calculation formula for BKDE by substituting (5), which provides the pdf for node k at time t.
Taking an exponential distribution with a parameter of 1 as an example, the performance of BKDE is tested. One thousand random numbers are generated from this exponential distribution, and pdf estimation is performed using both KDE and BKDE.
Figure 1 shows the results of these two estimation methods compared to the actual pdf.
Observing Figure 1, it can be seen that, compared to the true density function, the results obtained by KDE exhibit a noticeable decrease at the boundary. In contrast, the results obtained by BKDE closely match the real situation, and the bias issues near the boundary have been significantly improved.
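To make the construction concrete, the sketch below implements the BKDE of (1), (2) and (11), using the truncated-moment form of a_i(x) given above, and repeats the exponential test of Figure 1; the function names, bandwidth, and evaluation grid are illustrative choices rather than the paper's settings:

```python
import numpy as np

def epanechnikov(u):
    return 0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0)

def trunc_moment(i, c):
    """a_i(x) of (7) for the Epanechnikov kernel: integral of u^i*K(u) du over
    [-1, min(1, c)], with c = x/h, from the closed-form antiderivative."""
    c = np.minimum(c, 1.0)
    F = lambda u: 0.75 * (u**(i + 1) / (i + 1) - u**(i + 3) / (i + 3))
    return F(c) - F(-1.0)

def bkde(x, samples, h):
    """Boundary-corrected KDE for data bounded below at 0, using the
    linear-combination kernel K_B of (11); intended for points x >= 0,
    with estimates set to 0 for x < 0."""
    x = np.atleast_1d(x).astype(float)
    samples = np.asarray(samples, dtype=float)
    a0, a1, a2 = (trunc_moment(i, x / h) for i in range(3))
    u = (x[:, None] - samples[None, :]) / h
    kb = (a2[:, None] - a1[:, None] * u) * epanechnikov(u)
    kb /= (a0 * a2 - a1**2)[:, None]
    est = kb.sum(axis=1) / (samples.size * h)
    return np.where(x < 0, 0.0, est)

# Test mirroring Figure 1: 1000 draws from Exp(1), estimated on [0, 5]
rng = np.random.default_rng(0)
data = rng.exponential(scale=1.0, size=1000)
grid = np.linspace(0.0, 5.0, 200)
pdf_hat = bkde(grid, data, h=0.3)   # bandwidth chosen for illustration only
```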
2.2. Conditional pdf Estimation for DG
For DG, the generated power is constrained between 0 and the maximum capacity. After normalization by the maximum capacity, the data are doubly bounded on [0, 1]. Therefore, when calculating the kernel function shown in (11), the parameters a_i(x) become:

a_i(x) = ∫_{max(−1, (x−1)/h)}^{min(1, x/h)} u^i K(u) du    (12)
In addition to its bounded nature, another significant characteristic of DG is its strong correlation with external factors. For instance, PVs might exhibit significant variations in power data at different times of one day due to differences in radiation. Therefore, the probability model for DG at time t can be characterized jointly by the conditional pdf and the predicted values of conditional variables at time t.
The conditional pdf for DG at time t can be represented as:

f(p_t | y_t) = f(p_t, y_t) / f(y_t)    (13)

where p_t is the DG output at time t, y_t is the predicted value of the conditional variable at time t, f(p_t | y_t) is the conditional pdf for DG, f(y_t) is the pdf for the conditional variable, and f(p_t, y_t) is the joint pdf.
Based on Equations (1), (2), and (11), the conditional pdf for DG can be estimated as:

f̂(p_t | y_t) = [∑_{i=1}^{N} K_{B1}((p_t − P_i)/h₁)·K_{B2}((y_t − Y_i)/h₂)] / [h₁·∑_{i=1}^{N} K_{B2}((y_t − Y_i)/h₂)]    (14)

where P_i represents historical samples of DG, Y_i represents the corresponding historical samples of the conditional variable, h₁ and h₂ are bandwidth parameters, and K_{B1}(∙) and K_{B2}(∙) are the kernel functions applied to the DG output and the conditional variable, respectively.
Different types of DG correspond to different conditional variables. In this paper, the conditional variable for PVs is global horizontal radiation, while for WTs, it is wind speed.
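A minimal sketch of such a conditional density estimate, in the spirit of (13) and (14), is given below; it uses plain Epanechnikov kernels for brevity (the paper's method uses boundary-corrected kernels), and all names are illustrative:

```python
import numpy as np

def epanechnikov(u):
    return 0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0)

def conditional_pdf(p_grid, y_forecast, P_hist, Y_hist, h1, h2):
    """Kernel estimate of f(p | y = y_forecast): each historical DG output P_i
    is weighted by how close its recorded condition Y_i (e.g. NWP wind speed or
    radiation) lies to the forecast. Assumes at least one Y_i falls within h2
    of y_forecast, otherwise the denominator is zero."""
    P_hist = np.asarray(P_hist, dtype=float)
    Y_hist = np.asarray(Y_hist, dtype=float)
    w = epanechnikov((y_forecast - Y_hist) / h2)          # condition weights
    u = (np.atleast_1d(p_grid)[:, None] - P_hist[None, :]) / h1
    return (epanechnikov(u) * w[None, :]).sum(axis=1) / (h1 * w.sum())
```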
5. The Process of the Proposed Method
The method proposed in this paper enables dynamic risk assessment of voltage violations in distribution networks with DG, which is crucial for coordinated voltage control. The specific flow chart of the method is depicted in Figure 2 and comprises three main steps:
Step 1: Probability Density Function Estimation. Accurate pdfs are obtained through BKDE, utilizing historical data for both load and DG, while also accounting for the influence of external factors.
Step 2: Probability Load Flow Calculation. Considering the correlation between DGs, independent DG variables are obtained through the Rosenblatt inverse transform. Based on these variables, PLFCM is applied to compute the probability distribution of node voltages.
Step 3: Risk Assessment. The severity of voltage violation at the node is considered through the application of the utility function, and this, combined with the results of PLFCM, yields dynamic risk assessment outcomes at both the node and system levels.
6. Case Study
Due to confidentiality requirements, real grid data are difficult to obtain, so the case study in this paper is conducted on an improved IEEE-33 system, as depicted in Figure 3. A WT with a capacity of 600 kW is connected to node 26, while two PVs are connected to node 33 and node 15 with capacities of 500 kW and 450 kW, respectively. The daytime DG penetration rate is approximately 30%. The output data for the WT and PVs are real, derived from a WT and PVs located in a region of northwest China, along with historical weather records providing the corresponding wind speed and global horizontal radiation data. To better simulate the probability density of the 24 h base load, the ratio of the load at each hour relative to the maximum load of that node is defined according to the solid line in Figure 4. In accordance with daily patterns, the load can be decomposed into two components: periodic base data and random fluctuation data. To reflect the load's volatility, the standard deviation of the load is set to 20%. Following the three-sigma rule, it can be considered that the load ratios at each time fall within the shaded area of Figure 4.
In this section, the accuracy of pdf modeling under real data using BKDE and KDE is compared. Subsequently, the effectiveness of the proposed method for risk assessment is validated through simulations in two cases: when there is abundant radiation and high wind speed resulting in higher DG output during the daytime, and when there is no radiation at night coupled with low wind speed leading to reduced DG output.
6.1. The Performance of pdf Modeling
Section 2.1 compares the pdf modeling results of BKDE and KDE on simulated data through a simple example. Here, the modeling accuracy of the two methods is compared again using real DG data. Since the real pdf of the DG data is unknown, the relative error between the mean and variance of the modeled pdf and the mean and variance of the real measured data is used as the comparison criterion. When using the measured PV data, the night-time periods in which the PV produces no power were removed. The results are presented in Table 1. Figure 5, Figure 6 and Figure 7 show histograms of the real DG data as well as the pdfs calculated by BKDE and KDE.
The results from Figure 5, Figure 6 and Figure 7 reveal that KDE exhibits a segment of the density curve near the boundary points that markedly contradicts the actual histogram trend, whereas BKDE yields results consistent with the actual trend. It can therefore be affirmed that the BKDE results match the data-based histograms more closely than those of KDE.
The relative errors in Table 1 are calculated as |Value_c − Value_r| / Value_r, where Value_c represents the mean or variance of the modeled pdf, and Value_r represents the mean or variance of the real measured data.
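For reference, the comparison criterion of Table 1 can be computed along the following lines (a sketch under the assumption that the fitted pdf is available on a numerical grid):

```python
import numpy as np

def _trapz(y, x):
    """Trapezoidal integration kept local to avoid NumPy version differences."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def pdf_moments(grid, pdf_vals):
    """Mean and variance of a fitted pdf evaluated on a numerical grid."""
    grid, pdf_vals = np.asarray(grid, float), np.asarray(pdf_vals, float)
    mean = _trapz(grid * pdf_vals, grid)
    var = _trapz((grid - mean) ** 2 * pdf_vals, grid)
    return mean, var

def relative_error(value_c, value_r):
    """Relative error criterion of Table 1: |Value_c - Value_r| / Value_r."""
    return abs(value_c - value_r) / abs(value_r)
```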
6.2. Case 1: High DG Output
Assume that, based on NWP, the wind speed forecast for the WT at 11 a.m. on a certain day is 10 m/s, the global horizontal radiation at PV1 is 850 W/m², and that at PV2 is 900 W/m². Based on the BKDE, the pdfs and CDFs of WT, PV1, and PV2 can be obtained, as shown in Figure 8, along with the joint pdf and joint CDF between these variables, as illustrated in Figure 9, Figure 10 and Figure 11.
From the above figures, it is evident that the variables WT, PV1, and PV2 do not satisfy the condition of mutual independence between them. Therefore, it is not possible to directly perform PLFCM, and an independent transformation is required.
After performing the Rosenblatt inverse transformation, the correlated DG variables were transformed to become independent samples.
Table 2 and Table 3 present the Pearson correlation coefficient, Kendall correlation coefficient, and Spearman correlation coefficient between WT, PV1, and PV2 before and after the transformation. The results indicate that after the transformation, the absolute values of the correlation coefficients between DG variables were largely reduced to below 0.1, significantly decreasing their correlations and rendering them nearly independent.
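The before/after comparison of Tables 2 and 3 can be reproduced with standard statistics routines, for example (a sketch assuming the DG samples are stacked column-wise):

```python
import numpy as np
from scipy import stats

def correlation_report(samples, names=("WT", "PV1", "PV2")):
    """Pairwise Pearson, Kendall and Spearman coefficients, as compared in
    Tables 2 and 3; `samples` is an (N, 3) array of DG outputs taken either
    before or after the Rosenblatt inverse transform."""
    samples = np.asarray(samples, dtype=float)
    report = {}
    for i in range(samples.shape[1]):
        for j in range(i + 1, samples.shape[1]):
            a, b = samples[:, i], samples[:, j]
            pearson, _ = stats.pearsonr(a, b)
            kendall, _ = stats.kendalltau(a, b)
            spearman, _ = stats.spearmanr(a, b)
            report[(names[i], names[j])] = (pearson, kendall, spearman)
    return report
```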
In this paper, the accuracy of the proposed method is validated through the MCS method for PLF. Additionally, a method based on Orthogonal inverse transform is employed to calculate PLF for comparative purposes. To account for the correlation, the MCS method used DG data that originally had correlations before the independence transformation. It employed random sampling based on the joint pdf of the DG variables obtained previously.
In total, the MCS method conducted 20,000 random samples, performed deterministic load flow, and obtained the pdf of node voltages.
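A skeleton of this MCS reference procedure might look as follows; the sampling and load-flow callables are placeholders for routines not shown in the paper:

```python
import numpy as np

def mcs_voltage_pdf(sample_joint_dg, power_flow, n_samples=20000, rng=None):
    """Monte Carlo PLF skeleton: draw correlated DG samples from their joint
    distribution, run a deterministic load flow for each draw, and collect the
    node voltage magnitudes for pdf statistics."""
    rng = np.random.default_rng() if rng is None else rng
    voltages = []
    for _ in range(n_samples):
        dg = sample_joint_dg(rng)        # one correlated (WT, PV1, PV2) draw
        voltages.append(power_flow(dg))  # vector of node voltage magnitudes
    return np.asarray(voltages)          # shape (n_samples, n_nodes)
```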
Figure 12 illustrates the voltage pdf of three DG-connected nodes and the last node 18.
Due to space limitations, it is difficult to plot the pdfs of all nodes under the three different methods. Therefore, in this paper, the results of the MCS method are used as the standard to calculate the relative errors of the mean and variance obtained by both the PLFCM with Rosenblatt inverse transform and the PLFCM with Orthogonal inverse transform. The results are shown in Table 4.
Observing Figure 12, it can be noted that the results obtained from the PLFCM based on BKDE and Rosenblatt inverse transformation are in good agreement with the outcomes of the MCS method. Node 26 is located in the middle of the radial distribution network and has a WT connection, which is why its voltage remains within the allowable range without violations. However, nodes 33, 15, and 18 are situated at the endpoints of the distribution network. Despite having DG support at these nodes or in their vicinity, voltage violations below the lower limit still occur due to the impedance losses in the transmission lines.
Moreover, as can be seen from Table 4, the error of the PLFCM based on the Orthogonal inverse transform is significantly larger than that of the PLFCM based on the Rosenblatt inverse transform proposed in this paper. This is because the Orthogonal inverse transform is unable to characterize the nonlinear relationships that exist between the DG variables. The time used for the calculation of the three methods is shown in Table 5.
By comparing the computation times in Table 5, it can be found that both the PLFCM with Rosenblatt inverse transform and the PLFCM with Orthogonal inverse transform require far less computation time than the MCS method, greatly improving computational efficiency. Overall, the method proposed in this paper achieves high accuracy while dramatically reducing calculation time.
The corresponding risk indexes can be calculated based on the node voltage violation probability combined with the severity of the violation. A risk factor of three is assigned to regular nodes, while a risk factor of four is assigned to DG-connected nodes.
Table 6 presents the voltage violation probability and risk indexes for each node. Nodes with risk indexes above 0.01 are considered high risk, while those between 0.0001 and 0.01 are categorized as medium risk, values below 0.0001 are classified as low risk, and a risk index of 0 indicates no risk.
Figure 13 illustrates the risk zones of the system at different levels at 11 a.m.
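The risk banding quoted above can be expressed directly in code, for instance:

```python
def risk_level(risk_index):
    """Risk banding used for Table 6 and Figure 13 (thresholds quoted in the text)."""
    if risk_index == 0:
        return "no risk"
    if risk_index < 1e-4:
        return "low risk"
    if risk_index <= 1e-2:
        return "medium risk"
    return "high risk"
```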
6.3. Case 2: Low DG Output
Assume that, based on NWP, the wind speed forecast for the WT at 9 p.m. on a certain day is 6 m/s. Since it is night-time, the global horizontal radiation is zero, resulting in no power output from PV1 and PV2. Utilizing the BKDE, the pdf and CDF for the WT can be obtained, as shown in Figure 14.

As there is only one DG generating power at this moment, there is no correlation among multiple DGs. Hence, the PLFCM can be performed directly, without an independent transformation. Accordingly, only two methods are used to calculate the voltage pdf in this section: PLFCM and MCS. The MCS again uses 20,000 samples. The voltage pdfs at the WT-connected node and the end node 18 are shown in Figure 15.
Table 7 presents the relative errors of the mean and variance for each node, and Table 8 reports the computation time of both methods.
From Figure 15, it can be observed that despite the relatively low WT output, node 26, due to its location and reduced load, maintains its voltage within the normal range. However, the end node 18 experiences consistently low voltage levels during the night due to the lack of nearby PV power support. The voltage violation probability and risk indexes for each node at this time are presented in Table 9.
Figure 16 illustrates the risk levels of the system at 9 p.m.
7. Conclusions
A dynamic risk assessment method for voltage violations in distribution networks with DG is proposed in this paper, whose main contributions are the following three points.
Firstly, a pdf modeling method based on BKDE and conditional density is proposed. This method provides a more accurate representation of stochastic variables with boundary constraints, such as load and DG. Simulation results on actual DG data indicate that the relative errors of mean and variance using this method are approximately one-third of those obtained with KDE.
Secondly, to address the correlation between DG variables within the same region, an independent transformation method based on the Rosenblatt inverse transform is proposed. This method reduces the absolute values of the correlation coefficients between random variables to below 0.1, laying the foundation for accurate PLFCM. Compared to the PLFCM based on the Orthogonal inverse transform, the proposed method demonstrates a reduction of one to two orders of magnitude in the relative errors of mean and variance, with comparable computational times.
Finally, an index for the severity of voltage violation, based on a utility function, is proposed. This index, combined with the voltage violation probability, quantitatively characterizes the risk of voltage violation at nodes and provides essential information for coordinated voltage control.