Abstract
We developed a flare prediction model using machine learning, which is optimized to predict the maximum class of flares occurring in the following 24 hr. Machine learning is used to devise algorithms that can learn from and make decisions on a huge amount of data. We used solar observation data during the period 2010–2015, such as vector magnetograms, ultraviolet (UV) emission, and soft X-ray emission taken by the Solar Dynamics Observatory and the Geostationary Operational Environmental Satellite. We detected active regions (ARs) from the full-disk magnetogram, from which ∼60 features were extracted with their time differentials, including magnetic neutral lines, the current helicity, the UV brightening, and the flare history. After standardizing the feature database, we fully shuffled and randomly separated it into two for training and testing. To investigate which algorithm is best for flare prediction, we compared three machine-learning algorithms: the support vector machine, k-nearest neighbors (k-NN), and extremely randomized trees. The prediction score, the true skill statistic, was higher than 0.9 with a fully shuffled data set, which is higher than that for human forecasts. It was found that k-NN has the highest performance among the three algorithms. The ranking of the feature importance showed that previous flare activity is most effective, followed by the length of magnetic neutral lines, the unsigned magnetic flux, the area of UV brightening, and the time differentials of features over 24 hr, all of which are strongly correlated with the flux emergence dynamics in an AR.
1. Introduction
The mechanism of solar flares is a long-standing puzzle in solar physics. The energy storage and triggering processes of flares are driven by the emergence of flux in the photosphere (e.g., Priest & Forbes 2002; Shibata & Magara 2011; Takasao et al. 2015), which can be directly observed with a photospheric magnetogram. The shape and complexity of sunspots in white-light emission have been classified according to the sunspot growth level (McIntosh 1990). It is empirically known that larger sunspots with a large number of umbrae and a more complicated magnetic flux structure tend to produce larger flares (e.g., Sammis et al. 2000; Gallagher et al. 2002; Li et al. 2008; Colak & Qahwaji 2009; Bloomfield et al. 2012; Lee et al. 2012; Barnes et al. 2016), as well as repeated flares in the same active regions (ARs) (e.g., Zirin 1988; Zirin & Marquette 1991).
Features derived from the line-of-sight magnetogram are useful indicators for future flare prediction, such as the magnetic flux, the gradient of the magnetic field (Yu et al. 2009; Steward et al. 2011), the length of magnetic neutral lines (Steward et al. 2011), the effective magnetic field (Georgoulis & Rust 2007; Papaioannou et al. 2015), the unsigned magnetic flux near the magnetic neutral lines (R-value: Schrijver 2007; Falconer et al. 2011), the total magnetic energy dissipation (Song et al. 2009), the weighted magnetic neutral line length and the distance between NS polarity sunspot centers (Mason & Hoeksema 2010), the non-potentiality (e.g., Falconer et al. 2014), and the wavelet spectra (Yu et al. 2010; Al-Ghraibah et al. 2015; Boucheron et al. 2015; Muranushi et al. 2015). These features are related to the dynamics of flux emergence and are strongly correlated with the energy storage and the triggering mechanisms.
Leka & Barnes (2003) pioneered the use of vector magnetic field data for flare prediction, and the features from a vector magnetogram were first used with machine learning by Bobra & Couvidat (2015). Detailed vector magnetogram observations show the dynamic variation of the magnetic configuration near magnetic neutral lines caused by successive flux emergence (Kubo et al. 2007), and the photospheric flow around magnetic neutral lines has been shown to be an important indicator of the occurrence of flares (Welsch et al. 2009). Recently, a model of flare triggers has been proposed by Kusano et al. (2012), in which the relative direction of an emerging flux near magnetic neutral lines to the pre-existing sheared magnetic loops determines the size of flares; this model has been supported by observations (Bamba et al. 2013; Toriumi et al. 2013).
As an emerging flux appears near magnetic neutral lines in an AR, small-scale energy release occurs in the lower chromosphere via magnetic reconnection, which has been observed using the 1600 Å filtergram of the Transition Region and Coronal Explorer as a gradual increase in the ultraviolet (UV) emission for 2–3 hr in the preflare phase (Saito 2006), as well as in Ca II H emission by Hinode observations (Bamba et al. 2013). The 1600 Å filtergram observes the UV continuum, chromospheric lines, and the C IV doublet (∼1550 Å), which is strongly enhanced and well correlated with the hard X-ray emission (Brekke et al. 1996; Handy et al. 1999; Warren & Warshall 2001; Nishizuka et al. 2009). Moon et al. (2004) found UV brightening at one end of a pre-erupting filament, where magnetic reconnection occurs in the low atmosphere and changes the magnetic connectivity, leading to the initiation of the filament eruption (see also Kim et al. 2008; Guo et al. 2012).
The amount of recent open-access solar observation data is so large that it is beyond human processing ability. To deal with the data, several machine-learning algorithms (for an introductory text on machine learning, see, e.g., Hastie et al. 2009) have been applied to the flare prediction problem: a neural network (Qahwaji & Colak 2007; Colak & Qahwaji 2009; Higgins et al. 2011; Ahmed et al. 2013), C4.5 decision trees (Yu et al. 2009, 2010), learning vector quantization (Yu et al. 2009; Rong et al. 2011), a regression model (Lee et al. 2007; Song et al. 2009), k-nearest neighbors (Li et al. 2008; Huang et al. 2013; Winter et al. 2015), a support vector machine (SVM) (Qahwaji & Colak 2007; Bobra & Couvidat 2015; Muranushi et al. 2015), a relevance vector machine (Al-Ghraibah et al. 2015), SVM regression (Boucheron et al. 2015), and an ensemble of four predictors (Guerra et al. 2015). However, the best algorithm for flare prediction has not been discussed in previous works, and it cannot be found without directly comparing the performances of different algorithms.
Thus, in this paper, we compared three machine-learning algorithms to find which algorithm has the highest performance for flare prediction. We also extended the observation data period and wavelength coverage obtained by the Solar Dynamics Observatory (SDO; Pesnell et al. 2012) and optimized each algorithm to improve the prediction accuracy. Novel features such as UV brightening and the vector magnetogram have been included, and finally the importance of different features was calculated and ranked. In Section 2, we give an overview of our prediction model, which is explained in detail in Section 3. The prediction results are described in Section 4, and a discussion and conclusion are given in Section 5.
2. Overview of our Prediction Model
The procedures of our flare prediction model are as follows. (i) First, observation data are downloaded from the web archives of SDO and the Geostationary Operational Environmental Satellite (GOES), such as the line-of-sight magnetogram, vector magnetogram, 1600 Å broadband filtergram images, and the light curves of the soft X-ray emission. (ii) Second, ARs are detected from full-disk images of the line-of-sight magnetogram, and the ARs are tracked using their time evolution. (iii) For each AR, features are calculated from multiwavelength observations, and flare labels are attached to the solar feature database if an X/M-class flare occurs within 24 hr after an image. (iv) Supervised machine learning is carried out with a 1 hr cadence to predict the maximum class of flares occurring in the following 24 hr.
Our observation data cover the period from 2010 June to 2015 December and were taken by SDO, launched in 2010 February. During this period, 29 X-class and 433 M-class flares were observed on the disk, accounting for 90% of the flares observed during the period. The other 10% of the flares occurred on the limb and were removed from our event list. We call the data samples with flare labels "positive events" and the others "negative events." X-class flares occur on average 5–10 times per year during the solar maximum period, while M-class flares occur about 100 times per year. Negative events are therefore far more common than positive events, making flare prediction an imbalanced classification problem.
We used the line-of-sight magnetogram taken by the Helioseismic and Magnetic Imager (HMI; Scherrer et al. 2012) on board SDO, as well as the vector magnetogram. The UV continuum of the lower chromosphere was taken by the 1600 Å broadband filtergram of the Atmospheric Imaging Assembly (AIA; Lemen et al. 2012) on board SDO. The full-disk integrated X-ray emission over the range of 1–8 Å was observed by GOES. The time cadence of the line-of-sight magnetogram is 45 s, that of the vector magnetogram is 12 minutes, that of the 1600 Å filtergram is 12 s, and that of GOES is less than 1 minute. Thus, the total size of the observation data set is so large that we reduced the cadence to 1 hr, in accordance with the forecast operation. The vector magnetogram data consist of the absolute field strength, the inclination angle, the azimuth angle, and the sign to solve the 180° ambiguity problem. By converting these components to Cartesian coordinates, we calculated the features listed in Table 1.
Table 1. The Extracted Solar Features and the Importance
Number | Features | Description | Importance |
---|---|---|---|
1 | Xhis | Total history of X-class flares in an AR | 0.0519 |
2 | Xmax1d | Maximum X-ray intensity one day before | 0.0495 |
3 | Mhis | Total history of M-class flares in an AR | 0.0365 |
4 | TotNL | Total length of magnetic neutral lines in an AR | 0.0351 |
5 | Mhis1d | 1 day history of M-class flares | 0.0342 |
6 | NumNL | Number of magnetic neutral lines | 0.0341 |
7 | USFlux† | Total unsigned flux | 0.0332 |
8 | CHArea | Chromospheric (UV) brightening area | 0.0235 |
9 | Bave | Averaged magnetic field | 0.0230 |
10 | Xhis1d | 1 day history of X-class flares | 0.0224 |
11 | TotBSQ† | Total magnitude of Lorentz force | 0.0199 |
12 | Bmax | Maximum magnetic field | 0.0193 |
13 | MeanGAM† | Mean angle of the field from the radial direction | 0.0179 |
14 | dt24SavNCPP | Time derivative of SavNCPP over 24 hr | 0.0171 |
15 | dt24TotNL | Time derivative of TotNL over 24 hr | 0.0169 |
16 | dt24TotBSQ | Time derivative of TotBSQ over 24 hr | 0.0164 |
17 | TotFz† | Sum of Z-component of Lorentz force | 0.0160 |
18 | dt24TotFY | Time derivative of TotFY over 24 hr | 0.0156 |
19 | Area† | Area of the strong field in an AR | 0.0153 |
20 | TotFY† | Sum of Y-component of Lorentz force | 0.0152 |
21 | dt24TotFX | Time derivative of TotFX over 24 hr | 0.0152 |
22 | SavNCPP† | Sum of the modulus of the net current per polarity | 0.0150 |
23 | TotUSJz† | Total unsigned vertical current | 0.0149 |
24 | dt24TotFZ | Time derivative of TotFz over 24 hr | 0.0145 |
25 | MeanJzh† | Mean current helicity (Bz contributions) | 0.0144 |
26 | ABSnJzh† | Absolute value of the net current helicity | 0.0137 |
27 | CHAll | Total chromospheric (UV) brightening | 0.0134 |
28 | TotFx† | Sum of X-component of Lorentz force | 0.0132 |
29 | dt24USflux | Time derivative of USflux over 24 hr | 0.0131 |
30 | TotUSJh† | Total unsigned current helicity | 0.0129 |
31 | dt24Area | Time derivative of Area over 24 hr | 0.0128 |
32 | MeanGBt† | Mean gradient of the total field | 0.0125 |
33 | Max dxBz | Maximum of dBz/dx | 0.0116 |
34 | dt24ABSnJzh | Time derivative of ABSnJzh over 24 hr | 0.0115 |
35 | Max dyBz | Maximum of dBz/dy | 0.0112 |
36 | MeanGBz† | Mean gradient of the vertical field | 0.0112 |
37 | MeanJzd† | Mean vertical current density | 0.0111 |
38 | dt12Area | Time derivative of Area over 12 hr | 0.0110 |
39 | dt24TotUSJz | Time derivative of TotUSJz over 24 hr | 0.0110 |
40 | dt24Bmax | Time derivative of Bmax over 24 hr | 0.0107 |
41 | MaxNL | Maximum length of magnetic neutral lines | 0.0107 |
42 | Xflux4h | Averaged X-ray flux over 4 hr | 0.0106 |
43 | dt24CHArea | Time derivative of CHArea over 24 hr | 0.0103 |
44 | dt12Bmax | Time derivative of Bmax over 12 hr | 0.0097 |
45 | MeanGBh† | Mean gradient of the horizontal field | 0.0092 |
46 | Xflux1h | Averaged X-ray flux over 1 hr | 0.0091 |
47 | dt12USflux | Time derivative of USflux over 12 hr | 0.0090 |
48 | dt24 Max graB | Time derivative of Max. grad. Bz over 24 hr | 0.0088 |
49 | dt24 Max dzBy | Time derivative of Max. dBy/dz over 24 hr | 0.0088 |
50 | dt24 TotUSJh | Time derivative of TotUSJh over 24 hr | 0.0081 |
51 | dt24 NumNL | Time derivative of NumNL over 24 hr | 0.0079 |
52 | dt24 MaxdxBz | Time derivative of MaxdxBz over 24 hr | 0.0079 |
53 | dt24MeanJzh | Time derivative of MeanJzh over 24 hr | 0.0078 |
54 | dt24MaxNL | Time derivative of MaxNL over 24 hr | 0.0075 |
55 | dt02 Area | Time derivative of Area over 2 hr | 0.0071 |
56 | dt24 CHAll | Time derivative of CHAll over 24 hr | 0.0071 |
57 | Bmin | Minimum magnetic field of Bz | 0.0071 |
58 | CHMax | Maximum intensity of chromospheric (UV) brightening | 0.0062 |
59 | dt02 Bmax | Time derivative of Bmax over 2 hr | 0.0061 |
60 | dt24 CHMax | Time derivative of CHMax over 24 hr | 0.0049 |
61 | dt24 MeanGBz | Time derivative of MeanGBz over 24 hr | 0.0028 |
62 | dt24 MeanGBh | Time derivative of MeanGBh over 24 hr | 0.0021 |
63 | dt24 MeanGBt | Time derivative of MeanGBt over 24 hr | 0.0002 |
64 | dt24MeanGAM | Time derivative of MeanGAM over 24 hr | 0.0002 |
65 | dt24MeanJzd | Time derivative of MeanJzd over 24 hr | 0.0000 |
Note. The formulae of the features attached with † marks are shown in Table 2 and in Bobra & Couvidat (2015). The importance was calculated by ERT for X-class flare prediction.
3. Details of our Prediction Model
3.1. Detection of ARs
First, we detected ARs to extract solar features from the images of the downloaded observation database. We used ∼10^5 full-disk images of the line-of-sight magnetogram for detection with a reduced cadence of 1 hr (Figure 1). The line-of-sight magnetogram was selected for AR detection because it is less noisy than the vector magnetogram and more suitable for the processing carried out for detection. After determining ARs in magnetogram images, the frame coordinates of the ARs were applied to other images with different wavelengths (Figure 2).
Here we defined the detection rules as follows. (i) First, we smoothed the data with 64 (=8 × 8) binning and detected the image pixels where the absolute magnetic field strength is larger than a threshold value, Bth = 140 G (Figure 1(b)). The threshold was set to the maximum value of the observation errors so that even faint ARs could be detected. (ii) Second, we placed the detected pixels in squares with a side of 160 pixels (∼80'') (Figure 1(c)). Such an 80'' × 80'' square is the minimum unit of the detection region. (iii) Third, if two neighboring squares overlapped, they were combined to form a larger square containing both detected points. The repetition of this process resulted in a single large square covering the whole AR and reduced the number of detected regions (Figures 1(d) and 2(a)).
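For illustration, the following is a minimal Python sketch of detection rules (i)–(iii); the use of NumPy, the corner-based box placement, and the rectangular (rather than square) merged regions are simplifications of ours, not the authors' implementation.

```python
import numpy as np

def detect_active_regions(b_los, b_th=140.0, bin_size=8, box=160):
    """Sketch of rules (i)-(iii): bin the magnetogram, threshold on |B|,
    drop a fixed-size box on each detection, and merge overlapping boxes."""
    ny, nx = b_los.shape
    # (i) 8 x 8 binning and thresholding on the absolute field strength
    binned = b_los[:ny - ny % bin_size, :nx - nx % bin_size].reshape(
        ny // bin_size, bin_size, nx // bin_size, bin_size).mean(axis=(1, 3))
    rows, cols = np.where(np.abs(binned) > b_th)
    # (ii) one 160-pixel box (full-resolution coordinates) per detected binned pixel
    boxes = [(c * bin_size, r * bin_size, c * bin_size + box, r * bin_size + box)
             for r, c in zip(rows, cols)]
    # (iii) merge any two overlapping boxes into their bounding rectangle, repeat
    merged = True
    while merged:
        merged, out = False, []
        while boxes:
            b = boxes.pop()
            for i, o in enumerate(out):
                if b[0] < o[2] and o[0] < b[2] and b[1] < o[3] and o[1] < b[3]:
                    out[i] = (min(b[0], o[0]), min(b[1], o[1]),
                              max(b[2], o[2]), max(b[3], o[3]))
                    merged = True
                    break
            else:
                out.append(b)
        boxes = out
    return boxes  # (x0, y0, x1, y1) rectangles covering candidate ARs
```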
Next, we neglected ARs detected on the limb, where the magnetic structure is difficult to see owing to the projection effect and is partially hidden by the limb. Additionally, the quality of the vector magnetogram data is poor near the limb. This is why previous papers focused on the disk-center data set. On the other hand, in an operational setting, it is necessary to deal with ARs near the limb to make predictions, but there have been no attempts to verify the effectiveness of a near-limb data set. Including the near-limb data also increases the size of the database, which is advantageous for machine learning. In this paper, to investigate the effect of the detection regions on the prediction score, we compared three cases that include or exclude the near-limb region: the full-disk case, an intermediate case focusing within ±53° of the central meridian (CM) (within 4/5 of the solar radius), and the disk-center case focusing within ±37° (within 3/5 of the solar radius). In Bobra & Couvidat (2015), the authors only considered flares within ±68° of the CM.
Furthermore, we tracked ARs moving across the disk owing to solar rotation and numbered them for identification. ARs that overlap in two successive images were given the same ID. We detected a total of 11,700 ARs from the full-disk images during the period 2010–2015. Here, we defined regions containing magnetic fields stronger than 140 G as ARs, so our definition differs from NOAA's. Our model includes faint quasi-ARs, so as not to miss even small flares occurring outside of NOAA's regions. If we raise the detection threshold, the flare occurrence rate of the data set increases, but some flares are inevitably missed. Furthermore, because strong magnetic fields are localized, the 80'' × 80'' squares may not cover the whole area of an AR.
3.2. Extraction of Solar Features
Using the database of detected ARs, we next extracted solar features from each AR. We adopted solar features used in previous papers, which were extracted from the line-of-sight magnetogram (e.g., Steward et al. 2011; Ahmed et al. 2013), the vector magnetogram (Leka & Barnes 2003; Bobra & Couvidat 2015), and GOES X-ray data. Furthermore, in this study, we extracted a chromospheric brightening feature for the first time, obtained from the UV continuum observed by the SDO/AIA 1600 Å filtergram. The extracted features are summarized in Table 1, along with their importance ranking (explained in a later section).
From the line-of-sight magnetogram, we extracted features such as the area of an AR, the maximum and average line-of-sight field strength (B_LOS), the unsigned magnetic flux, the gradients of the magnetic field in the longitudinal/latitudinal directions, and the number of magnetic neutral lines. The magnetic neutral line is an indicator of flare activity because it is directly related to the energy storage and triggering mechanisms. We counted the number of neutral lines in an AR and measured the maximum/total length of the lines (Figures 3(a)–(b)). We detected neutral lines using two conditions: a large magnetic field gradient and a reversal of the magnetic polarity across the lines. Here we focused on magnetic neutral lines longer than 100 pixels (∼50'') to eliminate short and complicated neutral lines (Figure 3(b)).
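A minimal sketch of how such neutral-line features might be computed is given below; the gradient threshold, the use of SciPy connected-component labeling, and the pixel-count proxy for line length are our assumptions for illustration, not the authors' exact procedure.

```python
import numpy as np
from scipy import ndimage

def neutral_line_features(bz, grad_th=50.0, min_len=100):
    """Flag pixels with a polarity reversal toward a neighbour and a large
    field gradient, group them into candidate neutral lines, and keep lines
    longer than min_len pixels. grad_th is a placeholder value."""
    gy, gx = np.gradient(bz)
    grad = np.hypot(gx, gy)
    reversal = np.zeros_like(bz, dtype=bool)
    reversal[:, :-1] |= (bz[:, :-1] * bz[:, 1:]) < 0   # sign change to the right
    reversal[:-1, :] |= (bz[:-1, :] * bz[1:, :]) < 0   # sign change downward
    candidate = reversal & (grad > grad_th)
    labels, n = ndimage.label(candidate)
    if n == 0:
        return {"NumNL": 0, "MaxNL": 0.0, "TotNL": 0.0}
    sizes = np.asarray(ndimage.sum(candidate, labels, np.arange(1, n + 1)))
    sizes = sizes[sizes >= min_len]                    # keep lines longer than ~50''
    return {"NumNL": int(sizes.size),
            "MaxNL": float(sizes.max()) if sizes.size else 0.0,
            "TotNL": float(sizes.sum())}
```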
After preprocessing the vector magnetogram, we calculated features using the three vector components of the magnetic field (Figure 3(c)). Using the formulae in Bobra & Couvidat (2015), we extracted vector magnetogram features such as the vertical current, the current helicity, the Lorentz force, and the mean gradient of the total field. The formulae of the features derived from the vector magnetogram are summarized in Table 2, and the corresponding features in Table 1 are marked with daggers. Moreover, we differentiated the extracted features with respect to time: the time derivatives over 24, 12, and 2 hr were calculated to track the variability of ARs over different timescales.
Table 2. Formulae of AR Features
Keyword | Description | Formula |
---|---|---|
TOTUSJH | Total unsigned current helicity | |
TOTBSQ | Total magnitude of Lorentz force | |
TOTUSJZ | Total unsigned vertical current | |
ABSNJZH | Absolute value of the net current per polarity | |
SAVNCPP | Sum of the modulus of the net current per polarity | |
USFLUX | Total unsigned flux | |
AREA-ACR | Area of strong field pixels in the active region | |
TOTFZ | Sum of z-component of Lorentz force | |
EPSZ | Sum of z-component of normalized Lorentz force | |
MEANGAM | Mean angle of field from radial | |
MEANGBT | Mean gradient of total field | |
MEANGBZ | Mean gradient of vertical field | |
MEANGBH | Mean gradient of horizontal field | |
MEANJZH | Mean current helicity (Bz contribution) | |
TOTFY | Sum of y-component of Lorentz force | |
MEANJZD | Mean vertical current density | |
TOTFX | Sum of x-component of Lorentz force | |
EPSY | Sum of y-component of normalized Lorentz force | |
EPSX | Sum of x-component of normalized Lorentz force |
Note. The formulae in this table are quoted from Bobra & Couvidat (2015).
Brightening in the lower chromosphere is another indicator of flares. A few hours before a flare onset, the lower chromosphere is gradually heated, emitting light in the UV range (Figure 3(d)). The brightening is located around magnetic neutral lines. We extracted chromospheric (UV) features and used them for training in the machine learning for the first time; these features included the maximum intensity, the brightening area, and the total intensity of UV brightening in a whole AR. We used AIA 1600 Å filtergram images of SDO representing the lower-chromosphere brightening.
The exposure time of observations using the 1600 Å filtergram of AIA/SDO is almost constant (∼3 s) from 2010 to 2015; thus, we used the original photon numbers for the feature calculations. We set the threshold intensity that determines the brightening area to 700 photon cm−5 s−1 pix−1; this threshold was chosen by a parameter survey so that the resulting features show large variations. The total intensity of the UV brightening was calculated by integrating the intensity above the threshold over the pixels of the determined brightening area.
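As an illustrative sketch (not the authors' code), the three UV features of Table 1 can be computed from an AIA 1600 Å cutout of an AR roughly as follows:

```python
import numpy as np

def uv_brightening_features(aia1600, threshold=700.0):
    """Sketch of the chromospheric (UV) features: the brightening area is the
    number of pixels above the threshold, the total brightening is the summed
    intensity of those pixels, and CHMax is the maximum intensity."""
    bright = aia1600 > threshold
    return {"CHArea": int(bright.sum()),             # brightening area [pixels]
            "CHAll": float(aia1600[bright].sum()),   # total UV brightening
            "CHMax": float(aia1600.max())}           # maximum intensity
```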
We also used GOES X-ray data in the 1–8 Å range as an indicator of previous and current flare activity, as proposed by several authors (e.g., Zirin & Marquette 1991; Wheatland 2004). We measured the background level of the X-ray intensity by averaging the X-ray light curve over 1 and 4 hr. We also derived the maximum intensity during the day before an image and counted the number of previous flares in an AR, both during the preceding day and over the entire period after the AR emergence; these counts are referred to as the 1 day history and the total history of X/M-class flares, respectively.
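A hedged sketch of these history features is shown below; the pandas-based bookkeeping and the function and argument names are hypothetical, and the flare counts would be evaluated separately for X- and M-class events (Xhis/Mhis in Table 1).

```python
import pandas as pd

def xray_history_features(goes_flux, flare_times, t_now, t_emerge):
    """goes_flux: pandas Series of the GOES 1-8 A flux indexed by time;
    flare_times: peak times of flares of one class already attributed to the AR."""
    day = pd.Timedelta("1D")
    return {
        "Xflux1h": goes_flux[t_now - pd.Timedelta("1h"): t_now].mean(),  # 1 hr background
        "Xflux4h": goes_flux[t_now - pd.Timedelta("4h"): t_now].mean(),  # 4 hr background
        "Xmax1d": goes_flux[t_now - day: t_now].max(),                   # max intensity one day before
        "his1d": sum(t_now - day <= t <= t_now for t in flare_times),    # 1 day flare history
        "his": sum(t_emerge <= t <= t_now for t in flare_times),         # total flare history
    }
```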
3.3. Classification by Machine Learning
We used three machine-learning algorithms for comparison: the SVM, k-nearest neighbors (k-NN), and extremely randomized trees (ERT). Each algorithm was used as a classifier of the flare class and optimized to maximize a skill score, the true skill statistic (TSS; explained in a later section).
3.3.1. SVM Classifier
The SVM is a pattern recognition model using supervised learning (Vapnik & Lerner 1963; Boser et al. 1992; Cortes & Vapnik 1995). It is a classifier that determines the maximum-margin hyperplane, i.e., the hyperplane with the largest margin to the nearest training samples of each class. The learning process involves solving an optimization problem using Lagrange multipliers and the Karush–Kuhn–Tucker (KKT) conditions, and the calculation time rapidly increases as the number of training samples grows. Here, we used a radial basis function kernel (RBF or Gaussian kernel). The hyperparameter C was set to C = 10, which gave the most promising results. Following standard practice, the other hyperparameter γ was set to γ = 1/(number of features).
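A minimal configuration matching this description, written with scikit-learn (our choice of library, not stated in the paper), is:

```python
from sklearn.svm import SVC

n_features = 65                                   # number of extracted solar features (Table 1)
# RBF (Gaussian) kernel, C = 10, gamma = 1/(number of features)
svm_clf = SVC(kernel="rbf", C=10, gamma=1.0 / n_features)
```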
3.3.2. k-Nearest Neighbor (k-NN) Classifier
The k-NN algorithm is a classifier based on the nearest instances in a feature space and is the simplest machine-learning algorithm in this study (Dasarathy 1991). The classification of an object is determined by a vote of its nearest neighbors; that is, an object is assigned to the most common class among its k nearest objects, where k is an integer. When k = 1, an object is classified as belonging to the same class as the nearest object. Each sample is described by its feature vector, and distances between samples are measured by, e.g., the Euclidean or Manhattan distance. The k-NN algorithm is likely to be affected by the locality of the data, and k is usually selected by heuristics: a large k can reduce noise but blurs the class boundary. In this paper, to optimize our model, we set k = 1, for which the nearest instance in the training data set defines the prediction. Furthermore, we adopted the Manhattan distance, i.e., the distance $d(\mathbf{x}_i, \mathbf{x}_j) = \sum_k |x_{i,k} - x_{j,k}|$.
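In scikit-learn terms (again our assumption), this configuration is simply:

```python
from sklearn.neighbors import KNeighborsClassifier

# k = 1 with the Manhattan (L1) distance: the class of the single nearest
# training sample defines the prediction.
knn_clf = KNeighborsClassifier(n_neighbors=1, metric="manhattan")
```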
3.3.3. ERT Classifier
The random forest (Breiman 2001) fits a number of decision-tree classifiers on various subsamples of a data set and uses averaging to improve the prediction accuracy. At each node, it searches over a randomly selected subset of the features and chooses the best split of the node samples into two subsets. In the ERT classifier (Geurts et al. 2006), a random subset of candidate features is likewise used, but thresholds are drawn at random for each candidate feature and the best of these randomly generated thresholds is chosen as the splitting rule. ERT mitigates overfitting, and the calculation can be sped up by building the trees in parallel. We set the number of trees to 300 in this paper.
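A corresponding scikit-learn configuration (our assumption) is:

```python
from sklearn.ensemble import ExtraTreesClassifier

# 300 extremely randomized trees with the Gini impurity; n_jobs=-1 builds the
# trees in parallel.
ert_clf = ExtraTreesClassifier(n_estimators=300, criterion="gini", n_jobs=-1)
```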
Another advantage of ERT is that the importance of the features can be calculated. Breiman (2001) proposed evaluating the importance of a feature $X_m$ for predicting $Y$ in a tree structure $T$ by adding up the decreases in the weighted impurity $p(t)\,\Delta i(s_t, t)$ over all nodes $t$ where $X_m$ is used and then averaging over all $N_T$ trees in the forest:

$$\mathrm{Imp}(X_m) = \frac{1}{N_T} \sum_{T} \sum_{t \in T:\, v(s_t) = X_m} p(t)\, \Delta i(s_t, t),$$

where $p(t)$ is the proportion $N_t/N$ of samples reaching node $t$ and $v(s_t)$ is the feature used in split $s_t$ (see also Louppe et al. 2013). The decrease in some impurity measure $i(t)$ (e.g., the Gini index, the Shannon entropy, or the variance of $Y$) at node $t$ is defined by

$$\Delta i(s, t) = i(t) - p_L\, i(t_L) - p_R\, i(t_R),$$

where $p_L = N_{t_L}/N_t$ and $p_R = N_{t_R}/N_t$, and the split $s_t = s^*$ is the one whose partition of the $N_t$ node samples into the two subsets $t_L$ and $t_R$ maximizes the decrease in impurity. When nodes become pure in terms of $Y$, the construction of the tree stops. We use the Gini index as the impurity function, and the resulting measure is known as the Gini importance or the mean decrease in Gini.
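In practice, the Gini importances used for the ranking in Table 1 can be read off a fitted ERT model; the sketch below uses random placeholder data rather than the solar feature database.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.RandomState(0)
X, y = rng.randn(500, 65), rng.randint(0, 2, 500)     # 65 features, binary flare label
ert = ExtraTreesClassifier(n_estimators=300, criterion="gini", random_state=0).fit(X, y)
ranking = np.argsort(ert.feature_importances_)[::-1]  # feature indices, most important first
print(ranking[:5], ert.feature_importances_[ranking[:5]])
```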
3.4. Standardization, Evaluation, and Cross-validation
3.4.1. Standardization
The extracted solar features have different units and different scales; thus, data standardization is required. The standardization strongly affects the prediction accuracy, although this has not been widely acknowledged by the solar flare forecast community. We used the Z-value for standardization, i.e.,

$$Z = \frac{X - \mu}{\sigma},$$

where X is the original value of the extracted solar feature, μ is the mean, and σ is the standard deviation (e.g., Bishop 2006). Therefore, Z-values are expressed in terms of standard deviations from the mean; as a result, they have a distribution with a mean of 0 and a standard deviation of 1. For parameters with a large-scale variation, we took the logarithm first and then calculated the Z-value.
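A minimal sketch of this standardization step (the log base and the NumPy usage are our assumptions):

```python
import numpy as np

def standardize(values, log_scale=False):
    """Z-value standardization of one feature column; features with a large
    dynamic range are log-transformed before computing the Z-value."""
    x = np.log10(values) if log_scale else np.asarray(values, dtype=float)
    return (x - x.mean()) / x.std()
```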
3.4.2. Redistribution to Training/Testing Data Set
The solar feature database with 1 hr cadence was fully shuffled and randomly separated into two data sets with a size ratio of 7:3, which were used for training and testing, respectively. This 7:3 ratio was adopted from previous works (Ahmed et al. 2013; Bobra & Couvidat 2015). Note that a sample in the solar feature database is given a flare label when an X/M-class flare occurs within 24 hr after it. Thus, there are at most 24 positive events per flare, and such events can end up in both the training and testing databases.
3.4.3. Validation by TSS and Cross-validation
We evaluated the prediction results using the past data in 2010–2015 with a skill score, the TSS. This is also called the Hanssen–Kuiper skill score or Peirce skill score, and is defined by

$$\mathrm{TSS} = \frac{TP}{TP + FN} - \frac{FP}{FP + TN},$$

where TP, FN, FP, and TN are the numbers of true positives, false negatives, false positives, and true negatives, respectively. The score has a range of −1 to +1, with 0 representing no skill and 1 representing perfect prediction. The TSS expresses the hit rate relative to the false alarm rate, and it remains positive provided the hit rate is greater than the false alarm rate. Flare prediction is an imbalanced problem, which means that negative events are much more frequent than positive ones. Bloomfield et al. (2012) suggested the use of the TSS because it is not affected by imbalanced problems (see also Bobra & Couvidat 2015). This is why we selected the TSS for the evaluation of our prediction results.
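As a check on the definition, the TSS can be computed directly from a contingency table; here is a minimal sketch using the k-NN X-class counts of Table 3(a) (the table's TSS column is a CV average, so this single-table value differs slightly):

```python
def true_skill_statistic(tp, fn, fp, tn):
    """TSS = hit rate - false alarm rate; insensitive to the class imbalance."""
    return tp / (tp + fn) - fp / (fp + tn)

print(true_skill_statistic(tp=152, fn=11, fp=14, tn=54439))   # ~0.93
```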
It is important to show that our model does not suffer from overfitting. To show the validity of our approach, we used cross-validation (CV), which is the standard approach in this field. There are several types of CV, such as K-fold CV, shuffle-and-split CV, and leave-one-out CV. In K-fold CV, the data set is partitioned into K subsets, each of which acts once as the validation set; K = 5 or K = 10 is usually used. Note that the validation set plays the role of a test set, but technically it is called the validation set.
Since there are far fewer positive samples than negative samples in solar-flare classification, a larger validation set contains more positive samples. This allows us to analyze the common features of misclassified samples better than when a smaller validation set is used, as in 10-fold CV.
For the above reason, we selected shuffle and split CV to show the validity of our model. The data set was shuffled and partitioned into training and validation sets. The size ratio of the two sets was 7:3, which is widely used in machine-learning and data-mining studies. This process was executed 10 times and the average results are shown.
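A sketch of this validation scheme, with scikit-learn's ShuffleSplit and random placeholder data in place of the solar feature database, is:

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix

rng = np.random.RandomState(0)
X, y = rng.randn(2000, 65), rng.randint(0, 2, 2000)    # placeholder features and labels
cv = ShuffleSplit(n_splits=10, test_size=0.3, random_state=0)  # ten 7:3 splits
scores = []
for train_idx, val_idx in cv.split(X):
    clf = KNeighborsClassifier(n_neighbors=1, metric="manhattan")
    clf.fit(X[train_idx], y[train_idx])
    tn, fp, fn, tp = confusion_matrix(y[val_idx], clf.predict(X[val_idx])).ravel()
    scores.append(tp / (tp + fn) - fp / (fp + tn))      # TSS of this split
print(np.mean(scores), np.std(scores))                  # average TSS and its scatter
```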
4. Prediction Results
We performed supervised machine learning on the solar feature database to predict the maximum class of flares occurring in the following 24 hr. We used three machine-learning algorithms, k-NN, SVM, and ERT, to reveal which algorithm is the most effective for a flare prediction model. We predicted two types of flare: X-class flares and ≥M-class flares. The prediction results for the three algorithms are summarized in Table 3, where, in addition to the contingency table, we list the average TSS and its standard deviation (quoted as the error), calculated by the 10-times shuffle-and-split CV.
Table 3. The Prediction Results of X-class Flares and ≥M-class Flares
Algorithm | TP | FP | FN | TN | TSS |
---|---|---|---|---|---|
(a) X-class flares | |||||
k-NN | 152 | 14 | 11 | 54439 | 0.91 ± 0.03 |
SVM | 120 | 22 | 16 | 54458 | 0.88 ± 0.03 |
ERT | 134 | 7 | 29 | 54446 | 0.82 ± 0.04 |
(b) ≥M-class flares | |||||
k-NN | 1544 | 121 | 155 | 52796 | 0.912 ± 0.005 |
SVM | 1496 | 473 | 203 | 52444 | 0.870 ± 0.007 |
ERT | 1216 | 39 | 483 | 52878 | 0.71 ± 0.02 |
Note. The contingency tables of prediction results of X-class flares and ≥M-class flares, for the three machine-learning algorithms, k-NN, SVM, and ERT.
The three algorithms show different prediction performances. Here, the feature database we used includes the previous flare activity derived from GOES data. For the prediction of X-class flares, the TSS was 0.91 ± 0.03 for k-NN, 0.88 ± 0.03 for SVM, and 0.82 ± 0.04 for ERT. For ≥M-class flares, the TSS was 0.912 ± 0.005 for k-NN, 0.870 ± 0.007 for SVM, and 0.71 ± 0.02 for ERT. Consequently, the k-NN algorithm showed the highest performance in terms of the TSS, both for X-class and ≥M-class flare prediction, followed by SVM and then ERT; however, the number of false positives was smallest for ERT. Since the standard deviation of the TSS in Table 3 is sufficiently small, overfitting is not significant.
Table 4 shows the prediction results when the flare history for the previous day, the flare history during the whole period after the appearance of an AR, and the maximum X-ray intensity for the previous day are neglected. This result isolates the contribution of the magnetogram and chromospheric (UV) images to flare prediction. For the prediction of X-class flares, the TSS was 0.91 ± 0.02 for k-NN, 0.86 ± 0.02 for SVM, and 0.62 ± 0.03 for ERT, and for ≥M-class flare prediction, the TSS was 0.904 ± 0.005 for k-NN, 0.856 ± 0.009 for SVM, and 0.63 ± 0.01 for ERT. Comparing Tables 3 and 4, we found that the TSS values in Table 4 are within the errors of the values reported in Table 3 for k-NN and SVM, so there is no statistical difference between the two. Only for ERT did we find an increase in the TSS when the previous flare activity is considered.
Table 4. The Prediction Results of X-class Flares and ≥M-class Flares, Neglecting Features of Previous Flare Activities
Algorithm | TP | FP | FN | TN | TSS |
---|---|---|---|---|---|
(a) X-class flares | |||||
k-NN | 136 | 16 | 15 | 54449 | 0.91 ± 0.02 |
SVM | 130 | 23 | 21 | 54442 | 0.86 ± 0.02 |
ERT | 87 | 4 | 49 | 54476 | 0.62 ± 0.03 |
(b) ≥M-class flares | |||||
k-NN | 1570 | 173 | 167 | 52706 | 0.904 ± 0.005 |
SVM | 1501 | 759 | 236 | 52120 | 0.856 ± 0.009 |
ERT | 1105 | 35 | 632 | 52844 | 0.63 ± 0.01 |
Note. The contingency tables of prediction results of X-class flares and ≥M-class flares, for the three machine-learning algorithms, k-NN, SVM, and ERT.
The importance of features calculated by ERT is also given in Table 1 for the case where the previous flare activity is considered. According to Table 1, the most effective feature for X-class flare prediction is the total history of X/M-class flares in an AR, followed by the maximum X-ray intensity one day before an image and the 1 day history of X/M-class flares. The next most effective features are the total length of magnetic neutral lines, the number of neutral lines, the unsigned magnetic flux, and the UV brightening area. The average magnetic field is ranked next, followed by features derived from the vector magnetogram and the time derivative of each feature over 24 hr.
We included novel features, such as UV brightening and the time derivative of features over 24, 12 and 2 hr. For UV brightening, the importance of the brightening area and the total intensity is relatively high, while the importance of the maximum intensity is very low. When we compare the time derivatives over different timescales, the time derivative over 24 hr is effective for flare prediction, but those over 12 and 2 hr are ineffective. Note that the magnetic free energy, the shear angle, and the unsigned magnetic flux near magnetic neutral lines have not been considered in this paper.
Furthermore, we compared the TSS for different detection areas of ARs, including or excluding near-limb regions. We set the detection area as the full disk, an intermediate area within ±53° of the CM, and the disk center with a focusing area within ±37°. The prediction results in the latter two cases are summarized in Table 5. For X-class flare prediction in the intermediate case, the TSS was 0.92 ± 0.03 for k-NN, 0.89 ± 0.02 for SVM and 0.88 ± 0.03 for ERT. For the disk center, the TSS was 0.94 ± 0.02 for k-NN, 0.92 ± 0.03 for SVM and 0.88 ± 0.06 for ERT. ERT is greatly improved by neglecting the near-limb ARs. Consequently, when the near-limb regions were neglected, the TSS was improved for all the algorithms. However, we also stress that in an actual operational setting, the TSS with consideration of the near-limb regions is more realistic.
Table 5. The Prediction Results of X-class Flares with Different Detection Regions
Algorithm | TP | FP | FN | TN | TSS |
---|---|---|---|---|---|
(a) An intermediate area | |||||
k-NN | 87 | 8 | 5 | 43277 | 0.92 ± 0.03 |
SVM | 84 | 12 | 7 | 43274 | 0.89 ± 0.02 |
ERT | 80 | 0 | 10 | 43287 | 0.88 ± 0.03 |
(b) The disk-center focusing area | |||||
k-NN | 54 | 2 | 4 | 26782 | 0.94 ± 0.02 |
SVM | 57 | 3 | 5 | 26777 | 0.92 ± 0.03 |
ERT | 55 | 2 | 8 | 26777 | 0.88 ± 0.06 |
Note. The contingency tables of prediction results of X-class flares with different detection regions: an intermediate area within ±53° of the CM (within 4/5 of the solar radius) and the disk center with a focusing area within ±37° (within 3/5 of the solar radius). We used the three machine-learning algorithms, k-NN, SVM, and ERT.
5. Summary and Discussion
We developed a flare prediction model with supervised machine-learning techniques using solar observations of a vector magnetogram and UV brightening. By detecting ARs, we extracted novel features and attached flare labels. Using training and test data sets constructed from the fully shuffled data set, we performed machine learning to predict the maximum class of flares that occur in the following 24 hr after observation images. One aim of this paper was to reveal which machine-learning algorithm is most suitable for a flare prediction model, and we compared three algorithms for the first time. Ranking of the importance of our novel features was another aim of this paper, and we attempted to compare the effectiveness of different features for flare prediction.
Our prediction model achieved a skill score, the TSS, of greater than 0.9. The average performance of the k-NN algorithm was superior to those of SVM and ERT. One of the reasons why the TSS is improved with our model is the use of standardization, which strongly affects the prediction accuracy, although this has not been widely acknowledged by the solar flare forecast community. Here we used the Z-value for standardization. Furthermore, the optimization of our model, such as by incorporating the Manhattan distance, improved the TSS.
In the daily forecast operations at the NICT space weather forecast center, which rely on the knowledge of experts, the TSS was 0.21 for X-class flares and 0.50 for ≥M-class flares during the period 2000–2015 (Kubo et al. 2016). At the Solar Influences Data Center of the Royal Observatory of Belgium, the TSS was 0.34 for ≥M-class flares during the period 2004–2012 (Devos et al. 2014). Thus, our prediction model appears to achieve better performance than human operations. On the other hand, with the fully shuffled data set, several positive events preceding the same flare can be distributed between the training and test data sets, and consequently the prediction score is increased. In particular, k-NN was most effective in this study and gave the highest TSS, consistent with other studies.
We also found that the TSS varies with the detection area; we considered the full-disk area including the near-limb regions, an intermediate area, and the disk-center focusing area. Upon neglecting the near-limb data, the accuracy of the features extracted from observation data sets was increased, thus improving the TSS. On the other hand, in an operational setting, a data set with the near-limb region is more realistic, and the evaluation would be more similar to human operations.
Next, we investigated the ranking of the importance of features. We showed that previous flare activity, such as the flare history in an AR and the maximum X-ray intensity in the previous day, is most important. The configurations of magnetic neutral lines, the unsigned magnetic flux, and the area of UV brightening are next most important. We also showed that the time derivative of features over 24 hr is useful for prediction, while the time derivatives over 12 and 2 hr are not. We also found that the features of the vector magnetogram have only moderate importance, although our model did not include the magnetic free energy and the shear angle of magnetic fields to the magnetic neutral line.
The importance of previous flare activity has been pointed out by several authors. The tendency for regions that have already flared to soon flare again is referred to as "persistence" in the flare forecast literature (Zirin & Marquette 1991). Wheatland (2004) pointed out that future flare prediction is improved by adding the history of the occurrence of flares (of all sizes) to the McIntosh classification model by using a Bayesian approach. Welsch et al. (2009) showed that the flare flux averaged over a 24 hr window exhibits some discriminant power by calculating the discriminant function coefficient (e.g., Barnes et al. 2007; Leka & Barnes 2007, 2003). However, the relative importance of previous flare activity was not shown in the previous papers, and in this paper, we directly showed that it is an important indicator for future flare prediction.
Welsch et al. (2009) also showed that the proxy Poynting flux and the unsigned flux around strong magnetic neutral lines (R-value) are important indicators of flares because they are related to the dynamics of flux emergence and are recognized as flare triggers. These features are not included in our study, but instead the maximum length, the total length, and the number of magnetic neutral lines in an AR were found to have high importance. The activity of flux emergence is also correlated with chromospheric brightening, which we adopted for the first time. The brightening is mainly observed along magnetic neutral lines, and it was found that the area of chromospheric brightening is a useful indicator of flare prediction, while the maximum intensity of the brightening is less useful. The mechanisms of heating and emission in the chromosphere are not so simple, suggesting that the chromospheric intensity may be a less useful indicator.
The amount of flux emergence can also be measured as the time differential of magnetic flux near magnetic neutral lines. Welsch et al. (2009) differentiated the magnetic flux over 90 minutes and concluded that the 90 minutes differential is too short to be a good indicator of flare prediction. Studies by Schrijver et al. (2005) and Longcope et al. (2005) suggest that the timescale for coronal relaxation via flaring and reconnection is on the order of 24 hr. This is because the magnetic configuration is changed by flux emergence on the order of 24 hr, not on the order of 12 and 2 hr. Furthermore, the magnetic configuration varies over a short timescale but only in a local area. Therefore, the time differential of features without averaging over the whole area of an AR is better for predicting flares.
Finally, the prediction score greatly depends on the data sets used for training and testing and how the database is separated into two for the training and testing. Separating the data into years, for example, using the data for 2010–2013 for training and the data for 2014–2015 for testing, markedly decreased the prediction score. This is because the samples in the two data sets were completely unrelated to each other and no similar positive events were included in both sets of data for training and testing, leading to a more severe condition for prediction than that in the case of fully randomly shuffled data sets. As a future work, we intend to examine the dependence of the prediction score on the data set and to search for the optimal operational setting.
We acknowledge Dr. K. D. Leka for her useful comments and suggestions. We also acknowledge the referee for his/her great effort in reviewing our paper and for giving us useful comments. This work is supported by KAKENHI grant number JP15K17620. The data used here are courtesy of NASA/SDO and the HMI science team, as well as the GOES team.