Abstract
We developed a flare prediction model using machine learning, which is optimized to predict the maximum class of flares occurring in the following 24 hr. Machine learning is used to devise algorithms that can learn from and make decisions on a huge amount of data. We used solar observation data during the period 2010–2015, such as vector magnetograms, ultraviolet (UV) emission, and soft X-ray emission taken by the Solar Dynamics Observatory and the Geostationary Operational Environmental Satellite. We detected active regions (ARs) from the full-disk magnetogram, from which ∼60 features were extracted with their time differentials, including magnetic neutral lines, the current helicity, the UV brightening, and the flare history. After standardizing the feature database, we fully shuffled and randomly separated it into two for training and testing. To investigate which algorithm is best for flare prediction, we compared three machine-learning algorithms: the support vector machine, k-nearest neighbors (k-NN), and extremely randomized trees. The prediction score, the true skill statistic, was higher than 0.9 with a fully shuffled data set, which is higher than that for human forecasts. It was found that k-NN has the highest performance among the three algorithms. The ranking of the feature importance showed that previous flare activity is most effective, followed by the length of magnetic neutral lines, the unsigned magnetic flux, the area of UV brightening, and the time differentials of features over 24 hr, all of which are strongly correlated with the flux emergence dynamics in an AR.
1. Introduction
The mechanism of solar flares is a long-standing puzzle in solar physics. The energy storage and triggering processes of flares are driven by the emergence of flux in the photosphere (e.g., Priest & Forbes 2002; Shibata & Magara 2011; Takasao et al. 2015), which can be directly observed with a photospheric magnetogram. The shape and complexity of sunspots in white-light emission have been classified according to the sunspot growth level (McIntosh 1990). It is empirically known that larger sunspots with a large number of umbrae and a more complicated magnetic flux structure tend to produce larger flares (e.g., Sammis et al. 2000; Gallagher et al. 2002; Li et al. 2008; Colak & Qahwaji 2009; Bloomfield et al. 2012; Lee et al. 2012; Barnes et al. 2016), as well as repeated flares in the same active regions (ARs) (e.g., Zirin 1988; Zirin & Marquette 1991).
Features derived from the line-of-sight magnetogram are useful indicators for future flare prediction, such as the magnetic flux, the gradient of the magnetic field (Yu et al. 2009; Steward et al. 2011), the length of magnetic neutral lines (Steward et al. 2011), the effective magnetic field (Georgoulis & Rust 2007; Papaioannou et al. 2015), the unsigned magnetic flux near the magnetic neutral lines (R-value: Schrijver 2007; Falconer et al. 2011), the total magnetic energy dissipation (Song et al. 2009), the weighted magnetic neutral line length and the distance between NS polarity sunspot centers (Mason & Hoeksema 2010), the non-potentiality (e.g., Falconer et al. 2014), and the wavelet spectra (Yu et al. 2010; Al-Ghraibah et al. 2015; Boucheron et al. 2015; Muranushi et al. 2015). These features are related to the dynamics of flux emergence and are strongly correlated with the energy storage and the triggering mechanisms.
Leka & Barnes (2003) pioneered the use of vector magnetic field data for flare prediction, and the features from a vector magnetogram were first used with machine learning by Bobra & Couvidat (2015). Detailed vector magnetogram observations show the dynamic variation of the magnetic configuration near magnetic neutral lines caused by successive flux emergence (Kubo et al. 2007), and the photospheric flow around magnetic neutral lines has been shown to be an important indicator of the occurrence of flares (Welsch et al. 2009). Recently, a model of flare triggers has been proposed by Kusano et al. (2012), in which the relative direction of an emerging flux near magnetic neutral lines to the pre-existing sheared magnetic loops determines the size of flares; this model has been supported by observations (Bamba et al. 2013; Toriumi et al. 2013).
As an emerging flux appears near magnetic neutral lines in an AR, small-scale energy release occurs in the lower chromosphere via magnetic reconnection, which has been observed using the 1600 Å filtergram of the Transition Region and Coronal Explorer as a gradual increase in the ultraviolet (UV) emission for 2–3 hr in the preflare phase (Saito 2006), as well as in Ca II H emission by Hinode observations (Bamba et al. 2013). The 1600 Å filtergram observes the UV continuum, chromospheric lines, and the C IV doublet (∼1550 Å), which is strongly enhanced and well correlated with the hard X-ray emission (Brekke et al. 1996; Handy et al. 1999; Warren & Warshall 2001; Nishizuka et al. 2009). Moon et al. (2004) found UV brightening at one end of a pre-erupting filament, where magnetic reconnection occurs in the low atmosphere and changes the magnetic connectivity, leading to the initiation of the filament eruption (see also Kim et al. 2008; Guo et al. 2012).
The amount of recent open-access solar observation data is so large that it is beyond human processing ability. To deal with the data, several machine-learning algorithms (for an introductory text on machine learning, see, e.g., Hastie et al. 2009) have been applied to the flare prediction problem: a neural network (Qahwaji & Colak 2007; Colak & Qahwaji 2009; Higgins et al. 2011; Ahmed et al. 2013), C4.5 decision trees (Yu et al. 2009, 2010), learning vector quantization (Yu et al. 2009; Rong et al. 2011), a regression model (Lee et al. 2007; Song et al. 2009), k-nearest neighbors (Li et al. 2008; Huang et al. 2013; Winter et al. 2015), a support vector machine (SVM) (Qahwaji & Colak 2007; Bobra & Couvidat 2015; Muranushi et al. 2015), a relevance vector machine (Al-Ghraibah et al. 2015), SVM regression (Boucheron et al. 2015), and an ensemble of four predictors (Guerra et al. 2015). However, the best algorithm for flare prediction has not been discussed in previous works, and it cannot be found without directly comparing the performances of different algorithms.
Thus, in this paper, we compared three machine-learning algorithms to find which algorithm has the highest performance for flare prediction. We also extended the observation data period and wavelength coverage obtained by the Solar Dynamics Observatory (SDO; Pesnell et al. 2012) and optimized each algorithm to improve the prediction accuracy. Novel features such as UV brightening and the vector magnetogram have been included, and finally the importance of different features was calculated and ranked. In Section 2, we give an overview of our prediction model, which is explained in detail in Section 3. The prediction results are described in Section 4, and a discussion and conclusion are given in Section 5.
2. Overview of our Prediction Model
The procedures of our flare prediction model are as follows. (i) First, observation data are downloaded from the web archives of SDO and the Geostationary Operational Environmental Satellite (GOES), such as the line-of-sight magnetogram, vector magnetogram, 1600 Å broadband filtergram images, and the light curves of the soft X-ray emission. (ii) Second, ARs are detected from full-disk images of the line-of-sight magnetogram, and the ARs are tracked using their time evolution. (iii) For each AR, features are calculated from multiwavelength observations, and flare labels are attached to the solar feature database if an X/M-class flare occurs within 24 hr after an image. (iv) Supervised machine learning is carried out with a 1 hr cadence to predict the maximum class of flares occurring in the following 24 hr.
Our observation data cover the period from 2010 June to 2015 December and were taken by SDO, launched in 2010 February. During this period, 29 X-class and 433 M-class flares were observed on the disk, accounting for 90% of the flares observed during the period. The other 10% of the flares occurred on the limb and were removed from our event list. We call the data samples with flare labels "positive events" and the others "negative events." X-class flares occur on average 5–10 times per year during the solar maximum period, while M-class flares occur about 100 times per year. Negative events are therefore far more common than positive events, making flare prediction an imbalanced classification problem.
We used the line-of-sight magnetogram taken by the Helioseismic and Magnetic Imager (HMI; Scherrer et al. 2012) on board SDO, as well as the vector magnetogram. The UV continuum of the lower chromosphere was taken by the 1600 Å broadband filtergram of the Atmospheric Imaging Assembly (AIA; Lemen et al. 2012) on board SDO. The full-disk integrated X-ray emission over the range of 1–8 Å was observed by GOES. The time cadence of the line-of-sight magnetogram is 45 s, that of the vector magnetogram is 12 minutes, that of the 1600 Å filtergram is 12 s, and that of GOES is less than 1 minute. Thus, the total size of the observation data set is so large that we reduced the cadence to 1 hr, in accordance with the forecast operation. The vector magnetogram data consist of the absolute field strength, the inclination angle, the azimuth angle, and the sign to solve the 180° ambiguity problem. By converting these components to Cartesian coordinates, we calculated the features listed in Table 1.
Table 1. The Extracted Solar Features and the Importance
Number | Features | Description | Importance |
---|---|---|---|
1 | Xhis | Total history of X-class flares in an AR | 0.0519 |
2 | Xmax1d | Maximum X-ray intensity one day before | 0.0495 |
3 | Mhis | Total history of M-class flares in an AR | 0.0365 |
4 | TotNL | Total length of magnetic neutral lines in an AR | 0.0351 |
5 | Mhis1d | 1 day history of M-class flares | 0.0342 |
6 | NumNL | Number of magnetic neutral lines | 0.0341 |
7 | USFlux† | Total unsigned flux | 0.0332 |
8 | CHArea | Chromospheric (UV) brightening area | 0.0235 |
9 | Bave | Averaged magnetic field | 0.0230 |
10 | Xhis1d | 1 day history of X-class flares | 0.0224 |
11 | TotBSQ† | Total magnitude of Lorentz force | 0.0199 |
12 | Bmax | Maximum magnetic field | 0.0193 |
13 | MeanGAM† | Mean angle of the field from the radial direction | 0.0179 |
14 | dt24SavNCPP | Time derivative of SavNCPP over 24 hr | 0.0171 |
15 | dt24TotNL | Time derivative of TotNL over 24 hr | 0.0169 |
16 | dt24TotBSQ | Time derivative of TotBSQ over 24 hr | 0.0164 |
17 | TotFz† | Sum of Z-component of Lorentz force | 0.0160 |
18 | dt24TotFY | Time derivative of TotFY over 24 hr | 0.0156 |
19 | Area† | Area of the strong field in an AR | 0.0153 |
20 | TotFY† | Sum of Y-component of Lorentz force | 0.0152 |
21 | dt24TotFX | Time derivative of TotFX over 24 hr | 0.0152 |
22 | SavNCPP† | Sum of the modulus of the net current per polarity | 0.0150 |
23 | TotUSJz† | Total unsigned vertical current | 0.0149 |
24 | dt24TotFZ | Time derivative of TotFz over 24 hr | 0.0145 |
25 | MeanJzh† | Mean current helicity (Bz contributions) | 0.0144 |
26 | ABSnJzh† | Absolute value of the net current helicity | 0.0137 |
27 | CHAll | Total chromospheric (UV) brightening | 0.0134 |
28 | TotFx† | Sum of X-component of Lorentz force | 0.0132 |
29 | dt24USflux | Time derivative of USflux over 24 hr | 0.0131 |
30 | TotUSJh† | Total unsigned current helicity | 0.0129 |
31 | dt24Area | Time derivative of Area over 24 hr | 0.0128 |
32 | MeanGBt† | Mean gradient of the total field | 0.0125 |
33 | Max dxBz | Maximum of dBz/dx | 0.0116 |
34 | dt24ABSnJzh | Time derivative of ABSnJzh over 24 hr | 0.0115 |
35 | Max dyBz | Maximum of dBz/dy | 0.0112 |
36 | MeanGBz† | Mean gradient of the vertical field | 0.0112 |
37 | MeanJzd† | Mean vertical current density | 0.0111 |
38 | dt12Area | Time derivative of Area over 12 hr | 0.0110 |
39 | dt24TotUSJz | Time derivative of TotUSJz over 24 hr | 0.0110 |
40 | dt24Bmax | Time derivative of Bmax over 24 hr | 0.0107 |
41 | MaxNL | Maximum length of magnetic neutral lines | 0.0107 |
42 | Xflux4h | Averaged X-ray flux over 4 hr | 0.0106 |
43 | dt24CHArea | Time derivative of CHArea over 24 hr | 0.0103 |
44 | dt12Bmax | Time derivative of Bmax over 12 hr | 0.0097 |
45 | MeanGBh† | Mean gradient of the horizontal field | 0.0092 |
46 | Xflux1h | Averaged X-ray flux over 1 hr | 0.0091 |
47 | dt12USflux | Time derivative of USflux over 12 hr | 0.0090 |
48 | dt24 Max graB | Time derivative of Max. grad. Bz over 24 hr | 0.0088 |
49 | dt24 Max dzBy | Time derivative of Max. dBy/dz over 24 hr | 0.0088 |
50 | dt24 TotUSJh | Time derivative of TotUSJh over 24 hr | 0.0081 |
51 | dt24 NumNL | Time derivative of NumNL over 24 hr | 0.0079 |
52 | dt24 MaxdxBz | Time derivative of MaxdxBz over 24 hr | 0.0079 |
53 | dt24MeanJzh | Time derivative of MeanJzh over 24 hr | 0.0078 |
54 | dt24MaxNL | Time derivative of MaxNL over 24 hr | 0.0075 |
55 | dt02 Area | Time derivative of Area over 2 hr | 0.0071 |
56 | dt24 CHAll | Time derivative of CHAll over 24 hr | 0.0071 |
57 | Bmin | Minimum magnetic field of Bz | 0.0071 |
58 | CHMax | Maximum intensity of chromospheric (UV) brightening | 0.0062 |
59 | dt02 Bmax | Time derivative of Bmax over 2 hr | 0.0061 |
60 | dt24 CHMax | Time derivative of CHMax over 24 hr | 0.0049 |
61 | dt24 MeanGBz | Time derivative of MeanGBz over 24 hr | 0.0028 |
62 | dt24 MeanGBh | Time derivative of MeanGBh over 24 hr | 0.0021 |
63 | dt24 MeanGBt | Time derivative of MeanGBt over 24 hr | 0.0002 |
64 | dt24MeanGAM | Time derivative of MeanGAM over 24 hr | 0.0002 |
65 | dt24MeanJzd | Time derivative of MeanJzd over 24 hr | 0.0000 |
Note. The formulae of the features attached with † marks are shown in Table 2 and in Bobra & Couvidat (2015). The importance was calculated by ERT for X-class flare prediction.
3. Details of our Prediction Model
3.1. Detection of ARs
First, we detected ARs to extract solar features from the images of the downloaded observation database. We used ∼10^5 full-disk images of the line-of-sight magnetogram for detection with a reduced cadence of 1 hr (Figure 1). The line-of-sight magnetogram was selected for AR detection because it is less noisy than the vector magnetogram and more suitable for the processing carried out for detection. After determining ARs in magnetogram images, the frame coordinates of the ARs were applied to other images with different wavelengths (Figure 2).
Here we defined the detection rules as follows. (i) First, we smoothed the data with 64 (=8 × 8) binning and detected the image pixels where the absolute magnetic field strength is larger than a threshold value, Bth = 140 G (Figure 1(b)). The threshold was set to the maximum value of the observation errors so that even faint ARs could be detected. (ii) Second, we placed the detected pixels in squares with a side of 160 pixels (∼80'') (Figure 1(c)). Such an 80'' × 80'' square is the minimum unit of the detection region. (iii) Third, if two neighboring squares overlapped, they were combined to form a larger square containing both detected points. The repetition of this process resulted in a single large square covering the whole AR and reduced the number of detected regions (Figures 1(d) and 2(a)).
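For illustration, the following is a minimal Python sketch of detection rules (i)–(iii); the use of NumPy, the corner-based box placement, and the rectangular (rather than square) merged regions are simplifications of ours, not the authors' implementation.

```python
import numpy as np

def detect_active_regions(b_los, b_th=140.0, bin_size=8, box=160):
    """Sketch of rules (i)-(iii): bin the magnetogram, threshold on |B|,
    drop a fixed-size box on each detection, and merge overlapping boxes."""
    ny, nx = b_los.shape
    # (i) 8 x 8 binning and thresholding on the absolute field strength
    binned = b_los[:ny - ny % bin_size, :nx - nx % bin_size].reshape(
        ny // bin_size, bin_size, nx // bin_size, bin_size).mean(axis=(1, 3))
    rows, cols = np.where(np.abs(binned) > b_th)
    # (ii) one 160-pixel box (full-resolution coordinates) per detected binned pixel
    boxes = [(c * bin_size, r * bin_size, c * bin_size + box, r * bin_size + box)
             for r, c in zip(rows, cols)]
    # (iii) merge any two overlapping boxes into their bounding rectangle, repeat
    merged = True
    while merged:
        merged, out = False, []
        while boxes:
            b = boxes.pop()
            for i, o in enumerate(out):
                if b[0] < o[2] and o[0] < b[2] and b[1] < o[3] and o[1] < b[3]:
                    out[i] = (min(b[0], o[0]), min(b[1], o[1]),
                              max(b[2], o[2]), max(b[3], o[3]))
                    merged = True
                    break
            else:
                out.append(b)
        boxes = out
    return boxes  # (x0, y0, x1, y1) rectangles covering candidate ARs
```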
Next, we neglected ARs detected on the limb, where the magnetic structure is difficult to see owing to the projection effect and is partially hidden by the limb. Additionally, the quality of the vector magnetogram data is poor near the limb. This is why previous papers focused on the disk-center data set. On the other hand, in an operational setting, it is necessary to deal with ARs near the limb to make predictions, but there have been no attempts to verify the effectiveness of a near-limb data set. Including the near-limb data also increases the size of the database, which is advantageous for machine learning. In this paper, to investigate the effect of the detection regions on the prediction score, we compared three cases that include or exclude the near-limb region: the full-disk case, an intermediate case focusing within ±53° of the central meridian (CM) (within 4/5 of the solar radius), and the disk-center case focusing within ±37° (within 3/5 of the solar radius). In Bobra & Couvidat (2015), the authors only considered flares within ±68° of the CM.
Furthermore, we tracked ARs moving across the disk owing to solar rotation and numbered them for identification. ARs that overlap in two successive images were given the same ID. We detected a total of 11,700 ARs from the full-disk images during the period 2010–2015. Here, we defined regions containing magnetic fields stronger than 140 G as ARs, so our definition differs from NOAA's. Our model includes faint quasi-ARs, so as not to miss even small flares occurring outside of NOAA's regions. If we raise the detection threshold, the flare occurrence rate of the data set increases, but some flares are inevitably missed. Furthermore, because strong magnetic fields are localized, the 80'' × 80'' squares may not cover the whole area of an AR.
3.2. Extraction of Solar Features
Using the database of detected ARs, we next extracted solar features from each AR. We adopted solar features used in previous papers, which were extracted from the line-of-sight magnetogram (e.g., Steward et al. 2011; Ahmed et al. 2013), the vector magnetogram (Leka & Barnes 2003; Bobra & Couvidat 2015), and GOES X-ray data. Furthermore, in this study, we extracted a chromospheric brightening feature for the first time, obtained from the UV continuum observed by the SDO/AIA 1600 Å filtergram. The extracted features are summarized in Table 1, along with their importance ranking (explained in a later section).
From the line-of-sight magnetogram, we extracted features such as the area of an AR, the maximum and average line-of-sight field strength (B_LOS), the unsigned magnetic flux, the gradients of the magnetic field in the longitudinal/latitudinal directions, and the number of magnetic neutral lines. The magnetic neutral line is an indicator of flare activity because it is directly related to the energy storage and triggering mechanisms. We counted the number of neutral lines in an AR and measured the maximum/total length of the lines (Figures 3(a)–(b)). We detected neutral lines using two conditions: a large magnetic field gradient and a reversal of the magnetic polarity across the lines. Here we focused on magnetic neutral lines longer than 100 pixels (∼50'') to eliminate short and complicated neutral lines (Figure 3(b)).
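A minimal sketch of how such neutral-line features might be computed is given below; the gradient threshold, the use of SciPy connected-component labeling, and the pixel-count proxy for line length are our assumptions for illustration, not the authors' exact procedure.

```python
import numpy as np
from scipy import ndimage

def neutral_line_features(bz, grad_th=50.0, min_len=100):
    """Flag pixels with a polarity reversal toward a neighbour and a large
    field gradient, group them into candidate neutral lines, and keep lines
    longer than min_len pixels. grad_th is a placeholder value."""
    gy, gx = np.gradient(bz)
    grad = np.hypot(gx, gy)
    reversal = np.zeros_like(bz, dtype=bool)
    reversal[:, :-1] |= (bz[:, :-1] * bz[:, 1:]) < 0   # sign change to the right
    reversal[:-1, :] |= (bz[:-1, :] * bz[1:, :]) < 0   # sign change downward
    candidate = reversal & (grad > grad_th)
    labels, n = ndimage.label(candidate)
    if n == 0:
        return {"NumNL": 0, "MaxNL": 0.0, "TotNL": 0.0}
    sizes = np.asarray(ndimage.sum(candidate, labels, np.arange(1, n + 1)))
    sizes = sizes[sizes >= min_len]                    # keep lines longer than ~50''
    return {"NumNL": int(sizes.size),
            "MaxNL": float(sizes.max()) if sizes.size else 0.0,
            "TotNL": float(sizes.sum())}
```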
After preprocessing the vector magnetogram, we calculated features using the three vector components of the magnetic field (Figure 3(c)). Using the formulae in Bobra & Couvidat (2015), we extracted vector magnetogram features such as the vertical current, the current helicity, the Lorentz force, and the mean gradient of the total field. The formulae of the features derived from the vector magnetogram are summarized in Table 2, and the corresponding features in Table 1 are marked with daggers. Moreover, we differentiated the extracted features with respect to time: the time derivatives over 24, 12, and 2 hr were calculated to track the variability of ARs over different timescales.
Table 2. Formulae of AR Features
Keyword | Description | Formula |
---|---|---|
TOTUSJH | Total unsigned current helicity | |
TOTBSQ | Total magnitude of Lorentz force | |
TOTUSJZ | Total unsigned vertical current | |
ABSNJZH | Absolute value of the net current per polarity | |
SAVNCPP | Sum of the modulus of the net current per polarity | |
USFLUX | Total unsigned flux | |
AREA-ACR | Area of strong field pixels in the active region | |
TOTFZ | Sum of z-component of Lorentz force | |
EPSZ | Sum of z-component of normalized Lorentz force | |
MEANGAM | Mean angle of field from radial | |
MEANGBT | Mean gradient of total field | |
MEANGBZ | Mean gradient of vertical field | |
MEANGBH | Mean gradient of horizontal field | |
MEANJZH | Mean current helicity (Bz contribution) | |
TOTFY | Sum of y-component of Lorentz force | |
MEANJZD | Mean vertical current density | |
TOTFX | Sum of x-component of Lorentz force | |
EPSY | Sum of y-component of normalized Lorentz force | |
EPSX | Sum of x-component of normalized Lorentz force |
Note. The formulae in this table are quoted from Bobra & Couvidat (2015).
Brightening in the lower chromosphere is another indicator of flares. A few hours before a flare onset, the lower chromosphere is gradually heated, emitting light in the UV range (Figure 3(d)). The brightening is located around magnetic neutral lines. We extracted chromospheric (UV) features and used them for training in the machine learning for the first time; these features included the maximum intensity, the brightening area, and the total intensity of UV brightening in a whole AR. We used AIA 1600 Å filtergram images of SDO representing the lower-chromosphere brightening.
The exposure time of observations using the 1600 Å filtergram of AIA/SDO is almost constant (∼3 s) from 2010 to 2015; thus, we used the original photon numbers for the feature calculations. We set the threshold intensity that determines the brightening area to 700 photon cm−5 s−1 pix−1; this threshold was chosen by a parameter survey so that the resulting features show large variations. The total intensity of the UV brightening was calculated by integrating the intensity above the threshold over the pixels of the determined brightening area.
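As an illustrative sketch (not the authors' code), the three UV features of Table 1 can be computed from an AIA 1600 Å cutout of an AR roughly as follows:

```python
import numpy as np

def uv_brightening_features(aia1600, threshold=700.0):
    """Sketch of the chromospheric (UV) features: the brightening area is the
    number of pixels above the threshold, the total brightening is the summed
    intensity of those pixels, and CHMax is the maximum intensity."""
    bright = aia1600 > threshold
    return {"CHArea": int(bright.sum()),             # brightening area [pixels]
            "CHAll": float(aia1600[bright].sum()),   # total UV brightening
            "CHMax": float(aia1600.max())}           # maximum intensity
```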
We also used GOES X-ray data in the 1–8 Å range as an indicator of previous and current flare activity, as proposed by several authors (e.g., Zirin & Marquette 1991; Wheatland 2004). We measured the background level of the X-ray intensity by averaging the X-ray light curve over 1 and 4 hr. We also derived the maximum intensity during the day before an image and counted the number of previous flares in an AR, both during the preceding day and over the entire period after the AR emergence; these counts are referred to as the 1 day history and the total history of X/M-class flares, respectively.
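A hedged sketch of these history features is shown below; the pandas-based bookkeeping and the function and argument names are hypothetical, and the flare counts would be evaluated separately for X- and M-class events (Xhis/Mhis in Table 1).

```python
import pandas as pd

def xray_history_features(goes_flux, flare_times, t_now, t_emerge):
    """goes_flux: pandas Series of the GOES 1-8 A flux indexed by time;
    flare_times: peak times of flares of one class already attributed to the AR."""
    day = pd.Timedelta("1D")
    return {
        "Xflux1h": goes_flux[t_now - pd.Timedelta("1h"): t_now].mean(),  # 1 hr background
        "Xflux4h": goes_flux[t_now - pd.Timedelta("4h"): t_now].mean(),  # 4 hr background
        "Xmax1d": goes_flux[t_now - day: t_now].max(),                   # max intensity one day before
        "his1d": sum(t_now - day <= t <= t_now for t in flare_times),    # 1 day flare history
        "his": sum(t_emerge <= t <= t_now for t in flare_times),         # total flare history
    }
```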
3.3. Classification by Machine Learning
We used three machine-learning algorithms for comparison: the SVM, k-nearest neighbors (k-NN), and extremely randomized trees (ERT). Each algorithm was used as a classifier of the flare class and optimized to maximize a skill score, the true skill statistic (TSS; explained in a later section).
3.3.1. SVM Classifier
The SVM is a pattern recognition model using supervised learning (Vapnik & Lerner 1963; Boser et al. 1992; Cortes & Vapnik 1995). It is a classifier that determines the maximum-margin hyperplane, i.e., the hyperplane with the largest margin to the nearest training samples of each class. The learning process involves solving an optimization problem using Lagrange multipliers and the Karush–Kuhn–Tucker (KKT) conditions, and the calculation time rapidly increases as the number of training samples grows. Here, we used a radial basis function kernel (RBF or Gaussian kernel). The hyperparameter C was set to C = 10, which gave the most promising results. Following standard practice, the other hyperparameter γ was set to γ = 1/(number of features).
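A minimal configuration matching this description, written with scikit-learn (our choice of library, not stated in the paper), is:

```python
from sklearn.svm import SVC

n_features = 65                                   # number of extracted solar features (Table 1)
# RBF (Gaussian) kernel, C = 10, gamma = 1/(number of features)
svm_clf = SVC(kernel="rbf", C=10, gamma=1.0 / n_features)
```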
3.3.2. k-Nearest Neighbor (k-NN) Classifier
The k-NN algorithm is a classifier based on the nearest instances in a feature space and is the simplest machine-learning algorithm in this study (Dasarathy 1991). The classification of an object is determined by a vote of its nearest neighbors; that is, an object is assigned to the most common class among its k nearest objects, where k is an integer. When k = 1, an object is classified as belonging to the same class as the nearest object. Each sample is described by its feature vector, and distances between samples are measured by, e.g., the Euclidean or Manhattan distance. The k-NN algorithm is likely to be affected by the locality of the data, and k is usually selected by heuristics: a large k can reduce noise but blurs the class boundary. In this paper, to optimize our model, we set k = 1, for which the nearest instance in the training data set defines the prediction. Furthermore, we adopted the Manhattan distance, i.e., the distance $d(\mathbf{x}_i, \mathbf{x}_j) = \sum_k |x_{i,k} - x_{j,k}|$.
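In scikit-learn terms (again our assumption), this configuration is simply:

```python
from sklearn.neighbors import KNeighborsClassifier

# k = 1 with the Manhattan (L1) distance: the class of the single nearest
# training sample defines the prediction.
knn_clf = KNeighborsClassifier(n_neighbors=1, metric="manhattan")
```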
3.3.3. ERT Classifier
The random forest (Breiman 2001) fits a number of decision-tree classifiers on various subsamples of a data set and uses averaging to improve the prediction accuracy. At each node, it searches over a randomly selected subset of the features and chooses the best split of the node samples into two subsets. In the ERT classifier (Geurts et al. 2006), a random subset of candidate features is likewise used, but thresholds are drawn at random for each candidate feature and the best of these randomly generated thresholds is chosen as the splitting rule. ERT mitigates overfitting, and the calculation can be sped up by building the trees in parallel. We set the number of trees to 300 in this paper.
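A corresponding scikit-learn configuration (our assumption) is:

```python
from sklearn.ensemble import ExtraTreesClassifier

# 300 extremely randomized trees with the Gini impurity; n_jobs=-1 builds the
# trees in parallel.
ert_clf = ExtraTreesClassifier(n_estimators=300, criterion="gini", n_jobs=-1)
```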
Another advantage of ERT is that the importance of the features can be calculated. Breiman (2001) proposed evaluating the importance of a feature $X_m$ for predicting $Y$ in a tree structure $T$ by adding up the decreases in the weighted impurity $p(t)\,\Delta i(s_t, t)$ over all nodes $t$ where $X_m$ is used and then averaging over all $N_T$ trees in the forest:

$$\mathrm{Imp}(X_m) = \frac{1}{N_T} \sum_{T} \sum_{t \in T:\, v(s_t) = X_m} p(t)\, \Delta i(s_t, t),$$

where $p(t)$ is the proportion $N_t/N$ of samples reaching node $t$ and $v(s_t)$ is the feature used in split $s_t$ (see also Louppe et al. 2013). The decrease in some impurity measure $i(t)$ (e.g., the Gini index, the Shannon entropy, or the variance of $Y$) at node $t$ is defined by

$$\Delta i(s, t) = i(t) - p_L\, i(t_L) - p_R\, i(t_R),$$

where $p_L = N_{t_L}/N_t$ and $p_R = N_{t_R}/N_t$, and the split $s_t = s^*$ is the one whose partition of the $N_t$ node samples into the two subsets $t_L$ and $t_R$ maximizes the decrease in impurity. When nodes become pure in terms of $Y$, the construction of the tree stops. We use the Gini index as the impurity function, and the resulting measure is known as the Gini importance or the mean decrease in Gini.
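In practice, the Gini importances used for the ranking in Table 1 can be read off a fitted ERT model; the sketch below uses random placeholder data rather than the solar feature database.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.RandomState(0)
X, y = rng.randn(500, 65), rng.randint(0, 2, 500)     # 65 features, binary flare label
ert = ExtraTreesClassifier(n_estimators=300, criterion="gini", random_state=0).fit(X, y)
ranking = np.argsort(ert.feature_importances_)[::-1]  # feature indices, most important first
print(ranking[:5], ert.feature_importances_[ranking[:5]])
```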
3.4. Standardization, Evaluation, and Cross-validation
3.4.1. Standardization
The extracted solar features have different units and different scales; thus, data standardization is required. The standardization strongly affects the prediction accuracy, although this has not been widely acknowledged by the solar flare forecast community. We used the Z-value for standardization, i.e.,

$$Z = \frac{X - \mu}{\sigma},$$

where X is the original value of the extracted solar feature, μ is the mean, and σ is the standard deviation (e.g., Bishop 2006). Therefore, Z-values are expressed in terms of standard deviations from the mean; as a result, they have a distribution with a mean of 0 and a standard deviation of 1. For parameters with a large-scale variation, we took the logarithm first and then calculated the Z-value.
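A minimal sketch of this standardization step (the log base and the NumPy usage are our assumptions):

```python
import numpy as np

def standardize(values, log_scale=False):
    """Z-value standardization of one feature column; features with a large
    dynamic range are log-transformed before computing the Z-value."""
    x = np.log10(values) if log_scale else np.asarray(values, dtype=float)
    return (x - x.mean()) / x.std()
```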
3.4.2. Redistribution to Training/Testing Data Set
The solar feature database with 1 hr cadence was fully shuffled and randomly separated into two data sets with a size ratio of 7:3, which were used for training and testing, respectively. This 7:3 ratio was adopted from previous works (Ahmed et al. 2013; Bobra & Couvidat 2015). Note that a sample in the solar feature database is given a flare label when an X/M-class flare occurs within 24 hr after it. Thus, there are at most 24 positive events per flare, and such events can end up in both the training and testing databases.
3.4.3. Validation by TSS and Cross-validation
We evaluated the prediction results using the past data in 2010–2015 with a skill score, the TSS. This is also called the Hanssen–Kuiper skill score or Peirce skill score, and is defined by

$$\mathrm{TSS} = \frac{TP}{TP + FN} - \frac{FP}{FP + TN},$$

where TP, FN, FP, and TN are the numbers of true positives, false negatives, false positives, and true negatives, respectively. The score has a range of −1 to +1, with 0 representing no skill and 1 representing perfect prediction. The TSS expresses the hit rate relative to the false alarm rate, and it remains positive provided the hit rate is greater than the false alarm rate. Flare prediction is an imbalanced problem, which means that negative events are much more frequent than positive ones. Bloomfield et al. (2012) suggested the use of the TSS because it is not affected by imbalanced problems (see also Bobra & Couvidat 2015). This is why we selected the TSS for the evaluation of our prediction results.
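As a check on the definition, the TSS can be computed directly from a contingency table; here is a minimal sketch using the k-NN X-class counts of Table 3(a) (the table's TSS column is a CV average, so this single-table value differs slightly):

```python
def true_skill_statistic(tp, fn, fp, tn):
    """TSS = hit rate - false alarm rate; insensitive to the class imbalance."""
    return tp / (tp + fn) - fp / (fp + tn)

print(true_skill_statistic(tp=152, fn=11, fp=14, tn=54439))   # ~0.93
```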
It is important to show that our model does not suffer from overfitting. To show the validity of our approach, we used cross-validation (CV), which is the standard approach in this field. There are several types of CV, such as K-fold CV, shuffle-and-split CV, and leave-one-out CV. In K-fold CV, the data set is partitioned into K subsets, each of which acts once as the validation set; K = 5 or K = 10 is usually used. Note that the validation set plays the role of a test set, but technically it is called the validation set.
Since there are far fewer positive samples than negative samples in solar-flare classification, a larger validation set contains more positive samples. This allows us to analyze the common features of misclassified samples better than when a smaller validation set is used, as in 10-fold CV.
For the above reason, we selected shuffle and split CV to show the validity of our model. The data set was shuffled and partitioned into training and validation sets. The size ratio of the two sets was 7:3, which is widely used in machine-learning and data-mining studies. This process was executed 10 times and the average results are shown.
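A sketch of this validation scheme, with scikit-learn's ShuffleSplit and random placeholder data in place of the solar feature database, is:

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix

rng = np.random.RandomState(0)
X, y = rng.randn(2000, 65), rng.randint(0, 2, 2000)    # placeholder features and labels
cv = ShuffleSplit(n_splits=10, test_size=0.3, random_state=0)  # ten 7:3 splits
scores = []
for train_idx, val_idx in cv.split(X):
    clf = KNeighborsClassifier(n_neighbors=1, metric="manhattan")
    clf.fit(X[train_idx], y[train_idx])
    tn, fp, fn, tp = confusion_matrix(y[val_idx], clf.predict(X[val_idx])).ravel()
    scores.append(tp / (tp + fn) - fp / (fp + tn))      # TSS of this split
print(np.mean(scores), np.std(scores))                  # average TSS and its scatter
```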
4. Prediction Results
We performed supervised machine learning on the solar feature database to predict the maximum class of flares occurring in the following 24 hr. We used three machine-learning algorithms, k-NN, SVM, and ERT, to reveal which algorithm is the most effective for a flare prediction model. We predicted two types of flare: X-class flares and ≥M-class flares. The prediction results for the three algorithms are summarized in Table 3, where, in addition to the contingency table, we list the average TSS and its standard deviation (quoted as the error), calculated by the 10-times shuffle-and-split CV.
Table 3. The Prediction Results of X-class Flares and ≥M-class Flares
Algorithm | TP | FP | FN | TN | TSS |
---|---|---|---|---|---|
(a) X-class flares | |||||
k-NN | 152 | 14 | 11 | 54439 | 0.91 ± 0.03 |
SVM | 120 | 22 | 16 | 54458 | 0.88 ± 0.03 |
ERT | 134 | 7 | 29 | 54446 | 0.82 ± 0.04 |
(b) ≥M-class flares | |||||
k-NN | 1544 | 121 | 155 | 52796 | 0.912 ± 0.005 |
SVM | 1496 | 473 | 203 | 52444 | 0.870 ± 0.007 |
ERT | 1216 | 39 | 483 | 52878 | 0.71 ± 0.02 |
Note. The contingency tables of prediction results of X-class flares and ≥M-class flares, for the three machine-learning algorithms, k-NN, SVM, and ERT.
The three algorithms show different prediction performances. Here, the feature database we used includes the previous flare activity derived from GOES data. For the prediction of X-class flares, the TSS was 0.91 ± 0.03 for k-NN, 0.88 ± 0.03 for SVM, and 0.82 ± 0.04 for ERT. For ≥M-class flares, the TSS was 0.912 ± 0.005 for k-NN, 0.870 ± 0.007 for SVM, and 0.71 ± 0.02 for ERT. Consequently, the k-NN algorithm showed the highest performance in terms of the TSS, both for X-class and ≥M-class flare prediction, followed by SVM and then ERT; however, the number of false positives was smallest for ERT. Since the standard deviation of the TSS in Table 3 is sufficiently small, overfitting is not significant.
Table 4 shows the prediction results when the flare history for the previous day, the flare history during the whole period after the appearance of an AR, and the maximum X-ray intensity for the previous day are neglected. This result isolates the contribution of the magnetogram and chromospheric (UV) images to flare prediction. For the prediction of X-class flares, the TSS was 0.91 ± 0.02 for k-NN, 0.86 ± 0.02 for SVM, and 0.62 ± 0.03 for ERT, and for ≥M-class flare prediction, the TSS was 0.904 ± 0.005 for k-NN, 0.856 ± 0.009 for SVM, and 0.63 ± 0.01 for ERT. Comparing Tables 3 and 4, we found that the TSS values in Table 4 are within the errors of the values reported in Table 3 for k-NN and SVM, so there is no statistical difference between the two. Only for ERT did we find an increase in the TSS when the previous flare activity is considered.
Table 4. The Prediction Results of X-class Flares and ≥M-class Flares, Neglecting Features of Previous Flare Activities
Algorithm | TP | FP | FN | TN | TSS |
---|---|---|---|---|---|
(a) X-class flares | |||||
k-NN | 136 | 16 | 15 | 54449 | 0.91 ± 0.02 |
SVM | 130 | 23 | 21 | 54442 | 0.86 ± 0.02 |
ERT | 87 | 4 | 49 | 54476 | 0.62 ± 0.03 |
(b) ≥M-class flares | |||||
k-NN | 1570 | 173 | 167 | 52706 | 0.904 ± 0.005 |
SVM | 1501 | 759 | 236 | 52120 | 0.856 ± 0.009 |
ERT | 1105 | 35 | 632 | 52844 | 0.63 ± 0.01 |
Note. The contingency tables of prediction results of X-class flares and ≥M-class flares, for the three machine-learning algorithms, k-NN, SVM, and ERT.
The importance of features calculated by ERT is also given in Table 1 for the case where the previous flare activity is considered. According to Table 1, the most effective feature for X-class flare prediction is the total history of X/M-class flares in an AR, followed by the maximum X-ray intensity one day before an image and the 1 day history of X/M-class flares. The next most effective features are the total length of magnetic neutral lines, the number of neutral lines, the unsigned magnetic flux, and the UV brightening area. The average magnetic field is ranked next, followed by features derived from the vector magnetogram and the time derivative of each feature over 24 hr.
We included novel features, such as UV brightening and the time derivative of features over 24, 12 and 2 hr. For UV brightening, the importance of the brightening area and the total intensity is relatively high, while the importance of the maximum intensity is very low. When we compare the time derivatives over different timescales, the time derivative over 24 hr is effective for flare prediction, but those over 12 and 2 hr are ineffective. Note that the magnetic free energy, the shear angle, and the unsigned magnetic flux near magnetic neutral lines have not been considered in this paper.
Furthermore, we compared the TSS for different detection areas of ARs, including or excluding near-limb regions. We set the detection area as the full disk, an intermediate area within ±53° of the CM, and the disk center with a focusing area within ±37°. The prediction results in the latter two cases are summarized in Table 5. For X-class flare prediction in the intermediate case, the TSS was 0.92 ± 0.03 for k-NN, 0.89 ± 0.02 for SVM and 0.88 ± 0.03 for ERT. For the disk center, the TSS was 0.94 ± 0.02 for k-NN, 0.92 ± 0.03 for SVM and 0.88 ± 0.06 for ERT. ERT is greatly improved by neglecting the near-limb ARs. Consequently, when the near-limb regions were neglected, the TSS was improved for all the algorithms. However, we also stress that in an actual operational setting, the TSS with consideration of the near-limb regions is more realistic.
Table 5. The Prediction Results of X-class Flares with Different Detection Regions
Algorithm | TP | FP | FN | TN | TSS |
---|---|---|---|---|---|
(a) An intermediate area | |||||
k-NN | 87 | 8 | 5 | 43277 | 0.92 ± 0.03 |
SVM | 84 | 12 | 7 | 43274 | 0.89 ± 0.02 |
ERT | 80 | 0 | 10 | 43287 | 0.88 ± 0.03 |
(b) The disk-center focusing area | |||||
k-NN | 54 | 2 | 4 | 26782 | 0.94 ± 0.02 |
SVM | 57 | 3 | 5 | 26777 | 0.92 ± 0.03 |
ERT | 55 | 2 | 8 | 26777 | 0.88 ± 0.06 |
Note. The contingency tables of prediction results of X-class flares with different detection regions: an intermediate area within ±53° of the CM (within 4/5 of the solar radius) and the disk center with a focusing area within ±37° (within 3/5 of the solar radius). We used the three machine-learning algorithms, k-NN, SVM, and ERT.
5. Summary and Discussion
We developed a flare prediction model with supervised machine-learning techniques using solar observations of a vector magnetogram and UV brightening. By detecting ARs, we extracted novel features and attached flare labels. Using training and test data sets constructed from the fully shuffled data set, we performed machine learning to predict the maximum class of flares that occur in the following 24 hr after observation images. One aim of this paper was to reveal which machine-learning algorithm is most suitable for a flare prediction model, and we compared three algorithms for the first time. Ranking of the importance of our novel features was another aim of this paper, and we attempted to compare the effectiveness of different features for flare prediction.
Our prediction model achieved a skill score, the TSS, of greater than 0.9. The average performance of the k-NN algorithm was superior to those of SVM and ERT. One of the reasons why the TSS is improved with our model is the use of standardization, which strongly affects the prediction accuracy, although this has not been widely acknowledged by the solar flare forecast community. Here we used the Z-value for standardization. Furthermore, the optimization of our model, such as by incorporating the Manhattan distance, improved the TSS.
In the daily forecast operations at the NICT space weather forecast center, which rely on the knowledge of experts, the TSS was 0.21 for X-class flares and 0.50 for ≥M-class flares during the period 2000–2015 (Kubo et al. 2016). At the Solar Influences Data Center of the Royal Observatory of Belgium, the TSS was 0.34 for ≥M-class flares during the period 2004–2012 (Devos et al. 2014). Thus, our prediction model appears to achieve better performance than human operations. On the other hand, with the fully shuffled data set, several positive events preceding the same flare can be distributed between the training and test data sets, and consequently the prediction score is increased. In particular, k-NN was most effective in this study and gave the highest TSS, consistent with other studies.
We also found that the TSS varies with the detection area; we considered the full-disk area including the near-limb regions, an intermediate area, and the disk-center focusing area. Upon neglecting the near-limb data, the accuracy of the features extracted from observation data sets was increased, thus improving the TSS. On the other hand, in an operational setting, a data set with the near-limb region is more realistic, and the evaluation would be more similar to human operations.
Next, we investigated the ranking of the importance of features. We showed that previous flare activity, such as the flare history in an AR and the maximum X-ray intensity in the previous day, is most important. The configurations of magnetic neutral lines, the unsigned magnetic flux, and the area of UV brightening are next most important. We also showed that the time derivative of features over 24 hr is useful for prediction, while the time derivatives over 12 and 2 hr are not. We also found that the features of the vector magnetogram have only moderate importance, although our model did not include the magnetic free energy and the shear angle of magnetic fields to the magnetic neutral line.
The importance of previous flare activity has been pointed out by several authors. The tendency for regions that have already flared to soon flare again is referred to as "persistence" in the flare forecast literature (Zirin & Marquette 1991). Wheatland (2004) pointed out that future flare prediction is improved by adding the history of the occurrence of flares (of all sizes) to the McIntosh classification model by using a Bayesian approach. Welsch et al. (2009) showed that the flare flux averaged over a 24 hr window exhibits some discriminant power by calculating the discriminant function coefficient (e.g., Barnes et al. 2007; Leka & Barnes 2007, 2003). However, the relative importance of previous flare activity was not shown in the previous papers, and in this paper, we directly showed that it is an important indicator for future flare prediction.
Welsch et al. (2009) also showed that the proxy Poynting flux and the unsigned flux around strong magnetic neutral lines (R-value) are important indicators of flares because they are related to the dynamics of flux emergence and are recognized as flare triggers. These features are not included in our study, but instead the maximum length, the total length, and the number of magnetic neutral lines in an AR were found to have high importance. The activity of flux emergence is also correlated with chromospheric brightening, which we adopted for the first time. The brightening is mainly observed along magnetic neutral lines, and it was found that the area of chromospheric brightening is a useful indicator of flare prediction, while the maximum intensity of the brightening is less useful. The mechanisms of heating and emission in the chromosphere are not so simple, suggesting that the chromospheric intensity may be a less useful indicator.
The amount of flux emergence can also be measured as the time differential of magnetic flux near magnetic neutral lines. Welsch et al. (2009) differentiated the magnetic flux over 90 minutes and concluded that the 90 minutes differential is too short to be a good indicator of flare prediction. Studies by Schrijver et al. (2005) and Longcope et al. (2005) suggest that the timescale for coronal relaxation via flaring and reconnection is on the order of 24 hr. This is because the magnetic configuration is changed by flux emergence on the order of 24 hr, not on the order of 12 and 2 hr. Furthermore, the magnetic configuration varies over a short timescale but only in a local area. Therefore, the time differential of features without averaging over the whole area of an AR is better for predicting flares.
Finally, the prediction score greatly depends on the data sets used for training and testing and how the database is separated into two for the training and testing. Separating the data into years, for example, using the data for 2010–2013 for training and the data for 2014–2015 for testing, markedly decreased the prediction score. This is because the samples in the two data sets were completely unrelated to each other and no similar positive events were included in both sets of data for training and testing, leading to a more severe condition for prediction than that in the case of fully randomly shuffled data sets. As a future work, we intend to examine the dependence of the prediction score on the data set and to search for the optimal operational setting.
We acknowledge Dr. K. D. Leka for her useful comments and suggestions. We also acknowledge the referee for his/her great effort in reviewing our paper and for giving us useful comments. This work is supported by KAKENHI grant number JP15K17620. The data used here are courtesy of NASA/SDO and the HMI science team, as well as the GOES team.