A Novel Drift Detection Algorithm Based on Features Importance
10.2478/jaiscr-2020-0019
Abstract

The training set consists of many features that influence the classifier to different degrees. Choosing the most important features and rejecting those that do not carry relevant information is of great importance to the operation of the learned model. In the case of data streams, the importance of the features may additionally change over time. Such changes affect the performance of the classifier, but can also be an important indicator of an occurring concept-drift. In this work, we propose a new algorithm for data stream classification, called Random Forest with Features Importance (RFFI), which uses the measure of features importance as a drift detector. The RFFI algorithm adapts solutions inspired by the Random Forest algorithm to the data stream scenario. The proposed algorithm combines the ability of ensemble methods to handle slow changes in a data stream with a new method for detecting the occurrence of concept drift. The work contains an experimental analysis of the proposed algorithm, carried out on synthetic and real data.

Keywords: data stream mining, random forest, features importance
provide the output at any time, and the resources used by the model should be strictly limited. The algorithms of DSM have found many applications, e.g. in network traffic analysis [14], financial data analysis [15], or credit card fraud detection [16]. Recently, the possibilities of combining stream processing methods with deep learning techniques are being explored [13, 17].

One of the most popular techniques for data stream mining are ensemble algorithms [18, 19]. In the classic approach, their main idea is to combine the outputs of models built only on a part of the data. This allows for achieving better results with respect to a single component. A simple modification of a classic ensemble algorithm allows us to effectively adapt the model to the changes observed in incoming data. Training new components on chunks of recent data can keep the model up-to-date. The appropriate criterion for including a new component into the ensemble is an important factor that affects the performance of the model. This issue is currently the subject of many studies [20–23].

A non-stationarity phenomenon in the context of data streams is called a concept-drift. In the literature, two types of concept-drifts are distinguished: virtual, when changes in the distribution of the data do not affect the decision boundaries, and real, when the decision boundaries are changed. There are a few approaches that allow updating the algorithms to operate in a new environment. One of them is the passive approach [24, 25]. It is based on the continuous adaptation of the model to current data and is used, inter alia, in ensemble algorithms. Another method, the so-called active approach, is based on permanent monitoring of the stream itself, in order to indicate the moment at which the concept-drift took place. The methods that indicate the moments of significant change in the data distribution are called drift detectors (DDs). In this approach, the model is updated only if the DD signals that a concept-drift occurred.

The most popular techniques for creating ensembles of classifiers are bagging and random forests (RF). These methods allow the creation of many different models from one training set. For this purpose, they use the bootstrap sampling technique. This method consists in generating several subsets by sampling with replacement from the training set. The idea of bagging is to learn independent models based on these subsets. The random forests algorithm also uses bootstrap samples, but additionally, different features are excluded in different subsets. It should be noted that decision trees are used as weak classifiers in the RF algorithm. The adaptation of these algorithms to work in the case of data streams requires some modifications due to the need to minimize the time for processing the available data.
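To make the bootstrap-and-bagging scheme described above concrete, the following is a minimal Python sketch (the function names and parameters are ours, chosen for illustration; scikit-learn's DecisionTreeClassifier plays the role of the weak classifier):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_bagging_ensemble(X, y, n_models=10, rng=None):
    # Train independent trees on bootstrap samples (sampling with replacement).
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(X)
    ensemble = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)   # bootstrap sample of the training set
        ensemble.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return ensemble

def predict_majority(ensemble, X):
    # Aggregate the component outputs by majority voting (integer class labels assumed).
    votes = np.stack([tree.predict(X) for tree in ensemble])
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

A random forest additionally restricts each split to a random subset of features, which the same weak classifier exposes through its max_features parameter.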
The motives mentioned above inspired us to propose a new algorithm for data stream classification using the RF method. Our algorithm combines the ability of ensemble methods to handle slow changes in a data stream (passive reacting) with a new method, based on FI and developed herein, to detect (active reacting) the occurrence of concept drift. The developed methods can potentially be applied to monitor various industrial processes, see e.g. [26–31].

The rest of the paper consists of the following sections. Section 2 presents the main trends in the area of ensemble classifiers, random forest, and features importance. In Section 3, the descriptions of the RF algorithm and the method of computing FI are shown. The proposed ensemble algorithm and the new drift detector are presented in Section 4. Section 5 depicts the results obtained in simulations performed on synthetic and real data. The article ends with the conclusions presented in Section 6.

2 Related works

In this section, we recall the most significant and the most recent papers about ensemble methods, features importance, and drift detectors.

Ensemble methods are popular techniques of data mining in a static environment. They owe their popularity to the possibility of using them to solve many real-world problems, see e.g. [32]. The most significant features that distinguish various ensemble methods are the method of creating new components and the methods for aggregating outputs. However, their adaptation to operate in the data stream scenario also requires important modifications, in particular, a special approach to data preprocessing.
In the Streaming Ensemble Algorithm (SEA) [18], the authors proposed to create a new classifier based on chunks of data (subsequently gathered from the stream and forgotten after processing). To decide about classifying a new instance, a majority voting strategy was applied. In [33], the authors proposed the Accuracy Weighted Ensemble (AWE) algorithm, which is an improvement of the SEA algorithm, weighting the power of the vote of each component according to its accuracy. Additionally, the authors proved that the decision made by the ensemble will always be at least as good as that made by a single classifier. The resampling method inspired by AdaBoost was proposed in the Learn++ algorithm [34], originally in a static environment. Additionally, the authors proposed a new way to establish the weights for the base classifiers. This idea was adapted to the data stream scenario in [35]. In [22], the authors proposed a method to include newly created components only if doing so increases the accuracy not only for the current chunk of data but also for the whole data stream. In [23], a new weighting strategy was proposed, assuming that the weak learners are decision trees. Instead of assigning a weight to the whole tree, the authors propose to establish weights in the leaves. The online version of Bagging and Boosting was proposed in [19], and this approach was extended in [36]. For more recent information about ensemble algorithms, the reader is referred to [37] and [38].

In the paper [39], the author presents a procedure of random forest in a static environment, introducing randomness both on the training set and on the feature set. This idea was tailored to the data stream scenario in several ways. The Dynamic Streaming Random Forest (DSRF) was proposed in [40]. In this approach, after an initial phase of generating a finite number of trees, the algorithm updates the statistics defining the thresholds for decision tree construction. Then the algorithm updates the forest with a fixed percentage of the trees. In the DSRF algorithm, the entropy of incoming data is measured for drift detection. If a drift is detected, all the parameters of the algorithm are reset to their initial values, and the algorithm replaces a specific number of trees in the forest, where this number depends on the value of the measured entropy. These ideas are extended in the paper [41]. In [42], the authors propose the Adaptive Random Forests algorithm, which combines the classical random forest procedure with Hoeffding's decision trees [43]. To react to changes in the data stream, a procedure based on the ADWIN algorithm [44] and the Page-Hinkley test [45] is applied.

As a consequence of model training based on nonstationary data, the significance of particular features can change over time. Such a type of drift is called contextual concept drift [46], or feature drift [47]. In [48], the authors adapt the off-line feature importance evaluation procedure to operate in the on-line scenario with classification models. They proposed two models, based on the mean decrease in Gini impurity and the mean decrease in accuracy, respectively. In [49], the authors investigate the statistical properties of the feature importance measure to propose a novel algorithm called Reinforcement Learning Trees. The method called Iterative Subset Selection was proposed in [50]. This method first ranks the features and then iteratively selects the best features from the ranking. More about feature selection in static and streaming environments can be found in [51, 52], and [53].

In the literature, there exist many drift detecting methods. One of the most popular DDs is the CUSUM algorithm [45]. It is based on tracing a performance measure (e.g., the accuracy of the classifier). If this measure in the consecutive steps exceeds a fixed threshold, then the cumulative sum starts to grow. If it is higher than a certain threshold, then the algorithm indicates concept-drift. The Page-Hinkley test examines differences between the current observations and the means of the previously analyzed data, in a similar way to the CUSUM algorithm. The DDM (Drift Detection Method) algorithm [54] treats data from a stream as Bernoulli trials (assigning them values of 0 or 1 depending on whether they were correctly classified by the current model). The final decision is based on a test that takes into account the means and standard deviations of the previous trials. It was enhanced to deal with abrupt concept drift as the EDDM algorithm [55]. The Adwin algorithm [44] is based on a sliding window. It searches for a point in the current window that splits it into two sub-windows with significantly different mean values. The decision is made on the basis of Hoeffding's bound. Moreover, different methods of tracing moving averages can be used as drift detectors. One of the most popular is GMADM (geometric moving average detecting method) [56].
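As an illustration of this family of detectors, below is a minimal sketch of the Page-Hinkley test as summarized above, monitoring, for instance, the per-chunk error rate of the classifier; the parameter values are ours, chosen for illustration:

class PageHinkley:
    # Alarms when the monitored measure rises significantly above its running mean.
    def __init__(self, delta=0.005, threshold=50.0):
        self.delta = delta          # magnitude of change tolerated without an alarm
        self.threshold = threshold  # alarm level, often denoted lambda
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0              # cumulative deviation from the running mean
        self.min_cum = float("inf")

    def update(self, x):
        # Feed one observation; return True if concept-drift is indicated.
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return self.cum - self.min_cum > self.threshold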
for j = 1, . . . , d.

The initial values of FI (VI_j^0) are computed based on chunk B0. Every next value of FI will be computed on an unseen testing set B_{t+1}. The idea of the proposed drift detection method is to compare the new values of VI_j with the previously obtained ones. The significance of the changes is tested by application of Hoeffding's bound [59]

VI_j^0 - VI_j < \sqrt{\frac{R^2 \ln(1/\alpha)}{2n}},   (6)

where R is the range of the considered random variable (in this particular case equal to 2), and α is a fixed parameter. If inequality (6) is satisfied, then no changes are made in the ensemble. New trees can be trained on B_{t+1} and added to the ensemble. The number of additional trees, equal to M, is fixed by the user. In the other case, we replace the forest with a new one trained on the current chunk of data and replace the VI_j^0 values by the lastly obtained ones.

Algorithm 1. The RFFI algorithm
Data: Data stream S in the form of data chunks B0, B1, B2, ...; number of initial trees M0; number of additional trees M
Result: Ensemble of classifiers
t = 0;
Take the first chunk Bt from the stream;
Train Random Forest on Bt (M0 trees);
Compute VI_j^0 on Bt by equation (5), for j = 1, 2, ..., d;
while new data chunks are available do
    t = t + 1;
    Take the next chunk Bt from the stream;
    Compute VI_j on Bt by equation (5), for j = 1, 2, ..., d;
    Compute the differences VI_j^0 - VI_j for j = 1, ..., d;
    Choose the feature F which maximizes the computed differences;
    if inequality (6) is not satisfied for feature F then
        VI_j^0 = VI_j;
        Train a new Random Forest on Bt (M0 trees);
    else
        Do not make any changes;
        Add M new random trees, trained on Bt, to the forest;
    end
end
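A minimal Python sketch of the drift check at the core of Algorithm 1 may look as follows; the helpers train_trees (training a set of random trees) and compute_vi (equation (5)) are assumed to be given, and the names are ours:

import numpy as np

def hoeffding_bound(n, alpha, R=2.0):
    # Right-hand side of inequality (6).
    return np.sqrt(R * R * np.log(1.0 / alpha) / (2.0 * n))

def rffi_step(forest, vi_ref, X, y, alpha, M, train_trees, compute_vi):
    # One pass of the while-loop in Algorithm 1 for the chunk (X, y).
    vi_new = compute_vi(forest, X, y)        # VI_j via equation (5)
    j = int(np.argmax(vi_ref - vi_new))      # feature F maximizing the difference
    if vi_ref[j] - vi_new[j] >= hoeffding_bound(len(X), alpha):
        # Inequality (6) is not satisfied for F: drift detected,
        # so retrain the forest and reset the reference FI values.
        forest = train_trees(X, y)
        vi_ref = vi_new
    elif M > 0:
        # Inequality (6) is satisfied: keep the forest and, optionally,
        # add M new trees trained on the current chunk.
        forest = forest + train_trees(X, y, n_trees=M)
    return forest, vi_ref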
In this paper, we will examine three different strategies for computing the VI_j values.

FP - Fixed Permutation. In this approach, one permutation is used for every feature and every chunk of data during the whole data stream processing.

SP - Single Permutation. In this approach, one permutation is used for every feature, but with every chunk of data, a new permutation is chosen.

MP - Multiple Permutations. In this approach, several permutations are used for every feature within each chunk of data, and the resulting values are averaged.
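Assuming the standard permutation-based importance of Breiman [39] (the drop in accuracy observed after shuffling a feature column), the difference between the FP and SP strategies reduces to where the permutation comes from; a sketch, with names of our choosing:

import numpy as np

def permutation_importance(predict, X, y, perms):
    # FI of each feature: accuracy drop after permuting that feature's column.
    # perms holds one permutation of the row indices per feature: drawn once
    # for the whole stream (FP) or redrawn for every chunk (SP).
    base_acc = np.mean(predict(X) == y)
    vi = np.empty(X.shape[1])
    for j, perm in enumerate(perms):
        X_perm = X.copy()
        X_perm[:, j] = X_perm[perm, j]   # break the link between feature j and the labels
        vi[j] = base_acc - np.mean(predict(X_perm) == y)
    return vi

rng = np.random.default_rng(0)
perms_fp = [rng.permutation(2000) for _ in range(25)]  # FP: fixed for all chunks of size 2000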
5 Experimental results

To examine the stability of the FI values, synthetic data was generated using the Random Tree Generator implemented in the MOA software [60]. The data were described by 25 numerical features and one of five class labels. The first chunk of the stream, containing 2000 data elements, was used to train the RFFI algorithm. The second one, obtained from the same stream (without concept-drift), was used to compute the FI. The values obtained by the SP, FP, and MP methods (see Section 4) were computed 124 times independently. The results for one feature, in the form of a boxplot, are presented in Figure 1. The values of MP are the results of averaging 25 samples.

One can see that the widest variety of values occurs in the case of the FP method. The remaining methods seem to provide similar values. The MP method gives slightly more stable outputs, but at the expense of an increased number of computations; thus it seems to be the worse choice for data stream processing.

In the following subsections, the SP method is applied to investigate various types of concept-drift.

The proposed algorithm was compared with, in particular, the Adaptive Random Forest (ARF) algorithm equipped with various drift detection methods. In the simulations, the following DD methods, described in Section 2, were used: Adwin, CUSUM, DDM, EDDM, GMADM, and no-change (NoCh). Those algorithms use VFDT as a weak classifier. The number of components in the RFFI algorithm was set to 10, M = 0, and the level of confidence for the drift detector was equal to α = 0.9. The ID3 algorithm was applied as the weak classifier. The results obtained after each 1000 data elements are presented in Figure 2.

The compared VFDT variants were configured as follows:

– VFDT algorithm with entropy as impurity measure and confidence level equal to δ = 0.05 [HT-E-0.05]

– VFDT algorithm with entropy as impurity measure and confidence level equal to δ = 0.01 [HT-E-0.01]

– VFDT algorithm with Gini index as impurity measure and confidence level equal to δ = 0.05 [HT-G-0.05]

Initially, the stream contained data from one concept. Then, data from two different concepts began to be appended to the stream, alternately, each in a package of 2000 elements. Ultimately, the stream contains 100000 elements. The comparisons with other random forest-based and state-of-the-art algorithms are presented in Figures 4 and 5, respectively.
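Such an alternating stream is straightforward to reproduce; below is a sketch under our own assumptions, with two linear concepts standing in for the two generator configurations (chunks of 2000 elements, 100000 elements in total):

import numpy as np

def alternating_stream(n_total=100_000, chunk=2000, d=25, seed=0):
    # Yield (X, y) chunks that alternate between two concepts, 2000 elements each.
    rng = np.random.default_rng(seed)
    w_a = rng.normal(size=d)   # decision boundary of the first concept
    w_b = rng.normal(size=d)   # decision boundary of the second concept (real drift)
    for start in range(0, n_total, chunk):
        w = w_a if (start // chunk) % 2 == 0 else w_b
        X = rng.normal(size=(chunk, d))
        y = (X @ w > 0).astype(int)
        yield X, y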
In the following experiment, the performance of the RFFI algorithm is compared with the best operating algorithms from the previous subsection, i.e. OZA-10, OZA-B-10, HT-E-0.05, and ARF-CUSUM. The actual moments of the concept-drift occurrence are not known in the case of real data. As the prequential evaluation was used, during the learning process the algorithms obtain various accuracies for the subsequent data-chunks. The proposed algorithm allows for achieving the best result, 93.9, of all the algorithms. However, one can see that the most stable accuracy was provided by ARF-Adwin, which is probably due to the use of sliding windows. None of the data chunk-based approaches have given better results.

The aggregated values of accuracy and standard deviations obtained after processing the whole stream, and the maximal values of accuracy, are depicted in Table 1.

Table 1. Average accuracies (Aa) and standard deviations (Sd), in percents, and the maximum value obtained by the algorithms on the real dataset

Algorithm    Aa     Sd    max
Oza-10       82.5   6.25  91.4
Oza-B-10     84.28  4.89  91.7
HT-E-0.05    82.54  4.7   92
ARF-Adwin    88.8   2.22  93.5
RFFI         85.37  5.79  93.9

The presented results demonstrate that the RFFI algorithm can be effectively used to analyze real data.
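The prequential (test-then-train) evaluation mentioned above can be sketched as follows; the model interface (fit and predict callables applied chunk by chunk) is our assumption:

import numpy as np

def prequential_accuracy(fit, predict, stream):
    # Test-then-train: each chunk is first used for testing, then for training.
    accuracies = []
    first = True
    for X, y in stream:
        if not first:                   # the first chunk is only used for training
            accuracies.append(np.mean(predict(X) == y))
        fit(X, y)                       # afterwards, the chunk updates the model
        first = False
    return np.mean(accuracies), np.std(accuracies), np.max(accuracies)

The three returned values correspond to the Aa, Sd, and max columns of Table 1.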
Moreover, the proposed detector provides information about which feature (or features) has the most significant impact on the detected drift.

As part of further work, we will improve the drift detector to enhance its operation in a rapidly changing environment. In the presented version, the detector operates on data chunks. In the future, we will try to develop a fully on-line version.

References

[1] P. Duda, M. Jaworski, L. Pietruczuk, and L. Rutkowski, A novel application of Hoeffding's inequality to decision trees construction for data streams, in Neural Networks (IJCNN), 2014 International Joint Conference on. IEEE, 2014, pp. 3324–3330.

[2] L. Rutkowski, L. Pietruczuk, P. Duda, and M. Jaworski, Decision trees for mining data streams based on the McDiarmid's bound, IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 6, pp. 1272–1279, 2013.

[3] L. Rutkowski, M. Jaworski, L. Pietruczuk, and P. Duda, Decision trees for mining data streams based on the Gaussian approximation, IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 1, pp. 108–119, 2014.

[4] L. Rutkowski, M. Jaworski, L. Pietruczuk, and P. Duda, The CART decision tree for mining data streams, Information Sciences, vol. 266, pp. 1–15, 2014.

[5] L. Pietruczuk, L. Rutkowski, M. Jaworski, and P. Duda, The Parzen kernel approach to learning in non-stationary environment, in Neural Networks (IJCNN), 2014 International Joint Conference on. IEEE, 2014, pp. 3319–3323.
[9] M. Jaworski, P. Duda, L. Rutkowski, P. Najgebauer, and M. Pawlak, Heuristic regression function estimation methods for data streams with concept drift, in Lecture Notes in Computer Science. Springer, 2017, pp. 726–737.

[10] M. Jaworski, P. Duda, and L. Rutkowski, On applying the Restricted Boltzmann Machine to active concept drift detection, in Computational Intelligence (SSCI), 2017 IEEE Symposium Series on. IEEE, 2017, pp. 1–8.

[11] M. Jaworski, Regression function and noise variance tracking methods for data streams with concept drift, International Journal of Applied Mathematics and Computer Science, vol. 28, no. 3, pp. 559–567, 2018.

[12] P. Duda, M. Jaworski, and L. Rutkowski, Convergent time-varying regression models for data streams: Tracking concept drift by the recursive Parzen-based generalized regression neural networks, International Journal of Neural Systems, vol. 28, no. 02, p. 1750048, 2018.

[13] P. Duda, M. Jaworski, A. Cader, and L. Wang, On training deep neural networks using a streaming approach, Journal of Artificial Intelligence and Soft Computing Research, vol. 10, no. 1, 2020.

[14] A. Lall, V. Sekar, M. Ogihara, J. Xu, and H. Zhang, Data streaming algorithms for estimating entropy of network traffic, in ACM SIGMETRICS Performance Evaluation Review, vol. 34, no. 1. ACM, 2006, pp. 145–156.

[15] C. Phua, V. Lee, K. Smith, and R. Gayler, A comprehensive survey of data mining-based fraud detection research, arXiv preprint arXiv:1009.6119, 2010.

[16] A. Dal Pozzolo, G. Boracchi, O. Caelen, C. Alippi, and G. Bontempi, Credit card fraud detection: A realistic modeling and a novel learning strategy, IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 8, pp. 3784–3797, 2018.

[17] S. Disabato and M. Roveri, Learning convolutional neural networks in presence of concept drift, in 2019 International Joint Conference on Neural Networks (IJCNN), 2019, pp. 1–8.

[18] W. N. Street and Y. Kim, A streaming ensemble algorithm (SEA) for large-scale classification, in Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2001, pp. 377–382.

[19] N. C. Oza, Online bagging and boosting, in Systems, Man and Cybernetics, 2005 IEEE International Conference on, vol. 3. IEEE, 2005, pp. 2340–2345.

[20] P. Duda, On ensemble components selection in data streams scenario with gradual concept-drift, in International Conference on Artificial Intelligence and Soft Computing. Springer, 2018, pp. 311–320.

[21] P. Duda, M. Jaworski, and L. Rutkowski, On ensemble components selection in data streams scenario with reoccurring concept-drift, in 2017 IEEE Symposium Series on Computational Intelligence (SSCI). IEEE, 2017, pp. 1–7.

[22] L. Pietruczuk, L. Rutkowski, M. Jaworski, and P. Duda, A method for automatic adjustment of ensemble size in stream data mining, in Neural Networks (IJCNN), 2016 International Joint Conference on. IEEE, 2016, pp. 9–15.

[23] L. Pietruczuk, L. Rutkowski, M. Jaworski, and P. Duda, How to adjust an ensemble size in stream data mining? Information Sciences, vol. 381, pp. 46–54, 2017.

[24] G. Ditzler, M. Roveri, C. Alippi, and R. Polikar, Learning in nonstationary environments: A survey, IEEE Computational Intelligence Magazine, vol. 10, no. 4, pp. 12–25, 2015.

[25] P. Duda, L. Rutkowski, M. Jaworski, and D. Rutkowska, On the Parzen kernel-based probability density function learning procedures over time-varying streaming data with applications to pattern classification, IEEE Transactions on Cybernetics, vol. 50, no. 4, pp. 1683–1696, 2020.

[26] E. Rafajlowicz and W. Rafajlowicz, Testing (non-)linearity of distributed-parameter systems from a video sequence, Asian Journal of Control, vol. 12, no. 2, pp. 146–158, 2010.

[27] E. Rafajlowicz, H. Pawlak-Kruczek, and W. Rafajlowicz, Statistical classifier with ordered decisions as an image based controller with application to gas burners, Lecture Notes in Artificial Intelligence, vol. 8467. Springer, 2014, pp. 586–597.

[28] E. Rafajlowicz and W. Rafajlowicz, Iterative learning in optimal control of linear dynamic processes, International Journal of Control, vol. 91, no. 7, pp. 1522–1540, 2018.

[29] P. Jurewicz, W. Rafajlowicz, J. Reiner, et al., Simulations for tuning a laser power control system of the cladding process, Lecture Notes in Computer Science, vol. 9842. Springer, 2016, pp. 218–229.

[30] E. Rafajlowicz and W. Rafajlowicz, Iterative learning in repetitive optimal control of linear dynamic processes, in 15th International Conference on Artificial Intelligence and Soft Computing (ICAISC), vol. 9692. Springer, 2016, pp. 705–717.
[31] E. Rafajlowicz and W. Rafajlowicz, Control of linear extended nD systems with minimized sensitivity to parameter uncertainties, Multidimensional Systems and Signal Processing, vol. 24, no. 4, pp. 637–656, 2013.

[32] S. A. Ludwig, Applying a neural network ensemble to intrusion detection, Journal of Artificial Intelligence and Soft Computing Research, vol. 9, no. 3, pp. 177–188, 2019.

[33] H. Wang, W. Fan, P. S. Yu, and J. Han, Mining concept-drifting data streams using ensemble classifiers, in Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2003, pp. 226–235.

[34] R. Polikar, L. Upda, S. S. Upda, and V. Honavar, Learn++: An incremental learning algorithm for supervised neural networks, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 31, no. 4, pp. 497–508, 2001.

[35] R. Elwell and R. Polikar, Incremental learning of concept drift in nonstationary environments, IEEE Transactions on Neural Networks, vol. 22, no. 10, pp. 1517–1531, 2011.

[36] A. Beygelzimer, S. Kale, and H. Luo, Optimal and adaptive algorithms for online boosting, in Proceedings of the 32nd International Conference on Machine Learning (ICML-15), 2015, pp. 2323–2331.

[37] H. M. Gomes, J. P. Barddal, F. Enembreck, and A. Bifet, A survey on ensemble learning for data stream classification, ACM Computing Surveys (CSUR), vol. 50, no. 2, p. 23, 2017.

[38] B. Krawczyk, L. L. Minku, J. Gama, J. Stefanowski, and M. Wozniak, Ensemble learning for data stream analysis: A survey, Information Fusion, vol. 37, pp. 132–156, 2017.

[39] L. Breiman, Random forests, Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.

[40] H. Abdulsalam, D. B. Skillicorn, and P. Martin, Classifying evolving data streams using dynamic streaming random forests, in International Conference on Database and Expert Systems Applications. Springer, 2008, pp. 643–651.

[41] H. Abdulsalam, P. Martin, and D. Skillicorn, Streaming random forests, 2008.

[42] H. M. Gomes, A. Bifet, J. Read, J. P. Barddal, F. Enembreck, B. Pfharinger, G. Holmes, and T. Abdessalem, Adaptive random forests for evolving data stream classification, Machine Learning, vol. 106, no. 9-10, pp. 1469–1495, 2017.

[43] P. Domingos and G. Hulten, Mining high-speed data streams, in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2000, pp. 71–80.

[44] A. Bifet and R. Gavaldà, Adaptive learning from evolving data streams, in International Symposium on Intelligent Data Analysis. Springer, 2009, pp. 249–260.

[45] E. S. Page, Continuous inspection schemes, Biometrika, vol. 41, no. 1/2, pp. 100–115, 1954.

[46] J. P. Barddal, H. M. Gomes, F. Enembreck, and B. Pfahringer, A survey on feature drift adaptation: Definition, benchmark, challenges and future directions, Journal of Systems and Software, 2016.

[47] H.-L. Nguyen, Y.-K. Woon, W.-K. Ng, and L. Wan, Heterogeneous ensemble for feature drifts in data streams, in Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 2012, pp. 1–12.

[48] A. P. Cassidy and F. A. Deviney, Calculating feature importance in data streams with concept drift using online random forest, in 2014 IEEE International Conference on Big Data (Big Data). IEEE, 2014, pp. 23–28.

[49] R. Zhu, D. Zeng, and M. R. Kosorok, Reinforcement learning trees, Journal of the American Statistical Association, vol. 110, no. 512, pp. 1770–1784, 2015.

[50] L. Yuan, B. Pfahringer, and J. P. Barddal, Iterative subset selection for feature drifting data streams, in Proceedings of the 33rd Annual ACM Symposium on Applied Computing. ACM, 2018, pp. 510–517.

[51] L. C. Molina, L. Belanche, and À. Nebot, Feature selection algorithms: A survey and experimental evaluation, in 2002 IEEE International Conference on Data Mining. IEEE, 2002, pp. 306–313.

[52] G. Ditzler, J. LaBarck, J. Ritchie, G. Rosen, and R. Polikar, Extensions to online feature selection using bagging and boosting, IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 9, pp. 4504–4509, 2018.

[53] J. P. Barddal, H. M. Gomes, F. Enembreck, and B. Pfahringer, A survey on feature drift adaptation: Definition, benchmark, challenges and future directions, Journal of Systems and Software, 2016.

[54] J. Gama, P. Medas, G. Castillo, and P. Rodrigues, Learning with drift detection, in Brazilian Symposium on Artificial Intelligence. Springer, 2004, pp. 286–295.
Piotr Duda received the M.Sc. degree in mathematics from the Department of Mathematics, Physics, and Chemistry, University of Silesia, Katowice, Poland, in 2009. He obtained the Ph.D. and Sc.D. degrees in computer science from Częstochowa University of Technology, Częstochowa, Poland, in 2015 and 2019, respectively. His current research interests include deep learning and data stream mining.

Krzysztof Przybyszewski is a professor at the University of Social Sciences in Łódź. His adventure with applied computer science began in the 1980s with a simulation of non-quantum collective processes (the subject of his Ph.D. dissertation). At present, he is involved in research and applications of various artificial intelligence technologies and soft computing methods in selected IT problems (in particular, in expert systems supporting the management of education quality in universities, using fuzzy numbers and sets). As a deputy dean at the University of Social Sciences, he is the designer and organizer of the education program at the Computer Science Faculty. He is the author of over 80 publications in the field of computer science and IT applications.

Dr. Lipo Wang received the Bachelor degree from the National University of Defense Technology (China) and the Ph.D. from Louisiana State University (USA). His research interest is artificial intelligence/machine learning with applications to communications, image/video processing, biomedical engineering, and data mining. He has authored 320 papers, of which 110 are in journals. He has authored 2 monographs and edited 20 books. His work has been cited 7,800 times in Google Scholar. He has been, or will be, a keynote speaker at 40 international conferences. He was President of the Asia-Pacific Neural Network Assembly (APNNA) and received the APNNA Excellent Service Award.