Li 2011
Abstract: Many classifiers that perform well on balanced data sets perform poorly on imbalanced data sets. This article applies the over-sampling method Random-SMOTE (R-S), which is based on the SMOTE method, to imbalanced data mining on several data sets. We use the R-S method to increase the number of minority samples randomly in the minority sample space, so that the number of minority samples becomes almost equal to the number of majority samples in data mining tasks. Five imbalanced UCI data sets were balanced with this integrated data mining process, and the Logit algorithm was used for classification on these data sets. The results show that the integrated use of R-S and Logit in data mining improves the performance of the classifier significantly.

Keywords-Imbalanced Data Set; Data Mining; Integrated Use of Random-SMOTE and Logit

I. INTRODUCTION

Traditional classification approaches are mainly based on the following hypotheses: (1) classification accuracy is taken as the evaluation criterion; (2) the number of samples in each class is almost equal; (3) the cost of misclassification is the same for each class [1]. There are many classical classification algorithms based on these hypotheses, such as Bayesian algorithms and neural networks. But in the real world, the numbers of samples coming from different classes are not always equal; sometimes they are even extremely different. Usually, more attention should be paid to the minority samples, because the cost of misclassifying a minority sample is higher. The recognition of cheating in credit card transactions, the prediction of telecommunication equipment failures, and the forecasting of business failures are some examples [2]. Another example is the use of an imbalanced data set of satellite images for detecting oil well eruptions: only 41 images out of 937 show suspended oil [3]. Obviously, people hope to establish an excellent classification model which can detect all the images with suspended oil precisely, in order to avoid pollution.

The information in the minority class is insufficient compared with that in the majority class. It is easily drowned out by the information of the majority, which leads to misclassification. As a result, the performance of a classifier on balanced data sets is far better than on imbalanced ones. Therefore, the traditional classification approaches and their evaluation criteria are not suitable for imbalanced classification, and explorations for new classifiers have become a hot research topic in machine learning.

Substantial achievements in classifiers for imbalanced data sets have been presented, following different approaches. Sampling methods focus on the perspective of the data: they reconstruct the data set artificially to reduce the degree of imbalance. Over-sampling increases the number of minority samples, but it may lead to over-fitting because of the duplication of data; under-sampling [4], on the other hand, cuts down the number of majority samples, but it may lose information about the majority and decrease classification performance. The other family of methods focuses on the algorithm, introducing mechanisms that compensate for the imbalance and make the classifier suitable for imbalanced data sets. Examples include cost-sensitive learning, revisions of the support vector machine algorithm, and ensemble methods. There are many mechanisms for revising algorithms for imbalanced data mining, for instance adjusting the cost function, using different class weights, changing the probability density, and trimming the classification boundary. Cost-sensitive learning algorithms use the cost of each class to make classification decisions; their target is to cut down the overall cost instead of reducing the error rate as much as possible [5]. Support vector machines have been modified to process imbalanced data sets: one simple modification skews the majority boundary moderately, so that fewer minority samples are misclassified [6]; another modification assigns different costs to the majority and the minority [7]. Ensemble methods combine many classifiers to form a new classifier and improve its performance. Through boosting, many weak classifiers can construct a strong one. AdaBoost is a representative boosting algorithm, which assigns iteration weights to the training data set [8]. Based on this algorithm, AdaCost changes the weight-updating rule to a cost-sensitive one and assigns bigger weights to misclassified minority samples, in order to obtain a lower misclassification cost than AdaBoost [9]. Some scholars have proposed the SMOTEBoost algorithm, which combines boosting with the SMOTE sampling method. Since a boosting algorithm tends to assign the minority bigger weights, its effect is almost the same as duplicating minority samples, so it also suffers from over-fitting; SMOTEBoost therefore uses the SMOTE algorithm to generate new minority samples [10].
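As a toy illustration of the two data-level strategies just described (not the method used in this article), random over-sampling balances the classes by duplicating minority samples, while random under-sampling discards majority samples. The helper names below are hypothetical:

```python
import random

def random_oversample(minority, majority, seed=0):
    """Duplicate randomly chosen minority samples until the classes are balanced."""
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return minority + extra, majority

def random_undersample(minority, majority, seed=0):
    """Discard randomly chosen majority samples until the classes are balanced."""
    rng = random.Random(seed)
    return minority, rng.sample(majority, len(minority))

minority = [[0.1], [0.2], [0.3]]
majority = [[1.0], [1.1], [1.2], [1.3], [1.4], [1.5]]
over_min, over_maj = random_oversample(minority, majority)
under_min, under_maj = random_undersample(minority, majority)
print(len(over_min), len(over_maj))    # 6 6: balanced by duplication
print(len(under_min), len(under_maj))  # 3 3: balanced by deletion
```

The sketch makes the trade-off described above concrete: over-sampling only repeats existing minority points (hence the over-fitting risk), and under-sampling throws away majority points (hence the information loss).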
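The Random-SMOTE (R-S) method applied in this article increases the minority randomly within the minority sample space. Following the two-step interpolation described for Random-SMOTE in [12], a minimal sketch (our illustration, not the authors' implementation) is: for each synthetic sample, draw a minority point x and two helper minority points y1 and y2, place a temporary point at random on the segment y1-y2, then interpolate at random between x and that temporary point.

```python
import random

def random_smote(minority, n_new, seed=0):
    """Generate n_new synthetic minority samples inside the minority sample space."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        y1, y2 = rng.choice(minority), rng.choice(minority)
        r1, r2 = rng.random(), rng.random()
        # Temporary point on the segment y1-y2, then a point between x and it.
        t = [a + r1 * (b - a) for a, b in zip(y1, y2)]
        synthetic.append([a + r2 * (b - a) for a, b in zip(x, t)])
    return synthetic

minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
balanced_minority = minority + random_smote(minority, n_new=6)  # 9 minority samples
```

Each synthetic point is a random convex combination of existing minority points, so it lies inside the minority sample space rather than merely duplicating existing samples.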
With R-S, the new minority samples are N times as numerous as the original minority sample size. The new sampling method R-S balances the minority samples of the five data sets to the majority.

C. Partition of Data Sets

Each empirical data set is divided into a training sample set and a testing sample set to calculate forecasting accuracy. Two-thirds of the samples are selected randomly as training samples to establish the Logit model; the remaining samples are used as testing samples to test the performance of the model. More specifically, two-thirds of the negative samples (after re-sampling) and two-thirds of the positive samples are chosen; these two parts together form the training set, and the testing set is composed of the remaining samples. This process is carried out 100 times. The forecasting accuracy of negative samples is obtained from the Logit model each time, and the mean accuracy is calculated at the end.

TABLE III. ACCURACY RATE OF MINORITY SAMPLES

                            Glass     Haberman   Pima      Vowel     Wine
  Without Sampling Method   16.2%     13.26%     56.46%    72.2%     98.13%
  With R-S                  88.47%    64.46%     99.69%    99.95%    94.23%

We should also inspect the forecast accuracy of the positive samples, which equals TP/(TP + FN), to see whether the accuracy on the positive samples is influenced by R-S, and whether it is improved. It is necessary to compare the forecasting accuracy of positive samples with R-S against the results without the re-sampling method. The comparisons are shown in Table 4.
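The partition-and-evaluation procedure of Section C can be sketched end to end. The sketch below is a minimal illustration on a hypothetical toy data set, assuming a plain gradient-descent logistic regression as the Logit model, since the article does not list its implementation details:

```python
import math
import random

def fit_logit(X, y, lr=0.5, epochs=200):
    """Fit logistic regression by batch gradient descent; returns weights (bias last)."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi + [1.0]))
            p = 1.0 / (1.0 + math.exp(-z))
            for j, xj in enumerate(xi + [1.0]):
                grad[j] += (p - yi) * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def predict(w, xi):
    """Class 1 iff the linear score is non-negative (probability >= 0.5)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, xi + [1.0])) >= 0.0 else 0

def minority_accuracy(X_neg, X_pos, runs=100, seed=0):
    """Repeat a stratified 2/3 train, 1/3 test split `runs` times and return
    the mean accuracy on negative (minority) test samples, i.e. TN/(TN + FP).
    Note: shuffles the input lists in place."""
    rng = random.Random(seed)
    accs = []
    for _ in range(runs):
        rng.shuffle(X_neg); rng.shuffle(X_pos)
        cn, cp = 2 * len(X_neg) // 3, 2 * len(X_pos) // 3
        w = fit_logit(X_neg[:cn] + X_pos[:cp], [0] * cn + [1] * cp)
        test_neg = X_neg[cn:]
        tn = sum(1 for xi in test_neg if predict(w, xi) == 0)
        accs.append(tn / len(test_neg))
    return sum(accs) / len(accs)

# Toy, well-separated data: negatives near (0, 0), positives near (2, 2).
rng = random.Random(1)
neg = [[rng.gauss(0, 0.3), rng.gauss(0, 0.3)] for _ in range(12)]
pos = [[rng.gauss(2, 0.3), rng.gauss(2, 0.3)] for _ in range(24)]
print(round(minority_accuracy(neg, pos, runs=20), 2))
```

Here the negatives play the role of the minority class, and the reported figure corresponds to the TN/(TN + FP) rate averaged over the repeated splits, as in the tables of this section.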
D. Evaluation Standard

The traditional classifier usually takes the error rate or the accuracy as the evaluation criterion, and the goal pursued is a high accuracy rate. This kind of evaluation criterion is obviously inappropriate for the imbalanced problem. For example, consider a binary classification data set where positive samples account for 99% of the data. An ineffective classifier that simply labels every sample as positive already achieves 99% accuracy, and it is very difficult for a genuinely better classifier to obtain an accuracy higher than 99% on the actual problem. Therefore, it is suitable to use the hybrid (confusion) matrix to evaluate the performance of a classifier applied to an imbalanced data set. The hybrid matrix is shown in Table 2.

TABLE II. HYBRID MATRIX

                           Forecast Negative sample   Forecast Positive sample
  Actual Negative sample             TN                         FP
  Actual Positive sample             FN                         TP

TN is the number of negative samples forecast correctly; FN is the number of positive samples wrongly forecast as negative; FP is the number of negative samples wrongly forecast as positive; TP is the number of positive samples forecast correctly.

VI. EXPERIMENTAL RESULT AND ANALYSIS

This article mainly aims at measuring the forecast accuracy of the minority sample. To study the sampling effect of R-S, we inspect the forecast accuracy of negative samples, which equals TN/(TN + FP). It is necessary to compare the forecasting accuracy of negative samples with R-S against the results without the re-sampling method. The comparisons are shown in Table 3.

TABLE IV. ACCURACY RATE OF MAJORITY SAMPLES

                            Glass     Haberman   Pima      Vowel     Wine
  Without Sampling Method   92.8%     95.75%     88.17%    98.08%    95.653%
  With R-S                  80.94%    47.64%     100%      100%      100%

From Table 3 we can see that, without the sampling method, the predictive model performs very badly in recognizing negative samples; in particular, its recognition capability on the negative samples of Glass and Haberman is extremely weak. With R-S the quantity of negative samples is increased, so the forecasting accuracy of the predictive model on negative samples is enhanced greatly, and the difference in performance is significant. Although the forecasting accuracy on the Haberman data set is still not very high, R-S is very effective in improving predictive performance on the majority of the data sets.

The negative samples of Glass and Vowel are absolutely sparse, and R-S greatly enlarges the recognition capability on these negative samples. From Table 3 we can see that R-S can be an effective processing approach for scarce samples and obviously enhances the forecasting accuracy of the predictive model on the negative samples.

From Table 4 we can see that the R-S approach does not have a tremendously bad influence on recognizing positive samples; it can even enhance the recognition of positive samples. The recognition capability of the model on positive samples is improved on the Pima, Vowel and Wine data sets, where the forecasting accuracy reaches 100%. Obviously, R-S is efficient for imbalanced data mining.

Compared to the accuracy on positive samples, the accuracy on negative samples is very low without R-S, which shows that Logit performs badly on imbalanced data sets. When the data sets become balanced with the R-S approach, the performance of Logit has been
greatly improved, which indicates that the integrated use of the R-S sampling method in data mining is a good way to handle imbalanced data sets. This integrated use provides an effective solution to the conditions of both relatively scarce and absolutely scarce data.

VII. CONCLUSION

We implemented an experiment with five data sets from UCI and showed that the integrated use of R-S and Logit in data mining tasks can improve the predictive performance of the mining model. This approach increases the number of minority samples randomly in the minority sample space and greatly improves the identification of the minority, with no bad effect on the recognition of the majority. Five imbalanced UCI data sets were balanced with this method. The results with Logit as the predictive model show that the integrated use of the R-S method in data mining tasks can improve the performance of the classifier significantly.

ACKNOWLEDGMENT

This research is partially supported by the National Natural Science Foundation of China (No. 70801055) and the Zhejiang Provincial Natural Science Foundation of China (No. Y7100008). The authors gratefully thank the anonymous referees for their useful comments and the editors for their work.

REFERENCES

[1] Y. Sun, M. Kamel, A. Wong, et al., Cost-Sensitive Boosting for Classification of Imbalanced Data. Pattern Recognition, 2007, 40(12): 3358-3378.
[2] A.F. Atiya, Bankruptcy Prediction for Credit Risk Using Neural Networks: A Survey and New Results. IEEE Transactions on Neural Networks, 2001, 12(4): 929-935.
[3] M. Kubat, R. Holte, S. Matwin, Machine Learning for the Detection of Oil Spills in Satellite Radar Images. Machine Learning, 1998, 30(2/3): 195-215.
[4] M. Kubat, S. Matwin, Addressing the Curse of Imbalanced Training Sets: One-Sided Selection. ICML, 1997: 179-186.
[5] C. Elkan, The Foundations of Cost-Sensitive Learning. Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI'01), 2001: 973-978.
[6] B. Raskutti, A. Kowalczyk, Extreme Re-Balancing for SVMs: A Case Study. ACM SIGKDD Explorations Newsletter, 2004, 6(1): 60-69.
[7] R. Akbani, S. Kwek, N. Japkowicz, Applying Support Vector Machines to Imbalanced Datasets. Lecture Notes in Computer Science, 2004: 39-50.
[8] Y. Freund, R.E. Schapire, A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. Journal of Computer and System Sciences, 1997, 55(1): 119-139.
[9] W. Fan, S.J. Stolfo, J. Zhang, AdaCost: Misclassification Cost-Sensitive Boosting. The 16th International Conference on Machine Learning (ICML'99), 1999: 97-105.
[10] N.V. Chawla, A. Lazarevic, L.O. Hall, SMOTEBoost: Improving Prediction of the Minority Class in Boosting. Lecture Notes in Computer Science, 2003: 107-119.
[11] N.V. Chawla, K.W. Bowyer, L.O. Hall, W. Kegelmeyer, SMOTE: Synthetic Minority Over-Sampling Technique. Journal of Artificial Intelligence Research, 2002, 16: 321-357.
[12] Y. Dong, Random-SMOTE for Learning from Imbalanced Data Sets. Dalian University of Technology, 2009.