Discriminative Frequent Pattern Analysis for Effective Classification∗

Hong Cheng†   Xifeng Yan‡   Jiawei Han†   Chih-Wei Hsu†

† University of Illinois at Urbana-Champaign
‡ IBM T. J. Watson Research Center
{hcheng3, hanj, chsu}@cs.uiuc.edu, [email protected]

∗ The work was supported in part by the U.S. National Science Foundation NSF IIS-05-13678/06-42771 and NSF BDI-05-15813.

Abstract

The application of frequent patterns in classification appeared in sporadic studies and achieved initial success in the classification of relational data, text documents and graphs. In this paper, we conduct a systematic exploration of frequent pattern-based classification and provide solid reasons supporting this methodology. It is well known that feature combinations (patterns) can capture more underlying semantics than single features. However, inclusion of infrequent patterns may not significantly improve the accuracy, due to their limited predictive power. By building a connection between pattern frequency and discriminative measures such as information gain and Fisher score, we develop a strategy to set the minimum support in frequent pattern mining for generating useful patterns. Based on this strategy, coupled with a proposed feature selection algorithm, discriminative frequent patterns can be generated for building high quality classifiers. We demonstrate that the frequent pattern-based classification framework can achieve good scalability and high accuracy in classifying large datasets. Empirical studies indicate that significant improvement in classification accuracy is achieved (up to 12% on UCI datasets) using the so-selected discriminative frequent patterns.

1. Introduction

Frequent pattern mining has been a focused theme in data mining research, with a large number of scalable methods proposed for mining various kinds of patterns, including itemsets [2, 10, 27], sequences [3, 16, 26] and graphs [11, 22]. Frequent patterns have found broad applications in areas like association rule mining, indexing, and clustering [1, 23, 20]. The application of frequent patterns in classification has also achieved some success in the classification of relational data [14, 13, 25, 6, 19], text [15], and graphs [7].

Frequent patterns reflect strong associations between items and carry the underlying semantics of the data. They are potentially useful features for classification. In this paper, we systematically investigate the framework of frequent pattern-based classification, where a classification model is built in the feature space of single features as well as frequent patterns. The idea of frequent pattern-based classification has been exploited by previous studies in different domains, including: (1) associative classification [14, 13, 25, 6, 19], where association rules are generated and analyzed for classification; and (2) graph classification [7], text categorization [15] and protein classification [12], where subgraphs, phrases, or substrings are used as features.

All these related studies demonstrate, to some extent, the usefulness of frequent patterns in classification. Although it is known that frequent patterns are useful, there is a lack of theoretical analysis of their principles in classification. The following critical questions remain unexplored.

• Why are frequent patterns useful for classification? Why do frequent patterns provide a good substitute for the complete pattern set?

• How does frequent pattern-based classification achieve both high scalability and accuracy for the classification of large datasets?

• What is the strategy for setting the minimum support threshold?

• Given a set of frequent patterns, how should we select high quality ones for effective classification?

In this paper, we will systematically answer the above questions.
Feature combinations are shown to be useful for classification by mapping data to a higher dimensional space. For example, word phrases can improve the accuracy of document classification. Given a categorical dataset D with n features, we can explicitly enumerate all 2^n feature combinations and use them in classification. However, there are two significant drawbacks to this approach. First, since the number of feature combinations is exponential in the number of single features, in many cases it is computationally intractable to enumerate them when the number of single features is large (the scalability issue). Second, inclusion of combined features that appear rarely could decrease the classification accuracy due to the "overfitting" issue: such features are not representative. The first problem can be partially solved by kernel tricks, which derive a subset of combined features based on parameter tuning. However, the kernel approach requires an intensive search for good parameters to avoid overfitting.

Through analysis, we found that the discriminative power of a low-support feature is bounded by a low value due to its limited coverage of the dataset; hence the contribution of low-support features to classification is limited, which justifies the usage of frequent patterns in classification. Furthermore, existing frequent pattern mining algorithms can facilitate the pattern generation, thus solving the scalability issue in the classification of large datasets.

As to the minimum support (denoted as min sup) threshold setting in frequent pattern mining, a mapping is built between the support threshold and discriminative measures such as information gain and Fisher score, so that features filtered by an information gain threshold cannot exceed the corresponding min sup threshold either. This result can be used to set min sup for generating useful patterns.

Since frequent patterns are generated solely based on frequency, without considering their predictive power, the use of frequent patterns without feature selection will still result in a huge feature space. This might not only slow down the model learning process but, even worse, deteriorate the classification accuracy (another kind of overfitting issue: there are too many features). In this paper, we demonstrate that feature selection is necessary to single out a small set of discriminative frequent patterns, which is essential for high quality classifiers. Coupled with feature selection, frequent pattern-based classification is able to solve the scalability issue and the overfitting issue smoothly and achieve excellent classification accuracy.

In summary, our contributions include:

• We propose a framework of frequent pattern-based classification. By analyzing the relationship between pattern frequency and its predictive power, we demonstrate that frequent patterns provide high quality features for classification.

• Frequent pattern-based classification can exploit state-of-the-art frequent pattern mining algorithms for feature generation, thus achieving much better scalability than the method of enumerating all feature combinations.

• We establish a formal connection between our framework and an information gain-based feature selection approach, and show that the min sup threshold is equivalent to an information gain threshold for filtering low quality features. Such an analysis suggests a strategy for setting min sup.

• An effective and efficient feature selection algorithm is proposed to select a set of frequent and discriminative patterns for classification.

The rest of the paper is organized as follows. Section 2 gives the problem formulation. In Section 3, we provide a framework for frequent pattern-based classification. We study the usefulness of frequent patterns, establish a connection between support and feature filtering measures, discuss the minimum support setting strategy, and propose a feature selection algorithm. Extensive experimental results are presented in Section 4, and related work is discussed in Section 5, followed by conclusions in Section 6.

2 Problem Formulation

Assume a dataset has k categorical attributes, where each attribute has a set of values, and m classes C = {c1, ..., cm}. Each (attribute, value) pair is mapped to a distinct item in I = {o1, ..., od}. Assume a pair (att, val) → oi, where att is an attribute and val is a value. Let x be the feature vector of a data point s. Then xi = 1 if att(s) = val, and xi = 0 if att(s) ≠ val. For numerical attributes, the continuous values are discretized first. Following the mapping, the dataset is represented in B^d as D = {x_i, y_i}_{i=1}^{n}, where x_i ∈ B^d and y_i ∈ C, with x_ij ∈ B = {0, 1}, ∀i ∈ [1, n], j ∈ [1, d].

Definition 1 (Combined Feature) A combined feature α = {o_{α1}, ..., o_{αk}} is a subset of I, where o_{αi} ∈ {o1, ..., od}, ∀1 ≤ i ≤ k. o_i ∈ I is a single feature. Given a dataset D = {x_i}, the set of data that contains α is denoted as D_α = {x_i | x_{iα_j} = 1, ∀o_{α_j} ∈ α}.
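To make the item mapping and the support of a combined feature concrete, here is a small Python sketch; the attribute names and toy records are invented for illustration and are not from the paper's datasets.

    # Toy categorical records (hypothetical attributes and values).
    records = [
        {"outlook": "sunny", "wind": "weak"},
        {"outlook": "sunny", "wind": "strong"},
        {"outlook": "rain",  "wind": "weak"},
        {"outlook": "rain",  "wind": "strong"},
    ]

    # Map every (attribute, value) pair to a distinct item o_i in I.
    items = {}
    for r in records:
        for att, val in r.items():
            items.setdefault((att, val), len(items))

    # Represent each record as a binary vector x in B^d.
    d = len(items)
    X = [[0] * d for _ in records]
    for x, r in zip(X, records):
        for att, val in r.items():
            x[items[(att, val)]] = 1

    def relative_support(alpha, X):
        # Fraction of records containing every item of the combined feature alpha.
        return sum(all(x[i] for i in alpha) for x in X) / len(X)

    alpha = {items[("outlook", "sunny")], items[("wind", "weak")]}
    print(relative_support(alpha, X))   # 0.25 for the toy records above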
(a) Austral (b) Breast (c) Sonar

Figure 1. Information Gain vs. Pattern Length on UCI data

Definition 2 (Frequent Combined Feature) For a dataset D, a combined feature α is frequent if θ = |D_α|/|D| ≥ θ0, where θ = |D_α|/|D| is the relative support of α, and θ0 is the min sup threshold, 0 ≤ θ0 ≤ 1. The set of frequent combined features is denoted as F.

Given a dataset D = {x_i, y_i}_{i=1}^{n} and a set of frequent patterns F, D is mapped into a higher dimensional space B^{d'} with d' features in I ∪ F. The data is denoted as D' = {x'_i, y_i}_{i=1}^{n}, where x'_i ∈ B^{d'}. Notice that F is parameterized with min sup θ0.

Frequent Pattern-Based Classification is learning a classification model in the feature space of single features as well as frequent patterns, where the frequent patterns are generated w.r.t. min sup.

3 The Framework of Frequent Pattern-based Classification

In this section, we examine the framework of frequent pattern-based classification, which includes three steps: (1) feature generation, (2) feature selection, and (3) model learning.

In the feature generation step, frequent patterns are generated with a user-specified min sup. The data is partitioned according to the class label, and frequent patterns are discovered in each partition with min sup. The collection of frequent patterns F forms the set of feature candidates. In the second step, feature selection is applied on F. The set of selected features is Fs. Given Fs, the dataset D is transformed to D' in B^{d'}. The feature space includes all the single features as well as the selected frequent patterns. Finally, a classification model is built on the dataset D'.
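The three steps can be pictured with the following minimal Python sketch. The naive miner, the no-op selection step and the toy data are placeholders chosen for illustration only; the actual study uses a scalable closed-pattern miner (FPClose) and the MMRFS selector described in Section 3.3.

    from itertools import combinations

    def mine_frequent_patterns(X, min_sup, max_len=3):
        # Naive miner used only for illustration: enumerate item combinations up
        # to max_len and keep those whose relative support reaches min_sup.
        n = len(X)
        patterns = []
        for k in range(2, max_len + 1):
            for combo in combinations(range(len(X[0])), k):
                if sum(all(x[i] for i in combo) for x in X) / n >= min_sup:
                    patterns.append(frozenset(combo))
        return patterns

    def transform(X, patterns):
        # Map each record from B^d to B^{d'}: single items plus selected patterns.
        return [x + [int(all(x[i] for i in p)) for p in patterns] for x in X]

    def frequent_pattern_features(X, y, min_sup):
        # Step 1: feature generation, mining each class partition separately.
        F = set()
        for c in set(y):
            F.update(mine_frequent_patterns([x for x, yi in zip(X, y) if yi == c], min_sup))
        # Step 2: feature selection (placeholder; Section 3.3 introduces MMRFS).
        F_selected = list(F)
        # Step 3: model learning would train any classifier on the transformed D'.
        return transform(X, F_selected), F_selected

    # Toy usage with d = 4 single features.
    X = [[1, 0, 1, 0], [1, 0, 0, 1], [0, 1, 1, 0], [0, 1, 0, 1]]
    y = ["a", "a", "b", "b"]
    X_prime, F = frequent_pattern_features(X, y, min_sup=0.5)
    print(len(F), "patterns mined; new dimensionality:", len(X_prime[0]))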
3.1 Why Are Frequent Patterns Good Features?

Frequent patterns have two properties: (1) each pattern is a combination of single features, and (2) they are frequent. We will analyze these properties and explain why frequent patterns are useful for classification.

3.1.1 The Usefulness of Combined Features

A frequent pattern is a form of non-linear feature combination over the set of single features. With the inclusion of non-linear feature combinations, the expressive power of the new feature space increases. The "Exclusive OR" function is an example: the data is linearly separable in B^3 = (x, y, xy), but not in the original space B^2 = (x, y). Non-linear mappings are widely used, e.g., string kernels [15, 12] for text or biosequence classification. In frequent pattern-based classification, the single feature vector x is explicitly transformed from the space B^d, where d = |I|, to a larger space B^{d'}, where d' = |I ∪ F|. This will likely increase the chance of including important features.

In addition, the discriminative power of some frequent patterns is higher than that of single features because they capture more underlying semantics of the data. We retrieved three UCI datasets and plotted the information gain [17] of both single features and frequent patterns in Figure 1. It is clear that some frequent patterns have higher information gain than single features.

3.1.2 Discriminative Power versus Pattern Frequency

In this subsection, we study the relationship between the discriminative power of a feature and its support, and demonstrate that the discriminative power of low-support features is limited. In addition, they could harm the classification accuracy due to overfitting.
First, a classification model which uses frequent features for induction has statistical significance and thus generalizes well to the test data. If an infrequent feature is used, the model cannot generalize well to the test data since it is built on statistically minor observations. This is referred to as overfitting.

Second, the discriminative power of a pattern is closely related to its support. Take information gain as an example. For a pattern α represented by a random variable X, the information gain is

    IG(C|X) = H(C) − H(C|X)    (1)

where H(C) is the entropy and H(C|X) is the conditional entropy. Given a dataset with a fixed class distribution, H(C) is a constant. The upper bound of the information gain, IG_ub, is

    IG_ub(C|X) = H(C) − H_lb(C|X)    (2)

where H_lb(C|X) is the lower bound of H(C|X). Assuming the support of α is θ, we will show in the following that IG_ub(C|X) is closely related to θ. When θ is small, IG_ub(C|X) is low. That is, infrequent features have a very low information gain upper bound.

To simplify the analysis, assume X ∈ {0, 1} and C = {0, 1}. Let P(x = 1) = θ, P(c = 1) = p and P(c = 1|x = 1) = q. Then

    H(C|X) = − Σ_{x∈{0,1}} P(x) Σ_{c∈{0,1}} P(c|x) log P(c|x)
           = −θq log q − θ(1 − q) log(1 − q)
             + (θq − p) log((p − θq)/(1 − θ))
             + (θ(1 − q) − (1 − p)) log(((1 − p) − θ(1 − q))/(1 − θ))

H(C|X) is a function of p, q and θ. Given a dataset, p is a fixed value. As H(C|X) is a concave function, it reaches its lower bound w.r.t. q, for fixed p and θ, under the following conditions. If θ ≤ p, H(C|X) reaches its lower bound when q = 0 or 1. If θ > p, H(C|X) reaches its lower bound when q = p/θ or 1 − (1 − p)/θ. The cases of θ ≤ p and θ ≥ p are symmetric. Due to space limits, we only discuss the case θ ≤ p; the analysis for the other case is similar.

Since q = 0 and q = 1 are symmetric for the case θ ≤ p, we only discuss the case q = 1. In that case, the lower bound H_lb(C|X) is

    H_lb(C|X)|_{q=1} = (θ − 1) ( ((p − θ)/(1 − θ)) log((p − θ)/(1 − θ)) + ((1 − p)/(1 − θ)) log((1 − p)/(1 − θ)) )    (3)

The partial derivative of H_lb(C|X)|_{q=1} w.r.t. θ is

    ∂H_lb(C|X)|_{q=1} / ∂θ = log((p − θ)/(1 − θ)) − (p − 1)/(1 − θ) − (1 − p)/(1 − θ)
                           = log((p − θ)/(1 − θ))
                           ≤ log 1
                           ≤ 0

The above analysis demonstrates that the information gain upper bound IG_ub(C|X) is a function of the support θ. H_lb(C|X)|_{q=1} is monotonically decreasing with θ, i.e., the smaller θ is, the larger H_lb(C|X), and the smaller IG_ub(C|X). When θ is small, IG_ub(C|X) is small. Therefore, the discriminative power of low-frequency patterns is bounded by a small value. For the symmetric case θ ≥ p, a similar conclusion can be drawn: the discriminative power of very high-frequency patterns is also bounded by a small value, following the same rationale.

To support the analysis above, we depict empirical results on three UCI datasets in Figure 2. The x axis represents the (absolute) support of a pattern and the y axis represents the information gain. We can clearly see that the information gain of a low-support pattern is bounded by a small value. In addition, for each absolute support, we also plot the theoretical upper bound, IG_ub(C|X)|_{q=1} if θ ≤ p or IG_ub(C|X)|_{q=p/θ} if θ > p, given the fixed p = P(c = 1) from the real dataset. We can see that the upper bound of information gain at very low support (and very high support) is small, which confirms our analysis. For example, for a support count of 31 (i.e., θ = 5%) in Figure 2 (a), the information gain upper bound is as low as 0.06.

Another interesting observation is that, at a medium-to-large support (e.g., support = 300 in Figure 2 (a)) where the upper bound reaches the maximum possible value IG_ub = H(C), there is a big margin between the information gain of frequent patterns and the upper bound. However, this does not necessarily demonstrate that frequent patterns cannot have very high discriminative power. As a matter of fact, the set of available frequent patterns and their predictive power is closely related to the dataset and the class distribution.
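The bound derived above is straightforward to compute. The sketch below covers the binary-class, binary-feature case of this subsection, rewriting Eq. (3) equivalently as (1 − θ)·H_b((p − θ)/(1 − θ)) for θ ≤ p, and using H_lb = θ·H_b(p/θ) for θ > p (the q = p/θ case), where H_b is the binary entropy; the example values of p and θ are arbitrary choices for illustration.

    from math import log2

    def binary_entropy(r):
        # H_b(r) in bits, with the convention 0 * log(0) = 0.
        if r <= 0.0 or r >= 1.0:
            return 0.0
        return -r * log2(r) - (1 - r) * log2(1 - r)

    def ig_upper_bound(theta, p):
        # IG_ub(C|X) of Eq. (2) for a binary class with P(c=1) = p and a
        # pattern of relative support theta.
        if theta <= p:                   # lower bound of H(C|X) attained at q = 1
            h_lb = (1 - theta) * binary_entropy((p - theta) / (1 - theta))
        else:                            # attained at q = p / theta
            h_lb = theta * binary_entropy(p / theta)
        return binary_entropy(p) - h_lb

    # The bound shrinks at both very low and very high support (illustrative p).
    p = 0.3
    for theta in (0.01, 0.05, 0.3, 0.6, 0.95):
        print(theta, round(ig_upper_bound(theta, p), 3))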
Besides information gain, the Fisher score [8] is also popularly used to measure the discriminative power of a feature. We analyze the relationship between Fisher score and pattern support. The Fisher score is defined as

    Fr = ( Σ_{i=1}^{c} n_i (μ_i − μ)^2 ) / ( Σ_{i=1}^{c} n_i σ_i^2 )    (4)

where n_i is the number of data samples in class i, μ_i is the average feature value in class i, σ_i is the standard deviation of the feature value in class i, and μ is the average feature value in the whole dataset.

0.9 InfoGain
IG_UpperBnd
0.8

0.7

Information Gain
0.6

0.5

0.4

0.3

0.2

0.1

0
0 100 200 300 400 500 600 700
Support

(a) Austral (b) Breast (c) Sonar

Figure 2. Information Gain and the Theoretical Upper Bound vs. Support on UCI data

[Plots of FisherScore and FS_UpperBnd versus absolute support]

(a) Austral (b) Breast (c) Sonar

Figure 3. Fisher Score and the Theoretical Upper Bound vs. Support on UCI data

We use the notation of p, q and θ as defined before and assume we only have two classes. Assume θ ≤ p (the analysis for θ > p is symmetric); then Fr is

    Fr = θ(p − q)^2 / ( p(1 − p)(1 − θ) − θ(p − q)^2 )    (5)

In Eq. (5), let Y = p(1 − p)(1 − θ) and Z = θ(p − q)^2. Then Y ≥ 0 and Z ≥ 0. If Y = 0, we can verify that Z = 0 too; then Fr is undefined in Eq. (5), and in this case Fr = 0 according to Eq. (4). For the case Y > 0 and Z ≥ 0, Eq. (5) is equivalent to

    Fr = Z / (Y − Z)

For fixed p and θ, Y is a positive constant. Then Fr monotonically increases with Z = θ(p − q)^2. Assume p ∈ (0, 0.5] (p ∈ [0.5, 1) is symmetric); then when q = 1, Fr reaches its maximum value w.r.t. q, for fixed p and θ. We denote this maximum value as Fr_ub. Putting q = 1 into Eq. (5), we have

    Fr_ub|_{q=1} = θ(1 − p) / (p − θ)    (6)

According to Eq. (6), as θ increases, Fr_ub|_{q=1} increases monotonically, for a fixed p. For θ ≤ p, the Fisher score upper bound of a low-frequency pattern is smaller than that of a high-frequency one. Note that, as θ increases, Fr_ub|_{q=1} takes very large values; when θ → p, Fr_ub|_{q=1} → ∞.

Further evidence of the relationship between Fr and θ is the sign of ∂Fr/∂θ. For Eq. (5), the partial derivative of Fr w.r.t. θ is

    ∂Fr/∂θ = (p − q)^2 p(1 − p) / (p − p^2 − θq^2 − θp + 2θpq)^2 ≥ 0    (7)

The inequality holds because p ∈ [0, 1]. Therefore, when θ ≤ p, Fr monotonically increases with θ, for fixed p and q. The result shows that the Fisher score of a high-frequency feature is larger than that of a low-frequency one, if p and q are fixed.

Figure 3 shows the Fisher score of each pattern vs. its (absolute) support. We also plot the Fisher score upper bound Fr_ub w.r.t. support. As mentioned above, for θ ≤ p, as θ increases, Fr_ub takes very large values, and Fr_ub → ∞ as θ approaches p. Hence, we only plot a portion of the curve, which shows the trend very clearly. The result is similar to Figure 2. These empirical results demonstrate that features of low support have very limited discriminative power, which is due to their limited coverage of the dataset. Features of very high support have very limited discriminative power too, which is due to their commonness in the data.
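Eqs. (5) and (6) are equally easy to evaluate. The following sketch handles the two-class, binary-feature case; the treatment of a vanishing denominator and the example numbers are illustrative choices, not part of the paper's analysis.

    def fisher_score(theta, p, q):
        # Eq. (5): Fisher score of a binary feature with support theta,
        # class prior p = P(c=1) and q = P(c=1 | x=1), two-class case.
        num = theta * (p - q) ** 2
        den = p * (1 - p) * (1 - theta) - num
        if den <= 0:
            # Degenerate cases: Eq. (4) gives 0 when both numerator and
            # denominator vanish; otherwise the score is unbounded.
            return float("inf") if num > 0 else 0.0
        return num / den

    def fisher_upper_bound(theta, p):
        # Eq. (6): Fr_ub attained at q = 1, valid for theta < p.
        return theta * (1 - p) / (p - theta)

    # The bound grows with theta and diverges as theta approaches p.
    p = 0.4
    for theta in (0.01, 0.1, 0.2, 0.3, 0.39):
        print(theta, round(fisher_upper_bound(theta, p), 3))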
3.1.3 The Justification of Frequent Pattern-Based Classification

Based on the above analysis, we now demonstrate that frequent pattern-based classification is a scalable and effective methodology. The justification is done by building a connection between a well-established information gain-based feature selection approach and our frequent pattern-based method.

Assume the problem context is using combined features for classification. In a commonly used feature selection approach, assume all feature combinations are generated as feature candidates. A subset of high quality features is selected for classification, with an information gain threshold IG0 (or a Fisher score threshold). According to the analysis in Section 3.1.2, one can always find a min sup threshold θ* which satisfies:

    θ* = arg max_θ (IG_ub(θ) ≤ IG0)    (8)

where IG_ub(θ) is the information gain upper bound at support θ. That is, θ* is the maximum support threshold at which the information gain upper bound is no greater than IG0.

The feature selection approach filters all the combined features whose information gain is less than IG0; accordingly, in the frequent pattern-based method, features with support θ ≤ θ* can be safely skipped because IG(θ) ≤ IG_ub(θ) ≤ IG_ub(θ*) ≤ IG0. Compared with the information gain-based approach, it is equivalent to generate the features with min sup = θ* and then apply feature selection on the frequent patterns only. The latter is our frequent pattern-based approach. Since the number of all feature combinations is usually very large, enumeration and feature selection over such a huge feature space is computationally intractable. In contrast, the frequent pattern-based method achieves the same result in a much more efficient way. Obviously, it can benefit from state-of-the-art frequent pattern mining algorithms. The choice of the information gain threshold IG0 in the first approach corresponds to the setting of the min sup parameter in our framework. If IG0 is large, the corresponding θ* is large, and vice versa. As it is important to determine the information gain threshold in most feature selection algorithms, the strategy of setting an appropriate min sup is equally crucial. We discuss this issue in Section 3.2.

3.2 The Minimum Support Effect

Since the set of frequent patterns F is generated according to min sup, we study the impact of min sup on the classification accuracy and propose a strategy to set min sup.

If min sup is set to a large value, the patterns in F correspond to very frequent ones. In the context of classification, they may not be the best feature candidates, since they appear in a large portion of the dataset, across different classes. We can clearly observe in Figures 2 and 3 that at a very large min sup value, the theoretical upper bound decreases, due to the "overwhelming" occurrences of the high-support patterns. This is analogous to stop words in text retrieval, where highly frequent words are removed before document retrieval or text categorization.

As min sup is lowered, the classification accuracy is expected to increase, as more discriminative patterns with medium frequency are discovered. However, as min sup decreases to a very low value, the classification accuracy stops increasing, or even starts dropping due to overfitting. As we analyzed in Section 3.1, features with low support have low discriminative power. They could even harm the classification accuracy if they are included for classification, due to the overfitting effect. In addition, the costs of time and space at both the frequent pattern mining and the feature selection steps become very high with a low min sup.

We propose a strategy to set min sup, the major steps of which are outlined below.

• Compute the theoretical information gain (or Fisher score) upper bound as a function of support θ;

• Choose an information gain threshold IG0 for feature filtering purposes;

• Find θ* = arg max_θ (IG_ub(θ) ≤ IG0);

• Mine frequent patterns with min sup = θ*.

First, compute the theoretical information gain upper bound as a function of support θ. This involves only the class distribution p, without generating frequent patterns. Then decide an information gain threshold IG0 and find the corresponding θ*. Then, for θ ≤ θ*, IG_ub(θ) ≤ IG_ub(θ*) ≤ IG0. In this way, frequent patterns are generated efficiently without missing any feature candidates w.r.t. IG0. As there are more mature studies on how to set the information gain threshold in feature selection methods [24], we can borrow their strategy and map the selected information gain threshold to a min sup threshold in our method.
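Putting the strategy into code for the binary-class case gives a sketch like the one below. It repeats the upper-bound helper from the earlier sketch so it stands alone, and it scans only the low-support side (θ ≤ p), where the bound increases with θ; the class prior, dataset size and IG0 used in the example are made-up numbers.

    from math import log2

    def binary_entropy(r):
        if r <= 0.0 or r >= 1.0:
            return 0.0
        return -r * log2(r) - (1 - r) * log2(1 - r)

    def ig_upper_bound(theta, p):
        # Same bound as in the earlier sketch (binary class, binary feature).
        if theta <= p:
            h_lb = (1 - theta) * binary_entropy((p - theta) / (1 - theta))
        else:
            h_lb = theta * binary_entropy(p / theta)
        return binary_entropy(p) - h_lb

    def min_sup_for_threshold(ig0, p, n):
        # Largest relative support theta* = s/n on the low-support side with
        # IG_ub(theta*) <= IG0, following Eq. (8).
        theta_star = 0.0
        s = 1
        while s / n <= p and ig_upper_bound(s / n, p) <= ig0:
            theta_star = s / n
            s += 1
        return theta_star

    # Example with invented numbers: a 30/70 class split over 1000 records.
    print("mine frequent patterns with min_sup =", min_sup_for_threshold(0.05, p=0.3, n=1000))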
3.3 Feature Selection Algorithm MMRFS

Although frequent patterns are shown to be useful for classification, not every frequent pattern is equally useful. It is necessary to perform feature selection to single out a subset of discriminative features and remove non-discriminative ones. In this section, we propose an algorithm, MMRFS. The notion is borrowed from the Maximal Marginal Relevance (MMR) [4] heuristic in information retrieval, where a document has high marginal relevance if it is both relevant to the query and contains minimal marginal similarity to previously selected documents. We first define the relevance and redundancy of a frequent pattern in the context of classification.

Definition 3 (Relevance) A relevance measure S is a function mapping a pattern α to a real value such that S(α) is the relevance w.r.t. the class label.

Relevance models the discriminative power of a frequent pattern w.r.t. the class label. Measures like information gain and Fisher score can be used as a relevance measure.

Definition 4 (Redundancy) A redundancy measure R is a function mapping two patterns α and β to a real value such that R(α, β) is the redundancy between them.

Redundancy measures the extent to which two patterns are similar. In this paper, we use a variant of the Jaccard measure [18] to measure the redundancy between different features:

    R(α, β) = ( P(α, β) / (P(α) + P(β) − P(α, β)) ) × min(S(α), S(β))    (9)

According to the redundancy definition, we use closed frequent patterns [27] as features instead of frequent ones in our framework, since for a closed pattern α and its non-closed sub-pattern β, β is completely redundant w.r.t. α.

The MMRFS algorithm searches over the feature space in a heuristic way. A feature is selected if it is relevant to the class label and has very low redundancy with the features already selected. Initially, the feature with the highest relevance measure is selected. Then the algorithm incrementally selects more patterns from F with an estimated gain g. A pattern is selected if it has the maximum gain among the remaining patterns. The gain of a pattern α, given a set of already selected patterns Fs, is

    g(α) = S(α) − max_{β∈Fs} R(α, β)    (10)

An interesting question arises: how many frequent patterns should be selected for effective classification? A promising method is to add a database coverage constraint δ, as in [13]. The coverage parameter δ is set to ensure that each training instance is covered at least δ times by the selected features. In this way, the number of features selected is automatically determined, given a user-specified parameter δ. The algorithm is described in Algorithm 1.

Algorithm 1 Feature Selection Algorithm MMRFS
Input: Frequent patterns F, coverage threshold δ, relevance S, redundancy R
Output: A selected pattern set Fs
 1: Let α be the most relevant pattern;
 2: Fs = {α};
 3: while (true)
 4:     Find a pattern β such that the gain g(β) is the maximum among the set of patterns in F − Fs;
 5:     If β can correctly cover at least one instance
 6:         Fs = Fs ∪ {β};
 7:     F = F − {β};
 8:     If all instances are covered δ times or F = ∅
 9:         break;
10: return Fs
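A compact Python rendering of Algorithm 1 under simplifying assumptions: patterns are frozensets of item indices over binary records, relevance S is passed in as a dictionary, redundancy follows Eq. (9), and the caller supplies what "correctly cover" means. This is an illustrative sketch, not the authors' implementation, and the toy data at the end is invented.

    def redundancy(a, b, X, S):
        # Eq. (9): Jaccard overlap of the records covered by a and b,
        # scaled by the smaller of the two relevance values.
        cover_a = {i for i, x in enumerate(X) if all(x[j] for j in a)}
        cover_b = {i for i, x in enumerate(X) if all(x[j] for j in b)}
        union = len(cover_a | cover_b)
        if union == 0:
            return 0.0
        return (len(cover_a & cover_b) / union) * min(S[a], S[b])

    def mmrfs(patterns, S, X, y, delta, covers_correctly):
        # Algorithm 1: greedily pick the pattern with the largest gain
        # g(beta) = S(beta) - max_{alpha in Fs} R(alpha, beta), keep it only if
        # it correctly covers at least one instance, and stop once every
        # instance is covered delta times or the candidate pool is exhausted.
        remaining = set(patterns)
        first = max(remaining, key=lambda a: S[a])
        selected = [first]
        remaining.discard(first)
        coverage = [int(covers_correctly(first, X[i], y[i])) for i in range(len(X))]
        while remaining and any(c < delta for c in coverage):
            best = max(remaining,
                       key=lambda b: S[b] - max(redundancy(a, b, X, S) for a in selected))
            remaining.discard(best)
            hits = [i for i in range(len(X)) if covers_correctly(best, X[i], y[i])]
            if hits:
                selected.append(best)
                for i in hits:
                    coverage[i] += 1
        return selected

    # Toy usage: relevance is just support here; a real run would use
    # information gain or Fisher score (Definition 3) and a class-aware
    # notion of coverage.
    X = [[1, 1, 0], [1, 1, 0], [0, 1, 1], [0, 0, 1]]
    y = [1, 1, 0, 0]
    patterns = [frozenset({0, 1}), frozenset({1, 2}), frozenset({2})]
    S = {p: sum(all(x[j] for j in p) for x in X) / len(X) for p in patterns}
    covers = lambda p, x, label: all(x[j] for j in p)
    print(mmrfs(patterns, S, X, y, delta=1, covers_correctly=covers))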
4 Experimental Results

In this section, we report a systematic experimental study for the evaluation of our frequent pattern-based classification framework and our proposed feature selection algorithm, MMRFS.

A series of datasets from the UCI Machine Learning Repository are tested. Continuous attributes are discretized. We use FPClose [9] to generate closed patterns and the MMRFS algorithm to do the feature selection. LIBSVM [5] and C4.5 in Weka [21] are chosen as the two classification models. Each dataset is partitioned evenly into ten parts. Each time, one part is used for test and the other nine are used for training. We did 10-fold cross validation on each training set and picked the best model for test. The classification accuracies on the ten test partitions are averaged and reported.

4.1 Frequent Pattern-based Classification

We test the performance of frequent pattern-based classification. For each dataset, a set of frequent patterns F is generated. A classification model is built using features in I ∪ F, denoted as Pat_All. MMRFS is then applied on F and a classifier is built using features in I ∪ Fs, denoted as Pat_FS.
Table 1. Accuracy by SVM on Frequent Combined Features vs. Single Features

Data       Item_All  Item_FS  Item_RBF  Pat_All  Pat_FS
anneal     99.78     99.78    99.11     99.33    99.67
austral    85.01     85.50    85.01     81.79    91.14
auto       83.25     84.21    78.80     74.97    90.79
breast     97.46     97.46    96.98     96.83    97.78
cleve      84.81     84.81    85.80     78.55    95.04
diabetes   74.41     74.41    74.55     77.73    78.31
glass      75.19     75.19    74.78     79.91    81.32
heart      84.81     84.81    84.07     82.22    88.15
hepatic    84.50     89.04    85.83     81.29    96.83
horse      83.70     84.79    82.36     82.35    92.39
iono       93.15     94.30    92.61     89.17    95.44
iris       94.00     96.00    94.00     95.33    96.00
labor      89.99     91.67    91.67     94.99    95.00
lymph      81.00     81.62    84.29     83.67    96.67
pima       74.56     74.56    76.15     76.43    77.16
sonar      82.71     86.55    82.71     84.60    90.86
vehicle    70.43     72.93    72.14     73.33    76.34
wine       98.33     99.44    98.33     98.30    100
zoo        97.09     97.09    95.09     94.18    99.00

Table 2. Accuracy by C4.5 on Frequent Combined Features vs. Single Features

Data       Item_All  Item_FS  Pat_All  Pat_FS
anneal     98.33     98.33    97.22    98.44
austral    84.53     84.53    84.21    88.24
auto       71.70     77.63    71.14    78.77
breast     95.56     95.56    95.40    96.35
cleve      80.87     80.87    80.84    91.42
diabetes   77.02     77.02    76.00    76.58
glass      75.24     75.24    76.62    79.89
heart      81.85     81.85    80.00    86.30
hepatic    78.79     85.21    80.71    93.04
horse      83.71     83.71    84.50    87.77
iono       92.30     92.30    92.89    94.87
iris       94.00     94.00    93.33    93.33
labor      86.67     86.67    95.00    91.67
lymph      76.95     77.62    74.90    83.67
pima       75.86     75.86    76.28    76.72
sonar      80.83     81.19    83.67    83.67
vehicle    70.70     71.49    74.24    73.06
wine       95.52     93.82    96.63    99.44
zoo        91.18     91.18    95.09    97.09

(The Item_* columns use single features only; the Pat_* columns use single features plus frequent patterns.)

For comparison, we test classifiers built on single features, denoted as Item_All (using all single features) and Item_FS (using selected single features), respectively. Table 1 shows the results by SVM and Table 2 shows the results by C4.5. In LIBSVM, all of the above four models use a linear kernel. In addition, an SVM model is built using an RBF kernel on single features, denoted as Item_RBF.

From Table 1, it is clear that Pat_FS achieves the best classification accuracy in most cases. It shows significant improvement over Item_All and Item_FS. This result is consistent with our theoretical analysis that (1) frequent patterns are useful by mapping the data to a higher dimensional space, and (2) the discriminative power of some frequent patterns is higher than that of single features.

Another interesting observation is that the performance of Item_RBF is inferior to that of Pat_FS. The reason is that the RBF kernel has a different mechanism for feature generation from our approach. In our approach, min sup is used to filter out low-frequency features and MMRFS is applied to select highly discriminative features. In contrast, the RBF kernel maps the original feature vector to a possibly infinite dimension. The degree (i.e., the maximum length) of combined features depends on the value of γ, where γ is the factor in K(x, y) = exp(−γ‖x − y‖^2); i.e., the degree increases as γ grows. Given a particular γ, the combined features F^p of length ≤ p are used without discriminating their frequency or predictive power, while the combined features of length > p are filtered out.

We also observe that the performance of Pat_All is much worse than that of Pat_FS, which confirms our reasoning that redundant and non-discriminative patterns often overfit the model and deteriorate the classification accuracy. In addition, MMRFS is shown to be effective. Generally, any effective feature selection algorithm can be used in our framework. The emphasis is that feature selection is an important step in frequent pattern-based classification.

The above results are also observed in Table 2 for decision tree-based classification.

4.2 Scalability Tests

Scalability tests are performed to show that our frequent pattern-based framework is very scalable with good classification accuracy. Three dense datasets, Chess,
Waveform and Letter Recognition¹ from the UCI repository are used. On each dataset, min sup = 1 is used to enumerate all feature combinations and feature selection is applied over them. In comparison, the frequent pattern-based classification method is tested with various support threshold settings.

¹ The discretized Letter Recognition data is obtained from www.csc.liv.ac.uk/∼frans/KDD/Software/LUCS-KDD-DN/DataSets

Table 3. Accuracy & Time on Chess Data

min sup  #Patterns  Time (s)  SVM (%)  C4.5 (%)
1        N/A        N/A       N/A      N/A
2000     68,967     44.703    92.52    97.59
2200     28,358     19.938    91.68    97.84
2500     6,837      2.906     91.68    97.62
2800     1,031      0.469     91.84    97.37
3000     136        0.063     91.90    97.06

Table 4. Accuracy & Time on Waveform Data

min sup  #Patterns  Time (s)  SVM (%)  C4.5 (%)
1        9,468,109  N/A       N/A      N/A
80       26,576     176.485   92.40    88.35
100      15,316     90.406    92.19    87.29
150      5,408      23.610    91.53    88.80
200      2,481      8.234     91.22    87.32

Table 5. Accuracy & Time on Letter Recognition Data

min sup  #Patterns  Time (s)  SVM (%)  C4.5 (%)
1        5,147,030  N/A       N/A      N/A
3000     3,246      200.406   79.86    77.08
3500     2,078      103.797   80.21    77.28
4000     1,429      61.047    79.57    77.32
4500     962        35.235    79.51    77.42

In Table 3, we show the result of varying min sup on the Chess data, which contains 3,196 instances, 2 classes and 73 items. #Patterns gives the number of closed patterns. Time gives the sum of pattern mining and feature selection time. We do not include the classification time in the table because our goal is to show that the proposed framework has good scalability in feature generation and selection. The last two columns give the classification accuracy by SVM and C4.5. When min sup = 1, the enumeration of all the patterns cannot complete in days, thus blocking model construction. Our framework, benefiting from a higher support threshold, can accomplish the mining of frequent patterns in seconds and achieve satisfactory classification accuracy.

Tables 4 and 5 show similar results on the other two datasets. When min sup = 1, millions of patterns are enumerated. Feature selection fails with such a large number of patterns. In contrast, our frequent pattern-based method is very efficient and achieves good accuracy within a wide range of minimum support thresholds.

5 Related Work

Frequent pattern-based classification is related to associative classification. In associative classification, a classifier is built based on high-confidence, high-support association rules [14, 13, 25, 6, 19]. The association between frequent patterns and class labels is used for prediction.

A recent work on top-k rule mining [6] discovers top-k covering rule groups for each row of gene expression profiles. Prediction is then performed based on a classification score which combines the support and confidence measures of the rules.

HARMONY [19] is another rule-based classifier which directly mines classification rules. It uses an instance-centric rule-generation approach and ensures, for each training instance, that one of the highest-confidence rules covering the instance is included in the rule set. HARMONY is shown to be more efficient and scalable than previous rule-based classifiers. On several datasets that were tested by both our method and HARMONY, our classification accuracy is significantly higher, e.g., the improvement is up to 11.94% on Waveform and 3.40% on Letter Recognition.

Our work differs from associative classification in the following aspects: (1) we use frequent patterns to represent the data in a different feature space, in which any learning algorithm can be used, whereas associative classification builds a classification model using rules only; (2) in associative classification, the prediction process is to find one or several top-ranked rules for prediction, whereas in our case, the prediction is made by the classification model; and (3) more importantly, we provide in-depth analysis of why frequent patterns provide a good solution for classification, by studying the relationship between discriminative power and pattern support. By establishing a connection with an information gain-based feature selection approach, we propose a strategy for setting min sup as well. In addition, we demonstrate the importance of feature selection on the frequent pattern features and propose a feature selection algorithm.
Other related work includes classification which uses string kernels [15, 12], word combinations in NLP, or structural features in graph classification [7]. In all these studies, frequent patterns are generated and the data is mapped to a higher dimensional feature space. Data which are not linearly separable in the original space become linearly separable in the mapped space.

6 Conclusions

In this paper, we propose a systematic framework for frequent pattern-based classification and give theoretical answers to several critical questions raised by this framework. Our study shows that frequent patterns are high quality features and have good model generalization ability. Connected with a commonly used feature selection approach, our method is able to overcome two kinds of overfitting problems and is shown to be scalable. A strategy for setting min sup is also suggested. In addition, we propose a feature selection algorithm to select discriminative frequent patterns. Experimental studies demonstrate that significant improvement is achieved in classification accuracy using the frequent pattern-based classification framework.

The framework is also applicable to more complex patterns, including sequences and graphs. In the future, we will conduct research in this direction.

References

[1] R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In Proc. of SIGMOD, pages 207–216, 1993.
[2] R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proc. of VLDB, pages 487–499, 1994.
[3] R. Agrawal and R. Srikant. Mining sequential patterns. In Proc. of ICDE, pages 3–14, 1995.
[4] J. Carbonell and J. Goldstein. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Proc. of SIGIR, pages 335–336, 1998.
[5] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/∼cjlin/libsvm.
[6] G. Cong, K. Tan, A. Tung, and X. Xu. Mining top-k covering rule groups for gene expression data. In Proc. of SIGMOD, pages 670–681, 2005.
[7] M. Deshpande, M. Kuramochi, and G. Karypis. Frequent sub-structure-based approaches for classifying chemical compounds. In Proc. of ICDM, pages 35–42, 2003.
[8] R. Duda, P. Hart, and D. Stork. Pattern Classification. Wiley Interscience, 2nd edition, 2000.
[9] G. Grahne and J. Zhu. Efficiently using prefix-trees in mining frequent itemsets. In ICDM Workshop on Frequent Itemset Mining Implementations (FIMI'03), 2003.
[10] J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In Proc. of SIGMOD, pages 1–12, 2000.
[11] M. Kuramochi and G. Karypis. Frequent subgraph discovery. In Proc. of ICDM, pages 313–320, 2001.
[12] C. Leslie, E. Eskin, and W. S. Noble. The spectrum kernel: A string kernel for SVM protein classification. In Proc. of PSB, pages 564–575, 2002.
[13] W. Li, J. Han, and J. Pei. CMAR: Accurate and efficient classification based on multiple class-association rules. In Proc. of ICDM, pages 369–376, 2001.
[14] B. Liu, W. Hsu, and Y. Ma. Integrating classification and association rule mining. In Proc. of KDD, pages 80–86, 1998.
[15] H. Lodhi, C. Saunders, J. Shawe-Taylor, N. Cristianini, and C. Watkins. Text classification using string kernels. Journal of Machine Learning Research, 2:419–444, 2002.
[16] J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu. PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. In Proc. of ICDE, pages 215–226, 2001.
[17] J. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
[18] P. Tan, V. Kumar, and J. Srivastava. Selecting the right interestingness measure for association patterns. In Proc. of KDD, pages 32–41, 2002.
[19] J. Wang and G. Karypis. HARMONY: Efficiently mining the best rules for classification. In Proc. of SDM, pages 205–216, 2005.
[20] K. Wang, C. Xu, and B. Liu. Clustering transactions using large items. In Proc. of CIKM, pages 483–490, 1999.
[21] I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, 2nd edition, 2005.
[22] X. Yan and J. Han. gSpan: Graph-based substructure pattern mining. In Proc. of ICDM, pages 721–724, 2002.
[23] X. Yan, P. S. Yu, and J. Han. Graph indexing: A frequent structure-based approach. In Proc. of SIGMOD, pages 335–346, 2004.
[24] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proc. of ICML, pages 412–420, 1997.
[25] X. Yin and J. Han. CPAR: Classification based on predictive association rules. In Proc. of SDM, pages 331–335, 2003.
[26] M. J. Zaki. SPADE: An efficient algorithm for mining frequent sequences. Machine Learning, 42(1/2):31–60, 2001.
[27] M. J. Zaki and C. Hsiao. CHARM: An efficient algorithm for closed itemset mining. In Proc. of SDM, pages 457–473, 2002.
