
Combination of effective machine learning techniques and chemometric analysis for evaluation of Bupleuri Radix through high-performance thin-layer chromatography

Xiaoping Cheng (a), Hongmin Cai (a,*), Ping He (b), Yue Zhang (c) and Runtiao Tian (d)

(a) School of Computer Science and Engineering, South China University of Technology, Guangdong, P.R. China. E-mail: [email protected]
(b) Division of Science and Technology, Beijing Normal University-Hong Kong Baptist University United International College, Zhuhai, China
(c) School of Zhuhai, Jinan University, Zhuhai, China
(d) ChemMind Technologies Co., Ltd., Beijing, China

Cite this: Anal. Methods, 2013, 5, 6325. DOI: 10.1039/c3ay41132j
Received 9th July 2013; accepted 19th September 2013.

Chaihu (Bupleuri Radix), the root of Bupleurum chinense and B. scorzonerifolium, is a traditional Chinese herbal medicine authenticated in the Chinese Pharmacopoeia. There are also several variations available from local herbal markets, for example, the roots of B. falcatum, B. bicaule, and B. marginatum var. stenophyllum. In the current study, we collected 64 Chaihu samples, including 33 authenticated samples and 31 commercial samples. Test solutions of all the samples were analysed by high-performance thin-layer chromatography (HPTLC) to assess the principal bioactive components (saikosaponins). The HPTLC fluorescent images acquired were analyzed by sophisticated image processing techniques for comprehensive quantification. High-dimensional features for both gray-scale and true color images were constructed from the raw images. Classical classification algorithms, including naive Bayes, Support Vector Machine (SVM), K-nearest neighbors, a neural network and a logistic classifier, were used to construct prediction models. To gain insight into the principal components while evaluating the Chaihu samples, feature selection and ensemble feature selection methods were further combined with the classifiers to enhance the discrimination power. Ensemble feature selection was shown to achieve superior performance. Experimental results demonstrated that the roots of Chaihu from different species of the genus Bupleurum could be readily distinguished, so that commercial samples could be easily classified.

1 Introduction

Chaihu (Bupleuri Radix), the root of Bupleurum chinense and B. scorzonerifolium, is a commonly used Chinese herbal medicine and has been officially recorded in all editions of the Chinese Pharmacopoeia. In clinical practice in China, it is used in the treatment of fevers, the common cold, headaches, and liver disorders. It has also been employed to increase sweating, to prevent kidney problems, as a liver tonic, and as a spleen and stomach tonic.1 It has been recognized that the principal bioactive components of Chaihu are saikosaponins,2 with saikosaponins a and d being the dominant ones. In addition to the authentic species of Chaihu such as B. chinense, B. scorzonerifolium and B. falcatum, there are more than 20 other species of the genus Bupleurum which are also habitually used as Chaihu in China's markets.3,4 However, most of these variants are of inferior quality compared with authenticated Chaihu.5 Therefore, an automatic and effective method is needed to evaluate the quality of these different species of Chaihu. It has been shown in ref. 6 and 7 that HPTLC8 is more effective in assessing the therapeutic power of herbal medicines than conventional methods such as high-performance liquid chromatography (HPLC)9 and thin-layer chromatography (TLC). It allows higher resolution and thus more accurate quantitative measurements than TLC. With the aid of modern techniques in artificial intelligence, automatic quality assessment systems for herbal medicine with high accuracy have recently been reported.10,11

In the current study, thirty-three lots of authenticated samples and thirty-one lots of commercial Chaihu were collected and analyzed by HPTLC to obtain a pictorial description. The purpose of this study was to design and assess an effective discrimination system based on classical machine learning techniques to achieve fully automatic quality evaluation. To achieve this goal, more than 500 attributes were calculated from HPTLC fluorescence images to quantify their chemical profiles comprehensively. Classical classification tools, including naive Bayes (NB),12 the support vector machine (SVM),13 K-nearest neighbors (KNN),14 a radial basis function neural network (RBF-NN)15 and a logistic classifier,16 were used to discriminate authentic Chaihu from fake ones.17 To further enhance the discrimination power, ensemble feature selection and various feature selection mechanisms were combined with the classifiers. Extensive experiments on the performance of this combination of classification tools and HPTLC are reported and analyzed. The current study demonstrates that the combination of advanced machine learning techniques and HPTLC can assess the quality of different species of Chaihu in an accurate and effective way.
2 Experimental

2.1 HPTLC experimental samples

Sixty-four batches of Chaihu samples were collected from different herbal markets or harvested from various habitats. Among them, thirty-one samples, including B. chinense, B. scorzonerifolium, B. falcatum, B. longiradiatum, B. bicaule and B. marginatum var. stenophyllum, were authenticated by the botanists Prof. Z. D. Wang of Henan Science & Technology University, China, and Prof. D. Q. Wang of Anhui University of Traditional Chinese Medicine, China.

2.2 HPTLC experiment setup

The chemical reagents for the experiment were obtained from the Guangzhou Chemical Reagent Factory (Guangzhou, China). Chemical reference standards of saikosaponin a and saikosaponin d were provided by the National Institute for the Control of Pharmaceutical and Biological Products (Beijing, China). Chemical references of saikosaponin c, saikosaponin f and saikosaponin b2 were provided by Henan College of Traditional Chinese Medicine, China.

The experimental procedure for the preparation of the HPTLC fingerprint was as follows:

(1) Preparation of sample solution: A 0.3 g portion of powdered herb was added to 20 mL of a solution of 0.5% pyridine in methanol to prevent the degradation of saikosaponins a and d. The mixture was refluxed twice in a water bath at 80 °C for 30 minutes and filtered afterwards. The filtrate was evaporated to dryness in a fume cupboard and reconstituted in 3 mL of water before the suspension was applied to a C18 cartridge. After elution with 10 mL of 30% methanol and 20 mL of 80% methanol, successively, the 80% methanol fraction was evaporated to dryness and the residue was dissolved in 2 mL of methanol. The solution was subsequently filtered through a 0.45 µm membrane filter before analysis.

(2) Preparation of reference solutions: A 5 mg portion of each saikosaponin reference was dissolved in 5 mL of methanol.

(3) HPTLC chromatographic conditions: The sample solutions were applied as bands via an ATS4 auto-sampler (CAMAG, Muttenz, Switzerland) onto a commercial 20 cm × 10 cm pre-coated HPTLC Silica gel 60 plate (Merck). The sample plate was placed into a desiccator with phosphorus pentoxide and dried under vacuum for 2 hours before development. Fifteen milliliters of mobile phase consisting of dichloromethane–ethyl acetate–methanol–water (30 : 40 : 15 : 3, v/v/v/v) was added into a twin-trough chamber to saturate it for 15 minutes. The plate in the chamber was developed upward over a path of 8 cm and then sprayed with DMAB reagent and heated at 105 °C on a TLC plate heater (CAMAG) until the colour of the saponins was distinct. The fluorescent images were examined at 365 nm by using a UV viewer cabinet (CAMAG). The images were captured by a Digistore 2 documentation system (CAMAG). The excitation wavelength was 366 nm in the reflection mode and the exposure time was 3 seconds.

A sample image obtained following the aforementioned procedures is shown in Fig. 1(a).

Fig. 1 Demonstration of preprocessing of HPTLC fingerprint images. (a) A raw HPTLC fingerprint image; (b) gray-scale transformed image with histogram equalization; (c) image after alignment to have uniform size.

3 Pattern analysis for HPTLC

To obtain an effective discrimination system with machine learning techniques, pattern analysis was essential for our study. The procedures are depicted in Fig. 2.

Fig. 2 The framework for pattern analysis.

3.1 HPTLC fingerprint image preprocessing

The raw HPTLC fluorescence images have to be preprocessed to standardize the data in order to prevent any side-effects arising from the experiment, such as image shifting and nonuniform lighting. An example is shown in Fig. 1. The proposed preprocessing method consists of two steps. In the first step, the raw image is converted into a gray-scale or true color image, and a noise suppression scheme is applied to enhance the image quality. This step facilitates feature extraction and is used for quantification of the image. In the second step, the denoised images are aligned manually so that the tested images are the same size. The head and tail portions of each standardized image were removed, because these regions were imperfect owing to nonuniform lighting.
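The preprocessing described above (gray-scale conversion, noise suppression, and trimming of the unevenly lit head and tail regions) can be illustrated with a minimal sketch. The paper's pipeline was implemented in Matlab and the manual alignment step is omitted here; the crop size and the NumPy/SciPy rendering below are illustrative assumptions, not the authors' code.

import numpy as np
from scipy.ndimage import median_filter

def preprocess_hptlc(rgb, crop_rows=20):
    """Standardize a raw HPTLC fingerprint image given as an (H, W, 3) uint8 array."""
    # Gray-scale conversion with the usual luminance weights.
    gray = rgb @ np.array([0.299, 0.587, 0.114])
    # Histogram equalization to reduce the effect of nonuniform lighting.
    hist, bin_edges = np.histogram(gray.ravel(), bins=256, range=(0.0, 255.0))
    cdf = hist.cumsum() / hist.sum()
    equalized = np.interp(gray.ravel(), bin_edges[:-1], cdf * 255.0).reshape(gray.shape)
    # Simple noise suppression.
    denoised = median_filter(equalized, size=3)
    # Drop the imperfect head and tail rows (crop size is an assumption).
    return denoised[crop_rows:-crop_rows, :]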
3.2 HPTLC image feature construction

3.2.1 Feature calculation. Since the intensities of the preprocessed images show band-wise variations, their averaged intensity profile can be used to quantify the variation. To be specific, the peak and valley values along the profile at particular positions were estimated as feature values. In Fig. 3, the detected peaks and valleys are plotted as stars. In addition, each bar in the tested image is made up of about 12 pixels, and thus the statistical characteristics of each bar were also calculated to serve as feature values. In our experiment, a bar area was defined to be a block with a width of 12 pixels.

Fig. 3 Peaks and valleys of gray intensities were estimated from the preprocessed image. Red and green stars differentiate the peaks and the valleys of the gray values. The relative position and the amplitude of the peaks and the valleys were recorded to serve as features for quantification of the HPTLC image.
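The feature calculation of Section 3.2.1 amounts to locating peaks and valleys on the averaged intensity profile and summarizing fixed-width bar areas. The sketch below, with an assumed lane orientation and an illustrative choice of per-bar statistics, is one possible rendering of that idea rather than the authors' exact feature set.

import numpy as np
from scipy.signal import find_peaks

def profile_features(gray, bar_width=12):
    """Peak/valley and per-bar statistics of a preprocessed HPTLC lane image."""
    # Average across the lane width to obtain the band-wise intensity profile
    # (the axis depends on the image orientation).
    profile = gray.mean(axis=1)
    peaks, _ = find_peaks(profile)      # local maxima: chromatographic bands
    valleys, _ = find_peaks(-profile)   # local minima: gaps between bands
    features = []
    for idx in np.concatenate([peaks, valleys]):
        # Relative position and amplitude, as in Fig. 3.
        features.extend([idx / len(profile), profile[idx]])
    # Statistical characteristics of each 12-pixel bar area.
    for b in range(len(profile) // bar_width):
        block = profile[b * bar_width:(b + 1) * bar_width]
        features.extend([block.mean(), block.std(), block.max(), block.min()])
    return np.array(features)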
3.2.2 Feature selection. Based on the aforementioned feature calculation method, four feature sets were obtained. The first feature set had a dimension of 504, containing all of the features extracted. The second feature set had a dimension of 258 after removing the statistical characterization of the true color image. The third feature set had a dimension of 176 after removing both the bar characterization of the gray and color images. The fourth, with 53 features, contained only the characterization of the gray images, including the numbers and densities of both the peaks and valleys and the statistical characterization of the chromatographic bands.

The obtained feature sets were then processed by classical feature selection schemes to enhance the performance of the classifier by removing redundant features. Four standard methods of feature selection were used in our experiments because of their good performance: Correlation-based Feature Selection (CFS),18 Chi-square feature evaluation,19 Gain Ratio (GR) attribute evaluation20 and the RELIEF feature selection method.21 A short summary of the tested selection schemes follows:

- CFS evaluates the worth of a subset of attributes by considering the individual predictive ability of each feature along with the degree of redundancy between them.
- Chi-square based feature selection evaluates the worth of an attribute by computing the value of the chi-squared statistic with respect to the class.
- GR attribute evaluation estimates feature importance by computing an information gain measure for a candidate feature subset.
- The RELIEF algorithm weights features iteratively, adjusting feature importance according to their ability to discriminate between neighboring patterns by maximizing an expected margin through scaling of features.

During our experiments, the chosen feature subsets were those which provided the most satisfactory results; a brief sketch of this ranking step is given below.
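As a rough illustration, scikit-learn's univariate selectors can play the role of the Weka evaluators used in the paper: chi-square is available directly, while mutual information is used below as a stand-in for Gain Ratio (CFS and RELIEF have no direct scikit-learn equivalent). The data here are placeholders.

import numpy as np
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif

rng = np.random.default_rng(0)
X = rng.random((64, 258))    # placeholder for the 258-dimensional feature set II
y = rng.integers(0, 2, 64)   # placeholder authentic/commercial labels

# Chi-square ranking requires non-negative features (e.g. min-max scaled intensities).
chi2_sel = SelectKBest(chi2, k=20).fit(X, y)
# Mutual information as an entropy-based stand-in for the Gain Ratio evaluator.
mi_sel = SelectKBest(mutual_info_classif, k=20).fit(X, y)

print("chi2 keeps feature indices:", np.flatnonzero(chi2_sel.get_support()))
print("MI keeps feature indices:  ", np.flatnonzero(mi_sel.get_support()))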
3.2.3 Ensemble feature selection. To make use of the merits characterized by different features, an ensemble feature selection technique was applied to find the compact but rich information conveyed by feature subsets. Ensemble feature selection22,23 is a popular supervised machine learning method. It first generates an ensemble of classifiers comprising several base classifiers and then integrates their class predictions by a voting strategy. Traditional feature selection algorithms have the goal of finding the best feature subset, which is bound by both the learning task and the selected inductive learning algorithm, whereas ensemble feature selection has the additional goal of finding a set of feature subsets that promotes disagreement among the base classifiers. Ho22 has shown that simple random selection of feature subsets may be an effective technique for ensemble feature selection. In this method, one randomly selects N* < N features from the N-dimensional training set T and obtains a new N*-dimensional random subspace of the original N-dimensional feature space.
This strategy is repeated k times to build k feature subsets, which are then used to construct k base classifiers. Finally, an integrated decision rule is obtained by combining the base classifiers. The integration of an ensemble of classifiers has often been shown to achieve higher accuracy than the most accurate base classifier alone in different real-world tasks. In this study, the ensemble feature selection technique consists of two steps: learning and integration. In the learning phase, based on the simple random selection method and the Libsvm24 tool, an ensemble of k base classifiers b1, b2, ..., bk is generated. In the integration phase, class predictions are obtained from the ensemble of base classifiers. This is achieved through three steps: (1) elimination of base classifiers with low classification accuracy; (2) removal of base classifiers that have identical predictions; (3) integration of the remaining base classifiers with the criterion

    p̃ = F( Σ_m u_m v_m )    (1)

where p̃ is the predicted result after integration, v_m is the predicted result of base classifier b_m, and u_m is its prediction accuracy. The piecewise linear function F(·) is a threshold (step) function defined by F(x) = 1 if x > 0.5 and F(x) = 0 otherwise. Therefore, the integration scheme favours the base classifiers with high prediction accuracy.
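A sketch of the random subspace construction and the integration rule of eqn (1) is given below. scikit-learn's SVC (a wrapper of the same LIBSVM library) stands in for the Libsvm tool, the data are placeholders, the vote weights are normalized so that the 0.5 threshold is meaningful, and the pruning of low-accuracy or duplicate base classifiers (steps 1 and 2) is omitted.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.random((64, 258))    # placeholder feature matrix
y = rng.integers(0, 2, 64)   # placeholder labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)

k, n_sub = 15, 30            # 15 base classifiers, each on a 30-feature random subspace
subspaces, models, weights = [], [], []
for _ in range(k):
    cols = rng.choice(X.shape[1], size=n_sub, replace=False)   # random feature subset
    clf = SVC(kernel="poly", degree=2).fit(X_tr[:, cols], y_tr)
    subspaces.append(cols)
    models.append(clf)
    weights.append(clf.score(X_tr[:, cols], y_tr))   # u_m: accuracy of base classifier m
    # (The paper additionally prunes low-accuracy and duplicate base classifiers; omitted here.)

# Integration rule of eqn (1): weighted vote, thresholded at 0.5.
votes = np.array([m.predict(X_te[:, c]) for m, c in zip(models, subspaces)])  # v_m in {0, 1}
w = np.array(weights) / np.sum(weights)   # normalized so the weighted vote lies in [0, 1]
p_tilde = (w @ votes > 0.5).astype(int)
print("ensemble accuracy:", np.mean(p_tilde == y_te))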

3.3 Classication schemes


The classication was conducted by using classical techniques
including naive Bayes,12 support vector machine (SVM)13 via
SMO25 optimization and SVM,24 K-nearest neighbors (KNN),14
radial basis function neural network (RBF-NN)15 and logistic.16
For all of the classiers used in this study, hyper-parameters
were estimated via 10-fold cross validation. The coding and
simulation experiments were conducted within Matlab
(Mathworks Co., Ltd., Boston, MA, USA) and Weka (http://
www.cs.waikato.ac.nz/ml/weka/), an open source soware for
Fig. 4 Experimental results of the classification accuracy of different classifier data mining. To eliminate the statistical variations, we con-
schemes for different feature sets. Based on different technology for feature
ducted ten experiments independently on each feature set and
selection, the best performance for each classifier was chosen. The ensemble
feature selection method combed with the Libsvm tool achieved significant averaged the classication accuracy to determine the perfor-
results. mance of a particular feature set. The classiers used are briey
described below.
3.3.1 Naive Bayes (NB) classier. The naive Bayes classi-
This strategy is repeated k times to build k feature subsets which er12 is obtained based on the famous Bayesian theorem and is
are then used to construct k base classiers. Finally, an inte- particularly suited when the dimensionality of the inputs is
grated decision rule is obtained by combining several base high. Despite its simplicity, naive Bayes can oen outperform
classiers. The integration of an ensemble of classiers has many sophisticated classication methods.26 Depending on the
oen been shown to achieve higher accuracy than the most precise nature of the probability model, naive Bayes classiers
accurate base classier alone in different real-world tasks. In can be trained very efficiently in a supervised learning setting.
this study, the ensemble feature selection technique consists of 3.3.2 Support vector machine (SVM). Support vector
two steps: learning and integration. In the learning phase, machine (SVM)13 is a famous machine learning technology,
based on the simple random selection method and the Libsvm24 whose main aim is to construct a hyper separation plane by
tool, an ensemble of k base classiers b1, b2,., bk is generated. maximizing a distance margin. As for the linear inseparable
In the integration phase, one achieves class predictions via problem, the observed sample is rstly converted into a highly

6328 | Anal. Methods, 2013, 5, 6325–6330 This journal is ª The Royal Society of Chemistry 2013
SMO accelerates the solving procedure by breaking a large quadratic programming (QP) problem down into a series of smaller QP problems. SMO improves the scaling and reduces the computation time significantly by utilizing the smallest possible QP problems.

In ensemble feature selection, Libsvm is widely used as an efficient SVM tool. There are two steps involved in LIBSVM: (1) the dataset is trained to obtain a model; (2) the model is used to make predictions for the testing dataset. In this paper, a polynomial kernel was used.

3.3.3 K-Nearest neighbors (KNN) classifier. The K-nearest neighbors algorithm14 was implemented for pattern recognition by using the so-called weighted vote formula to predict the herbal species based on the spatial distances between observation and target vectors.

3.3.4 Radial basis function neural network (RBF-NN) classifier. The neural network15 based on radial basis functions is an efficient feed-forward neural network.27 It has the best-approximation property and global optimum characteristics, which other feed-forward networks do not have, together with a simple structure and a fast training speed.

3.3.5 Logistic classifier. Logistic16 is a classifier for building and using a multinomial logistic regression model with a ridge estimator.

3.3.6 Principal component analysis (PCA).28 Principal component analysis (PCA) is widely adopted as a preprocessing procedure which uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables, called principal components. In many analysis cases, the number of principal components which account for most of the variance in the observed variables is significantly smaller than the number of original variables.29 In our study, the correlations among the features of the HPTLC images are high, and thus PCA is expected to achieve good performance by reducing redundant features and improving classification performance.
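For orientation, the classifiers of Sections 3.3.1-3.3.5 can be compared under 10-fold cross-validation roughly as follows. The paper used Weka and Matlab; here scikit-learn equivalents are assumed, with MLPClassifier standing in for the RBF network, and the data and hyper-parameters are placeholders.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.random((64, 258))    # placeholder feature matrix (feature set II)
y = rng.integers(0, 2, 64)   # placeholder labels

classifiers = {
    "NB": GaussianNB(),
    "SMO (SVM)": SVC(kernel="poly", degree=2),
    "RBF-NN (MLP stand-in)": MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000),
    "KNN": KNeighborsClassifier(n_neighbors=3, weights="distance"),
    "Logistic": LogisticRegression(max_iter=1000),
}

for name, clf in classifiers.items():
    # Standardize the features, then estimate accuracy by 10-fold cross-validation.
    scores = cross_val_score(make_pipeline(StandardScaler(), clf), X, y, cv=10)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")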
4 Experimental results

In this section, we demonstrate the performance of the various classifiers on the four feature subsets by combining different feature selection and ensemble feature selection methods. The purpose of this section is to show that fully automated classification models can achieve high accuracy in discriminating authentic Chaihu samples from fake ones when the raw images are characterized by accurate quantitative measurements.

In the first experiment, the four feature sets were processed independently by PCA to obtain an economical representation by discarding 5% of the least informative components. The resulting feature representation was then combined with various classifiers to evaluate their discrimination power. The main experimental results are summarized in Table 1. The overall performance of the second feature set was superior to that of the other feature sets, as it reached an accuracy of 90%. Since the color information was omitted in the second feature set, the results imply that removal of the color information can enhance the discrimination power. The possible reason is that the color information may be inaccurate owing to imperfect image acquisition, such as non-uniform lighting. A similar observation was made in the second experiment, in which the classification performance of the first feature set was vastly inferior to that of the latter ones. Another point worth noting is that the classification accuracy after PCA did not show obvious improvement over that without PCA preprocessing, possibly because of the moderate feature dimensions.

Table 1 Classification accuracy (%) of various classifiers on each feature subset with/without PCA processing. The overall performance of the second feature subset was the best; processing by PCA did not obviously enhance the classification performance.

Feature subset (#features)   PCA    NB     SMO    RBF-NN  KNN    Logistic  Average
I (504)                      No     79.69  93.75  82.81   85.94  85.94     85.62
I (504)                      Yes    82.81  84.38  85.94   87.5   81.25     84.38
II (258)                     No     82.81  89.06  90.63   92.19  87.5      88.49
II (258)                     Yes    82.81  87.5   92.19   85.94  90.63     87.81
III (176)                    No     75     89.06  81.25   87.5   81.25     82.81
III (176)                    Yes    82.81  84.38  87.5    85.94  87.5      85.63
IV (53)                      No     81.25  85.94  87.5    85.94  76.56     83.44
IV (53)                      Yes    81.25  84.38  81.25   90.63  76.56     82.81
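The PCA preprocessing of the first experiment can be sketched as a pipeline step that keeps the components explaining 95% of the variance, in the spirit of discarding the 5% least informative components (the paper's exact criterion may differ); the data and the downstream classifier are placeholders.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((64, 504))    # placeholder for the 504-dimensional feature set I
y = rng.integers(0, 2, 64)   # placeholder labels

pipe = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95),              # keep components explaining 95% of the variance
    KNeighborsClassifier(n_neighbors=3),
)
print("accuracy with PCA:", cross_val_score(pipe, X, y, cv=10).mean())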
In the second experiment, various feature selection methods were first applied to the four feature subsets. The resulting feature subsets were then fed into various classification algorithms, including the naive Bayes, SVM, RBF-NN, KNN and logistic classifiers. The averaged performance of each classifier on the different feature subsets is summarized in Table 2. The performance of the second feature set was the best in most cases, and the results were similar to those of the first experiment. Furthermore, the performance of the classifiers increased dramatically after feature selection processing. For example, the naive Bayes classifier reached an accuracy of 93.75%, in comparison with 82.81% achieved without feature selection. The accuracy of KNN reached 95.31% with feature selection, while 92.19% was achieved without processing. The good performance implies that high accuracy can be obtained by removing redundant information from the feature set.

Table 2 Classification accuracy (%) of the combinations of feature selection methods with various classifiers. The overall accuracy of the second feature subset was far better than that of the other feature subsets.

Feature set  Method   #Features  NB     SMO    RBF-NN  KNN    Logistic  Average
I            CFS      134        78.13  82.81  84.38   82.81  76.56     80.94
I            Chi      304        75     87.5   81.25   85.94  79.69     81.88
II           GR       20         93.75  85.94  90.63   95.31  89.06     90.94
II           ReliefF  15         87.5   84.38  90.63   92.19  82.81     87.5
III          CFS      23         78.13  82.81  84.38   85.94  84.38     83.13
III          FSE      22         78.13  84.38  82.81   85.94  85.94     83.44
IV           GR       9          79.69  87.5   84.38   84.38  81.25     83.44
IV           ReliefF  12         81.25  81.25  85.94   82.81  76.56     81.56
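The selector-plus-classifier combinations of Table 2 can be mimicked by nesting the selector inside a cross-validated pipeline, so that features are re-ranked on each training fold and no information leaks from the test fold. Mutual information with k = 20 below loosely mirrors the 20-feature GR subset; it is an assumption, not the paper's Weka setup, and the data are placeholders.

import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.pipeline import make_pipeline
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((64, 258))    # placeholder feature set II
y = rng.integers(0, 2, 64)   # placeholder labels

# The selector is fitted inside each training fold, avoiding selection bias.
pipe = make_pipeline(
    SelectKBest(mutual_info_classif, k=20),
    KNeighborsClassifier(n_neighbors=3, weights="distance"),
)
print("accuracy with selection:", cross_val_score(pipe, X, y, cv=10).mean())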
In the third experiment, the ensemble feature selection method with SVM as the base classifier was tested on the four feature sets and achieved remarkable results. In our experiment, fifteen base classifiers were first constructed from randomly selected features. Extensive experiments were conducted to search for a feature subset with good performance, and the performance of the classifier was evaluated via 10-fold cross-validation. The experimental results are summarized in Table 3. The second and fourth feature sets produced the optimal and suboptimal results. Similar to the previous two experiments, classification of the second feature set was the best, and an accuracy of 95.3% was achieved.

Table 3 Performance of the ensemble feature selection technique combined with the Libsvm tool on the four feature sets. The overall accuracy of the second feature set is slightly superior to that of the other sets.

Feature set (#features)  #Selected features  Classification accuracy (%)
I (504)                  40                  87.5 ± 3.1
II (258)                 30                  95.3 ± 1.6
III (176)                30                  90.6 ± 3.1
IV (53)                  20                  93.8 ± 3.1

In order to compare the performance of the different classifiers with the different feature selection technologies, Fig. 4 was plotted. As shown in Fig. 4, the ensemble feature selection method combined with the Libsvm tool achieved a significantly superior classification accuracy in comparison with the other methods.

Fig. 4 Classification accuracy of the different classifier schemes for the different feature sets. For each classifier, the best performance over the feature selection technologies was chosen. The ensemble feature selection method combined with the Libsvm tool achieved significantly better results.

5 Conclusion

HPTLC has been shown to be promising for the development of chromatographic fingerprint profiling methods to characterize complex herb extracts. The pictorial nature of an HPTLC image provides extra, intuitively visible measurements for assessing its chemical characteristics. However, quantitative image analysis of HPTLC remains an open problem, as does its clinical potential. Besides, varying contents of saikosaponins were observed among different samples of Chaihu species, which calls not only for assessing the clinical quality by analyzing the multiple marker components individually but also for recognizing the entire fingerprint pattern for consistency assurance and authentication purposes.

In the current study, various techniques for machine learning and image analysis were combined to evaluate the chemical quality of Bupleuri Radix through HPTLC. Four inherent feature subsets were first derived to quantify the pictorial characteristics of the HPTLC images. In order to test the discrimination potential of the derived features, various standard machine learning schemes were used. Various feature selection methods, including filter schemes and an ensemble scheme combined with advanced classifiers, were applied to assess the fingerprint pattern. Experimental results have confirmed the high accuracy in discriminating various samples of Chaihu species. This study has revealed a promising way to classify the intrinsic inconsistency of herbal quality when the distribution of principal ingredients in this herb varies from one batch to another.

Acknowledgements

This work was supported by NSFC under award numbers 60902076 and 61372141, and by the Fundamental Research Funds for the Central Universities under award number 2013ZM0079.
References

1 C. P. Commission, Pharmacopoeia of the People's Republic of China, Chemical Industry Press, 2011, vol. 1, pp. 196–197.
2 Z. H. Su, S. Q. Li and G. A. Zou, J. Pharm. Biomed. Anal., 2011, 55, 533–539.
3 J. P. Committee, The Japanese Pharmacopoeia, Ministry of Health, Japan, Tokyo, 2000, vol. 1, pp. 876–878.
4 C. P. Commission, Pharmacopoeia of the People's Republic of China, Peoples Health Publishing House, 1963, vol. 1, pp. 237–238.
5 P. Xiao, Modern Chinese Materia Medica, Chemical Industry Press, 2002, vol. 1, pp. 784–785.
6 S. B. Chen, H. P. Liu and R. T. Tian, J. Chromatogr., A, 2006, 2, 114–119.
7 R. T. Tian, P. S. Xie and H. P. Liu, J. Chromatogr., A, 2009, 18, 2150–2155.
8 A. Zlatkis and R. Kaiser, HPTLC: High Performance Thin-Layer Chromatography, Elsevier, 1977, vol. 6, pp. 95–126.
9 J. Tamaoka and K. Komagata, FEMS Microbiol. Lett., 1984, 25, 125–128.
10 P. Torrione, K. D. Morton and L. Collins, Chemometrics and Machine Learning for Spectral Analysis, Optical Society of America, 2012, vol. 1, pp. 3–10.
11 Y. Q. Wang, H. X. Yan, R. Guo and F. F. Li, Int. J. Data Min. Bioinf., 2011, 5, 369–382.
12 G. John and P. Langley, Estimating Continuous Distributions in Bayesian Classifiers, Morgan Kaufmann, 1995, vol. 3, pp. 338–345.
13 C. Cortes and V. Vapnik, Mach. Learn., 1995, 3, 273–297.
14 D. W. Aha, D. Kibler and M. K. Albert, Instance-Based Learning Algorithms, Springer, Netherlands, 1991, vol. 1, pp. 37–66.
15 A. Guillén, I. Rojas and González, Neural Process. Lett., 2007, 25, 209–225.
16 S. L. Cessie and J. C. V. Houwelingen, J. Appl. Stat., 1992, 2, 191–201.
17 Q. Huang, Y. Zhuang, X. B. Qiao and X. J. Xu, Acta Phys.-Chim. Sin., 2007, 23, 1141–1145.
18 A. L. Blum and P. Langley, Artif. Intell., 1997, 97, 245–271.
19 H. Liu and R. Setiono, Chi2: Feature Selection and Discretization of Numeric Attributes, IEEE Computer Society, Herndon, Virginia, 1995, vol. 2, pp. 388–391.
20 J. W. Han and M. Kamber, Data Mining: Concepts and Techniques (The Morgan Kaufmann Series in Data Management Systems), Morgan Kaufmann, 1st edn, 2000, vol. 2, pp. 179–220.
21 K. Kira and L. A. Rendell, A Practical Approach to Feature Selection, Morgan Kaufmann Publishers Inc., 1992, vol. 2, pp. 249–256.
22 T. K. Ho, IEEE Trans. Pattern Anal. Mach. Intell., 1998, 20, 832–844.
23 A. Tsymbal, S. Puuronen and D. W. Patterson, Inf. Fusion, 2003, 4, 87–100.
24 C. C. Chang and C. J. Lin, ACM Trans. Intell. Syst. Technol., 2011, 2, 1.
25 Advances in Kernel Methods, ed. B. Schölkopf, C. J. C. Burges and A. J. Smola, MIT Press, Cambridge, MA, USA, 1999, vol. 2, pp. 185–208.
26 H. Wang, A Computerized Diagnostic Model Based on Naive Bayesian Classifier in Traditional Chinese Medicine, IEEE Computer Society, 2008, vol. 1, pp. 474–477.
27 K. Priddy and P. Keller, Artificial Neural Networks: An Introduction, Society of Photo Optical, 2005, vol. 2, pp. 205–234.
28 H. Abdi and L. J. Williams, Wiley Interdiscip. Rev.: Comput. Stat., 2010, 2, 433–459.
29 C. Y. Wang, Z. Y. Chen, C. G. Wu and Y. C. Liang, Medicine Composition Analysis Based on PCA and SVM, Springer, 2005, vol. 9, pp. 1226–1230.
