
An SVM Based Scoring Evaluation System for Fluorescence Microscopic Image Classification

Dongyun Lin∗, Zhiping Lin∗§, Shakeela Sothiharan†, Lei Lei†, Jingbo Zhang‡
∗ School of Electrical & Electronic Engineering, Nanyang Technological University, Singapore
† Water Optics Technology Pte. Ltd., Singapore
‡ AEBC, Nanyang Environment & Water Research Institute, Nanyang Technological University, Singapore
§ E-mail: [email protected]

Abstract—This paper proposes a scoring evaluation system for fluorescence microscopic image classification based on the support vector machine (SVM). We define similarity scores for each testing sample based on its relative distances to the SVM separating hyperplanes and to the training clustering centers in the feature space. The proposed method calculates similarity scores through a two-stage process that converts the SVM's classification results into a quantitative description. The scores can precisely reflect how similar a testing sample is to all the categories and provide a reference for further investigation of fluorescence microscopic images.

Keywords—similarity score; support vector machine (SVM); scale-invariant feature transform (SIFT); nearest neighbor

I. INTRODUCTION

It is critical to determine the location of a protein at the subcellular level, which reveals its specific function, sequence and structure. The common method to acquire subcellular location information is visually evaluating fluorescence microscopic images, which can be obtained by applying monoclonal antibodies against the relevant endogenous protein [1]. The identification of microscopic images through human visual interpretation lacks efficiency and accuracy. Moreover, even experts can hardly give an accurate numerical description of how similar a sample image is to all the categories.

In the literature, the majority of methods for subcellular fluorescence microscopic image classification apply hard decision classifiers (neural networks [1, 2] or the support vector machine [3, 4]) trained on selected features such as Zernike moments combined with Haralick texture features [1, 2], local binary pattern descriptors [3], etc. These methods are able to achieve highly accurate classification performance. However, hard decision classifiers provide little information for evaluating possible misclassification cases or for further investigation after classification.

A similarity score is a measure of correspondence between two images: if the similarity measure is maximal, the two images are considered the most correlated. Various similarity measures have been formulated over the years [5–10]. In recent years, similarity measures combined with hard decision classifiers have become increasingly popular because of their extensive generalization ability, which can be used to describe the similarity between a specific sample and a general category. SVM based similarity measures have been applied to human face matching [11–13], text string similarity [14], etc.

In this paper, we use the scale-invariant feature transform (SIFT) feature to train an SVM and evaluate the similarity scores of a testing sample based on the relative L2 norm distances to the training patterns and to the separating hyperplanes in the feature space. Experiments on the Chinese Hamster Ovary (CHO) dataset [2] show the improved performance of our system compared with traditional image similarity evaluation methods based on the Pearson correlation coefficient [15] and the nearest neighbor technique [16].

II. METHOD DESCRIPTION

A. Feature Selection

To enhance the discriminative performance, we represent the image information by Lowe's SIFT feature with patch size 16×16, which is proved invariant to the scale, illumination intensity and rotation of images [17]. The popular bag-of-words model with the spatial pyramid match (SPM) kernel is applied to incorporate spatial geometry correspondence information: it subdivides the image into uniform grids and calculates the SIFT features in the subimages [18]. In our application, we manually choose the number of levels of the spatial pyramid as 3 and build 1000 "visual vocabularies" for the bag-of-words model using the K-means clustering algorithm. Based on this codebook, we calculate a histogram over all these "visual vocabularies" in each grid cell. The final feature representation is formed by concatenating the histograms into a high dimensional vector. This feature selection procedure converts the pixel information into the bag-of-words histogram representation, which has proved successful in natural scene/object identification problems [18–20].
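As a rough illustration of the feature pipeline above, the sketch below computes dense SIFT descriptors on 16×16 patches, quantizes them against a 1000-word codebook, and concatenates per-cell histograms over a 3-level spatial pyramid. The paper uses the VLFEAT package; here OpenCV and scikit-learn stand in, the per-level weighting of the pyramid match kernel is omitted, and all function names are illustrative rather than taken from the paper.

```python
# Simplified sketch of Section II-A: dense SIFT -> 1000-word codebook ->
# 3-level spatial pyramid histogram (approximation, not the VLFeat pipeline).
import cv2
import numpy as np
from sklearn.cluster import KMeans

PATCH = 16    # SIFT patch size from the paper
LEVELS = 3    # pyramid levels 0..2 -> 1x1, 2x2, 4x4 grids
VOCAB = 1000  # number of "visual vocabularies"

sift = cv2.SIFT_create()

def dense_sift(gray):
    """Compute SIFT descriptors on a regular grid of 16x16 patches."""
    h, w = gray.shape
    kps = [cv2.KeyPoint(float(x), float(y), float(PATCH))
           for y in range(PATCH // 2, h, PATCH)
           for x in range(PATCH // 2, w, PATCH)]
    kps, desc = sift.compute(gray, kps)
    locs = np.array([kp.pt for kp in kps])   # (n, 2) x, y positions
    return locs, desc

def build_codebook(pooled_descriptors):
    """Cluster pooled training descriptors into the visual vocabulary."""
    return KMeans(n_clusters=VOCAB, n_init=4, random_state=0).fit(pooled_descriptors)

def spm_histogram(gray, codebook):
    """Concatenate per-cell visual-word histograms over the spatial pyramid."""
    locs, desc = dense_sift(gray)
    words = codebook.predict(desc)
    h, w = gray.shape
    feats = []
    for level in range(LEVELS):
        cells = 2 ** level
        for cy in range(cells):
            for cx in range(cells):
                in_cell = ((locs[:, 0] // (w / cells)).astype(int) == cx) & \
                          ((locs[:, 1] // (h / cells)).astype(int) == cy)
                hist, _ = np.histogram(words[in_cell], bins=np.arange(VOCAB + 1))
                feats.append(hist.astype(float))
    f = np.concatenate(feats)
    return f / (np.linalg.norm(f) + 1e-12)   # L2-normalised feature vector
```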
B. The SVM Based Scoring Evaluation

At the classification stage, we train a support vector machine (SVM) with a linear kernel and test new images based on the features selected in Section II-A. The support vector machine is a supervised statistical learning algorithm that achieves good performance in many modern pattern classification problems [21–23].

The basic idea of the SVM for binary classification is to find an optimum hyperplane that separates the training samples in the corresponding feature space [24].

The hyperplane can be represented by

$$\mathbf{w} \cdot \mathbf{x} + b = 0 \qquad (1)$$

where the parameter w is normal to the hyperplane, b is the bias term and x represents a point in the feature space.

Assume the training data are denoted as {(x_i, y_i)}, i = 1, ..., N, where N is the number of training samples, x_i is the feature representation of the ith training sample, and y_i ∈ {−1, +1} is its label.

The optimization criterion of the SVM is to maximize the margin 2/||w|| subject to the constraints

$$y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1, \quad i = 1, 2, 3, \ldots, N \qquad (2)$$

where the equality is satisfied when x_i has the minimum perpendicular distance to the separating hyperplane. These training samples are called support vectors.

The optimization problem can be solved by convex quadratic programming for the optimum hyperplane parameter w:

$$\mathbf{w} = \sum_{i=1}^{N} \alpha_i y_i \mathbf{x}_i \qquad (3)$$

where the {α_i}, i = 1, ..., N, are the Lagrangian multipliers of the training samples, which are non-zero only for the support vectors.

In the testing phase, for a new testing sample x_t, the decision function of a linear kernel SVM is

$$y_{\mathrm{predict}} = \operatorname{sgn}\!\left[\sum_{i=1}^{N} \alpha_i y_i (\mathbf{x}_i \cdot \mathbf{x}_t) + b\right] \qquad (4)$$

where sgn[·] is the sign function.

For the microscopic image identification problem, we can achieve accurate classification results using a linear kernel SVM. This linearly separable property allows us to investigate the relative perpendicular distance between a testing sample and the SVM hyperplane in the feature space. The distance D_h can be calculated as

$$D_h = \frac{|\mathbf{w} \cdot \mathbf{x}_t + b|}{\|\mathbf{w}\|} \qquad (5)$$

where ||·|| denotes the L2 norm, |·| denotes the absolute value, and w and b are determined by solving the constrained optimization problem. Exploiting the SVM results, we set the rule that the distance D_h takes a positive sign if the testing feature and the training clustering center are on the same side of the hyperplane, and a negative sign if they are on different sides. Clearly, for the linearly separable case, a larger absolute distance between the testing sample and the hyperplane indicates more confidence in the predicted result.

From D_h, a basic score can be determined by rewarding a positive D_h while penalizing a negative one. The sigmoid function in (6) is chosen to map D_h into the range [0, 1] through (7):

$$f(x) = \frac{1}{1 + e^{-8x}} \qquad (6)$$

$$BasicScore = f(D_h) \qquad (7)$$
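As a minimal sketch of (5)–(7), the snippet below computes the signed perpendicular distance of a test feature to one trained linear SVM hyperplane and maps it through the sigmoid. It assumes scikit-learn's LinearSVC (a wrapper around LIBLINEAR, the package used in Section III); the function and variable names are illustrative, not the authors' implementation.

```python
# Sketch of the signed distance D_h (5) and the basic score (6)-(7) for one
# binary (one-vs-the-rest) linear SVM.
import numpy as np
from sklearn.svm import LinearSVC

def basic_score(svm: LinearSVC, x_test, class_center):
    """Map the signed perpendicular distance to the hyperplane into [0, 1]."""
    w = svm.coef_.ravel()
    b = svm.intercept_[0]
    d_h = abs(w @ x_test + b) / np.linalg.norm(w)            # Eq. (5)
    # Positive sign if the test point and the class training center lie on
    # the same side of the hyperplane, negative otherwise.
    if np.sign(w @ x_test + b) != np.sign(w @ class_center + b):
        d_h = -d_h
    return 1.0 / (1.0 + np.exp(-8.0 * d_h))                  # Eqs. (6)-(7)
```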
However, D_h only reflects a discriminative score based on the separating hyperplane, which is determined by the geometric distribution of the support vectors. An outlier that is far from the training samples may still have a short distance to the hyperplane, and in this situation we should not assign a high score to the image. Therefore, the basic score alone is not sufficient to represent the similarity, as the training samples that are not support vectors should also play a role in determining the final score. Thus we also measure the L2 norm D_{c_k} between the testing feature and the kth class training center, which can simply be calculated as the mean of the training samples. Clearly, a smaller D_{c_k} means that the testing sample lies nearer to the kth training set in the feature space. The whole geometric relationship is illustrated in Fig. 1.

Figure 1. Illustration of the relationship for the 2D binary classification case. The red squares and blue triangles are two different categories of training samples, the green circle represents the new testing sample, and the two star-shaped points represent the two training sample centers. D_{c_1}, D_{c_2} and D_h represent the relative distances we need to investigate.

Inspired by the nearest neighbor techniques used in designing scoring systems [25], we define a second similarity score, called the prior score, in (8), calculated from D_{c_k}. The prior score reflects the prior knowledge about the distance between the testing sample and the training sample clusters:

$$PriorScore_i = 1 - \frac{D_{c_i}}{D_{c_1} + D_{c_2}}, \quad i = 1, 2 \qquad (8)$$

The final score is a weighted sum of the basic score and the prior score. The weight coefficients can be tuned according to the specific application. In our scoring evaluation system, we consider the SVM classifier to provide more reliable classification confidence than the nearest neighbor technique, and therefore assign weights of 75% and 25% to the basic score and the prior score, respectively:

$$FinalScore_i = 75\% \times BasicScore_i + 25\% \times PriorScore_i \qquad (9)$$
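A corresponding sketch of the prior score (8) and the weighted combination (9) for the binary case is given below; the 75%/25% weights come from the paper, while the helper names are ours.

```python
# Sketch of the prior score (8) and the weighted final score (9), binary case.
import numpy as np

def prior_scores(x_test, center_1, center_2):
    """PriorScore_i = 1 - D_ci / (D_c1 + D_c2), i = 1, 2."""
    d1 = np.linalg.norm(x_test - center_1)
    d2 = np.linalg.norm(x_test - center_2)
    return 1.0 - d1 / (d1 + d2), 1.0 - d2 / (d1 + d2)

def final_score(basic, prior, w_basic=0.75, w_prior=0.25):
    """FinalScore_i = 0.75 * BasicScore_i + 0.25 * PriorScore_i."""
    return w_basic * basic + w_prior * prior
```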

From a statistical point of view, the basic score can be regarded as a likelihood score in the framework of Bayesian philosophy. We apply the training clustering center information as prior knowledge, which modifies the likelihood score through (9).

For multiclass classification tasks, we take the one-vs-the-rest strategy to train the SVM and find the separating hyperplanes, fixing one specific category and treating the remaining categories combined as another class. This strategy reduces the K-class classification task to K binary classification problems and solves for one hyperplane per category.
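The sketch below ties the pieces together for the one-vs-the-rest setting: one linear SVM and one final score per category for each test image. Normalising the prior score over all K center distances is our own reading of (8) for K > 2, and the code is an illustrative sketch rather than the authors' implementation.

```python
# One-vs-the-rest scoring loop: one LinearSVC per category, one final score
# per category for each test image (illustrative sketch).
import numpy as np
from sklearn.svm import LinearSVC

def ovr_final_scores(X_train, y_train, x_test, classes):
    centers = {c: X_train[y_train == c].mean(axis=0) for c in classes}
    d_center = np.array([np.linalg.norm(x_test - centers[c]) for c in classes])
    prior = 1.0 - d_center / d_center.sum()                  # prior knowledge term
    scores = []
    for k, c in enumerate(classes):
        svm = LinearSVC(C=1.0).fit(X_train, (y_train == c).astype(int))
        w, b = svm.coef_.ravel(), svm.intercept_[0]
        d_h = abs(w @ x_test + b) / np.linalg.norm(w)        # Eq. (5)
        # Signed: positive if the test point lies on the same side as class c's center.
        if np.sign(w @ x_test + b) != np.sign(w @ centers[c] + b):
            d_h = -d_h
        basic = 1.0 / (1.0 + np.exp(-8.0 * d_h))             # Eqs. (6)-(7)
        scores.append(0.75 * basic + 0.25 * prior[k])        # Eq. (9)
    return np.array(scores)                                   # one score per category
```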
III. EXPERIMENTS

A. Dataset Description

We apply our scoring system to images of the Chinese Hamster Ovary (CHO) dataset. CHO is a microscopic image dataset with five categories of subcellular patterns. The monoclonal antibodies used in the dataset are against the Golgi protein giantin, the Hoechst pattern, the lysosomal protein LAMP2, the yeast nucleolar protein NOP4, and tubulin (Sigma), respectively. The VLFEAT [26] package is used for SIFT feature extraction and the LIBLINEAR [27] package is used for SVM training and testing.

B. Results and Analysis

In the scoring evaluation system, we choose 20 images for training and 10 images for testing. For each testing image, we calculate five scoring values based on the linear SVM results. We apply five-fold cross validation to evaluate the classification accuracy, randomly choosing 80% of the images of each category for training the linear SVM and leaving the remaining 20% for testing. The classification performance is shown in Table I. For each category, the average accuracy rates are 92%, 100%, 99%, 91% and 92%, respectively. This result shows that the microscopic images can be effectively classified by an SVM with a linear kernel, which is an important precondition for the subsequent scoring evaluation algorithm.

TABLE I. Confusion Matrix for the CHO Dataset
Ground Truth Label    Output of SVM
                      Giantin  Hoechst  Lamp2  Nop4  Tubulin
Giantin               92%      0%       1%     0%    2%
Hoechst               7%       100%     0%     0%    0%
Lamp2                 0%       0%       99%    0%    0%
Nop4                  0%       0%       0%     91%   7%
Tubulin               1%       0%       0%     9%    92%

The specific scores of the five categories of patterns are shown in Table II to Table VI. In most cases, there is a unique dominant score, consistent with the SVM results, that indicates the most similar category.

TABLE II. Similarity Scores of Giantin Testing Images
Pattern Category    Giantin  Hoechst  Lamp2   Nop4    Tubulin
Similarity Score    0.4796   0.3247   0.1817  0.2793  0.2377
                    0.7945   0.2284   0.1834  0.2529  0.2327
                    0.6793   0.2477   0.1818  0.2278  0.3019
                    0.8036   0.2445   0.1864  0.2199  0.2535
                    0.8996   0.2123   0.1959  0.2328  0.2100
                    0.9140   0.2179   0.1912  0.2271  0.2142
                    0.4569   0.3304   0.1822  0.2248  0.3261
                    0.5640   0.2376   0.1978  0.2472  0.2285
                    0.8241   0.2507   0.1867  0.2341  0.2083
                    0.7899   0.2008   0.1906  0.2678  0.2641
Average             0.7206   0.2495   0.1878  0.2414  0.2477
Maximum             0.9140   0.3304   0.1978  0.2793  0.3261
Minimum             0.4569   0.2008   0.1817  0.2199  0.2083

TABLE III. Similarity Scores of Hoechst Testing Images
Pattern Category    Giantin  Hoechst  Lamp2   Nop4    Tubulin
Similarity Score    0.2915   0.9178   0.1800  0.2165  0.2164
                    0.2663   0.9063   0.1866  0.2345  0.2132
                    0.2509   0.9317   0.1811  0.2177  0.2155
                    0.2279   0.8929   0.1835  0.2490  0.2465
                    0.3912   0.9014   0.1839  0.2085  0.2048
                    0.2949   0.9146   0.1892  0.2346  0.1994
                    0.3957   0.8658   0.1833  0.2032  0.2170
                    0.2739   0.9140   0.1833  0.2297  0.2225
                    0.2623   0.9324   0.1793  0.2230  0.2100
                    0.3778   0.8479   0.1860  0.2330  0.2122
Average             0.3032   0.9025   0.1836  0.2250  0.2158
Maximum             0.3957   0.9324   0.1892  0.2490  0.2465
Minimum             0.2279   0.8479   0.1793  0.2032  0.1994

TABLE IV. Similarity Scores of Lamp2 Testing Images
Pattern Category    Giantin  Hoechst  Lamp2   Nop4    Tubulin
Similarity Score    0.1990   0.2562   0.9470  0.2034  0.3422
                    0.2044   0.1985   0.9628  0.2165  0.2068
                    0.2160   0.1939   0.9609  0.2459  0.2042
                    0.2085   0.1949   0.9642  0.2360  0.1996
                    0.2051   0.1955   0.9621  0.2974  0.1995
                    0.2050   0.1943   0.9621  0.2424  0.2056
                    0.2019   0.1992   0.9580  0.2169  0.2329
                    0.2637   0.1964   0.9583  0.2344  0.2050
                    0.2192   0.1989   0.9569  0.2155  0.2113
                    0.2018   0.1958   0.9595  0.2476  0.2194
Average             0.2125   0.2024   0.9592  0.2356  0.2227
Maximum             0.2637   0.2562   0.9642  0.2974  0.3422
Minimum             0.1990   0.1939   0.9470  0.2034  0.1995

TABLE V. Similarity Scores of Nop4 Testing Images
Pattern Category    Giantin  Hoechst  Lamp2   Nop4    Tubulin
Similarity Score    0.2355   0.2042   0.1934  0.7804  0.2524
                    0.2390   0.2001   0.1893  0.8385  0.3021
                    0.2165   0.2132   0.1874  0.6869  0.4281
                    0.2310   0.2075   0.1861  0.5899  0.4635
                    0.2436   0.2046   0.1828  0.8975  0.2454
                    0.2472   0.1985   0.1909  0.6589  0.4450
                    0.2113   0.2052   0.1859  0.8849  0.3006
                    0.2184   0.2081   0.1874  0.6252  0.3935
                    0.2884   0.2005   0.1954  0.7510  0.2741
                    0.2290   0.2042   0.1867  0.8472  0.2797
Average             0.2360   0.2046   0.1885  0.7560  0.3384
Maximum             0.2884   0.2132   0.1954  0.8975  0.4635
Minimum             0.2113   0.1985   0.1828  0.5899  0.2454

Figure 2. Hoechst Scores of Our System
Figure 3. Hoechst Scores of PCC
Figure 4. Hoechst Scores of NN

TABLE VI. Similarity Scores of Tubulin Testing Images
Pattern Category    Giantin  Hoechst  Lamp2   Nop4    Tubulin
Similarity Score    0.2353   0.2590   0.1843  0.2387  0.7626
                    0.2442   0.2080   0.1861  0.2814  0.8222
                    0.2281   0.2309   0.1820  0.2814  0.7778
                    0.2527   0.2092   0.1845  0.3051  0.7915
                    0.2079   0.2084   0.1842  0.4883  0.7429
                    0.2380   0.2309   0.1846  0.2723  0.7628
                    0.2106   0.2084   0.1871  0.4328  0.7905
                    0.2502   0.2001   0.1936  0.3713  0.8452
                    0.2360   0.2518   0.1831  0.2602  0.6991
                    0.2759   0.1982   0.1926  0.3962  0.8116
Average             0.2379   0.2205   0.1862  0.3328  0.7806
Maximum             0.2759   0.2590   0.1936  0.4883  0.8452
Minimum             0.2079   0.1982   0.1820  0.2387  0.6991

We compare our scoring system with a traditional image similarity measuring method based on calculating the Pearson correlation coefficient between the testing samples and all the training samples in the same feature space [15]. The testing settings are the same as those used in our evaluation system. For each testing image, we calculate the Pearson correlation coefficient (PCC) with the reference images (training images) by (10). For each category, we choose the maximum correlation coefficient value to represent the similarity score of the corresponding category:

$$r = \frac{\sum_{i=1}^{D}(x_i - \bar{x})(y_i - \bar{y})}{\left\{\sum_{i=1}^{D}(x_i - \bar{x})^2\right\}^{\frac{1}{2}}\left\{\sum_{i=1}^{D}(y_i - \bar{y})^2\right\}^{\frac{1}{2}}} \qquad (10)$$

where x_i is the ith component of the feature vector of the testing sample, y_i is the ith component of the feature vector of the reference sample, D is the dimension of the feature space, $\bar{x} = \frac{1}{D}\sum_{i=1}^{D} x_i$, and $\bar{y} = \frac{1}{D}\sum_{i=1}^{D} y_i$.
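A minimal sketch of this PCC baseline is shown below: Eq. (10) is evaluated between the test feature and every reference (training) feature of a category, and the maximum value is kept as that category's score. Function names are illustrative.

```python
# PCC baseline: Eq. (10) against every reference image of a category,
# keeping the maximum correlation as that category's score.
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two feature vectors, Eq. (10)."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / (np.sqrt((xc ** 2).sum()) * np.sqrt((yc ** 2).sum()) + 1e-12)

def pcc_category_score(x_test, reference_feats):
    """Maximum PCC between the test sample and a category's training images."""
    return max(pearson_r(x_test, ref) for ref in reference_feats)
```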
We also compare the results with the nearest neighbor (NN) [16] similarity measure given by (8), i.e., without the modification by the support vector machine. Due to space limitations, we only include the comparison results for one specific category, Hoechst, in Table VII and in the bar charts of Fig. 2 to Fig. 4. For the remaining four categories, the results are similar to those for the Hoechst samples.

TABLE VII. The Average Scores for Hoechst Samples
Scoring Method    Giantin  Hoechst  Lamp2   Nop4    Tubulin
Our System        0.3032   0.9025   0.1836  0.2250  0.2158
PCC               0.7922   0.8254   0.6331  0.7045  0.7634
NN                0.8097   0.8528   0.6606  0.7683  0.7734

In Fig. 3 and Fig. 4, the results of the correlation coefficient similarity measure and the nearest neighbor technique are neither accurate nor convincing: even when the testing pattern clearly belongs to a specific category by visual inspection, the similarity scores to other categories can still be relatively high. The correlation coefficient similarity measure is very sensitive to the images chosen as references, because even when two images are very similar to each other, a slight illumination or posture difference can cause a large fluctuation in the correlation coefficient. The nearest neighbor similarity measure, in turn, is very sensitive to the geometric distribution of the chosen training samples: the center of each category does not necessarily indicate the true clustering center.

Our SVM based scoring system makes use of the extensive generalization ability of the support vector machine, which gives good identification performance while relying on a small number of training images.

IV. CONCLUSION

In this paper, we present a scoring evaluation system based on the support vector machine for fluorescence microscopic image classification. The proposed scoring evaluation algorithm incorporates the extensive generalization performance of the SVM, which reduces the number of reference images compared to traditional image similarity measuring methods. The algorithm is computationally efficient and is able to achieve online evaluation for large datasets. Our work in this paper can be applied not only to fluorescence microscopic image classification, but also to other biomedical identification applications involving unstained grey level images that require specific quantitative descriptions of the classification results.

REFERENCES

[1] M. V. Boland and R. F. Murphy, "A neural network classifier capable of recognizing the patterns of all major subcellular structures in fluorescence microscope images of HeLa cells," Bioinformatics, vol. 17, no. 12, pp. 1213–1223, 2001.
[2] M. V. Boland, M. K. Markey, R. F. Murphy et al., "Automated recognition of patterns characteristic of subcellular structures in fluorescence microscopy images," Cytometry, vol. 33, no. 3, pp. 366–375, 1998.
[3] L. Nanni and A. Lumini, "A reliable method for cell phenotype image classification," Artificial Intelligence in Medicine, vol. 43, no. 2, pp. 87–97, 2008.
[4] S. Hua and Z. Sun, "Support vector machine approach for protein subcellular localization prediction," Bioinformatics, vol. 17, no. 8, pp. 721–728, 2001.
[5] C. Kuglin, "The phase correlation image alignment method," in Proc. Int. Conf. Cybernetics and Society, Sept. 1975, pp. 163–165.
[6] C. Spearman, "The proof and measurement of association between two things," The American Journal of Psychology, vol. 15, no. 1, pp. 72–101, 1904.
[7] L. Shapiro and G. Stockman, Computer Vision, chap. 12, p. 219, 2001.
[8] A. Venot and V. Leclerc, "Automated correction of patient motion and gray values prior to subtraction in digitized angiography," IEEE Transactions on Medical Imaging, vol. 3, no. 4, pp. 179–186, 1984.
[9] A. Venot, J.-F. Lebruchec, J.-L. Golmard, and J.-C. Roucayrol, "An automated method for the normalization of scintigraphic images," Journal of Nuclear Medicine, vol. 24, no. 6, pp. 529–531, 1983.
[10] A. Venot, J. Devaux, M. Herbin, J. Lebruchec, L. Dubertret, Y. Raulo, and J. Roucayrol, "An automated system for the registration and comparison of photographic images in medicine," IEEE Transactions on Medical Imaging, vol. 7, no. 4, pp. 298–303, 1988.
[11] L. Wolf, T. Hassner, and Y. Taigman, "The one-shot similarity kernel," in Proc. IEEE 12th International Conference on Computer Vision, 2009, pp. 897–902.
[12] L. Wolf and N. Levy, "The SVM-minus similarity score for video face recognition," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013, pp. 3523–3530.
[13] L. Wolf, T. Hassner, and I. Maoz, "Face recognition in unconstrained videos with matched background similarity," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011, pp. 529–534.
[14] M. Bilenko and R. J. Mooney, "Adaptive duplicate detection using learnable string similarity measures," in Proc. Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2003, pp. 39–48.
[15] A. Goshtasby, S. H. Gage, and J. F. Bartholic, "A two-stage cross correlation approach to template matching," IEEE Transactions on Pattern Analysis and Machine Intelligence, no. 3, pp. 374–378, 1984.
[16] A. K. Jain, M. N. Murty, and P. J. Flynn, "Data clustering: a review," ACM Computing Surveys (CSUR), vol. 31, no. 3, pp. 264–323, 1999.
[17] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[18] S. Lazebnik, C. Schmid, and J. Ponce, "Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories," in Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, 2006, pp. 2169–2178.
[19] J. Sivic and A. Zisserman, "Video Google: A text retrieval approach to object matching in videos," in Proc. Ninth IEEE International Conference on Computer Vision, 2003, pp. 1470–1477.
[20] J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Gong, "Locality-constrained linear coding for image classification," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010, pp. 3360–3367.
[21] C. Schuldt, I. Laptev, and B. Caputo, "Recognizing human actions: a local SVM approach," in Proc. 17th International Conference on Pattern Recognition (ICPR), vol. 3, 2004, pp. 32–36.
[22] H. Zhang, A. C. Berg, M. Maire, and J. Malik, "SVM-KNN: Discriminative nearest neighbor classification for visual category recognition," in Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, 2006, pp. 2126–2136.
[23] W. M. Campbell, D. E. Sturim, D. A. Reynolds, and A. Solomonoff, "SVM based speaker verification using a GMM supervector kernel and NAP variability compensation," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 1, 2006, pp. I–I.
[24] C. J. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121–167, 1998.
[25] N. Liu, Z. Lin, J. Cao, Z. Koh, T. Zhang, G.-B. Huang, W. Ser, and M. E. H. Ong, "An intelligent scoring system and its application to cardiac arrest prediction," IEEE Transactions on Information Technology in Biomedicine, vol. 16, no. 6, pp. 1324–1331, 2012.
[26] A. Vedaldi and B. Fulkerson, "VLFeat: An open and portable library of computer vision algorithms," http://www.vlfeat.org/, 2008.
[27] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin, "LIBLINEAR: A library for large linear classification," Journal of Machine Learning Research, vol. 9, pp. 1871–1874, 2008.
