Comparing Texture Analysis Methods
through Classification
Philippe Maillard
Abstract
The development and testing of two techniques of texture analysis based on different mathematical tools, the semi-variogram and the Fourier spectra, are presented. These are also compared against a benchmark approach: the Gray-Level Co-occurrence Matrix. The three methods and their implementation are briefly described. Three series of experiments have been prepared to test the performance of these methods in various classification contexts. These contexts are simulated by varying the number, type, and visual likeness of the texture patches used in classification tests. More specifically, their ability to correctly classify, separate, and associate texture patches is assessed. Results suggest that the classification context has an important impact on performance rates of all methods. The variogram-based and the Gray-Tone Dependency Matrix methods were generally superior, each one in particular contexts.
Introduction
As scientists and researchers of the remote sensing community began to use high spatial resolution data, it soon became clear that spectral-based methods of computer classification and segmentation were doomed to yield unsatisfactory results. At high resolution, conceptual objects like forests or pasture usually show significant variations in their pixel values (Strahler et al., 1986). Stationary in nature, these variations can give rise to an apparently regular spatial pattern referred to as texture (Kittler, 1983). One of the key elements that interpreters use to identify and analyze images is clearly the spatial arrangement of color and tone that forms natural visual entities: visual texture (Haralick et al., 1973; Pratt et al., 1978).
Because there is no universally accepted definition of visual texture, one has to choose a definition that best reflects the objective or the results being sought. The definition adopted here was given by Pratt (1991, p. 505): natural scenes containing semi-repetitive arrangements of pixels. The problem of analyzing and classifying texture has generated a wealth of studies and techniques that are seldom compared in a systematic way. This study is an experimental analysis of the problem of classifying texture using different mathematical tools. In particular, the specific classification context is analyzed in terms of the effect of between-class variation and number of classes on classification accuracy. To achieve the latter, a special experimental framework has been prepared and experimental results are presented and discussed.
The paper is organized in six sections. A short background review of feature extraction methods for texture analysis follows the introduction. Then the three approaches are described individually and compared through sample data. The fourth section describes the experimental framework and the data used for the experiments. Next come the results and their analysis, followed by the main conclusions.
Background
Reed and du Buf (1993) claim that most development in texture has been concentrated on feature extraction methods (sometimes called channel-based methods) which seek to extract relevant textural information and map it onto a special dedicated channel called a feature. The authors classified the various feature extraction methods as belonging to one of three possible classes: feature-based, model-based, or structural. Cocquerez and Philipp (1995) have used a similar classification of image segmentation methods which they compare in various situations (including textured images).
In feature-based methods, characteristics of texture (such as orientation, spatial frequency, or contrast) are used to classify homogeneous regions in an image. Model-based methods rely on the hypothesis that an underlying process governs the arrangement of pixels (such as Markov chains or Fractals) and try to extract the parameters of such processes. Structural methods assume that a texture can be expressed by the arrangement of some primitive element using a placement rule. Feature-based, model-based, and hybrid methods have overwhelmingly dominated the scene in the last 20 years or so. One of the findings of Reed and du Buf was that, although so many different methods have been developed, no rigorous quantitative comparison of their results had ever been done, which is a major theme of the present work.
Because Bela Julesz (1965) has shown evidence that human perception of texture could be modeled using second-order statistics (although he would later change his theory for the “texton” approach; see Julesz (1981)), many researchers have explored second-order statistics as possible features for texture analysis. Among the most common second-order statistics that have been used are the co-occurrence matrix, the spatial autocorrelation, the covariogram, and the semi-variogram.
The frequency domain approach, also referred to as the Fourier Spectra approach, has been a long-time favorite for texture analysis. From the early attempts at using it as a texture analysis tool by Rosenfeld (1962) to the recent use of Gabor functions as filters in the frequency domain to create frequency- and orientation-specific texture features (e.g., Fogel and Sagi, 1989; Jain and Farrokhinia, 1991; Manjunath and Ma, 1996), the Fourier transform offers infinite possibilities not only for texture analysis but for applications requiring the analysis of spatial frequencies and their orientation.
In order to evaluate a technique, it is necessary to have some base for comparison. In this research, the comparison will take
Photogrammetric Engineering & Remote Sensing
Vol. 69, No. 4, April 2003, pp. 357–367.
0099-1112/03/6904–357$3.00/0
© 2003 American Society for Photogrammetry and Remote Sensing
Universidade Federal de Minas Gerais, Departamento de Cartografia, Av. Antônio Carlos, 6627, Belo Horizonte MG 31270-091, Brazil ([email protected]).
Figure 1. The construction of texture features for the variogram approach to texture classification. (a) The logarithmic scale used to average SRPD values. (b) The original SRPD graph showing all the values. (c) The six SRPD values based on averaging the semi-variance according to a logarithmic scale.
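The averaging step summarized in the caption of Figure 1 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the function name `log_bin_average`, the use of geometrically spaced bin edges over lags 1 to n, and the default of six bins (taken from the caption) are choices made here, and the exact bin limits used in the study are not given in this text.

```python
import numpy as np

def log_bin_average(values, n_bins=6):
    """Average a lag-indexed curve into n_bins logarithmically spaced bins.

    values[k] is assumed to hold the measurement at lag k + 1.
    The geometric bin edges are an assumption made for illustration.
    """
    values = np.asarray(values, dtype=float)
    n = values.size
    lags = np.arange(1, n + 1)
    # Geometric edges from lag 1 up to (but excluding) lag n + 1.
    edges = np.geomspace(1, n + 1, n_bins + 1)
    # Index of the bin each lag falls into, clipped to the valid range.
    idx = np.clip(np.searchsorted(edges, lags, side="right") - 1, 0, n_bins - 1)
    out = np.full(n_bins, np.nan)  # empty bins stay NaN
    for b in range(n_bins):
        selected = values[idx == b]
        if selected.size:
            out[b] = selected.mean()
    return out
```

With such a scheme, short lags are kept almost individually while long lags are pooled, so a curve with dozens of values is reduced to a handful of features.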
average value is computed for each filtered result of each direction for a total of 24 directional features. Figure 2 illustrates the process of filtering; in Figure 2a the Gaussian filters are presented while Figure 2b shows the effect of applying the filters on a sample transform of a forest image.
• The 24 directional features are then transformed to 18 rotation-invariant features: the mean, standard deviation, and sum of perpendicular ratios are computed for each frequency band.
The Gray-Level Co-occurrence Matrices
The Gray-Level Co-occurrence Matrices (GLCM) method was implemented in a manner similar to its original form (Haralick et al., 1973). Because many of the features first described by Haralick are highly correlated among themselves (Haralick et al., 1973), a pre-selection was done to reduce the 14 possible measures to fewer than half. The selection was done by combining all the features used by many different research teams that have used the GLCM method. Table 1 gives a listing of the authors considered and the texture features they have used. Analyzing the table revealed that the most commonly used features are, in decreasing order of popularity: Angular Second Moment, Entropy, Inertia (initially contrast), Correlation, and Inverse Difference Moment.
Apart from the texture features used, pixel pair sampling distances have to be chosen with respect to the expected spatial frequencies present in the images. The choice of sampling distance is as important as the types of measurements. In order to be as objective as possible, the sampling distances have been chosen based on the visual analysis of the semi-variograms of the sample texture patches. This analysis yielded the following distances: three, six, and twelve pixels. In their original setting, Haralick et al. would choose a particular sampling distance and then rotate it by steps of 45 degrees so that, for a distance of three pixels, the x,y sampling distances setting would be (3,0), (3,3), (0,3), and (−3,3) for the 0°, 45°, 90°, and 135° orientations, respectively. Then, for each sampling distance, the mean and standard deviation would be computed over the four orientations instead of using each orientation separately. Therefore, the features are not orientation-specific but still account for some effect of anisotropy. This approach is meant to obtain rotation-invariant features similar to those adopted for the variogram and Fourier methods.
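As a rough illustration of the orientation-averaging idea described above, the following Python sketch builds a co-occurrence matrix for each of the four offsets at a given distance, derives the five retained measures, and then takes the mean and standard deviation over the orientations. It is a minimal sketch under stated assumptions, not the implementation used in the study: the function names (`glcm`, `glcm_measures`, `rotation_averaged_features`), the symmetric counting of pixel pairs, the natural logarithm in the entropy, and the convention of returning a correlation of 0 for a constant patch are all choices made here, and the patch is assumed to be already quantized to integer gray levels in `[0, levels)` and to be larger than the sampling distance.

```python
import numpy as np

def glcm(img, dx, dy, levels):
    """Symmetric, normalized co-occurrence matrix for the pixel offset (dx, dy)."""
    img = np.asarray(img, dtype=int)
    h, w = img.shape
    # Range of reference pixels whose neighbor at (dx, dy) stays inside the patch.
    r0, r1 = max(0, -dy), min(h, h - dy)
    c0, c1 = max(0, -dx), min(w, w - dx)
    ref = img[r0:r1, c0:c1].ravel()
    nbr = img[r0 + dy:r1 + dy, c0 + dx:c1 + dx].ravel()
    m = np.zeros((levels, levels), dtype=float)
    np.add.at(m, (ref, nbr), 1.0)  # accumulate pair counts
    m += m.T                       # count each pair in both directions
    return m / m.sum()

def glcm_measures(p):
    """Angular Second Moment, Entropy, Inertia, Correlation, Inverse Difference Moment."""
    i, j = np.indices(p.shape)
    asm = np.sum(p ** 2)
    nz = p[p > 0]
    entropy = -np.sum(nz * np.log(nz))
    inertia = np.sum((i - j) ** 2 * p)
    idm = np.sum(p / (1.0 + (i - j) ** 2))
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    sd_i = np.sqrt(np.sum((i - mu_i) ** 2 * p))
    sd_j = np.sqrt(np.sum((j - mu_j) ** 2 * p))
    denom = sd_i * sd_j
    # Assumed convention: correlation is set to 0 when it is undefined.
    corr = np.sum((i - mu_i) * (j - mu_j) * p) / denom if denom > 0 else 0.0
    return np.array([asm, entropy, inertia, corr, idm])

def rotation_averaged_features(img, distance, levels):
    """Mean and standard deviation of the five measures over four orientations."""
    d = distance
    # The four offsets quoted in the text; which diagonal is called 45 or 135
    # degrees depends on the axis convention and does not affect the averages.
    offsets = [(d, 0), (d, d), (0, d), (-d, d)]
    feats = np.array([glcm_measures(glcm(img, dx, dy, levels)) for dx, dy in offsets])
    return np.concatenate([feats.mean(axis=0), feats.std(axis=0)])
```

Under these assumptions, calling `rotation_averaged_features` once for each of the three distances (3, 6, and 12 pixels) and concatenating the results would give ten values per distance for each texture patch.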
The following steps summarize the implementation of the
GLCM method:
multi-dimensional spaces can help separate textures of various origin. A graph representation was preferred for its simplicity and ease of interpretation. Figure 3 illustrates the data generated by each method as a series of graphs for six texture classes: i.e., forest, residential, desert, crops, shrub, and waves (samples of these classes are presented in Figure 4).
The semi-variogram (Figure 3a) displays variance as a function of lag distance. The sill reached by each texture
Figure 3. Six texture classes mapped from the three different methods of texture analysis. (a) Semi-variogram. (b) Fourier
transform. (c) Five measures taken from the GLCM.
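For readers unfamiliar with the kind of curve shown in Figure 3a, the classical semi-variance at lag h is half the mean squared difference between pixel values separated by h. The Python sketch below computes it along image rows only. It is a minimal illustration under stated assumptions: the function name `semivariogram_rows`, the row-only direction, and the use of the classical estimator are choices made here, and the sketch does not reproduce the SRPD-based features of Figure 1.

```python
import numpy as np

def semivariogram_rows(img, max_lag):
    """Classical semi-variance for lags 1..max_lag, measured along rows.

    gamma(h) = mean((z(x + h) - z(x))^2) / 2 over all valid pixel pairs.
    """
    z = np.asarray(img, dtype=float)
    gamma = np.empty(max_lag)
    for h in range(1, max_lag + 1):
        diff = z[:, h:] - z[:, :-h]  # all pairs separated by h columns
        gamma[h - 1] = 0.5 * np.mean(diff ** 2)
    return gamma
```

On a stationary texture the resulting curve typically rises with lag and then levels off, which is the plateau referred to as the sill.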
Data set   Z Statistic   Result   Z Statistic   Result   Z Statistic   Result
T6a        −44.60        S        24.21         S        20.50         S
T6b        −24.26        S        −7.40         S        31.68         S
T6c        1.96          NS       −11.59        S        9.64          S
T6d        −53.16        S        18.31         S        34.78         S
T6e        −4.20         S        2.35          S        1.85          NS
T6f        −28.02        S        6.13          S        21.90         S
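Z statistics such as those tabulated above compare two Khat values relative to their combined variance. The Python sketch below shows one common way of doing this. It is a sketch under stated assumptions rather than the procedure used in the study: the function names are hypothetical, the variance uses the simple large-sample approximation p_o(1 − p_o) / (N(1 − p_e)^2) instead of the full delta-method expression, and 2.576 is the two-tailed critical value of the standard normal distribution at the 99 percent level.

```python
import math
import numpy as np

def _agreement(confusion):
    """Return (N, observed agreement p_o, chance agreement p_e)."""
    c = np.asarray(confusion, dtype=float)
    n = c.sum()
    p_o = np.trace(c) / n
    p_e = np.sum(c.sum(axis=0) * c.sum(axis=1)) / n ** 2
    return n, p_o, p_e

def khat(confusion):
    """Kappa coefficient of agreement from a square confusion matrix."""
    _, p_o, p_e = _agreement(confusion)
    return float((p_o - p_e) / (1.0 - p_e))

def khat_variance_approx(confusion):
    """Simple large-sample variance approximation (an assumption, see lead-in)."""
    n, p_o, p_e = _agreement(confusion)
    return float(p_o * (1.0 - p_o) / (n * (1.0 - p_e) ** 2))

def pairwise_z(conf_a, conf_b, critical=2.576):
    """Z statistic for the difference of two Khat values and its significance.

    Assumes the two classifications are independent and that at least one
    of the two variances is non-zero.
    """
    diff = khat(conf_a) - khat(conf_b)
    z = diff / math.sqrt(khat_variance_approx(conf_a) + khat_variance_approx(conf_b))
    return z, abs(z) > critical
```

The sign of Z only indicates which of the two methods scored higher; significance depends on its absolute value.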
of which were significant at the 99 percent confidence level. As for the other two cases (R6 and S6), the GLCM method was superior by about 16.6 percent and about 9.4 percent, respectively. In all cases, the VGM and the GLCM methods were significantly better than the FFT approach by a margin of 2.5 percent to 30 percent in Khat scores. The results for no edge scores are as high as 100 percent in some cases. The difference between overall and no edge results is about 14 percent on average for the three methods, which can be considered high given the sample size.
The first texture set for which the GLCM proved superior, R6, is also characterized by relatively low Khat for all three methods. One conclusion is that the R6 texture set is a poor candidate with ill-defined textures. Another observation is that the broader variety of measurements included in the GLCM method is possibly the reason why it scores better, whereas the other methods are more “specialized.” In the second texture set for which the GLCM shows superior results, S6, the situation is different but still keeps similar elements. Although the Khat scores are higher (roughly between 65 percent and 80 percent), a visual inspection of the individual texture patches (see Figure 4) reveals that some of them are not very homogeneous, having sometimes a dual textural characteristic (patches #3, #4, and #6 in particular), which tends to give more weight to cues other than the simple spacing of apparent objects on the background scene (which is the basis for the variogram approach). In this regard, the GLCM method has a definite superiority over the other two. This suggests that measurement type diversity can prove an important asset for textures that are not necessarily blessed with a homogeneous visual aspect.
As for the FFT method, its generally poorer performance can be attributed to two different facts. The first one is inherent to the approach (or its implementation) that was chosen, involving the appending of consecutive lines in a semi two-dimensional approach instead of the full two-dimensional Fourier transform. This approach might have created undesired artifacts (for instance, the phase of the frequencies is not respected in this approach). The second one is that, unlike the real time series for which the Fourier transform was developed, spatial frequencies in these texture sets are ill-defined and often require a complex set of sine-like waves to describe square-like shapes (as in the case of residential areas).
Third Experiment: Separating a Mixture of Both Different and Similar Textures
In the first part of this experiment, the whole texture set of Figure 4 has been classified to assess the capacity of each method to deal with a complex situation where both different and similar texture samples are mixed. In the second part, a reclassification has been performed to assess the good association capability by observing the nature of the errors of the first phase: i.e., whether the wrongly classified pixels were at least within the good generic class or not.
The classification of the set of 36 textures yielded results that are much poorer than those which had been achieved in the other experiments (Table 5). This was predictable to a certain extent because, by greatly increasing the number of classes, the chance of misclassification was also increased. Both the VGM and GLCM methods produced Khat results of about 65 percent to 67 percent. The FFT came in last with a score approximately 8 to 10 percent lower. Although these results are quite similar overall, the detailed analysis of their graphical counterpart shows that the behavior of each method can be different. Figure 6 shows the difference image between the classification results of the three methods and what would be the ideal classification. The most striking difference lies in the size and frequency of wrongly classified pixels and in the patches they form. In the GLCM results, these patches are relatively large, infrequent, and more concentrated around the edges of the texture patches; hence, the Khat of 93.8 percent for the no edge result. In the VGM classified image, these patches appear smaller on average but more frequent and sometimes give a speckled impression. However, the edges and borders account for a significant part of errors (a Khat difference of 26.2 percent). The classification result generated with the FFT feature set appears to suffer even more from a salt and pepper look: the patches are mostly small but very frequent. While about 23 percent of misclassified pixels are attributable to edges and borders, another 20 percent are found within the central parts of the texture patches. In all cases the differences between the three methods are quite significant, as can be seen in the pairwise test of significance of Table 6.
TABLE 5. KHAT SCORES FROM THE CLASSIFICATION OF THE SET OF 36 TEXTURES. GRAY COLUMNS SHOW NO EDGE RESULTS. BEST RESULTS ARE IN BOLD
GLCM Method (K̂ × 100)    VGM Method (K̂ × 100)     FFT Method (K̂ × 100)
Overall    No edges       Overall    No edges       Overall    No edges
67.2%      93.8%          65.4%      91.6%          57.1%      80.3%
The observations above suggest the following facts:
● The GLCM method scores higher than the other two methods for complex classification situations,
● The VGM method yields comparable scores but the difference from the GLCM is significant,
● The fact that patches of misclassified pixels are generally larger but less frequent for the GLCM method suggests that the method is not easily affected by small differences and is spatially more consistent, and
● The FFT feature sets are more likely to be affected by small variations in textures than the GLCM approach.
Reclassification of the set of 36 textures. Table 7 presents the Khat results obtained after reclassification of the classified results of Table 5 (third experiment), and Figure 7 shows the difference image of the generic classes reclassification. Although, in the overall classification of the set of 36 textures
the GLCM method scored better, it was this approach that benefited the least from the reclassification into generic classes, moving from an all-classes Khat statistic of 67.2 percent to 72.2 percent (a difference of 5 percent) compared with the FFT method that increased from 57.1 percent to 66.1 percent (a difference of 9 percent) or even the VGM whose TSC score increased from 65.4 percent to 71.4 percent (a difference of 6 percent). Still, the results tend to show that none of the three methods can reliably be expected to perform good association and that not having proper training areas for all classes can be costly in terms of classification errors. One could conclude that these methods are generally better at separating than associating.
TABLE 6. RESULTS OF PAIRWISE COMPARISON OF KHAT VALUES FOR OVERALL RESULTS OF TABLE 5; S = SIGNIFICANT, NS = NOT SIGNIFICANT AT THE 99% LEVEL OF CONFIDENCE
GLCM vs VGM              VGM vs FFT               FFT vs GLCM
Z Statistic   Result     Z Statistic   Result     Z Statistic   Result
19.6          S          87.5          S          −107.2        S
It is interesting to look at which generic classes have gained the most out of reclassification because it gives an insight into the factors that might affect the texture classification accuracy. The Residential class gains from 31 percent to 50 percent of accuracy when accepting misclassified pixels that fall into another Residential texture class as correctly classified. This is understandable because this generic class stands out from the rest by its contrast and square-like objects. The Shrub class also gains from 18 percent to 28 percent accuracy when not considering exact class membership but, in these cases, it appears to be due to high within-class variability marked by variations of the between-trees distances. As for the Desert generic class, a combination of high within-class variability and low contrast with relation to the other texture patches might have combined to increase accuracy from 14 percent to 19 percent.
TABLE 7. KAPPA STATISTICS OF THE GOOD ASSOCIATION ANALYSIS THROUGH THE RECLASSIFICATION OF THE 36 TEXTURES SET. BEST RESULTS ARE IN BOLD
                GLCM Method (K̂ × 100)      VGM Method (K̂ × 100)       FFT Method (K̂ × 100)
Generic Class   Correct   Correct           Correct   Correct           Correct   Correct
                class     generic class     class     generic class     class     generic class
Forest          55.0%     66.0%             48.9%     59.3%             43.2%     58.9%
Residential     65.6      96.9              61.3      93.1              37.4      87.5
Desert          51.3      70.5              52.1      66.5              40.8      54.4
Crops           72.1      75.4              71.7      75.0              63.3      72.8
Shrub           63.9      81.4              54.9      82.5              49.2      76.1
Waves           62.8      67.8              68.2      77.0              64.1      76.6
All classes     67.2      72.2              65.4      71.4              57.1      66.1
Conclusions
Three methods of texture classification have been tested in this paper, two of which have received a novel implementation: the semi-variogram and the Fourier spectra. Both have been implemented to be computationally efficient and to relate in some way to psychophysical evidence about human vision (Maillard, 2001). All three methods have proved to be powerful tools for texture classification, but the gray-level co-occurrence matrix has shown superior results for dealing with simple situations where the textures are visually easily separable. The semi-variogram was, however, slightly superior for distinguishing very similar texture patches, but more extensive testing is needed to confirm this. In complex situations (a large number of classes), the VGM and GLCM have performed almost equally but with generally poorer results (but better than the FFT). Much of this poorer performance can be attributed to borders and edges, which tends to show the importance of using a resolution finer than the “optimal” resolution as given, for instance, by a measurement like the local variance. The good association test proved the GLCM method slightly superior but showed that none of the methods tested can be expected to perform good association for classes not accounted for in the training phase: the methods are better at separating than associating.
The project can be said to have reached three goals: (1) new implementations of mathematical tools for texture analysis were realized with success, (2) insights into the behavior of these tools for texture analysis have provided some understanding of the characteristics of texture that are difficult to classify, and (3) a special use of classification using Bayes’ theorem has proven an effective tool for testing and comparing the performance of texture analysis methods.
The proposed experimental design for testing and comparing the results proved most valuable because it clearly showed the importance of context in at least two aspects: (1) number of classes and generic groups and (2) spatial properties such as sample size and surroundings. Further studies will aim at testing these methods with other classifiers or segmentation schemes and in more realistic texture interpretation situations.
Acknowledgments
This work is part of a doctoral thesis concluded in 2001 at the University of Queensland and was made possible through the financial support of the Brazilian Federal Government (CAPES) and the Universidade Federal de Minas Gerais (UFMG).
References
Anys, H., and D.-C. He, 1995. Approche multipolarisation et texturale pour la reconnaissance des cultures à l’aide de données radar aéroporté, Canadian Journal of Remote Sensing, 21(2):138–157.
Atkinson, P.M., 1995. Regularizing variograms of airborne MSS imagery, Canadian Journal of Remote Sensing, 21(3):225–233.
Bonn, F., and G. Rochon, 1992. Précis de Télédétection: Volume 1, Principes et Méthodes, Presses de l’Université du Québec, Québec, Canada, 485 p.
Caelli, T., 1982. On discriminating visual textures and images, Perception & Psychophysics, 31(2):149–159.
Cao, C., and N. Lam, 1997. Understanding the scale and resolution effects in remote sensing and GIS, Scale in Remote Sensing and GIS (D.A. Quattrochi and M.F. Goodchild, editors), CRC Press, Lewis Publishers, Boca Raton, Florida, pp. 57–72.
Clark, W.A.V., and P.L. Hosking, 1986. Statistical Methods for Geographers, John Wiley and Sons, New York, N.Y., 518 p.
Cocquerez, J.P., and S. Philipp (editors), 1995. Analyse d’Images: filtrage et Segmentation, Masson, Paris, France, 457 p.
Cohen, J., 1960. A coefficient of agreement for nominal scales, Educational and Psychological Measurement, 20(1):37–40.
Congalton, R.G., 1988. A comparison of sampling schemes used in generating error matrices for assessing the accuracy of maps generated from remotely sensed data, Photogrammetric Engineering & Remote Sensing, 54(5):593–600.
Congalton, R.G., 1991. A review of assessing the accuracy of classification of remotely sensed data, Remote Sensing of Environment, 37:35–46.
Conners, R.W., and C.A. Harlow, 1980. A theoretical comparison of texture algorithms, IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-2(3):204–222.
Cressie, N., and D.M. Hawkins, 1980. Robust estimation of the variogram, Journal of the International Association for Mathematical Geology, 12:115–125.
Davis, L.S., S.A. Johns, and J.K. Aggarwal, 1979. Texture analysis using generalized co-occurrence matrices, IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-1(3):251–259.
Dikshit, O., 1996. Textural classification for ecological research using ATM images, International Journal of Remote Sensing, 17(5):887–915.
Dunn, D., W.E. Higgins, and J. Wakely, 1994. Texture segmentation using 2-D Gabor elementary functions, IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(2):130–149.
Ferro, C.J.S., and T.A. Warner, 2002. Scale and texture in digital image classification, Photogrammetric Engineering & Remote Sensing, 68(1):51–63.
Fogel, I., and D. Sagi, 1989. Gabor filters as texture discriminator, Biological Cybernetics, 61:103–113.
Foody, G.M., 1992. On the compensation for chance agreement in image classification accuracy assessment, Photogrammetric Engineering & Remote Sensing, 58(10):1459–1460.
Franklin, S.E., and D.R. Peddle, 1989. Spectral texture for improved class discrimination in complex terrain, International Journal of Remote Sensing, 10(8):1437–1443.
Franklin, S.E., A.J. Maudie, and M.B. Lavigne, 2001. Using co-occurrence texture to increase forest structure and species composition classification accuracy, Photogrammetric Engineering & Remote Sensing, 67(7):849–855.
Gonzalez, R.C., and R.E. Woods, 1992. Digital Image Processing, Addison-Wesley Publishing Company, Reading, Massachusetts, 716 p.
Goresnic, C., and R.S. Rotman, 1992. Texture classification using the cortex transform, CVGIP: Graphical Models and Image Processing, 54(4):329–339.
Haralick, R.M., 1979. Statistical and structural approaches to texture, Proceedings of the IEEE, 67:786–804.
Haralick, R.M., K. Shanmugam, and I. Dinstein, 1973. Textural features for image classification, IEEE Transactions on Systems, Man and Cybernetics, SMC-3:610–621.