Land-Use Classification From Digital Orthophotos and Landsat TM Data
Dept. of Geography, Kansas State University, Manhattan, Kansas, KS 66506, USA Tel: (785)532-6727, Fax: (785)532-7310, E-mail: jografer@ksu.edu
Dept. of Soil, Crop & Atmospheric Sciences, Cornell University, Ithaca, NY 14853, USA Tel: (607)255-2179, Fax: (607)255-1132, E-mail: jw67@cornell.edu
ABSTRACT
Digital orthophotos (DO), designed to meet the demand for high-resolution imagery, have recently become available for many areas of the U.S. They are being used as a critical framework layer for creating and referencing other geospatial data. Satellite remotely sensed data have been widely used for deriving land-use (LU) maps. This study compared LU maps interpreted from DO (LU/DO) with LU maps classified from Landsat Thematic Mapper (TM) data (LU/TM) and assessed the accuracy of the two maps. DO data for two 7.5-minute U.S. Geological Survey quadrangles (quads) in Kansas, USA, and a certified LU/TM were collected. Digital classification of the DO data could not distinguish LU categories because gray levels were widely spread within a single LU category and overlapped severely among categories. On-screen interpretation and digitization were therefore used to delineate LU types. Ground checking of LU/DO confirmed its high accuracy. Manual digitization of one quarter-quad DO on a SUN SPARC 20 workstation took four to eight hours, depending on parcel size and the complexity of land use. By comparing LU/DO with LU/TM, we found that croplands and grasslands were severely mis-classified on LU/TM. The total error and Kappa were 54.8% and 25.0% for the first quad and 68.7% and 54.1% for the second quad, respectively. These results indicated the quality of the DO-derived map. Our results support the use of DO for LU determination and are a reminder to be cautious when using an existing geographic database.
KEY WORDS
Digital Orthophotos, Land-use Classification, Landsat TM Image, Error Matrix, Kappa
1. Introduction
Digital orthophotos (DO) are computer-compatible representations of aerial photographs from which displacements and distortions due to terrain relief, atmospheric conditions, and camera systems have been removed. They are designed to meet users' increasing demand for high-resolution, rectified digital images. By the year 2001, DO will cover all of the conterminous U.S. and will be updated on a 10-year cycle for most areas and a 5-year cycle in areas where land use (LU) changes more rapidly (Natural Resources Conservation Service (NRCS), 1995). DO are being used to develop and/or revise vector files of soil, land use/land cover, transportation, and cadastral information, and for other natural resource mapping, analysis, and planning. They are also used as a critical framework layer for creating and referencing other geospatial data in the U.S. In Sweden, DO are used to create 1:20,000 economic maps, 1:50,000 topographic maps, 1:100,000 road maps, and 1:250,000 general maps (Johansson et al., 1995). DO offer cost-effective GIS maintenance applications (Thiel, 1995). The use of digital orthophotography has become one of the fastest growing geo-technologies (NRCS, 1995).
Satellite remotely sensed data have been widely used as a routine approach to creating LU maps (Anderson et al., 1976). However, traditional users of satellite data find the high resolution of aircraft images very desirable (TeSelle et al., 1994). In reviewing the progress of Geographic Information Systems (GIS), Merchant and Ripple (1996) stress that error in GIS, data quality, and accuracy have remained important concerns throughout the decade despite remarkable recent technological advances and accomplishments. Accurate specification of entities (classification) and high locational precision in creating geographic information layers are always desirable and important (Aronoff, 1989). However, people have to balance the tradeoff between the gain (accuracy) and the loss (cost).
This study grew out of a project in which we were modeling pesticide runoff from cropland at the watershed level. An LU layer that accurately identified LU categories, especially cropland, was critical to developing a quality pesticide runoff model. Although a certified LU map classified from Landsat TM data (LU/TM) was available for our working area, its accuracy had not been examined independently. Using DO to interpret LU categories (LU/DO) and the available LU/TM maps, we compared the two LU maps and obtained quantitative accuracy parameters. The labor required to generate the LU/DO map is also reported.
2. Materials and Methods
Two 7.5-minute (7.5') USGS quadrangles (quads) were randomly selected from the quads available in early 1996 within the state of Kansas, USA. One was the Keats quad in Riley County, in the northeastern part of Kansas. The county has a total land area of about 158,164 ha, of which about 35% is used for rangeland and 28% for crops. It is located in the Flint Hills, where limestone outcrops occur and the land is well suited to grassland for range (Jantz et al., 1975). The other was the Lyndon NW quad in Osage County, in the east-central part of the state. The county has a total area of 186,405 ha. About 49% of the acreage was used for cultivated crops, 9% for tame grass pasture, and 34% for rangeland or native hay land (Dickey and Penner, 1985).
The DO source imagery consisted of black-and-white photos from the National Aerial Photography Program at a scale of 1:40,000. The DO were geo-referenced with a 1.0 m ground sampling distance and met the National Map Accuracy Standards with a shift of <10 m. They were stored in 256 gray levels and projected in Universal Transverse Mercator coordinates, North American Datum of 1983. DO data were archived as part of the National Spatial Data Infrastructure framework database (NRCS, 1995). The DO data for the Keats quad were provided by NRCS, and the data for the Lyndon NW quad were from the Data Access and Support Center (DASC), Kansas Geological Survey. They were stored in band interleaved by line format at approximately 48-50 megabytes per quarter quad. Erdas Imagine (Erdas Inc., 1994) and ARC/INFO (ESRI Inc., 1993), running on a SUN SPARC 20 workstation, were used in this study. After importing the DO into Erdas Imagine, we displayed the image on screen, inspected the gray levels of LU categories, visually identified ground objects, and manually delineated boundaries on screen for each LU type.
While identifying ground objects, we used photo features such as shape, tone, texture, shadow, pattern, association, size, and site (Campbell, 1987). A raster LU map was then generated at one-meter resolution with five categories: Cropland, Grassland, Woodland, Water, and Others. The raster file was converted into vector format for later use. The LU/TM was provided by DASC. It had been produced by classifying single-date TM imagery into the same five categories (Cropland, Grassland, Woodland, Water, and Others) with a minimum-distance-to-means classification algorithm (Martinko et al., 1990). The TM data used for classification of the Keats quad and the Lyndon NW quad were acquired on September 24, 1990 and June 7, 1988, respectively. LU/TM was certified and distributed in vector format (DASC,
1999). After displaying LU/TM over DO for the Lyndon NW quad, we found geometric shifts between them. Based on 18 points of ground objects that could be clearly identified on both DO and LU/TM, we adjusted the LU/TM coordinates of the Lyndon NW quad by -30 m in the x-direction and +180 m in the y-direction. Overlay operations were then performed with Erdas Imagine (Erdas Inc., 1994) and ARC/INFO (ESRI Inc., 1993).
3. Results and Discussions
3.1. Statistics of DO Data
The histogram of cropland showed that its values were widely spread throughout the 255 gray levels and appeared to have two peaks (Fig. 1). After checking individual fields, we noticed that cropland with well-growing crops appeared at low gray values, whereas cropland with harvested crops and bare soil, especially soil with coarser texture and low water content, could have very high gray values, such as a mean of 230 in one field. The histogram of grassland was unimodal with a mean of 157, very close to the cropland mean of 159 (Table 1). The gray values of grassland were less spread out, with an interquartile range (IQR) of 25, while the IQR of cropland was 62. Water and woodland were both unimodal. Although water had the lowest mean (108) among the four land cover types, it was very close to the mean of woodland (112). From the histograms and relevant statistics, it was clear that the gray values of the LU categories overlapped severely. The other quarter quads studied had similar statistical features among the LU categories. These results indicated that, using only the gray values of DO, it was hard to distinguish LU types with any supervised or unsupervised automatic classification scheme.

Table 1 Mean, 1st Quartile, 3rd Quartile, and Interquartile Range (IQR) of Land-use Categories from Digital Orthophotos of the Southeastern Quarter Quadrangle of Keats, Riley County, Kansas
                Crop   Grass   Wood   Water
Mean             159     157    112     108
1st Quartile     127     145     94      94
3rd Quartile     189     170    128     122
IQR               62      25     34      28

Fig. 1 Gray value distributions (%) of water body, woodland, grassland, and cropland on digital orthophotos, southeastern quarter quadrangle of Keats, Kansas

3.2. On-screen Interpretation and Field Verification
Because of the large amount of data, zooming in or out and moving around the screen were computationally intensive, and each SUN workstation was assigned to a single user. The interpretation was fairly straightforward, similar to that of regular aerial photos. After a modest amount of training, the interpreter could recognize features immediately. From the eight quarter quads interpreted in this study, we estimated that delineating the LU categories of one quarter quad took 4 to 8 hours, depending on the complexity of LU. The experience of the interpreter could also affect the speed. At the time of the study, DO were free to use for all state agencies. Therefore, the cost of using DO to generate an LU map came only from the labor and relevant facilities used in interpretation.
After finishing the on-screen interpretation, we conducted field verification for the Keats quad along several selected traverses. In total, 27 map units were verified. The verification focused on map units that had some uncertainty and on one land parcel to which we had been unable to assign an LU category. Among the field-verified map units, the interpretation of 26 map units was correct.
The one map unit whose LU category had not been assigned in the previous step was found to be a newly cleared woodland used as a vehicle training site. In general, the recognition of ground objects was of high certainty and quality. We did not make any field examination of locational accuracy.
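The overlap visible in Table 1 can be made explicit by comparing the interquartile intervals of the four categories. The sketch below is illustrative only: the quartiles are copied from Table 1, while the function names and the overlap measure (shared width divided by the narrower IQR) are assumptions introduced here, not part of the original analysis.

```python
from itertools import combinations

# First and third quartiles (gray levels) copied from Table 1,
# SE quarter quad of Keats.
QUARTILES = {
    "Crop": (127, 189),
    "Grass": (145, 170),
    "Wood": (94, 128),
    "Water": (94, 122),
}


def iqr_overlap(a, b):
    """Width shared by two (Q1, Q3) intervals; 0 if they do not overlap."""
    low = max(a[0], b[0])
    high = min(a[1], b[1])
    return max(0, high - low)


def overlap_table(quartiles):
    """Map each category pair to (shared width, share of the narrower IQR)."""
    rows = {}
    for (name_a, qa), (name_b, qb) in combinations(quartiles.items(), 2):
        width = iqr_overlap(qa, qb)
        narrower = min(qa[1] - qa[0], qb[1] - qb[0])
        rows[(name_a, name_b)] = (width, width / narrower)
    return rows


if __name__ == "__main__":
    for (a, b), (width, share) in overlap_table(QUARTILES).items():
        print(f"{a:>5} vs {b:<5}: overlap {width:3d} gray levels "
              f"({share:.0%} of the narrower IQR)")
```

With the Table 1 values, the grassland interval lies entirely inside the cropland interval and the water interval entirely inside the woodland interval, which is consistent with the difficulty of separating these pairs by gray value alone.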
3.3. Cropland Interpreted from DO and TM Data
Since cropland was of most concern in our pesticide runoff modeling project, we selected all croplands from all eight quarter quads of DO. By overlaying this cropland on LU/TM, we obtained the hectares of the various LU categories on LU/TM and their corresponding percentages (Table 2). For the four quarter quads of Keats, more than 80% of the DO cropland was correctly classified as cropland, while the rest was mis-classified as grass or other categories. For the four quarter quads of Lyndon NW, 67.4% to 98.9% of the area was correctly designated as cropland, while about 30% of the area in the SE or SW quarter quad was mis-classified as grassland.
The total hectares of cropland delineated from DO and classified from TM data are listed in Table 3. For all four quarter quads of Keats, the cropland area on LU/TM was larger, by factors ranging from 1.7 in the SE quarter to 27.4 in the SW quarter. That is, the acreage of cropland on LU/TM was severely exaggerated. In the Keats area, the topography was hilly and there were many limestone outcrops, which were hazards for farm machinery. Most land, in fact, remained in rangeland. By comparing the two LU maps, we found that large areas of grassland over the hilly areas were mis-classified as cropland on LU/TM. In the lower river valley areas where soils were deep, almost all land was actually cropped except for residential areas, roads, and narrow tree strips. Few areas were used as grassland. On LU/TM, however, some of this area was classified as grassland, which was clearly a mis-classification. For the SE quarter quad of Lyndon NW, the two LU maps had almost identical acreages of cropland (Table 3). Overlaying the two maps nevertheless revealed discrepancies. North of a large water body (Pomona Lake) (Dickey and Penner, 1985), almost all areas were classified as cropland on LU/TM. However, only about 50% of them were cropland on LU/DO, and the rest were grassland.
South of the lake, many cropped fields on LU/DO were classified as grassland on LU/TM, which indicated that severe mis-classification did occur. For the SW quarter, the hectares of cropland on LU/DO were about 10% larger than those on LU/TM. For the NE quarter, the ratio of total cropland on LU/TM to that on LU/DO was 2.6. All these numbers showed that severe mis-classification existed on the LU/TM map.

Table 2 Cropland Areas Identified from Digital Orthophotos and Their Corresponding Areas Shown on the Land-use Map Classified from Landsat Thematic Mapper Data
                       Keats Quad                        Lyndon NW Quad
              NE       NW       SE       SW       NE       NW       SE       SW
(ha)
Crop       232.9    402.9   1240.9     94.5   1401.4   1707.5    951.7    965.0
Grass       11.9     16.8    260.3      8.6     10.1    308.0    458.0    406.2
Wood         6.1      8.1     16.1      1.6      5.2      7.0      1.3      3.1
Others       1.6      2.1      7.7      3.9      0.6      1.9      0.8      1.1
Total      252.5    429.9   1525.0    108.6   1417.3   2024.4   1411.8   1375.4
(%)
Crop        92.2     93.7     81.3     87.0     98.9     84.3     67.4     70.2
Grass        4.7      3.9     17.1      7.9      0.7     15.2     32.4     29.5
Wood         2.4      1.9      1.1      1.5      0.4      0.3      0.1      0.2
Others       0.7      0.5      0.5      3.6      0.0      0.1      0.0      0.1

H.L.Seyler et al./Landuse Classification from Digital Orthophotos and Landsat TM Data

Table 3 Hectares and Ratios of Cropland Classified from Digital Orthophotos (DO) and from Landsat Thematic Mapper (TM) Data
                       Keats Quad                        Lyndon NW Quad
              NE       NW       SE       SW       NE       NW       SE       SW
DO         252.5    429.9   1525.0    108.6   1417.3   2024.3   1411.8   1375.4
TM        3206.8   3134.8   2525.3   2970.4   3637.2   2578.6   1425.1   1163.3
TM/DO       12.7      7.3      1.7     27.4      2.6      1.3      1.0      0.9

3.4. Error Matrices and Estimations
To further quantify the errors (Congalton, 1991), we analyzed the two SE quarter quads, built the classification matrix (Table 4), and calculated the producer's and consumer's errors (Table 5). For the SE quarter of the Keats quad, 86% of grassland, 41% of woodland, and 44% of Others were mis-classified as cropland, and about 17% of cropland was classified as grassland on LU/TM. Excluding the LU category of Others, the total error for the quarter quad was 54.8%, and the Kappa was 25.0%. The matching rates between the two LU maps were extremely low for grassland. For the SE quarter of Lyndon NW, 22% of grassland, 27% of woodland, and 38% of Others were classified as cropland, and about 33% of cropland was classified as grassland on LU/TM. Excluding the Others category, the total error for the quarter quad was 68.7%, and the Kappa was 54.1%.

Table 4 Classification Matrix (%) of Digital Orthophotos (column) and Landsat Thematic Mapper data (row), SE Quarter Quad, Keats, Riley County and SE Quarter Quad, Lyndon NW, Osage County, Kansas
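The error-matrix statistics cited above follow from standard definitions (Congalton, 1991). The sketch below is a minimal illustration, not the authors' procedure: the 4 x 4 matrix of areas is invented for demonstration, and the function name `accuracy_report` is hypothetical. Rows are taken as the classified map (LU/TM) and columns as the reference map (LU/DO), matching the layout named in the Table 4 caption. Producer's error is computed per reference column and consumer's error per classified row.

```python
def accuracy_report(matrix, labels):
    """Summarize a square error matrix.

    matrix[i][j] is the area (or pixel count) assigned to class i on the
    classified map and to class j on the reference map.
    """
    n = len(labels)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    total = sum(row_totals)
    correct = [matrix[i][i] for i in range(n)]

    # Observed agreement and the agreement expected by chance.
    observed = sum(correct) / total
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1.0 - expected)

    return {
        "total_error": 1.0 - observed,
        "kappa": kappa,
        # Share of each reference class that the classified map missed.
        "producers_error": {
            lab: 1.0 - correct[i] / col_totals[i] for i, lab in enumerate(labels)
        },
        # Share of each classified class that is wrong on the reference map.
        "consumers_error": {
            lab: 1.0 - correct[i] / row_totals[i] for i, lab in enumerate(labels)
        },
    }


# Hypothetical areas (ha), invented for illustration only.
LABELS = ["Crop", "Grass", "Wood", "Water"]
EXAMPLE_MATRIX = [
    [80, 30, 5, 0],   # classified as Crop
    [15, 60, 5, 0],   # classified as Grass
    [5, 10, 40, 5],   # classified as Wood
    [0, 0, 0, 45],    # classified as Water
]

if __name__ == "__main__":
    report = accuracy_report(EXAMPLE_MATRIX, LABELS)
    print(f"total error: {report['total_error']:.1%}")
    print(f"kappa:       {report['kappa']:.1%}")
    for lab in LABELS:
        print(f"{lab:>5}: P-error {report['producers_error'][lab]:.1%}, "
              f"C-error {report['consumers_error'][lab]:.1%}")
```

Kappa needs areas or pixel counts, so a matrix that has already been normalized to column percentages would have to be converted back to areas before it is passed to this function.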
Table 5 Producer's Errors (P-Error) and Consumer's Errors (C-Error) of SE Quarter Quad of Keats, Riley County and SE Quarter Quad of Lyndon NW, Osage County
              SE of Keats          SE of Lyndon NW
           P-Error   C-Error     P-Error   C-Error
Crop          81.5      48.9        67.4      66.6
Grass          6.4      11.1        74.7      55.3
Wood          52.9      80.2        42.7      95.7
Water         81.1      64.2        95.9      81.7
Others        20.2      92.7         3.3      90.4

Table 6 The Mean and Standard Deviation (STD) of Geometric Shifts of X and Y Coordinates between Land-use Maps Delineated from Digital Orthophotos and Classified from Landsat Thematic Mapper Data
SE of Keats (n = 15); SE of Lyndon NW (n = 5)
Mean and STD values: 9.4, -14.7, -11.6, 2.3, 4.8, -7.6, 4.1, 6.2
3.5. Geometric Errors
By displaying LU/TM on top of DO, we were able to identify differences in geometric coordinates. Since LU/TM was a vector file in which only the boundaries between categories were preserved, we selected ground features that could be accurately located on both DO and LU/TM. The coordinates of fifteen points for the SE quarter of Keats and five points for the SE quarter of Lyndon NW were obtained, and their means and standard deviations were calculated (Table 6). For the SE quarter quad of Keats, the geometric shift had a mean of -11.6 m in the x-direction and 2.3 m in the y-direction. For the SE quarter of Lyndon NW, after the shift of -30 m in the x-direction and +180 m in the y-direction was applied, the geometric differences between the two maps were small. If DO met the National Map Accuracy Standards (TeSelle et al., 1994), the geometric error of LU/TM for the Lyndon NW quad greatly exceeded the normal range (0.5 pixel of TM data). Possibly, it was caused by an operator's mistake or by the difficulty of selecting good ground control points in rural areas when the geometric corrections of the TM images were performed.
3.6. Error Source Analyses
Although we should not state that LU/DO was 100% correct, the results above showed that there were severe errors on the LU/TM map. Many factors could be sources of error.
First, image resolutions differed greatly. DO pixels were 1 m × 1 m, while TM pixels were 30 m × 30 m, so one TM pixel covered 900 DO pixels. Many ground objects not recognizable on the TM image could be easily delineated on DO, such as highways, dirt roads, streams, small ponds, grassed waterways constructed for soil conservation, narrow bands of trees, and some small areas of cropland.
Second, mapping procedures and standards were different. After the TM data were classified, single isolated pixels of crop, grass, and woodland classes, short rows or columns of pixels jutting out anomalously from large homogeneous areas, and polygons that did not meet the minimum mapping unit requirements (woodland, 1 acre; cropland and grassland, 5 acres; other categories, 3 acres) were all removed (Whistler et al., 1995). However, we did not make any generalization on LU/DO after on-screen digitization.
Third, the acquisition instruments and mechanisms used for obtaining DO and TM data were different, and therefore the geometric distortion of the two data sources differed. As discussed above, geometric shifts existed between the two LU layers. In the digital overlay analysis of two coverages, any local (positional) mismatch would be treated as a mis-classification, even where the actual identification was correct.
Fourth, although TM data had seven spectral bands, the spectral reflectance recorded for each pixel was mixed from a 30 m × 30 m area. It was quite possible that cropland with average crop growth had spectral values close to those of grassland where grass grew well, or close to those of woodland. This mixed spectral reflectance might cause some error.
Last, and maybe most important, LU/DO was manually delineated, whereas LU/TM was produced by automated computer classification. From the histograms, means, and IQR of LU categories, we knew that it was extremely hard to distinguish them using only the gray values of DO. For the manual delineation, in addition to the gray levels of an object, the interpreter used personal experience to determine the category of an object based on other image features, such as tone, shadow, shape, texture, location, and surrounding objects (Campbell, 1987). In this study, we used the image features above together with the crop-growing calendar and other ancillary information, such as the county soil survey reports, to help identify ground objects. For the computer-automated mapping, by contrast, only spectral information was used (Whistler et al., 1995).
There were many reports that Landsat TM data could be reliably used for land use/cover classification, and it has become a routine approach for deriving land-use/cover maps (Anderson et al., 1976). However, many other factors, such as the operator's knowledge and experience and the acquisition date of the TM data, could affect the quality of the result (Whistler et al., 1995). The LU category of an object was assigned by an operator. An operator who was familiar with the study area, experienced with the classification scheme, and knowledgeable about the vegetation growing calendar might obtain better results than others. The acquisition date of the imagery was also very important. Whistler et al. (1995) noted that, for the most part, poor automated classifications were the result of a sub-optimum date of acquisition for the imagery. Early to mid-spring dates were preferred over summer or early fall dates because this is when maximum discrimination between grassland and cropland occurs.
The Landsat TM data used to derive LU/TM for Keats were acquired on September 24, 1990, a date considered poor for LU classification in this area. This date was close in season to the date when the DO data were acquired (October 17, 1991). The severe overlap of DO gray values among LU categories might, to some extent, explain why mis-classification among LU types was so high. LU/TM for the Lyndon NW quad was classified from TM data acquired on June 8, 1988. The match rate of the two LU maps was higher than that of Keats, which could be attributed to the better image-acquisition time. Furthermore, only single-date imagery was used for the classification, which might be another factor that caused large errors. Martinko et al. (1990) reported that multi-temporal imagery was critical in accurately classifying LU. Data from multiple dates were desired in order to reach high accuracy (Whistler et al., 1995). However, it was sometimes difficult to find good-quality images (with little or no cloud cover over the sensed area and optimal acquisition dates) for humid or sub-humid zones. In addition, ancillary information might help improve the classification of TM data (Lee et al., 1988). For example, the Keats quad was located in the Flint Hills (Jantz et al., 1975). Cropland was distributed on level or gently sloping valley areas, and hilly areas were generally used for grassland. If we incorporated elevation data and separated river valleys from hilly areas, the classification accuracy of TM data might be greatly increased.
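Several of the figures discussed in Sections 3.5 and 3.6 are easier to compare once they are expressed in TM pixels. The sketch below is a rough worked calculation under stated assumptions: the pixel sizes, minimum mapping units, and shifts come from the text, the acre is taken to be the international acre (4046.8564224 m²), and the function names are introduced here for illustration.

```python
import math

TM_PIXEL_M = 30.0        # TM pixel size given in the text
DO_PIXEL_M = 1.0         # DO ground sampling distance given in the text
ACRE_M2 = 4046.8564224   # assumed: international acre in square meters


def do_pixels_per_tm_pixel():
    """Number of DO pixels covered by one TM pixel."""
    return (TM_PIXEL_M / DO_PIXEL_M) ** 2


def acres_to_tm_pixels(acres):
    """Area of a minimum mapping unit expressed in TM pixels."""
    return acres * ACRE_M2 / TM_PIXEL_M ** 2


def shift_in_tm_pixels(dx_m, dy_m):
    """Length of a planimetric shift expressed in TM pixels."""
    return math.hypot(dx_m, dy_m) / TM_PIXEL_M


if __name__ == "__main__":
    print(f"DO pixels per TM pixel: {do_pixels_per_tm_pixel():.0f}")
    units = {"woodland": 1, "cropland and grassland": 5, "other categories": 3}
    for name, acres in units.items():
        print(f"{name}: {acres} acre(s) = {acres_to_tm_pixels(acres):.1f} TM pixels")
    # Mean shift reported for the SE quarter of Keats.
    print(f"Keats mean shift: {shift_in_tm_pixels(-11.6, 2.3):.2f} TM pixels")
    # Correction that had to be applied to the Lyndon NW quad.
    print(f"Lyndon NW correction: {shift_in_tm_pixels(-30.0, 180.0):.2f} TM pixels")
```

Under these assumptions the Keats mean shift stays below the 0.5-pixel range mentioned in Section 3.5, while the correction applied to Lyndon NW is about six TM pixels, and the smallest minimum mapping unit (1 acre) corresponds to only four to five TM pixels.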
4. Conclusions
Accuracy issues are of central concern in all scientific disciplines, and users of GIS data should always be aware of accuracy problems. In this study, we checked geometric shifts and compared the classification accuracy of LU maps derived from TM data and from DO. In this particular case, the apparent failure to distinguish grassland from cropland should alert us to be very cautious when using LU/TM maps classified from single-date imagery with a sub-optimum date of acquisition. There was also a problem with the geometric accuracy of LU/TM in the rural area. Delineating a quarter-quad DO took approximately 4 to 8 hours. If time and budget permit, DO data are available, and appropriate LU data do not exist, we recommend using DO to derive an LU map.
References
Anderson, J. R., E. E. Hardy, J. T. Roach & R. E. Witmer, 1976, A land use and land cover classification system for use with remote sensor data, U.S. Geol. Survey Prof. Paper 964, U.S. Gov. Print. Office, Washington D.C.
Aronoff, S., 1989, Geographic Information Systems: A Management Perspective, WDL Publ., Ottawa, Canada
Campbell, J. B., 1987, Introduction to remote sensing, The Guilford Press, New York
Congalton, R., 1991, A review of assessing the accuracy of classification of remotely sensed data, Remote Sens. Environ., 37:35-46
DASC, 1999, Core database catalog [Online], Kansas Geol. Survey, Lawrence, KS, Available at http://gisdasc.kgs.ukans.edu/dasc/coredata.html
Dickey, H. & H. Penner, 1985, Soil Survey of Osage County, Kansas, USDA-SCS in cooperation with Kansas Agricultural Experiment Station
Erdas Inc., 1994, Erdas Field Guide, Erdas Inc., Atlanta, Georgia
ESRI Inc., 1993, ARC/INFO users manual, ESRI Inc., Redlands, California
Jantz, D. R., R. F. Harner, H. R. Rowland & D. A. Gier, 1975, Soil Survey of Riley County and Part of Geary County, Kansas, USDA-SCS in cooperation with Kansas Agricultural Experiment Station
Johansson, M., S. Miller & S. Walker, 1995, Digital orthophotography at the national land survey of Sweden, Proceedings of GIS/LIS95, Nashville, Tennessee, U.S.A., American Society for Photogrammetry and Remote Sensing and American Congress on Surveying and Mapping, pp. 522-529
Lee, K., G. B. Lee & E. J. Tyler, 1988, Thematic Mapper and digital elevation modeling of soil characteristics in hilly terrain, Soil Sci. Soc. Am. J., 52:1104-1107
Martinko, E. A., K. P. Price, J. Whistler & G. Robinson, 1990, Vegetation mapping of prairie environments for use in development of a GIS database, Kansas Applied Remote Sensing Program, USDA Report, Project No. 85 CRSR-2-3193
Merchant, J. M. & W. J. Ripple, 1996, Special issue: geographic information systems, Photogrammetric Engineering and Remote Sensing, 62(11):1243-1244
NRCS, 1995, National digital orthophoto program, NRCS-USDA
TeSelle, G., J. Plasker, A. Mikuni & K. Wortman, 1994, A national digital orthophoto program, Proceedings of GIS/LIS94, Phoenix, Arizona, U.S.A., American Society for Photogrammetry and Remote Sensing and American Congress on Surveying and Mapping, pp. 741-751
Thiel, P. J., 1995, Not for base mapping only: digital orthophoto for GIS maintenance, Proceedings of GIS/LIS95, Nashville, Tennessee, U.S.A., American Society for Photogrammetry and Remote Sensing and American Congress on Surveying and Mapping, pp. 977-986
Whistler, J. L., S. L. Egbert, M. E. Jakubauskas, E. A. Martinko, D. W. Baumgartner & R. Y. Lee, 1995, The Kansas state land cover mapping project: regional scale land use/land cover mapping using Landsat Thematic Mapper data, ACSM/ASPRS 95 Annual Convention and Exposition, Charlotte, North Carolina