Crop Classification Cookbook: Draft
BUREAU OF RECLAMATION
DRAFT
May, 2004
2. Directory Structure
Create Working and Archiving Directory Structure
File Naming Convention
3. Imagery Preparation
Directory Structure
Directory Definitions and Contents
5. Signature Preparation
Buffer coverage and imagery
Run segmentation (IPW) seeds generation
Create signature names
7. Classification
Perform supervised classification
9. Accuracy Assessment
Create Error Matrix
10. Reclassifying
Perform summary of mislabeled pixels
Identify mislabeled fields
Further classification iterations
This document will take you through all of the steps necessary to complete crop classifications. The procedures
were created to aid Bureau of Reclamation personnel in classifying crops in the Lower Colorado Region as part
of the Lower Colorado River Accounting System (LCRAS). These procedures provide standard vegetation
mapping methods using field survey data and multispectral (satellite) imagery.
Note on filename conventions: Several different file formats are used in this procedure: ERDAS images, ARC
coverages, ARC grids, ARC frequency files, ASCII files, IPW images, and Microsoft Excel files. All file types
are named by the associated processing area (ARC field-border coverage name), so the convention for the prefix
is <cov>. The <> signs act as a placeholder surrounding a filename; 'cov' refers to an Arc coverage name.
Note: there are many references to viewing imagery with Arc coverages and selecting polygons based on various
attributes. If you have ArcMap, I have found that review and selection are much easier there than in an
Imagine viewer: you can set up review .mxd files, zoom to selected polygons (which you cannot do in Imagine
viewers), etc. You may want to consider this option when reviewing results.
Software:
1. ERDAS Imagine 8.2p or 8.3+.
2. IPW software.
3. Arc Info 7.0 or 8.0+, GRID.
4. Ecognition (optional)
Hardware:
1. Sun SPARC workstation or current PC with equivalent processing capability, and enough available disk space
for your project (2 GB is sufficient). Segmentation algorithms, which run in IPW, are UNIX based. For PC based
processing, Ecognition can be used to generate the equivalent of segmentation.
2. Networked printer.
Notes/Terms:
Lower Colorado Region Processing Areas refer to five coverages: LOW1, LOW2, MID1, MID2 & TOP.
The term <cov> refers to the root coverage name of these five processing areas (i.e. mid1, mid1label1, etc.). Though
field data for LOW1 and LOW2 are in separate arc coverages, these processing areas are now combined into a
single coverage using the Arc 'Append' command (with the poly option) and processed as one coverage identified
as 'LOW12'.
The pixel size for Landsat-5 & Landsat-7 TM is now ordered as 30 meter pixels, nearest neighbor resample,
orthorectified. Scripts and amls used in these processes should be reviewed for proper pixel size where grid
operations, etc. are used.
Note: This chapter is for projects which include BOR based fieldwork in the processing. Please contact Jeff
Milliken for additional field sampling information.
1. Review hard copy field maps and make a list of field boundary coverage changes noted by field personnel.
Put this file in the same directory as the field border ARC coverage to be edited.
2. Field data is collected using a DOS program called "truth.exe". Truth.exe generates comma-delimited
flat files (text files). These files need to be combined into one file if this has not been done; use DOS or
UNIX utilities to combine the files. (See Appendix 1 for details) If your field data is in the comma-delimited
format, perform the following:
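If DOS or UNIX utilities are not handy, the combining step can also be sketched in a few lines of Python. The
file naming pattern ('truth_*.txt') and the sample records here are hypothetical; substitute whatever your
field crews produced:

```python
import glob

def combine_records(file_texts):
    """Merge the text of several comma-delimited truth.exe files into one
    record list, dropping blank lines between files."""
    records = []
    for text in file_texts:
        for line in text.splitlines():
            if line.strip():          # skip empty lines
                records.append(line)
    return records

def combine_truth_files(pattern, out_path):
    """Read every file matching 'pattern' (e.g. 'truth_*.txt', a
    hypothetical naming scheme) and write one combined flat file."""
    texts = [open(p).read() for p in sorted(glob.glob(pattern))]
    records = combine_records(texts)
    with open(out_path, "w") as f:
        f.write("\n".join(records) + "\n")
    return len(records)
```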
DAVE: We can update and work on this section in the future.
Quality Control: Check the original ASCII file by viewing and looking at each record closely. Common errors
to look for are:
incorrect crop codes,
comments that are longer than 80 characters (for BOR defined attributes)
no spaces between words in the comments
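A sketch of what such a record check might look like in Python. The field positions and the list of valid crop
codes below are illustrative assumptions, not the actual LCRAS layout, and the sketch assumes the comment
itself contains no commas:

```python
# Partial, illustrative code list only -- the real list comes from the
# LCRAS crop code table.
VALID_CROP_CODES = {"7.01", "8.01", "14.01", "14.03"}

def check_record(line, crop_field=2, comment_field=-1):
    """Return a list of problems found in one comma-delimited record.
    Field indices are hypothetical; adjust them to your file layout."""
    problems = []
    fields = line.rstrip("\n").split(",")
    crop = fields[crop_field].strip()
    comment = fields[comment_field]
    if crop not in VALID_CROP_CODES:
        problems.append("suspect crop code: " + crop)
    if len(comment) > 80:
        problems.append("comment longer than 80 characters")
    if len(comment) > 20 and " " not in comment:
        problems.append("no spaces between words in the comment")
    return problems
```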
Quality Control: Check the original coverage, which has the field data joined with the polygons. Be sure that the
items in the coverage are as follows:
ITEM            WIDTH OUTPUT TYPE DEC
AREA
PERIMETER
<COV>#
<COV>-ID
DATE            8     8      C
QUADNAME        13    13     C
FIELD-ID        7     7      I
EXTRA-FIELD     2     2      N
CROP-TYPE       8     8      N    2
HEIGHT          4     12     F    2
GROWTH-STAGE    2     2      I
CROP-PCT        3     3      I
OTHER-PCT       3     3      I
CONDITION       2     2      I
MOISTURE        2     2      N
SIGNATURE       2     2      N
BORDER-CHANGE   4     4      N
COMMENTS        80    80     C
STUDY-AREA      2     2      I
AA              1     1      I
ACRES           12    12     N
7. Print out this list. Combine this list with the one from the hardcopy maps. This represents the polygons
which require editing, adding, etc.
8. Run Arc Edit with the current Landsat scene as a backdrop (displaying bands 4,3,2 or the new 15 meter
panchromatic band for Landsat 7) and use this image as a visual cue for the edits if the location of the
correction is not clear on the map.
Be sure that all the item widths, output, type, and Dec formats are correct. You may have to add the items "AA"
and "ACRES".
Calc acres = area / 4046.856 (units meters).
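The acres calculation is a straight unit conversion using the constant above; a minimal sketch:

```python
SQ_METERS_PER_ACRE = 4046.856   # conversion factor used in the cookbook

def acres_from_area(area_sq_meters):
    """ACRES = AREA / 4046.856, with AREA in square meters
    (the AREA item Arc maintains for each polygon)."""
    return area_sq_meters / SQ_METERS_PER_ACRE
```

For scale: a single 30 m Landsat pixel covers 900 square meters, or roughly 0.22 acres.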
9. Ag fields which contain an entry of 1.0 in the “SIGNATURE” item should not be used for classification
work or accuracy assessment as these have been deemed inappropriate for this use based on field
observations.
~/cy200* Denotes calendar year of the classification work being done and nests related directories
supporting work being completed for that year.
~/feb0* Contains Arc/Info ground-truth data for the month and year listed in the directory. When
ground-truth data is added under ~/groundtruth, a new directory should be created and named
after the appropriate month and year it was collected.
~/feb**work Working directory for crop classifications. When new directory names are
added for subsequent classification dates the directory name should be
changed to reflect the month and year of the classification.
~/top Working directories for processing areas (i.e. Top, Mid1, Mid2, Low1,
Low2). Subdirectories listed below are called in aml programs.
1. Load the Landsat TM imagery into Imagine on your disk. Imagery is currently ordered as an orthorectified
product in Imagine format. Other formats may require conversion from binary format or Landsat FAST
format. Go to Import/Landsat TM, and import six bands (bands 1-5, 7). The thermal band, band 6, is not
used. It can be removed from an image by using either of these two methods:
Quality Control: Display each band individually and check for anything that doesn't look right. Look
for line drop-outs, clouds, contrails, fire smoke, etc. Also look for shadows indicating fire smoke. Your
image might be acceptable with a few clouds if there is sufficient field data in the cloudy areas; you may not
know if the image is acceptable until you run through the entire procedure! Also, if you requested an image
shift, check that it was properly applied (for the Lower Colorado Region this may apply to the image
path/row 38/37 for Landsat 5 data).
3. Display the image and look at Image Information to make sure that the correct parameters are in the
header file - projection: UTM, Zone 11, spheroid: GRS1980, datum: NAD83 - and verify the X and Y upper left
coordinates and the pixel size against the header. This data is found in the documentation (softcopy and/or
hardcopy) that accompanies the image.
Lower Colorado Region: Low1, Low2, Mid1 and Mid2 should fall completely within 38/37 and 38/36. Top
usually falls completely within 38/36 - sometimes 39/35 is needed.
Quality Control: Overlay the field border vectors on top of the imagery and be sure that each coverage
completely overlays an image (Mid1 spans both 38/36 and 38/37).
1. Load the crop coverages. Import the Arc Exchange (.e00) files to Arc Coverages if necessary.
Quality Control: Check the coverage, and be sure that it does not have any label errors or slivers, and that
topology has been “built”.
Quality Control: Overlay your coverages on top of the imagery in Imagine and check for proper registration.
Field border should align with Landsat data within +/- one pixel (sometimes 1 ½ pixels). If the imagery is off
more than this, and the amount and direction of offset is fairly consistent, adjust coordinates for the upper left
corner in the image header file for better registration. Amount and direction of shift to subtract or add to the
image source coordinates can be determined by taking an average throughout the image. If the mis-registration
is not generally consistent with respect to amount and direction, the image may need to be returned to the
vendor for proper orthorectification (there have been instances where imagery has been more than 2 to 3 pixels
off - also when compared to previously ordered images - and these should be returned for correction).
Crop Cookbook Lower Colorado Region - U.S. Bureau of Reclamation - 5/26/20
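The "average throughout the image" check above can be sketched as follows. The half-pixel (15 m) consistency
tolerance is an assumption for illustration, not a value from this procedure:

```python
def mean_offset(offsets):
    """Average a set of (dx, dy) mis-registration measurements, in meters,
    sampled across the scene (image position minus vector position).

    Returns the mean shift to apply to the image's upper-left coordinates,
    plus a rough consistency flag: if any measurement deviates from the
    mean by more than half a 30 m pixel (15 m, an assumed tolerance), the
    offset is not consistent in amount and direction, and the imagery may
    need to go back to the vendor for re-orthorectification.
    """
    n = len(offsets)
    mean_dx = sum(dx for dx, dy in offsets) / n
    mean_dy = sum(dy for dx, dy in offsets) / n
    consistent = all(abs(dx - mean_dx) <= 15 and abs(dy - mean_dy) <= 15
                     for dx, dy in offsets)
    return mean_dx, mean_dy, consistent
```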
Overlay the field-border coverage over the multi-spectral Landsat data and review for single fields with
multiple spectral conditions (i.e. looks half fallow and half vegetated). It may be necessary to add field-borders
in arc edit with the Landsat image backdrop. Double lines are used for dividing fields. If there is no obvious
road, put the two arcs reasonably close together. A new label with unique field-id will be required for the half
without a label. Ensure that the field doesn’t happen to be a ground-truth field; if it is, you will have to
determine which “half” the coding belongs with. Request starting numbers for new unique field ID’s from the
GIS analyst. After editing is complete, pass the edited coverage and information back to the GIS analyst for
permanent inclusion into the field border databases. At this point, you are just working with a coverage specific
to image classification, and source files also need updating.
4. Once you are satisfied with your coverages and imagery, go into Arc and convert your vector coverages into
raster Imagine files for each coverage, by using the origcov2img.aml. Commands used are polygrid, then
gridimage. (see Appendix 2) Start arc/info and run this script from the low12 covs directory. This process can
also be executed by hand if desired using the commands listed above.
Note: This aml is specific to LCRAS files and would need modification for other projects.
5. These new images- <processing area>g.img - will be used to mask out the crop areas from the full scenes.
Note: You will have to correct the projection information for the new images by adding the datum NAD83 and
spheroid GRS1980. Also change 'METERS' to 'meters' (lower case). This must be done because of a
bug present in Arc/Info. To do this, open the image in a viewer, choose 'Utility', 'Layer Info', and then first
edit 'map model' to meters (instead of METERS) and then edit 'map projection' to NAD83, GRS1980.
Select Interpreter/ Utilities/Mask and use the intersection option. Perform this for each of the coverages.
EXAMPLE: Input source image i.e. 3836_012699b1-5_7.img (the orig. terrain corrected TM scene)
The next step is to check your field coverages and omit any unnecessary data. This is an important step to guarantee
that the correct field data is carried through the crop process and must be stepped through manually. Run frequency
in Arc on crop-type and look for data classed as 99, 55, or some other code not related to a known crop-type and
calculate to 0 (however, ‘strange’ codes should be checked to ensure there has not been a data code entry error).
You may find crop-type = 14.10 which doesn’t exist. When you identify something similar to this, it is a
data entry error and should be 14.01.
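The frequency-and-flag step can be sketched in Python; the known-code list here is an illustrative subset, not
the full LCRAS crop table:

```python
from collections import Counter

# Illustrative subset only -- the real list comes from the LCRAS crop
# code table.
KNOWN_CROP_TYPES = {0.0, 7.01, 8.01, 14.01, 14.03}

def croptype_frequency(crop_types):
    """Mimic ARC 'frequency' on CROP-TYPE: tally each code and flag any
    code not in the known list (e.g. 99, 55, or a typo like 14.10) so it
    can be checked and either corrected or calc'ed to 0."""
    freq = Counter(crop_types)
    suspect = sorted(c for c in freq if c not in KNOWN_CROP_TYPES)
    return freq, suspect
```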
Quality Control: Run frequency in ARC on crop-type; (you can use cropfreq.aml in aml directory under)
In addition, for crop classification purposes, you should be aware of instances where collected field conditions do
not match the spectral imagery. For example, depending on crop growth stage, date of field data collection, and
image date, it is possible that a given crop may have been harvested between the field data collection date and the
image date. This usually represents only a few fields. These fields may need to be dropped from the spectral
signature set, and unless they can be explained with confidence, they may need to be dropped from the accuracy set
if they were randomly chosen. If a given field has high IR reflectance with crop-type = 14.03 (and the image
predates the field work), then a crop may have been harvested between the time field data was taken and the date of
the imagery. If there are comments in the field data indicating a crop residue, then the analyst could change the crop-
type code from 14.03 to the correct crop assuming that this crop was present at time of imagery. This is left to the
discretion of the analyst as other data (such as crown closure) will not be available and this choice may also depend
on the number of fields in this condition and whether these fields are required for a satisfactory sample.
Compare the results of the frequency to aa_sel.aml, and make sure that all of the CROP-TYPE, CROP-
PCT, or GROWTH are accounted for in aa_sel.aml. Make any necessary adjustments in aa_sel.aml.
Typically, percentage cutoffs now being used for accuracy field selection (40%) are adequate so this aml
can be considered ‘generic’.
If the crop is fairly ‘common’, a minimum of 25 to 30 accuracy assessment fields is desirable. You can
raise the percentage higher than 40% if it is a case where an extra 5 % will achieve the desired minimum
number of accuracy assessment fields. If you have to raise the percent more than 5%, then it is an
indication that this crop type is not that common and therefore we probably do not have a large enough
sample to statistically assess the accuracy (and typically this is not a problem because error associated
with this crop is also not that significant since there is not much of it). Refer to the <cov>crop.fre frequency
to see how many fields have been chosen for accuracy assessment. If editing of this aml is required, save
that version, rename the aml <cov>aa_sel.aml, and place it in the /aml directory nested beneath
your /top, /mid1, /mid2, /low1 or /low2 directory, as changes you have made may only be applicable to that
particular processing area and classification date. There is no need to save the aml to a specific processing
directory if no edits were necessary.
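The cutoff logic described above (start at 40%, raise by at most 5 points only if that reaches the desired
minimum) can be sketched as a small function. The parameter values mirror the text; the function itself is an
illustrative assumption, not part of aa_sel.aml:

```python
def aa_cutoff(n_fields, base_pct=40, max_raise=5, target_min=25):
    """Pick the accuracy-assessment selection percentage for one crop.

    Start at base_pct and raise it in 1% steps (at most max_raise points)
    only if that is enough to reach target_min fields. If even
    base_pct + max_raise cannot reach the target, keep base_pct -- the
    crop is too rare for a statistically valid sample anyway.
    """
    for pct in range(base_pct, base_pct + max_raise + 1):
        if n_fields * pct / 100.0 >= target_min:
            return pct
    return base_pct
```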
It is understood that rare crop-types may be underrepresented in the overall ground-truth data set and do
not provide a statistically valid sample for accuracy assessment; some crops are simply not grown in enough
abundance to be statistically represented in this manner. However, both training and accuracy matrices can
be used to help improve classification results on these crops as well.
This will create two new coverages; one will contain just those fields that will be used to create training
signatures, and the other will contain only those fields that will be used for accuracy assessment purposes.
Training field anomalies: In some instances, coding a crop-type in the field can be somewhat ‘interpretive’.
For example, a recently harvested crucifer field still has an abundance of green leaf material on the field.
Some will give this field a crucifer code with “harvested” for growth stage. Some may give this a fallow
code with a comment indicating that the field has crucifer residue – dependent on the amount of residue
present. The remote sensing analyst must determine how these fields are best coded for spectral training
purposes and accuracy purposes and codes may need to be changed to accommodate this.
Quality Control: Check frequency <cov>_aa.fre (run by aml) for totals of crop-types reserved for accuracy
assessment to ensure adequate sample.
Visually review the fields chosen for accuracy assessment over the imagery to ensure that the crop-type and its
'condition' make sense with the multi-spectral imagery. For example, if an aa field has a code of fallow (14.03)
but the imagery has a high IR (infrared) response indicating the field is vegetated, there are several
possibilities:
1. The field crew coded the wrong field
2. The crop was harvested between the time of field data collection and image date (check your dates of
field data collection and the image date)
3. You have inadvertently used the wrong image.
4. An error occurred when field data was related into the GIS database
5. There could be more possibilities but these would be the first to check.
In any case, you will have to correct the situation and in this case these would be the typical choices:
1. First check the image date vs. the field date relationship. You may be able to change crop-type to the
correct crop code (change fallow to proper crop), if residue is listed in field data or field data indicates crop is
near harvest. If this does not explain the discrepancy, then ensure you are using the correct imagery. Otherwise
ask the field coordinator to check the field data for this field (check the hard copy maps against the digital field
data file). If it cannot be reconciled, then you may need to eliminate the field from the accuracy assessment set
by changing the value for aa to a ‘0’, as it is not valid to assess the classification accuracy with this field.
The following is a procedure for creating signatures within the agricultural training fields. Arc/Info,
Imagine and the UNIX based IPW software programs are all needed to run the Autosig procedure.
Ecognition software can also now be used to generate segmentation regions but specifics of using this
software are not discussed in this document.
1. Prepare the polygons by buffering the training set of field polygon coverages by 30 meters. Run for each
region if necessary.
Quality Control: In ArcMap, Arcedit, Arcplot or ERDAS Imagine, display the original image, buffered
coverage and the coverage used to buffer from, and do a visual check.
2. Due to an “Arc” bug, you must fix the Datum(s) and Unit(s) of the mask images in Imagine using
‘Utility’ / ‘layer info’ / edit map model and edit projection. Make them GRS1980 & meters in map model
and NAD 83 datum in ‘edit projection’. Now mask the imagery using the <cov>_TRBG.img to create
<cov>_tr_ag_.img. Image Interpreter - Utilities- Mask. Use Intersection and Ignore Zeros in stats.
If you are using Unix and batching process areas, you can also run the unix script ‘mask.scr’ located in
/*01work/amls directory (not recommended if doing a single processing area). This script will mask each cover, but
you must edit it to ensure the proper path is designated and the proper Imagine environment, home directory,
etc. are correct for your platform.
Quality Control: View the resultant image to make sure this process worked correctly.
3. Use Imagine Export to subset <cov>_tr_ag.img into three individual band files to use in the IPW software.
Create separate generic binary files for band 3, band 4, and band 5, using a .blk extension. These are the
only bands needed for segmentation. Export using the Generic Binary option, and then click on Export
Options and select the correct band for exporting. Output name should follow this format:
<cov>_tr_ag_3.blk.
Unfortunately, there does not seem to be a way to script this process. Note: Ecognition can now be used to run
directly on Imagine files; this will not be covered at this time. Ecognition output also requires edited amls to
deal with export formats from Ecognition (which differ from this process). This document will be modified when
we migrate to Ecognition for this process.
4. In the project /img directory, run the IPW script seeds.exe (seeds.scr is the same) at the UNIX prompt.
Note: for input into IPW, you must determine the number of rows and the number of columns in your
<cov>_tr_ag.img. You can find this information in Imagine viewer, under Image Info or under
Sessions/Tools/Image Info, with height = rows and width = columns. Run the aml for each image. See the
appendix for the gory details on IPW segmentation parameters.
6. Use Image Info to specify the coordinates, the pixel size and map projection of <cov>_tr_ag_seed.img.
Manually set the datum/spheroid (NAD83, GRS1980) and the Units (meters) of this file. Bring up the image info
for <cov>tr_ag.img, and use the coordinates and projection information from that for <cov>tr_ag_seed.img.
7. Convert <cov>tr_ag_seed.img to a polygon coverage. Output file will be <cov>seedfld. Run this aml in
the project covs directory. This is a 'generic' aml that can reside in /*01work/amls/ and can be run by
pathing to the aml from each covs directory (i.e. &r ../../amls/seedfld).
Quality Control: Do a visual QC for variability within IPW polygon regions by viewing the vector coverage
over the training image in Imagine.
This AML adds items, calculates items, drops items as shown below, and redefines items in order to generate a
Signame item with the signature name automatically input. This is a 'generic' aml that can reside in
/*01work/amls/ and can be run by pathing to the aml from each covs directory (i.e. &r ../../amls/signame).
Quality Control: Compare the AML with the items in your coverage to be sure that they will match up properly.
Also, check to be sure that your new item called signame has been created properly. Go to Tables, SEL
<COV>SEEDFLD.PAT, and LIST SIGNAME. It should be very obvious if it is wrong. If it is wrong, re-check
the AML against the .pat and try again.
Format of signature name is : FIELD-ID - SEED(unique identifier) - QUADNAME - CROP TYPE - CROP
PCT - GROWTH STAGE - CONDITION
Spaces are unavoidable in the signame due to item value ranges in the dataset.
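A sketch of how the signame assembly might look, assuming a simple '-' delimiter between components; the
real aml works from fixed item widths, which is why embedded spaces appear in actual signames:

```python
def make_signame(field_id, seed, quadname, crop_type, crop_pct,
                 growth_stage, condition):
    """Assemble a signature name in the cookbook's stated order:
    FIELD-ID - SEED - QUADNAME - CROP TYPE - CROP PCT - GROWTH STAGE -
    CONDITION. Delimiter and formatting here are illustrative only."""
    parts = (field_id, seed, quadname, crop_type, crop_pct,
             growth_stage, condition)
    return "-".join(str(p) for p in parts)
```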
NOTE: There are processes in this section that can take awhile to run, so it may be advantageous to have other
things to do as well while waiting …
Use this coverage - <cov>seedfld - to generate signatures in ERDAS Imagine. For the BOR process,
remember not to take any signatures from bermuda (8.00, 8.01, 8.02, 8.03), citrus (9.00, 9.01, 9.02, 9.03),
dates (15), deciduous orchards (17.01, 17.02, 17.03), Idle (14.01), and other 'static' types such as wetlands,
nurseries, etc. This is because these crops are determined through ground truth identification and air photo
interpretation, not through classification. These crop types should have been excluded from the training set
by running the <cov>aa_sel.aml process. If they appear during signature extraction procedures you should
edit <cov>aa_sel.aml to exclude these types.
A. Open the image in viewer 1. Use the source multispectral image or the <cov>.img. If using
<cov>.img, you may need to adjust the image stretch to ensure that fallow fields look close to
the source image color and vegetated fields look close to the source image colors. This is
because <cov>.img is a masked image and the image colors are now probably skewed. It is
important to have the image colors look like the source image as these are later represented in
your signature editor and come in handy for general signature evaluation.
B. Open a vector layer (<cov>seedfld) in viewer 2. This vector layer is going to be used to
create the AOIs. Once the coverage appears on the screen, go to the file menu: Vector, Viewing
Properties; select Polygons and deselect Arcs. Click on Apply, and the display should draw
up the polygons filled with cyan. Because you will later be selecting by polygon, the actual
polygons must be displayed. NOTE: At this point arrange your viewers side by side with
about a quarter screen left at the bottom for the vector properties window (Viewer #2) and the
Signature Editor under Viewer #1 (in 8.6 you can also use a split screen viewer).
C. Open the Signature editor from the classifier module and arrange under Viewer #1
D. In viewer #2, select the features using the vector attribute editor menu. Arrange the vector
attribute window under Viewer #2. All features (i.e., polygons) can be selected by selecting
all rows with the right button on the mouse in the "record" column. In the viewer, you should
see all the polygons turn the selection color. At this point, due to Imagine's slowness with very
large files, I recommend generating signature sets of about 500 max. To do this, you can
select the desired records from the vector attribute window (i.e., 1st 500 in the file, 2nd 500
in the file, etc.) and save to separate signature files after generating sigs from the AOIs - next step
described.
E. In viewer #2, after you have the desired polygons selected, pull down the AOI menu and
In Viewer #1, especially if you are planning on taking x number of signatures at a time,
you need to "center" the image in the window without fields touching the edge of the
window so that you can select them (an ERDAS thing…). So, use the "rotate and
magnify" tool under view/zoom - this allows you to create a box larger than the viewer
area, move the box to cover all fields (box edge will end up out of view where training
fields touch the edge of the viewer), then double click to "reset" view to this bigger area.
Now you should be able to see all training fields with space between them and the edge
of the viewer.
Select the file menu (in the viewer) AOI Pull-down and select link. Then Click in
Viewer #2 to link the AOI's in viewer 2 to the image in viewer 1.
In viewer #1, all the selected polygons in the vector attribute window become AOI boundaries
and are highlighted on the image. But they are not actually selected yet.
F. Select all the AOIs in the image window (Viewer 1) . To do this, open TOOLS under the AOI
menu, and then drag the lasso tool (the dashed line square) over all the AOIs. Little "x-boxes"
appear over all the AOIs when they're all selected. It took several minutes for 300+ fields to be
selected (this will be platform specific with respect to your processing capabilities).
G. Next, in the Signature Editor, use the "Create New Signatures from AOI" button to add all
the selected AOIs. Be patient, depending on how many AOI's you have selected, it may take a
few minutes before signatures begin to be automatically added to the signature editor (in other
words, it looks like nothing is happening). It may take several minutes to 15 or 20 minutes to add
signatures depending on your system and number of signatures being generated. All the AOIs are
added to the Signature file in the record number order from the field attributes. If the number of
polygons you selected does not match the number of signatures created, then you will need to go
back and re-lasso the AOIs, and recreate your signature file. NOTE: Record # 1 of your Vector
cover is the universe polygon and should be unselected. This all works very well for generating
large signature sets "automatically" though there should be a way to do it without the graphic
interface - let us know if you figure out a way (we have not had luck with ERDAS scripting on
this).
H. Cut and paste the SIGNAME column FROM the VECTOR ATTRIBUTE editor TO the
SIGNATURE editor. To do this, in the VECTOR ATTRIBUTE editor, select the SIGNAME
column and select ALL the appropriate rows (actually they should already be selected from the
last step), then use the right mouse button at the top of the SIGNAME column to initiate "COPY".
In the SIGNATURE editor, select the "Signature Name" column and ALL the rows, then use the
right mouse button at the top of the "Signature Name" column to initiate PASTE.
Quality Check: Check to make sure that the signames and signatures make sense - for example,
look at the change between a group of fallow field signatures and a vegetated one to ensure that
the signature name did not end up offset by a record of something of this nature - you can easily
see this as a fallow sig will look fairly blue in the color box and high crown closure crop will be
red.
I. Save the signature file (may take awhile) (e.g., <cov>_1-500.sig) in the /sig directory. Once
you have created your set of 500 signatures, you can either continue taking the next set of 500
signatures, or go through the winnowing process. If you want to continue taking signatures, you
must first get rid of your AOI’s in Viewer #1 and Viewer #2. Go to View/Arrange Layers and
delete the AOI layers. Next, deselect all records in the vector attribute editor and open a new
signature editor. Go back to step C, and start the next set of 500.
J. To begin the signature winnowing process, open one of your signature files in the signature
editor (or leave your existing set of signatures open) and go into View/Columns in the signature
editor menu bar. Highlight Signature Name and Color, then select statistics/standard deviation.
Apply.
K. Right click under Class # and select criteria. A new window will open. In this window you
will be setting the criteria by which signatures will be selected. You want to select all signatures
that have standard deviations in all bands of <= 3. Keep in mind that these cutoffs have worked
well for us in general after testing a variety of SD cutoffs. For your particular area, a different
cutoff may be appropriate (lately we've been using 4). To set these criteria, your selection should
look like:
$Std. Dev. (Layer_1) <= 3 and $Std. Dev. (Layer_2) <=3 and $Std. Dev. (Layer_3) <= 3 and
$Std. Dev. (Layer_4) <= 3 and $Std. Dev. (Layer_5) <= 3 and $Std. Dev. (Layer_6) <= 3
At this point, selected records that meet the criteria will be highlighted in yellow. Do not single click
in any area with the mouse or all records but the one you clicked will be unselected. If you want to
unselect records or select additional records (signatures), hold the shift key down at all times when
selecting and unselecting.
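The per-band standard-deviation criteria above amount to a simple filter; a sketch, with the cutoff as a
parameter since the text notes both 3 and 4 have been used:

```python
def passes_sd_cutoff(band_sds, cutoff=3.0):
    """True if a signature's standard deviation is <= cutoff in every
    band -- the criteria expression above, expressed as a function."""
    return all(sd <= cutoff for sd in band_sds)

def winnow(signatures, cutoff=3.0):
    """Keep signatures meeting the per-band SD cutoff.
    'signatures' maps signame -> list of per-band standard deviations
    (six bands for TM 1-5, 7)."""
    return {name: sds for name, sds in signatures.items()
            if passes_sd_cutoff(sds, cutoff)}
```

As the guidelines below note, signatures failing the cutoff are not discarded automatically; high-variance
crops may still need hand-selected exceptions.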
Now, right mouse click in the column under "Class #" and sort by Signature Name. This allows you
to see all signatures for each field grouped.
The next part of the process is fairly subjective but here are some guidelines – take your time –
even though it seems like there are too many signatures, they are easy to review and this is
much faster than hand generating signatures using the ERDAS seed function:
1. Make sure that all crop types are represented when reviewing the selected set.
2. Some crops tend to have higher spectral variance (like melons) and you may need to individually select
additional signatures that have the lowest standard deviations possible in all bands (in other words,
some crops will not meet the SD cutoff of 3 or less in all bands), to ensure that all crops are
represented in your signature set. I generally try not to have to select sigs with a SD greater than 5
in any one band but exceptions to the rules sometimes need to be made.
3. If possible, have at least one signature represented from every training field. But if a particular field has
high SD in all bands and you have plenty of representation of the crop, do not select the signatures
– usually, there are more than enough signatures after the selection is complete.
4. Make sure that the general color representation shown in the ERDAS sig editor makes sense with the
crop attributes you can see included in the Signature Name. For example: the signature name
contains the crop code and percent crown closure. If you see a crop with 80% crown closure
indicated but the color is light blue (no infrared), this is an indication of a) coding error b)
the condition of the field changed between the time of ground truth data collection and the time of
imagery (i.e., the crop was harvested between the ground truth date and the image date). This
signature, as is, should not be used, as it will cause much error in the classifier, and it is
easier to omit it now than to have to track it down later after your classification accuracy
indicates the problem. (Note: this could also be true for accuracy assessment fields that have
been chosen, as discussed earlier.) Check suspect fields visually in the imagery, check the field
data, make sure your signatures are labeled properly, etc., to try to explain possible
discrepancies. Data that does not make sense could be an indication of a larger data problem;
make sure it isn't.
5. You may need to unselect signatures for crops that are not mature enough to include in the classifier.
Most field personnel (not being remote sensing specialists) should just collect the data and not
be tasked with making subjective decisions about whether a given crop at a given crown closure
is going to cause error in the classifier. You will learn by experience, and there are always
exceptions, but generally, if the crown closure is less than 10 to 15%, the crop will typically be
spectrally confused with a fallow condition. Unselect these signatures so they are not used. This
process assumes that, at a given time, for one set of mature target crops there may be others
just emerging that are better left classified as fallow, because you will return at the proper time
of the growing season to collect data on those crops for a second classification using the
imagery for that particular time. Avoid omitting signatures that you are not sure about (e.g.,
windrowed alfalfa with 20% crown closure that looks 'kind of' fallow). Sometimes these are
still unique enough to be classified correctly; you will find out after the first iteration. If such
a signature causes unacceptable error of commission against known fallow fields, you may end
up omitting it for the second classification iteration.
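The SD-cutoff guidelines above (3 or less in all bands, relaxed toward 5 for high-variance crops such as melons) can be pictured as a simple filter. The sketch below is illustrative only: the signature names, band count, melon code, and exact cutoff values are assumptions, not LCRAS standards, and the real winnowing stays a manual review in the signature editor.

```python
# Hypothetical sketch of the SD-cutoff signature selection described above.
# Each signature is (class_name, per-band standard deviations); class_names
# embed a crop code such as "- 200-". All values here are invented.

DEFAULT_CUTOFF = 3.0               # guideline: SD of 3 or less in all bands
RELAXED_CUTOFF = 5.0               # relaxed limit for high-variance crops
HIGH_VARIANCE_CROPS = {"- 600-"}   # hypothetical code for melons

def keep_signature(class_name, band_sds):
    """True if the signature passes the SD cutoff in every band."""
    relaxed = any(code in class_name for code in HIGH_VARIANCE_CROPS)
    cutoff = RELAXED_CUTOFF if relaxed else DEFAULT_CUTOFF
    return all(sd <= cutoff for sd in band_sds)

sigs = [
    ("1024-quad1- 200- 80", [1.2, 2.1, 2.8, 1.9]),   # passes at 3.0
    ("1031-quad1- 100- 60", [2.0, 3.6, 2.2, 2.5]),   # fails at 3.0
    ("1040-quad2- 600- 70", [3.9, 4.4, 2.7, 3.1]),   # melon: passes at 5.0
]
selected = [name for name, sds in sigs if keep_signature(name, sds)]
print(selected)   # the first and third signatures survive
```

A filter like this mirrors the first pass only; color, crown closure, and field representation (guidelines 1, 3-5) still require eyes on the data.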
L. Once you have successfully winnowed down your signatures, go to File/Save As; your new
filename will be the same as the old one, but with a .1 on the end of it, e.g., <cov>_1_500.sig
becomes <cov>_1_500.1.sig. In the Save As window, highlight the Selected box, and it will
save only those signatures that are highlighted to the new file.
To reduce the level of miss-classification (remember her?) among alfalfa, sudan, and cotton during the July
classification period, it may be necessary to collect a larger number of cotton and sudan signatures. This may need to
be done even if the overall classification results are > 91%, if the errors of commission by alfalfa towards sudan
and/or cotton are significant.
This can be done by increasing the SD cutoffs used to select the overall signature set: increase the standard
deviation selection to < 5 (from < 3) for class_names containing "- 200-" and class_names containing "- 1100-".
This will bolster the signature set and reduce or eliminate the errors of commission by alfalfa against cotton and
sudan, but it may also create or increase errors of commission by cotton and sudan. A couple more iterations may be
required to reach optimum results.
The result of this procedure should increase the overall accuracy of the classification as well as the accuracy in
alfalfa, sudan and cotton crop classes, then again…
M. You are now ready to take the next set of signatures from the same coverage. To do so, delete the
AOIs in the Viewers, deselect the records in the vector attribute editor, and open a new signature
editor. Return to Step C to gather the next 500.
N. Once all of the signature sets for one processing area (e.g., mid1) have been created and
winnowed, append them all together: open one of the reduced signature files in the signature
editor, append the remaining reduced sets to it, and save the combined set as <cov>_all.1.sig.
Note: If you are re-classifying an image (second iteration), save your new signature set to a
filename such as <cov>_all.2.sig (this will be discussed in Chapter 9).
CHAPTER 7. CLASSIFICATION
Now that you’ve completed the autosig portion of the project, you will move into the exciting realm of image
classification, image recoding, polygon labeling, and creating an accuracy matrix.
1. Classification! Open your signature editor with your "all" file (<cov>_all.1.sig). Once your signature
editor is open go to Edit/Image Association. This will tell you which .img file the signatures are
associated with; be sure that it is associated with the image file that is of raw imagery of all the fields in
that area. If necessary, change association to <cov>.img, then save the sig file.
The classification should take no more than 15 minutes to run, although this obviously depends on the image
size and the number of classes to be classified.
In Imagine, open the classification in a viewer and then open the raster attribute table for <cov>_agsup*.img.
In the attribute table window, pull down the Edit/Column Properties option. Add a new column named "crop",
set the column type to integer, and make the column width = 5.
Then right click on the ROW column and select Criteria. You want to select all signatures by crop class by
entering, for example for class 200: "class_name" contains "- 200-" and "crop" = 0.
Scroll through the highlighted rows to be certain only crop-type 2 is selected. Don't worry about crop-type 2
classes that have not been selected yet; just make sure that everything that IS selected is what you expect it to
be. Unselect any incorrect classes by holding down the Shift key while left clicking on the record (Shift-clicking
deselects a single record).
Next, left click at the top of the crop column to highlight the entire crop column. With the right mouse button,
click at the top of the crop column and choose the "formula" option for the crop column. Choose int<a> in the
dialog box. Do NOT select "crop", because it is already selected by having the column highlighted. Edit the
formula to read
Now apply. The crop column in all the selected rows should now contain the number 2. Save, and repeat this
procedure for all crop-types changing the crop column to the single integer correlating to the crop group.
Save after each crop-type calculation. After running through all crop types, you can select crop = 0 to find
rows that were not selected in the initial pass. To do this, clear the criteria dialog box and replace the selection
criteria with "crop" = 0. The remaining crop-type codes can be edited either manually or by formula.
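The repeated criteria/formula passes above amount to mapping the embedded "- <code>-" string in each class_name to a single crop-group integer. A minimal sketch, assuming a hypothetical code-to-group table (the real correspondence comes from your valid-codes file):

```python
import re

# Illustrative mapping from embedded crop code to crop-group integer;
# the actual table comes from the project's valid-codes file.
CODE_TO_GROUP = {"200": 2, "1100": 11, "100": 1}

def crop_group(class_name):
    """Extract '- <code>-' from a class_name; 0 if unmapped (manual clean-up pass)."""
    m = re.search(r"-\s*(\d+)-", class_name)
    if m and m.group(1) in CODE_TO_GROUP:
        return CODE_TO_GROUP[m.group(1)]
    return 0   # stays 0, to be caught by the crop = 0 selection described above

rows = ["1024-quad1- 200- 80", "1031-quad1- 1100- 60", "9999-odd- 4242- 10"]
print([crop_group(r) for r in rows])   # [2, 11, 0]
```

The crop = 0 clean-up pass in the text corresponds to the unmapped case returning 0 here.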
3. Recode the classification results based on the crop group integer you just entered into the crop column.
Image Interpreter/GIS Analysis/ Recode is used. In the dialog box: Enter file names
Input is <cov>_agsup*.img
Output is <cov>_agsup*_rc.img
Click "set up recode" in the recode cell array, copy the crop column and paste it into the new value column.
(Scroll to the end of the columns and you will find your added crop column, cross select it (select both the crop
column and all of the rows), right click on top of crop column, and select copy. Scroll back over to the new
value column, cross select it, right click on new value and select paste. You should see the values from the
crop column pasted into the “new values” column.) Click OK to return to the dialog box, and click OK again
to proceed with recode.
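Conceptually, the recode replaces each classified pixel value (a signature class number) with the crop-group integer from the crop column, collapsing hundreds of signature classes into a handful of crop classes. A toy sketch with invented values:

```python
# Toy sketch of the Recode step: signature class numbers collapse to crop
# groups via the crop column. Class and crop values here are invented.
crop_column = {0: 0, 1: 2, 2: 2, 3: 11, 4: 11, 5: 1}   # class -> crop group

classified = [1, 1, 3, 5, 4, 0, 2]               # one row of <cov>_agsup*.img
recoded = [crop_column[v] for v in classified]   # one row of <cov>_agsup*_rc.img
print(recoded)   # [2, 2, 11, 1, 11, 0, 2]
```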
4. Label ag fields based on “majority” rule ( = plurality in Imagine) using Zonal Attribute function:
a. Imagine: Image Interpreter / GIS Analysis / Zonal Attributes
Vector Layer: <cov> Raster Layer: <cov>_agsup*_rc.img
Choose: Ignore zero in zonal calculations
Zonal Function: Majority
(this program adds the item “majority” to your field border database <cov> and assigns a value
based on the majority of your recoded pixel classification in each ag field)
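The "majority" (plurality) rule can be sketched as follows; the dialog's "ignore zero in zonal calculations" option corresponds to dropping zero-valued pixels before counting. Field ids and pixel values are invented for illustration.

```python
from collections import Counter

def majority_label(pixels):
    """Plurality of nonzero recoded pixel values within one field polygon."""
    counts = Counter(v for v in pixels if v != 0)   # ignore zero, per the dialog
    return counts.most_common(1)[0][0] if counts else 0

fields = {167: [2, 2, 2, 11, 0, 2],   # mostly crop group 2
          228: [0, 0, 0]}             # nothing but background
print({fid: majority_label(px) for fid, px in fields.items()})   # {167: 2, 228: 0}
```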
This aml copies the source cov to <cov>label*, adds the item crop-label, calcs crop-label = majority, redefines
crop-type to the non-floating-point crop group code (item "crop"), drops majority from the input cov for future
iterations, and produces a frequency from the output cov (<cov>label*) of aa, crop, and crop-label (ground truth vs.
classification label) for use in an accuracy assessment matrix. This file is written out to a standard text file with
the same name as the Arc frequency file: <cov>label*.fre.
(Note: the aml calls another aml, listinfo.aml, which converts the frequency file to a text file. Make sure
listinfo.aml is properly pathed in this aml if an error occurs during this operation.)
Prepare the accuracy matrix using the frequency from <cov>label* generated in the previous step. To do this,
import the <cov>label*.fre text file into an Excel spreadsheet. <cov>label*.fre contains the following
information:
In Excel select File/Open and navigate to the directory containing the <cov>label*.fre text file. Be sure to set
the file type to All Files (*.*). Once a file is selected, the Import Wizard window will appear and walk you through
steps 1-3. Steps 1-2 can be ignored by selecting the next button. In step three select only the FREQUENCY,
CROP, CROP-LABEL, AA and ACRES columns to be imported and then select next and finish to import the
data into Excel.
Now that the data is imported into an Excel workbook, two worksheets will need to be created if they do not
already exist. This is done under Insert/Worksheet on the main tool bar. Label one worksheet
<cov>feb99_aa_data and the other <cov>feb99_tr_data. Copy and paste the accuracy assessment data (AA
column = 2) to the <cov>feb99_aa_data worksheet and the training data (AA column = 1) to the
<cov>feb99_tr_data worksheet. Note that the column titles should be copied to each worksheet as well.
Please note, feb99 is only being used as an example; the date within the Excel file names should reflect the
appropriate month and year for the data.
To ensure that the error matrix tables are consistent throughout the year and symmetrical, crops that are not
represented in the classification need to be added to the worksheets. To do this, first identify the crops not
represented in the classification. There should be a valid-codes file that lists all crop groups. Some crop groups
are not used if they do not exist in the Lower Colorado region (e.g., rice). See previous classification matrices
for this setup. To add crops not represented, see the example row below:
FREQUENCY = 0, CROP = 4, CROP-LABEL = 4, AA = 2, ACRES = 0
Note: for crop codes being added to fill out the matrix, the frequency and acres columns must be set to 0. The crop
and crop-label columns must be set to the crop number being added. The aa column should be set to 2 for
the <cov>feb99_aa data and to 1 for the <cov>feb99_tr data. Please be aware that there are 22 possible crop types:
1, 2, and 4-23. Crop code 3 corresponds to rice, which is not currently included in Lower Colorado work (no rice
being grown) but is included in Central Valley CA work.
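The padding rule above can be sketched as follows; tuple order matches the imported columns (FREQUENCY, CROP, CROP-LABEL, AA, ACRES), and the frequency rows are invented.

```python
# Sketch of filling out the matrix: every valid crop code absent from the
# data gets a zero row, matching the example row above.
ALL_CROPS = [1, 2] + list(range(4, 24))   # 22 codes: 1, 2, 4-23 (3 = rice, excluded)

def pad_rows(rows, aa):
    """rows: (frequency, crop, crop_label, aa, acres) tuples."""
    present = {r[1] for r in rows} | {r[2] for r in rows}
    return list(rows) + [(0, c, c, aa, 0) for c in ALL_CROPS if c not in present]

rows = [(12, 1, 1, 2, 340.5), (3, 1, 2, 2, 80.0)]   # invented frequency data
padded = pad_rows(rows, aa=2)
print(len(padded))   # 22  (2 data rows + 20 zero rows)
```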
To create the accuracy matrix (aka error matrix), make the <cov>feb99_aa worksheet active and highlight the cell in
the upper left corner of the spreadsheet. Next go to Data in the upper menu bar and select Pivot Table Report. This
will open the PIVOT TABLE WIZARD and walk you through steps 1-4.
A. Step 1 - Select Microsoft Excel list or database and then select NEXT
B. Step 2 - You should see all data cells selected by flashing bounding box. If the cells selected are
incorrect they can be changed in the range setting box so that they are all selected. Select NEXT
C. Step 3 - To input data into the pivot table select and drag the box labeled crop to column, select and
drag crop-label to row and select and drag acres to data. Select NEXT
D. Step 4 - Select New Worksheet and then select FINISH. Rename the new work sheet created for
the pivot table <cov>feb99_aa_mtx (for aa fields : aa = 2 )
(Steps 1 and 2 above usually default to the choices you made the first time. If this is the case, for future matrices,
you can just select 'finish' at Step 1 and then complete Step 3.)
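The pivot step is just a crosstab of acres, with ground truth (crop) in the columns and the classification label (crop-label) in the rows. A pure-Python sketch with invented acreage:

```python
from collections import defaultdict

def error_matrix(rows):
    """rows: (crop, crop_label, acres) triples -> mtx[label][crop] = acres."""
    mtx = defaultdict(lambda: defaultdict(float))
    for crop, label, acres in rows:
        mtx[label][crop] += acres
    return mtx

rows = [(1, 1, 340.5),   # alfalfa labeled alfalfa
        (1, 2, 80.0),    # alfalfa labeled cotton (commission by cotton)
        (2, 2, 120.0)]   # cotton labeled cotton
mtx = error_matrix(rows)
print(mtx[2][1])   # 80.0 acres of known crop 1 labeled as crop 2
```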
Next, supply formulas to the pivot table to calculate overall errors, fallow-corrected overall errors, and individual
errors of commission and omission (please contact J. Milliken with questions regarding the "fallow" correction,
which is only valid in certain circumstances and too involved to discuss in this document). Excel formulas can be
used to calculate these numbers; this is best explained through examination of a completed example, so please refer
to a currently existing error matrix. Formulas in an existing error matrix can be copied and pasted into newly created
error matrix tables. Once complete, save in the /LCRAS/ACCURACY99/<COV>feb99 directory on your PC as
<cov>feb99_mtx (or wherever…)
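For reference, the core arithmetic behind those spreadsheet formulas can be sketched as below (the "fallow" correction is deliberately not sketched; see J. Milliken, as noted above). The matrix layout mtx[label][crop] = acres and the 2-class example are assumptions for illustration.

```python
def accuracy_stats(mtx, classes):
    """Overall accuracy plus per-class commission error from an acreage matrix."""
    total = sum(mtx[r][c] for r in classes for c in classes)
    diag = sum(mtx[c][c] for c in classes)
    overall = diag / total if total else 0.0
    commission = {}
    for r in classes:                            # r = classification label (row)
        row_sum = sum(mtx[r][c] for c in classes)
        commission[r] = 1 - mtx[r][r] / row_sum if row_sum else 0.0
    return overall, commission

# invented 2-class matrix: 10 acres of known crop 2 wrongly labeled crop 1
mtx = {1: {1: 90.0, 2: 10.0}, 2: {1: 0.0, 2: 100.0}}
overall, commission = accuracy_stats(mtx, [1, 2])
print(round(overall, 2), round(commission[1], 2))   # 0.95 0.1
```

Omission error for a class is the analogous off-diagonal share of its column (ground-truth) total.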
NOTE: these are general guidelines and an overall accuracy of 90% does not always indicate an acceptable
classification. If there are low crop accuracies of target crop types (significant mature crop for that time
period), further iterations may be warranted, even though the overall accuracy is 90% or greater. In
addition, one classification may be preferred over another as a function of the nature of the error and
intended use of the data. For example, two iterations may have a similar overall accuracy, but the nature of
confusion (which crops are confused with other crops) may differ. One of these may be a better choice based
on the differences in CU (water consumptive use) between the crops that are confused. Also, some types of
error may be preferred as a function of best requirements for the annual summary process. If you are
unclear about some of these principles, contact Jeff Milliken and he will confuse you further.
At this point the operator is faced with an iterative loop attempting to identify (bad) signatures which are
contributing to the mislabeling of an agricultural field.
This routine aids in the identification of specific bad signatures which may have caused problems in the
classification, and thus may need to be eliminated in the next classification round. Typically, I look for signatures
causing significant errors of commission against a known crop type (for example, if a significant number of alfalfa
fields have been misclassified as lettuce, I would like to find the lettuce signatures responsible for these errors of
commission against alfalfa).
2. Once this step is completed, display the image, open the Image Info for this file <cov>bad1.img, and
correct the projection parameters (due to bug in Arc where projection info is not written in proper format).
Menu Edit/Change Map Model: set to meters and UTM, menu Edit/Add change projection: set
Projection to Clarke 1980, UTM Zone 11, NAD 83. Then rerun statistics with skip factor of 1, and
Edit/Change Layer type to toggle the layer type to thematic.
It is not necessary to put the data in a file. Choose interactive cell array and Summary will create a spreadsheet
on screen with the data and you can click through the “Zones” and copy /paste suspect signatures to a text editor
window for future use.
Zone no.: in this case, zone no. represents the known crop group (i.e., zone 1 = alfalfa). Zones represent those
training fields for each crop group that were mislabeled in the classification (e.g., zone no. = 1 represents all
of the known [training only, not aa] alfalfa fields that were misclassed after the majority rule was used to label
those fields). Data in the cell array is presented as a sum over all misclassed alfalfa fields, not each individual
misclassed alfalfa field; hence the term 'zone'.
Class name = signame containing the field-id, quad, crop-type, crown closure, etc.
Count = number of pixels within zone (i.e. alfalfa for zone 1) that were classified as this class
% = % of pixels within zone that were classified as this class
Acres (you can use options menu to choose area units) = number of acres in zone classified as this class.
Example: Scroll through cell array and look for class names that have classified a significant % of the zone but
are a different crop type than the zone. These are possible bad signatures. For zone = 1, this is all known
training alfalfa fields that were mislabeled by the classifier. One record shows 14 % of the zone classified and
class name indicates this to be a lettuce signature. This would be a suspect signature for causing errors of
commission against alfalfa (in other words, 14% of the pixels within known mislabeled alfalfa fields have been
erroneously classed as lettuce with this signature). What represents a "significant" % is somewhat relative, as
you are looking at multiple mislabeled fields rather than each individual mislabeled field, so usually you are
looking for "anomalous" percentages of pixels that were misclassed. You can generate information on each
individual mislabeled field, but this is much more time intensive and can be reserved for cases where the 'zone'
procedure does not improve the classification.
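The scan for suspect signatures can be pictured as a filter over the zonal cell array: flag classes whose crop code differs from the zone's crop and whose percentage of the zone is anomalously high. The 10% threshold, crop codes, and name parser below are all assumptions for illustration.

```python
import re

def crop_code(class_name):
    """Pull the embedded '- <code>-' crop code out of a class_name."""
    m = re.search(r"-\s*(\d+)-", class_name)
    return int(m.group(1)) if m else 0

def suspects(zone_crop, cell_array, threshold=10.0):
    """cell_array: (class_name, percent-of-zone) pairs for one zone."""
    return [name for name, pct in cell_array
            if crop_code(name) != zone_crop and pct >= threshold]

# zone of known alfalfa training fields that were mislabeled (invented data)
zone1 = [("0711-quad3- 100- 90", 60.0),    # alfalfa signature: expected
         ("0912-quad3- 1000- 85", 14.0),   # lettuce classing 14% of the zone
         ("0533-quad2- 200- 40", 3.0)]     # cotton, but only 3%
print(suspects(zone_crop=100, cell_array=zone1))   # only the lettuce signature
```

Signatures flagged across multiple zones, as described below, are the strongest candidates for removal.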
To copy/paste out of the cell array to a text editor, select the columns you want to copy and cross-select the
records, then right click in one of the column headings, choose Copy, and paste into your text editor. This
comes in handy because, if you copy/paste suspect signatures for different zones (crop types), you will often find
that there are signatures that cause commission errors for multiple crops (the same bad signatures are
misclassifying different crop types and showing up in the cell array under multiple zones). When copy/pasting to
your text editor, keep track of each zone you are copying from for evaluation. Your text file should look
something like this:
5. IDENTIFICATION OF BAD FIELDS. Additionally you may check to see if there are bad fields.
The badsig.aml previously run also generates an arc file <cov>bad1 that contains only the mislabeled fields for
that particular iteration, and it runs a frequency on this file that is written out to a separate text file in the /covs
directory (i.e., <cov>bad*.fre). You can use this frequency file to identify individual misclassed fields if
necessary.
To interactively view the misclassed fields: Display the arc coverage <cov>bad* in an ERDAS Imagine
viewer (or ArcMap) over your pixel classification. Click on Viewing Properties for the vector coverage, view
only the polygons, and set the viewing properties so there is no fill. Bring up the vector attribute table and
select the desired fields; they should highlight in yellow.
Using the cursor, pan around the bad field and get a feel for which signature(s) are causing the field to be
incorrectly classified.
Once the signatures which may have caused the bad classification of the field(s) have been identified, bring
up the last iteration's signature set in the Signature Editor, remove the offending signature(s), and
reclassify.
This approach is more qualitative and time consuming but may be necessary on occasion.
The evaluations discussed above have focused on omitting signatures – there may be cases when
additional signatures are warranted.
6. After rerunning the classification, go to Chap. 8 for recoding, labeling, etc. for evaluation of this iteration.
To be completed…
APPENDIX
This entire appendix needs review and significant changes – do not use at this
point.
AMLS AND PROCEDURES
Note: The most recent updates to the AMLs and scripts will always be in the executable "bin" directory on the disk.
The code listed here is for reference; see the actual disk files for the exact steps.
Background: This aml is used only to check the coverage(s) that are to be used in the classification of the image for
your project. Coverages come to us with the field attributes intact from many different field survey periods. Of
course, you are only interested in the date which coincides with the current "time". The user must make sure that the
working coverage contains crop-type information only for fields that were field checked during the month and year
of the time period you are working on. The unneeded fields are set to crop-type = 0 so that they are not used as
either training or accuracy selection sites.
Important exceptions to this rule in the BOR processing are the two crop-types that are verified by aerial
photography and are mostly consistent from year to year: citrus and dates. The crop-type data for citrus and dates
should be retained in the final coverage, no matter what month or year the field information is from. It is a good idea
to check for the number of citrus (9.00, 9.01, and 9.02) and date (15.00) crop-type fields at the beginning of this
process, and check again at the end to verify that none of these fields have accidentally been calculated to 0.
This aml also checks for crop-types that are unknown or not part of the classification system, such as crop-types
greater than 16… So do the following:
A. If a crop-type is > 16, then it needs to be calculated to 0. Crop-types > 16 are used for
crops that don’t fit into any of the crop categories (e.g., peaches), or are unknown crop
types. Sometimes, a crop-type code of 12.99 is used for an unknown crop type; if that
crop-type code is found, calculate it to 0. Check out the example below.
B. If a field was not actually visited during the field visit that you are working on, then calc
crop-type = 0. For instance, if you’re working on the coverages for May 1996, the first
step is to see whether the numbers '95' exist in the data field called DATE. If they do, that
means the last time that field was visited was sometime in 1995, and you therefore
need to calculate crop-type = 0, unless it is a citrus or date field. Below is an example
of the procedures to go through.
Note: The “03” below refers to the month of March which, in this example is a field
survey performed earlier within the same year which is not needed.
C. If, in the COMMENTS field, it shows that a field has been split, then the crop-type needs
to be calculated to 0. Sometimes, a farmer will be growing multiple types of crops in one
field, and when that happens, we do not want to use that field for training sites, or for
accuracy assessment. When searching for a word in the “comments” field, you’ll need to
look for that word in three ways, e.g. ‘HALF’, ‘Half’, and ‘half’. It is up to the discretion
of the image processor to figure out which fields are appropriate to recalculate, and which
fields should be used for training sites. The word “half” may also be used in sentences
like, “Irrigating half of the field currently.” Check out the examples below.
Note: It may be easier and quicker to select all fields where crop-type ne 0, and simply
list
Field-id, Crop-type, Comments.
FIELD-ID = 167
COMMENTS = WHALF SUDAN; EHALF EMERGENT
COTTON
FIELD-ID = 228
COMMENTS = EHALF ALF; WHALF 1402
FIELD-ID = 256
COMMENTS = EQtr Sudan; CenQtrs Fallow;
Wqtr Alf
FIELD-ID = 478
COMMENTS = w2thirds alf; ethird wheat
This buffers the training fields 25 meters to the inside so that a mask can be generated from this coverage
(rasterized) to mask the imagery that segment regions will be grown on.
Arc: gridimage low1t2trbufgr # low1t2trbuf.img imagine (You can also use the Import function in
Imagine to import the Grid to an Imagine file.)
When converting, be sure that the pixel size is 25, the output pixel data type is Unsigned 16 bit and the attribute
item is <cov-id>.
NOTE: For the mkbih command: -l is equivalent to lines, height, and rows; -s is equivalent to samples,
width, and columns.
hist low1t2seeds2_5_15.armap.*|rdhist>low1t2seeds2_5_15.armap.hist
Before you run seeds.exe, bring it up in a text editor and specify the appropriate lines and samples, and
be sure that your filenames are accurate.
mkbih creates an IPW header file for the generic binary files
texture creates a texture band from band 4
mux puts all four bands together (bands 3,4, 5 and the texture band)
segment creates the segmentation polygons
NOTE: Experimentation on time 1 low1 area was used to determine segment parameters.
In order to get regions for training fields with some having standard
deviations in all (6) bands (after bringing regions back to Imagine for
signature generation) of less than 5.0, the following parameters were
used:
* -t spectral threshold = 2
* absolute minimum region size and minimum region size = 5 pixels
* viable region size = 15 pixels
* in the mkbih command -l = rows and -s = columns
This tends to result in more signatures than necessary for areas with
less variability (TOP), but data set is easily reduced in the signature
editor (discussed later).
The output of this process results in several files with the following type of filename:
low1_ag_seed.armap.21 (region map) (the number after .armap will be different for each file)
low1_ag_seed.armap.21.hist (histogram to determine total # of regions.)
Thus there are a total of 2892 regions in this file. You will add 1 to this total, and in Step 7 you will check
to be sure that the number is correct.
10. Use identity to bring field attributes back into "region" (or IPW) polygon file (low1t2seedcov)
11. Create a new cov containing only the segment regions. (After the identity with the original training cov,
the area between the buffer and the outside field boundary is again included in the file, but it has
GRID-CODE = 0; gridpoly brings the segment region numbers into low1t2seedcov as the item
GRID-CODE.)
Run the IPW routines mkbih (make basic image header) and rmap2lut (region map
to look-up table). A script can be written to run these routines.
This is an explanation of the syntax: for mkbih, -l is lines = rows and -s is samples = cols.
rmap2lut specs:
-r rmap is gridded polygon coverage <cov>poly.ipw (remember, based on field-ids)
-i the stitched out classified/recoded areas <cov>agsup.ipw
(stitched to match <cov>poly.ipw)
-c calculation is p (plurality)
-f output file <cov>agpoly.rlut, a look up table with field-id in column one
and the calculated crop label in column 2.
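The look-up table rmap2lut writes can be pictured as a field-id to crop-label mapping; applying it to the field-id raster yields the labeled result. Field ids and labels below are invented.

```python
# Sketch of applying <cov>agpoly.rlut: column one is the field-id, column
# two is the plurality crop label computed over that field's classified pixels.
rlut = {167: 2, 228: 11, 256: 0}      # invented parse of the .rlut file

field_row = [167, 167, 228, 256]      # one row of the field-id raster
labeled = [rlut[fid] for fid in field_row]
print(labeled)   # [2, 2, 11, 0]
```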
Note that the resulting <cov>poly.ipw only needs to be made once per time-set.
If the scripts are run more than once, the routine:
does not need to be redone, but the output <cov>poly.ipw is needed in:
Thus, <cov>poly.ipw should be saved, and the path name to it may need to be specified if the script is
run from a different place.
In TABLES, perform a REDEFINE on your original coverage to create a new redefined item called CROP,
which will contain the first two digits from the crop-type item.
ENTER COMMAND>ITEMS
DATAFILE NAME: LOW1T1.PAT
24 ITEMS; STARTING IN POSITION 1
COL ITEM NAME
1 AREA
5 PERIMETER
9 LOW1T1#
13 LOW1T1-ID
17 QUADNAME
30 FIELD-ID
37 CROP-TYPE
45 MIN-HT
49 MAX-HT...
ENTER COMMAND>REDEFINE
ENTER DATAFILE DEFINITION
ITEM STARTING COLUMN>40
ITEM NAME>CROP
ITEM WIDTH>2
ITEM OUTPUT WIDTH>2
ITEM TYPE >I
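The REDEFINE above reads a two-character window out of the fixed-width CROP-TYPE field, which amounts to keeping the whole-number part of the crop-type code (so 12.99 becomes 12 and 9.01 becomes 9). A hedged sketch, since the exact column positions depend on the coverage's item layout:

```python
def crop_group(crop_type):
    """Crop group integer from a crop-type code such as 9.01 or 12.99."""
    return int(crop_type)   # drop the .xx subtype digits

print([crop_group(c) for c in [9.01, 12.99, 15.00, 2.0]])   # [9, 12, 15, 2]
```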
Then rasterize the misclassified-field coverage <cov>bad, with the crop item as the attribute and a 25-meter
pixel size.