MOOC
Exercise
Classifying Data
Section 2 Exercise 2
January 27, 2021
Cartography. MOOC
Instructions
Use this guide and ArcGIS Pro to reproduce the results of the exercise on your own.
Note: The version of ArcGIS Pro that you are using for this course may produce slightly
different results from the screen shots that you see in the course materials.
Time to complete
Approximately 10-20 minutes
Software requirements
ArcGIS Pro 2.7
ArcGIS Pro Standard license (or higher)
Note: The MOOC provides a separate ArcGIS account (user name and password) that you
will need to use to license ArcGIS Pro and access other software applications used
throughout the MOOC exercises. This account (user name ending with _cart) provides the
appropriate ArcGIS Online role, ArcGIS Pro license, ArcGIS Pro extensions, and credits.
We strongly recommend that you use the provided course ArcGIS account to ensure that
you have the appropriate licensing to complete the exercises. Exercises may require
credits. Using the provided course ArcGIS account ensures that you do not consume your
organization's credits. Esri is not responsible for any credits consumed if you use a different
account. Moreover, Esri will not provide technical support to students who use a different
account.
Introduction
All maps are made from data. Part of making a good map is being able to understand and
work with numbers and to appreciate how your manipulation of the data plays a vital role in
the message that your map communicates.
In much topographic mapping, you are symbolizing data that has been surveyed. You encode
meaning into the surveyed coordinates by drawing them as points, lines, and areas, often
of different types.
For thematic mapping (a map designed to focus on a particular theme in a geographic area),
you are often dealing with a dataset that represents a variable of interest. Your map will likely
show certain trends, such as where the place with the highest or lowest value is or where
certain areas share similar characteristics. The key is understanding that how you manipulate
the data can tell different stories. It is important to ensure that you are not inadvertently
telling a false story.
This exercise uses ArcGIS Pro to explore alternative methods of classifying numerical data for
thematic mapping. Data classification is not unique to thematic mapping, but the techniques
explored here can be used to understand and classify data more generally. You will create a
range of choropleth maps (https://bit.ly/3ngFJ85) to illustrate how changing the classification
changes the map.
The first step of classifying your data is understanding the data itself. In this exercise, you will
explore the data that you will be using, familiarizing yourself with the attributes and what they
represent.
a Start ArcGIS Pro and, if necessary, sign in using your provided course ArcGIS credentials
(user name ending with _cart).
b From the main ArcGIS Pro start page, click Open Another Project.
c Browse to the location where you saved the exercise data file and open the
ExploringDataClassification.ppkx project package.
The project opens to the UK Election 2015 map, which includes a single layer
showing the electoral constituencies for the United Kingdom of Great Britain and Northern
Ireland and the Dark Gray Canvas basemap layer (which is not turned on).
The project also contains numerical election data for each of the constituencies. You will
classify this data using a variety of methods to visualize the results of the 2015 UK election.
d From the Project tab, click Save As and type a name for your project, such as
ExploringDataClassification_<your first and last name>.aprx.
e Save the project to the folder on your computer where you are saving your work.
Note: It is important to save your work regularly in ArcGIS Pro. Remember to save periodically
as you go through this exercise.
a In the Contents pane, double-click the Constituencies layer to open the Layer Properties
dialog box.
Note: You can also right-click the layer name and choose Properties.
b Click Source and view the data source information for the layer.
c Scroll down, if necessary, and click Spatial Reference to expand the section.
The spatial reference uses the British National Grid projected coordinate system based on a
Transverse Mercator projection. This option is the most common coordinate system and
projection used for UK data. If you looked at a map of a different part of the world, you would
likely see a different coordinate system and projection being used, one that is more relevant
to that specific area.
e In the Contents pane, turn on the Dark Gray Canvas layer, and then, if necessary, zoom
out to see all of the UK.
b Examine the field names in the attribute table and use the field descriptions in the
following table to learn more about the data.
Note: You can dock the attribute table pane in different parts of your window or make it larger
or smaller by clicking and dragging the border.
Your goal here is to get familiar with the data, which is important before working with it or
making a map.
c After you have examined the data, close the attribute table.
Now that you have an idea of the data available in the layer, you will create several choropleth
maps to see how changing the data classification changes the message of the map.
a In the Contents pane, right-click the Constituencies layer and choose Copy.
c Double-click the new layer name to open the Layer Properties dialog box.
d Click General, and then in the Name field, type Constituencies_natural breaks as the
name for the new layer and click OK.
Choropleth map symbolized with graduated colors, using the natural breaks (Jenks) data classification method.
The default classification method, Natural Breaks (Jenks) (https://bit.ly/2UefL9U), and the
number of classes are applied.
Now you have a default choropleth map showing the percentage turnout classified into five
classes by the natural breaks method. Lower turnout is shown with lighter symbols, and higher
turnout is shown with darker symbols, as you can see in the legend.
Looking at the map and the legend, what patterns do you see? What message does the
classification communicate? With all the different classification methods, you will likely see
areas of high rates relative to low rates. However, there are always subtle differences in the
resulting maps. So, for instance, does a particular technique make it easy to see the highest
and lowest areas, or does it cause these areas to become grouped with other areas? How is
the data distributed across the whole map? Are there sharp changes between some areas that
need to be examined further, or are the changes more gradual across space? Keep these and
other questions in mind as you change the classification methods because changing the
method does change the map and the map's message.
The natural breaks (Jenks) classification method uses classes based on the natural groupings
inherent in the data. This method identifies the class breaks that best group similar values and
maximize the differences between classes. It divides the features into classes whose
boundaries are set where there are relatively big differences in the data values.
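The idea behind natural breaks can be sketched in a few lines of Python. This is a minimal illustration, not the code that ArcGIS Pro runs: it assumes the goal is to minimize the total within-class sum of squared deviations, and the function name natural_breaks is hypothetical. It returns the upper bound of each class.

```python
def natural_breaks(values, k):
    """Return the upper bound of each of k classes.

    Assumption: the best grouping is the one with the smallest total
    within-class sum of squared deviations. Runs in O(k * n^2) time, which
    is fine for a few hundred values but slow for very large datasets.
    """
    data = sorted(values)
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of values")

    # Prefix sums give the squared deviation of any slice in constant time.
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(data):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def ssd(i, j):
        """Sum of squared deviations from the mean for data[i:j]."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: lowest cost of splitting data[:j] into c classes.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    split = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    split[c][j] = i

    # Walk the split table backwards to recover each class's upper bound.
    breaks = []
    j = n
    for c in range(k, 0, -1):
        breaks.append(data[j - 1])
        j = split[c][j]
    return breaks[::-1]
```

Given turnout proportions that fall into three obvious clusters, the function places the breaks in the gaps between the clusters, which is the behavior described above.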
j Near the bottom of the Symbology pane, notice the labels associated with the symbols.
The labels show the proportion of turnout as a value between 0 and 1. However, a
percentage would be more meaningful. You can change the labels to show a percentage by
clicking each label and typing a new value and the percent symbol (for example, 0.5940
becomes 59.4%). You could label each class in many different ways, including showing class
intervals (0–59.4, 59.5–64.4, and so on).
k Update the labels for the symbols to percentages using the following values:
Note: Change only the numbers to percentages. Do not change the dash symbols.
l In the Contents pane, examine the map legend and notice how the percentages make the
data and symbology more meaningful.
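If you ever need to relabel many classes, the conversion from proportion to percentage is easy to script. The following Python sketch only illustrates the arithmetic; it does not edit an ArcGIS Pro layer. The function names and the label layout (two values separated by a dash) are assumptions, and the sample bounds reuse the values quoted in the text.

```python
def to_percent(proportion, decimals=1):
    """Format a proportion between 0 and 1 as a percentage string."""
    return f"{proportion * 100:.{decimals}f}%"


def range_labels(lowest, upper_bounds, decimals=1):
    """Build one 'lower - upper' label per class from its upper bound.

    Hypothetical layout: each class starts where the previous one ended.
    """
    labels = []
    lower = lowest
    for upper in upper_bounds:
        labels.append(
            f"{to_percent(lower, decimals)} - {to_percent(upper, decimals)}"
        )
        lower = upper
    return labels


# The example from the text: 0.5940 becomes 59.4%.
example_label = to_percent(0.5940)
example_ranges = range_labels(0.0, [0.594, 0.644])
```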
View mode    Description
Classes      Allows you to manage the symbol, values, descriptive labels, and grouping of the symbol classes
Histogram    Offers a visual tool for editing the classes and understanding how the data is represented by different classification methods
Scales       Allows you to specify the scale ranges in which each symbol class draws (this tool is not particularly useful for this exercise)
n In the Symbology pane, click the Classes tab and the Histogram tab to switch between a
label view and a histogram view.
o Ensure that you check the histogram view for this map and all subsequent classification
maps to see how the data distribution relates to the different classification schemes.
p Experiment with changing the number of classes by changing the Classes parameter in
the Symbology pane.
Do fewer classes help simplify and clarify the message of the map? Do more classes give a
different message? A good rule of thumb is to limit the number of classes to between four
and seven. Too few and you lose a great deal of variation in how the data is presented. Too
many and it becomes hard to see subtle differences between areas classified slightly
differently.
Note: For more information about data classification, refer to this ArcGIS Pro Help page about
classification methods (https://bit.ly/2XnDuVQ).
Now you will explore different classification techniques and see how they affect the way that
the data appears on the map. Remember that you can pan and zoom the map as you do this.
As you change settings, you will notice that the legend in the Contents pane updates, and the
changes are automatically applied to the map.
As you make changes, consider the visual impression of the data's pattern and distribution.
Ask yourself what each method shows and what the key aspect of the highlighted data is.
Differences will sometimes be pronounced and, other times, subtle. All have consequences
for how people read and interpret the pattern.
a In the Contents pane, right-click the Constituencies_natural breaks layer and choose
Copy.
d Turn off the Constituencies_natural breaks layer and turn on the Constituencies_quantile
layer, if necessary.
What differences or similarities do you notice with the change in classification method? What
patterns are visible? What is the key aspect of the data that is highlighted with this
classification method? Remember, you can vary the number of classes to see how that
changes things.
Choropleth map symbolized with graduated colors, using the quantile data classification method.
g Examine the histogram view for this map to see how the classification method has been
applied across the data distribution.
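The quantile method places an equal (or nearly equal) number of features in each class. A minimal Python sketch of that rule follows. The function name quantile_breaks is hypothetical, and the way ArcGIS Pro handles tied values or uneven counts may differ from this simple rank-based rule.

```python
def quantile_breaks(values, k):
    """Return the upper bound of each of k classes of (nearly) equal size."""
    data = sorted(values)
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of values")
    breaks = []
    for c in range(1, k + 1):
        # Rank of the last value in class c: ceil(c * n / k), using
        # integer arithmetic to avoid floating-point rounding.
        rank = -(-c * n // k)
        breaks.append(data[rank - 1])
    return breaks
```

Because the breaks depend only on rank, values that are far apart can share a class while values that are close together can be separated. Compare the class boundaries with the histogram to see where this happens.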
a In the Contents pane, copy the Constituencies_natural breaks layer and paste it to the UK
Election 2015 map.
c Turn off the Constituencies_quantile layer and turn on the Constituencies_equal interval
layer, if necessary.
Choropleth map symbolized with graduated colors, using the equal interval data classification method.
What is the key aspect of the data that is highlighted with this classification method?
Remember, you can vary the number of classes to see how things change.
f Examine the histogram view for this map to see how the classification method has been
applied across the data distribution.
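The equal interval method divides the range between the lowest and highest values into classes of equal width, regardless of how many features fall into each class. A minimal Python sketch, with a hypothetical function name:

```python
def equal_interval_breaks(values, k):
    """Return the upper bound of each of k equally wide classes."""
    if k < 1:
        raise ValueError("k must be at least 1")
    lowest, highest = min(values), max(values)
    width = (highest - lowest) / k
    # Use the true maximum for the last class so rounding cannot exclude it.
    return [lowest + width * c for c in range(1, k)] + [highest]
```

For values ranging from 10 to 60 and five classes, each class is 10 units wide. Some classes may contain many features and others none, which is easy to spot in the histogram view.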
You have explored three of the most common classification methods, but there are many
more. We encourage you to explore other methods on your own.
a In the Contents pane, copy the Constituencies_natural breaks layer and paste it to the UK
Election 2015 map.
c Turn off the Constituencies_equal interval layer and turn on the Constituencies_geometric
interval layer, if necessary.
Choropleth map symbolized with graduated colors, using the geometric interval data classification method.
What is the key aspect of the data that is highlighted with this classification method?
Remember, you can vary the number of classes to see how things change.
f Examine the histogram view for this map to see how the classification method has been
applied across the data distribution.
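In a geometric interval scheme, class widths form a geometric series, so each class is wider (or narrower) than the previous one by a constant ratio. The Python sketch below shows only that basic idea with a fixed ratio that you supply. ArcGIS Pro chooses its intervals through its own optimization, which this sketch does not attempt to reproduce. The function name and the ratio parameter are assumptions.

```python
def geometric_breaks(lowest, highest, k, ratio=2.0):
    """Return k upper bounds whose class widths grow by a constant ratio.

    Simplified illustration: the ratio is fixed by the caller rather than
    optimized against the data.
    """
    if k < 1 or ratio <= 0:
        raise ValueError("k must be >= 1 and ratio must be positive")
    span = highest - lowest
    if ratio == 1.0:
        first = span / k  # a ratio of 1 reduces to equal intervals
    else:
        # Widths first, first*ratio, ..., first*ratio**(k-1) sum to span.
        first = span * (ratio - 1) / (ratio ** k - 1)
    breaks = []
    upper = lowest
    for c in range(k):
        upper += first * ratio ** c
        breaks.append(upper)
    breaks[-1] = highest  # guard against floating-point drift
    return breaks
```

With a ratio above 1, the narrow classes sit at the low end of the range, which gives more detail where skewed data is concentrated.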
a In the Contents pane, copy the Constituencies_natural breaks layer and paste it to the UK
Election 2015 map.
Choropleth map symbolized with graduated colors, using the standard deviation data classification method.
What is the key aspect of the data that is highlighted with this classification method?
Remember, you can vary the number of classes to see how things change.
f Examine the histogram view for this map to see how the classification method has been
applied across the data distribution.
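The standard deviation method shows how far each value lies from the mean, with class breaks spaced at a fixed fraction of the standard deviation. The Python sketch below is one plausible arrangement, not necessarily the one ArcGIS Pro uses: it assumes a population standard deviation and places breaks at the mean plus or minus whole multiples of the chosen interval. The function name is hypothetical.

```python
import math
import statistics


def std_dev_breaks(values, interval=1.0):
    """Return class upper bounds at mean +/- multiples of interval * sd.

    Assumptions: population standard deviation, and a break exactly at the
    mean. Only breaks strictly inside the data range are kept, and the
    maximum closes the last class.
    """
    mean = statistics.fmean(values)
    step = statistics.pstdev(values) * interval
    lowest, highest = min(values), max(values)
    if step == 0:
        return [highest]  # all values identical: a single class
    breaks = []
    m = math.ceil((lowest - mean) / step)
    while mean + m * step < highest:
        if mean + m * step > lowest:
            breaks.append(mean + m * step)
        m += 1
    breaks.append(highest)
    return breaks
```

Because the classes are defined relative to the mean, this method suits a diverging color scheme that separates below-average from above-average areas.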
a In the Contents pane, copy the Constituencies_natural breaks layer and paste it to the UK
Election 2015 map.
f In the Symbology pane, click the Histogram tab to switch to the histogram view.
g Move the sliders up and down across the histogram so that you attempt to group
together values of data that display similar characteristics.
You might also incorporate a specific value or values as part of your classification scheme.
h Experiment until you achieve a classification scheme that makes sense to you, based on
the sort of characteristics in the data that you want to emphasize.
Choropleth map symbolized with graduated colors, using the manual interval data classification method.
What is the key aspect of the data that is highlighted with this classification method?
i Examine your manual breaks method compared to the other methods to see how the
different classification schemes change the way the map looks.
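To compare manual breaks with the other schemes, it helps to count how many features land in each class. The Python sketch below assumes each class includes its upper bound. The function names and the sample break values are hypothetical and are not taken from the exercise data.

```python
from bisect import bisect_left


def classify(value, upper_bounds):
    """Return the index of the first class whose upper bound is >= value.

    upper_bounds must be sorted in ascending order. Values above the last
    bound are placed in the last class.
    """
    return min(bisect_left(upper_bounds, value), len(upper_bounds) - 1)


def class_counts(values, upper_bounds):
    """Count how many values fall into each class."""
    counts = [0] * len(upper_bounds)
    for value in values:
        counts[classify(value, upper_bounds)] += 1
    return counts


# Hypothetical manual breaks chosen at round turnout proportions.
manual_bounds = [0.5, 0.6, 0.7, 1.0]
sample = [0.45, 0.5, 0.55, 0.65, 0.72, 0.9]
counts = class_counts(sample, manual_bounds)
```

A class with very few or very many features is not necessarily wrong, but it is worth checking that the imbalance reflects the story you intend to tell.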
a In the Contents pane, paste another copy of the Constituencies_natural breaks layer.
d In the Symbology pane, from the Primary Symbology drop-down list, choose Unclassed
Colors.
Dragging the handles in the histogram shifts the way that color is applied to the unclassed
values.
h Experiment with these handles and see how the colors are applied to the map.
Sliding the histogram's handles is a quick way to make all data values above or below a
particular number appear the same way.
What is the key aspect of the data that is highlighted with this classification method?
Unclassed techniques can be very useful in highlighting extremes in your data and picking out
the highs and lows. However, unclassed techniques are the inverse of classifying: instead of
grouping similar values into a shared class, each value is drawn in its own shade. It is
important to keep this difference in mind when you want to make a map that shows your
reader how similar different places are.
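The effect of the histogram handles on unclassed colors can be sketched as a linear color ramp with clamping. This Python example is an illustration under stated assumptions: the ramp is a straight-line blend between two RGB colors, and the function name and parameters are hypothetical rather than part of ArcGIS Pro.

```python
def unclassed_color(value, lower, upper, start_rgb, end_rgb):
    """Blend two RGB colors according to where value sits between handles.

    Values at or below the lower handle get start_rgb; values at or above
    the upper handle get end_rgb. Requires lower < upper.
    """
    if lower >= upper:
        raise ValueError("lower handle must be below upper handle")
    t = (value - lower) / (upper - lower)
    t = max(0.0, min(1.0, t))  # clamp so values beyond a handle look alike
    return tuple(
        round(s + (e - s) * t) for s, e in zip(start_rgb, end_rgb)
    )
```

Moving the handles inward makes more values share the darkest or lightest color, while the values between the handles are spread across the full ramp.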
Conclusion
You have explored a range of classification techniques and learned that there are numerous
ways to classify a single dataset. Therefore, there is ample opportunity to represent the data
well or poorly, objectively or persuasively.
For any dataset that you classify, first determine which method of classification will most
effectively communicate your message without obfuscating the truth.
You will use this same dataset in a future exercise to create different thematic maps and study
color and symbology in more detail. However, for now, the most important lesson to
understand is that data can be manipulated to tell different stories.
Stretch goal
If you would like to continue exploring the classification of numerical data, you can complete
the following high-level tasks:
Note: Any fields with "_Share" in their names are already percentages and do not need to be
normalized.
Post your questions and observations about this stretch goal in the Lesson Forum. Be sure to
include the #stretch hashtag in your post title. We would love to hear about any
experimenting that you did beyond the steps of the original exercise!