Unit-4 Data Exploration (E-Next - In)
Tirup Parmar
UNIT –IV
Data exploration:
Exploration, attribute data query, spatial data query, raster data query, geographic
visualization
Descriptive statistics
o Mean: the average of data values. The mean can be calculated by
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, where $x_i$ is the ith value and $n$ is the number of values.
o Variance: the average of the squared deviations of each data value about the mean.
The variance can be calculated by $s^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$.
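As a rough illustration of the two definitions above, the following Python sketch computes the mean and the variance of a small sample. The data values are hypothetical, and the variance is assumed to divide by n, following the wording "average of the squared deviations".

```python
# Hypothetical sample values; they are not taken from the notes.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

n = len(values)

# Mean: the sum of the values divided by the number of values.
mean = sum(values) / n

# Variance: the average of the squared deviations about the mean
# (assumed here to divide by n, not by n - 1).
variance = sum((x - mean) ** 2 for x in values) / n

print("mean =", mean)
print("variance =", variance)
```

The standard deviation used later in these notes is the square root of this variance.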
https://E-next.in
Prof. Tirup Parmar
Graphs
Different types of graphs are used for data exploration. A line graph displays data as a
line. The line graph example in Figure 1 shows the rate of population change in the United
States along the y-axis and the state along the x-axis. Notice a couple of “peaks” in the line
graph.
A bar chart, also called a histogram, groups data into equal intervals and uses bars to
show the number or frequency of values falling within each class. A bar chart may have vertical
bars or horizontal bars. Figure 2 uses a vertical bar chart to group rates of population change in
the United States into six classes. Notice one bar at the high end of the histogram.
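The grouping into equal intervals can be sketched in Python as follows. The function name equal_interval_counts and the sample rates are hypothetical; this is one simple way to count values per class, not necessarily how a GIS package builds its histogram.

```python
def equal_interval_counts(values, n_classes):
    """Count how many values fall in each of n_classes equal-width classes."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    counts = [0] * n_classes
    if width == 0:
        # All values are identical, so they all fall in the first class.
        counts[0] = len(values)
        return counts
    for v in values:
        # The maximum value is placed in the last class rather than a new one.
        idx = min(int((v - lo) / width), n_classes - 1)
        counts[idx] += 1
    return counts


# Hypothetical rates of change, grouped into six classes.
rates = [0, 1, 2, 3, 4, 5, 6, 12]
counts = equal_interval_counts(rates, 6)
print(counts)
```

A single isolated count in the last class, as in this sample, corresponds to the lone bar at the high end of a histogram.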
Fig. 4 : A scatterplot plotting percent persons under 18 years old in 2000 against percent
population change, 1990-2000. A weak positive relationship is present.
A scatterplot uses markings to plot the values of two variables along the x- and y-axes. Figure 4
plots percent population change, 1990-2000, against percent persons under 18 years old in 2000
by state in the United States. The scatterplot suggests a weak positive relationship between the
two variables.
Fig. 5 : A bubble plot showing percent population change, 1990-2000, percent persons under
18 years old in 2000, and state population.
Bubble plots are a variation of scatterplots. Instead of using constant symbols as in a
scatterplot, a bubble plot has varying-sized bubbles that are made proportional to the value of
a third variable. Figure 5 is a variation of Figure 4; the additional variable shown by the bubble
size is the state population in 2000. As an illustration, Figure 5 shows only states in the
Mountain region, one of the nine regions defined by the U.S. Census Bureau.
Boxplots, also called “box and whisker” plots, summarize the distribution of a data set with
five statistics: the minimum, first quartile, median, third quartile, and maximum. By examining
the position of the statistics in a boxplot, we can tell if the distribution of data values is
symmetric or skewed and if there are unusual data points (i.e., outliers). Figure 6 shows a
boxplot based on the rate of population change in the United States. This data set is clearly
skewed toward the higher end. Figure 7 summarizes three basic types of data sets in terms of
the distribution of data values. Boxplots are therefore useful for comparisons between different
data sets.
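The five statistics behind a boxplot can be computed with Python's standard statistics module, as in the sketch below. The helper five_number_summary and the sample data are hypothetical, and the "inclusive" quartile method is only one of several conventions in use.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, first quartile, median, third quartile, maximum)."""
    data = sorted(values)
    # quantiles(..., n=4) returns the three cut points between the quarters.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


# Hypothetical data with one unusually high value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 30]
minimum, q1, median, q3, maximum = five_number_summary(sample)

# A long upper whisker compared with the lower one suggests a skew
# toward the higher end.
upper_whisker = maximum - q3
lower_whisker = q1 - minimum
print(minimum, q1, median, q3, maximum)
print("skewed toward higher end:", upper_whisker > lower_whisker)
```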
Fig. 8 : A QQ plot plotting percent population change, 1990-2000 against the standardized
value from a normal distribution.
Some graphs are more specialized. Quantile-quantile plots, also called QQ plots,
compare the cumulative distribution of a data set with that of some theoretical distribution
such as the normal distribution, a bell-shaped frequency distribution.
The points in a QQ plot fall along a straight line if the data set follows the theoretical
distribution. Figure 8 plots the rate of population change against the standardized value from a
normal distribution. It shows that the data set is not normally distributed. The main departure
occurs at the two highest values, which are also highlighted in previous graphs.
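The coordinates of a normal QQ plot can be sketched in Python as below. The function qq_pairs is hypothetical, and the plotting position (i - 0.5) / n is an assumption; other conventions exist, and a GIS package may use a different one.

```python
from statistics import NormalDist, mean, pstdev


def qq_pairs(values):
    """Pair each theoretical normal quantile with the standardized data value."""
    data = sorted(values)
    n = len(data)
    mu, sigma = mean(data), pstdev(data)  # sigma is assumed to be non-zero
    pairs = []
    for i, x in enumerate(data, start=1):
        p = (i - 0.5) / n                      # assumed plotting position
        z_theory = NormalDist().inv_cdf(p)     # standard normal quantile
        z_observed = (x - mu) / sigma          # standardized data value
        pairs.append((z_theory, z_observed))
    return pairs


# Hypothetical data values.
pairs = qq_pairs([1, 2, 3, 4, 5])
for z_theory, z_observed in pairs:
    print(round(z_theory, 3), round(z_observed, 3))
```

If the data followed a normal distribution, the plotted pairs would fall close to a straight line; points far off the line, such as unusually high values, mark the departure.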
Fig. 9 : A 3D plot showing annual precipitation at 105 weather stations in Idaho. A
north-to-south decreasing trend is apparent in the plot.
Some graphs are designed for spatial data. Figure 9, for example, shows a plot of spatial data
values by raising a bar at each point location so that the height of the bar is proportionate to its
value. This kind of plot allows the user to see the general trends among the data values in both
the x-dimension (east-west) and y-dimension (north-south).
Dynamic Graphics
When graphs are displayed in multiple and dynamically linked windows, they become dynamic
graphs. We can directly manipulate data points in dynamic graphs. For example, we can pose a
query in one window and get the response in other windows, all in the same visual field. By
viewing selected data points highlighted in multiple windows, we can hypothesize any patterns
or relationships that may exist in the data. This is why multiple linked views have been
described as the optimal framework for posing queries about data (Buja et al. 1996).
Fig. 10 : The scatterplot on the left is dynamically linked to the map on the right. The
“brushing” of two data points in the scatterplot highlights the corresponding states
(Washington and New Mexico) on the map.
A common method for manipulating dynamic graphs is brushing, which allows the user to
graphically select a subset of points from a scatterplot and view related data points in other
graphics (Becker and Cleveland 1987). Brushing can be extended to maps (Monmonier 1989).
Figure 10 illustrates a brushing example that links a scatterplot and a map. Many GIS packages,
including ArcGIS, have implemented brushing in the graphical user interface.
Q.2. Explain the term 'Attribute Data Query' with its various
aspects.
(A) Attribute Data Query
Attribute data query retrieves a data subset by working with attribute data.
The selected data subset can be simultaneously examined in the table, displayed in
charts, and linked to the highlighted features in the map.
The selected data subset can also be saved for further processing.
To use SQL to access a database, we must follow the structure (i.e., the syntax) of the query
language.
The basic syntax of SQL is:
select <attribute list>
from <relation>
where <condition>
The select keyword selects fields from the database, the from keyword selects tables
from the database, and the where keyword specifies the condition or criteria for the data
query.
We are considering the following Table A
Let us take an example. Suppose we have to find the sale date of the parcel coded
P101:
Select Parcel.Sale_Date
From Parcel
Where Parcel.Pin='P101'
The prefix Parcel in Parcel.Sale_Date and Parcel.Pin indicates that the fields are from
the Parcel table.
Suppose we have to find parcels that are greater than 2 acres and are zoned commercial:
Select Parcel.Pin
From Parcel
Where Parcel.Acres>2 And
Parcel.Zone_code=2
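The two queries can be tried out with Python's built-in sqlite3 module, as in the minimal sketch below. The rows in the Parcel table are hypothetical stand-ins for Table A, and zone code 2 is assumed to mean commercial, as in the example above.

```python
import sqlite3

# In-memory database holding a hypothetical Parcel table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Parcel (Pin TEXT, Sale_Date TEXT, Acres REAL, Zone_code INTEGER)"
)
conn.executemany(
    "INSERT INTO Parcel VALUES (?, ?, ?, ?)",
    [
        ("P101", "1998-01-10", 1.0, 1),  # hypothetical rows
        ("P102", "1968-04-25", 3.0, 2),
        ("P103", "1979-03-10", 2.5, 2),
        ("P104", "1995-07-31", 1.0, 2),
    ],
)

# Query 1: the sale date of the parcel coded P101.
sale_date = conn.execute(
    "SELECT Parcel.Sale_Date FROM Parcel WHERE Parcel.Pin = ?", ("P101",)
).fetchall()

# Query 2: parcels larger than 2 acres that are zoned commercial (code 2).
big_commercial = conn.execute(
    "SELECT Parcel.Pin FROM Parcel "
    "WHERE Parcel.Acres > 2 AND Parcel.Zone_code = 2 "
    "ORDER BY Parcel.Pin"
).fetchall()

print(sale_date)
print(big_commercial)
conn.close()
```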
Types of operations:
Attribute data query begins with a complete data set.
A basic query operation is to select a subset and divide the data set into two groups, one
containing selected records and the other unselected records.
The different types of operations allow greater flexibility in data query. For example,
instead of using an expression of Parcel.Acres > 2 AND Parcel.Zone_code = 2, we can first
use Parcel.Acres > 2 to select a subset and then use Parcel.Zone_code = 2 to select a
subset from the previously selected subset.
Examples of query operations
Example 1:
Select a data subset and then add more records to it.
Example 2:
[Switch selection]
Example 3:
Select a data subset and then select a smaller subset from it.
4 of 10 records selected
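The query operations in these examples behave like set operations on record identifiers. The Python sketch below illustrates this with ten hypothetical records; the field names cost and suitable and the helper select are assumptions made for the illustration only.

```python
# Ten hypothetical records; the field names are illustrative only.
records = [
    {"id": i, "cost": i, "suitable": "Y" if i % 2 == 0 else "N"}
    for i in range(1, 11)
]
all_ids = {r["id"] for r in records}


def select(predicate):
    """Return the ids of the records that satisfy the predicate."""
    return {r["id"] for r in records if predicate(r)}


# Create a new selection.
new_selection = select(lambda r: r["cost"] > 6)

# Add to the current selection (set union).
added = new_selection | select(lambda r: r["cost"] < 2)

# Switch the selection (set complement).
switched = all_ids - new_selection

# Select from the current selection (set intersection).
narrowed = new_selection & select(lambda r: r["suitable"] == "N")

print(len(new_selection), "of", len(records), "records selected")
print(sorted(added), sorted(switched), sorted(narrowed))
```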
Q.4. Explain the term 'Spatial Data Query' with all its aspects.
Spatial data query
Spatial data query refers to the process of retrieving a data subset from a layer by working
directly with features.
We may select features using a cursor, a graphic, or the spatial relationship between
features.
As the geometric interface to the database, spatial data query complements attribute
data query in data exploration.
Similar to attribute data query, the results of spatial data query can be simultaneously
inspected in the map, linked to the highlighted records in the table and displayed in
charts.
They can also be saved as a new data set for further processing.
Feature selection by cursor: the simplest spatial data query is to select a feature by
pointing at it or to select features by dragging a box around them.
o Intersect: selects features that intersect features for selection. Examples include
selecting land parcels that intersect a proposed road, and finding urban areas that
intersect an active fault line.
o Proximity: selects features that are within a specified distance of features for
selection. Examples include finding state parks within 10 miles of an interstate highway,
and finding pet shops within 1 mile of selected streets. If features to be selected and
features for selection share common boundaries and if the specified distance is 0, then
proximity becomes adjacency. Examples of spatial adjacency include selecting land
parcels that are adjacent to a flood zone, and finding vacant lots that are adjacent to a
new theme park.
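A proximity query such as "state parks within 10 miles of an interstate highway" can be sketched in plain Python as follows. The coordinates are hypothetical planar values assumed to be in miles, parks are reduced to points, and the highway is a simple polyline; a GIS package works with real geometries and map projections.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its ends.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def select_within_distance(points, line, distance):
    """Names of the points lying within the distance of a polyline."""
    selected = []
    for name, p in points.items():
        d = min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))
        if d <= distance:
            selected.append(name)
    return sorted(selected)


# Hypothetical highway polyline and park locations (planar miles).
highway = [(0, 0), (10, 0), (10, 10)]
parks = {"A": (5, 3), "B": (-20, 0), "C": (18, 5), "D": (10, 25)}

nearby = select_within_distance(parks, highway, 10)
print(nearby)
```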
1. Locate all freeway exits in the study area, and draw a circle around each exit with a 1
mile radius. Select gas stations within the circles through spatial data query. Then use
attribute data query to find gas stations that have annual revenues exceeding $2 million.
2. Locate all gas stations in the study area, and select those stations with annual revenues
exceeding $2 million through attribute data query. Next, use spatial data query to
narrow the selection of gas stations to those within 1 mile of a freeway exit.
The first option queries spatial data and then attribute data. The process is reversed with
the second option. Assuming that there are many more gas stations than freeway exits, the first
option may be a better option, especially if the gas station map must be linked to other
attribute tables for getting the revenue data.
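The two options can be compared with a small Python sketch. The exits, stations, and revenue figures are hypothetical, coordinates are assumed to be planar miles, and revenue is expressed in millions of dollars. Both orders give the same final selection; they differ in how many stations each step has to process.

```python
import math

# Hypothetical freeway exits and gas stations (planar miles, $ millions).
exits = [(0.0, 0.0), (10.0, 0.0)]
stations = [
    {"name": "S1", "xy": (0.5, 0.5), "revenue": 2.5},
    {"name": "S2", "xy": (0.2, 0.1), "revenue": 1.2},
    {"name": "S3", "xy": (5.0, 5.0), "revenue": 3.0},
    {"name": "S4", "xy": (10.0, 0.9), "revenue": 2.1},
]


def near_exit(station, radius=1.0):
    """Spatial condition: within the radius of any freeway exit."""
    return any(math.dist(station["xy"], e) <= radius for e in exits)


def high_revenue(station, threshold=2.0):
    """Attribute condition: annual revenue above the threshold."""
    return station["revenue"] > threshold


# Option 1: spatial data query first, then attribute data query.
spatial_first = [s for s in stations if near_exit(s)]
option1 = [s["name"] for s in spatial_first if high_revenue(s)]

# Option 2: attribute data query first, then spatial data query.
attribute_first = [s for s in stations if high_revenue(s)]
option2 = [s["name"] for s in attribute_first if near_exit(s)]

print(option1, option2)
```

With option 1, only the stations that pass the spatial query need their revenue looked up, which matters when revenue sits in a separate table that must be linked first.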
The combination of spatial and attribute data queries opens wide the possibilities of data
exploration. Some GIS users might even consider this kind of data exploration to be data
analysis because that is what they need to do to solve most of their routine tasks.
Q.5. Explain the term 'Raster Data Query' with all its aspects.
Raster Data Query
Although the concept and even some methods for data query are basically the same for
both raster data and vector data, there are enough practical differences to warrant a separate
section on raster data query.
Data Classification
Data classification can be a tool for data exploration, especially if the classification is
based on descriptive statistics.
Suppose we want to explore the rate of unemployment by state in the United States.
To get a preliminary look at the data, we may place rates of unemployment into classes
of above and below the national average (Figure J.a).
Although generalized, the map divides the country into contiguous regions, which may
suggest some regional factors for explaining unemployment.
To isolate those states that are well above or below the national average, we can classify
rates of unemployment by using the mean and standard deviation method (Figure J.b).
We can now focus our attention on states that are, for example, more than one
standard deviation above the mean.
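The mean and standard deviation method can be sketched in Python as below. The function classify_by_std, the four class labels, and the rates are hypothetical; real maps may use more classes or different break points.

```python
from statistics import mean, pstdev


def classify_by_std(rates):
    """Label each value by its distance from the mean in standard deviations."""
    values = list(rates.values())
    mu, sigma = mean(values), pstdev(values)  # sigma is assumed to be non-zero
    classes = {}
    for name, value in rates.items():
        z = (value - mu) / sigma
        if z > 1:
            classes[name] = "more than 1 SD above"
        elif z >= 0:
            classes[name] = "within 1 SD above"
        elif z >= -1:
            classes[name] = "within 1 SD below"
        else:
            classes[name] = "more than 1 SD below"
    return classes


# Hypothetical unemployment rates for eight areas.
rates = {"A": 2.0, "B": 4.0, "C": 4.0, "D": 4.0,
         "E": 5.0, "F": 5.0, "G": 7.0, "H": 9.0}
classes = classify_by_std(rates)

# Focus on the areas that are more than one standard deviation above the mean.
high = [name for name, label in classes.items() if label == "more than 1 SD above"]
print(high)
```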
Classified maps can be linked with tables, graphs, and statistics for more data exploration
activities. For example, we can link the maps in Figure J with a table showing % change in
median household income and find out whether states that have lower unemployment
rates tend to have higher rates of income growth, and vice versa.
Spatial aggregation
Spatial aggregation is functionally similar to data classification except that it groups data
spatially.
Figure K shows % population change in the United States by state and by region.
Used by the U.S. Census Bureau for data collection, regions are spatial aggregates of
states. As shown in Figure K.b, a map by region gives a more general view of population
growth in the country than a map by state.
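Spatial aggregation of percent population change can be sketched in Python as follows. The state names, regions, and population counts are hypothetical. The sketch sums the populations of each region before computing the percent change, which is an assumption about how the regional figures are derived; averaging the state percentages would generally give a different answer.

```python
from collections import defaultdict

# (state, region, population at start, population at end); hypothetical numbers.
states = [
    ("S1", "North", 100, 110),
    ("S2", "North", 300, 360),
    ("S3", "South", 200, 190),
    ("S4", "South", 50, 80),
]


def pct_change(start, end):
    """Percent change from the start value to the end value."""
    return 100.0 * (end - start) / start


# Percent change by state.
by_state = {name: pct_change(p0, p1) for name, _, p0, p1 in states}

# Aggregate populations by region, then compute the percent change.
totals = defaultdict(lambda: [0, 0])
for _, region, p0, p1 in states:
    totals[region][0] += p0
    totals[region][1] += p1
by_region = {region: pct_change(p0, p1) for region, (p0, p1) in totals.items()}

print(by_state)
print(by_region)
```

The regional values smooth out the extremes seen at the state level, which is the more general view described above.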
Map Comparison
Map comparison can help a GIS user sort out the relationship between different maps.
For example, the display of wildlife locations on a vegetation layer may reveal the
association between the wildlife species and the distribution of vegetation covers.
If the maps to be compared consist of only point or line features, they can be coded in
different colors and superimposed on one another in a single view. But this process
becomes difficult if they include polygon features or raster data. One option is to use
transparency as a visual variable.
A semitransparent layer allows another layer to show through. For example, to
compare two raster layers, we can display one layer in a color scheme and the other in
semitransparent shades of gray.
The gray shades simply darken the color symbols and do not produce confusing color
mixtures. Another example is to use transparency for displaying temporal changes such
as land cover change between 1990 and 2000. Because one layer is semitransparent, we
can follow the areal extent of a land cover type from both years. But it is difficult to
apply transparency to more than two layers.
There are three other options for comparing polygon or raster layers. The first option is
to place all polygon and raster layers, along with other point and line layers, onto the
screen but to turn on and off polygon and raster layers so that only one of them is
viewed at a time. Used by many websites for interactive mapping, this option is
designed for casual users.