Geospatial Data I
This section provides an overview of Definition and Types of Geospatial Data;
Coordinate Systems and Projections; Common Data Formats; Vector vs. Raster Models;
and Time in Geospatial Data.
Course Title: Environmental Data Analytics.
Degree Program: Masters in data science.
Instructor: Mohammad Mahdi Rajabi
Geospatial Reference
Note that for data analytics and machine learning, geospatial references typically need
to be represented using coordinates (as quantitative measures). Therefore, geocoding
is often required to convert addresses or other geographical identifiers into
coordinates.
Geo-spatial Coordinates
In the Latitude and Longitude coordinate system, any point on Earth is described using
two numbers: latitude and longitude. Latitude is related to the polar angle of a
spherical coordinate system (it equals 90° minus the polar angle) and measures how far
north or south a point is from the Equator (0° latitude), with values ranging from -90°
at the South Pole to +90° at the North Pole. Longitude corresponds to the azimuthal
angle and measures how far east or west a point is from the Prime Meridian (0°
longitude), with values ranging from -180° to +180°.
These values can also be expressed with directional indicators instead of
positive/negative signs: N or S for latitude, and E or W for longitude.
Unlike a full spherical coordinate system, the Latitude and Longitude system does not
include the radial distance, because it specifically describes locations on the Earth's
surface, assuming a constant radius (i.e., the distance from the Earth's center to the
surface). Together, latitude and longitude uniquely identify any location on Earth.
Here's an example of how latitude and longitude are used to describe a location. Let's
take the Eiffel Tower in Paris, France:
• Latitude: 48.8584° N
• Longitude: 2.2945° E
In standard databases, these coordinates are often stored in a more compact
numerical form, known as decimal degrees, in which the sign replaces the directional
indicator (positive for north and east, negative for south and west):
• Latitude: 48.8584
• Longitude: 2.2945
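The conversion from directional indicators to signed decimal degrees can be sketched as
follows. This is a minimal illustration; the function name to_decimal_degrees is
hypothetical and not part of any library.

```python
def to_decimal_degrees(value, hemisphere):
    """Convert an unsigned angle plus an N/S/E/W indicator to signed decimal degrees."""
    hemisphere = hemisphere.strip().upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    # South latitudes and west longitudes are negative by convention.
    sign = -1.0 if hemisphere in ("S", "W") else 1.0
    return sign * abs(value)


# Eiffel Tower coordinates from the example above.
eiffel_lat = to_decimal_degrees(48.8584, "N")
eiffel_lon = to_decimal_degrees(2.2945, "E")
```

A point at 122.6750° W would be stored as -122.6750 under the same convention.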
UTM Zones: The UTM system divides the Earth into 60 zones, each 6° of longitude wide,
covering latitudes from 80°S to 84°N. Each zone has its own grid. A location's position
within a zone is described by two coordinates: easting (the horizontal component) and
northing (the vertical component). Easting measures the distance eastward relative to
the central meridian of the zone, with a false easting of 500,000 meters added to avoid
negative numbers. Northing measures the distance from the equator in the northern
hemisphere, or from a false origin 10,000,000 meters south of the equator in the
southern hemisphere.
In the UTM system, a location is described by three components:
• The UTM zone number, which identifies the longitudinal slice of the Earth
where the location lies. A hemisphere (or latitude band) designator is needed
alongside it, because the same easting and northing values occur in both
hemispheres.
• The easting, which is measured relative to the central meridian of the zone (in
meters, including the false easting).
• The northing, which is the distance from the equator (or from the false origin in
the southern hemisphere), also in meters.
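As a small illustration of the zone component, the sketch below derives the zone number
and its central meridian from a longitude, assuming the regular 6°-wide zones. It
ignores the special zone boundaries around Norway and Svalbard, and the function names
are hypothetical. Computing the easting and northing themselves requires the full
Transverse Mercator projection formulas, which are normally left to a projection library.

```python
import math


def utm_zone_number(lon_deg):
    """Zone number (1-60) for a longitude in degrees, assuming regular 6-degree zones."""
    zone = int(math.floor((lon_deg + 180.0) / 6.0)) + 1
    # Longitude +180 would give 61; clamp it into the last zone.
    return min(max(zone, 1), 60)


def central_meridian(zone):
    """Longitude (degrees) of the central meridian of a regular UTM zone."""
    return -183.0 + 6.0 * zone


def hemisphere(lat_deg):
    """Hemisphere designator that accompanies the zone number."""
    return "N" if lat_deg >= 0 else "S"


# Eiffel Tower (48.8584, 2.2945) from the earlier example.
zone = utm_zone_number(2.2945)
label = f"{zone}{hemisphere(48.8584)}"
```

For the Eiffel Tower this gives zone 31 in the northern hemisphere, whose central
meridian lies at 3°E.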
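Accuracy and Usage: UTM is especially useful in regional mapping and engineering
projects because it preserves local shapes and angles and keeps the distortion of
distance and area measurements small within individual zones. The distortion grows
toward the zone edges and when a study area spans several zones.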
WGS84 in UTM: As with latitude and longitude, the WGS84 datum is often used
with the UTM system to ensure accurate geographic positioning by defining the Earth's
shape and reference points. Together, UTM and WGS84 provide a precise and
standardized way of locating any position on Earth within a local zone using metric
units.
Common Geo-spatial Data Formats
Apart from point-wise geospatial data, which are commonly used to represent sensor
or measurement locations, well locations, GPS points, and more, there are several other
types of geospatial data tailored to specific use cases:
Line (Polyline) Data
Polylines represent linear features or paths by connecting multiple points and are
used to model things like river networks, pollutant dispersion paths, or migration routes
in environmental studies. Each line segment in a polyline is defined by two endpoints,
each given as geographic coordinates (e.g., latitude/longitude pairs or other coordinate
systems). A polyline is typically represented as a sequence of points, with each point
having its own
set of coordinates, and the lines connect these points in the defined order, creating the
overall path.
Example: A river's flow path connecting three monitoring stations tracking water
quality:
• Station 1: (Lat: 45.5128, Long: -122.6587) — Upstream water quality sensor.
• Station 2: (Lat: 45.5017, Long: -122.6750) — Midstream water quality sensor.
• Station 3: (Lat: 45.4815, Long: -122.6548) — Downstream water quality sensor.
This polyline represents the river's flow path between the three monitoring stations,
helping environmental analysts track water quality changes along the river.
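A polyline can be held in code as an ordered list of coordinate pairs. The sketch below
stores the three stations from the example and estimates the path length by summing
great-circle (haversine) distances between consecutive points. It assumes a spherical
Earth with a mean radius of 6371 km, so the result is an approximation, and the function
names are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def polyline_length_km(points):
    """Sum the segment lengths between consecutive (lat, lon) points."""
    return sum(
        haversine_km(lat1, lon1, lat2, lon2)
        for (lat1, lon1), (lat2, lon2) in zip(points, points[1:])
    )


# Monitoring stations from the example, ordered upstream to downstream.
river = [
    (45.5128, -122.6587),
    (45.5017, -122.6750),
    (45.4815, -122.6548),
]
river_length_km = polyline_length_km(river)
```

The order of the points matters: reordering them changes both the path and its length.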
Polygon Data
Polygons represent areas or boundaries by connecting multiple points to form a closed
shape, commonly used to model regions such as lakes, forests, land parcels, or
protected areas in environmental studies. Each polygon is defined by a series of
geographic coordinates (e.g., latitude/longitude pairs) that represent its boundary. The
first and last points must be the same to form a closed loop, enclosing an area.
Example: A protected forest area defined by four boundary points:
• Point 1: (Lat: 42.0000, Long: -123.0000)
• Point 2: (Lat: 42.5000, Long: -123.0000)
• Point 3: (Lat: 42.5000, Long: -122.5000)
• Point 4: (Lat: 42.0000, Long: -122.5000)
• Back to Point 1 to close the polygon.
This polygon represents a forest area enclosed within the specified boundary points,
which can be used to analyze land cover, biodiversity, or environmental protection
efforts within the region.
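The closing rule and a typical polygon query can be sketched as follows. The code closes
the ring if needed and applies a ray-casting point-in-polygon test, treating latitude
and longitude as planar coordinates. That is a reasonable simplification for a small
area like this one but not for polygons spanning large distances or crossing the ±180°
meridian. The function names are illustrative.

```python
def close_ring(points):
    """Return the boundary with the first point repeated at the end, if it is not already."""
    ring = list(points)
    if ring[0] != ring[-1]:
        ring.append(ring[0])
    return ring


def point_in_polygon(lat, lon, ring):
    """Ray-casting test: count boundary crossings of a ray cast toward increasing longitude."""
    inside = False
    for (lat1, lon1), (lat2, lon2) in zip(ring, ring[1:]):
        # Only edges that straddle the query latitude can be crossed by the ray.
        if (lat1 > lat) != (lat2 > lat):
            lon_cross = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < lon_cross:
                inside = not inside
    return inside


# Boundary points of the protected forest area from the example.
forest = close_ring([
    (42.0000, -123.0000),
    (42.5000, -123.0000),
    (42.5000, -122.5000),
    (42.0000, -122.5000),
])
```

A sensor at (42.25, -122.75) falls inside this boundary, while one at (43.00, -122.75)
does not.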
These points form a triangle that represents a small portion of the landscape's surface.
Multiple triangles are combined to form a TIN that models the entire terrain. This TIN
can be used for environmental applications such as watershed delineation or
calculating slope gradients for erosion analysis.
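The slope of a single TIN triangle can be computed from the normal vector of its plane.
The sketch below assumes the vertices are given in a projected coordinate system with x,
y, and z all in meters. The vertex values are hypothetical and chosen only so that the
result is easy to check by hand.

```python
import math


def triangle_slope_deg(p1, p2, p3):
    """Slope (degrees from horizontal) of the plane through three (x, y, z) vertices."""
    ux, uy, uz = (p2[0] - p1[0], p2[1] - p1[1], p2[2] - p1[2])
    vx, vy, vz = (p3[0] - p1[0], p3[1] - p1[1], p3[2] - p1[2])
    # Normal vector of the triangle: cross product of two edge vectors.
    nx = uy * vz - uz * vy
    ny = uz * vx - ux * vz
    nz = ux * vy - uy * vx
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    if length == 0:
        raise ValueError("degenerate triangle: vertices are collinear")
    # Angle between the normal and the vertical axis equals the surface slope.
    return math.degrees(math.acos(abs(nz) / length))


# Hypothetical facet rising 10 m over a 10 m horizontal run in the y direction.
slope = triangle_slope_deg((0.0, 0.0, 100.0), (10.0, 0.0, 100.0), (0.0, 10.0, 110.0))
```

For these vertices the surface rises one meter per meter in y, so the slope is 45°.
Applying the same function to every triangle of a TIN yields a slope map of the terrain.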
productivity. Raster data is commonly used for satellite imagery, climate models, and
surface analysis.
Raster data is geo-referenced by providing the coordinates of a reference point in the
raster image, commonly the upper-left corner of the raster grid. This, along with
information about the pixel size (resolution) and the coordinate system, allows the
raster to be accurately placed within a geographic space.
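The geo-referencing described above amounts to a simple linear mapping between pixel
indices and map coordinates. The sketch below assumes a north-up raster with square
pixels whose reference point is the upper-left corner. The origin and pixel size are
hypothetical values in a projected, meter-based coordinate system.

```python
def pixel_to_coord(row, col, origin_x, origin_y, pixel_size):
    """Map coordinates of the center of pixel (row, col) in a north-up raster."""
    x = origin_x + (col + 0.5) * pixel_size
    # Rows increase downward, so y decreases as the row index grows.
    y = origin_y - (row + 0.5) * pixel_size
    return x, y


def coord_to_pixel(x, y, origin_x, origin_y, pixel_size):
    """Row and column of the pixel containing map coordinate (x, y)."""
    col = int((x - origin_x) // pixel_size)
    row = int((origin_y - y) // pixel_size)
    return row, col


# Hypothetical raster: upper-left corner at (500000, 4650000), 30 m pixels.
ORIGIN_X, ORIGIN_Y, PIXEL = 500000.0, 4650000.0, 30.0
center = pixel_to_coord(2, 3, ORIGIN_X, ORIGIN_Y, PIXEL)
```

Here the pixel in row 2, column 3 is centered at (500105.0, 4649925.0), and converting
that coordinate back returns the same row and column.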
Raster data differs from vector data in that it provides a continuous representation of
spatial information, while vector data represents discrete features like points, lines,
and polygons.
Raster Data.
Trajectory Data
Trajectory data differs from both typical vector and raster data as it represents the
movement of objects over time, capturing both spatial and temporal dimensions.
While vector data focuses on static, discrete features like points, lines, or polygons,
trajectory data tracks the dynamic movement of objects, such as animals, vehicles, or
weather patterns. Each trajectory is composed of a sequence of time-stamped
geographic points, representing the object's changing position as it moves through
space over time.
Example: In environmental data analytics, trajectory data can be used to track the
migration patterns of wildlife. The movement of a tagged bird might be recorded at
various intervals:
• Time 1: (Lat: 45.5017, Long: -122.6750) — 8:00 AM
• Time 2: (Lat: 45.6098, Long: -122.7565) — 10:00 AM
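Because each point carries a timestamp, quantities such as average speed follow directly
from consecutive fixes. The sketch below uses the two fixes from the example. The
calendar date is an assumption (the example gives only clock times), and distance is
computed on a spherical Earth with a 6371 km radius.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Each fix is (timestamp, latitude, longitude); the date 2024-05-01 is hypothetical.
trajectory = [
    (datetime(2024, 5, 1, 8, 0), 45.5017, -122.6750),
    (datetime(2024, 5, 1, 10, 0), 45.6098, -122.7565),
]


def segment_speeds_kmh(fixes):
    """Average speed (km/h) between each pair of consecutive time-stamped fixes."""
    speeds = []
    for (t1, lat1, lon1), (t2, lat2, lon2) in zip(fixes, fixes[1:]):
        hours = (t2 - t1).total_seconds() / 3600.0
        speeds.append(haversine_km(lat1, lon1, lat2, lon2) / hours)
    return speeds


speeds = segment_speeds_kmh(trajectory)
```

The result is a straight-line average between fixes. The bird's actual path between
recordings is unknown, so true speeds may be higher.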
Apart from trajectory data, which is inherently time-stamped, both raster and vector
data can also be time-stamped to represent dynamic, time-varying features.
1. Time-Stamped Raster Data:
• Dynamic raster data represents continuous variables (e.g., temperature,
precipitation, or air quality) that change over time.
• Each raster "snapshot" corresponds to a specific moment or time period, and by
time-stamping each snapshot, you can track how the variable changes over time.
• Example: Satellite imagery showing vegetation cover could be time-stamped
to observe seasonal changes in forest cover over a year.
2. Time-Stamped Vector Data:
• Dynamic vector data involves time-stamping features like points, lines, or
polygons to represent changes in their position or attributes over time.
• This is commonly used in scenarios where spatial features evolve, such as
shifting boundaries, moving objects, or fluctuating attributes.
• Example: Time-stamped polygons could represent the changing boundaries of
a floodplain over the course of a storm.
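A time-stamped raster series can be sketched as a list of (date, grid) pairs. The
example below summarizes each snapshot by its mean value to form a time series. The
dates and the 2×2 grids of vegetation-index values are hypothetical and serve only to
illustrate the structure.

```python
from datetime import date

# Each snapshot pairs a date with a small grid of hypothetical vegetation-index values.
snapshots = [
    (date(2024, 7, 15), [[0.71, 0.78], [0.65, 0.80]]),
    (date(2024, 1, 15), [[0.21, 0.25], [0.19, 0.30]]),
    (date(2024, 4, 15), [[0.45, 0.52], [0.40, 0.58]]),
]


def grid_mean(grid):
    """Mean of all cell values in a 2D grid stored as a list of rows."""
    values = [v for row in grid for v in row]
    return sum(values) / len(values)


# Sort by timestamp so the series is in chronological order, then summarize each grid.
series = [
    (day, grid_mean(grid))
    for day, grid in sorted(snapshots, key=lambda s: s[0])
]
```

The same pattern applies to time-stamped vector data: pairing each geometry with its
timestamp and sorting by time gives the sequence of changes.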
There are many file formats, some specifically developed for geospatial data and
others designed for storing general geometric data that can also be applied to
geospatial contexts. Here, we review some of the common formats, though this is not
an exhaustive list. It's important to note that none of these formats were developed
exclusively for environmental data analytics, but they are widely used for geospatial
data and are often applied in environmental analyses as well.
Shapefile (.shp extension)
A widely used format for geospatial vector data, developed by Esri. A Shapefile can
store basic vector data, including points, polylines (lines), and polygons, along with
associated attribute information. Shapefiles do not support TINs or raster data. They
are typically used for static data or static time-stamped snapshots, as they do not
natively support dynamic or time-series data. A shapefile requires additional files such
as:
• .shx: Index file to link geometry and attribute data.
• .dbf: Attribute data file containing tabular information related to the features.
A shapefile is not human-readable. The main .shp file and its associated files (e.g.,
.shx and .dbf) are binary formats, which means they are meant to be read and
processed by GIS software, not by humans directly.
GeoJSON (.geojson extension)
A widely used format for encoding a variety of geospatial data structures in JSON
(JavaScript Object Notation). GeoJSON can store basic vector data, including points,
polylines (LineString), and polygons, along with associated attribute information.
Although the GeoJSON specification defines no dedicated time element, each feature's
free-form properties can hold timestamps (for example, ISO 8601 strings), which makes
the format convenient for tracking changes over time or representing movement.
GeoJSON files are human-readable and lightweight, making them ideal for web
applications and API-based data sharing. They use a coordinate system based on
WGS84 (World Geodetic System 1984).
GeoJSON files are self-contained and do not require additional index or attribute files
like shapefiles.
GeoJSON is not suitable for representing raster or TIN data.
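A minimal GeoJSON document can be built with Python's standard json module, as sketched
below for the Eiffel Tower point used earlier. Note that GeoJSON lists coordinates in
longitude, latitude order. The property names and the timestamp value are hypothetical.

```python
import json

feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON order is [longitude, latitude].
                "coordinates": [2.2945, 48.8584],
            },
            "properties": {
                "name": "Eiffel Tower",
                # Hypothetical timestamp stored as an ordinary property.
                "observed_at": "2024-05-01T08:00:00Z",
            },
        }
    ],
}

# Serialize to human-readable text, then parse it back to confirm the round trip.
geojson_text = json.dumps(feature_collection, indent=2)
parsed = json.loads(geojson_text)
lon, lat = parsed["features"][0]["geometry"]["coordinates"]
```

Swapping latitude and longitude is a common mistake when moving data between GeoJSON and
tools that expect latitude first.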
KML (Keyhole Markup Language, .kml extension)
A widely used XML-based format for representing geospatial data, originally
developed for Google Earth. KML can store basic vector data, including points,
polylines, and polygons. KML is also capable of storing 3D geometries and can
include additional styling information (such as colors and icons for points) for better
visualization.
KML supports time-stamped and dynamic data, making it suitable for visualizing
moving objects or changes over time (e.g., animated paths or temporal changes in
geographic features). KML can also handle placemarks, network links (for fetching
data dynamically), and overlays (like images or raster data, though the raster data itself
is not stored directly in KML).
Limitations:
1. Raster Data: KML does not directly store raster data, but it can reference
external images or overlays that are draped over geographic features.
2. TIN Data: KML does not natively support TINs or topological data structures like
triangulated surfaces, making it unsuitable for detailed 3D surface modeling as
required for TIN data.
KML is widely used for web-based applications and virtual globes like Google Earth,
where visual representation and ease of sharing geospatial data are priorities.
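A single time-stamped KML placemark can be generated with Python's standard XML library,
as in the sketch below. It uses the KML 2.2 namespace and the Placemark, TimeStamp, and
Point elements. The helper name and the timestamp value are hypothetical. As in GeoJSON,
KML coordinates are written longitude first.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
ET.register_namespace("", KML_NS)  # write KML as the default namespace


def placemark_kml(name, lat, lon, when=None):
    """Return a KML document (as text) containing one point placemark."""
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    pm = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
    ET.SubElement(pm, f"{{{KML_NS}}}name").text = name
    if when is not None:
        # Optional time stamp for the placemark.
        ts = ET.SubElement(pm, f"{{{KML_NS}}}TimeStamp")
        ET.SubElement(ts, f"{{{KML_NS}}}when").text = when
    point = ET.SubElement(pm, f"{{{KML_NS}}}Point")
    # KML coordinate order is longitude,latitude.
    ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


kml_text = placemark_kml("Eiffel Tower", 48.8584, 2.2945, when="2024-05-01T08:00:00Z")
```

The resulting text can be saved with a .kml extension and opened in a virtual globe.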
GeoTIFF (.tif or .tiff extension)
GeoTIFF is a widely used format for storing raster data that includes geographic or
spatial information. It is an extension of the standard TIFF format (Tagged Image File
Format), commonly used for images, with additional metadata that allows the raster
data to be geo-referenced. GeoTIFF can store data like satellite imagery, digital
elevation models (DEMs), or any grid-based data, with each pixel corresponding to a
specific geographic location.
Key Features:
• Geo-referencing: GeoTIFF includes metadata such as the coordinate reference
system (CRS), origin (e.g., upper-left corner coordinates), pixel size (resolution),
and transformation parameters, which allow the raster image to be accurately
placed on a map.
• Continuous or Categorical Data: GeoTIFF can represent both continuous
data (e.g., elevation, temperature, or rainfall) and categorical data (e.g., land
cover types, soil classifications).
• Multiple Bands: GeoTIFF supports storing multiple bands of data within a
single file, which is useful for applications like remote sensing (e.g., RGB bands,
infrared bands).
Limitations:
1. Vector Data: GeoTIFF is designed for raster data and cannot store vector data
(points, lines, polygons).
2. TIN Data: GeoTIFF cannot represent TIN data, as it is a grid-based format and
does not handle topological relationships between points, which are necessary
for TINs.
OBJ (Wavefront OBJ, .obj extension)
OBJ is a widely used format for representing 3D geometric data, including meshes,
surfaces, and triangular irregular networks (TINs). The OBJ format is simple and
human-readable, using plain text to define the vertices, edges, and faces that make up
a 3D model. It is widely supported in both 3D modeling software and programming