0% found this document useful (0 votes)
21 views18 pages

1 What Is Geospatial Data? - Geospatial Data Science With Julia

This document introduces geospatial data, defining it as a combination of attribute tables and discretized geospatial domains. It presents a universal representation suitable for various geospatial analyses, emphasizing the importance of understanding tables and domains in data science. The document also discusses different representations of data in Julia, including built-in table structures and domain representations for efficient geospatial data processing.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
21 views18 pages

1 What Is Geospatial Data? - Geospatial Data Science With Julia

This document introduces geospatial data, defining it as a combination of attribute tables and discretized geospatial domains. It presents a universal representation suitable for various geospatial analyses, emphasizing the importance of understanding tables and domains in data science. The document also discusses different representations of data in Julia, including built-in table structures and domain representations for efficient geospatial data processing.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 18

 Part I: Foundations > 1 What is geospatial data?

1 What is geospatial data?


Welcome to Part I of the book. Before we can start our journey in geospatial data science, we
need to introduce important concepts, which will be the foundations of Part II, Part III and Part
IV.

In this chapter, we define geospatial data and introduce a universal representation for it which
is ideal for geostatistical analysis. Unlike other representations in the literature (e.g., “raster”,
“vector” data), the proposed representation is suitable for encoding geospatial data over 3D
unstructured meshes, 2D images embedded in 3D space, and other types of complex geospatial
domains.

1.1 Definition

Definition
(Discrete) geospatial data is the combination of a table of attributes (or features) with a discretization
of a geospatial domain. Each row (or measurement) in the table corresponds to an element (or
geometry) in the discretization of the geospatial domain.

The definition depends on two other definitions that we clarify next.

1.1.1 Table
In data science the most natural data structure for working with data is the table. Generally
speaking, a table is any object that can be structured into rows containing measurements and
columns representing variables. For example, Table 1.1 has 5 measurements of 4 variables:

Table 1.1: Example of table

NAME AGE HEIGHT GENDER

John 34 1.78m male

Mary 12 1.56m female

Paul 23 1.70m male

Anne 39 1.80m female

Kate 28 1.72m female


:
In Julia, the concept of table is formalized in Tables.jl by Quinn et al. (2023). The definition is
independent of the machine representation, and various representations can co-exist in the
language.

1.1.2 Domain
The second definition that we need is that of a geospatial domain. In geosciences, questions are
often formulated within a physical region of interest. This physical region can cover a small area
of the surface of the Earth, the entire Earth surface, or any region of finite measure that can be
discretized into smaller geometries (a.k.a. elements):
:
(a) Coast line in Islay Province, Peru. View on Google Maps

(b) Synthetic carbonate reservoir model by Correia et al. (2015). See UNISIM-II for more details

Figure 1.1: Example of geospatial domains

Figure 1.1 illustrates two very different examples of geospatial domains. The domain in
Figure 1.1 (a) is widely studied in GIS books. It is a 2D domain that contemplates a small area
:
near Islay Province, Peru. The domain in Figure 1.1 (b) on the other hand is not considered in
traditional GIS literature. It is a 3D domain that has been discretized into hexahedron
geometries.

The concept of geospatial domain is formalized in Meshes.jl by Hoffimann et al. (2021).

1.1.3 Remarks
Images like the one depicted in Figure 1.1 (a) are often implemented in terms of the array
data structure. GIS books call it “raster data”, but we will avoid this term in our framework in
order to obtain a more general set of tools.

According to our definition of geospatial data, “raster data” is simply a table with colors as
variables (e.g., RGB values) combined with a grid of quadrangle geometries. We illustrate
this concept in Figure 1.2 by zooming in a satellite image of the Lena delta:

(a) “Raster data” of Lena delta


:
(b) Zoom reveals quadrangle geometries

Figure 1.2: Quadrangle geometries in “raster data”

There are no constraints on the geometries used in the discretization of the geospatial
domain. In Figure 1.3, Brazil is discretized into complex polygonal geometries that
represent country states:

Figure 1.3: Brazil’s states represented with complex polygonal geometries. View on Google Maps.

GIS books call it “vector data” because the geometries are stored as vectors of coordinates
in memory. We will also avoid this term in our framework given that it only highlights an
implementation detail.

Before we start discussing machine representation with actual Julia code, let’s make a final
(pedantic) distinction between the words geospatial and spatial. These words mean different
things in different communities:

In geosciences, an object is geospatial if it lives in physical space.


:
In statistics, a model is spatial if it exploits the vicinity of samples in the sample space.

Given that geospatial data science deals with both concepts, we must use these words carefully.

Note
In Geostatistical Learning, models can exploit both spaces to improve prediction performance, but that
is out of the scope for this book.

1.2 Representation

Based on the definition of geospatial data given in the previous section, we are now ready to
proceed and discuss an efficient machine representation for it with actual Julia code.

1.2.1 Table
The Julia language comes with two built-in table representations:

1. Named tuple of vectors


2. Vector of named tuples

The first representation focuses on the columns of the table:

coltable = (
NAME=["John", "Mary", "Paul", "Anne", "Kate"],
AGE=[34, 12, 23, 39, 28],
HEIGHT=[1.78, 1.56, 1.70, 1.80, 1.72],
GENDER=["male", "female", "male", "female", "female"]
)

(NAME = ["John", "Mary", "Paul", "Anne", "Kate"], AGE = [34, 12, 23, 39, 28],
HEIGHT = [1.78, 1.56, 1.7, 1.8, 1.72], GENDER = ["male", "female", "male",
"female", "female"])

Given that data science is often performed with entire columns, this column-major
representation of a table is very convenient. The second representation focuses on the rows of
the table:

rowtable = [
(NAME="John", AGE=34, HEIGHT=1.78, GENDER="male"),
(NAME="Mary", AGE=12, HEIGHT=1.56, GENDER="female"),
(NAME="Paul", AGE=23, HEIGHT=1.70, GENDER="male"),
(NAME="Anne", AGE=39, HEIGHT=1.80, GENDER="female"),
(NAME="Kate", AGE=28, HEIGHT=1.72, GENDER="female")
]
:
5-element Vector{@NamedTuple{NAME::String, AGE::Int64, HEIGHT::Float64,
GENDER::String}}:
(NAME = "John", AGE = 34, HEIGHT = 1.78, GENDER = "male")
(NAME = "Mary", AGE = 12, HEIGHT = 1.56, GENDER = "female")
(NAME = "Paul", AGE = 23, HEIGHT = 1.7, GENDER = "male")
(NAME = "Anne", AGE = 39, HEIGHT = 1.8, GENDER = "female")
(NAME = "Kate", AGE = 28, HEIGHT = 1.72, GENDER = "female")

The row-major representation can be useful to process data that is potentially larger than the
available computer memory, or infinite streams of data.

Although these two representations come built-in with Julia, they lack basic functionality for
data science. The most widely used table representation for data science in Julia is available in
DataFrames.jl by Bouchet-Valat and Kamiński (2023).

using DataFrames

df = DataFrame(
NAME=["John", "Mary", "Paul", "Anne", "Kate"],
AGE=[34, 12, 23, 39, 28],
HEIGHT=[1.78, 1.56, 1.70, 1.80, 1.72],
GENDER=["male", "female", "male", "female", "female"]
)

5×4 DataFrame

Row NAME AGE HEIGHT GENDER

String Int64 Float64 String

1 John 34 1.78 male

2 Mary 12 1.56 female

3 Paul 23 1.7 male

4 Anne 39 1.8 female

5 Kate 28 1.72 female

This representation provides additional syntax for accessing rows and columns of the table:

df[1,:]

DataFrameRow (4 columns)

Row NAME AGE HEIGHT GENDER

String Int64 Float64 String

1 John 34 1.78 male


:
df[:,"NAME"]

5-element Vector{String}:
"John"
"Mary"
"Paul"
"Anne"
"Kate"

df[1:3,["NAME","AGE"]]

3×2 DataFrame

Row NAME AGE

String Int64

1 John 34

2 Mary 12

3 Paul 23

df.HEIGHT

5-element Vector{Float64}:
1.78
1.56
1.7
1.8
1.72

df."HEIGHT"

5-element Vector{Float64}:
1.78
1.56
1.7
1.8
1.72

Note
Unlike other languages, Julia makes a distinction between the the symbol :HEIGHT and the string
"HEIGHT" . The DataFrame representation supports both types for column names, but that is not
:
always the case with other table representations.

Other popular table representations in Julia are associated with specific file formats:

CSV.File from CSV.jl (Quinn et al. 2023)


XLSX.Worksheet from XLSX.jl
Databases from JuliaDatabases

The choice of table representation is a function of the application.

1.2.2 Domain
All available domain representations come from the Meshes.jl module.

Let’s start with a simple list of disconnected geometries:

using GeoStats

p = Point(1, 2)
s = Segment((0, 2), (1, 3))
t = Triangle((0, 0), (1, 0), (1, 1))
b = Ball((2, 2), 1)

geoms = [p, s, t, b]

4-element Vector{Geometry{𝔼{2}, Cartesian2D{NoDatum, Quantity{Float64, 𝐋,


Unitful.FreeUnits{(m,), 𝐋, nothing}}}}}:
Point(x: 1.0 m, y: 2.0 m)
Segment((x: 0.0 m, y: 2.0 m), (x: 1.0 m, y: 3.0 m))
Triangle((x: 0.0 m, y: 0.0 m), (x: 1.0 m, y: 0.0 m), (x: 1.0 m, y: 1.0 m))
Ball(center: (x: 2.0 m, y: 2.0 m), radius: 1.0 m)

Because these geometries are unaware of each other, we place them into a GeometrySet ,
informally known in computational geometry as the “soup of geometries” data structure:

gset = GeometrySet(geoms)

4 GeometrySet
├─ Point(x: 1.0 m, y: 2.0 m)
├─ Segment((x: 0.0 m, y: 2.0 m), (x: 1.0 m, y: 3.0 m))
├─ Triangle((x: 0.0 m, y: 0.0 m), (x: 1.0 m, y: 0.0 m), (x: 1.0 m, y: 1.0 m))
└─ Ball(center: (x: 2.0 m, y: 2.0 m), radius: 1.0 m)

No advanced knowledge is required to start working with these geometries. For example, we
can compute the length of the Segment , the area of the Triangle and the area of the Ball
with:
:
length(s), area(t), area(b)

(1.4142135623730951 m, 0.5 m^2, 3.141592653589793 m^2)

More generally, we can compute the measure of the geometries in the domain:

[measure(g) for g in gset]

4-element Vector{Quantity{Float64}}:
0.0 m
1.4142135623730951 m
0.5 m^2
3.141592653589793 m^2

In the example above, we iterated over the domain to apply the function of interest, but we
could have used Julia’s dot syntax for broadcasting the function over the geometries:

measure.(gset)

4-element Vector{Quantity{Float64}}:
0.0 m
1.4142135623730951 m
0.5 m^2
3.141592653589793 m^2

The list of supported geometries is very comprehensive. It encompasses all geometries from the
simple features standard and more. We will see more examples in the following chapters.

One of the main limitations of GIS software today is the lack of explicit representation of
topology. A GeometrySet does not provide efficient topological relations (Floriani and Hui
2007), yet advanced geospatial data science requires the definition of geospatial domains where
geometries are aware of their neighbors. Let’s illustrate this concept with the CartesianGrid
domain:

grid = CartesianGrid(10, 10)

10×10 CartesianGrid
├─ minimum: Point(x: 0.0 m, y: 0.0 m)
├─ maximum: Point(x: 10.0 m, y: 10.0 m)
└─ spacing: (1.0 m, 1.0 m)

We can access the individual geometries of the domain as before:

grid[1]
:
Quadrangle
├─ Point(x: 0.0 m, y: 0.0 m)
├─ Point(x: 1.0 m, y: 0.0 m)
├─ Point(x: 1.0 m, y: 1.0 m)
└─ Point(x: 0.0 m, y: 1.0 m)

And even though we can manipulate this domain as if it was a “soup of geometries”, the major
advantage in this abstraction is the underlying topology :

topo = topology(grid)

10×10 GridTopology(aperiodic, aperiodic)

This data structure can be used by advanced users who wish to design algorithms with
neighborhood information. We will cover this topic in a separate chapter. For now, keep in mind
that working with the entire domain as opposed to with a vector or “soup of geometries” has
major benefits.

Note
The CartesianGrid domain is lazy, meaning it only stores the start and end points of the grid together
with the spacing between the elements. Therefore, we can easily create large 3D grids of Hexahedron
geometries without consuming all available memory:

grid = CartesianGrid(10000, 10000, 10000)

10000×10000×10000 CartesianGrid
├─ minimum: Point(x: 0.0 m, y: 0.0 m, z: 0.0 m)
├─ maximum: Point(x: 10000.0 m, y: 10000.0 m, z: 10000.0 m)
└─ spacing: (1.0 m, 1.0 m, 1.0 m)

grid[1]

Hexahedron
├─ Point(x: 0.0 m, y: 0.0 m, z: 0.0 m)
├─ Point(x: 1.0 m, y: 0.0 m, z: 0.0 m)
├─ Point(x: 1.0 m, y: 1.0 m, z: 0.0 m)
├─ Point(x: 0.0 m, y: 1.0 m, z: 0.0 m)
├─ Point(x: 0.0 m, y: 0.0 m, z: 1.0 m)
├─ Point(x: 1.0 m, y: 0.0 m, z: 1.0 m)
├─ Point(x: 1.0 m, y: 1.0 m, z: 1.0 m)
└─ Point(x: 0.0 m, y: 1.0 m, z: 1.0 m)

In computational geometry, a CartesianGrid is a specific type of mesh. It can only represent


“flat” domains sampled regularly along each dimension (e.g., images). To represent domains with
:
curvature such as terrain elevation models or complex 3D domains like the one in Figure 1.1 (b),
we can use the SimpleMesh domain:

# global vector of 2D points


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.25, 0.5), (0.75, 0.5)]

# connect the points into N-gons


connec = connect.([(1, 2, 6, 5), (2, 4, 6), (4, 3, 5, 6), (3, 1, 5)])

# 2D mesh made of N-gon elements


mesh = SimpleMesh(points, connec)

4 SimpleMesh
6 vertices
├─ Point(x: 0.0 m, y: 0.0 m)
├─ Point(x: 1.0 m, y: 0.0 m)
├─ Point(x: 0.0 m, y: 1.0 m)
├─ Point(x: 1.0 m, y: 1.0 m)
├─ Point(x: 0.25 m, y: 0.5 m)
└─ Point(x: 0.75 m, y: 0.5 m)
4 elements
├─ Quadrangle(1, 2, 6, 5)
├─ Triangle(2, 4, 6)
├─ Quadrangle(4, 3, 5, 6)
└─ Triangle(3, 1, 5)

The connect function takes a tuple of indices and a geometry type, and produces a connectivity
object. The geometry type can be omitted, in which case it is assumed to be a Ngon , i.e., a
polygon with N sides:

c = connect((1, 2, 3))

Triangle(1, 2, 3)

This connectivity object can be materialized into an actual geometry with a vector of points:

materialize(c, [Point(0, 0), Point(1, 0), Point(1, 1)])

Triangle
├─ Point(x: 0.0 m, y: 0.0 m)
├─ Point(x: 1.0 m, y: 0.0 m)
└─ Point(x: 1.0 m, y: 1.0 m)

The SimpleMesh uses the materialize function above to construct geometries on the fly,
similar to what we have seen with the CartesianGrid :
:
mesh[1]

Quadrangle
├─ Point(x: 0.0 m, y: 0.0 m)
├─ Point(x: 1.0 m, y: 0.0 m)
├─ Point(x: 0.75 m, y: 0.5 m)
└─ Point(x: 0.25 m, y: 0.5 m)

Don’t worry if you feel overwhelmed by these concepts. We are only sharing them here to give
you an idea of how complex 3D domains are represented in the framework. You can do
geospatial data science without ever having to operate with these concepts explicitly.

Let’s make a few important remarks:

Flexibility comes with a price. To construct a SimpleMesh of connected geometries we


need to explicitly create a vector of vertices, and connect these vertices into geometries
using their indices in the vector.
Geometries in a SimpleMesh can be of different type. In the example, we have both
Triangle and Quadrangle geometries in the domain. This is similar to what we had with
GeometrySet , but now the geometries are connected.
SimpleMesh are rarely constructed by hand. They are often the result of a sophisticated
geometric processing pipeline that is already stored in a file on disk.

The last missing piece of the puzzle is the combination of tables with domains into geospatial
data, which we discuss next.

1.2.3 Data
Wouldn’t it be nice if we had a representation of geospatial data that behaved like a table as
discussed in the Tables section, but preserved topological information as discussed in the
Domains section? In the GeoStats.jl framework, this is precisely what we get with the georef
function:

using GeoStats

df = DataFrame(
NAME=["John", "Mary", "Paul", "Anne"],
AGE=[34.0, 12.0, 23.0, 39.0]u"yr",
HEIGHT=[1.78, 1.56, 1.70, 1.80]u"m",
GENDER=["male", "female", "male", "female"]
)

grid = CartesianGrid(2, 2)

geotable = georef(df, grid)


:
4×5 GeoTable over 2×2 CartesianGrid

NAME AGE HEIGHT GENDER geometry

Categorical Continuous Continuous Categorical Quadrangle

[NoUnits] [yr] [m] [NoUnits] Cartesian{NoDatum}

John 34.0 yr 1.78 m male Quadrangle((x: 0.0 m, y: 0.0 m), ..., (x: 0.0 m, y: 1.0 m))

Mary 12.0 yr 1.56 m female Quadrangle((x: 1.0 m, y: 0.0 m), ..., (x: 1.0 m, y: 1.0 m))

Paul 23.0 yr 1.7 m male Quadrangle((x: 0.0 m, y: 1.0 m), ..., (x: 0.0 m, y: 2.0 m))

Anne 39.0 yr 1.8 m female Quadrangle((x: 1.0 m, y: 1.0 m), ..., (x: 1.0 m, y: 2.0 m))

Tip for all users


The framework is integrated with the Unitful.jl module. To add the unit “meter” to the numeric value
1.0 , we can write 1.0u"m" . Similarly, we can add units to vectors of values:

[1.0, 2.0, 3.0]u"m"

3-element Vector{Quantity{Float64, 𝐋, Unitful.FreeUnits{(m,), 𝐋, nothing}}}:


1.0 m
2.0 m
3.0 m

It is also possible to load units explicitly to avoid the “u” prefix:

using Unitful: m, ft

1.0m + 2.0ft

The function combines any table with any domain into a geospatial data representation that
adheres to the Tables.jl interface. We call this representation a GeoTable to distinguish it from a
standard table. Besides the original columns, the GeoTable has a special geometry column with
the underlying domain:

names(geotable)

5-element Vector{String}:
"NAME"
"AGE"
"HEIGHT"
"GENDER"
"geometry"
:
Unlike a standard table, the GeoTable creates geometries on the fly depending on the data
access pattern. For example, we can request the first measurement of the GeoTable and it will
automatically construct the corresponding Quadrangle :

geotable[1,:]

(NAME = "John", AGE = 34.0 yr, HEIGHT = 1.78 m, GENDER = "male", geometry =
Quadrangle((x: 0.0 m, y: 0.0 m), ..., (x: 0.0 m, y: 1.0 m)))

If we request a subset of measurements, the GeoTable will avoid unnecessary creation of


geometries, and will instead return a view into the original data:

geotable[1:3,["NAME","AGE"]]

3×3 GeoTable over 3 view(::CartesianGrid, 1:3)

NAME AGE geometry

Categorical Continuous Quadrangle

[NoUnits] [yr] Cartesian{NoDatum}

John 34.0 yr Quadrangle((x: 0.0 m, y: 0.0 m), ..., (x: 0.0 m, y: 1.0 m))

Mary 12.0 yr Quadrangle((x: 1.0 m, y: 0.0 m), ..., (x: 1.0 m, y: 1.0 m))

Paul 23.0 yr Quadrangle((x: 0.0 m, y: 1.0 m), ..., (x: 0.0 m, y: 2.0 m))

Finally, if we request the entire geometry column, we get back the original domain:

geotable[:,"geometry"]

2×2 CartesianGrid
├─ minimum: Point(x: 0.0 m, y: 0.0 m)
├─ maximum: Point(x: 2.0 m, y: 2.0 m)
└─ spacing: (1.0 m, 1.0 m)

Besides the data access patterns of the DataFrame , the GeoTable also provides an advanced
method for retrieving all rows that intersect with a given geometry:

geotable[Segment((0, 0), (2, 0)), :]

2×5 GeoTable over 2 view(::CartesianGrid, [1, 2])

NAME AGE HEIGHT GENDER geometry

Categorical Continuous Continuous Categorical Quadrangle


:
[NoUnits] [yr] [m] [NoUnits] Cartesian{NoDatum}

John 34.0 yr 1.78 m male Quadrangle((x: 0.0 m, y: 0.0 m), ..., (x: 0.0 m, y: 1.0 m))

Mary 12.0 yr 1.56 m female Quadrangle((x: 1.0 m, y: 0.0 m), ..., (x: 1.0 m, y: 1.0 m))

This method is very useful to narrow the region of interest and quickly discard all measurements
that are outside of it. For instance, it is common to discard all “pixels” outside of a polygon
before exporting the geotable to a file on disk.

Notice that the GeoTable representation is general enough to accommodate both “raster data”
and “vector data” in traditional GIS. We can create very large rasters because the
CartesianGrid is lazy:

georef(
(
R=rand(1000000),
G=rand(1000000),
B=rand(1000000)
),
CartesianGrid(1000, 1000)
)

1000000×4 GeoTable over 1000×1000 CartesianGrid

R G B geometry

Continuous Continuous Continuous Quadrangle

[NoUnits] [NoUnits] [NoUnits] Cartesian{NoDatum}

0.95311 0.540418 0.766193 Quadrangle((x: 0.0 m, y: 0.0 m), ..., (x: 0.0 m, y: 1.0 m))

0.207471 0.626717 0.537383 Quadrangle((x: 1.0 m, y: 0.0 m), ..., (x: 1.0 m, y: 1.0 m))

0.80495 0.372314 0.667564 Quadrangle((x: 2.0 m, y: 0.0 m), ..., (x: 2.0 m, y: 1.0 m))

0.211939 0.279282 0.278076 Quadrangle((x: 3.0 m, y: 0.0 m), ..., (x: 3.0 m, y: 1.0 m))

0.786862 0.348092 0.970068 Quadrangle((x: 4.0 m, y: 0.0 m), ..., (x: 4.0 m, y: 1.0 m))

0.88911 0.118394 0.805253 Quadrangle((x: 5.0 m, y: 0.0 m), ..., (x: 5.0 m, y: 1.0 m))

0.521314 0.475458 0.463105 Quadrangle((x: 6.0 m, y: 0.0 m), ..., (x: 6.0 m, y: 1.0 m))

0.595226 0.108076 0.944947 Quadrangle((x: 7.0 m, y: 0.0 m), ..., (x: 7.0 m, y: 1.0 m))

0.639352 0.895026 0.585009 Quadrangle((x: 8.0 m, y: 0.0 m), ..., (x: 8.0 m, y: 1.0 m))

0.930454 0.786526 0.440901 Quadrangle((x: 9.0 m, y: 0.0 m), ..., (x: 9.0 m, y: 1.0 m))

⋮ ⋮ ⋮ ⋮

And can load vector geometries from files that store simple features using the GeoIO.jl module:
:
using GeoIO

GeoIO.load("data/countries.geojson")

177×3 GeoTable over 177 GeometrySet

COUNTRY REGION geometry

Categorical Categorical MultiPolygon

[NoUnits] [NoUnits] GeodeticLatLon{WGS84Latest}

Fiji Melanesia Multi(3×PolyArea)

Tanzania Eastern Africa Multi(1×PolyArea)

W. Sahara Northern Africa Multi(1×PolyArea)

Canada Northern America Multi(30×PolyArea)

United States of America Northern America Multi(10×PolyArea)

Kazakhstan Central Asia Multi(1×PolyArea)

Uzbekistan Central Asia Multi(1×PolyArea)

Papua New Guinea Melanesia Multi(4×PolyArea)

Indonesia South-Eastern Asia Multi(13×PolyArea)

Argentina South America Multi(2×PolyArea)

⋮ ⋮ ⋮

Note
The “data” folder is stored on GitHub. Check the Preface for download instructions.

We will see more examples of “vector data” in the chapter Interfacing with GIS, and will explain
why file formats like Shapefile.jl and GeoJSON.jl are not enough for advanced geospatial data
science.

Tip for all users


The GeoStats.jl module reexports the full stack of modules for geospatial data science in Julia. There is
no need to import modules like Meshes.jl explicitly. You are all set if you start your script with

using GeoStats

Tip for advanced users


In Julia, a function is type-stable if the return type is known at compile time. Since the GeoTable has
columns from the original table (e.g., numbers) and an additional special geometry column, the access
:
to the data with the DataFrame syntax is not type stable. If you need to write type-stable code, use the
functions values and domain instead:

values(geotable)

4×4 DataFrame

Row NAME AGE HEIGHT GENDER

String Quantity… Quantity… String

1 John 34.0 yr 1.78 m male

2 Mary 12.0 yr 1.56 m female

3 Paul 23.0 yr 1.7 m male

4 Anne 39.0 yr 1.8 m female

domain(geotable)

2×2 CartesianGrid
├─ minimum: Point(x: 0.0 m, y: 0.0 m)
├─ maximum: Point(x: 2.0 m, y: 2.0 m)
└─ spacing: (1.0 m, 1.0 m)

1.3 Remarks

What did we learn in this chapter?

Geospatial data can be efficiently represented as a GeoTable . This representation is


universal, meaning it combines the “raster” and “vector” representations found in
traditional GIS.
A GeoTable provides all the data access patterns of a DataFrame . In particular, it supports
the syntax geotable[rows,cols] without expensive copies of geometries.
Working with geospatial domains as opposed to working with vectors of disconnected
geometries has major benefits. In particular, it preserves topological information.

It is very convenient to manipulate a GeoTable as if it was a DataFrame . Nevertheless, we will


learn that advanced geospatial data science requires higher-level constructs to preserve
geospatial information. We will cover these constructs in Part II and Part III of the book.
:

You might also like