Thesis Access
Quality Data
by
A thesis
submitted to the Victoria University of Wellington
in fulfilment of the
requirements for the degree of
Master of Science
in Computer Science.
I would also like to thank Dr Guy Coulson from the National Institute of
Water & Atmospheric Research Ltd (NIWA) for helpful discussions and
support during the research study.
Sincere thanks to all of the participants who volunteered their time and
effort to assist.
Contents
1 Introduction 1
1.1 Research Questions . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2 Background 7
2.1 Air Pollution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Information Visualization . . . . . . . . . . . . . . . . . . . . . 14
2.2.1 Taxonomies . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2.2 Exploration . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.3 Drill Down . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2.4 Breadth First . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2.5 Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.3 Evaluation Methods . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.3.1 Visualization User Study . . . . . . . . . . . . . . . . . 24
2.3.2 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3 AtmoVis 31
3.1 AtmoVis: Design . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.1.1 Personas . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.1.2 Kath: Air Quality Scientist Persona . . . . . . . . . . . 34
3.1.3 Oliver: Data Analyst Persona . . . . . . . . . . . . . . 35
3.1.4 Mathew: Student Persona . . . . . . . . . . . . . . . . . 36
4 User Study 59
4.1 Study Participants . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.2 Study Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.2.1 Study Methodology . . . . . . . . . . . . . . . . . . . . 60
4.2.2 Pilot Study . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.2.3 Main Study . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.3 Study Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.3.1 Task 1: Mapping The Data . . . . . . . . . . . . . . . . 63
4.3.2 Task 2: Aggregate Data . . . . . . . . . . . . . . . . . . 64
4.3.3 Task 3: Parallel Coordinate Data Comparison . . . . 65
4.3.4 Task 4: Temporal Pattern . . . . . . . . . . . . . . . . . 65
4.4 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.5 Measuring Effectiveness . . . . . . . . . . . . . . . . . . . . . . 67
5 Results 71
5.1 Pilot Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.1.1 Heat Calendar . . . . . . . . . . . . . . . . . . . . . . . . 73
5.1.2 Line Plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.1.3 Site View . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.1.4 Monthly Averages . . . . . . . . . . . . . . . . . . . . . 74
5.1.5 Data Comparison . . . . . . . . . . . . . . . . . . . . . . 74
5.1.6 Windowing . . . . . . . . . . . . . . . . . . . . . . . . . . 75
5.1.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
5.2 Main User Study . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
5.2.1 Participants . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.3 Visualization Evaluation . . . . . . . . . . . . . . . . . . . . . . 80
5.3.1 Visualization Effectiveness . . . . . . . . . . . . . . . . 81
5.3.2 Heat Calendar . . . . . . . . . . . . . . . . . . . . . . . . 82
5.3.3 Line Plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.3.4 Monthly Rose . . . . . . . . . . . . . . . . . . . . . . . . 91
5.3.5 Site View . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
5.3.6 Monthly Averages . . . . . . . . . . . . . . . . . . . . . 101
5.3.7 Data Comparison . . . . . . . . . . . . . . . . . . . . . . 105
5.3.8 Windowing . . . . . . . . . . . . . . . . . . . . . . . . . . 110
5.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
5.4.1 Accuracy of AtmoVis for Persona Scenarios . . . . . 111
5.4.2 Heat Calendar . . . . . . . . . . . . . . . . . . . . . . . . 115
5.4.3 Line Plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
5.4.4 Monthly Rose . . . . . . . . . . . . . . . . . . . . . . . . 117
5.4.5 Site View . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
5.4.6 Monthly Averages . . . . . . . . . . . . . . . . . . . . . 119
5.4.7 Data Comparison . . . . . . . . . . . . . . . . . . . . . . 121
5.4.8 Windowing . . . . . . . . . . . . . . . . . . . . . . . . . . 122
6 Conclusions 127
6.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
6.2 Research Questions . . . . . . . . . . . . . . . . . . . . . . . . . 130
6.3 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
Appendices 135
1.1 A temperature inversion layer with hot air above and cold
air underneath. The inversion layer prevents particulate
matter from rising. Drawn By: B. Powley. . . . . . . . . . . . 4
3.4 A heat calendar with the mouse hovered over a day, the
average ozone pollution for that day is shown at the bottom
of the screen. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.8 The monthly averages for a given site with the mouse hov-
ering over a label, the label can be dragged and dropped to
relate the data to other visualizations. . . . . . . . . . . . . . . 53
5.1 The window frame border for the visualizations and the op-
tions panel. The top image is the frame for the visualization
with a highlighted gear icon which the mouse has hovered
over. The bottom image is the frame for the options panel
with a back button instead of a gear icon. . . . . . . . . . . . . 76
5.2 Effectiveness of visualization measured by the post-study
questionnaire Likert scales aggregated by the method de-
scribed in Section 5.3.1 . . . . . . . . . . . . . . . . . . . . . . . 81
5.3 Heat Calendar: Likert scales from post-study questionnaire. 84
5.4 Line Plot: Likert scales from post-study questionnaire. . . . 88
5.5 Monthly Rose: Likert scales from post-study questionnaire. 92
5.6 Site View: Likert scales from post-study questionnaire. . . . 98
5.7 Monthly Averages: Likert scales from post-study question-
naire. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
5.8 Data Comparison: Likert scales from post-study question-
naire. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
5.9 Four different usage styles in AtmoVis: Figure 5.9a shows
window splattering, Figure 5.9b shows border to border care-
ful tiling, Figure 5.9c shows a near maximized window and
Figure 5.9d shows a stack of windows. . . . . . . . . . . . . . 123
Chapter 1
Introduction
There are several different air pollutants in the atmosphere including par-
ticulate matter, ozone, nitrogen oxides and sulphur dioxide that can cause
health issues for people. These pollutants have a variety of adverse
health effects [15]. For example, the health effects from particulate
matter include cardiovascular disease [18] and respiratory mortality [31].
Sources of air pollution include traffic, domestic fires, industrial sources,
and shipping [15]. In order to understand the effects of air pollution on
health and on the environment, data containing air pollution information
needs to be collected and analysed. Analyzing a large multivariate dataset
is difficult; a thorough exploration of the data is needed to find relationships
in air pollution that are suitable for further analysis.
Data is visualized by looking at an image representation of the data;
the image can be produced by any method, and the data is interpreted and
understood through human perception to build an understanding of the
image [56]. Exploration of a dataset using a visualization can help to form
questions about data even when there is little knowledge of what the data
is like at the start of the process [28]. In this thesis a prototype system for
visualizing spatio-temporal data related to air quality is designed, built,
and evaluated. Evaluating the visualization system through a user study
determines the effectiveness of the system for answering questions related
to air quality and the results of the study could be used to inform the
design of other visualization systems for environmental data. A novel
web-based exploratory visualization system, AtmoVis, was developed as
part of this research. AtmoVis was designed for breadth first exploration
of air quality data.
Data visualization can improve on the current methods for data anal-
ysis by making data more understandable and by providing an interac-
tive way to search, filter, and find relationships within the dataset. The
data collected on air pollution has some problems and limitations that
can make data analysis challenging. Research sites can collect different
variables over different time frames. For example, upgrades at a mea-
surement site can cause different data collection methods to be used for
a given substance. Not all of the variables are collected at every site and
some sites are only partially complete. A visualization interface would
make it easier to find the collected data because the sites collecting a given
pollutant could be displayed directly. The National Institute of Water and
Atmospheric Research (NIWA) has provided a dataset containing air quality
and meteorological data for the purpose of developing an air quality visu-
alization interface. NIWA is a Crown Research Institute that performs
research on climate change and weather forecasting in New Zealand [6, 7].
The atmospheric dataset provided by NIWA for this research has a large
number of sparsely collected variables over a range of time frames. In
New Zealand there are some key locations where air quality is poor and
there is seasonal variation in air quality, though in general air quality is
good. For example, there are sites located at Auckland, Taupo, Welling-
ton, Christchurch, and Dunedin. An effective visualization system would
improve the analysis of the dataset provided by allowing the sites in the
dataset to be explored interactively and analyzed through a graphical in-
terface. An exploratory data visualization system could potentially help to
form questions about air quality in New Zealand for future research. Air
quality experts were interviewed to create personas and the personas were
used to identify the goals and functionality of the AtmoVis system. Pilot
testing was conducted to identify necessary adjustments to system functionality
and the evaluation protocol before a main user study was conducted
with air quality experts and participants with experience in Geographic
Information Systems (GIS).
Figure 1.1: A temperature inversion layer with hot air above and cold air
underneath. The inversion layer prevents particulate matter from rising.
Drawn By: B. Powley.
analyze the data; however, spreadsheets are limited to the general visualizations
provided in the software, which are not developed specifically
for air quality analysis. AtmoVis is a domain-specific air quality
application developed specifically for the dataset and allows the data to
be explored interactively rather than by modifying scripts.
1.2 Contributions
The following list outlines the contributions of this thesis.
Chapter 3: AtmoVis The personas for the target audience are described.
System goals and the design analysis of the AtmoVis system are also
presented.
Chapter 4: User Study The user study design is described and related to
the personas and design of the AtmoVis interface.
Chapter 5: Results The results of the pilot study and main user study are
presented then discussed.
Chapter 2

Background
This chapter reviews and discusses previous methods used for the visual-
ization of air quality, interface design used for visualization systems and
evaluation techniques for measuring the effectiveness of a visualization
system. Visualization is the process of interpreting data to produce an un-
derstanding of the information [56].
(a) The “calendar view” heat map from Zhou et al. [69].
Figure 2.1: Two different methods for drawing calendar heat maps.
cities on a grid map so that the calendars did not overlap [69]. The min-
imum spanning tree was used in a similar way to a force directed layout
system. When the minimum spanning tree was not used the close cities
had obscured calendar values. The minimum spanning tree was an ef-
fective way to re-position the heat maps to avoid overlap. The calendar
displayed as a heat map would be useful for displaying a time series for
air quality in New Zealand in the present study. The circular heat map
Figure 2.2: PM10 U-Air machine learning compared with linear interpola-
tion between monitoring sites [68]. The linear interpolation is less accurate.
The green at the top right monitoring site has been inaccurately spread out
over a large area.
Figure 2.3: AOT interpolated over Makkah Mina and Arafah in Saudi Ara-
bia using ordinary Kriging interpolation [45].
2.2.1 Taxonomies
Creating a taxonomy is a way of categorizing how visualization systems
allow the user to interact with a visualization in order to explore data and
find information in the system. Comparing a visualization to a taxonomy
is a way of ensuring that necessary functionality has been included in the
system. A taxonomy was created by Yi et al. [66] to classify user interac-
tion techniques for visualization systems. The taxonomy categorizes tech-
niques according to whether they select, explore, reconfigure, encode, abstract/
2.2.2 Exploration
(a) The word cloud used by the TIARA system for describing emails inserted into the
system [62].
The TIARA, Voyager, Visage, and Tioga-2 systems were designed with
a “drill-down” design principle [16, 49, 62, 65]. In a “drill-down” design
a user can select variables to filter and extract information. The user in-
terface of the TIARA system is based on a “stacked graph” visualization
of keywords from the text [62]. The drill down can be performed by se-
lecting words on the visualization. The Tioga-2 user interface supports
“drill-down” behaviour using “wormholes” and by performing the “Set
Range”, “Overlay”, and “Shuffle” operations on relations and composites.
A wormhole allows a user to jump from one canvas to another canvas. Set
range filters the displayed relation, overlay combines two composites, and
shuffle changes the drawing order for relations [16]. The use of data exploration
systems can be compared to the taxonomy proposed by Shneiderman [52].
Both the TIARA system and the Tioga-2 system made use of
the drill-down principle. The drill-down principle would be useful in the
design of a visualization system prototype for air quality monitoring data
so that monitoring sites can be presented in an overview and then filtered
to find sub-selections.
The Visage system allows users to drill down through variables in a
table. The user can select variables from a menu to filter the display. Rela-
tions between database objects can also be used to drill down [49]. Data in
the Lyra system can be adjusted with filters and other transformations [51].
The other tools such as the Polaris [57] and Trelliscope [30] systems have
made use of filters and transformations to adjust data.
2.2.5 Metrics
2.3.2 Summary
“Breadth-first” The Voyager, Trelliscope, and Polaris [30, 57, 65] systems
reviewed had a “breadth-first” design principle. Breadth-first design
is suitable for the prototype system because it allows a large number
of sites in the dataset to be explored by inspecting a pollution variable
first. This approach allows the dataset to be searched for the areas with
the highest pollution instead of looking at areas first and drilling into
each area to find out how much pollution there is at that location. The
“breadth-first” method assists with exploration so that new areas of
interest can be discovered based on the pollution level.
“Drill down” The TIARA, Voyager, Visage and Tioga-2 [16, 49, 62, 65] sys-
tems had a “drill-down” design principle which allows data points
to be filtered and selected based on criteria. Visualizations inside an
air quality data visualization system could contain filter criteria for
drilling down and sub selecting the data.
“Database Interaction” The RB++, Tioga-2, FacetMap and Polaris [16, 55,
57, 67] systems operate on data from a relational database. Gener-
ating queries from a web based front end would assist in manag-
ing large amounts of data by retrieving subsets of the data available.
The air quality prototype visualization system will use visualizations
which query a database to reduce the amount of data loaded when
the visualization system starts.
Chapter 3

AtmoVis
ploring a large number of variables in a dataset [65]. Figure 3.1 shows the
AtmoVis interface with different visualizations open demonstrating an ex-
ample session.
3.1.1 Personas
A persona is a fictitious person that can help to build an understanding of a
group of users and the necessary functionality of the system [43]. Personas
can help to build an understanding of a user’s goals and requirements
and to build empathy with the target group [44]. Several personas can
be used to design a system when there are several groups of users with
different goals. Using a persona reduces the risk of a software developer
building a system targeting themselves rather than the target group [47].
For this project, some preliminary interviews were carried out with two
air quality experts from the Ministry For Environment (MFE) and from
NIWA to help determine what sorts of systems were already in place and
were used to identify the functional requirements of AtmoVis for use in
developing personas. A persona consists of a personal description of the
target audience, goals that the persona wants to achieve, and scenarios
describing possible uses for the system.
Kath is an air quality scientist. She is 40 years old and has a Bachelor’s
degree. Kath dresses smartly: she wears tidy black pants, a light coloured
shirt, and dark work shoes. Kath is tidy at work, likes coffee, and sees
herself as a casual person. Her motivation for using AtmoVis is to reduce
the amount of time taken to produce relevant plots for air pollution. She
is interested in technology and has some basic experience using statistics
software. Kath has previously used parallel coordinate plots to identify
correlation, and wind rose plots to identify spatial patterns. However,
producing the correct data plots is time-consuming, and a more interactive
solution would be preferable; an interactive tool could also be used
to introduce the dataset to a junior scientist. Kath has access to several
datasets on air quality as well as information about the emission sources
of air pollution to supplement the use of the system.
Kath's main goals:
Goal 3 To communicate air pollution findings with data analysts and policy
experts.
Scenario 1 A policy change was implemented which limits the output of SO2
pollution from a given industrial source. Kath wants to find out
whether the change has an effect on directional pollution in a given
area.
Scenario 2 Two pollutants are believed to be produced by the same source. Kath
wants to find out how related the pollutant levels are in a given re-
gion.
Goal 2 Communicate efficiently with scientists who work in the air quality
field.
Scenario 1 Oliver wants to find out when a particular region is breaching air
quality standards for SO2. The software tool should make peak pollution
levels visible.
The air quality data could be read into a spreadsheet; however, finding
interesting features to comment on is difficult. The tool has an exploratory
feel which could reduce the amount of time needed to find interesting fea-
tures.
Mathew’s main goals:
Scenario 1 Mathew opens up the web page for the visualization tool looking at
the Auckland region. He wants to use the visualization to explore
the monitoring sites at the surrounding area and find out what pol-
lutants are available to view and analyse, then find a trend over time
for a particularly interesting pollutant.
The following system goals were developed by analysing persona air qual-
ity focused scenarios and goals.
System Goal
Persona
1 2 3 4 5
Kath: Scenario 1 yes
Kath: Scenario 2 yes yes
Mathew: Scenario 1 yes yes
Mathew: Scenario 2 yes yes
Oliver: Scenario 1 yes
Oliver: Scenario 2 yes yes
3.2 Architecture
The system architecture describes how the components of AtmoVis inter-
act with each other at an abstract level. AtmoVis has a front end and a
back end so that the system can be used as a web-based visualization re-
source with the front end in the browser and the back end on the server.
The architecture diagram (Figure 3.2) also describes how the air quality
data was pre-processed for insertion into the database which is contained
within the back-end. The system was designed incrementally using sev-
eral prototypes and an evaluation review before the final version.
3.2.1 Front-End
3.2.2 Back-End
with Excel and Python, before insertion into the database using the Python
Pandas library. The database interfaces with Flask so that the database is
not directly exposed to the front end. This ensures that the Flask server
produces the SQL queries.
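The query mediation described above can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module; the `measurements` table and its columns are hypothetical (the actual schema is not specified here), but the sketch shows the key point: the server builds the parameterized SQL itself rather than accepting queries from the browser.

```python
import sqlite3

def measurements_for_site(conn, site, pollutant):
    """Return (timestamp, value) rows for one site and pollutant.
    The SQL is built server-side; the front end only supplies the
    values, which are bound as query parameters."""
    cur = conn.execute(
        "SELECT ts, value FROM measurements "
        "WHERE site = ? AND pollutant = ? ORDER BY ts",
        (site, pollutant),
    )
    return cur.fetchall()

# In-memory demo database with a hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurements (site TEXT, pollutant TEXT, ts TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?, ?)",
    [
        ("Musick Point", "O3", "2012-01-02T17:00", 40.0),
        ("Musick Point", "O3", "2012-01-02T18:00", 55.5),
        ("Takapuna", "PM10", "2012-01-02T17:00", 12.3),
    ],
)

rows = measurements_for_site(conn, "Musick Point", "O3")
```

Binding values as parameters, rather than interpolating them into the SQL string, also protects the server-side query from injection by the front end.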
R was also used as part of the back end. R is a statistical package which
can produce visualizations through the use of different R libraries. The
Openair library is an R library which can produce static visualizations of
air quality data. Running an R web server using the plumber API [9] al-
lows the visualizations produced by Openair to be sent to the front end.
Static visualizations from Openair were made interactive in the front end
using JavaScript. The Python Flask back end communicates with R to
request visualizations for the front end to display. The Plotly R library also
runs on the R back end and produces interactive plots for display using
the Plotly JavaScript API [54]. The R back end retrieves data through Flask
so that there is only one interface to the database.
3.3 Implementation
The following section compares different libraries and programming tools
for implementing visualizations, and discusses the choices that were made
for the libraries used to implement AtmoVis.
D3 [20], Protovis [19] and VEGA [12] are three systems for building
visualizations inside a web browser. D3 and Protovis are embedded DSLs in
JavaScript [19, 20]. D3 is implemented as a JavaScript library [20]. VEGA is
based on JSON [12]. D3, Protovis, and VEGA have all taken a declarative
approach to the specification of a visualization [12, 19, 20]. VEGA’s JSON
grammar is interpreted by a JavaScript library that can target HTML5 Can-
vas or SVG for rendering [12, 19, 20]. The JavaScript library can be used
to interact with the JSON specified visualization through the use of sig-
nals, listeners, and other methods [12]. In D3 the visualization is specified
in JavaScript [20]. Selectors are an important part of D3. DOM nodes
can be selected, then data can be bound to nodes. There are methods
that select different parts of the binding: an enter method, which selects
data with no node attached, and an exit method, which selects nodes with
no data attached. The enter and exit methods are used to assist with the
process of updating a visualization [20]. The Protovis visualization can
target HTML5, SVG, and Flash for rendering [19]. Primitives in
the Protovis DSL are called marks which can be Area, Bar, Dot, Wedge,
Rule, Link and Label [19]. Panels are used to nest content into more com-
plex visualizations [19]. JavaScript can be used to register event handlers
with marks so that user interaction can be added [19]. The VEGA visual-
ization grammar uses signals, data, scales, projections, axes, legends and
marks. The signals are used to add interaction from the mouse or from
values inserted by the JavaScript runtime. The data specifies how infor-
mation is loaded into the system for visualization. Data can be filtered,
imputed, or adjusted by a variety of operations on being loaded. Scales
allow transformations to be applied to the data that specify pixel colours
and sizes when the data is drawn. The axes are labelled co-ordinate axes,
and legends can be used to indicate scaling, colour and other attributes.
The marks indicate how data should be drawn by the system [12]. D3
was the visualization system chosen for implementing AtmoVis because
D3 works well with other web technologies and is more customizable than
the VEGA grammar.
The Shiny library [25] for R can be used to build interactive web-based
interfaces. Also, the Plotly R library can be used to produce interactive
R charts [54]. Shiny runs a web server from R and only supports the use
of R as a programming environment. However, Plotly provides APIs for
other programming environments, and an account can be created to host
charts and data online [54]. Plotly has a JSON-based grammar [54] for
specifying plots, in a similar way to how plots can be specified in VEGA [12];
however, the grammar is not required because Plotly can convert some charts
built in ggplot2, adding interaction. Plotly supports a wide variety of
different charts, including histograms, heat maps, contour plots, and line
charts. Since Plotly makes use of the D3 library underneath, many plot
types can be extended using D3. For this project, D3 was used to extend
a Plotly histogram plot to add drag and drop functionality for the his-
togram bar labels. The drag and drop functionality allowed the histogram
plot to interact with other plot types by changing the pollutant drawn by
the other plots.
3.3.1 Data
Input data was provided as separate Excel spreadsheets, organized into
folders with a folder for each region. There was a spreadsheet
for each monitoring site that contained hourly measurements, and there
were also some spreadsheets containing daily averages. Each spreadsheet
contained two worksheets: a worksheet with metadata and a worksheet
with the actual recorded information. The metadata contains information
about the monitoring site, for example, the start and finish times for the
measurements contained in the data from the site. The measured data
was not complete: there were missing values, and the time frames over
which the measurements were recorded differed for each site. The data was
converted to CSV format before processing; additionally, some
re-tabulation was required for several sites in the dataset.
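As a rough sketch of this conversion step, the following Python fragment parses a converted site CSV while preserving gaps as missing values rather than dropping incomplete rows. The column names and values here are invented for illustration and do not come from the NIWA dataset.

```python
import csv
import io

# Hypothetical excerpt of one converted site spreadsheet; blank
# cells mark missing hourly measurements.
raw = """timestamp,PM10,O3
2012-01-01 00:00,14.2,31.0
2012-01-01 01:00,,29.5
2012-01-01 02:00,13.1,
"""

def load_site_csv(text):
    """Parse a converted site CSV, keeping gaps as None instead
    of dropping incomplete rows."""
    records = []
    for rec in csv.DictReader(io.StringIO(text)):
        parsed = {"timestamp": rec["timestamp"]}
        for key, val in rec.items():
            if key == "timestamp":
                continue
            parsed[key] = float(val) if val else None
        records.append(parsed)
    return records

records = load_site_csv(raw)
```

Keeping missing hours as explicit `None` values rather than deleting rows preserves the time frame of each site, which matters later when different sites are compared over time.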
The Pandas API [11] was used to process the input data from the CSV
spreadsheets into an SQL database. Additional tables were produced for
the units of measurement and the site names. When reading in the data
using the Pandas API, data entry errors were found. The use of the NZTM_X
and NZTM_Y parameters was inconsistent, and there were measurements
that were clearly invalid. Extreme outliers were removed, and a basic
check was performed to ensure that the coordinates were within the
expected ranges. There were several different ways of representing
coordinates, so all coordinates were converted to latitude and longitude before
being read into the database. The Pyproj tool was used to convert the coor-
dinates [10]. The data visualization system needed a map of New Zealand
so Carto [24] was used to provide a street map layer for the Leaflet [5]
map.
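The coordinate sanity check can be illustrated as follows. The bounding box below is a hypothetical rough approximation of New Zealand's extent, not the exact limits that were applied during pre-processing.

```python
# Hypothetical rough bounding box for New Zealand (lat, lon in
# degrees); the exact limits used in pre-processing may differ.
LAT_RANGE = (-47.5, -34.0)
LON_RANGE = (166.0, 179.0)

def plausible_coordinate(lat, lon):
    """Basic sanity check that a converted coordinate falls inside
    the expected region before insertion into the database."""
    return (LAT_RANGE[0] <= lat <= LAT_RANGE[1]
            and LON_RANGE[0] <= lon <= LON_RANGE[1])

# A sign error on the latitude (a common data entry mistake) is
# caught by the range check.
sites = [("Wellington", -41.29, 174.78), ("bad entry", 41.29, 174.78)]
valid = [name for name, lat, lon in sites if plausible_coordinate(lat, lon)]
```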
3.4.1 Windowing
In order to implement an exploratory interface for AtmoVis which sup-
ports many different visualizations, a way of viewing and adding visual-
izations to a canvas was required. A windowing system was implemented
using D3 which provides a draggable window border. The window border
(Figure 3.3 a) can be dragged anywhere on the main canvas; however, the
canvas is subdivided into a grid so that space is used efficiently. Dragging
and dropping a window frame snaps the window onto the nearest loca-
tion on the grid. The windows are resizeable. There is a draggable triangle
(Figure 3.3 b) in the corner which changes the window size. The window
frame also provides two buttons, a delete button (Figure 3.3 c) and an op-
tions panel button (Figure 3.3 d). Clicking on the window border raises
the window to the top. The window border contains a text title briefly
summarizing the visualization (Figure 3.3 e). For example, the line plot vi-
sualization contains a text title summarizing the pollutant which is being
drawn. The Google material icons library [4] was used to provide inter-
face icons. Visualizations can provide both a main visualization panel and
an options panel accessible through the options button. The delete button
removes the visualization from the canvas and is styled with an ‘x’. The
options panel button is styled with a gear when the visualization is dis-
played, or a back arrow when the options panel is displayed (Figure 3.3
f). During pilot testing, the gear icon was replaced with a back arrow
when the options panel is visible, because leaving the gear icon on the
options panel made the layout confusing.
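The snap-to-grid behaviour amounts to rounding the dropped position to the nearest grid cell corner. The sketch below shows the idea in Python; the actual implementation is in D3/JavaScript, and the cell dimensions here are arbitrary.

```python
def snap_to_grid(x, y, cell_w, cell_h):
    """Snap a dropped window's top-left corner to the nearest
    grid cell corner so that canvas space is used efficiently."""
    col = round(x / cell_w)
    row = round(y / cell_h)
    return col * cell_w, row * cell_h

# A window dropped at (130, 95) snaps to the nearest corner on a
# hypothetical 100x80 pixel grid.
snapped = snap_to_grid(130, 95, 100, 80)
```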
There were two options considered for the windowing. One option
was to use floating windows and the other was to use tiles. In a tiling
layout, the grid can be selected or subdivided into panels and the visual-
izations occupy the different tiles. The tiles do not necessarily have a top
window border with buttons and tiles can be split or merged to change
the layout. The windowed option was chosen because windows are more
intuitive than tiling and the tiling option would leave some functional-
ity hidden from view. A tiling window layout was developed but not
used. The tiling layout allowed regions of the interface to be selected and
merged before visualizations were dropped onto the canvas. An addi-
tional mode was needed to move around the visualizations.
AtmoVis has control options for the time shown on all of the visualiza-
tions. The play button (Figure 3.3, g) was provided so that temporal trends
in the data can be inspected. The advantage of using a play button is that
several plots fixed to the same date can be added to the canvas area and
then animated with the play button to show the temporal data. A text area
containing the exact date and time is provided (Figure 3.3, h) so that dates
and times can be inserted to change the date and time selected. In my
own experience, the play button alone was too restrictive because it was
difficult to find interesting sections of data without playing right through,
so the heat calendar can be used with the play button to make data more
explorable. The load button (Figure 3.3, i) is provided so that the data
can be reloaded; a notification becomes visible when the data has been
reloaded, which can confirm the absence of data to a user. A menu is
provided (Figure 3.3, j) which contains a list of the visualizations that can
be inserted by clicking once, or by clicking and dragging into the desired
position on the interface. A menu button (Figure 3.3, k) is provided to hide
and show the menu.
The heat calendar (Figure 3.4) draws a yearly calendar for a single variable
at a monitoring site. Each day on the calendar is coloured according to the
mean value of the pollution measurements for that day. Hovering the
mouse over a day shows the mean value for that day in a labelled text
field under the calendar. Clicking on a day changes the time slider on the
user interface to the start of that day. The month headings can also be
clicked; clicking on a month changes the calendar from a yearly calendar
to a single calendar month.
3.4. USER INTERFACE 47
Figure 3.4: A heat calendar with the mouse hovered over a day; the aver-
age ozone pollution for that day is shown at the bottom of the screen.
The heat calendar improves the utility of the data visualization tool by
allowing days of interest to be found quickly and efficiently. Several cal-
endars can be displayed at once in different windows, and clicking on the
title of a month displays a monthly view, removing information that is
not required for the analysis. The heat calendar in Figure 3.4 shows the
mean daily O3 pollution at Musick Point in 2012. The mouse is positioned
over the 29th of July 2012; a small blue box shows that the day has been
selected, and the mean value of O3 on that day is 72.19 μg m−3.
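The data step behind the heat calendar can be sketched as two small functions: hourly measurements are grouped by calendar day and averaged, and each mean is mapped onto a colour ramp spanning the year's minimum and maximum. This is an illustrative sketch, not the AtmoVis code; the function names and the blue-to-red ramp are assumptions:

```javascript
// Group hourly measurements by calendar day and average them.
function dailyMeans(hourly) {
  // hourly: [{time: ISO string, value: number}, ...]
  const sums = {};
  for (const {time, value} of hourly) {
    const day = time.slice(0, 10);            // "YYYY-MM-DD"
    if (!sums[day]) sums[day] = {total: 0, n: 0};
    sums[day].total += value;
    sums[day].n += 1;
  }
  const means = {};
  for (const [day, {total, n}] of Object.entries(sums)) {
    means[day] = total / n;
  }
  return means;
}

// Linear colour ramp over the year's min..max, in the spirit of a
// D3 sequential scale: low values blue, high values red.
function dayColour(mean, min, max) {
  const t = (mean - min) / (max - min);       // 0 = lowest, 1 = highest
  return `rgb(${Math.round(255 * t)}, 0, ${Math.round(255 * (1 - t))})`;
}
```

In a D3 rendering, each day's rectangle would simply receive `dayColour(means[day], min, max)` as its fill.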
Figure 3.5: A line plot showing ozone measurements at Musick Point with
the mouse hovering over a measurement, showing that the value of ozone
on 02/01/2012 at 17:00 hours was 40 μg m−3.
The line plot displays one y-axis variable at a time from several different
sites, so a site can be dragged from a map onto the line plot without
re-configuring which variables are displayed on the site or on the map,
which helps speed up the interaction. Several line plots can be viewed
within AtmoVis at the same time so that different variables can be
displayed at once. Using one y-axis on each line plot keeps the axis scaling
simple, and the plot can be zoomed by using the scroll wheel anywhere
on the visualization; using more than one y-axis would have made the
zooming more complex. The zoom only scales the y-axis so that the time
scales for different line plots are consistent and easy to compare. The line
plot was implemented in D3 because it allows HTML and SVG elements
to be selected, filtered and updated with a convenient syntax for binding
data. D3 also allows interactive mouse functionality to be added.
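The y-axis-only zoom described above can be sketched as a pure function on the y-domain: scrolling rescales the domain about its midpoint while the time axis is left untouched, so time scales stay comparable across line plots. A minimal sketch, with the zoom factors chosen here as assumptions rather than the AtmoVis values:

```javascript
// Rescale a [lo, hi] y-domain about its midpoint in response to a
// scroll-wheel event; the x (time) domain is deliberately untouched.
function zoomYDomain([lo, hi], wheelDelta) {
  const factor = wheelDelta < 0 ? 0.8 : 1.25;  // scroll up zooms in
  const mid = (lo + hi) / 2;
  const half = ((hi - lo) / 2) * factor;
  return [mid - half, mid + half];
}
```

In a D3 line plot, the returned domain would be fed back into the y-scale and the axis and paths redrawn, leaving the shared time axis fixed.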
The monthly rose visualization (Figure 3.6) integrates the Openair pol-
lution rose into AtmoVis. The monthly rose plot is not itself interactive;
however, other charts can be used to control what is being displayed, and
the monthly rose plot can receive a site dragged and dropped from the
site view.
Figure 3.6: The monthly rose diagram shows the concentration of ozone in
colour and the frequency of counts by wind direction as the length of each
sector from the centre.
When a site is dropped onto the monthly rose, the pollutant will be
set to match the pollutant of the site view that it was dragged from, if a
pollutant is set for the site view. The options panel for the monthly rose is
the same as the options panel for the line plot, so the pollutant can be reset
the same way using the drop-down menu. Using the same options panel
ensures that the two plots behave consistently. The monthly rose plot can
also be controlled from the time slider: when the month changes, a new
rose is drawn. In Figure 3.6, a larger proportion of measurements with
O3 between 40 and 61.7 μg m−3 comes from a northerly direction than
from a southerly direction, indicating higher O3 concentrations from the
north. Openair runs on an R backend; in order to display the monthly
rose plot (Figure 3.6), a PNG image of a pollution rose plot is produced
and sent to the Flask server to be forwarded to the front end. When the
pollution rose plot is received by the front end, it is displayed using the
windowing layout implemented as part of AtmoVis.
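The front-end side of this Openair round trip could look like the following sketch. The endpoint path and parameter names are assumptions for illustration, not the actual AtmoVis API:

```javascript
// Build the request URL for a PNG pollution rose for one site,
// pollutant, and month. The "/api/pollution_rose" path and the
// parameter names are hypothetical.
function roseImageUrl(site, pollutant, year, month) {
  const params = new URLSearchParams({
    site,
    pollutant,
    month: `${year}-${String(month).padStart(2, '0')}`,
  });
  return `/api/pollution_rose?${params.toString()}`;
}

// Usage (in the browser): fetch the PNG produced by Openair via
// the Flask server and drop it into a visualization window.
// fetch(roseImageUrl('Woolston', 'O3', 2012, 7))
//   .then(r => r.blob())
//   .then(blob => { img.src = URL.createObjectURL(blob); });
```

Serving the rose as a pre-rendered PNG keeps the R/Openair dependency entirely on the server, at the cost of the plot not being interactive on the client.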
The site view (Figure 3.7) has two panels: the main panel containing the
map, and an options panel. The main panel contains an interactive map
which can be navigated using the mouse and shows the position of sites
overlaid on the map. The sites displayed are represented as coloured
circles, with the colour representing the intensity of the air quality data
value, allowing information about air quality to be read directly from the
map. Hovering over a site location shows information about the site, and
monitoring sites are added to other visualizations by dragging and drop-
ping. The site view is designed to be the only way of adding a monitoring
site to another visualization. The options panel for the site view allows
the pollutant to be selected and shows information about the selected
variable.
Figure 3.7: The site view centred over Auckland with 19 monitoring sites
visible; 3 of the monitoring sites have ozone data available and are coloured
in red. The other sites are clear and have no data available.
The site view (Figure 3.7) was implemented using Leaflet [5] and D3.
Leaflet is a JavaScript library for drawing maps: it can draw maps de-
scribed in the GeoJSON format, draw map tiles provided by third parties
such as OpenStreetMap, and add positioning icons. Carto [24] was used
as a map provider, and the map was based on data and images from
OpenStreetMap. The options panel for the site view was implemented
using D3 and allows the pollutant to be selected. Drag and drop
functionality was implemented using the D3 library to add mouse
interaction to the sites so that a site can be dragged between different
visualizations.
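The colouring rule for the site markers can be sketched as a pure function: sites with a measurement are filled on a colour ramp by intensity, while sites with no data for the selected pollutant are drawn clear. This is an illustrative sketch with assumed names and an assumed yellow-to-red ramp, not the AtmoVis code:

```javascript
// Style a site marker from its current measurement. A null value
// means the site has no data for the selected pollutant, so the
// circle is drawn clear (outline only), as described above.
function siteMarkerStyle(value, min, max) {
  if (value == null) {
    return {fillOpacity: 0, stroke: true};      // clear: no data
  }
  const t = max > min ? (value - min) / (max - min) : 0;
  return {
    fillOpacity: 0.8,
    stroke: true,
    fillColor: `rgb(255, ${Math.round(255 * (1 - t))}, 0)`, // yellow→red
  };
}
```

In Leaflet, an object of this shape can be passed as the options argument of `L.circleMarker(latlng, style)` when the marker is created or restyled.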
Figure 3.8: The monthly averages for a given site with the mouse hovering
over a label; the label can be dragged and dropped to relate the data to
other visualizations.
There is also an options panel which allows the data to be filtered. The
options panel works in a similar way to the options panel for the data com-
parison; however, the default settings are different. The monthly averages
plot displays a bar graph with the x-axis showing the variable measured
by each bar, and the y-axis showing the averaged hourly measurements of
that variable over the month shown on the time selector. The units are
displayed on the x-axis because the units for the bars depend on the
variable being charted. The labels for the bars can be dragged and
dropped onto other visualizations open in AtmoVis to reconfigure the
visualization to display the variable being dropped. By default, the
monthly averages plot shows all applicable data variables, which allows
the plot to be read without going into the options to reconfigure it;
reconfiguration is only needed to filter the data down to a smaller number
of variables. In Figure 3.8, the site monitored is in Whangaparaoa, and the
monthly average levels of ozone, PM10 measured with a BAM, and PM2.5
measured with a BAM are displayed. The month is January 2012, which is
displayed on the time selector (not shown).
The Plotly histogram was used because Plotly has built-in mouse in-
teraction which allows histogram bars to be zoomed for a closer com-
parison. Other options were considered for the monthly averages plot.
One option was to implement the histogram in D3; however, more
implementation work would have been required, as Plotly provides zoom-
ing functionality automatically. Plotly is based on the D3 library, and plots
can be extended using D3 [54]. The functionality of the Plotly histogram
was extended to allow the plot to be used as a navigational tool: mouse
interaction was added to the labels of the histogram bars using D3 selec-
tors. The interactive labels allow variables to be dragged and dropped
onto other visualizations to reconfigure the data displayed.
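The data step for the monthly averages bar graph can be sketched as a function that averages the hourly measurements per variable for one site and month and arranges the result as a single Plotly bar trace. The function name, row shape, and unit strings are illustrative assumptions, not the AtmoVis code:

```javascript
// Average hourly measurements per variable and build one Plotly
// bar trace, with the units carried on the x labels because each
// bar may have its own units.
function monthlyAverageTrace(rows, units) {
  // rows: [{variable: 'O3', value: 28.3}, ...] for one site + month
  const byVar = {};
  for (const {variable, value} of rows) {
    if (!byVar[variable]) byVar[variable] = [];
    byVar[variable].push(value);
  }
  const variables = Object.keys(byVar);
  return {
    type: 'bar',
    x: variables.map(v => `${v} (${units[v]})`),
    y: variables.map(v =>
      byVar[v].reduce((a, b) => a + b, 0) / byVar[v].length),
  };
}
```

The resulting object is the shape Plotly expects, so in the browser it could be rendered with `Plotly.newPlot(div, [trace])`, with the drag-and-drop behaviour layered on the bar labels afterwards via D3 selectors.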
site as located in Pukekohe. The values for the variables measured at the
site are displayed: NO2 is measured at 1.7 μg m−3 and O3 at 28.3 μg m−3.
The measurements were taken on 3/1/2012 at 19:00.
Figure 3.9: A data comparison parallel coordinate plot with the mouse
hovering over one measurement to show the details.
determine any categories that can be grouped. When a colour group is not
present, that can tell us something about the data as well. Relationships in
the data can be identified by looking for lines which are parallel, or which
intersect, indicating an inverse relationship.
“Overview first” The site view provides an overview of the data and dis-
plays the monitoring sites that are available.
“Zoom” Some of the plots allow zooming. The site view can be zoomed,
as can the line plot, data comparison plot, and the monthly aver-
ages. The study tasks should require users to perform a zoom.
“Filter” Some plots can be filtered using checkboxes: the parallel coordi-
nate plot uses checkboxes to select which coordinate axes to view,
and the monthly averages can also use checkboxes to restrict the data
shown as an alternative to zooming.
“Relate” Points from the site view can be dragged and dropped onto
other plots as a selection mechanism which relates the data be-
tween the two plots. Dragging and dropping labels from the
monthly averages plot allows the site view to show different pollu-
tant information. The heat calendar relates to different plot types
by changing the time selected for every visualization in use. There
3.6 Summary
In this chapter, the architecture of AtmoVis, a web-based system for
visualizing air quality data, was presented, and its windowing system was
described. Three personas were created to describe the scenarios and
goals of the target audience. These personas were used to derive system
goals, and the goals were related back to the personas to ensure that the
persona scenarios were covered. The data visualized by AtmoVis contains
measurements of air quality and meteorological variables.
The visualizations included in AtmoVis are: Site view, line plot, heat
calendar, monthly rose, monthly averages, and data comparison.
Site View The site view is designed to show an overview of all the sites
with data available. One variable is displayed by shading the colour
of each site according to the intensity of the measurement; sites with
no measurement are left clear, and sites containing a measurement
are coloured.
Line Plot The line plot shows a time series trend for an air quality variable
at several different sites.
Heat Calendar The heat calendar displays daily averages for a selected
variable inside each date on the calendar with the colour of the date
indicating the intensity. The heat calendar is interactive and clicking
on a date changes the time selector to that date allowing the heat
calendar to be used to discover time frames with high air pollution
on other visualizations.
Monthly Rose The monthly rose shows directional trends for a variable
over a monthly time frame. The length of a sector from the cen-
tre indicates the number of measurements counted for that wind di-
rection, and the colour is shaded according to the intensity of the
variable being visualized.
User Study
In this chapter, the user study design, user study participants, data collec-
tion methods, and measures of effectiveness are described. A user study
is important for informing and developing the system design process.
The objective of this user study was to measure how effectively AtmoVis
presents information about New Zealand air quality to environmental
scientists so that inferences can be made from the data, and to gather
information that can be used to make the system more suitable for the
target audience. The results of the user study contribute to answering the
research questions (Section 1.1). The personas and system goals were used
to design the study, and interface taxonomies were used in the analysis of
the study tasks to ensure that all aspects of the system design were ad-
dressed by the study.
The user study required ethical approval which was obtained from the
Victoria University Human Ethics Committee. The ethics approval num-
ber was #0000026810.
participants for the pilot study were sourced from both statistical/data
analysis backgrounds and computer science/engineering backgrounds.
Some of the participants for the pilot study were sourced from within the
School of Engineering and Computer Science (ECS), Victoria University of
Wellington, and other participants were sourced from outside the univer-
sity. The pilot study was used to identify some usability problems in the
design and to identify any issues in the user study questions. 20 partic-
ipants were recruited for the main user study. Participants were sourced
via email through a GIS expert and also by sending an introductory email
to air quality scientists at NIWA, GNS Science, and Regional Councils in
New Zealand. An information sheet was provided to all participants (Ap-
pendix A). The participants for the main user study were selected experts
in air quality analysis, and university students sourced from the School
of Geography, Environment and Earth Sciences (SGEES), Victoria Univer-
sity of Wellington, with experience in GIS. The design of AtmoVis was
targeted at users experienced in air quality data analysis.
the study tasks were testing use cases that could occur outside the labo-
ratory settings. The study tasks were based on the goals of the personas
which were developed during system design (Section 3.1.1). Personas al-
lowed the goals and requirements of users to be explored in detail through
narratives. The persona goals define the sort of tasks that an analyst would
want to perform with the system.
A pilot study was used as part of the requirements gathering process [43].
The results of the pilot study were used to improve the user study ques-
tionnaires, tasks and protocol before conducting the main user study. At-
moVis was also refined and adjusted concurrently with the pilot study
using the results, and the pilot study was performed incrementally: not
all of the participants used the same prototype of AtmoVis, as the system
was being improved between pilot tests. The pilot study consisted of a
mixture of questionnaires and usability tasks, and was conducted to de-
termine whether each AtmoVis prototype fitted the requirements of the
target audience.
Each pilot study participant was given a brief description of the goals
of the study, then instructional material for the use of AtmoVis was pro-
vided. The instructional material for the pilot study consisted of some in-
structional videos and a slide show provided on a web page (Appendix E)
describing different aspects of the system. The participants were allowed
to look at this material before and during the study. The pre-study ques-
tionnaire was presented at the same time as the study tasks. Since the
pilot study was designed for the improvement of AtmoVis, participants
were encouraged to voice their thoughts about the program and the in-
structional materials using a think aloud protocol [43]. Questions about
the material and system were answered and noted. The researcher took
notes about the usage of AtmoVis in a log book while the study was being
performed. The screen recording software was trialled in some of the
pilot studies. After the study tasks were completed, the post-study ques-
tionnaire was given. The post-study questionnaire was used to gather in-
formation about the participants’ experiences. Responses from the
question sheets were tabulated so that the statistical analysis techniques
for the main user study could be trialled. The study tasks in the pilot deter-
mined whether the correct questions were being asked by the study task
sheet and whether AtmoVis was suitable for answering these questions.
In the first section entitled “mapping the data”, the exercise on ozone in
Auckland required the participant to view a particular pollutant over time
at a site and to find information about a trend over a time frame. The par-
ticipant was required to insert a site view and find the start date of the
O3 air pollutant, then they were required to insert line plots, read data
from line plots and compare two different map views for points which are
highlighted on both. The parallel coordinate plot was used to describe a
relationship between different pollutants. The first section encourages a
allowing the label to be dragged and dropped onto the site view. The wind
rose plots were compared. The use of the calendar for navigation was also
a task that related well to Oliver’s persona as the calendar could be used
to find averages while comparing a variable over different time frames.
Oliver’s scenario 1 (Section 3.1.1) required peak pollution levels to be vis-
ible for inspecting air quality standard breaches so the use of the calendar
would be of interest. Kath’s scenario 1 (Section 3.1.1) requires Kath to ob-
serve the directional pollution change in an area over time, so looking at
the wind roses by clicking on days on the calendar would interest Kath.
a temporal trend in an interesting site. In question 1b the site and the pol-
lutant are provided, so the question is similar to Mathew’s persona scenario
2 (Section 3.1.1); Oliver’s scenario 2 (Section 3.1.1) is also similar, as the
data is aggregated. Mathew’s persona was more exploratory than Oliver’s
or Kath’s, and the breadth-first nature of the interface could help with
finding suitable areas with pollutants. The open-ended tasks were de-
signed to collect information about how different visualizations could be
used together to explore the data.
drop actions, which caused some participants to complete the tasks more
easily than other participants. These differences may only be visible on
the screen recording. The audio recording ensured that responses given
at the time of system usage were not missed, as a participant may
not write down everything that they comment on. The audio recording
was used to keep a log of anything that was said while the study was be-
ing conducted. The screen recording footage was used to make inferences
about the use of AtmoVis and how the participants interacted with the
visualizations. The notes contained times that were of importance
on the screen recording footage, comments made by the participant, and
comments on the participant’s use of the interface.
Results
In this chapter, the results of the pilot study and the main user study are
presented. Section 5.1 describes the changes that were made to AtmoVis
as a result of the pilot study. Changes were applied between pilot tests
and contributed towards the iterative development of AtmoVis before the
main user study was conducted. In Section 5.2 the results of the main user
study are presented. The results include quantitative statistics from the
pre-study, study tasks, and post-study questionnaires as well as qualita-
tive feedback from the participants.
their questions were answered and recorded in a log book along with other
observations about the participants. Adjustments were made to AtmoVis
based on the qualitative feedback from participants and the observations
recorded by the researcher in the log book. Table 5.1 shows background
information from the participants recruited for the pilot study. The infor-
mation was provided by the participants as part of the pre-study question-
naire.
site name for every site regardless of whether there was data present and
to allow sites with no data to be dragged and dropped.
intended to make the line plot more readable when very small changes
to pollutants were happening over a short time frame as each axis can be
zoomed and dragged individually to show a different range of values.
5.1.6 Windowing
The AtmoVis user interface was designed around a collection of floating
windows (Figure 3.1). Each visualization occupies a window, and partici-
pants could click and drag on the window border to move it around.
The windowing layout was substantially improved as a result of the pilot
study: there were improvements to window movement, window place-
ment, the options panel, the play button layout, and the way that visualiza-
tions were added to the screen. The improvements were intended to make
the floating windows easier to interact with.
The options panel The window frame contained two buttons. One but-
ton was for the options panel, the other button was for closing a visual-
ization. The options panel button was styled with a gear icon (Figure 5.1).
(a)
(b)
Figure 5.1: The window frame border for the visualizations and the op-
tions panel. The top image is the frame for the visualization with a high-
lighted gear icon which the mouse has hovered over. The bottom image is
the frame for the options panel with a back button instead of a gear icon.
two buttons was adjusted so that the button icon changes colour on hover.
The icon size was decreased to leave more space between the icon and the
border of the window frame, improving the visual appearance of the icon.
Labels and user interface style changes There was some distance be-
tween the time selector and the play button, as the time selection was in
the bottom right corner and the play button was next to the menu but-
ton. Participants were observed moving the mouse a large distance across
the screen, so the layout was adjusted to place the time selection next to
the play button.
The window frame only contained the close button and the options
panel button, and some comments suggested that participants became con-
fused about which data was being visualized in each window. Addi-
tional labels were added to the window borders to distinguish the different
visualizations added to the canvas. The labelling on the win-
dow borders was intended to prevent users from misreading a
visualization or forgetting which data the visualization had been applied
to.
The options panel used for both the line plot and the site view was ad-
justed to display the information as a table rather than as a list to improve
the readability of the data being presented.
The font sizes were enlarged and colours were adjusted so that the line
plot, data comparison, and site view would be more readable. Labels were
adjusted on the line plot, data comparison, site view, and heat calendar in
order to make the presentation of the data clearer.
5.1.7 Summary
During the pilot study, differences were observed in the way that partic-
ipants interacted with AtmoVis. There were some different usage styles
for the windowing system. It was necessary to design AtmoVis so that
there was more than one way to interact with the windowing, so that the
78 CHAPTER 5. RESULTS
interface suits different usage styles. For example, a participant was ob-
served tiling the windows on the screen so that windows do not overlap.
Another participant was observed placing the windows on top of each
other and needed to switch the window on top to access the visualizations
underneath. A third participant used visualizations as insets to the map,
positioning line plots close to the relevant stations.
The pilot study also detected parts of AtmoVis that could be misunder-
stood by participants due to the documentation or the way information
was presented by the interface. The labelling of components on the
interface was improved to make the interface more usable, and zooming
was added to both the data comparison and the line plot to make the
visualizations more readable.
Additional documentation was added and updated based on the com-
ments made by participants. The documentation was provided in the form
of videos and a slide show on a web site. A number of participants did
not watch the videos first. Documentation provided in slide show form
was found to be more digestible, as the slide shows were provided to the
participant through a web browser tab. The documentation used
screenshots and pictures of visualizations labelled with the functionality
of the components, so that a participant could choose to launch into the
study without reading or watching footage and then use the slide show as
a reference. The test setup was standardized to use a particular laptop to
allow screen recording, to avoid test setup issues with equipment, and to
ensure that the display size and mouse were always the same. When all
the adjustments had been made to the interface, documentation, and study
protocol, the main user study was conducted.
5.2.1 Participants
This section analyses the feedback given by the participants and observa-
tions about how the participants performed during the study. The partic-
ipants’ use of AtmoVis was then analysed to evaluate the effectiveness of
the visualizations used in the system and to suggest ways that AtmoVis
could be improved to make it more suitable for the day-to-day tasks of
environmental scientists. The analysis of the user study contributes
towards the research questions for this thesis (Section 1.1).
5.3. VISUALIZATION EVALUATION 81
The question on the size of each visualization was removed from the
Likert scale data to ensure consistency. The scales on assistance with the
User study tasks The participants were asked which month of 2012 con-
tained the highest daily average level of PM10 measured with TEOM at a
Statistical analysis The mean Likert scale score in the heat calendar sec-
tion of the post-study questionnaire was calculated for each participant
based on the method described by Section 5.3.1. Table 5.3 groups the mean
Likert scale test scores based on the amount of experience the participants
have analysing air quality data. The table describes the mean, standard
deviation, and the number of participants in each group.
Welch’s two-sample t-test found no significant difference in mean Likert
score between the “no air quality experience” and “air quality experience”
groups (t(17.54) = 1.04, p = .311, d = 0.45). This indicates that the par-
ticipants’ stated experience with air quality data did not sig-
nificantly alter their Likert scale rankings of their experience
with the heat calendar. The heat calendar was evaluated in the post-study
based on how effective it was for identifying high pollution, how effective
the colour was, how effective the time navigation was, and how much
assistance was required.
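The Welch's two-sample t-test statistics reported in this chapter can be computed from group summaries alone (means, standard deviations, group sizes). The following is an illustrative sketch of the statistic, not the analysis code actually used for the study:

```javascript
// Welch's two-sample t-test from group summary statistics.
// Returns the t statistic and the Welch–Satterthwaite degrees of
// freedom, which is why the reported df (e.g. t(17.54)) need not
// be an integer.
function welchT(mean1, sd1, n1, mean2, sd2, n2) {
  const v1 = (sd1 * sd1) / n1;       // squared standard errors
  const v2 = (sd2 * sd2) / n2;
  const t = (mean1 - mean2) / Math.sqrt(v1 + v2);
  const df = ((v1 + v2) ** 2) /
    ((v1 * v1) / (n1 - 1) + (v2 * v2) / (n2 - 1));
  return {t, df};
}
```

Unlike Student's t-test, Welch's version does not assume equal variances in the two groups, which suits the unequal group sizes in Tables 5.3 to 5.5.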
Table 5.3: Summary statistics for the mean heat calendar section Likert
score in the post-study questionnaire grouped by experience.
No Experience Experience
Mean SD n Mean SD n
Colour The heat calendar colour scheme was based on the highest and
lowest pollution for the year of the selected pollutant, and participants
were not required to compare the colours of different heat calendars. There
were some positive comments relating to the way that the heat calendar
shows yearly trends for a variable.
Interaction There were positive responses to the way that the heat cal-
endar allows different days to be selected and interacts with other visual-
izations.
The most frequent concern about the heat calendar was that it did not
provide additional information about the month currently selected.
Participants were required to read the time selector to find the current
month, and 3/18 participants requested some form of visual cue for the
current month.
User study tasks The participants were asked to find the date and time
of the highest O3 pollution visible on the line plot (mapping the data ques-
tion 2a, Appendix C). 19/20 participants answered the question with a
result from the line plot, demonstrating that they were able to click and
drag the point from the site view onto the line plot to produce a response.
11/20 participants responded with the highest pollution level on the visible
section of the line plot, 3/20 participants found the correct day only, and
5/20 participants searched forward for a higher pollution level using the
play button.
The participants were asked to comment on the relationship between
2 different sites measuring O3 in Auckland (mapping the data question 2b,
Appendix C). 18/20 participants answered the question correctly, describ-
ing the relationship between the O3 measurements at both sites and
demonstrating that the line plot is effective for trend comparison.
Statistical analysis The mean Likert scale score in the line plot section
of the post-study questionnaire was calculated for each participant based
on the method described by Section 5.3.1. Table 5.4 groups the mean Lik-
ert scale test scores based on the amount of experience the participants
have analysing air quality data. The table describes the mean, standard
deviation and the number of participants in each group.
Table 5.4: Summary statistics for the mean line plot section Likert score in
the post-study questionnaire grouped by experience.
No Experience Experience
Mean SD n Mean SD n
Welch’s two-sample t-test found no significant difference in mean Likert
score between the “no air quality experience” and “air quality experience”
groups (t(14.51) = 1.80, p = .093, d = 0.83). This indicates that the partic-
ipants’ stated experience with air quality data did not signifi-
cantly alter their Likert scale rankings of their experience with
the line plot. The line plot was evaluated in the post-study based on its
effectiveness for finding temporal patterns between variables, the
effectiveness of using the play button with the line plot for find-
ing temporal patterns, the use of colour, the difficulty of using the mouse
with the line plot, and how much assistance was required.
Additional Variables The line plot was designed to show only one
variable at a time to avoid comparisons between pollutants with different
units, but 6/20 participants suggested adding functionality to support more
than one variable on a line plot.
“If you could put ozone and NO2 together then play them like that,
that would be useful.” - PID 11
“If the line axis is larger, sometimes the plot is scaled down, and dif-
ficult to observe the data.” - PID 14
Though it was possible to find out what the coloured lines on the line
plot were by hovering over points, it is not easy to see at a glance whether
there are many different monitoring sites on the line plot. Hovering over
points is not always practical because only one point can be hovered over
at once, so adding a key below the axis would improve the readability at
a glance and make the line plot more effective.
“Its useful that when you roll over a data point it does tell you the
location.” - PID 10
Time Scale Since the date was only displayed on mouse hover, 2 partic-
ipants voiced confusion about the time scale. Changing the scale on the
time axis to show the current date and month, with tick marks for hours,
rather than the number of hours from the time selected on the time slider,
would make the line plot clearer. A visual indicator of night and day
would also make diurnal patterns easier to infer.
“Well the thing is I don’t know what this bottom scale represents,
time hours, I don’t know what this [time is] unless I look at what the
start date is up here.” - PID 5
The line plot could also be improved by allowing different time frames
to be applied so that longer trends can be shown on the same line plot.
User study tasks 19/20 participants answered the user task question
on wind direction in Woolston (aggregate data section question 2, Ap-
pendix C). 5 of these participants required assistance to comment on the
monthly roses but were able to read the visualizations after receiving as-
sistance. 15/20 participants found some trend in the wind direction and
9/20 found the relationship between concentration and wind direction
correctly.
19/20 participants answered the question about the comparison between
Woolston and St. Albans for SO2 pollution at a given time in aggregate
data question 3a. 5/20 participants answered with a correct comparison
of both wind direction and pollution level at both locations, and 14/20
answered with partially correct solutions.
“Oh right so north south east west, so this is like the directionality,
frequency of counts by wind direction so mostly like east and west got
that.” - PID 18
Participant Likert scale post-study responses The Likert scales from the
post-study questionnaire demonstrate positive results for the effectiveness
of the monthly rose (Figure 5.5) with 14/20 participants stating that the
monthly rose was useful for identifying pollutants (Figure 5.5a), and 12/20
participants stating that the monthly rose was effective for finding rela-
tionships between data variables (Figure 5.5b). 11/20 participants stated
that they needed little or very little assistance with the monthly rose (Fig-
ure 5.5c).
[Figure 5.5, panels (a)–(c)]
Statistical analysis The mean Likert scale score in the monthly rose sec-
tion of the post-study questionnaire was calculated for each participant
based on the method described in Section 5.3.1. Table 5.5 groups the mean
Likert scale test scores based on the amount of experience the participants
have analysing air quality data. The table describes the mean, standard
deviation and the number of participants in each group.
Table 5.5: Summary statistics for the mean monthly rose section Likert
score in the post-study questionnaire grouped by experience.
No Experience Experience
Mean SD n Mean SD n
Welch’s two-sample t-test determined that the two groups “no air qual-
ity experience” and “air quality experience” have approximately the same
mean (t(14.29) = 0.62, p = .546, d = 0.29). This indicates that the partici-
pants’ stated experience with air quality data did not significantly alter
their Likert scale rankings of their experience with the monthly rose. The
monthly rose was evaluated in the post-study on how effectively pollu-
tants could be identified, how effectively relationships between data vari-
ables were found, and how much assistance was needed.
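The statistics reported here can be reproduced directly from the per-participant scores. This is a minimal Python sketch of Welch’s t statistic with the Welch–Satterthwaite degrees of freedom and a pooled-SD Cohen’s d; the Likert scores below are hypothetical illustrations, not the study data.

```python
from statistics import mean, stdev

def welch_t(a, b):
    """Welch's two-sample t statistic, Welch-Satterthwaite degrees of
    freedom, and Cohen's d (pooled-SD form), matching the quantities
    reported for the Likert comparisons."""
    ma, mb = mean(a), mean(b)
    va, vb = stdev(a) ** 2, stdev(b) ** 2
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb  # squared standard error of the difference
    t = (ma - mb) / se2 ** 0.5
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    # Cohen's d with a pooled standard deviation
    sp = (((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)) ** 0.5
    d = (ma - mb) / sp
    return t, df, d

# Hypothetical per-participant mean Likert scores (NOT the study data)
no_experience = [4.0, 3.7, 4.3, 3.3, 4.7, 4.0, 3.7, 4.3, 4.0, 3.3]
experience = [3.7, 3.3, 4.0, 3.0, 3.7, 4.3, 3.3, 3.7, 3.0, 3.3]
t, df, d = welch_t(no_experience, experience)
```

The two-sided p-value is then looked up from the t distribution with df degrees of freedom (e.g. via scipy.stats.t.sf).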
Table 5.6 groups the mean Likert scale test scores based on the cor-
rectness of the response to the aggregate data question 2. The table de-
scribes the mean, standard deviation and the number of participants in
each group.
Table 5.6: Summary statistics for the mean monthly rose section Likert
score in the post-study questionnaire grouped by response accuracy.
Question Accuracy
Incorrect Correct
Mean SD n Mean SD n
The comments recorded about the monthly rose refer to the scale cate-
gories, the colour scheme, the time frames chosen for the analysis and
the interaction with the mouse. 4/20 participants commented that the
monthly rose was useful for analysing the data and 2 participants stated
that they would like to use AtmoVis to access the monthly rose. Some
participants were not able to correctly read the monthly rose plot concen-
trations due to their unfamiliarity with the other wind rose visualizations
and difficulties with the scale. 4/20 participants reported being unsure of
what information the monthly rose was displaying to them.
The spacing of the percentage lines from the centre also confused some
participants, who found the categories difficult to compare when trying
to compare the percentages as well.
“So they [percentage markings] are not the same, in the same place. ”
- PID 19
Interaction The interaction between the monthly rose and the heat calen-
dar received some positive responses. Using the heat calendar to change
the pollutant on a monthly rose visualization is faster than using an R
script. Two participants stated that they would like to use AtmoVis in
order to use the monthly rose.
“That’s quite useful, [I] quite like that feature, it’s certainly a lot
quicker as a data visualizer, I mean you could do all this in R of course
but it’s obviously a lot quicker than mucking around with the script
itself.” - PID 20
The monthly rose does not identify what month is currently selected
on the time slider.
“I can’t remember what day I clicked on, I knew it was a day in Febru-
ary so it would be quite good if the wind rose could say what day it
was, or what time or what month or what period so you know what
you’re looking at.” - PID 5
Labelling the current month on the monthly rose would make the visu-
alizations more presentable, and more effective if functionality were
added to extract visualizations for insertion into documents.
User study tasks The participants were required to read the site view
to find how many locations had filled-in colour when ozone was first
selected in the options panel (mapping the data question 1b, Appendix
C); the locations were not visible at the default scaling. 16/20 partici-
pants answered correctly, demonstrating that interesting sites can be
identified by colour value, though four participants gave responses
based on the starting location and did not pan or zoom the site view to
discover the other sites.
“Right so you can change what the site is viewing based on the labels,
ok cool, that’s cool.” - PID 18
[Figure 5.6, panels (a)–(g)]
“The thing about air quality in general and especially in New Zealand
is that it’s very highly localized so these places are 20 km apart so the
air quality at the two sites is completely unrelated to each other so
you wouldn’t expect to see anything ... With this kind of data set
is what you’ve got is little islands of data which are not necessarily
joined up.” - PID 11
Participants generally felt that they did not need much assistance: 7/20
participants felt that they did not need assistance with the site view
(Figure 5.6f), while 8/20 felt that they needed neither little nor much
assistance. The use of colour was generally effective, though 8/20 indi-
cated that the text size was small (Figure 5.6g). The colour and the text
on the site view could be improved given more time and further user
studies.
Statistical analysis The mean Likert scale score in the site view section
of the post-study questionnaire was calculated for each participant based
on the method described in Section 5.3.1. Table 5.7 groups the mean Lik-
ert scale test scores based on the amount of experience the participants
have analysing air quality data. The table describes the mean, standard
deviation, and the number of participants in each group.
Table 5.7: Summary statistics for the mean site view section Likert score in
the post-study questionnaire grouped by experience.
No Experience Experience
Mean SD n Mean SD n
Welch’s two-sample t-test determined that the two groups “no air qual-
ity experience” and “air quality experience” have approximately the same
mean (t(15.53) = 0.51, p = .620, d = 0.23). This indicates that the partici-
pants’ stated experience with air quality data did not significantly alter
their Likert scale rankings of their experience with the site view. The
site view was evaluated in the post-study based on colour changes, the
information displayed in the options panel, the colour on the map, the
spatial relationships displayed on the map, the use of the mouse to
navigate, and the amount of assistance required.
Colour 6/20 participants discussed the colour of the site view. The colour
scheme ranged from white to red and was scaled by the maximum recorded
measurement for the pollutant selected for the site view. One suggestion
was to improve the scheme by removing white, as the base map was very
pale and it was sometimes difficult to distinguish sites with no data
from sites coloured white.
“There were some parts where those station colours were quite similar
in shade to the base map and that made it hard to know in some cases
if the station actually had data or whether that was absent.” - PID 3
“First of all this keeps changing I understand why it is, because it’s
diurnal, ok, but it goes white at night time, ok so it looks like any
other site, so it needs to go black or grey or something like that.” -
PID 8
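The site-view colouring described above, and the grey-for-missing-data fix PID 8 suggests, can be sketched as a value-to-RGB mapping. This Python sketch is illustrative only; the actual AtmoVis implementation is web-based and these function names are hypothetical.

```python
def site_colour(value, vmax):
    """White-to-red scale as described for the site view: 0 maps to
    white (255, 255, 255) and the maximum recorded measurement maps to
    pure red (255, 0, 0)."""
    t = 0.0 if vmax == 0 else min(max(value / vmax, 0.0), 1.0)
    g = round(255 * (1 - t))  # green and blue fade out together
    return (255, g, g)

def site_colour_fixed(value, vmax):
    """Variant reserving grey for missing data, per PID 8's suggestion,
    so a site with no measurement is distinguishable from a low value."""
    if value is None:
        return (128, 128, 128)
    return site_colour(value, vmax)
```

Dropping white from the low end of the scale (for example starting from a pale yellow) would further separate low values from the pale base map.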
The site view was good at providing access to the sites by mouse navi-
gation, provided that the site location was known.
User study tasks 17 participants were able to use the monthly averages
to drag and drop a pollutant onto a monthly rose (aggregate data 4a, Ap-
pendix C), so the monthly averages visualization was effective for dis-
playing the pollutants at a given site.
The monthly average plot was easy to use: 11/20 participants responded
that they required little or very little assistance with the monthly
averages (Figure 5.7e).
[Figure 5.7, panels (a)–(e)]
Statistical analysis The mean Likert scale score in the monthly average
section of the post-study questionnaire was calculated for each participant
based on the method described in Section 5.3.1. Table 5.8 groups the mean
Likert scale test scores based on the amount of experience the participants
have analysing air quality data. The table describes the mean, standard
deviation, and the number of participants in each group.
Table 5.8: Summary statistics for the mean monthly average section Likert
score in the post-study questionnaire grouped by experience.
No Experience Experience
Mean SD n Mean SD n
Welch’s two-sample t-test determined that the two groups “no air qual-
ity experience” and “air quality experience” do not have the same mean
(t(15.65) = 3.70, p = .002, d = 1.67). This indicates that the partici-
pants’ stated experience with air quality data did significantly alter
their Likert scale rankings of their experience with the monthly average
plot: participants with no experience in air quality data analysis were
more positive towards the visualization. The monthly average visualiza-
tion was evaluated in the post-study based on its effectiveness for
changing variables on the map, its effectiveness for identifying pollu-
tants of interest, its effectiveness for finding relationships between
pollutants, and how much assistance was needed.
“Right so you can change what the site is viewing based on the labels,
ok cool, that’s cool.” - PID 18
Two participants reported being confused by the text “click and drag
bar label onto other plots” which was displayed on the monthly averages
visualization and they attempted to drag the bar instead, though the Lik-
ert test score shows that most participants did not require assistance. Ob-
servations from the video show that there were other participants who
attempted to drag the bar and managed to correct their usage of the inter-
face.
“Sorry what does it mean by it says drag and drop the from the la-
bel?” - PID 12
Scales The scale for the visualization was recalculated for each month
that was displayed. One participant found the scaling tricky.
“... issue but if you’re in a bit of a half daze or whatever, you could
look at something with a lower value and recognize it as being higher
than what you just looked at ...” - PID 3
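The comparison trap PID 3 describes follows directly from the per-month rescaling. A small illustrative sketch (hypothetical data and function name, not the AtmoVis code) contrasting the per-month maximum with a fixed maximum across months:

```python
def monthly_scale(values_by_month, month, fixed=False):
    """Axis maximum for a month of bars: recalculated per month (the
    AtmoVis behaviour participants saw) or fixed across all months
    (an alternative that keeps bar heights comparable month to month)."""
    if fixed:
        return max(max(v) for v in values_by_month.values())
    return max(values_by_month[month])

# Hypothetical monthly means: February's bars fill the axis when the
# scale is per-month, but sit low against January's peak when fixed.
data = {"Jan": [12, 30, 8], "Feb": [5, 9, 7]}
```

With the per-month scale, a bar of 9 in February is drawn as tall as a bar of 30 in January, which is exactly the misreading the participant describes.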
Text Most participants found the text size neither too small nor too
large, though 6 of the participants stated that the text size was too
small (Figure 5.8c). The text issue was also commented on in the pilot
study, and the text was repositioned and scaled to fix the labelling;
however, the changes made after the pilot study were not sufficient for
those 6 participants, though the majority of participants thought that
the text size was correct. A better way of solving text issues would be
an option in the options panel to adjust and scale the text.
Statistical analysis The mean Likert scale score in the data comparison
section of the post-study questionnaire was calculated for each participant
based on the method described in Section 5.3.1. Table 5.9 groups the mean
Likert scale test scores based on the amount of experience the participants
have analysing air quality data. The table describes the mean, standard
deviation and the number of participants in each group.
Table 5.9: Summary statistics for the data comparison section mean Likert
score in the post-study questionnaire grouped by experience.
No Experience Experience
Mean SD n Mean SD n
Welch’s two-sample t-test determined that the two groups “no air quality
experience” and “air quality experience” do not have the same mean
(t(11.40) = 2.20, p = .049, d = 1.09). This indicates that the partici-
pants’ stated experience with air quality data did significantly alter
their Likert scale rankings of their experience with the data compari-
son, and participants without air quality experience were more positive
towards the visualization.

[Figure 5.8, panels (a)–(e)]

The data comparison was evaluated in the post-study based on its effec-
tiveness for finding relationships among pollutants, its effectiveness
for identifying the pollutant of interest, the difficulty of using the
mouse to navigate, and the amount of assistance required.
Table 5.10 groups the mean Likert scale test scores based on the correct-
ness of the response to mapping the data section question 4 (Appendix C).
The table describes the mean, standard deviation, and the number of par-
ticipants in each group. The mapping the data section question 4 was
chosen for this grouping.
Table 5.10: Summary statistics for the data comparison section mean Likert
score in the post-study questionnaire grouped by response accuracy.
Question Accuracy
Incorrect Correct
Mean SD n Mean SD n
Welch’s two-sample t-test determined that the two groups “incorrect” and
“correct” do not have the same mean (t(13.35) = 3.00, p = .010, d = 1.35).
Participants answering mapping the data question 4 correctly gave mea-
surably different feedback on the post-study questionnaire Likert scales
compared to participants answering the question incorrectly.
Scale 4 participants commented on the scales chosen for the data com-
parison. The scales were automatically chosen based on the highest mea-
sured value for the variable displayed on the axis, though the scales
could be changed by dragging and zooming the axis with the mouse.
Zoom The zoom allowed the range of the y-axis to be changed by scrolling
the mouse. Each axis could be zoomed separately. 4 participants com-
mented on the axis zoom. 2 participants asked whether it was possible to
zoom the axis.
2 participants commented that the zoom was difficult to use due to un-
expected mouse behaviour. These participants were observed positioning
the mouse incorrectly before attempting to zoom.
“Maybe put in a couple of visual cues on the plot so you know where
to point.” - PID 10
5.3.8 Windowing
Participants were able to drag, drop and move windows containing vi-
sualizations (Figure 3.1). The windowing system grid received a positive
response from one participant.
“The snapping is really nice, I like the way it snaps and stuff to each
other, it’s a good feature.” - PID 18
“It would be great if the tiles that I put them in get [were] not over-
lapping each other ... having some [of] the windows appear wherever
there’s empty space with a smallish size.” - PID 15
The gear icon for the options panel was not immediately recognized by
all participants, though the gear icon was chosen for its familiarity
and use in other settings menus. Four participants required assistance
to find the options panel.
A better strategy would be to automatically tile visualizations on cre-
ation so that they do not overlap. This would ensure that a participant
would always see a new visualization when it was produced.
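The proposed auto-tiling could follow a simple first-fit strategy. This is a sketch (hypothetical function, not the AtmoVis windowing code) that scans the canvas for the first free slot big enough for a new window:

```python
def tile_position(existing, size, canvas, step=20):
    """First-fit placement for a newly created window: scan the canvas
    in row-major order and return the first top-left corner where a
    window of `size` fits without overlapping any `existing` window.
    Rectangles are (x, y, w, h) tuples; returns None when nothing fits."""
    w, h = size
    cw, ch = canvas

    def overlaps(x, y):
        return any(x < ex + ew and ex < x + w and
                   y < ey + eh and ey < y + h
                   for ex, ey, ew, eh in existing)

    for y in range(0, ch - h + 1, step):
        for x in range(0, cw - w + 1, step):
            if not overlaps(x, y):
                return (x, y)
    return None  # no free slot: the caller could cascade or shrink instead
```

When no slot is free the caller would need a fallback, such as cascading the window or opening it at a reduced size, as PID 15 suggests.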
In future, saved layouts could be applied to position windows on the
screen, reducing the time spent rearranging windows; sessions could also
be saved and loaded to present an overview.
5.4 Discussion
AtmoVis allows atmospheric air quality data to be explored and visualized
using a selection of different connected visualizations. The objective of the
user study was to evaluate the effectiveness of different visualizations for
air quality data exploration (Section 1.1). In order to describe the effec-
tiveness of the visualizations the results from the Likert scales, qualitative
responses, study task responses and observations presented in Section 5.2
will be discussed.
The main user study produced some positive feedback for the heat cal-
endar, site view, monthly averages, monthly rose and the way that At-
moVis allows the user to interact with the different visualizations. There
were also aspects of the interface which could be improved and expanded
to make AtmoVis more suitable for the target audience. This section dis-
cusses the results of the main user study for each visualization, possible
improvements for AtmoVis and directions for future research.
Participants were able to compare pollutants at the same site even
though two line plots were required, with one variable displayed on
each. The feedback from the study participants using the line plot indi-
cates that being able to compare different pollutants at the same site
is an important feature, as 6/20 participants suggested adding function-
ality for displaying more than one variable. Building more comparative
functionality into the other visualizations would therefore improve the
usability of the interface. AtmoVis is effective for finding the rela-
tionship between two pollutants, and the line plot would assist with
Kath’s second scenario more than the data comparison.
The monthly roses used different scales, so comparing monthly roses for
aggregate data comparison between sites was less effective than reading
information from a monthly rose at a single site. The heat calendar also
makes aggregate data available, and all participants were able to cor-
rectly identify the month containing the highest daily mean level of
PM10 measured with TEOM at a temperature of 50°C in Woolston. The use of
the heat calendar demonstrates that AtmoVis does make aggregate data
available for temporal visualization tasks, which would assist with
Oliver’s second scenario by allowing daily averages to be compared be-
tween different months.
The heat calendar was effective for reading time series trends in the
data and finding days with a high daily average. All twenty participants
found the month with the highest daily average pollution for PM10 mea-
sured with TEOM at a 50°C temperature setting. A calendar heat map has
previously been used to visualize air pollution across China [69], where
the calendar heat map for visualizing temporal trends was paired with a
geospatial map for visualizing spatial trends [69]. In contrast to this
research project, the system presented by Zhou et al. [69] placed the
heat map calendars directly onto the geospatial map. In AtmoVis, the
heat calendar was a separate visualization and data was inserted onto
the heat calendar by drag and drop from the site view. The heat map cal-
endar used by Zhou et al. [69] was effective because it was used to find
temporal trends, though a formal user study was not conducted on the
visualization. In contrast,
a formal user study was conducted as part of this research, and the heat
calendar received positive responses: the mean Likert scale score for
the heat calendar section of the post-study questionnaire had a median
of 4.38, calculated with the method described in Section 5.3.1. Differ-
ences in the post-study Likert scale responses between participants with
experience in air quality data analysis and participants without experi-
ence were not detected (p = .311). Experience with air quality analysis
did not affect the participants’ responses to the effectiveness of the
visualization; both air quality experts and non-experts were able to
read and understand the heat calendar visualization. If AtmoVis were
extended in the future with other datasets, the heat calendar could dis-
play information about estimated emissions in the area as well as the
pollution in the atmosphere. The heat calendar is also generally appli-
cable to data which is not directly related to air pollution.
The line plot was effective for reading time series trends in the data
and for finding peak pollution levels over a short time frame, based on
the results of the study task question where 16 participants found the
relationship between O3 and Solar Radiation at a single site. The line
plot had generally positive responses in the Likert scale questions,
with an average score of 3.60 (Figure 5.2). The Trelliscope visualiza-
tion system [30] also uses a line plot to display time series trends,
and similarly the user study on Trelliscope demonstrated that the line
plot was effective when used with a recommendation system for identify-
ing generator trips. The Trelliscope line plots were generated using
data from a recommendation system, in contrast to this research project,
which did not use one. Differences in the post-study Likert scale re-
sponses between participants with experience in air quality data analy-
sis and participants without experience were not detected (p = .093).
Experience with air quality analysis did not affect the participants’
responses to the effectiveness of the visualization, and both air qual-
ity experts and non-experts were able to understand the information pre-
sented by the line plot.
The line plot could be extended by allowing the colour scheme to be
chosen as a preference and inserting a legend for the colour instead of rely-
ing on the mouse hover to show the site. Some participants asked whether
it was possible to put more than one variable onto the same scale. Adding
a button for zooming the visualization instead of relying on the mouse
scroll would make the visualization more intuitive as only some partici-
pants used the mouse to zoom the line plot when pollutant levels were
low. The line plot could be used in the future to display information from
other datasets such as traffic monitoring data or additional meteorological
information.
The monthly rose was effective for finding trends in wind direction and
in the directional concentration of air pollution at a given station. 15/20
participants found some trend in the wind direction and 9 found the rela-
tionship between concentration and wind direction correctly. It was more
difficult to answer the question about the comparison between Woolston
and St. Albans for SO2 pollution at a given time (Question 3): 19/20
participants answered with a valid solution, 5 of those answered with a
correct comparison of both wind direction and pollution level at both
locations, and 14 answered with partially correct solutions, so the com-
parison between different locations was less effective. Wind rose plots
have previously been used to present the effects of wind on background
particle concentrations at freight truck terminals [29].
The wind rose plots were overlaid onto a geographical map so that the
surrounding area of the sensor could be visualized. Statistical tests were
performed in that study to find differences in concentration based on wind
direction. AtmoVis can be used to visualize monthly roses, and the
window-based layout allows the monthly roses to be dragged and posi-
tioned. AtmoVis does not have the functionality to perform statistical
tests over a timeframe, though this could be added in the future to
expand the possible uses for the monthly rose.
The monthly rose received positive responses: the mean Likert scale
score for each participant had a median of 3.33, calculated with the
method described in Section 5.3.1. Differences in the post-study Likert
scale responses between participants with experience in air quality data
analysis and participants without experience were not detected
(p = .546). Experience with air quality analysis did not affect the par-
ticipants’ responses to the effectiveness of the visualization.
The site view was effective for accessing the different sites that were avail-
able. 16/20 participants were able to locate sites with a particularly high
value for a pollutant based on the colour, though there were some possi-
ble improvements which would make sites easier to find (Section 5.3.5).
A previous user study has incorporated a geospatial map to visualize pol-
lution in China [69]. The geospatial map was found to be effective for
visualizing the site locations and a heat map was used to visualize a
calendar showing temporal trends [69]. Unlike this research project,
Zhou et al. did not conduct a user study on the effectiveness of the
visualizations.
The visualizations were used to find trends in the air pollution in China
and those results were reported.
19/20 participants managed to use the site view to interact with the
line plot by dragging and dropping points, demonstrating that the site
view was effective for interacting with other visualizations (Sec-
tion 5.3.3). The Likert scale averages show that the site view had posi-
tive responses overall, with a median score of 3.17, calculated with the
method described in Section 5.3.1 (Figure 5.2), though improving the use
of colour could make the site view clearer. The font was generally sat-
isfactory, though 6 participants considered the font size too small
(Figure 5.6). Differences in the post-study Likert scale responses be-
tween participants with experience in air quality data analysis and
participants without experience were not detected (p = .620). Experience
with air quality analysis did not affect the participants’ responses to
the effectiveness of the visualization, and both air quality experts and
non-experts were able to understand the information presented by the
site view. Qualitative responses suggested that geographical knowledge
of the area did help participants when finding locations.
The site view could be extended with a search feature to make sites
easier to locate; some participants had difficulty locating places which
were specified on the question sheets. The site view could also be ex-
tended by adding overlays for the regions or allowing more than one
pollutant to be visualized at once. The geospatial map presented in the
study by Zhou et al. [69] did not provide a search feature and zoomed
the map so that the entire map was visible. Enlarging the font on the
hover menus would improve the usability of the site view. The use of the
site view would generalize to other datasets; for example, if traffic
data were inserted then the site view could be used to show information
about congestion through other types of sensors.
17/19 participants were able to use the monthly averages to drag and
drop a pollutant onto a monthly rose, so the monthly averages visualiza-
tion was effective for displaying the variables available at a given
site and allowing the variable on another visualization to be changed
easily. The monthly averages visualization was a bar chart of the pollu-
tant levels averaged over a given month, and participants were able to
see all the levels on a single chart.

Bar graphs have previously been used by the Voyager system to describe
data, where different categorical variables can be graphed against a
numerical variable, though the way that the user interacts with the
variables contrasts with AtmoVis. Voyager provides a list of fields
which can be dragged and dropped onto an encoding section, and different
views are produced through a recommender engine [65]. The labels on
Voyager’s bar graphs cannot be dragged and dropped onto other visualiza-
tions, unlike AtmoVis, which uses the monthly averages visualization to
show what variables are available and provides interaction with other
visualizations through drag and drop.
5.4.8 Windowing
There were differences in the way that the windowing system was used by
different participants. Participants in the study were observed piling
windows on top of each other, tiling them out neatly, using windows near
maximized, or ‘splattering’ [59] windows around the screen. The observa-
tions about the use of windows in AtmoVis demonstrate that different
usage styles can emerge.
The drag and drop interaction is powerful; however, it is limited be-
cause only some plots can have data dragged out of them. The windowing
of the interface does encourage participants to have a large number of
windows visible at once. One study on window switching found that on a
single monitor the median number of windows that a user has visible is
1.7 [59], whereas the video of the task completion showed that the me-
dian number of windows visible for each user in mapping the data ques-
tion 4 (Appendix C) was 5. When the windows are used in a comparative,
connected way, the number of windows visible when using AtmoVis will be
higher than the median for a single monitor setup. The study on window
switching also found that as the number of windows increases, the amount
of time spent switching also increases, so extending AtmoVis with a
window switching feature and support for multiple monitors could reduce
the time spent switching windows. A study on the use of multi-monitor
high-resolution displays found that participants spent less time on file
system navigation, minimizing, and maximizing; instead, they piled up
windows on different monitors in categories as an extension of their
memory [17].
The interface of AtmoVis could be extended to allow different groups of
visualizations to occupy different time frames. Instead of grouping all
of the plots under one play button, there could be groups, each with its
own play button.
5.5. SUGGESTED IMPROVEMENTS 123
Figure 5.9: Four different usage styles in AtmoVis: Figure 5.9a shows
window splattering, Figure 5.9b shows border-to-border careful tiling,
Figure 5.9c shows a near-maximized window, and Figure 5.9d shows a stack
of windows.
A longer-term evaluation with a larger group of participants was not
practicable at this stage in the project’s development. The user testing
could be integrated into the daily use of the software tool for a group
of professionals, and a more detailed study of the usability of the tool
could be performed over a longer time frame in a workplace environment.
Data extraction AtmoVis does not implement all of the functionality de-
scribed in Shneiderman’s taxonomy [52]. For example, there is no way
to extract a selection of data out of the visualization system to save in a
spreadsheet and perform further analysis.
“I can’t get the monthly average data from this panel, so if there is a
button I can automate that [exports data] into a CSV file or something
so the user can use it directly otherwise they should write [it] down.”
- PID 16
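The export PID 16 asks for is straightforward to sketch. This is an illustrative Python example (the function name and row format are assumptions, not part of AtmoVis):

```python
import csv
import os
import tempfile

def export_monthly_averages(rows, path):
    """Write the monthly averages behind a visualization to a CSV file
    so the data can be reused in a spreadsheet, as PID 16 requested.
    `rows` is a list of (site, month, pollutant, mean) tuples."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["site", "month", "pollutant", "monthly_mean"])
        writer.writerows(rows)

# Example: export a single (hypothetical) monthly mean
out = os.path.join(tempfile.gettempdir(), "monthly_averages.csv")
export_monthly_averages([("Woolston", "2019-02", "SO2", 4.2)], out)
```

In the web interface the same rows could instead be offered as a download, which would cover the selection-and-extraction step of Shneiderman’s taxonomy mentioned above.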
Metadata Some participants reported that they did not fully understand
the nature of the data that they were looking at. AtmoVis could be ex-
panded with a metadata viewer which can provide textual information
about monitoring sites and pollutants measured as well as photographs of
monitoring equipment installed.
5.6 Summary
AtmoVis was evaluated in a pilot study and the main user study. The vi-
sualizations for AtmoVis were tested by the participants completing
study tasks.

Conclusions

The visualizations were found to be effective, and the line plot was the
most effective for comparing temporal trends in the air quality data.
The following contributions were made in this thesis.
6.1 Contributions
During the user study, the participants were required to use the line plot,
monthly rose, monthly averages, site view and data comparison to explore
the data and identify relationships between variables in the dataset. The
line plot and the heat calendar were most effective for assessing the tem-
poral trends in the visualizations.
RQ1 How effective are the visualization techniques for exploring air
quality data?
The heat calendar had the highest score for effectiveness based
on the post-study questionnaire responses. The heat calendar re-
ceived positive feedback from participants about its navigational
use. The visualization system received generally positive feedback
in the post-study questionnaires and the responses also identified
some parts of the visualization system which could be improved.
The data comparison received a lower effectiveness score than the other visualizations, and there was a statistically significant difference between users with experience analysing air quality data and users without. The users without experience gave more positive responses about the effectiveness of the visualization than users with experience. The heat calendar and the line plot required
the least assistance. They were also the most effective in terms of
the overall effectiveness of the visualization. The data comparison
required the most assistance and received the lowest overall effec-
tiveness score.
RQ2 How effective is the user experience of AtmoVis for exploring air
quality data?
The window interface was evaluated by inspecting the participants' use of the interface. Participants were able to effectively
use several windows at once to answer questions. The drag and
drop interactivity between different visualizations received posi-
tive feedback from participants.
In conclusion, the results of the user study demonstrate that air quality data analysis in New Zealand could benefit from interactive visualization through a web-based interface. The heat calendar, line plot, site view, monthly averages and monthly rose were effective for analyzing air quality through AtmoVis, and an interactive web-based interface for data exploration with a window layout was an effective method for accessing these visualizations and inferring relationships among air quality variables at different monitoring sites.
Additional datasets could be built into the system so that traffic data,
water quality, land use, and other information could be compared
to air quality. Inserting metadata about emissions sources in each
region would allow more inferences to be made about the causes of
the air pollution measured in each region.
The functionality of the system could be extended using a recom-
mender engine to find areas that are particularly interesting or pollu-
tants that are related. Recommender engines are used by other data
visualization systems and can recommend data for analysis based on
similarity.
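One simple similarity measure such a recommender engine could start from is correlation between pollutant time series. The sketch below is illustrative only, with hypothetical function and column names; it ranks the other variables by absolute Pearson correlation with a chosen target.

```python
import numpy as np
import pandas as pd

def recommend_related(df, target, k=2):
    # Rank the other columns by absolute Pearson correlation with the
    # target variable; the top-k could be suggested to the analyst.
    corr = df.corr()[target].drop(target).abs()
    return corr.sort_values(ascending=False).head(k).index.tolist()

# Synthetic hourly series: NO2 tracks CO closely, O3 is independent noise
rng = np.random.default_rng(0)
co = rng.normal(size=500)
readings = pd.DataFrame({
    "CO": co,
    "NO2": 0.8 * co + rng.normal(scale=0.1, size=500),
    "O3": rng.normal(size=500),
})
```

Here `recommend_related(readings, "CO", k=1)` would surface NO2 as the variable most related to CO; a production recommender would of course need more robust similarity measures than plain correlation.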
A file export system to produce layouts of different visualizations using an R script would allow more bidirectional interaction with other systems. A plug-in extension system which adds interaction to visualizations produced by other systems would allow the interface to consolidate many visualization systems into one place and improve the utility of the interface.
AtmoVis could start with a help window visible to assist first-time users. The help window would provide a reference describing the system functionality in an interactive way. AtmoVis could be made more intuitive by adding a help system which can be dragged and dropped from the menu like a visualization, opening a documentation viewer in a window.
AtmoVis allows visualizations from R to be used, but implementing a domain-specific language could allow more complex plots to be produced by performing operations on the data, e.g. plotting the difference between two variables. An algebra for generating tables was defined as part of the Polaris system [57] (Section 2.2.4), and a similar algebra for generating visualizations could be defined to compose visualizations in AtmoVis.
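The derived-variable operation mentioned above can be sketched as a tiny interpreter over column expressions. This is a hypothetical example, not the proposed DSL: the function name, operator set, and column names are assumptions for illustration.

```python
import pandas as pd

def derive(df, name, a, b, op):
    # Apply a binary operation to two columns and store the result as a
    # new derived column -- the kind of operation a small DSL could
    # expose so the result can be dropped onto any plot.
    ops = {"-": lambda x, y: x - y, "+": lambda x, y: x + y}
    out = df.copy()
    out[name] = ops[op](df[a], df[b])
    return out

readings = pd.DataFrame({"PM10": [30.0, 45.0, 25.0],
                         "PM2_5": [12.0, 20.0, 10.0]})
coarse = derive(readings, "PM_coarse", "PM10", "PM2_5", "-")
```

The derived column (here, a coarse particulate fraction as PM10 minus PM2.5) could then be visualized like any measured variable.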
Appendices
Appendix A
Information Sheet
Appendix B
Pre-Study Questionnaire
Visual Analytics for Air Quality Data:
Pre-study questionnaire
General:
55 - <60 60+
3. Occupation:
Analysis tools:
Have you used any of the following tools for work or study?
3. Spreadsheet Software.
Yes No
Infrequently Frequently
1 2 3 4 5
(b) Have you used spreadsheets for visualizing data (e.g. chart, line plot, histogram, etc.)?
Yes No
What sort of visualizations do you draw using spreadsheets for work or study?
Yes No
Yes No
Programming tools:
Have you used any of the following tools for work or study?
4. R
Yes No
Yes No
Yes No
7. Tableau
Yes No
Basic Advanced
1 2 3 4 5
8. PowerBI
Yes No
Basic Advanced
1 2 3 4 5
Yes No
If yes then for how many years.
Yes No
If yes then how many years.
If yes then what sort of experience?
3. Do you have any experience using different kinds of data sets other than the air quality monitoring datasets, e.g. traffic congestion data?
Yes No
If yes then discuss the data set and the type of data.
1. Have you ever analyzed/used the NIWA air quality monitoring data set?
Yes No
Basic Advanced
1 2 3 4 5
2. Have you ever analyzed/used air quality monitoring data sets from a different source, e.g. air quality data sets from LAWA?
Yes No
Basic Advanced
1 2 3 4 5
4. What sort of comparisons did you make between air pollutants on a larger scale spanning
several monitoring sites?
5. Were you using aggregate data derived from the station monitoring data?
Yes No
If your answer was yes what sort of derived data were you using?
6. Were you interested in time series trends for air quality monitoring data at a given station?
Yes No
7. When analyzing/using air quality monitoring data sets what sort of analysis software did
you use?
8. Were you comparing/ analyzing any other datasets alongside the air quality dataset to infer
air quality information?
Yes No
Study Instructions
149
Visual Analytics for Air Quality Data: User Study Tasks
1. Use the menu to insert a map of New Zealand by dragging the site view button onto the screen. Then use the options panel to select O3.
(a) What is the start date for the O3 measurements displayed on the options panel?
(b) Enter the start date and time from the options panel into the time selector next to the
play button, change back to map view and click load. Reading points from the map, how
many recording stations are highlighted with filled in color on the start date?
2. Leaving the map in place, insert a line plot. Drag the Howic_Music Point (in Auckland) O3 site onto the line plot.
(a) At what date and time is the highest pollution level for O3 visible on the line plot? What
is the value?
(b) Insert Whangaparaoa_Shakespear Park (in Auckland) onto the same line plot, how are
the levels of O3 related?
3. Add a second map of New Zealand onto the screen and set the map to show solar radiation.
Insert an additional line plot and drag the Howic_Music Point from the solar radiation map
onto the line plot.
(a) How many monitoring stations are recording both O3 and solar radiation at the date and
time on the time selector (i.e. how many stations have filled in color for both O3 and
solar radiation)?
(b) Using both line plots can you see any relationship between O3 and solar radiation at
Howic_Music point? Describe the relationship.
4. Insert a parallel coordinate plot by dragging and dropping the data comparison button from the launcher menu. Use the options panel to set the pollutants to PM_2_5_BAM, PM_10_BAM and O3. From the O3 map, drag and drop Pukekohe and Whangaparaoa (in Auckland) onto the parallel coordinate plot.
Click the play button and observe for about 12 hours. Is there any relationship between particulate matter PM_2_5_BAM and O3? Describe the relationship.
Aggregate data
Reload the page before starting this section.
1. Insert a site view, set the pollutant to PM10_TEOM50. Insert a heat calendar and set the time
selection to 01/01/2012 1:00. Drag Woolston (in Christchurch) onto the calendar.
What month contains the highest daily average level of PM10_TEOM50 in 2012 for the Christchurch
monitoring station?
2. Without removing the calendar, insert a monthly rose plot and drop Woolston onto the plot.
On the calendar, click on any day in the month with the worst PM10_TEOM50 pollution.
Is there any relationship between wind direction and the level of PM10_TEOM50 in the month with the highest PM10_TEOM50 pollution in 2012? Describe the relationship.
3. Insert a site view with the pollutant set to SO2 and load the data. Insert a monthly average chart and two monthly rose plots. Set the date to 01/01/2012 1:00 and drag Woolston onto the monthly average chart. Drag Woolston onto one of the monthly rose plots, then drag St.Albans_Coles Pl (in Christchurch) onto the other rose.
(a) Comment on the distribution of SO2 with wind direction.
(b) What else is being measured at Woolston?
4. Drag and drop the PM2_5_FDMS label from the monthly average chart onto the map view.
Insert two more monthly rose plots. Then drag and drop a point from Woolston onto one of
the roses and the point from St.Albans_Coles Pl onto the other rose.
(a) Comment on any similarities or differences between PM2_5_FDMS and SO2 .
(b) Delete the monthly average chart and replace it with a calendar view. Drag and drop
Woolston onto the calendar view. Click on a day in February and comment on the sim-
ilarity or difference between PM2_5_FDMS and SO2 for the month of February on the
monthly roses. Is there a trend continued from January?
5. Insert a site view with the pollutant set to SO2, a monthly average chart, a calendar and one line plot. Set the date to 01/01/2012 1:00. Drag and drop Woolston onto the monthly average chart. Drop Woolston onto the calendar. Drop Woolston onto the line plot.
(a) Identify the day in the year where SO2 pollution is the worst.
(b) Looking at the calendar, is there a trend in the days where the SO2 air pollution is the
worst in Woolston? Describe the trend.
(c) Click on the day before the worst day in January for SO2. At what time was the air pollution the worst on the 72-hour line chart for SO2, and what was the maximum value for SO2?
1. Insert three map views. Set the map views to show CO, PM10_BAM and NO2. Set the time to 01/01/2012 1:00. Load the data, then insert a Data Comparison plot. Set the parallel coordinate plot to show CO, PM10_BAM and NO2, and click apply. Then drag and drop Newmarket (in Auckland) onto the parallel coordinate plot. Also drag Henderson_Lincoln Rd (in Auckland) onto the plot.
(a) Press the play button, observe for about 12 hours. Is there any relationship between
PM10_BAM and NO2 at the Newmarket station? Describe the relationship.
(b) Is there any relationship between the results for Newmarket and Henderson_Lincoln Rd?
Describe the relationship.
Temporal Pattern:
Reload the page before starting this section.
1. What are the temporal patterns in air pollutants recorded, and can trends be identified?
(a) Set the date to 01/01/2012 and choose any station. Use the Line Plot, Site View, Heat Calendar and Monthly Average Chart to observe temporal patterns among a selection of air pollutants and meteorological variables. Use the play button and describe the patterns on a line plot over time.
(b) Use the Site View, Monthly Average Chart and Heat Calendar to examine PM10_BAM
in Masterton during 2012. Comment on any seasonal trends or days with the highest pol-
lution.
Post-Study Questionnaire
Visual Analytics for Air Quality Data: Post Study Questions
Circle one answer only per question
Site view
1 Were the text and the monitoring sites the correct size?
2 How much difficulty did you experience using the mouse to navigate?
3 Was the use of colour on the map effective for representing the data collected at
each site?
4 Was the information displayed in the options panel of the site view useful when exploring the data, e.g. start date?
5 How much did you feel like you needed assistance with the site view?
6 How effective was the site view for finding spatial relationships of variables dis-
played on the map?
7 When the play button was pressed, was the colour change on the map meaningful
for identifying temporal patterns in the data?
8 Would you like to see any functionality added to the site view?
Yes No
Suggestion:
Line Plot
9 Was the text the correct size?
10 How much difficulty did you experience using the mouse to navigate?
11 Was the use of colour in the line plot effective for interpreting the data?
12 How much did you feel like you needed assistance with the line plot?
13 How useful was the line plot for finding temporal patterns between variables?
14 How useful was the play button for finding temporal patterns in the data using the
line plot?
15 Would you like to see any functionality added to the line plot?
Yes No
Suggestion:
Heat Calendar
16. How useful was the calendar for identifying high pollution areas of interest?
19. Would you like to see any functionality added to the calendar?
Yes No
Suggestion:
20. How much did you feel like you needed assistance with the calendar view?
21. Was the colour coding useful in the calendar view?
23. How useful was the monthly average plot for identifying pollutants of interest?
24. How useful was the monthly average plot for changing variables on the map?
25. How useful was the monthly average plot for finding relationships between pollu-
tants?
26. Would you like to see any functionality added to the monthly average plot?
Yes No
Suggestion:
27. How much did you feel like you needed assistance with the monthly average plot?
Data Comparison
28. How useful was the data comparison (parallel coordinate plot) for identifying pol-
lutants of interest?
30. How much difficulty did you experience using the mouse to navigate?
31. How useful was the data comparison for finding relationships among pollutants?
32. Would you like to see any functionality added to the data comparison?
Yes No
Suggestion:
33. How much did you feel like you needed assistance with the data comparison?
34. How useful was the monthly rose plot for identifying pollutants?
35. How useful was the monthly rose plot for finding relationships between data vari-
ables?
36. How much did you feel like you needed assistance with the monthly rose plot?
37. Would you like to see any functionality added to the monthly rose plot?
Yes No
Suggestion:
General questions
38. If you felt like you needed assistance with the interface, comment on the aspect(s)
that you needed assistance with.
39. Comment on any unusual aspect(s) of the dataset inspected.
40. Did you identify any temporal patterns in the air quality data set provided? If yes
then give a brief description.
41. Did you feel you could discuss the data better after using the AtmoVis tool?
Yes No
42. What aspect of the AtmoVis tool helped you the most with understanding the dataset?
43. Would the AtmoVis tool be useful in a presentation to demonstrate air quality information?
Yes No
If your answer was yes then where would you be likely to use the tool:
44. Did you feel more engaged with the task when using the AtmoVis tool compared to using a spreadsheet?
Yes No
45. Would you prefer to use the AtmoVis interface over a spreadsheet for any data analysis tasks?
Yes No
46. Additional Suggestions or comments:
Appendix E
Instructional Slides
Introduction
The purpose of this research project is to design and build an effective prototype for visualizing spatial-temporal data from multiple sources related to air quality. The effectiveness of the prototype will be evaluated by a user study. The prototype system will allow analysts to understand trends between different monitoring stations more effectively.
Videos
The AtmoVis system consists of several different types of visualization which work together to visualize the
data.
Loading Data
Line Plot
Data Comparison
Monthly Rose
Calendar And Monthly Average
Instructions
The instructions for using AtmoVis are provided as a series of slides with each slide describing the usage of a
visualization. The main layout consists of a menu launcher, a canvas and a play button with the date and time.
Bibliography
[11] Python Data Analysis Library — pandas. https://pandas.pydata.org/. Accessed: 04/07/2019.

[15] New Zealand's Environmental Reporting Series: Our air 2018. Ministry for the Environment & Stats NZ, 2018, pp. 12–40.

[19] Bostock, M., and Heer, J. Protovis: A Graphical Toolkit for Visualization. IEEE Transactions on Visualization and Computer Graphics 15, 6 (Nov. 2009), 1121–1128.

[25] Chang, W., Cheng, J., Allaire, J. J., Xie, Y., and McPherson, J. shiny: Web Application Framework for R. 2018. Accessed: 04/07/2019.

[28] Fekete, J.-D., van Wijk, J. J., Stasko, J. T., and North, C. The Value of Information Visualization. In Information Visualization: Human-Centered Issues and Perspectives, A. Kerren, J. T. Stasko, J.-D. Fekete, and C. North, Eds. Springer Berlin Heidelberg, Berlin, Heidelberg, 2008, pp. 1–18.

[29] Garcia, R., Hart, J. E., Davis, M. E., Reaser, P., Natkin, J., Laden, F., Garshick, E., and Smith, T. J. Effects of Wind on Background Particle Concentrations at Truck Freight Terminals. Journal of Occupational and Environmental Hygiene 4, 1 (Jan. 2007), 36–48.

[31] Hales, S., Salmond, C., Town, G. I., Kjellstrom, T., and Woodward, A. Daily mortality in relation to weather and air pollution in Christchurch, New Zealand. Australian and New Zealand Journal of Public Health 24, 1, 89–91.

[32] Hsu, Y.-C., Dille, P., Cross, J., Dias, B., Sargent, R., and Nourbakhsh, I. Community-Empowered Air Quality Monitoring System. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (New York, NY, USA, 2017), CHI '17, ACM, pp. 1607–1619.

[33] Isenberg, T., Isenberg, P., Chen, J., Sedlmair, M., and Möller, T. A Systematic Review on the Practice of Evaluating Visualization. IEEE Transactions on Visualization and Computer Graphics 19, 12 (Dec. 2013), 2818–2827.

[40] Liu, Y., Barlowe, S., Feng, Y., Yang, J., and Jiang, M. Evaluating exploratory visualization systems: A user study on how clustering-based visualization systems support information seeking from large document collections. Information Visualization 12, 1 (2013), 25–43.

[47] Pruitt, J., and Adlin, T. The Persona Lifecycle: Keeping People in Mind Throughout Product Design. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2005.

[49] Roth, S. F., Lucas, P., Senn, J. A., Gomberg, C. C., Burks, M. B., Stroffolino, P. J., Kolojechick, A. J., and Dunmire, C. Visage: A User Interface Environment for Exploring Information. In Proceedings of the 1996 IEEE Symposium on Information Visualization (Washington, DC, USA, 1996), INFOVIS '96, IEEE Computer Society, pp. 3–12.

[50] Santos, B. S., Dias, P., Silva, S., Ferreira, C., and Madeira, J. Integrating User Studies into Computer Graphics-Related Courses. IEEE Computer Graphics and Applications 31, 5 (Sept. 2011), 14–17.

[52] Shneiderman, B. The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations. In Proceedings of the 1996 IEEE Symposium on Visual Languages (Washington, DC, USA, 1996), VL '96, IEEE Computer Society, pp. 336–343.

[55] Smith, G., Czerwinski, M., Meyers, B., Robbins, D., Robertson, G., and Tan, D. S. FacetMap: A Scalable Search and Browse Visualization. IEEE Transactions on Visualization and Computer Graphics 12, 5 (Sept. 2006), 797–804.

[57] Stolte, C., Tang, D., and Hanrahan, P. Polaris: A System for Query, Analysis, and Visualization of Multidimensional Databases. Commun. ACM 51, 11 (Nov. 2008), 75–84.

[58] Stone, M. Field Guide to Digital Color. A. K. Peters, Ltd., Natick, MA, USA, 2003.

[60] Thomas, J., and Cook, K. Illuminating the Path: The Research and Development Agenda for Visual Analytics. IEEE Computer Society, Los Alamitos, CA, 2005.

[62] Wei, F., Liu, S., Song, Y., Pan, S., Zhou, M. X., Qian, W., Shi, L., Tan, L., and Zhang, Q. TIARA: A visual exploratory text analytic system. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '10 (2010), ACM, pp. 153–162.

[68] Zheng, Y., Liu, F., and Hsieh, H.-P. U-Air: When Urban Air Quality Inference Meets Big Data. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (New York, NY, USA, 2013), KDD '13, ACM, pp. 1436–1444.

[69] Zhou, M., Wang, R., Mai, S., and Tian, J. Spatial and temporal patterns of air quality in the three economic zones of China. Journal of Maps 12, sup1 (2016), 156–162.