The Visual Computer (2023) 39:3715–3726

https://doi.org/10.1007/s00371-023-02961-4

ORIGINAL ARTICLE

Visual analysis of air pollution spatio-temporal patterns


Jiayang Li1 · Chongke Bi1

Accepted: 9 June 2023 / Published online: 24 June 2023


© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2023

Abstract
Advances in air monitoring methods have made it possible to analyze large-scale air pollution phenomena. Mining potential
air pollution information from large-scale air pollution data is an important issue in the current environmental field. Although
direct data visualization provides an intuitive presentation, the method is less applicable in long-time domains with high
temporal resolution. To better meet the analysis needs of domain experts, we design a visual analysis framework based on
friendly multi-view interactions and novel visual view designs. This framework can explore the spatiotemporal dynamics of
multiple pollution data. In this paper, a two-stage cluster analysis method is proposed to extract possible transport patterns
from large-scale pollutant transport trajectories. This method will be substantially helpful for domain experts to make relevant
decisions. At the same time, the index is constructed from long-time series data at the grid point in the specific transport
trajectories. This structure can help experts complete the sketch match with custom time resolution. It can assist domain
experts in extracting key possible time-varying features. Finally, we verified the validity through spatial and temporal case
analysis for pollutant data.

Keywords Air pollution · Transport pattern · Sketch match · Visual analysis

Chongke Bi
[email protected]

1 College of Intelligence and Computing, Tianjin University, Tianjin 300350, China

1 Introduction

In recent years, along with accelerating economic development and expanding urbanization, many areas in China have experienced very serious air pollution. The pollution has seriously restricted modern development. The increasingly serious environmental problems have had a great negative impact on people's studies, lives, and work. Air pollution is the result of the interaction between weather and humans, and it has an important impact on human health [1]. Excess acid in the atmosphere can catalyze acid rain. More acidic substances tend to acidify soil and damage outdoor non-metallic materials. High concentrations of particulate matter in the atmosphere can also inhibit crop photosynthesis, affect crop growth, and eventually kill crops [2]. Severe air pollution can also affect various human social activities, reduce productivity, and constrain economic development.

Continued advances in air monitoring methods have made it possible to analyze large-scale air pollution. Mining important air pollution information from large-scale air pollution data is a hot issue in various disciplines. Direct air pollution data visualization [3] has extremely low applicability to long time domains with high temporal resolution. It cannot summarize transport patterns at a macro level, and it does not substantially help domain experts make relevant decisions. At the same time, the variation of pollutants over a long time domain usually contains important features. A single observation cannot adequately uncover potential trends in long-time series data [4]. It is difficult for domain experts to analyze specific pollution phenomena in depth based on existing knowledge. Finally, domain experts cannot perform friendly interactive manipulation and analytical exploration on large-scale data [5]. Existing methods do not allow experts to develop a deeper understanding of air pollution.

During a specific time, air pollutants appear in typical transport patterns under the combined effects of meteorology, topography, and human activities. Typical spatial and temporal transport patterns of air pollution can clearly express pollutant transport characteristics. They are an important basis for experts in the field to develop treatment measures. In this paper, we explore these patterns based on large-scale pollutant spatio-temporal data through several steps, such as transport trajectory description, transport pattern extraction, user-defined time-scale sketch match, and
visual analysis framework design. This framework will help domain experts analyze such data efficiently and intuitively. The main contributions of this paper fall into three aspects.

(1) An air pollution visual analysis framework is proposed in this paper. It enables specialists to extract potentially useful information from large-scale pollution data and comprehend specific patterns by integrating prior knowledge.
(2) Several layouts and interactions are used to convey critical information about pollutants. They make it easier for professionals to summarize certain air pollution occurrences.
(3) The framework enables experts to automatically extract pollution transport patterns and to quickly match custom sketches against long-time series. Three cases demonstrate that it can meet domain experts' analytical needs.

The rest of this paper is organized as follows. Section 2 describes related work in this field. Section 3 describes the data and the summarized requirements. Sections 4 and 5 describe the main algorithms in the visual analysis framework. Section 6 describes the views and interactions in the visual analysis framework. The remaining sections analyze three cases, summarize the relevant conclusions, and provide a final conclusion.

2 Related work

Analytical work on air pollution is a research topic in various fields. This section introduces related work on transport analysis, long-time series analysis, and visual data analysis.

2.1 Transport analysis

Numerical pattern analysis has also seen extensive development in air pollution transport analysis. Taylor et al. [6] studied the changes in heat distribution over cold seas and proposed the famous Taylor theory for turbulent diffusion [7]. This theory provides an important theoretical foundation for the development and improvement of subsequent pollutant diffusion models. With the latest WRF (Weather Research and Forecasting) model and CMAQ (Community Multiscale Air Quality) model, Liu et al. [8] analyzed the possible relationships between O3 and other pollutants in different cities and regions during different time periods from 2013 to 2017. This work provides a better understanding of the relationship between pollution and policies. Dong et al. [9] assessed the effects of emission reduction and interannual meteorological conditions on air pollution transport in the Beijing–Tianjin–Hebei region using the CMAQ model and ISAM (Integrated Science Assessment Model). The method is beneficial to joint air pollution management in the Beijing–Tianjin–Hebei region. Ballesteros-González et al. [10] used the WRF model to assess the impact of open biomass burning events on pollutant concentrations. Through the seasonal variations in pollutants, this study analyzed the seasonal relationship between biomass burning and air pollution levels. Hoorn et al. [11] presented a top-down approach for minimizing computational costs that can account for chemical transport model outputs. This approach helps to simulate and discover the potential impact of policies on air quality. By using a large-scale urban energy consumption model to estimate anthropogenic heat emissions, Tao et al. [12] assessed the impact of anthropogenic heat emissions on air quality in Beijing. The relevant conclusion may be an effective method to improve PM2.5 simulation performance.

Building transport networks based on massive data is an important tool for analyzing spatio-temporal data. Typically, the nodes denote locations, and the edges denote transport paths between different locations. The constructed network relationships are widely used in several research disciplines, such as computer science, business, and biology. Kim [13] used a quantitative analysis method to assign values for node attributes and analyze the network evolution process. This method provides important ideas for network evolution analysis in network analysis, quantitative analysis, and topology analysis. Feng et al. [14] developed an interactive visual analysis system that includes summary visualization, interactive exploration, and structure statistics. This system helps users analyze complex structures in the hierarchical directed acyclic graph (HDAG) and understand the interrelationships between different structures. Linhares et al. [15] analyzed the layout performance of massive sequence view, temporal activity diagram, matrix animation, and structure animation through extensive user research. This work summarizes the shortcomings of the four structures for pattern detection evaluation. Meidiana et al. [16] proposed a metric to measure network structure changes. This metric is able to compare the real geometric changes in dynamic network structures. Ponciano et al. [17] combined node ranking, edge sampling, and time slicing to analyze traditional views. They further demonstrated how optimizing traditional sequential views can enhance analysis while reducing visual clutter. Liu et al. [18] proposed a novel dynamic network embedding framework, the motif-preserving temporal shift network (MTSN). The framework is able to simulate both the higher-order structure and temporal evolution of dynamic networks. Sabarish et al. [19] proposed a method for identifying similar trajectories based on transport trajectories. This method uses hierarchical clustering to analyze trajectory data.

2.2 Long-time series analysis

Long-time series data are found in many subject areas, yet we are usually interested in only some local data from them. Retrieving subsequences with target features from long-time series data is the most important step before the other analyses. Feng et al. [20] proposed a lightweight index structure and match method for the constrained normalized subsequence. The index is built based on the disjoint window average. To further improve efficiency for variable query lengths, the lookup algorithm is implemented with pruning. Fernandez et al. [21] proposed an accelerator based on magnetoresistive random access memory (MRAM) for time series analysis by performing in-situ computational processing of in-memory computations at data locations. This work improves parallelism in memory substrates through columnar arithmetic operations and substantially reduces data movement costs. This method has important applications in large-scale simulations and high-precision computation. To accelerate the retrieval computation, Thanawin et al. [22] focused on the exact sequential search problem of normalized long-time series. They optimized the comparative query process by using a lower bound function. Its accuracy and efficiency are better than current advanced methods [23]. Linardi et al. [24] proposed an index structure for the variable-length similarity match problem. This structure allows efficient querying of multi-length series data. Barandas et al. [25] proposed a time series feature extraction library (TSFEL). This library can realize feature extraction of multidimensional data and is highly flexible in actual scenarios. The new detection framework [26] can effectively deal with complex patterns in time series and efficiently capture multi-scale information from time series. The method proposed by Ceci [27] can use embedding technology to incrementally update the learning model. It can identify anomalies that other detection algorithms fail to identify and avoid false positive detections. Boniol et al. [28] proposed an unsupervised method, Series2Graph. This method does not need to know the characteristics of the data in advance. Its accuracy is much better than similar algorithms.

2.3 Visual data analysis

The optimization of traditional data visualization methods is the most common approach to analyzing air pollution data. Liu et al. [29] proposed an integrated visual analysis system. This system integrates a unified end-to-end tunable machine learning pipeline. The pipeline supports result output through interactive analysis and can effectively monitor air quality data for outliers. This method allows analysts to examine air quality dynamics and anomalous events at multiple scales. Deng et al. [30] proposed a novel visual analysis system, which uses the wind field to extract the transport patterns based on the minimum description length. This analysis system can help domain experts interpret uncertain patterns at large spatial scales. Lu et al. [31] developed a tool for multi-granularity time series by using a pie chart. The tool combines a heat map, a line chart, and other charts to achieve a fast and clear representation of spatio-temporal pollution data. Li et al. [32] developed a web application that presents air pollution data for the USA in real time on an hourly basis. This application provides a window for the public to visualize air quality. Qu et al. [33] proposed a novel visual analysis system, AirExplorer. The approach is able to help users explore multivariate patterns in air quality data through user-friendly interactions and intuitive views. Chen et al. [34] used real-time monitoring data from Beijing to quickly and interactively analyze air quality. Yang et al. [35] integrated relevant factors and completed analytical tasks through filters, zoom, and hover. This work assists users in analyzing the spatial and temporal evolution of air pollution.

3 Data and requirement abstract

This section presents the description of relevant data, the preprocessing method, and the summarized experts' requirements.

3.1 Data description

In this paper, we analyze and calculate two datasets. The first dataset is the large-scale air pollution grid point data. It was produced by the Institute of Atmospheric Physics and other research institutes [36] and includes information on six pollutants (PM2.5, PM10, SO2, NO2, CO, and O3) and five meteorological factors (meridional wind (V), zonal wind (U), pressure (PSFC), relative humidity (RH), and temperature (Temp)) in longitude and latitude directions. The spatial and temporal resolutions are 15 km and 1 h, respectively. The time period is from 2013 to 2018. The second dataset is the global data assimilation data. The dataset was produced by the National Centers for Environmental Prediction through the Global Data Assimilation System (GDAS) to run the Hybrid Single-Particle Lagrangian Integrated Trajectory (HYSPLIT) model [37].

3.2 Data preprocessing

To better reflect Chinese air quality, the Chinese government promulgated the Chinese Air Quality Standards in 2012. It states that the air quality over a short period is described by the Air Quality Index (AQI) [38], which is an index without specific units. It takes values from 0 to 500, with larger values indicating more severe air pollution and a higher pollution
level. The AQI is calculated by the following formula:

$$\mathrm{AQI} = \max_{p}\left\{\mathrm{IAQI}_p\right\} \tag{1}$$

where $\mathrm{IAQI}_p$ indicates the air quality sub-index of an individual pollutant, and the subscript $p$ indicates the corresponding pollutant.

AQI is the maximum value of IAQI over the pollutants. Every pollutant has a different target concentration limit for the $\mathrm{IAQI}_p$ evaluation method. Its calculation formula is shown below.

$$\mathrm{IAQI}_p = \frac{\mathrm{IAQI}_h - \mathrm{IAQI}_l}{\mathrm{BP}_h - \mathrm{BP}_l}\left(C_p - \mathrm{BP}_l\right) + \mathrm{IAQI}_l \tag{2}$$

where $C_p$ denotes the concentration value of pollutant $p$, $\mathrm{BP}_h$ denotes the high value of the pollutant concentration limit nearest to $C_p$, $\mathrm{BP}_l$ denotes the low value of the pollutant concentration limit nearest to $C_p$, $\mathrm{IAQI}_h$ denotes the air quality sub-index corresponding to $\mathrm{BP}_h$, and $\mathrm{IAQI}_l$ denotes the air quality sub-index corresponding to $\mathrm{BP}_l$.

3.3 Requirements analysis

In recent years, air quality stations have collected a large amount of high-dimensional air quality data with time-series characteristics. It is a challenging task to accurately analyze such large-scale data and provide effective suggestions to experts. To this end, we have investigated related work in the field and summarized the important needs at the current stage.

R1: Visualization for large-scale air pollution data. Air pollution data is a kind of data with typical spatio-temporal information. Data visualization is also the most mainstream analysis method in the field at present. The effectiveness of these methods is seriously reduced as the scale of pollution data increases. How to clearly and intuitively display data with spatio-temporal characteristics is a current hot issue.

R2: Extraction for pollutant transport patterns. Cross-regional pollutant transport is an important factor that affects the pollution dynamics in different regions. This diffusion is the result of multiple factors, including humans, topography, climate, and even pollutants themselves. Numerical empirical analysis and geometric property analysis are the common indicators used to analyze pollutant diffusion at this stage. Although such methods can explain the existing pollution phenomenon to some extent, there are large errors in the analysis process. Their utility is low in scenarios where the analysis results must be precise.

R3: Custom analysis for long-time series data. Pollutant changes in small areas or locations usually have more obvious time-varying trends. Domain experts often combine line charts and background knowledge to find existing time-varying trends. This helps experts easily understand pollutant changes in a short time range. A line chart with a long time domain and high temporal resolution usually compresses local features. It can make domain experts easily ignore important local time-varying information and interfere with the summary of related air pollution phenomena.

R4: Interactive exploratory analysis for air quality data. Large-scale air pollution data often contain many important potential patterns. These details cannot be fully illustrated by a single visualization diagram. It requires both subjective analysis experience and specialized background knowledge from domain experts. At the same time, professional analysts should conduct the analysis to extract important pattern information from massive air pollution data.

4 Transport patterns extraction

To assist domain experts in accurately extracting important transport patterns from large-scale air pollution data, the two-stage cluster analysis method is designed (Fig. 1). This method can extract pollutant transport patterns through pre-cluster analysis and formal cluster analysis, ranging from coarse-grained to fine-grained. The method can obtain the destination results using two cluster analysis methods. It can mine the complex structure of large-scale data.

Fig. 1 Pipeline for two-stage cluster analysis (pollution data input, HYSPLIT trajectory generation, coarse-grained clustering with DBSCAN, fine-grained clustering with AP)

Pre-cluster Analysis The main purpose of the pre-cluster analysis is to acquire coarse-grained features from the pollutant trajectories during a specific time range. The HYSPLIT model has high accuracy and interpretability in analyzing the pollutant transport problem. This model is used to calculate the transport trajectories of air pollution. In this paper, the pre-cluster analysis is calculated by density-based spatial clustering of applications with noise (DBSCAN) [39]. The purpose of the pre-cluster analysis is to group the denser transport trajectories into a large cluster. The similarity $s_{\mathrm{I}}(i, j)$ between transport trajectory $i$ and transport trajectory $j$ is calculated by the following equation:

$$s_{\mathrm{I}}(i, j) = \sum_{1 \le t \le l} \left\lVert L_p(i, t) - L_p(j, t) \right\rVert^2 \tag{3}$$

where $l$ denotes the number of moments on each object transport trajectory, and $L_p(i, t)$ denotes the trajectory position at moment $t$ on the $i$-th trajectory for pollutant $p$.
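
The pre-cluster stage can be sketched in a few lines of Python. The block below is a minimal illustration, not the authors' implementation: it reads Eq. (3) as the squared Euclidean distance between matching positions summed over all moments, treats every position as a hypothetical (longitude, latitude) pair, and uses a small hand-written DBSCAN so that it stays self-contained. The function names, the toy trajectories, and the `eps` and `min_pts` values are assumptions made for the example.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]      # assumed (longitude, latitude) at one moment t
Trajectory = Sequence[Point]     # l positions, one per moment


def s_one(a: Trajectory, b: Trajectory) -> float:
    """Eq. (3) as read here: sum over t of the squared distance between positions."""
    if len(a) != len(b):
        raise ValueError("trajectories must have the same number of moments l")
    return sum((ax - bx) ** 2 + (ay - by) ** 2 for (ax, ay), (bx, by) in zip(a, b))


def dbscan(trajs: List[Trajectory], eps: float, min_pts: int) -> List[Optional[int]]:
    """Plain DBSCAN over trajectories; returns one cluster id each, -1 means noise."""
    n = len(trajs)
    # Neighbourhood of i: all trajectories (i included) within eps under s_one.
    neighbours = [
        [j for j in range(n) if s_one(trajs[i], trajs[j]) <= eps] for i in range(n)
    ]
    labels: List[Optional[int]] = [None] * n   # None marks an unvisited trajectory
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1                     # noise for now; may turn into a border
            continue
        cluster += 1                           # i is a core trajectory: new cluster
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster            # former noise becomes a border member
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:  # only core trajectories keep expanding
                queue.extend(neighbours[j])
    return labels


# Toy data: two dense bundles of trajectories and one isolated trajectory.
toy = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    [(0.0, 0.1), (1.0, 0.1), (2.0, 0.1)],
    [(0.0, 5.0), (1.0, 5.0), (2.0, 5.0)],
    [(0.0, 5.1), (1.0, 5.1), (2.0, 5.1)],
    [(10.0, 10.0), (11.0, 10.0), (12.0, 10.0)],
]
print(dbscan(toy, eps=0.1, min_pts=2))  # -> [0, 0, 1, 1, -1]
```

Because Eq. (3) is not square-rooted in this reading, `eps` is expressed in squared coordinate units. Each non-negative label corresponds to one coarse-grained cluster, and the label -1 marks trajectories that DBSCAN treats as noise.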

Formal Cluster Analysis The main purpose of formal cluster analysis is to extract contaminant transport patterns from the coarse classification results. To avoid the uncertainty of manually specifying the cluster number, this paper implements formal clustering analysis by affinity propagation (AP) clustering [40]. The final cluster centers are defined as the extracted transport patterns. Since formal cluster analysis is more accurate than pre-cluster results, the similarity measure also needs to consider pollutant information on different transport trajectories at different moments and different locations. Therefore, the similarity $s_{\mathrm{II}}(i, j)$ between transport trajectory $i$ and transport trajectory $j$ in the formal cluster analysis is calculated by Eq. (4).

$$s_{\mathrm{II}}(i, j) = \beta \sum_{1 \le t \le l} \left\lVert L_p(i, t) - L_p(j, t) \right\rVert^2 + (1 - \beta) \sum_{1 \le t \le l} \left(L_c(i, t) - L_c(j, t)\right)^2 \tag{4}$$

where $l$ denotes the length of every transport trajectory, $L_p(i, t)$ denotes the trajectory position at moment $t$ on the $i$-th transport trajectory for pollutant $p$, $L_c(i, t)$ denotes the pollutant information corresponding to the trajectory position at moment $t$ on the $i$-th transport trajectory, and $\beta$ is the weight that balances the transport trajectory distance against the pollutant information. In particular, $s_{\mathrm{II}}(i, j)$ is calculated only for an $i$-th and a $j$-th transport trajectory that belong to the same coarse-grained cluster.

5 Long-time series sketch match analysis

In general, domain experts are often interested in sub-series data with certain characteristics of long-time series data. Exploring such data is important for domain experts to analyze the problem. This paper constructs a query index based on the minimum bounding time series (MBTS) [41] and the largest triangle three buckets (LTTB) algorithm [42]. This index supports different time scales and can analyze important trends from long-time series efficiently.

5.1 Index construction

In this paper, the balanced multinomial tree is used to construct the index. Any internal node in the index has at most $m$ children ($m \ge 1$). Each internal node in the index is associated with the MBTS and a child node ID. The MBTS area of each internal node should be much smaller than the MBTS area corresponding to the replacement of other child nodes at the same level. The same is true for leaf nodes and sub-time series. Therefore, the MBTS area of each node at every level should be more compact, from root nodes to leaf nodes. All leaf nodes are at the bottom of the index and have the same height.

Before executing the query, the original long-time series data $L_T$ first needs to be segmented according to the length of
the sketch data $Q$. Then, the key feature points are extracted based on the custom time scale in each sub-time series. The downsampling method is the LTTB algorithm. This algorithm preserves the detail of the original data well and extracts the time variation patterns at different time scales. A downsampling effect is shown in Fig. 2. In this paper, binary encoding is used to mark the sub-time series data after downsampling. The encoding principle is shown in Fig. 3. Considering the efficiency of index construction, the paper uses the quick sort algorithm to sort the encoded feature information. This method will significantly improve the accuracy of matching. We adopt the bottom-to-top method to construct the index, and the structure example is shown in Fig. 4. In particular, the maximum capacity $\mu_{\max}$ and the minimum capacity $\mu_{\min}$ are set for each node during the construction process. The purpose is to avoid having too many or too few sub-time series contained in a node.

Fig. 2 LTTB algorithm's downsampling effect on long-time series data

Fig. 3 Principle of sub-time series binary encoding

Fig. 4 An example of index construction

5.2 Custom time-scale sketch match

In sketch matching, the index nodes are traversed recursively from the root to the leaf nodes using the depth-first search (DFS) algorithm. A result queue of length $k$ needs to be maintained in each iteration. First, the node type is checked. If the current node is an internal node, the nearest nodes are saved to the result queue in query order according to the similarity distance until the queue is filled. The similarity distance is the Euclidean distance. If there are already $k$ internal nodes in the result queue at this point, an entry is replaced only when the current internal node has a smaller similarity distance than the $k$-th node in the result queue. The index is then traversed recursively, starting from the nearest child nodes. When the number of sub-time series in the result queue is $k$, the visited node needs to be pruned. If the current node is a leaf node, it is handled in the same way as an internal node. In the comparison process, pruning the index can improve match efficiency. The detailed principle is shown in Algorithm 1.

6 View and interaction design

The visual analysis framework contains four core modules: global overview, spatial-temporal dynamics view, sketch match view, and transport pattern view. Users can switch between the different views via buttons and interactions. In this section, we introduce the related views and interactions.

6.1 Global overview module

The main purpose of the global overview module is to give experts an overview of air pollutants in the country. This module helps domain experts identify important times before proceeding to the next step in their analyses. This module can be divided into the yearly view (Fig. 6B), the monthly view (Fig. 5a), and the daily view (Fig. 5b).

Fig. 5 Global overview module's other two views: (a) monthly view, (b) daily view

Fig. 6 Visual analysis framework interface. The control panel (A) is where users input view parameters. The overview view (B) displays a yearly, monthly, and daily overview of national pollution. The map view (D) shows spatial information. The area data view (G) and area wind view (H) display extensive data within the analysis area. The colored points examine the relationship between several elements. The statistics view (C) allows users to calculate statistical data for grid points by month or day. The query results view (I) may display relevant time-varying data after matching the sketch features in the sketch view (K) with the time-series data in the time series view (J). The pattern view (E) displays the transport trajectory clustering results. The circular view (F) can present time-varying data for key grid locations

In this paper, six pollution levels are classified, i.e., excellent, good, light, moderate, heavy, and severe. The yearly view encodes the six pollution levels into different colors for the central square. The number of each level is displayed in different central square widths. The square's horizontal coordinate indicates the year, and its vertical coordinate indicates the month. The monthly view reflects the percentages for different pollutant levels in the current month with a calendar chart and a pie chart. The number of pollution levels is represented by the area in the pie chart for every day in the current month. The daily view is divided into 24 parts in a clockwise direction from the top, representing 24 h. Each sector is colored by six different levels from inside to outside, and the number of pollutant levels is represented by the area.

This module's initial state shows the yearly view. The user can select a specific month from the yearly view after observation. Then, the module will switch to the corresponding monthly view. This view supports mouse hover operation. Clicking on a sector in the pie chart will display the corresponding daily view. Clicking on a sector in the daily view will update the air pollution data in the map view.

6.2 Spatio-temporal dynamics view module

Spatial and temporal analysis focuses on spatial and temporal visualization for air pollutant data or weather data at a specific time and in a specific region. This module consists of three parts: map view (Fig. 6D), area data view (Fig. 6G), and area wind view (Fig. 6H).

The map view spatially visualizes pollutant data and weather data based on location information. In particular, the wind data are dynamically rendered with WebGL. The other factor data are represented in the map view by scatter points, with the corresponding colors indicating the values. The area data view shows the pollutant data and meteorological data in the selected region. The color of the line indicates the AQI level at the current grid point. Under the area data view, the dots' color encodes the correlation in the correlation view. A darker color indicates a higher correlation, and the correlation between PM2.5 and other factors is shown by default. The area wind view uses the polar coordinate system to display wind information in the selected region. The centripetal line in this view indicates the wind direction at the current moment on the grid point. The length and color opacity indicate the wind value on the grid point.
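
The polar encoding of the area wind view needs a speed and a direction for every grid point. The sketch below shows one way to derive both from the zonal (U) and meridional (V) wind components listed in Sect. 3.1. It is an illustration under stated assumptions rather than the framework's actual code: the function name is hypothetical, and the direction follows the common meteorological convention (the bearing the wind blows from, clockwise from north), which the paper does not specify.

```python
import math
from typing import Tuple


def wind_to_polar(u: float, v: float) -> Tuple[float, float]:
    """Return (speed, direction_deg) for zonal wind u and meridional wind v.

    Assumed convention: direction_deg is the compass bearing the wind blows
    FROM, with 0 = north, 90 = east, 180 = south, and 270 = west.
    """
    speed = math.hypot(u, v)                               # magnitude of (u, v)
    direction = math.degrees(math.atan2(-u, -v)) % 360.0   # bearing in [0, 360)
    return speed, direction


# A westerly wind (blowing toward the east) of 4 m/s.
speed, direction = wind_to_polar(4.0, 0.0)
print(speed, direction)
```

In a view like the one described above, the returned speed could drive the line length and opacity, and the direction could set the angle of the centripetal line.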

Users can input time and factors through the control panel module to update the map view. To avoid visual distractions caused by displaying too many points at once, the map view supports a legend-directed filter. Users can show or hide eligible grid points using the legend. After the map view is rendered, users can enter boundary data in the control panel module. The map view will respond to the user's action and update the data in the area data view and the area wind view. Hovering over the map view scatter points will pop up specific values for the current moment. Clicking on a grid point will update the long-time series data for the sketch match. In particular, users can analyze the correlation between the current factor and other factors by clicking on the dots under the area data view.

6.3 Sketch match view module

The sketch match view module focuses on matching time-varying patterns at different time scales for long time-series data corresponding to specific factors at specific grid points. This module includes three views: time series view (Fig. 6J), sketch view (Fig. 6K) and query result view (Fig. 6I).

The time series view will plot the long-time series data with the minimum temporal resolution for the specific factor. This part introduces the "Focus-Context" technique to assist the analysis. Both the focus and context components will display the long-time series data by default. The sketch view is used to input the sketch data using different buttons. This data will be used for sub-time series data queries. The query result view shows the detailed query results; by default, it shows the top k results ranked by similarity. Each result view will plot the original data. The distance similarity is marked in the top left corner and the specific time information at the bottom. The query results are also highlighted in the time series view.

The user can update the time series view by clicking on a grid point in the map view, a line curve in the area data view, or a centripetal line in the area wind view. The context component in the time series view shows the data's overview trend with gray-filled curves. Its focus component shows the data at the selected time from the context component. The two components support hover and zoom. The left side of the sketch view contains four buttons. From top to bottom, they are the draw button, the erase button, the reset button, and the query button. The user can draw the corresponding sketch curve in the right panel by selecting the draw button. The erase button will clear the local data, and the reset button will completely clear the sketch data on the panel. After the user enters the sketch data, the framework will query the sketch pattern and generate results for the query result view. Users can analyze the results by dragging the scroll bar below the query result view.

6.4 Transport pattern view module

The transport pattern view module mainly includes three views: pattern view (Fig. 6E), statistics view (Fig. 6C) and circular view (Fig. 6F). The pattern view is a visualization chart of the transport trajectories after cluster analysis. The nodes represent the nearest grid points on the transport trajectories. A node's size indicates the number of common trajectories, and its color indicates the cluster category. A link's width indicates the number of common trajectories, and its color indicates the cluster category. The statistics view shows the box plots in different years with monthly or daily resolution.

The circular view displays the factor data over a specific time range using the circle. This view has six layers from inside to outside, with every layer having the same height and different visual coding. The first layer maps the current wind direction, humidity, barometric pressure, and temperature on the grid point to a straight arrow through its direction, width, opacity, and color. The second layer is divided into six equal parts in a clockwise direction, with each sector representing one pollutant. Each sector height indicates IAQI on the grid point at the current time. The third layer plots wind, relative humidity, temperature, and atmospheric pressure in a clockwise direction, starting from the top, with four different colors. The fourth layer, fifth layer, and sixth layer display the information by radius angle from the second layer. The fourth layer analyzes the pollutant contribution with a specific time range and a specific time resolution. The height of the bar indicates the contribution rate, and the color indicates the pollutant. The fifth layer shows the percentage of pollution level for the different pollutants in the circular stack chart. The arc color indicates the pollutant level. The sixth layer shows detailed data for each pollutant during the analysis time. Each pollutant level at each moment is coded with a different color dot.

After selecting the grid points to be analyzed and setting various model parameters in the control panel, the framework will automatically generate the transport trajectories. The pattern extraction algorithm will analyze transport trajectories, and the clustering results are visualized in the pattern view. After the user clicks on a node or an edge in the pattern view, the trajectories passing through the node or the edge are displayed in the map view. The statistics view also supports legend filters, mouse hover, and scroll zoom.

7 Case study

7.1 Air pollution spatial dynamics analysis

By observing the AQI distribution in Fig. 7 from mid-January 2013 to mid-December 2013, we find that heavy pollution is mainly distributed in the northern areas in the autumn and


     









winter, especially in the northern coastal areas. The main reason is that the northern autumn and winter are the heating periods, during which heating emits a large amount of pollutants. The unfavorable weather conditions in autumn and winter also weaken the environment's self-regulation ability, making heavy pollution more likely. Further analysis shows that air pollution is concentrated in the Beijing–Tianjin–Hebei region, the Yangtze River Delta region, and the Pearl River Delta region. These regions are economically prosperous and industrially developed. It further shows that human activities have an important influence on air pollution.

Fig. 7 AQI distribution in the middle of each month in 2013. Panels: (a) Mid-Jan, (b) Mid-Feb, (c) Mid-Mar, (d) Mid-Apr, (e) Mid-May, (f) Mid-Jun, (g) Mid-Jul, (h) Mid-Aug, (i) Mid-Sep, (j) Mid-Oct, (k) Mid-Nov, (l) Mid-Dec

Analyzing mid-November 2013, mid-December 2013, mid-January 2014, and mid-February 2014 in Figs. 8 and 9, it can be found that PM2.5 and PM10 are the most widely distributed. They are mainly found in highly developed and densely populated places.

Fig. 8 Distribution of PM2.5 in mid-November 2013, mid-December 2013, mid-January 2014 and mid-February 2014. Panels: (a) Mid-Nov, (b) Mid-Dec, (c) Mid-Jan, (d) Mid-Feb

Fig. 9 Distribution of PM10 in mid-November 2013, mid-December 2013, mid-January 2014 and mid-February 2014. Panels: (a) Mid-Nov, (b) Mid-Dec, (c) Mid-Jan, (d) Mid-Feb

In particular, the northwest region experienced more severe air pollution in the second and third quarters (Fig. 10). Further analysis revealed that this region is located in the desert, and sandstorms often occur in the region throughout the year, especially from March to May each year. Therefore, this region always has heavy pollution during that time.

Fig. 10 PM2.5 time-varying in the northwest from 2013 to 2018

The SO2 in Yinchuan was anomalous on January 14, 2013. Combining the map view (Fig. 6D) and the area wind view (Fig. 6G), we find that there is a chemical company near Yinchuan. The SO2 in Yinchuan is elevated because of the wind direction and topography. There is a clear correlation between SO2 and CO in Yinchuan at this time.

Specifically, we analyze the transport patterns for the Beijing grid point. We generated 36-hour pollutant backward transport trajectories at 300 m height. The analysis dates were from November 10, 2014 to November 20, 2014. The obtained transport trajectories were clustered and analyzed, and the results are shown in Fig. 11.

Fig. 11 Results of a pollutant backtracking trajectory transport model with Beijing as a starting point. Panels: (a) Pattern details, (b) Cluster results, (c) Point details

During the analysis time, three transport patterns (Fig. 11a) can be summarized: the southern, the western, and the northwestern. The northwestern transport pattern spans the longest period. This pattern starts in Russia, passes through the Inner Mongolia Autonomous Region, and finally arrives in Beijing without a large transport deviation. From the circular view, it can be seen that Beijing was heavily polluted from November 17 to November 21. The PM2.5 increased sharply, indicating that the northwestern transport pattern may have a significant impact on Beijing's air quality. The shorter length of the western transport pattern indicates that its horizontal velocity is lower. This pattern will make pollutants accumulate in the North China Plain, where the wind speed is lower. It further aggravates the pollution in the Beijing–Tianjin–Hebei region. The southern transport pattern also has a shorter transport distance, so it is able to carry pollutants from other regions to Beijing, which is likely another important reason for the severe pollution in Beijing.
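The relation between trajectory length and horizontal velocity used in this comparison can be checked with a small calculation: for trajectories of equal duration, the mean horizontal speed is the path length divided by the duration. The Python sketch below is illustrative only. The function names, the spherical-Earth radius, and the sample coordinates are assumptions and do not come from the trajectory model used in this work.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.atan2(math.sqrt(a), math.sqrt(1 - a))


def trajectory_length_km(points):
    """Sum of segment lengths along a trajectory given as [(lat, lon), ...]."""
    return sum(
        haversine_km(p[0], p[1], q[0], q[1])
        for p, q in zip(points, points[1:])
    )


def mean_speed_kmh(points, duration_h=36.0):
    """Mean horizontal speed; 36 h matches the backward trajectories above."""
    return trajectory_length_km(points) / duration_h


if __name__ == "__main__":
    # Hypothetical trajectories ending near Beijing (coordinates are made up).
    long_track = [(52.0, 100.0), (46.0, 108.0), (42.0, 113.0), (39.9, 116.4)]
    short_track = [(39.5, 112.0), (39.7, 114.0), (39.9, 116.4)]
    for name, track in (("long", long_track), ("short", short_track)):
        print(name, round(trajectory_length_km(track), 1), "km,",
              round(mean_speed_kmh(track), 1), "km/h")
```

With the duration fixed, the ordering of trajectory lengths and the ordering of mean speeds are the same, which is the property the comparison of transport patterns relies on.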


Fig. 12 Custom sketch and matching results. Panels: (a) Custom sketch, (b) Matching results

7.2 Air pollution temporal dynamics analysis

By analyzing the yearly view (Fig. 6B), we can find that air pollution shows obvious seasonal trends. Severe air pollution is mainly concentrated in the autumn and winter each year, and better air quality is mainly concentrated in the spring and summer. Over a year, pollution tends to decrease first and then increase, peaking in January and December (Fig. 12).
Taking the Beijing grid point as an example (Fig. 13), O3 shows a convex trend in each year from 2013 to 2018: its concentration reaches its maximum in the second and third quarters and its minimum in the first and fourth quarters. The other five pollutants show a concave trend, peaking in the first and fourth quarters and reaching the minimum in the second and third quarters. Most pollutants show a decreasing trend year by year over the six years, with SO2 decreasing most significantly, whereas O3 shows an increasing trend.
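The convex and concave seasonal shapes described here can be tested numerically by comparing quarterly means. The following Python sketch is a simple illustration under stated assumptions: the helper names `quarterly_means` and `seasonal_shape` are hypothetical, and the monthly values are synthetic numbers chosen only to show the two shapes, not measurements from the dataset.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def quarterly_means(samples):
    """Average values per calendar quarter.

    samples: iterable of (date, value) pairs. Returns {quarter: mean value}.
    """
    buckets = defaultdict(list)
    for day, value in samples:
        buckets[(day.month - 1) // 3 + 1].append(value)
    return {q: mean(vals) for q, vals in sorted(buckets.items())}


def seasonal_shape(qm):
    """Classify a yearly curve from its four quarterly means.

    'convex'  : Q2/Q3 average above Q1/Q4 average (peak in mid-year)
    'concave' : Q2/Q3 average below Q1/Q4 average (peak in winter)
    """
    mid = mean([qm[2], qm[3]])
    edge = mean([qm[1], qm[4]])
    if mid > edge:
        return "convex"
    if mid < edge:
        return "concave"
    return "flat"


if __name__ == "__main__":
    # Synthetic monthly values (illustrative only).
    o3_like = [30, 40, 60, 80, 100, 110, 115, 105, 85, 60, 40, 30]
    pm_like = [120, 100, 80, 60, 45, 35, 30, 35, 50, 70, 95, 115]
    for name, vals in (("O3-like", o3_like), ("PM2.5-like", pm_like)):
        samples = [(date(2014, m, 15), v) for m, v in enumerate(vals, start=1)]
        print(name, seasonal_shape(quarterly_means(samples)))
```

Applying the same classification to each year separately would show whether the shape is stable across the six years.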
In particular, O3 showed a trend similar to the sketch (Fig. 12a) around the Chinese Spring Festival in 2014 (Fig. 12b). Analyzing the O3 changes before and after each Chinese Spring Festival, it can be observed that the O3 change trend is generally the same (Fig. 14). The O3 peak increases year by year, indicating that O3 pollution is getting worse.
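To make the idea of querying a long time series with a sketch concrete, the following Python sketch ranks subsequences by their similarity to a query shape. It is not the matching algorithm of this framework, which is not restated in this section. As a stand-in, it assumes a brute-force sliding window with z-normalized Euclidean distance; the function names and the sample series are hypothetical.

```python
import math


def z_normalize(xs):
    """Rescale a sequence to zero mean and unit variance (zeros if constant)."""
    m = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))
    if sd == 0:
        return [0.0] * len(xs)
    return [(x - m) / sd for x in xs]


def top_k_matches(series, sketch, k=3):
    """Return up to k non-overlapping (distance, start_index) pairs.

    Each window of len(sketch) samples is compared with the sketch after
    z-normalization, so matches depend on shape rather than magnitude.
    """
    m = len(sketch)
    if m == 0 or m > len(series):
        return []
    query = z_normalize(sketch)
    scored = []
    for start in range(len(series) - m + 1):
        window = z_normalize(series[start:start + m])
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(window, query)))
        scored.append((dist, start))
    scored.sort()
    picked = []
    for dist, start in scored:
        # Skip windows that overlap an already selected match.
        if all(abs(start - s) >= m for _, s in picked):
            picked.append((dist, start))
        if len(picked) == k:
            break
    return picked


if __name__ == "__main__":
    series = [0, 0, 0, 1, 3, 1, 0, 0, 0, 2, 6, 2, 0, 0]  # synthetic values
    sketch = [0, 1, 0]                                    # a single bump
    for dist, start in top_k_matches(series, sketch, k=2):
        print("start index", start, "distance", round(dist, 6))
```

The brute-force scan costs time proportional to the series length times the sketch length, so a long multi-year series would likely need an indexed or pruned search instead.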

Fig. 13 Time-varying information for six pollutants from 2013 to 2018. Panels: (a) PM2.5, (b) PM10, (c) SO2, (d) NO2, (e) CO, (f) O3

8 Conclusion

To meet expert needs for analyzing large-scale pollution data, we introduce a visual analysis framework with multiple views and friendly interactions. This framework includes transport pattern extraction, long-time series analysis, and spatio-temporal dynamics exploration. The work helps experts mine and understand potential information more comprehensively.

However, there are still many improvements to be made. First, the framework could be combined with WRF or other models to achieve high-accuracy analysis on a large scale. Second, the analysis factors could be further expanded and the analysis results made easier to interpret. Third, the efficiency of analyzing long-time series data could be improved with machine learning or deep learning.


Fig. 14 O3 information every year around Chinese New Year from 2013 to 2018

Acknowledgements This work was partially supported by the National Key R&D Program of China under Grant No. 2021YFE0108400 and partly supported by the National Natural Science Foundation of China under Grant No. 62172294.

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Jiayang Li received the BSc (Eng.) from Chongqing University of Posts and Telecommunications in 2020. Now, he is pursuing the ME degree in Computer Technology with College of Intelligence and Computing, Tianjin University. His main research interest is visual analysis.

Chongke Bi received the BSc (Eng.) and MSc (Eng.) degrees from Shandong University in 2004 and 2007, respectively, and the PhD (Sci.) degree from The University of Tokyo, Japan, in 2012. From 2012 to 2016, he was a Researcher with RIKEN, Japan, where he focused on the research in the field of visual analysis of large-scale simulation on supercomputer. He is currently an associate professor in the college of intelligence and computing, Tianjin University. His current research interests include data visualization, machine learning, and high performance computing.
