GraphDecoder: Recovering Diverse Network Graphs From Visualization Images via Attention-Aware Learning
Abstract—DNGs are diverse network graphs with texts and different styles of nodes and edges, including mind maps, modeling graphs, and flowcharts. They are high-level visualizations that are easy for humans to understand but difficult for machines. Inspired by the process of human perception of graphs, we propose a method called GraphDecoder to extract the data from raster images. Given a raster image, we extract its content with a neural network: we built a semantic segmentation network based on U-Net, added an attention mechanism module, simplified the network model, and designed a specific loss function to improve the model's ability to extract graph data. From the output of this semantic segmentation network, we extract the data of all nodes and edges. We then combine these data to obtain the topological relationship of the entire DNG. We also provide an interactive interface for users to redesign the DNGs. We verify the effectiveness of our method through evaluations and user studies on datasets collected from the internet and on generated datasets.
Index Terms—Information visualization, Chart mining, Semantic segmentation, Network graph, Attention mechanism
Fig. 1. GraphDecoder can extract DNG data from raster images (PNG/JPG) and automatically retarget them (PNG/SVG/JSON). Our method can be applied to many DNGs, including flowcharts (A), hierarchical diagrams (B), model graphs, hand-drawn sketches, and mind maps (C).
network relational data. Our method is robust to edges drawn as polylines and curves, and it supports three types of nodes: rectangles, diamonds, and ellipses. We also provide users with an interactive system. After the user uploads a raster image, the system extracts its underlying data. Users can then redesign and modify the data on the system interface. The system can be applied in many scenarios, such as mind maps, flowcharts, E-R diagrams, and hierarchical structure diagrams.

We performed ablation experiments on our semantic segmentation model; the results show that our neural network greatly improves the ability to extract edges from graphs. We also applied structural and perceptual similarity evaluations on a real corpus. After redrawing the extracted results with AntV [21], we compare them with the input images. We use NetSimile [22] to evaluate the structural similarity and the visual saliency map [23] to evaluate the perceptual similarity. We established an additional corpus with different resolutions and scales of DNGs to test the robustness of our method across different scales of network structure and pixels. Our experimental results and user study show that our method offers great application potential in the redesign and modification of DNGs. We discuss the differences between deep learning and heuristic methods, and we share our experience of segmenting visualization images as compared with natural images. Finally, we discuss the limitations of this paper and future work; some of the limitations introduced by data-driven models could be explored in future work on correcting chart norms.

Our contributions include three aspects:
(1) We defined the problem of data extraction from DNGs. We verified the practical importance of this study in various application domains.
(2) We designed a state-of-the-art deep learning model and a pipeline to extract DNGs.
(3) We put forward multiple evaluations to demonstrate that our method is effective and robust.

2 RELATED WORK

Our work is mainly related to three aspects: chart data extraction, chart redesign, and attention mechanisms.

2.1 Chart Data Extraction

Most visualization charts are represented as static images. The purpose of chart data extraction is to extract the original data from the bitmap. Unlike figure extraction works [24], which focus on locating the position of an image in a paper and on figure classification, chart data extraction focuses on extracting the original data from figures. Chart data extraction helps designers redesign and modify charts. There are many studies on chart data extraction for simple charts such as bar charts, line charts, and pie charts. For bar charts, most methods [9, 25, 26, 27] use connected component analysis to extract the bars in the charts; these methods are only effective for solid single-color bars. For pie charts, the existing methods [9, 13, 25, 28, 29] are based on the assumption of a solid pie chart without a three-dimensional effect. These methods mainly detect each slice by analyzing connected components [13] or with deep learning [28, 29]. Other methods [9, 25] use curve fitting to locate each pie slice. Few methods support line charts because of the difficulty of extracting lines; some methods [30, 31] can only extract line charts with a single line. Some recent studies [6, 28, 29] used deep neural networks for object detection to detect bars, showing better performance than traditional image processing. The latest methods use deep neural
networks to learn marked patches in the chart [15] or to detect key points in the chart [16], but they are still not robust enough for multiple lines and lines of different thicknesses. Mao et al. [32] proposed a method for extracting data from meteorological facsimile charts. Poco et al. [33] focused on heatmaps with legends. Wu et al. [34] introduced a method to handle contour maps. There are also some semiautomatic chart extraction frameworks and tools, such as ChartSense [14], DataThief [35], iVoLVER [36], Plot Digitizer [37], Dagra [38], and Engauge Digitizer [39]. These tools use human-supplied data, including chart types, key point positions, and line positions, to improve extraction accuracy. There are also extraction works for specific charts, such as ChemGrapher [40], a framework for extracting compounds. Some works [41, 42] focus on science textbooks; the difference between science textbooks and DNGs is that their objects are targets in natural images, not visualization charts. Chen et al. proposed a model [43] to extract timelines, applying the GrabCut [44] method to improve the segmentation results. Compared with the chart types above, DNGs have more attributes: in addition to the structural attribute (the topological relationship), DNGs also have other visual attributes (e.g., color, node shape, node size, and the locations of the nodes).

The type of visualization most relevant to DNGs is the network graph. OGR [17] used morphological methods to extract network graphs. However, its network graph definition has many limitations, such as solid circular nodes and charts without text. Users need to manually adjust the thresholds of binarization and the morphological operations, so the method has weak anti-noise ability and weak robustness. OGER [18] improved the edge detection ability of OGR and can recognize dashed lines, but it still has the same limitations as OGR. Some studies [45, 46] focused on graphs that are close to our definition of DNGs, but their input was a set of stroke vectors, which includes more prior knowledge than a static bitmap. VividGraph [1] used U-Net to segment network graph images. However, due to the limitations of its training data and its untuned CNN (convolutional neural network) architecture, VividGraph only works for graphs with circular nodes and straight edges but no text. Compared with their work, we design our model and semantic parsing module for wider application and higher accuracy, as shown in Sec. 7.

Optimizing CNNs for feature extraction is an important step in using deep learning methods. Haehn et al. [47] proposed the possibility of using CNNs to extract data from charts. They evaluated the performance of four models: MLP, LeNet [48], VGG [49], and Xception [50]. Giovannangeli et al. [51] further corroborated this view. However, their methods are not suitable for high-dimensional data such as DNGs. The output value of these methods is one-dimensional (e.g., the length of one line, the area of one circle, the angle between two sides [47], or the number of nodes and edges [51]). In contrast, our task needs to output a topological relationship of size N², where N represents the number of nodes, as well as many visual attributes (e.g., color map, node shape, node position). After preliminary experiments, we found the semantic segmentation approach more promising. Neural networks for semantic segmentation mainly include FCN [52], U-Net [20], and SegNet [53]. These models have achieved good results in the segmentation of natural scenes. However, for chart images, the segmentation task requires higher accuracy because tiny pixel errors may result in large differences. Wang et al. [54] proposed applying the attention mechanism in computer vision to make the model learn the areas that are important, thereby reducing errors. SENet [55] was the first to propose the channel attention mechanism of the SE module at the channelwise level, which can adaptively adjust the characteristic response value of each channel. Oktay et al. [56] designed a module with attention that prevented target discontinuity in medical images, such as those of the pancreas, and achieved better results. Zhou [11] used an encoder-decoder network with an attention mechanism to complete the task of extracting bar charts. In this paper, we experimentally find that the attention mechanism applied to medical image segmentation can successfully improve the segmentation of DNG images. We also discuss the difference between natural images and visualization images in Sec. 8.

2.2 Chart Redesign

There are many charts in the real world that do not conform to the principles of visualization design, and chart extraction enhances these visualizations [33]. We mainly discuss three principles related to the design of DNGs. First, color plays an important role in chart design. Wang et al. [57] applied a knowledge-based model for learning color rules to visualization images. The W3C standard [58] also specifies color rules for images on web pages. The main purpose of some chart mining work [33] is to recolor the chart. Second, different representations of DNGs bring different perceptions to users. The entity relationship diagram [3] has been widely adopted in the field of database design, using three types of nodes and solid lines to represent data relationships; E-R diagrams help database designers build databases faster. Buzan et al. [4] first proposed the mind map, which helps users efficiently express their divergent thinking. Flowcharts [2] can clearly demonstrate algorithms in computer programming. Our system provides different retargeting schemes for different kinds of DNGs. Third, obtaining the original data allows further visualization beyond the raster image, and interactive DNGs can ensure a better user experience. There are some other types of chart mining work [59, 60, 61] that
Fig. 2. The pipeline of our framework. Given an input image, the text data extraction module (CTPN and CRNN) outputs text items as (position, string, confidence) tuples, e.g., ([240, 340, 197, 26, 0], 'Analytics & Decisions', 0.97); the graphical data extraction module (the segmentation neural network) outputs a semantic map and an attention map; and the semantic parsing module produces the final graph data as JSON, e.g., {"nodes": [{"id": 0, "x": 247, "y": 402, "color": "#ffffff", "label": "Analytics & Decisions", "fontsize": 19, "type": "rect", "w": 244, "h": 63}, ...], "links": [{"id": 0, "source": 2, "target": 0, "color": "#d9d5d0"}, ...], "backcolor": "#ffffff"}.
converted raster charts into vector diagrams to make the charts interactive, with features such as intelligent question answering, table notes, and auxiliary lines. In this work, we provide methods and pipelines for extracting data from DNGs, as well as corresponding interfaces and applications for automatic redesign. However, we do not contribute design principles for DNGs; we assume that the target design is prior knowledge.

3 OVERVIEW

Our goal is to extract the original data from DNGs. Traditional methods [1, 18] are only suitable for certain simple graphs. DNGs have more data attributes, complex data types, difficult edge recognition, and matching problems between text and graphics. To solve these problems, we use an OCR system to preprocess raster images, use a semantic segmentation network with an attention module to locate and classify each pixel, and finally use a semantic parsing module to recover the DNG.

We propose a framework named GraphDecoder, which automatically extracts the original data from DNG images. Fig. 2 shows the pipeline of our framework, which includes three components:
• Text Detection Module. To improve the performance of semantic segmentation, we first extract the text data in the chart. Through the OCR system, we obtain the content and position of the text. We remove the text area in the image and fill it with color blocks. Then, we obtain the DNG image without text.
• Segmentation Neural Network. The segmentation neural network is the core module of our framework. We constructed a semantic segmentation network with an attention mechanism. This network can accurately locate the pixels at which nodes and edges are located and classify the various types of nodes. By adding attention modules and improving the objective function, the network is robust to continuous curves and polylines.
• Semantic Parsing Module. By analyzing the connected components of the data obtained in the previous two modules, we obtain the approximate original data.

4 METHODS

4.1 Dataset

Graph data extraction requires a large amount of training data, but most works use small-scale datasets and keep the data private [6]. This is because most datasets require manual labeling [62], which requires high-quality control; the more advanced the visual coding, the more difficult the annotation. To the best of our knowledge, there is no existing DNG dataset for data extraction. Therefore, we built our dataset using popular visualization tools: the D3 library [63] in JavaScript, and Matplotlib [64] and Skimage [65] in Python. The height and width of each image are defined as H and W, where H, W ∈ [320, 800]. To improve the robustness of the model, we also augmented our dataset by cropping, stretching, adding noise, etc. Finally, we selected 12,000 images as the training set, 4,000 images as the validation set, and 4,000 images as the test set. Python and JS each draw one-third of the images, and the remainder are data-augmented images.
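The following is a rough, hypothetical sketch of how one such training image can be drawn with Matplotlib (all sizes, colors, and shapes are our own illustrative choices, not the authors' released generator); a paired per-pixel label map can be rendered the same way with one flat color per class:

```python
# Hypothetical sketch of synthetic DNG generation: rasterize a two-node
# graph with Matplotlib. All coordinates, sizes, and colors here are our
# own illustrative choices.
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen rendering
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle, Ellipse

H = W = 512  # within the paper's H, W range of [320, 800]
fig = plt.figure(figsize=(W / 100, H / 100), dpi=100)
ax = fig.add_axes([0, 0, 1, 1])
ax.set_xlim(0, W); ax.set_ylim(0, H); ax.axis("off")

ax.plot([150, 400], [130, 400], color="black", lw=2)        # edge drawn first
ax.add_patch(Rectangle((100, 100), 100, 60, fc="#4c78a8"))  # rectangle node
ax.add_patch(Ellipse((400, 400), 120, 80, fc="#f58518"))    # ellipse node

fig.canvas.draw()
img = np.asarray(fig.canvas.buffer_rgba())[..., :3]  # (512, 512, 3) raster
plt.close(fig)
```

Cropping, stretching, and noise can then be applied to `img` with Skimage to produce the augmented portion of the dataset.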
Fig. 6. Semantic parsing process. In (a), we determine candidate CCs for edges and nodes by connected component
analysis and obtain (b) by morphological denoising. Thus, the visual attributes of the DNG are obtained through (b).
In (c), we traverse the pixels around the node to calculate the connection confidence and recover the connections.
in Fig. 4). Specifically, the opening operation is image erosion followed by dilation with a disk structuring element, and the closing operation is the opposite. A disk structuring element is similar to a convolution kernel; it is a small matrix used for image processing. We define its size as 1/3 · Avg(Area_j), where Avg(Area_j) indicates the average area of all CCs. We obtain the attributes of edges and nodes from these candidates, as shown in Fig. 6(b). The shape types of the nodes (rectangle, ellipse, diamond) are classified by our segmentation model; the color, coordinates, and size are calculated from the CC. When the node color is different from the background color (e.g., Fig. 8), we consider the node to be solid. Conversely (e.g., Fig. 2), we judge the node to be a hollow node.

Third, we improved the node reconnection algorithm in VividGraph by imitating the way that humans perceive DNGs, so that our pipeline can cope with nonstraight edges and nodes with various shapes. When people observe whether two nodes are connected, they always look for a connecting edge around a node, so our heuristic algorithm mimics this process of human perception of connection. We consider edges of three different shapes: straight lines, polylines, and curves. They exhibit three common characteristics. First, they are all connected to the sides of nodes. Second, they are all continuous pixels rather than dashed lines. Third, each edge connects two nodes. According to these common characteristics, we design a general algorithm to solve the topological analysis of DNGs, sketched below. We traverse the pixels around each node CC, as shown in Fig. 6. If a pixel i belongs to a certain node, the confidence γ_pq that Edge q belongs to Node p increases. Since an edge always belongs to two nodes, the two nodes with the highest confidence are connected. For Edge q, if max_p(γ_pq) or submax_p(γ_pq) is 0, we consider the edge candidate to be noise and delete it; otherwise, we consider Nodes p1 and p2 to be connected, where p1 and p2 are the indices of max_p(γ_pq) and submax_p(γ_pq). We then assign the data obtained by the text detection module to nodes or edges: for each detection result t in TextArr with confidence > 0.95, we find the nearest node p or edge q and assign t to it. The detailed process and pseudocode of the parsing module are attached in the appendix.
are all connected to the sides of nodes. Second, they are
all continuous pixels rather than dashed lines. Third,
each edge connects two nodes. According to these com- To overcome this problem, we analyzed the charac-
mon characteristics, we design a general algorithm to teristics of the DNG. In the normalization process, edge
solve the topological analysis of DNGs. We traverse the information is easily lost because the edge occupies a
pixels around each node CC, as shown in Fig. 6. If a pixel few pixels, and the node information is still retained.
i belongs to a certain node, the confidence γpq that the Therefore, we can cut the image into pieces, extract
edge belongs to this node increases, where p, q represent them one by one, and then combine the semantic maps
Node p and Edge q. Since an edge always belongs to two of these pieces for semantic parsing. However, when
nodes, the two nodes with the highest confidence are cutting, the nodes are easily cut into different shapes,
connected. For Edge q, if max(γpq ) or submax(γpq ) is 0, we which causes errors in extraction. Therefore, we combine
consider the edge candidate to be noise, and we delete the semantic map obtained after the normalization of
it. We consider nodes p1 and p2 to be connected, where the whole image and the semantic map obtained after
p1 and p2 are the indices of max(γpq1 ) and submax(γpq2 ). cutting it into pieces for semantic parsing. We set a
We then assign the data obtained by the text detection confidence α ∈ [0.5, 1], which is proportional to the reso-
module to nodes or edges. For each detection result t in lution of the image. The confidence determines the bias
T extArrt with conf idence > 0.95, we find the nearest of the final semantic segmentation result. The details are
node p or edge q and assign t to it. shown in Algorithm 1, where H, W are the height and
The detailed process and pseudocode of the parsing weight of the image and y ′ represents the prediction of
module are attached in the appendix. the pixel category.
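Read as code, the fusion step of Algorithm 1 might look as follows, assuming the network returns soft (H, W, C) class-score maps and that class 4 is the edge class while classes 0-3 are the background and node classes (this is our reading of the algorithm, not the authors' implementation):

```python
# A minimal sketch of Algorithm 1's per-pixel fusion, assuming soft
# (H, W, C) class-score maps and class 4 = edge.
import numpy as np

def fuse_semantic_maps(y_entire: np.ndarray, y_piece: np.ndarray,
                       alpha: float) -> np.ndarray:
    """y_entire: scores from the whole (downscaled) image; y_piece: scores
    stitched back together from the cut pieces; alpha in [0.5, 1] grows
    with the input resolution. Returns the fused per-pixel class map."""
    edge_pixels = y_piece.argmax(-1) == 4               # pieces preserve thin edges
    fused = alpha * y_entire + (1 - alpha) * y_piece    # default: trust whole image
    swap = alpha * y_piece + (1 - alpha) * y_entire     # on edges: trust the pieces
    fused[edge_pixels] = swap[edge_pixels]
    return fused.argmax(-1)
```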
[Figure: a hand-drawn "Holiday" mind map (a Florence trip with dates, family members, tickets, transportation, and accommodation) together with its extracted and redesigned outputs, including panels (c) Output2 and (d) Output3.]

5 APPLICATIONS

In this section, we mainly introduce three applications: sketch transformation, raster image extraction, and chart redesign.
6 USER STUDY

To expand the application of GraphDecoder, we conducted a user study. We used system trials, user interviews, and questionnaires to obtain user feedback, which can improve our system.

Procedure: Our user study includes five parts: (1) informed consent of users, (2) introduction of GraphDecoder, (3) system trials, (4) questionnaire, and (5) interview. We first informed the participants that their answers would be used for our research and then classified them by age and occupation. Then, we verbally introduced our system and methods to ensure that every participant could understand the functions and usage of our system. We invited users to use our system for data extraction and transformation, and then used questionnaires and interviews to obtain user feedback and evaluate the actual effectiveness of our methods.

Recruitment: We recruited 60 participants. Six of the participants were unable to understand our method due to capacity limitations, and we ultimately recovered 54 valid user records (µage = 25.9 years; 32 computer professionals; 22 noncomputer professionals). Some of our participants were proficient with computers and design, such as development engineers, visualization designers, graphic designers, and scientific researchers. Other participants were noncomputer professionals, such as teachers, middle school students, accountants, and homemakers.

We recruited 5 additional designers (µage = 25 years) working on visualization to collect the pairs in the questionnaire for pairwise comparison.

Questionnaire: We set up 13 questions in the questionnaire, including 8 scoring questions for the extraction results and 5 pairwise comparisons.

Scoring Questions: Hand-drawn sketches, modeling graphs, mind maps, and flowcharts each account for a quarter of the scoring questions for the extraction results. Participants scored the extraction results based on five aspects: color, position, node (size, shape), connection relationship, and text content, with a full score of 100. Since there are no handwritten data in our text extraction model's dataset, the hand-drawn sketch is not scored for text. The final result is shown in Fig. 9(a). From the results, we can determine that our extraction results can meet most of the needs of users; most of the average scores are above 90. The position score is slightly lower than 90 because we used a small number of morphological methods in the semantic parsing module, which caused the node positions to shift. Therefore, it is necessary to avoid traditional methods in the pipeline as much as possible in the future.

Fig. 9. Questionnaire scoring results. (a) is the result of the 8 scoring questions for the extraction results. (b) is the result of the 5 pairwise comparisons. Pairs 1-5 correspond to Questions 12-16 in the questionnaire. Circles depict group averages (± 95% confidence intervals).

Pair Collection: Our pairwise comparisons necessitated two groups. (1) Manual-extracted group: the designers used their skill with tools such as Photoshop, PowerPoint, ProcessOn, etc. to redraw and redesign DNGs by observing the image. (2) Autoextracted group: the designers used our system to extract and redesign DNGs.

We provided five designers with a DNG in bitmap form and then asked them to redraw one DNG using their tools and another one using our system. We supplied a total of 5 DNGs and conducted a total of 25 trials. We calculated the time T, the number of left mouse clicks Nm, and the number of keyboard strokes Nk used by the two groups. The autoextracted group achieved a T of 85 seconds, an Nm of 32.4, and an Nk of 9. The manual-extracted group achieved a T of 1165.8 seconds, an Nm of 465.2, and an Nk of 424.8. This shows that our system reduces designer time for redesigning DNGs.

For each DNG given, we selected pairs with similar and correct results and added them to the pair comparison. There are five pairs in total.

Pairwise Comparison: For the fairness of the comparison, we randomly marked the images of the two
TABLE 1
Evaluation results of the semantic segmentation model

Methods                            | FIoU↑  | MIoU↑  | Back IoU↑ | Rect IoU↑ | Elli IoU↑ | Dia IoU↑ | Edge IoU↑
Chen et al. [43]                   | 0.9529 | 0.5242 | 0.9773    | 0.4114    | 0.5612    | 0.5435   | 0.1276
VividGraph [1]                     | 0.9895 | 0.9407 | 0.9944    | 0.9762    | 0.9859    | 0.9537   | 0.7831
Ours: Attention                    | 0.9908 | 0.9479 | 0.9951    | 0.9801    | 0.9876    | 0.9659   | 0.8107
Ours: L_hybrid                     | 0.9902 | 0.9441 | 0.9948    | 0.9791    | 0.9862    | 0.9585   | 0.8019
Ours: Backbone                     | 0.9901 | 0.9439 | 0.9947    | 0.9821    | 0.9863    | 0.9659   | 0.7906
Ours: Attention+L_hybrid           | 0.9920 | 0.9536 | 0.9957    | 0.9866    | 0.9910    | 0.9712   | 0.8236
Ours: Attention+Backbone           | 0.9904 | 0.9454 | 0.9948    | 0.9820    | 0.9855    | 0.9689   | 0.7959
Ours: Backbone+L_hybrid            | 0.9908 | 0.9465 | 0.9952    | 0.9793    | 0.9881    | 0.9627   | 0.8070
Ours: Attention+Backbone+L_hybrid  | 0.9924 | 0.9562 | 0.9959    | 0.9857    | 0.9908    | 0.9717   | 0.8367
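The L_hybrid rows of Table 1 ablate the hybrid loss function defined in Sec. 4 (not reproduced here). As a hedged sketch, assuming it combines the cross-entropy [70] and Dice [71] losses that the paper cites, a pixelwise version could look like this:

```python
# Hypothetical sketch of a hybrid segmentation loss (our assumption of a
# cross-entropy + Dice combination; the paper's exact formula may differ).
import numpy as np

def hybrid_loss(probs: np.ndarray, onehot: np.ndarray,
                lam: float = 0.5, eps: float = 1e-7) -> float:
    """probs, onehot: (N, k) predicted class probabilities / one-hot labels."""
    ce = -np.mean(np.sum(onehot * np.log(probs + eps), axis=1))   # cross-entropy
    inter = np.sum(probs * onehot, axis=0)                        # per-class overlap
    dice = np.mean(1 - (2 * inter + eps) /
                   (np.sum(probs, axis=0) + np.sum(onehot, axis=0) + eps))
    return lam * ce + (1 - lam) * dice                            # weighted blend
```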
also used segmentation to extract other types of graphs. Chen et al. proposed a model [43] to extract timelines. Since that model does not include a node connection algorithm, we only compare the segmentation modules; for a fair comparison, we trained this module on our training dataset.

We conducted ablation experiments on our dataset to prove the effectiveness of our contributions to the semantic segmentation model. We made three improvements to the model: adding a module with an attention mechanism, using a hybrid loss function, and simplifying the backbone of the model. We evaluate our semantic segmentation model using intersection over union (IoU), mean IoU (MIoU), and frequency-weighted IoU (FIoU), which are defined as follows:

$$\mathrm{IoU}_i = \frac{y_i \cap y'_i}{y_i \cup y'_i},\quad
\mathrm{MIoU} = \frac{1}{k+1}\sum_{i=0}^{k}\frac{p_{ii}}{\sum_{j=0}^{k} p_{ij} + \sum_{j=0}^{k} p_{ji} - p_{ii}},\quad
\mathrm{FIoU} = \frac{\sum_{i=0}^{k}\big(\sum_{j=0}^{k} p_{ij}\big)\,\frac{p_{ii}}{\sum_{j=0}^{k} p_{ij} + \sum_{j=0}^{k} p_{ji} - p_{ii}}}{\sum_{i=0}^{k}\sum_{j=0}^{k} p_{ij}} \tag{5}$$

where p_{ij} counts the pixels whose ground truth is class i and whose prediction is class j, and k is the number of pixel classes; we set k = 5 in our task.
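For concreteness, the metrics of Eq. (5) can be computed from a confusion matrix; a small sketch (not the authors' evaluation code):

```python
# Compute per-class IoU, MIoU, and FIoU from a (k+1)x(k+1) confusion
# matrix p, where p[i, j] counts pixels with ground truth i predicted as j.
import numpy as np

def iou_metrics(p: np.ndarray):
    tp = np.diag(p).astype(float)                 # p_ii
    denom = p.sum(1) + p.sum(0) - tp              # sum_j p_ij + sum_j p_ji - p_ii
    iou = tp / np.maximum(denom, 1)               # per-class IoU
    miou = iou.mean()                             # mean IoU
    fiou = (p.sum(1) * iou).sum() / p.sum()       # frequency-weighted IoU
    return iou, miou, fiou
```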
IoU represents the overlap ratio between the ground-truth region and the predicted region, and it is commonly employed for evaluation [80, 81]. Due to the characteristics of the graphs, the background pixels account for a large proportion, so MIoU is more suitable than FIoU for evaluating our model. The experimental results are shown in Table 1. From the results, we can determine that the IoU of each category has improved, especially the IoU of the edge. Therefore, our modifications to the model have effectively improved its semantic segmentation capabilities on graphs (especially on edges). This is because the attention mechanism we introduced can solve the oversegmentation and undersegmentation problems caused by DNG features, as shown in Fig. 4.

7.2 Structural and Perceptual Evaluation

We propose two methods to evaluate the effectiveness of our extracted results. The first method is NetSimile [22], which is widely used to compare the similarity of two network structures. NetSimile is defined as follows:

$$\mathrm{NetSim}(Pre, Gt) = \sum_{i=1}^{n}\frac{\left|Sig_{pre}^{i} - Sig_{gt}^{i}\right|}{\left|Sig_{pre}^{i}\right| + \left|Sig_{gt}^{i}\right|} \tag{6}$$

where Sig_{pre}^{i} is the i-th component of the signature vector of the extracted network structure and Sig_{gt}^{i} is that of the ground truth. The signature vector contains 35-dimensional features, including node in- and out-degrees, clustering coefficients, and ego-network features. This method measures the similarity of two network structures by computing the Canberra distance of their signature vectors: the lower NetSimile is, the higher the structural similarity. The advantage of using this metric to evaluate structural similarity is that there is no need for a one-to-one correspondence of node numbers; even if a node is not detected, this method can still correctly measure the similarity of the two networks. To present NetSimile more intuitively, we compare the network structure of the ground truth with a network structure with the same number of nodes but no edges to obtain the largest NetSimile value, NetSim_max. Then, the structural similarity StruSim is defined as follows:

$$\mathrm{StruSim}(Pre, Gt) = \left(1 - \frac{\mathrm{NetSim}(Pre, Gt)}{\mathrm{NetSim}_{max}}\right)\times 100\% \tag{7}$$

where StruSim ranges from 0 to 100%; StruSim = 100% means that the two network structures are the same.

The other method is to compare the visual saliency maps of the input image and the redrawn image. The information of the nodes and links, including size, location, and color, affects this metric. We used AntV to redraw the extracted data into a DNG image. We then used the visual saliency model proposed by Bylinskii et al. [23] to output the visual saliency maps of the two images to be compared. This model outputs the important areas of each graph and assigns every pixel a score from 0 to 255. We evaluate the perceptual similarity by comparing the difference in the average scores of the two images. Similarly,
we also divide the difference between this score and the ground-truth score into a perceptual similarity PerSim ∈ [0, 100%]. PerSim is defined as follows:

$$\mathrm{PerSim}(Pre, Gt) = 100\% - \frac{\left|Sali_{pre} - Sali_{gt}\right|}{Sali_{gt}} \tag{8}$$

where Sali_{gt} is the visual saliency map score of the ground truth and Sali_{pre} is the visual saliency map score of our extraction results.
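A compact sketch of Eqs. (6)-(8) follows; the 35-dimensional signature extraction itself is described in [22] and omitted here, and all function names are our own:

```python
# Illustrative implementations of NetSim (Canberra distance), StruSim
# (normalized by the no-edge baseline), and PerSim (mean saliency scores).
import numpy as np

def netsim(sig_pre: np.ndarray, sig_gt: np.ndarray) -> float:
    """Eq. (6): Canberra distance between two 35-D signature vectors."""
    num = np.abs(sig_pre - sig_gt)
    den = np.abs(sig_pre) + np.abs(sig_gt)
    return float(np.sum(num / np.where(den == 0, 1, den)))

def strusim(sig_pre, sig_gt, sig_noedge) -> float:
    """Eq. (7), as a fraction in [0, 1]: NetSim_max compares the ground
    truth with an edgeless graph of the same node count."""
    return 1 - netsim(sig_pre, sig_gt) / netsim(sig_noedge, sig_gt)

def persim(sal_pre: np.ndarray, sal_gt: np.ndarray) -> float:
    """Eq. (8), as a fraction in [0, 1], from 0-255 saliency maps."""
    return 1 - abs(sal_pre.mean() - sal_gt.mean()) / sal_gt.mean()
```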
We collected 100 images from ECharts, D3, AntV, Google, and Xmind, together with user hand-drawn sketches, as our evaluation dataset. We compared our method with VividGraph [1] and OGER [18] (an improved version of OGR [17]). VividGraph does not have a text extraction module, so we use our text detection module to extract texts for a fair comparison. VividGraph uses an untuned segmentation model, and its reconnection algorithm is not suitable for the connection of nonstraight edges. Our real dataset contains many DNGs with hollow nodes and nonstraight edges (as shown in Fig. 1), which results in the poor StruSim achieved by VividGraph. OGER has many limitations: it cannot extract texts or classify nodes, it cannot analyze the shapes of nodes and edges, and it cannot match textual data to graphical data. We added our semantic parsing module to enable a fair comparison; this also enables OGER to handle connections of nonstraight edges in the dataset. However, compared with end-to-end methods, OGER still has many limitations. Since OGER is entirely a heuristic method, we need to adjust the segmentation threshold for different resolutions and different scales of network graphs, and the binarization parameters for different colormaps. We also resize the images to a suitable resolution and set the proper threshold. These operations are not counted in the time of OGER.

Fig. 11. The results of evaluation experiments on a real data corpus (StruSim and PerSim for GraphDecoder, VividGraph, and OGER with semantic parsing). The circles show the median values. Error bars depict the interquartile range.

The evaluation results are shown in Fig. 11. Compared with VividGraph and OGER, our method achieves the best results for both the StruSim and PerSim metrics. The results show that our method can extract data from DNGs in the real world with better quality than other state-of-the-art methods. The error in PerSim is primarily due to the rich styles of many DNGs in the real world, such as gradient colors. When we redraw, our system does not have a gradient color option, and the graph is filled with a single color. However, we can see from StruSim that our method still correctly identifies the connection relationships of these DNGs. GraphDecoder can handle most DNGs, especially DNGs that include noise, nonsolid nodes, and different edge sizes, which are all difficult for morphological methods.

In addition, we found that image resolution and structural complexity are important factors affecting extraction accuracy. Therefore, we constructed an additional set of evaluation data corpora with different resolutions and numbers of nodes. To ensure that the evaluation corpora and the training dataset do not overlap, we chose AntV as the visualization tool for the additional corpus. Each image in the corpus has a different layout, different colors, and different node sizes and edge types. We chose three resolutions of 640×640, 960×960, and 1280×1280 and three node-number ranges around 10, 20, and 30. We conducted comparison experiments against OGER and VividGraph on the nine sets of corpora. The experimental data are shown in Fig. 12. Our method achieves state-of-the-art results in terms of all three metrics. OGER with semantic parsing is a completely heuristic method, so its results change with the resolution and the number of nodes. Although we set optimal thresholds for each set so that OGER could smoothly binarize and skeletonize, our results are still superior to it; this is because OGER's segmentation method does not work for hollow nodes. The segmentation model of VividGraph causes the phenomena of oversegmentation and undersegmentation, and its node reconnection algorithm can only recover the connections of straight edges, so its StruSim is lower. These results confirm that our methods can extract DNGs of different scales.

7.3 Time Performance

Our method also consumes less time than OGER and VividGraph. We first focus on real-world datasets. GraphDecoder spent an average of 3.115 (95% CI: 3.108-3.122, p<0.01) seconds on this corpus, while OGER spent an average of 6.585 (95% CI: 6.507-6.663, p<0.01) seconds and VividGraph spent an average of 3.184 (95% CI: 3.157-3.210, p<0.01) seconds. Our method is faster than OGER because our segmentation uses deep learning, and this time gap increases with the image resolution. Our time performance is also improved over VividGraph because our semantic parsing module traverses each node, with corresponding time complexity O(N); VividGraph traverses every pair of nodes, with complexity O(N²).

We also evaluated the time performance based on the additional datasets. As shown in Fig. 12(c), we found that at the same resolution, when the number of nodes increases, the average time spent by all three methods increases slightly. However, when the resolution
Fig. 12. The results of evaluation experiments on additional sets of evaluation data corpora. In (a), we used structural similarity to evaluate the extraction accuracy of relationships. In (b), we used perceptual similarity to evaluate the extraction accuracy of various visual attributes. In (c), we show the time spent by each method. The white values represent the mean values. Error bars depict ± 95% CIs for mean values. The mean values for GraphDecoder are:

Resolution  | Number of Nodes | Structural Similarity (a) | Perceptual Similarity (b) | Time Performance (c)
640×640     | Node ≤ 10       | 98.65% | 94.00% | 2.99 s
640×640     | Node ≤ 20       | 98.47% | 95.83% | 3.06 s
640×640     | Node ≤ 30       | 98.32% | 96.81% | 3.14 s
960×960     | Node ≤ 10       | 99.08% | 94.62% | 3.04 s
960×960     | Node ≤ 20       | 99.74% | 96.32% | 3.02 s
960×960     | Node ≤ 30       | 99.90% | 97.66% | 3.11 s
1280×1280   | Node ≤ 10       | 98.10% | 95.17% | 3.03 s
1280×1280   | Node ≤ 20       | 99.63% | 97.29% | 3.10 s
1280×1280   | Node ≤ 30       | 99.90% | 98.09% | 3.11 s
increases significantly, while the average time consumption of the others remains stable. This illustrates the time-performance advantage of the deep learning module. In addition, our time performance is better than that of VividGraph under the same conditions. This again proves that our method is more efficient than VividGraph in computing node relations.

8 DISCUSSION

8.1 Deep Learning or Heuristic Methods

The popularity of deep learning has brought tremendous changes to visualization research [82]. We believe that deep learning should be used in scenarios for which heuristic methods have limitations; if heuristic methods work well, then deep learning is of little significance. We designed a deep learning model to segment nodes and edges. Compared with heuristic methods such as OGR [17] and OGER [18], deep learning models have better robustness and time performance. First, heuristic methods need a suitable threshold for binarization, but there is no perfect threshold for the various colormaps of real-world DNGs that distinguishes all targets from backgrounds. Second, deep learning overcomes the limitation of heuristic methods on hollow nodes, because heuristic methods have difficulty distinguishing edges from the outlines of hollow nodes. Third, existing heuristic methods for segmenting network graphs do not include node classification modules; they are only suitable for graphs with circular solid nodes. If we attempted to complete the node classification heuristically, the segmentation methods would need more complex thresholds and rules. Our deep learning model integrates all of these.

In our semantic parsing module, we use heuristic methods to analyze the semantic maps. We did not use deep learning [11, 41, 42] to extract relationships for two reasons. First, the heuristic method can already effectively analyze the relationships, as shown in the structural evaluation in Sec. 7. Second, we extract many attribute values from the semantic maps, including relationships, positions, colors, sizes, and shapes. A single deep learning model cannot accurately regress so many attribute values, and if we designed a model for each attribute, the pipeline would be redundant. For this part, the heuristic method is simple and effective.

8.2 Visualizations or Natural Images

There are works [6, 43] that discuss chart extraction in relation to the difference between graphic and natural elements. Most existing CNN models and attention mechanisms are designed for natural images. On the one hand, they are concerned with large objects such as people, trees, and vehicles. On the other hand, they segment low-level semantics: for example, a person is recognized, but the result is slightly wider and shorter than the ground truth, which has no effect on the desired outcome. However, the situation is different for charts, especially DNGs.

Visualization images require more focus on the ambiguity of basic geometric shapes. DNGs have many edges and nodes with various shapes. Since most of the targets in a DNG are basic geometric shapes, they are easily confused with hollow nodes due to the ambiguity caused by edges during segmentation. When designing a segmentation model for visualization images, we suggest paying attention to the undersegmentation and
[44] C. Rother, V. Kolmogorov, and A. Blake, "'GrabCut': Interactive foreground extraction using iterated graph cuts," ACM Transactions on Graphics (TOG), vol. 23, no. 3, pp. 309–314, 2004.
[45] J. Wu, C. Wang, L. Zhang, and Y. Rui, "Offline sketch parsing via shapeness estimation," in Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015.
[46] X.-L. Yun, Y.-M. Zhang, J.-Y. Ye, and C.-L. Liu, "Online handwritten diagram recognition with graph attention networks," in International Conference on Image and Graphics. Springer, 2019, pp. 232–244.
[47] D. Haehn, J. Tompkin, and H. Pfister, "Evaluating 'graphical perception' with CNNs," IEEE Transactions on Visualization and Computer Graphics, vol. 25, no. 1, pp. 641–650, 2018.
[48] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[49] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[50] F. Chollet, "Xception: Deep learning with depthwise separable convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1251–1258.
[51] L. Giovannangeli, R. Bourqui, R. Giot, and D. Auber, "Toward automatic comparison of visualization techniques: Application to graph visualization," Visual Informatics, 2020.
[52] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.
[53] V. Badrinarayanan, A. Kendall, and R. Cipolla, "SegNet: A deep convolutional encoder-decoder architecture for image segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 12, pp. 2481–2495, 2017.
[54] X. Wang, R. Girshick, A. Gupta, and K. He, "Non-local neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7794–7803.
[55] J. Hu, L. Shen, and G. Sun, "Squeeze-and-excitation networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7132–7141.
[56] O. Oktay, J. Schlemper, L. L. Folgoc, M. Lee, M. Heinrich, K. Misawa, K. Mori, S. McDonagh, N. Y. Hammerla, B. Kainz et al., "Attention U-Net: Learning where to look for the pancreas," in Proceedings of the 1st Conference on Medical Imaging with Deep Learning (MIDL 2018), 2018, pp. 1–10.
[57] L. Wang, J. Giesen, K. T. McDonnell, P. Zolliker, and K. Mueller, "Color design for illustrative visualization," IEEE Transactions on Visualization and Computer Graphics, vol. 14, no. 6, pp. 1739–1754, 2008.
[58] B. Caldwell, M. Cooper, L. G. Reid, G. Vanderheiden, W. Chisholm, J. Slatin, and J. White, "Web content accessibility guidelines (WCAG) 2.0," WWW Consortium (W3C), 2008.
[59] R. Rossi and N. Ahmed, "The network data repository with interactive graph analytics and visualization," in Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.
[60] D. H. Kim, E. Hoque, and M. Agrawala, "Answering questions about charts and generating visual explanations," in Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 2020, pp. 1–13.
[61] C. Lai, Z. Lin, R. Jiang, Y. Han, C. Liu, and X. Yuan, "Automatic annotation synchronizing with textual description for visualization," in Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 2020, pp. 1–13.
[62] B. Saleh, M. Dontcheva, A. Hertzmann, and Z. Liu, "Learning style similarity for searching infographics," in Proceedings of the 41st Graphics Interface Conference, 2015, pp. 59–64.
[63] M. Bostock, V. Ogievetsky, and J. Heer, "D3: Data-driven documents," IEEE Transactions on Visualization and Computer Graphics, vol. 17, no. 12, pp. 2301–2309, 2011.
[64] J. D. Hunter, "Matplotlib: A 2D graphics environment," Computing in Science & Engineering, vol. 9, no. 3, pp. 90–95, 2007.
[65] S. Van der Walt, J. L. Schönberger, J. Nunez-Iglesias, F. Boulogne, J. D. Warner, N. Yager, E. Gouillart, and T. Yu, "scikit-image: Image processing in Python," PeerJ, vol. 2, p. e453, 2014.
[66] Z. Tian, W. Huang, T. He, P. He, and Y. Qiao, "Detecting text in natural image with connectionist text proposal network," in European Conference on Computer Vision. Springer, 2016, pp. 56–72.
[67] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," Advances in Neural Information Processing Systems, vol. 28, pp. 91–99, 2015.
[68] B. Shi, X. Bai, and C. Yao, "An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 11, pp. 2298–2304, 2016.
[69] F. Böschen, T. Beck, and A. Scherp, "Survey and empirical comparison of different approaches for text extraction from scholarly figures," Multimedia Tools and Applications, vol. 77, no. 22, pp. 29475–29505, 2018.
[70] P.-T. De Boer, D. P. Kroese, S. Mannor, and R. Y. Rubinstein, "A tutorial on the cross-entropy method," Annals of Operations Research, vol. 134, no. 1, pp. 19–67, 2005.
[71] X. Li, X. Sun, Y. Meng, J. Liang, F. Wu, and J. Li, "Dice loss for data-imbalanced NLP tasks," in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020, pp. 465–476.
[72] R. L. Wilder, "Evolution of the topological concept of 'connected'," The American Mathematical Monthly, vol. 85, no. 9, pp. 720–726, 1978.
[73] R. M. Haralick, S. R. Sternberg, and X. Zhuang, "Image analysis using mathematical morphology," IEEE Transactions on Pattern Analysis and Machine Intelligence, no. 4, pp. 532–550, 1987.
[74] B. Alper, B. Bach, N. Henry Riche, T. Isenberg, and J.-D. Fekete, "Weighted graph comparison techniques for brain connectivity analysis," in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2013, pp. 483–492.
[75] K.-P. Yee, D. Fisher, R. Dhamija, and M. Hearst, "Animated exploration of graphs with radial layout," in Proc. IEEE InfoVis 2001, 2001, pp. 43–50.
[76] C. C. Gramazio, D. H. Laidlaw, and K. B. Schloss, "Colorgorical: Creating discriminable and preferable color palettes for information visualization," IEEE Transactions on Visualization and Computer Graphics, vol. 23, no. 1, pp. 521–530, 2016.
[77] C. Cook, F. Heath, R. L. Thompson, and B. Thompson, "Score reliability in web- or internet-based surveys: Unnumbered graphic rating scales versus Likert-type scales," Educational and Psychological Measurement, vol. 61, no. 4, pp. 697–706, 2001.
[78] M. Liu and F. G. Conrad, "Where should I start? On default values for slider questions in web surveys," Social Science Computer Review, vol. 37, no. 2, pp. 248–269, 2019.
[79] F. Chollet et al., "Keras," https://github.com/keras-team/keras, 2015.
[80] D. Zhou, J. Fang, X. Song, C. Guan, J. Yin, Y. Dai, and R. Yang, "IoU loss for 2D/3D object detection," in 2019 International Conference on 3D Vision (3DV). IEEE, 2019, pp. 85–94.
[81] M. A. Rahman and Y. Wang, "Optimizing intersection-over-union in deep neural networks for image segmentation," in International Symposium on Visual Computing. Springer, 2016, pp. 234–244.
[82] Q. Wang, Z. Chen, Y. Wang, and H. Qu, "A survey on ML4VIS: Applying machine learning advances to data visualization," IEEE Transactions on Visualization and Computer Graphics, 2021.
[83] L. Yuan, W. Zeng, S. Fu, Z. Zeng, H. Li, C.-W. Fu, and H. Qu, "Deep colormap extraction from visualizations," IEEE Transactions on Visualization and Computer Graphics, pp. 1–1, 2021.
[84] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft COCO: Common objects in context," in European Conference on Computer Vision. Springer, 2014, pp. 740–755.
[85] W. S. Cleveland, The Elements of Graphing Data. Wadsworth Publ. Co., 1985.

Sicheng Song received his B.Eng. from Hangzhou Dianzi University, China, in 2019. He is working toward the Ph.D. degree with East China Normal University, Shanghai, China. His main research interests include information visualization and visual analysis.