
The implementation solution for automatic visualization of tabular data in relational databases based on large language models

2024 International Conference on Asian Language Processing (IALP) | 979-8-3315-4085-2/24/$31.00 ©2024 IEEE | DOI: 10.1109/IALP63756.2024.10661162

Hao Yang
Beijing Advanced Innovation Center for Language Resources
Beijing Language and Culture University
Beijing, China
[email protected]

Zhaoyong Yang
Beijing Advanced Innovation Center for Language Resources
Beijing Language and Culture University
Beijing, China
[email protected]

Ruyang Zhao
Beijing Advanced Innovation Center for Language Resources
Beijing Language and Culture University
Beijing, China
[email protected]

Xiaoran Li
Beijing Advanced Innovation Center for Language Resources
Beijing Language and Culture University
Beijing, China
[email protected]

Gaoqi Rao
Research Institute of International Chinese Language Education
Beijing Language and Culture University
Beijing, China
[email protected]

Abstract—In the data analysis process, visualized data can help users gain better insights. To make it easier and faster for users to obtain visual charts from data, natural language interfaces for data visualization have emerged. Users only need to provide the visualization model with the data to be visualized and a description of their visualization needs, and the model will return a visual chart (NL2VIS). In real-world scenarios, most data is stored in relational databases. To visualize this data, it is first necessary to generate a structured query statement based on the user's visualization requirements (NL2SQL), and then proceed with the subsequent visualization operations. This study breaks down the task of automatic visualization of tabular data in relational databases into three main steps: generating SQL, determining the chart type, and mapping data to visual channels. We utilize the Chain-of-Thought (CoT) technique of generative large language models to address the task of automatic visualization of tabular data. Finally, we evaluated our approach on the nvBench dataset, and the results show that CoT-based automatic visualization of tabular data performs well.

Index Terms—NL2VIS, NL2SQL, Chain-of-Thought, Large Language Model

Supported by NSFC (No. 62076038); an achievement of the Project of Intelligent International Chinese Education at Beijing Language and Culture University.

I. INTRODUCTION

The visualization of tabular data is extremely important in data analysis. A good chart not only clearly presents data characteristics but also helps users gain insights into data patterns. To assist users in automatically visualizing tabular data, a series of commercial software solutions have emerged, such as Power BI [1], Redash [23], Tableau [24], etc. Users can utilize these software tools to visualize their data through interface selections, drag-and-drop operations, and other interactions.

Although these tools can help users visualize tabular data, they require users to have professional data analysis skills and visualization knowledge, which creates a relatively high barrier to entry. Therefore, the academic and industrial communities have begun researching natural language interfaces for tabular data visualization, aiming to achieve automatic visualization of tabular data. The goal is to automatically generate a chart based on the user's natural language description.

Research on automatic visualization of tabular data has generally gone through three main stages: the rule-based stage [2], [5], the deep neural network-based stage [8], [9], and the large language model-based stage [12], [15]. Its generalization ability and robustness have gradually increased with the iterative upgrading of technology. From the perspective of model output, previous research on automatic visualization of tabular data can mainly be divided into two categories: one outputs executable visualization language scripts, and the other outputs abstract expressions. The method of outputting visualization language scripts directly utilizes generative language models to generate target code, such as Vega-Lite [13] or Python [11]. The method of outputting abstract expressions mainly involves the model generating a predefined visualization query statement, such as Vega-Zero (illustrated in Fig. 1) [9] or DVQ (illustrated in Fig. 2) [21], which is then converted into a visualization language program. Unlike
these two mainstream approaches, in this study we need the model to output SQL.

Fig. 1: Vega-Zero abstract statement expression.

Fig. 2: Data Visualization Query (DVQ) abstract statement expression.

To visualize tabular data in relational databases, data acquisition is the first step. Retrieving data from relational databases relies on the execution of SQL statements by the database engine. Therefore, to visualize tabular data in relational databases, we must first generate SQL queries based on the user's natural language description. Subsequently, we proceed with chart selection and mapping data to visual channels.

Large language models like GPT-3.5 have demonstrated powerful text generation and semantic understanding capabilities, achieving state-of-the-art performance in many downstream tasks. Prompt techniques using large language models, such as CoT, imitate human problem-solving approaches by step-by-step reasoning, and have been widely researched by scholars. In this study, we will utilize CoT prompt techniques to address the task of automatic visualization of tabular data in relational databases (CoT-VIS).

II. RELATED WORK

In order to facilitate data analysis and visualization, a series of tools such as Power BI [1], Tableau [24], and Redash [23] have emerged successively. Users can utilize these tools for data visualization, but they often require a significant amount of manual operation. Subsequently, both academia and industry began researching natural language interfaces for data visualization, making user operations more convenient. These natural language interfaces enable users, even those lacking data analysis and visualization experience, to generate charts through language descriptions. The research on natural language interfaces for visualization has gone through three stages: rule-based methods, such as NL4DV [2], Orko [3], Eviza [4], FlowSense [5], and DataTone [6]; deep neural network-based methods, such as ADVisor [7], Seq2Vis [8], ncNet [9], and RGVisNet [10]; and generative large language model-based methods, such as Chat2Vis [11], Prompt4Vis [12], Mirror [13], ChartGPT [14], and LIDA [15]. At present, the development of automated data visualization has progressed to a stage where large-scale models are used to solve problems. Based on large language models, two mainstream solutions have emerged: directly generating target code, and generating visualization abstract expressions. For example, Chat2Vis [11] directly generates Python programs, while Prompt4Vis [12] generates visualization abstract expressions. To further improve the accuracy of data visualization, researchers tend to decompose visualization tasks into several sub-tasks, with each sub-task responsible for solving a specific problem. The results of each sub-task are then fed into subsequent sub-tasks. For example, both ChartGPT [14] and Prompt4Vis [12] decompose the visualization task into sub-tasks and then solve each sub-task one by one, ultimately achieving good results on the visualization task.

Large language models possess powerful In-Context Learning abilities [16]. In order to further enhance their reasoning capabilities, researchers have investigated prompt techniques such as zero-shot [17] and few-shot [16] prompting. Subsequently, researchers discovered the Chain-of-Thought [18] prompt approach, which mimics the human thinking process by solving problems step by step. In academia, other variants based on the chain-of-thought have been studied, such as Contrastive Chain-of-Thought [19] and Least-to-Most [20].

III. DEFINITION OF PROBLEM

When solving problems using prompt-based techniques with generative large language models, users need to provide the model with a prompt for problem solving. The model then performs inference based on the prompt provided by the user and generates a response to the problem. We define the large language model as LLM, the prompt provided by the user as P, and the model's response as A. This process can be formalized as follows:

LLM(P) → A

The automatic visualization of tabular data is the process by which a model provides users with a chart based on natural language descriptions and database schema information. Generating a chart typically requires three types of information:
the chart type, the data, and the mapping between the data and visual channels. Therefore, it is necessary to assemble the user's natural language descriptions and the database schema information into a prompt as input to the LLM. The data objects in this study are tabular data in relational databases. The model needs to provide structured query statements, chart types, and mappings between data and visual channels based on the user's natural language descriptions and the database schema information. The process of automatic visualization of tabular data in relational databases is illustrated in Fig. 3. We define the user's natural language descriptions as NL, the database schema information as D, structured query statements as SQL, chart types as CHART, and the mapping between data and visual channels as MAP. This process can be formalized as follows:

LLM(P(NL, D)) → {SQL, CHART, MAP}

Fig. 3: The workflow of automatic visualization of tabular data in relational databases.

IV. SOLUTION OF COT-VIS

To enhance the performance of large language models on downstream tasks, there are typically two mainstream approaches: first, fine-tuning the model on a dataset specific to the downstream task; second, constructing prompts to leverage the model's In-Context Learning (ICL) ability to solve problems. Fine-tuning a model demands significant computational power and high-quality downstream task datasets, making it highly resource-intensive. In contrast, prompt engineering is simple and easy to implement, as it does not require updating the model's parameters. A well-crafted prompt can guide the model to generate highly accurate answers. Prompt engineering is currently a subject of extensive research in the academic community and has already achieved remarkable success in tasks such as commonsense reasoning, mathematical problem solving, and symbolic reasoning [18].

Next, we will elaborate on how to use prompt techniques in large language models to address the task of automatic visualization of tabular data in relational databases. First, we will summarize the current prompt strategies employed to enhance the reasoning capabilities of large language models. Then, we will select an appropriate solution specifically for the task of automatic visualization of tabular data.

A. Prompt Strategies for Large Language Models

Large language models can leverage their ICL abilities to respond to user queries based on the provided prompt text. Depending on the number of examples given by the user, prompt strategies can be categorized into two types: zero-shot and few-shot styles. Zero-shot prompts do not provide the model with question-answer examples, requiring the model to respond directly to the query. Few-shot prompts, on the other hand, provide the model with several question-answer examples, which can further enhance the accuracy of the model's responses.

The user provides the model with question-answer pairs in the form <q, a>. Here, q represents the user's question along with any additional information needed to solve the problem, and a represents the model's response. Typically, prompts in the <q, a> format are referred to as standard prompts. Researchers have discovered that by adding reasoning steps r, prompting the model to derive the answer step by step, the accuracy of the model's responses can be significantly improved. Consequently, researchers proposed the <q, r, a> format prompt and named it the Chain-of-Thought (CoT) prompt. Usually, users need to manually create several CoT examples to enable the model to mimic this format when generating answers. This type of prompt is known as few-shot CoT prompting. Additionally, there is a method that does not require manually crafted prompts: by providing the model with "Let's think step by step.", the model is induced to produce CoT responses. This is referred to as zero-shot CoT prompting. Empirical evidence shows that manually written few-shot CoT prompts outperform zero-shot CoT prompts. CoT prompting breaks a problem down and solves it step by step, which enhances problem-solving capabilities compared to standard prompts.

B. CoT-VIS

As previously mentioned, visualizing tabular data in a relational database involves three major steps: generating SQL, determining the chart type, and mapping data to visual channels. We will now detail these three steps, and then develop a CoT prompt based on this process.

1) Generating SQL: The model needs to generate the corresponding SQL based on the user's natural language description to query data. Unlike traditional NL2SQL tasks, in a visualization task it is not only necessary to identify the correct data columns and tables but also to perform appropriate data transformations on the data columns, such as data binning.

a) Step 1: Determining Data Columns: After semantically understanding the user's natural language description and the database schema, the language model determines the data columns that need to be queried. Assuming there are n data columns {column_1, column_2, ..., column_n} in the database, this process will ultimately yield an intermediate SQL expression in the following form:

SELECT column_i | column_j | ...
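As a concrete illustration of this first step, consider a hypothetical request "show the average salary of each department" against a hypothetical schema employee(name, department, salary, hire_date); the Step 1 result simply fixes the projection list (rendered here with commas where the schematic form above uses "|"). The schema, request, and helper below are invented for illustration, not taken from the paper's dataset:

```python
# Hypothetical sketch of Step 1: the model's first intermediate result
# is just the list of columns to query, before any table, filter, or
# transformation has been decided.

def step1_select_clause(columns):
    """Render the Step 1 intermediate expression for the chosen columns."""
    return "SELECT " + ", ".join(columns)

# Columns a model might extract for "show the average salary of each
# department" from employee(name, department, salary, hire_date):
chosen = ["department", "salary"]
print(step1_select_clause(chosen))  # SELECT department, salary
```

The later steps then refine this skeleton with FROM, WHERE, GROUP BY, and ORDER BY clauses.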
Fig. 4: Chain-of-thought deduction process for automatic visualization of tabular data in relational databases.

b) Step 2: Determining Data Tables: Based on Step 1, determine the data tables according to the selected data columns. If the data columns come from a single table T_i, then the target data T = T_i. If the selected columns come from multiple tables {T_1, T_2, ..., T_n}, it is necessary to perform multi-table joins based on foreign key information. This process will generate an intermediate SQL expression in the following form:

SELECT column_i | column_j | ...
FROM T

where T = T_i | JOIN(T_i, T_j, ... | Foreign Keys).

c) Step 3: Data Transformation: This step determines whether transformation operations, primarily data binning, are needed for the selected columns. Many previous works relied on the data binning functionality inherent in frontend visualization frameworks (such as Vega-Lite). In this study, data is obtained through generated SQL queries, and SQL itself supports data transformation operations. Specifically, data binning operations are typically implemented using the CASE expression in SQL. If there is a field named timestamp, representing timestamps, and the user requires data to be binned into early morning, morning, afternoon, and evening time periods, the CASE expression can be used to achieve this:

SELECT
    CASE
        WHEN strftime('%H', timestamp)
            BETWEEN '00' AND '05' THEN 'early morning'
        WHEN strftime('%H', timestamp)
            BETWEEN '06' AND '11' THEN 'morning'
        WHEN strftime('%H', timestamp)
            BETWEEN '12' AND '17' THEN 'afternoon'
        WHEN strftime('%H', timestamp)
            BETWEEN '18' AND '23' THEN 'evening'
        ELSE 'unknown'
    END AS bucket
FROM
    T;

Almost all database engines support basic binning operations for time or numeric data. After the data has been binned, we will obtain the following intermediate SQL:

SELECT column_i | column_j | ...
FROM T

where column_i ∈ {column_i, BIN(column_i)}, column_j ∈ {column_j, BIN(column_j)}, ...

d) Step 4: Data Filtering: Based on the user's natural language description, determine the data filtering conditions to obtain the data that meets the user's requirements. In SQL, data filtering is done through the WHERE clause. After this step, we will obtain an intermediate SQL in the following form (Cond represents the condition filtering operation):

SELECT column_i | column_j | ...
FROM T
WHERE Cond(column_i) | Cond(column_j) | ...

where column_i ∈ {column_i, BIN(column_i)}, column_j ∈ {column_j, BIN(column_j)}, ...

e) Step 5: Data Group By: Determine whether to perform a grouping operation (GROUP BY) based on certain data columns. After binning and grouping operations, the data will be divided into different groups based on values. After this step, we will obtain an intermediate SQL in the following form:

SELECT column_i | column_j | ...
FROM T
GROUP BY column_i | column_j | ...

where column_i ∈ {column_i, BIN(column_i)}, column_j ∈ {column_j, BIN(column_j)}, ...
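Steps 3 through 5 can be exercised end to end on a real engine. The following sketch uses SQLite with a hypothetical visits table; the CASE expression mirrors the binning pattern above, while the table, filter condition, and values are invented for illustration:

```python
# Runnable sketch of Steps 3-5 (binning, filtering, grouping) on SQLite.
# The visits table and its rows are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (timestamp TEXT, duration INTEGER)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [("2024-05-01 08:30:00", 12),
     ("2024-05-01 14:10:00", 45),
     ("2024-05-01 19:05:00", 30),
     ("2024-05-01 23:40:00", 5)],
)

rows = conn.execute("""
    SELECT
        CASE
            WHEN strftime('%H', timestamp) BETWEEN '00' AND '05' THEN 'early morning'
            WHEN strftime('%H', timestamp) BETWEEN '06' AND '11' THEN 'morning'
            WHEN strftime('%H', timestamp) BETWEEN '12' AND '17' THEN 'afternoon'
            WHEN strftime('%H', timestamp) BETWEEN '18' AND '23' THEN 'evening'
            ELSE 'unknown'
        END AS bucket          -- Step 3: BIN(timestamp)
    FROM visits
    WHERE duration >= 10       -- Step 4: Cond(duration)
    GROUP BY bucket            -- Step 5: group by the binned column
    ORDER BY bucket
""").fetchall()
print(rows)  # [('afternoon',), ('evening',), ('morning',)]
```

The 23:40 row is dropped by the filter, and the remaining rows collapse into one group per time-of-day bucket.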
f) Step 6: Data Aggregation: After dividing the data into different groups, it is common to perform aggregation (AGG) operations on the data within the same group. Basic column aggregation operations include SUM(), COUNT(), AVG(), etc. After the data aggregation operation, we will obtain the following intermediate SQL:

SELECT column_i | column_j | ...
FROM T
GROUP BY column_i | column_j | ...

where column_i ∈ {column_i, BIN(column_i), AGG(column_i)}, column_j ∈ {column_j, BIN(column_j), AGG(column_j)}, ...

g) Step 7: Data Order By: The final step in generating the target SQL is to determine whether certain columns need to be sorted in ascending or descending order. In visualization charts, it is very common to sort by the names on the x-axis, for example. After this step, we will obtain the final SQL expression:

SELECT column_i | column_j | ...
FROM T
GROUP BY column_i | column_j | ...
ORDER BY column_i | column_j | ... ASC|DESC

where column_i ∈ {column_i, BIN(column_i), AGG(column_i)}, column_j ∈ {column_j, BIN(column_j), AGG(column_j)}, ...

2) Determining the Type of Chart: The language model will determine the type of chart based on the user's natural language description and the generated SQL. Our data visualization system supports seven types of charts: Scatter, Pie, Bar, Stacked Bar, Line, Grouping Scatter, and Grouping Line.

3) Mapping: This step maps the target fields of the SQL query to the visual channels of the data chart. For example, a bar chart has two dimensions: the x-axis and the y-axis. If the SQL query fields are column_1 and column_2, then column_1 may correspond to the x-axis and column_2 to the y-axis of the bar chart. After reasoning based on contextual information, the language model will obtain: {"x-axis": "column_1", "y-axis": "column_2"}.

After generating SQL, determining the chart type, and mapping the visual channels, the model will output all the information needed for visualization through chain-of-thought reasoning. The complete deductive process is illustrated in Fig. 4.

V. EXPERIMENT

a) Dataset: We chose nvBench [8] as the evaluation dataset; it is widely used in the field of data visualization. nvBench is a large dataset designed for complex and cross-domain NL2VIS tasks, covering 105 domains, supporting seven common types of visualizations (Bar, Line, Scatter, Pie, Stacked Bar, Grouping Line, Grouping Scatter), and containing 25,750 (NL, VIS) pairs. Following the experimental validation method outlined in the Prompt4Vis paper [12], we randomly selected 141 databases from nvBench and divided them into training, validation, and test sets in a ratio of 7:2:1. Specifically, the training set contains 98 databases, the validation set contains 14 databases, and the test set contains 29 databases.

b) Model and Methods: In this experiment, we selected the GPT-3.5-turbo model interface provided by OpenAI and set temperature=0. The comparative methods include: zero-shot prompt, few-shot prompt, zero-shot-CoT prompt, and few-shot-CoT prompt. In the zero-shot prompt method, the model directly responds based on the provided database schema information and the user's natural language description. The few-shot prompt method requires manually writing several examples to provide to the model before letting it respond. The zero-shot-CoT prompt method adds "Let's think step by step." to the zero-shot prompt to induce the model to generate chain-of-thought responses. The few-shot-CoT prompt method involves manually writing seven chain-of-thought examples (one for each type of chart) to provide to the model for its response.

c) Metrics: This study comprehensively evaluates the data accuracy (data acc), axis accuracy (axis acc), chart accuracy (chart acc), and overall accuracy (overall acc) of the data visualization system. Data accuracy refers to the execution accuracy of the SQL predicted by the model, calculated as the proportion of correctly executed results to the total number of results. Axis accuracy is the accuracy of mapping data to visual channels, calculated similarly as a proportion. Chart accuracy compares the chart type predicted by the model with the gold chart, also calculated as a proportion.

TABLE I: Experiment result

Method          Data acc   Axis acc   Chart acc   Overall acc
zero-shot       0.304      0.371      0.759       0.217
few-shot        0.501      0.497      0.928       0.326
zero-shot-CoT   0.490      0.494      0.939       0.274
few-shot-CoT    0.559      0.810      0.975       0.490

d) Experiment Result: Experimental results show that with the zero-shot and zero-shot-CoT prompt methods, large language models struggle with the automatic visualization of tabular data, exhibiting very low accuracy. Providing the model with a few examples using few-shot prompts can improve the model's accuracy to some extent. Our designed few-shot-CoT prompts significantly enhance the model's response accuracy. This proves that the chain-of-thought prompts designed for the automatic visualization of tabular data in relational databases are highly effective. However, we must also note that there is still considerable room for improvement in accuracy, which inspires us to design more sophisticated deduction algorithms to further enhance accuracy.
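The execution-based data accuracy described above can be sketched as follows. This is our reading of the metric (predicted SQL counts as correct when its execution result matches the gold SQL's result), not the authors' evaluation code; the table, queries, and values are hypothetical:

```python
# Minimal sketch of an execution-match "data acc" metric on SQLite.
# All data and query pairs are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (c TEXT, v INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [("a", 1), ("a", 2), ("b", 3)])

def execution_match(pred_sql, gold_sql):
    """True when the predicted SQL returns the same result set as the gold SQL."""
    run = lambda q: sorted(conn.execute(q).fetchall())  # order-insensitive compare
    try:
        return run(pred_sql) == run(gold_sql)
    except sqlite3.Error:
        return False  # an unexecutable prediction counts as wrong

pairs = [  # (predicted SQL, gold SQL)
    ("SELECT c, SUM(v) FROM t GROUP BY c", "SELECT c, SUM(v) FROM t GROUP BY c"),
    ("SELECT c, MAX(v) FROM t GROUP BY c", "SELECT c, SUM(v) FROM t GROUP BY c"),
]
data_acc = sum(execution_match(p, g) for p, g in pairs) / len(pairs)
print(data_acc)  # 0.5
```

Axis and chart accuracy would be computed analogously, comparing the predicted mapping and chart type against the gold annotations.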
VI. CONCLUSION

This study divides the task of automatic visualization of tabular data in relational databases into three main steps: generating SQL, determining the chart type, and mapping data to visual channels. Using the Chain-of-Thought technique of large language models, we perform step-by-step reasoning over these three steps. Experimental validation demonstrates that the Chain-of-Thought technique can be effectively applied to the task of automatic data visualization, significantly improving its accuracy. However, we should also note that there is still considerable room for improvement in the accuracy of automatic data visualization tasks, and we will continue to explore ways to enhance this accuracy. Additionally, for the task of automatic visualization of tabular data, the only evaluation dataset available is nvBench. We look forward to the academic community producing more high-quality evaluation datasets.

REFERENCES

[1] Louis T Becker and Elyssa M Gould. 2019. Microsoft Power BI: extending Excel to manipulate, analyze, and visualize diverse data. Serials Review 45, 3 (2019), 184–188.
[2] Arpit Narechania, Arjun Srinivasan, and John Stasko. 2020. NL4DV: A toolkit for generating analytic specifications for data visualization from natural language queries. IEEE Transactions on Visualization and Computer Graphics 27, 2 (2020), 369–379.
[3] Arjun Srinivasan and John Stasko. 2017. Orko: Facilitating multimodal interaction for visual exploration and analysis of networks. IEEE Transactions on Visualization and Computer Graphics 24, 1 (2017), 511–521.
[4] Vidya Setlur, Sarah E Battersby, Melanie Tory, Rich Gossweiler, and Angel X Chang. 2016. Eviza: A natural language interface for visual analysis. In Proceedings of the 29th Annual Symposium on User Interface Software and Technology. 365–377.
[5] Bowen Yu and Cláudio T Silva. 2019. FlowSense: A natural language interface for visual data exploration within a dataflow system. IEEE Transactions on Visualization and Computer Graphics 26, 1 (2019), 1–11.
[6] Tong Gao, Mira Dontcheva, Eytan Adar, Zhicheng Liu, and Karrie G Karahalios. 2015. DataTone: Managing ambiguity in natural language interfaces for data visualization. In Proceedings of the 28th Annual ACM Symposium on User Interface Software & Technology. 489–500.
[7] Can Liu, Yun Han, Ruike Jiang, and Xiaoru Yuan. 2021. ADVISor: Automatic visualization answer for natural-language question on tabular data. In 2021 IEEE 14th Pacific Visualization Symposium (PacificVis). IEEE, 11–20.
[8] Yuyu Luo, Nan Tang, Guoliang Li, Chengliang Chai, Wenbo Li, and Xuedi Qin. 2021. Synthesizing natural language to visualization (NL2VIS) benchmarks from NL2SQL benchmarks. In Proceedings of the 2021 International Conference on Management of Data. 1235–1247.
[9] Yuyu Luo, Nan Tang, Guoliang Li, Jiawei Tang, Chengliang Chai, and Xuedi Qin. 2021. Natural language to visualization by neural machine translation. IEEE Transactions on Visualization and Computer Graphics 28, 1 (2021), 217–226.
[10] Yuanfeng Song, Xuefang Zhao, Raymond Chi-Wing Wong, and Di Jiang. 2022. RGVisNet: A hybrid retrieval-generation neural framework towards automatic data visualization generation. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 1646–1655.
[11] Paula Maddigan and Teo Susnjak. 2023. Chat2VIS: Generating data visualisations via natural language using ChatGPT, Codex and GPT-3 large language models. IEEE Access (2023).
[12] Shuaimin Li, Xuanang Chen, Yuanfeng Song, Yunze Song, and Chen Zhang. 2024. Prompt4Vis: Prompting large language models with example mining and schema filtering for tabular data visualization. arXiv preprint arXiv:2402.07909 (2024).
[13] Canwen Xu, Julian McAuley, and Penghan Wang. 2023. Mirror: A natural language interface for data querying, summarization, and visualization. In Companion Proceedings of the ACM Web Conference 2023. 49–52.
[14] Yuan Tian, Weiwei Cui, Dazhen Deng, Xinjing Yi, Yurun Yang, Haidong Zhang, and Yingcai Wu. 2024. ChartGPT: Leveraging LLMs to generate charts from abstract natural language. IEEE Transactions on Visualization and Computer Graphics (2024).
[15] Victor Dibia. 2023. LIDA: A tool for automatic generation of grammar-agnostic visualizations and infographics using large language models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations). Association for Computational Linguistics, Toronto, Canada, 113–126. https://doi.org/10.18653/v1/2023.acl-demo.11
[16] Tom Brown, Benjamin Mann, Nick Ryder, et al. 2020. Language models are few-shot learners. Advances in Neural Information Processing Systems 33 (2020), 1877–1901.
[17] Jason Wei, Maarten Bosma, Vincent Y Zhao, et al. 2021. Finetuned language models are zero-shot learners. arXiv preprint arXiv:2109.01652 (2021).
[18] Jason Wei, Xuezhi Wang, Dale Schuurmans, et al. 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems 35 (2022), 24824–24837.
[19] Yew Ken Chia, Guizhen Chen, Luu Anh Tuan, et al. 2023. Contrastive chain-of-thought prompting. arXiv preprint arXiv:2311.09277 (2023).
[20] Denny Zhou, Nathanael Schärli, Le Hou, et al. 2022. Least-to-most prompting enables complex reasoning in large language models. arXiv preprint arXiv:2205.10625 (2022).
[21] Yuyu Luo, Xuedi Qin, Nan Tang, et al. 2018. DeepEye: Towards automatic data visualization. In 2018 IEEE 34th International Conference on Data Engineering (ICDE). IEEE, 101–112.
[22] Arvind Satyanarayan, Dominik Moritz, Kanit Wongsuphasawat, et al. 2016. Vega-Lite: A grammar of interactive graphics. IEEE Transactions on Visualization and Computer Graphics 23, 1 (2016), 341–350.
[23] A. Leibzon and Y. Leibzon. 2018. Redash v5 Quick Start Guide: Create and Share Interactive Dashboards Using Redash. Packt Publishing Ltd.
[24] S. Batt, T. Grealis, O. Harmon, et al. 2020. Learning Tableau: A data visualization tool. The Journal of Economic Education 51, 3-4 (2020), 317–328.