The Implementation Solution for Automatic Visualization of Tabular Data in Relational Databases Based on Large Language Models (1)
Gaoqi Rao
Research Institute of International Chinese Language Education
Beijing Language and Culture University
Beijing, China
[email protected]
Abstract—In the data analysis process, visualized data can help users gain better insights. To make it easier and faster for users to obtain visual charts from data, natural language interfaces for data visualization have emerged. Users only need to provide the visualization model with the data to be visualized and a description of their visualization needs, and the model returns a visual chart (NL2VIS). In real-world scenarios, most data is stored in relational databases. To visualize this data, it is first necessary to generate a structured query statement based on the user's visualization requirements (NL2SQL), and then proceed with the subsequent visualization operations. This study breaks down the task of automatic visualization of tabular data in relational databases into three main steps: generating SQL, determining the chart type, and mapping data to visual channels. We utilize the Chain-of-Thought (CoT) technique of generative large language models to address the task of automatic visualization of tabular data. Finally, we evaluate our approach on the nvBench dataset, and the results show that CoT-based automatic visualization of tabular data performs well.
Index Terms—NL2VIS, NL2SQL, Chain-of-Thought, Large Language Model

Supported by NSFC (No. 62076038), achievements of the Project of Intelligent International Chinese Education at Beijing Language and Culture University

I. INTRODUCTION

The visualization of tabular data is extremely important in data analysis. A good chart not only clearly presents data characteristics but also helps users gain insights into data patterns. To assist users in visualizing tabular data, a series of commercial software solutions have emerged, such as Power BI [1], Redash [23], Tableau [24], etc. Users can utilize these software tools to visualize their data through interface selections, drag-and-drop operations, and other interactions.

Although these tools can help users visualize tabular data, they require users to have professional data analysis skills and visualization knowledge, which creates a relatively high barrier to entry. Therefore, the academic and industrial communities have begun researching natural language interfaces for tabular data visualization, aiming to achieve automatic visualization of tabular data. The goal is to automatically generate a chart based on the user's natural language description.

Research on automatic visualization of tabular data has generally gone through three main stages: the rule-based stage [2], [5], the deep neural network-based stage [8], [9], and the large language model-based stage [12], [15]. The generalization ability and robustness of these methods have gradually increased as the underlying technology has advanced. From the perspective of model output, previous research on automatic visualization of tabular data can mainly be divided into two categories. One outputs executable visualization language scripts, and the other outputs abstract expressions. The method of outputting visualization language scripts directly utilizes generative language models to generate target code, such as Vega-Lite [13], Python [11], etc. The method of outputting abstract expressions mainly involves the model generating a predefined visualization query statement, such as Vega-Zero (illustrated in Fig. 1) [9], DVQ (illustrated in Fig. 2) [21], etc., which is then converted into a visualization language program. Unlike