Sequential Feature Augmentation for Robust Text-to-SQL
[Figure 1. The overall architecture of SFAM: a sequential feature augmentation module (feature augmentation, pooling, and sequential consistency constraints with discriminators D+ and D-) is placed between the encoder and the decoder; in the running example, the question "How many high-voltage users" is mapped through table and column selection to "select number of high-voltage users from workload of meter readers".]
Belghazi et al. [20] proposed the mutual information neural estimator MINE, a novel method for estimating mutual information that is trained by gradient descent on a neural network. There are also applications of mutual information maximization in graph neural networks and computer vision. DGI, proposed by Veličković et al. [21], maximizes the mutual information between node features and the whole-graph representation to obtain better representations of graph nodes. Ren et al. [22] exploit mutual information for heterogeneous graphs. Zhang et al. [23] designed a pooling method based on mutual information maximization for cross-modal retrieval, which effectively preserves the information of the original graph nodes in the pooled features. Hjelm et al. [24] designed a method based on mutual information maximization to learn better high-dimensional representations. In SFAM, we introduce a pooling method based on mutual information maximization into the sequential feature augmentation module, which effectively enhances the robustness of the Text-to-SQL model.

III. METHOD

In this paper, we propose a novel approach called the Sequential Feature Augmentation Method (SFAM) for Text-to-SQL. In SFAM, we design a sequential feature augmentation module between the encoder and the decoder, which effectively improves the robustness of the Text-to-SQL model through constraints based on sequential consistency learning and mutual information maximization. The overall architecture of SFAM is shown in Figure 1.

A. Step-by-step Text-to-SQL Model

Our work is based on the work of Shen et al. (SPSQL) [8], a step-by-step Text-to-SQL model. In this model, the Text-to-SQL task is divided into four subtasks; a schematic sketch of the pipeline is given at the end of this subsection:

• Table Selection: The purpose of table selection is to select the correct table; a BERT model is used in the table selection subtask.
• Column Selection: The purpose of column selection is to select the correct columns; a BERT model is used in the column selection subtask.
• SQL Generation: The purpose of SQL generation is to generate SQL without values; this subtask is essentially an Encoder-Decoder problem and is implemented with a T5 model.
• Value Filling: The purpose of value filling is to convert SQL without values into SQL with values; this subtask is likewise an Encoder-Decoder problem and is implemented with a T5 model.

In SFAM, we mainly explore how to enhance the robustness of the encoder-decoder model structure within this framework [8].
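To make the division of labor concrete, the following is a minimal sketch of such a four-stage pipeline. The class name, method name, and the four component callables are illustrative placeholders of ours, not the actual SPSQL interface.

```python
# A minimal sketch of a step-by-step Text-to-SQL pipeline in the spirit of
# SPSQL [8]. All names and signatures here are illustrative placeholders.

class StepByStepTextToSQL:
    def __init__(self, table_selector, column_selector, sql_generator, value_filler):
        self.table_selector = table_selector    # BERT-based classifier
        self.column_selector = column_selector  # BERT-based classifier
        self.sql_generator = sql_generator      # T5 encoder-decoder
        self.value_filler = value_filler        # T5 encoder-decoder

    def parse(self, question: str, schema: dict) -> str:
        # 1) Table Selection: pick the relevant table(s) from the schema.
        tables = self.table_selector(question, schema)
        # 2) Column Selection: pick the relevant columns of those tables.
        columns = self.column_selector(question, tables)
        # 3) SQL Generation: produce a SQL skeleton without literal values.
        sql_skeleton = self.sql_generator(question, tables, columns)
        # 4) Value Filling: fill literal values into the skeleton.
        return self.value_filler(question, sql_skeleton)
```

SFAM targets the encoder-decoder (T5) stages: the sequential feature augmentation module described next sits between their encoder and decoder.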
B. Feature Augmentation Method

In the field of computer vision, data augmentation has shown a powerful effect [25], [26], [27]. Inspired by these works, we design a sequential feature augmentation method for Text-to-SQL: for a given feature sequence, our method first segments it and then randomly shuffles the resulting fragments. Our feature augmentation method is shown in Figure 2.

[Figure 2. The feature augmentation method in our SFAM: the original feature sequence (positions 0 1 2 3 4 5) is segmented and its fragments shuffled into an augmented sequence (3 4 5 0 1 2).]
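A minimal sketch of this segment-and-shuffle augmentation, assuming the feature sequence is a PyTorch tensor of shape (seq_len, dim); the number of fragments is a hyperparameter the paper does not specify, so the default here is arbitrary:

```python
import torch

def sequential_feature_augment(features: torch.Tensor, n_fragments: int = 3) -> torch.Tensor:
    """Segment a feature sequence and randomly shuffle the fragments.

    features: (seq_len, dim) tensor. n_fragments is an assumed hyperparameter.
    """
    # Split along the temporal dimension into (roughly) equal fragments,
    # e.g. [0 1 2 3 4 5] -> [0 1 2], [3 4 5] for n_fragments=2.
    fragments = list(torch.chunk(features, n_fragments, dim=0))
    # Randomly permute the fragment order, e.g. -> [3 4 5], [0 1 2].
    order = torch.randperm(len(fragments))
    # Concatenate back into a sequence of the original length.
    return torch.cat([fragments[i] for i in order], dim=0)
```

For the sequence of Figure 2, n_fragments=2 and the sampled order (1, 0) reproduce exactly the augmented order 3 4 5 0 1 2.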
C. Sequential Consistency Module

The augmented feature sequence carries different information from the original feature sequence because the order of its internal fragments is disrupted. If the model can better distinguish the original feature sequence from the augmented one, its performance will clearly improve. Based on this idea, we design a module based on sequential consistency learning. The process is divided into two parts, original feature consistency and augmented feature consistency; the original feature consistency process is as follows.

Let the original feature sequence and the augmented feature sequence after subsequent encoding in the T5 model be denoted as $S_o^s$ and $S_a^s$, respectively, and construct the feature set $S_{all} = \{S_o^s, S_a^s\}$. We calculate the pooled original feature $s_o$ as

$s_o = \mathrm{pool}(S_o^s)$  (1)

where $\mathrm{pool}$ is the pooling operation along the temporal dimension.

The soft nearest neighbor $\hat{s}$ is defined as

$\hat{s} = \sum_{s_i \in S_{all}} \beta_i s_i$  (2)

with

$\beta_i = \dfrac{\exp(\mathrm{sim}(s_o, s_i))}{\sum_{s_i \in S_{all}} \exp(\mathrm{sim}(s_o, s_i))}$  (3)

where $\mathrm{sim}(u, v) = \dfrac{u^{\top} v}{\lVert u \rVert\, \lVert v \rVert}$ denotes the cosine similarity between two vectors $u$ and $v$, and $\beta_i$ is the similarity distribution that signifies the proximity between $s_o$ and each $s_i \in S_{all}$.

We then apply a contrastive loss over the feature pairs $(s_o, s_j)$ between $s_o$ and $S_o^s$, defined similarly to InfoNCE [28]:

$L_{oc} = -\log \dfrac{\exp(\mathrm{sim}(s_o, s_j)/\tau)}{\sum_{s_i \in S_{all}} \exp(\mathrm{sim}(s_o, s_i)/\tau)}$  (4)

where $\tau$ is a temperature parameter. Minimizing $L_{oc}$ encourages the network to keep the pooled original feature $s_o$ close to $S_o^s$ and distant from $S_a^s$.

In the same way, the augmented consistency loss $L_{ac}$ can be obtained. The sequential consistency learning loss $L_{sc}$ is then

$L_{sc} = L_{oc} + L_{ac}$  (5)
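A minimal PyTorch sketch of the original-direction consistency loss of Eqs. (2)-(4); since Eq. (4) is written for a single positive pair, we average it over all features of $S_o^s$, and the temperature default is our assumption, not a value from the paper:

```python
import torch
import torch.nn.functional as F

def original_consistency_loss(s_o: torch.Tensor,
                              S_os: torch.Tensor,
                              S_as: torch.Tensor,
                              tau: float = 0.1) -> torch.Tensor:
    """Contrast the pooled original feature s_o against S_all = {S_os, S_as}.

    s_o: (dim,) pooled feature of Eq. (1); S_os, S_as: (T, dim) sequences.
    tau=0.1 is an assumed default.
    """
    S_all = torch.cat([S_os, S_as], dim=0)                        # (2T, dim)
    sims = F.cosine_similarity(s_o.unsqueeze(0), S_all, dim=-1)   # sim(s_o, s_i)
    # Soft nearest neighbor of Eq. (2); beta of Eq. (3) is a softmax of sims.
    beta = torch.softmax(sims, dim=0)
    s_hat = (beta.unsqueeze(1) * S_all).sum(dim=0)                # as in Eq. (2)
    # InfoNCE-style loss of Eq. (4), averaged over positives s_j in S_os.
    log_prob = sims / tau - torch.logsumexp(sims / tau, dim=0)
    return -log_prob[: S_os.size(0)].mean()
```

The augmented-direction loss $L_{ac}$ reuses the same function with the pooled augmented feature and the roles of the two sequences swapped; $L_{sc}$ of Eq. (5) is their sum.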
D. Pooling based on Mutual Information Maximization

In the SFAM method, we design a pooling method based on mutual information maximization to reduce the information the feature sequence loses after pooling. In the following, we introduce the method taking the pooling in the direction of the original feature sequence as an example.

Let the original feature sequence and the augmented feature sequence after preliminary encoding in the T5 model be denoted as $S_o^p$ and $S_a^p$, respectively. The pooled original feature $s_o$ is calculated with an attention-based pooling:

$s_o = \sum_{s_i \in S_o^s} \alpha_i s_i$  (6)

$\alpha_i = \dfrac{\exp(W_2\, \sigma(W_1 s_i + b_1) + b_2)}{\sum_{s_j \in S_o^s} \exp(W_2\, \sigma(W_1 s_j + b_1) + b_2)}$  (7)

where $\sigma$ is the activation function, $W_1$ and $W_2$ are two learnable transformation weight matrices, and $b_1$ and $b_2$ are two biases.

To reduce this information loss, we maximize the mutual information between $s_o$ and $S_o^s$. Following Veličković et al. [21], we employ a discriminator $D$ implemented as a bilinear layer:

$D(s_i, s_o) = \sigma(s_i^{\top} W_D\, s_o)$  (8)

where $W_D$ is a learnable transformation weight matrix. Veličković et al. [21] theoretically prove that the binary cross-entropy loss function maximizes this mutual information.
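A sketch of the attention pooling of Eqs. (6)-(7); the hidden width and the tanh activation are our assumptions, since the paper does not name $\sigma$ or the layer sizes:

```python
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    """Attention-based pooling of Eqs. (6)-(7)."""

    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        self.W1 = nn.Linear(dim, hidden)  # W1 s_i + b1
        self.W2 = nn.Linear(hidden, 1)    # W2 sigma(.) + b2, a scalar score
        self.act = nn.Tanh()              # sigma; assumed, paper leaves it unnamed

    def forward(self, S: torch.Tensor) -> torch.Tensor:
        # S: (T, dim) feature sequence.
        scores = self.W2(self.act(self.W1(S)))  # (T, 1) unnormalized scores
        alpha = torch.softmax(scores, dim=0)    # Eq. (7): softmax over time
        return (alpha * S).sum(dim=0)           # Eq. (6): weighted sum -> (dim,)
```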
To maximize the mutual information between $s_o$ and $S_o^s$ and to minimize the mutual information between $s_o$ and $S_o^p$, we use the binary cross-entropy loss function from the work of Veličković et al. [21]:

$L_D = \dfrac{1}{2T}\left(\sum_{s_i \in S_o^s} \mathbb{E}\left[\log D(s_i, s_o)\right] + \sum_{s_j \in S_o^p} \mathbb{E}\left[\log\left(1 - D(s_j, s_o)\right)\right]\right)$  (9)

where $T$ is the number of features in $S_o^s$.
In the same way, the loss $\tilde{L}_D$ in the direction of the augmented feature sequence can be obtained. The loss function of the pooling part is then

$L_{MI} = L_D + \tilde{L}_D$  (10)
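A sketch of the discriminator of Eq. (8) and the one-direction objective of Eq. (9), written here as a loss to minimize (the negated objective); the bias term added by nn.Bilinear and the stabilizing epsilon are implementation choices of ours:

```python
import torch
import torch.nn as nn

class MIDiscriminator(nn.Module):
    """Bilinear discriminator of Eq. (8), as in DGI [21]."""

    def __init__(self, dim: int):
        super().__init__()
        self.bilinear = nn.Bilinear(dim, dim, 1)  # s_i^T W_D s_o (plus a bias term)

    def forward(self, S: torch.Tensor, s_o: torch.Tensor) -> torch.Tensor:
        # S: (T, dim); s_o: (dim,), broadcast against every row of S.
        return torch.sigmoid(self.bilinear(S, s_o.expand_as(S))).squeeze(-1)

def mi_loss_one_direction(D: MIDiscriminator,
                          s_o: torch.Tensor,
                          S_os: torch.Tensor,
                          S_op: torch.Tensor,
                          eps: float = 1e-8) -> torch.Tensor:
    """Negated Eq. (9): (s_o, S_os) pairs are positives, (s_o, S_op) negatives.

    The augmented-direction term of Eq. (10) reuses this function with the
    pooled augmented feature and the corresponding sequences.
    """
    pos = D(S_os, s_o)  # discriminator should output ~1 here
    neg = D(S_op, s_o)  # discriminator should output ~0 here
    T = S_os.size(0)
    return -(torch.log(pos + eps).sum() + torch.log(1 - neg + eps).sum()) / (2 * T)
```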
E. Overall Formulation

Let the original loss function of the model be denoted $L_{base}$. The overall loss function $L$ is

$L = L_{base} + \lambda L_{sc} + \mu L_{MI}$  (11)

where $\lambda$ and $\mu$ are weighting hyperparameters.
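Putting the pieces together, Eq. (11) amounts to a weighted sum of the three terms; the defaults below are placeholders rather than the tuned values of the paper:

```python
def overall_loss(l_base, l_sc, l_mi, lam: float = 1.0, mu: float = 1.0):
    """Eq. (11): base task loss plus the weighted sequential consistency
    (Eq. (5)) and mutual information (Eq. (10)) terms.
    lam and mu defaults are placeholders, not the paper's tuned values."""
    return l_base + lam * l_sc + mu * l_mi
```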
[Table I. Overall comparison experiment.]

[Table II. Robustness comparison experiment.]

Table III. Ablation experiments on robustness comparison.

Model              | Logic Form Accuracy
SPSQL [8]          | 91.6%
average pooling    | 93.7%
w/o MI             | 92.8%
SFAM (our method)  | 95.1%

…it is because of the role of the sequential feature augmentation module in our SFAM method that such an effect is achieved.

To verify the robustness of our SFAM method, we design the experiments shown in Table II. Compared with the results in Table I, when the other models [29], [30], [31], [8] encounter different types of text queries, the accuracy of their SQL generation drops significantly, while our SFAM method almost maintains its SQL generation accuracy. The experimental results in Table II demonstrate the strong robustness of our SFAM method.

[Figure: hyperparameter sensitivity. (a) The influence of different λ; (b) the influence of different μ; (c) the influence of different τ.]
…not only improve the robustness of the model but also further improve the accuracy of the model's SQL generation. In the future, we will consider applying our SFAM to practical scenarios.

ACKNOWLEDGMENT

This work is supported by Zhejiang Electric Power Co., Ltd. Science and Technology Project (No. 5211YF220006).

REFERENCES

[1] L. Dong and M. Lapata, "Language to logical form with neural attention," arXiv preprint arXiv:1601.01280, 2016.
[2] C. Wang, M. Brockschmidt, and R. Singh, "Pointing out SQL queries from text," 2018.
[3] V. Zhong, C. Xiong, and R. Socher, "Seq2SQL: Generating structured queries from natural language using reinforcement learning," arXiv preprint arXiv:1709.00103, 2017.
[4] X. Xu, C. Liu, and D. Song, "SQLNet: Generating structured queries from natural language without reinforcement learning," arXiv preprint arXiv:1711.04436, 2017.
[5] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, "RoBERTa: A robustly optimized BERT pretraining approach," arXiv preprint arXiv:1907.11692, 2019.
[6] Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. R. Salakhutdinov, and Q. V. Le, "XLNet: Generalized autoregressive pretraining for language understanding," Advances in Neural Information Processing Systems, vol. 32, 2019.
[7] Z. Zhang, X. Han, Z. Liu, X. Jiang, M. Sun, and Q. Liu, "ERNIE: Enhanced language representation with informative entities," arXiv preprint arXiv:1905.07129, 2019.
[8] R. Shen, G. Sun, H. Shen, Y. Li, L. Jin, and H. Jiang, "SPSQL: Step-by-step parsing based framework for text-to-SQL generation," in CMVIT 2023 Conference Proceedings, 2023.
[9] X. Meng and S. Wang, "NChiql: The Chinese natural language interface to databases," in Database and Expert Systems Applications: 12th International Conference, DEXA 2001, Munich, Germany, September 3–5, 2001, Proceedings. Springer, 2001, pp. 145–154.
[10] F. Li and H. V. Jagadish, "Constructing an interactive natural language interface for relational databases," Proceedings of the VLDB Endowment, vol. 8, no. 1, pp. 73–84, 2014.
[11] L. Wan, Q. Wang, A. Papir, and I. L. Moreno, "Generalized end-to-end loss for speaker verification," in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2018, pp. 4879–4883.
[12] A.-M. Popescu, A. Armanasu, O. Etzioni, D. Ko, and A. Yates, "Modern natural language interfaces to databases: Composing statistical parsing with semantic tractability," in COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics, 2004, pp. 141–147.
[13] C. Unger, L. Bühmann, J. Lehmann, A.-C. Ngonga Ngomo, D. Gerber, and P. Cimiano, "Template-based question answering over RDF data," in Proceedings of the 21st International Conference on World Wide Web, 2012, pp. 639–648.
[14] C. Jinchao, H. Tao, C. Gang, W. Xiaofan, and C. Ke, "Research on technology of generating multi-table SQL query statement by natural language," Journal of Frontiers of Computer Science & Technology, vol. 14, no. 7, p. 1133, 2020.
[15] T. Zhou, P. Krahenbuhl, M. Aubry, Q. Huang, and A. A. Efros, "Learning dense correspondence via 3D-guided cycle consistency," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 117–126.
[16] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, "Unpaired image-to-image translation using cycle-consistent adversarial networks," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2223–2232.
[17] Q. Kong, W. Wei, Z. Deng, T. Yoshinaga, and T. Murakami, "Cycle-contrast for self-supervised video representation learning," Advances in Neural Information Processing Systems, vol. 33, pp. 8089–8100, 2020.
[18] W. Jin, Z. Zhao, P. Zhang, J. Zhu, X. He, and Y. Zhuang, "Hierarchical cross-modal graph consistency learning for video-text retrieval," in Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021, pp. 1114–1124.
[19] C. E. Shannon, "A mathematical theory of communication," ACM SIGMOBILE Mobile Computing and Communications Review, vol. 5, no. 1, pp. 3–55, 2001.
[20] M. I. Belghazi, A. Baratin, S. Rajeshwar, S. Ozair, Y. Bengio, A. Courville, and D. Hjelm, "Mutual information neural estimation," in International Conference on Machine Learning. PMLR, 2018, pp. 531–540.