NL2SQL_Schema_Linked_Guide
This guide walks you through two complete approaches for building a
Natural Language to SQL system with schema linking.
Goal
Translate natural language questions to SQL by understanding schema structure using:
1. NER-Based Schema Linking with a BERT classifier
2. Schema-Aware Prompting using T5
Approach 1: NER-Based Schema Linking
Train a BERT-based NER model to label parts of the natural language query as table names, column names, or values.
Why:
Tagging question tokens tells the system which tables, columns, and values the question actually refers to, which grounds the generated SQL in the real schema.
⚙ How:
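First, tokenize the questions and align word-level tags to subword tokens. The sketch below is illustrative: the guide fixes num_labels=4, so a natural tag set is O, TABLE, COLUMN, and VALUE; the tag names, example question, and word-level labels are assumptions for demonstration.

from transformers import BertTokenizerFast

# Assumed tag set matching num_labels=4: O (other), TABLE, COLUMN, VALUE
label2id = {"O": 0, "TABLE": 1, "COLUMN": 2, "VALUE": 3}

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

# Toy example: pre-tokenized questions with one tag per word
questions = [["show", "names", "of", "singers", "older", "than", "30"]]
word_labels = [["O", "COLUMN", "O", "TABLE", "COLUMN", "O", "VALUE"]]

encodings = tokenizer(questions, is_split_into_words=True,
                      truncation=True, padding=True)

# Align word-level tags to subword tokens; special tokens get -100 so the
# loss ignores them
labels = []
for i, tags in enumerate(word_labels):
    labels.append([-100 if word_id is None else label2id[tags[word_id]]
                   for word_id in encodings.word_ids(batch_index=i)])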
Dataset Class
import torch
from torch.utils.data import Dataset

class NL2SQLNERDataset(Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        # Return one tokenized example and its token-level labels as tensors
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.labels)
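Assuming the encodings and labels built above, the dataset is constructed directly from them:

train_dataset = NL2SQLNERDataset(encodings, labels)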
Training
from transformers import BertForTokenClassification, TrainingArguments

model = BertForTokenClassification.from_pretrained("bert-base-uncased",
                                                   num_labels=4)
training_args = TrainingArguments(output_dir='./ner_model',
                                  per_device_train_batch_size=8,
                                  num_train_epochs=5)
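A minimal sketch of the training loop, assuming the train_dataset built from the dataset class above (the guide does not show the Trainer call itself):

from transformers import Trainer

trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
trainer.train()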
Why:
This lets the model learn to align question tokens to schema elements using
attention mechanisms.
Output:
A token-level tagger that labels each question token as a table name, column name, value, or other (the four classes behind num_labels=4).
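To illustrate what the tagger produces, here is an assumed inference sketch (not from the guide) that maps predictions back to tag names using the label2id mapping above:

import torch

id2label = {v: k for k, v in label2id.items()}

model.eval()
enc = tokenizer([["list", "singer", "names"]], is_split_into_words=True,
                return_tensors="pt")
enc = {k: v.to(model.device) for k, v in enc.items()}
with torch.no_grad():
    logits = model(**enc).logits
preds = logits.argmax(dim=-1)[0].tolist()

# Print each subword token with its predicted schema tag
tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
for token, pred in zip(tokens, preds):
    print(token, id2label[pred])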
Next Steps
• Integrate grammar-constrained decoding using PICARD
• Evaluate using Spider dataset metrics: exact match and execution accuracy (sketched below)
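A rough sketch of execution accuracy under simple assumptions (this is not the official Spider evaluator): a predicted query counts as correct if it returns the same rows as the gold query on the target database.

import sqlite3
from collections import Counter

def execution_match(db_path, gold_sql, pred_sql):
    # Compare the result multisets of gold and predicted SQL on one database
    conn = sqlite3.connect(db_path)
    try:
        gold = conn.execute(gold_sql).fetchall()
        try:
            pred = conn.execute(pred_sql).fetchall()
        except sqlite3.Error:
            return False  # predicted SQL fails to execute
        return Counter(gold) == Counter(pred)
    finally:
        conn.close()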