Query GPT
Link: https://www.uber.com/en-IN/blog/query-gpt/?uclick_id=6cfc9a34-aa3e-4140-9e8e-34e867b80b2b
Source: Uber
Date: October 22, 2024
Time: 15
authoring queries requires a lot of time, split between searching for relevant
datasets in the data dictionary and then authoring the query in the editor
Architecture
original architecture
relied on simple RAG to fetch (retrieve relevant data from a database) the
relevant samples to include in the query generation call to the LLM
(few-shot prompting) → take the prompt, vectorize it, and do a similarity
search on SQL samples and schemas to fetch 3 relevant tables and 7 relevant
SQL samples (sketched after this list)
SQL sample queries → provide the LLM guidance on how to use the
table schemas provided
schema samples → provide the LLM information about the columns that exist
on those tables
to help the LLM understand internal lingo and work with specific datasets,
some custom instructions were added to the LLM call
worked well for a small set of schemas and SQL samples, but as more
tables and SQL samples were added, accuracy declined
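a minimal sketch of that retrieval step, assuming a toy hash embedding in place of a real embedding model — the `embed` helper and corpus contents are illustrative, only the top-3/top-7 counts come from the note:

```python
# Sketch of the original RAG flow: embed the prompt, run cosine similarity
# against pre-embedded schemas and SQL samples, keep the top 3 tables and
# top 7 samples for the few-shot generation prompt.
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    # Toy bag-of-words hash embedding; a real system would call an
    # embedding model here.
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def top_k(prompt_vec: np.ndarray, corpus: list[tuple[str, np.ndarray]], k: int) -> list[str]:
    # Vectors are unit-normalized, so the dot product is cosine similarity.
    scored = sorted(corpus, key=lambda item: float(prompt_vec @ item[1]), reverse=True)
    return [text for text, _ in scored[:k]]

schemas = [(s, embed(s)) for s in ["CREATE TABLE trips (...)", "CREATE TABLE drivers (...)"]]
samples = [(q, embed(q)) for q in ["SELECT count(*) FROM trips WHERE city = 'SF'"]]

prompt_vec = embed("How many trips were completed in Seattle yesterday?")
relevant_tables = top_k(prompt_vec, schemas, k=3)   # 3 relevant tables
relevant_samples = top_k(prompt_vec, samples, k=7)  # 7 relevant SQL samples
```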
better RAG
a simple similarity search of the prompt against schema samples and SQL
queries doesn't return relevant results as the corpus grows
Current Design
workspaces → curated sets of SQL samples and tables scoped to a business domain
intent agent
incoming prompt first runs through an intent agent → map user question to
one or more business domains/workspaces (and by extension a set of SQL
samples and tables mapped to the domain)
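a hedged sketch of the intent-agent idea — an LLM used as a classifier over a fixed workspace list; `WORKSPACES`, the prompt wording, and the `call_llm` stub are illustrative stand-ins, not Uber's actual names or API:

```python
# Intent agent sketch: classify the user question into business domains
# ("workspaces"), which in turn select the SQL samples and tables passed
# to query generation.
WORKSPACES = ["Mobility", "Ads", "Core Services"]  # hypothetical domain list

INTENT_PROMPT = """You are a classifier. Given a user question about company data,
return a comma-separated list of matching business domains from:
{workspaces}

Question: {question}
Domains:"""

def call_llm(prompt: str) -> str:
    # Stub standing in for a real chat-completion call.
    return "Mobility"

def classify_intent(question: str) -> list[str]:
    raw = call_llm(INTENT_PROMPT.format(workspaces=", ".join(WORKSPACES), question=question))
    # Keep only answers that are actually valid workspaces.
    return [d.strip() for d in raw.split(",") if d.strip() in WORKSPACES]

print(classify_intent("How many trips were completed in Seattle yesterday?"))
```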
table agent
intermittent token-size issues → some requests included one or more
tables whose schemas consumed a large number of tokens
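the note doesn't say how the token issue was handled; one plausible mitigation is pruning columns from oversized schemas before prompting, sketched below — the heuristic tokenizer, ranking, and all names are assumptions:

```python
# Column-pruning sketch: estimate each schema's token cost and, for
# oversized tables, keep only the columns that appear related to the
# question, up to a token budget.
def rough_tokens(text: str) -> int:
    # Crude ~4 chars/token heuristic; a real system would use the
    # model's tokenizer.
    return max(1, len(text) // 4)

def prune_columns(question: str, columns: dict[str, str], budget: int) -> dict[str, str]:
    # Prefer columns whose names appear in the question, then fill the
    # budget greedily.
    q = question.lower()
    ranked = sorted(columns.items(), key=lambda kv: kv[0].lower() in q, reverse=True)
    kept, used = {}, 0
    for name, col_type in ranked:
        cost = rough_tokens(f"{name} {col_type}")
        if used + cost > budget:
            break
        kept[name] = col_type
        used += cost
    return kept

wide_table = {"trip_id": "BIGINT", "city": "VARCHAR", "status": "VARCHAR",
              "fare_usd": "DOUBLE", "driver_notes": "VARCHAR"}
print(prune_columns("completed trips by city", wide_table, budget=8))
```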
Evaluation
to track incremental improvements in performance → a standardized evaluation
procedure is needed
a set of real questions from logs, with manually verified correct intent,
required schemas, and golden SQL
evaluation procedure
table overlap → are the tables identified via Search + Table Agent correct?
run has output → does query execution return >0 records (to check for
hallucinations such as “Finished” instead of “Completed”)
also aggregate accuracy and latency metrics for each evaluation run to
track performance over time
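a minimal sketch of the two per-question signals, using an illustrative golden record and sqlite3 as a stand-in query engine (schema, data, and queries are made up):

```python
# Evaluation sketch: (1) overlap between the tables the agents picked and
# the golden tables, and (2) whether the generated query returns any rows.
import sqlite3

golden = {
    "question": "Which trips were completed?",
    "tables": {"trips"},
    "sql": "SELECT * FROM trips WHERE status = 'Completed'",
}

def table_overlap(predicted: set[str], expected: set[str]) -> float:
    # Fraction of the golden tables that Search + Table Agent identified.
    return len(predicted & expected) / len(expected) if expected else 0.0

def run_has_output(conn: sqlite3.Connection, sql: str) -> bool:
    # Execute the generated query; zero rows can signal a hallucinated
    # literal (e.g., 'Finished' where the data actually says 'Completed').
    try:
        return len(conn.execute(sql).fetchall()) > 0
    except sqlite3.Error:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (trip_id INTEGER, status TEXT)")
conn.execute("INSERT INTO trips VALUES (1, 'Completed')")

generated_sql = "SELECT * FROM trips WHERE status = 'Finished'"
print(table_overlap({"trips"}, golden["tables"]))  # 1.0 -> correct tables found
print(run_has_output(conn, golden["sql"]))         # True -> golden query returns rows
print(run_has_output(conn, generated_sql))         # False -> likely hallucination
```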
limitations
hard to identify error patterns over longer time periods that could be
addressed by specific feature improvements
Learnings
LLMs are excellent classifiers (intermediate agents)
hallucinations (LLMs might generate queries that reference tables or columns
that don't exist)
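a hedged sketch of one way to catch such hallucinations before execution — validate identifiers in the generated SQL against the known catalog; the regex extraction and `SCHEMA` are illustrative, and a real check would use a proper SQL parser:

```python
# Hallucination guard sketch: flag any identifier in the generated SQL
# that is neither a known table/column nor a SQL keyword.
import re

# Illustrative catalog of known tables -> columns; not Uber's actual schema.
SCHEMA = {"trips": {"trip_id", "city", "status"}}

def find_unknown_identifiers(sql: str) -> set[str]:
    # Strip string literals so values like 'Completed' aren't flagged.
    stripped = re.sub(r"'[^']*'", "", sql)
    known = set(SCHEMA) | {c for cols in SCHEMA.values() for c in cols}
    keywords = {"select", "from", "where", "and", "or", "count", "group",
                "by", "as", "on", "join", "order", "limit"}
    tokens = set(re.findall(r"[a-zA-Z_][a-zA-Z_0-9]*", stripped.lower()))
    return tokens - known - keywords

sql = "SELECT count(*) FROM trips WHERE trip_status = 'Completed'"
unknown = find_unknown_identifiers(sql)
if unknown:
    # e.g. {'trip_status'}: the column doesn't exist, so regenerate or repair
    print("possible hallucinated identifiers:", unknown)
```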