
PROJECT 2: UNDERSTANDING QUERY PLANS DURING DATA EXPLORATION
SC3020 DATABASE SYSTEM PRINCIPLES
TOTAL MARKS: 100

Due Date: April 16, 2023; 11:59 PM

Real-world users may write a sequence of SQL queries to explore the underlying
relational database for a specific task. For instance, a user may start with a SQL query
Q, execute it, and browse the results, then modify Q into Q’ (e.g., by changing certain
predicates in the WHERE clause), re-execute it, and view the refined results. Such
exploration can continue iteratively, with the user executing a sequence of related SQL
queries and browsing the corresponding results each time. The following example shows
Q and Q’, where Q’ adds one predicate to the WHERE clause of Q.

Query Q:
    select *
    from customer C, orders O
    where C.c_custkey = O.o_custkey

Query Q’:
    select *
    from customer C, orders O
    where C.c_custkey = O.o_custkey
      and C.c_name like '%cheng'
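One simple way to relate two queries such as Q and Q’ is a line-based textual diff. The sketch below uses Python's standard difflib module and assumes each clause sits on its own line; the function name sql_changes is hypothetical. A more general solution would likely parse the SQL into clauses and predicates before comparing them.

```python
from __future__ import annotations

import difflib


def sql_changes(old_sql: str, new_sql: str) -> dict[str, list[str]]:
    """List the lines removed from and added to a query (hypothetical helper).

    Whitespace inside each line is collapsed first so that pure re-indentation
    is not reported as a change.
    """
    def normalise(sql: str) -> list[str]:
        return [" ".join(line.split()) for line in sql.splitlines() if line.strip()]

    removed, added = [], []
    for entry in difflib.ndiff(normalise(old_sql), normalise(new_sql)):
        # ndiff prefixes each line with "- ", "+ ", "  " (unchanged) or "? " (hints).
        if entry.startswith("- "):
            removed.append(entry[2:])
        elif entry.startswith("+ "):
            added.append(entry[2:])
    return {"removed": removed, "added": added}


q = """select *
from customer C, orders O
where C.c_custkey = O.o_custkey"""
q_prime = q + "\n  and C.c_name like '%cheng'"
print(sql_changes(q, q_prime))
# {'removed': [], 'added': ["and C.c_name like '%cheng'"]}
```

Because the comparison is purely textual, rewriting the same predicate in a different form would be reported as a removal plus an addition.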

For each such SQL query, the RDBMS query optimizer selects a query execution plan
(QEP), which is then executed. In the example above, there will be two QEPs, P and P’,
associated with Q and Q’, respectively. DBMS software (e.g., PostgreSQL) typically
displays these QEPs as trees. Unfortunately, for an end user who is not proficient in
database technology, a tree may not be the best way to understand how each of her
queries was executed during data exploration.
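PostgreSQL can report a QEP as JSON through EXPLAIN (FORMAT JSON): each node carries a "Node Type", operators with inputs carry a "Plans" list of child nodes, and scan nodes also carry a "Relation Name". The minimal sketch below prints such a tree as indented text. The function name render_plan is hypothetical, and the plan dictionary is hand-written in that shape for illustration; it is not actual PostgreSQL output for Q.

```python
def render_plan(node: dict, depth: int = 0) -> list[str]:
    """Return one indented line per plan node, parents before children."""
    label = node["Node Type"]
    if "Relation Name" in node:  # present on scan nodes
        label += f" on {node['Relation Name']}"
    lines = ["  " * depth + label]
    for child in node.get("Plans", []):
        lines.extend(render_plan(child, depth + 1))
    return lines


# Hand-written example in the shape of PostgreSQL's JSON plan output.
plan_p = {
    "Node Type": "Hash Join",
    "Plans": [
        {"Node Type": "Seq Scan", "Relation Name": "orders"},
        {"Node Type": "Hash", "Plans": [{"Node Type": "Seq Scan", "Relation Name": "customer"}]},
    ],
}
print("\n".join(render_plan(plan_p)))
# Hash Join
#   Seq Scan on orders
#   Hash
#     Seq Scan on customer
```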

Your task is to write a program that automatically generates a user-friendly explanation
(e.g., a description in natural and visual language) of the changes to the query execution
plans that take place during data exploration. Specifically, let P1, P2, …, Pn be the QEPs
generated by the DBMS for executing a sequence of queries Q1, Q2, …, Qn,
respectively, during data exploration. Note that the queries are related because they have
evolved from the original query Q1. Hence, the QEPs may also share common content.
Your program should generate a user-friendly description of how the plans have evolved
during data exploration (e.g., a hash join in P1 has become a sort-merge join in P2
because of changes to the WHERE clause in Q2).

Project 2/SC3020
Hint: Design an algorithm that efficiently identifies the parts of the query plan trees that
have evolved, and explain those parts to the end user through a combination of visual and
natural-language descriptions that connect them to the corresponding changes in the SQL.
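A very simple instance of this hint is a parallel top-down walk over the two plan trees that reports every node whose operator or relation differs. The sketch below assumes plans in PostgreSQL's JSON shape (nodes with "Node Type", optional "Relation Name", and a "Plans" list of children) and matches children by position, which breaks down when the optimizer reorders a join's inputs; a tree edit distance algorithm such as Zhang-Shasha would be a more general, and more expensive, alternative. The function names, the "root.1.0" path notation, and the toy plans are all hypothetical.

```python
from __future__ import annotations

from itertools import zip_longest


def describe(node: dict) -> str:
    """Short label for a plan node, e.g. 'Seq Scan on orders'."""
    label = node["Node Type"]
    if "Relation Name" in node:
        label += f" on {node['Relation Name']}"
    return label


def diff_plans(old: dict | None, new: dict | None, path: str = "root") -> list[str]:
    """Compare two plan trees node by node, matching children by position.

    Only the root of an added or removed subtree is reported.
    """
    if old is None and new is None:
        return []
    if old is None:
        return [f"{path}: added {describe(new)}"]
    if new is None:
        return [f"{path}: removed {describe(old)}"]
    changes = []
    if describe(old) != describe(new):
        changes.append(f"{path}: {describe(old)} -> {describe(new)}")
    children = zip_longest(old.get("Plans", []), new.get("Plans", []))
    for i, (old_child, new_child) in enumerate(children):
        changes.extend(diff_plans(old_child, new_child, f"{path}.{i}"))
    return changes


# Hand-written toy plans (not real PostgreSQL output).
p1 = {"Node Type": "Hash Join", "Plans": [
    {"Node Type": "Seq Scan", "Relation Name": "orders"},
    {"Node Type": "Hash", "Plans": [{"Node Type": "Seq Scan", "Relation Name": "customer"}]}]}
p2 = {"Node Type": "Hash Join", "Plans": [
    {"Node Type": "Seq Scan", "Relation Name": "orders"},
    {"Node Type": "Hash", "Plans": [{"Node Type": "Index Scan", "Relation Name": "customer"}]}]}
print(diff_plans(p1, p2))
# ['root.1.0: Seq Scan on customer -> Index Scan on customer']
```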

To this end, your tasks are as follows:

• Design and implement an algorithm that takes the following as input:

a. Old query Q1 and its QEP P1
b. New query Q2 and its QEP P2

It generates a user-friendly description of what has changed from P1 to P2, and
why. Your goal is to ensure the generality of the solution (i.e., it can handle a wide
variety of query plans on different database instances), and the user-friendly
explanation should be concise without sacrificing important information about
the plan. The better the algorithm design for the task, the more credit you
will receive. Similarly, the more functionalities you support, the more credit you
will receive.

• A user-friendly graphical user interface (GUI) that supports the aforementioned goals.
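Once a changed node has been found, one way to make the description user-friendly is to pair each operator name with a plain-English gloss and to mention the SQL change that accompanied it. The sketch below is hypothetical: the glossary covers only a few PostgreSQL node types, the function names are illustrative, and the sentence says the SQL change "coincides with" the plan change rather than claiming it caused it, since the optimizer's choice also depends on statistics and cost estimates.

```python
from __future__ import annotations

# Plain-English glosses for a few PostgreSQL plan node types (illustrative, not exhaustive).
GLOSSARY = {
    "Seq Scan": "reads every row of the table",
    "Index Scan": "uses an index to fetch matching rows",
    "Hash Join": "builds a hash table on one input and probes it with the other",
    "Merge Join": "merges two inputs that are sorted on the join key",
    "Nested Loop": "checks each row of one input against the other input",
}


def gloss(op: str) -> str:
    """Operator name followed by its gloss, if one is known."""
    return f"{op} ({GLOSSARY[op]})" if op in GLOSSARY else op


def explain_change(old_op: str, new_op: str, added_sql: list[str]) -> str:
    """Build a one-sentence explanation of an operator change (hypothetical helper)."""
    text = f"One operator changed from {gloss(old_op)} to {gloss(new_op)}."
    if added_sql:
        text += " This coincides with the following addition to the query: "
        text += "; ".join(added_sql) + "."
    return text


print(explain_change("Hash Join", "Merge Join", ["and C.c_name like '%cheng'"]))
```

In a full solution, the operator names could come from a plan diff and the added SQL lines from a query diff, so that each plan change is shown next to the query edit that preceded it.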

You should use Python as the host language on the Windows platform for your project.
Students using a Mac can install Windows on it by following the instructions at
https://support.apple.com/en-sg/HT201468. The DBMS allowed in this project is
PostgreSQL. The example dataset you should use for this project is TPC-H (see
Appendix). You are free to use any off-the-shelf toolkits for your project.
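Connecting Python to PostgreSQL is typically done through a driver such as psycopg2; this is an assumption, since the project does not prescribe one. The sketch below asks PostgreSQL for the QEP in JSON form with EXPLAIN (FORMAT JSON), which plans the query without running it, and returns the root plan node. The function name fetch_plan is hypothetical.

```python
import json


def fetch_plan(conn, sql: str) -> dict:
    """Return the root node of the QEP PostgreSQL chooses for `sql`.

    `conn` is assumed to be an open psycopg2 connection. Plain EXPLAIN only
    plans the query; unlike EXPLAIN ANALYZE, it does not execute it.
    """
    with conn.cursor() as cur:
        cur.execute("EXPLAIN (FORMAT JSON) " + sql)
        (result,) = cur.fetchone()  # one row with one column holding the plan
    # psycopg2 normally decodes the json column already; parse it if it is text.
    if isinstance(result, str):
        result = json.loads(result)
    return result[0]["Plan"]
```

For example, with `conn = psycopg2.connect(dbname="TPC-H", user="postgres")` (the database name follows the Appendix; the user name is an assumption), `fetch_plan(conn, "select * from region")` would return a dictionary describing the top plan node.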

Note that several parts of the project are intentionally left open-ended (e.g., What should
the GUI look like? Which functionalities should be supported? How should the changes be
explained to an end user?) so that the project does not curb a group’s creative
endeavors. You are free to make realistic assumptions to achieve these tasks.

SUBMISSION REQUIREMENTS

Your submission should include the following:

• You should submit three program files: interface.py, explain.py, and project.py.
The file interface.py contains the code for the GUI, explain.py contains the
code for generating the explanation, and project.py is the main file that invokes
all the necessary procedures. Note that we shall run the project.py file
(either from the command prompt or using the PyCharm IDE) to execute the
software. Make sure your code follows good coding practice: sufficient
comments, proper variable/function naming, etc. We will execute the software
with different query sets and datasets to check its correctness and the
generality of the solution. We will also assess the quality of the algorithm design
with respect to processing the query plans.
• A softcopy report containing details of the software, including formal descriptions
of the key algorithms with examples. You should also discuss the limitations of the
software (if any).
• A peer assessment report from each member of the team. Each member of a team
needs to assess the contributions of the group members. Details of the peer
assessment form will be provided closer to the submission date.
• A document containing instructions to run your software successfully. You will
not receive any credit if your software fails to execute when following your
instructions.
• All submissions will be through NTU Learn.

Note: Late submission will be penalized.

Appendix

I. Creating TPC-H database in PostgreSQL

Follow these steps to generate the TPC-H data:

1) Go to
http://www.tpc.org/tpc_documents_current_versions/current_specifications5.asp
and download TPC-H Tools v2.18.0.zip. Note that the version may differ, as the tool
may have been updated by the developer.
2) Unzip the package. You will find a folder “dbgen” in it.
3) To generate an instance of the TPC-H database:
• Open tpch.vcproj using Visual Studio.
• Build the tpch project. When the build succeeds, a command prompt will
appear with “TPC-H Population Generator <Version 2.17.3>” and several *.tbl
files will be generated. You should expect the following .tbl files: customer.tbl,
lineitem.tbl, nation.tbl, orders.tbl, part.tbl, partsupp.tbl, region.tbl, supplier.tbl
• Save these .tbl files as .csv files.
• These .csv files contain an extra “|” character at the end of each line, which is
incompatible with the format that PostgreSQL expects. Write a small piece of
code to remove the last “|” character in each line. You are then ready to load
the .csv files into PostgreSQL.
• Open PostgreSQL and add a new database “TPC-H”.
• Create new tables for “customer”, “lineitem”, “nation”, “orders”, “part”,
“partsupp”, “region” and “supplier”.
• Import the relevant .csv file into each table. Note that pgAdmin4 for PostgreSQL
(Windows version) allows you to perform the import easily. You can choose to view
the first 100 rows to check whether the import has been done correctly.
If you encounter an error (e.g., ERROR: extra data after last expected column)
while importing, create the columns of each table before importing. Note
that the type of each column has to be set appropriately. You may use the
SQL commands in Appendix II to create the tables.
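The clean-up step that removes the trailing “|” could be done with a short Python script such as the hypothetical sketch below. It assumes the dbgen output files sit in one folder and writes a .csv next to each .tbl, so it also covers the renaming step; the function names are illustrative. The fields are still separated by “|”, so that character would need to be chosen as the delimiter when importing.

```python
from pathlib import Path

TPCH_TABLES = ["customer", "lineitem", "nation", "orders", "part", "partsupp", "region", "supplier"]


def strip_trailing_pipe(src: Path, dst: Path) -> None:
    """Copy src to dst, removing the single '|' that dbgen puts at the end of each line."""
    with src.open(encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
        for line in fin:
            line = line.rstrip("\r\n")
            if line.endswith("|"):
                line = line[:-1]
            fout.write(line + "\n")


def convert_folder(folder: Path) -> None:
    """Produce <table>.csv from <table>.tbl for every TPC-H table in folder."""
    for name in TPCH_TABLES:
        strip_trailing_pipe(folder / f"{name}.tbl", folder / f"{name}.csv")
```

For example, `convert_folder(Path("dbgen"))` would process the eight files if they were generated inside the “dbgen” folder.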

Alternatively, you can also refer to https://docs.verdictdb.org/tutorial/tpch/ for
additional help on creating the TPC-H database.

II. SQL commands for creating TPC-H data tables

Region table

1) Nation table

2) Part table

3) Supplier table

4) Partsupp table

5) Customer table

6) Orders table

7) Lineitem table
