Download and install the PostgreSQL distribution suitable for your platform from https://www.postgresql.org/download/.
- Open the TPC webpage at: https://www.tpc.org/tpc_documents_current_versions/current_specifications5.asp
- In the Active Benchmarks table (the first table), follow the Download TPC-H_Tools_v3.0.1.zip link; it redirects to the TPC-H Tools Download page.
- Enter your details and click Download; the download link will be emailed to you. Use that link to download the zip file.
- Unzip the file; the extracted contents must include the dbgen folder.
- Download the tpch-pgsql code from: https://github.com/Data-Science-Platform/tpch-pgsql/tree/master
- Follow the tpch-pgsql project README to prepare and load the data.
- (If the prepare command fails with an error saying that malloc.h is not found, the error output lists the affected file names. Go inside the dbgen folder, open each of those files, and replace malloc.h with stdlib.h.)
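The malloc.h workaround in the last step can be scripted instead of editing each file by hand. The sketch below is a hypothetical helper (`patch_malloc_includes` and the script name `patch_dbgen.py` are not part of tpch-pgsql or the TPC-H tools): it scans the .c and .h files directly inside a dbgen folder and rewrites `#include <malloc.h>` (or `"malloc.h"`) to `#include <stdlib.h>`. It assumes the include is written in that usual form; if the compiler still reports malloc.h, check the listed files manually.

```python
import re
import sys
from pathlib import Path

# Matches '#include <malloc.h>' or '#include "malloc.h"' (spaces/tabs allowed after '#').
MALLOC_INCLUDE = re.compile(rb'#[ \t]*include[ \t]*[<"]malloc\.h[>"]')


def patch_malloc_includes(dbgen_dir):
    """Replace malloc.h includes with stdlib.h in the C sources of dbgen_dir.

    Works on raw bytes so the rest of each file is left byte-for-byte unchanged.
    Returns the sorted names of the files that were modified.
    """
    patched = []
    for path in sorted(Path(dbgen_dir).iterdir()):
        if not path.is_file() or path.suffix not in (".c", ".h"):
            continue
        data = path.read_bytes()
        new_data = MALLOC_INCLUDE.sub(b"#include <stdlib.h>", data)
        if new_data != data:
            path.write_bytes(new_data)
            patched.append(path.name)
    return patched


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python patch_dbgen.py path/to/dbgen
    for name in patch_malloc_includes(sys.argv[1]):
        print("patched", name)
```

Running the helper a second time is harmless: files that no longer mention malloc.h are not rewritten.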
TPC-H data at scale factor 0.1 (sf=0.1, about 100 MB) is provided at: https://github.com/ahanapradhan/UnionExtraction/blob/master/mysite/unmasque/test/experiments/data/tpch_tiny.zip
The load.sql file in that folder must be updated with the locations of the corresponding data .csv files.
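Updating the paths in load.sql by hand is error-prone, so here is a hedged sketch. It assumes, without having checked the provided file, that each data file appears in load.sql as a single-quoted path ending in .csv (for example in a PostgreSQL `COPY lineitem FROM '/old/dir/lineitem.csv' ...` statement). The hypothetical helper `relocate_csv_paths` keeps each file name and replaces only its directory.

```python
import re
import sys
from pathlib import Path

# A single-quoted path ending in .csv; group 1 is the bare file name.
CSV_PATH = re.compile(r"'(?:[^']*[/\\])?([^'/\\]+\.csv)'")


def relocate_csv_paths(sql_text, data_dir):
    """Point every quoted .csv path in sql_text at data_dir, keeping the file names."""
    base = Path(data_dir).as_posix().rstrip("/")
    # A function replacement avoids backslash escapes being interpreted in the new path.
    return CSV_PATH.sub(lambda m: "'" + base + "/" + m.group(1) + "'", sql_text)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python relocate.py path/to/load.sql /absolute/path/to/csv/folder
    sql_file = Path(sys.argv[1])
    sql_file.write_text(relocate_csv_paths(sql_file.read_text(), sys.argv[2]))
```

Quoted strings that do not end in .csv (such as a `DELIMITER ','` option) are left untouched.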
For DuckDB, see the TPC-H extension documentation: https://duckdb.org/docs/extensions/tpch.html
A development environment for Python projects is required next. PyCharm Community Edition can be downloaded from https://www.jetbrains.com/pycharm/download/ (any other IDE also works).
- Python 3.8.0 or above
- django==4.2.4
- sympy==1.4
- psycopg2==2.9.3
- numpy==1.22.4
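To confirm that an environment matches the requirements listed above, a small standard-library check can compare installed distributions against the pinned versions. This is a sketch, not a tool shipped with the project: `check_environment` is a hypothetical name, and it looks up the distribution name `psycopg2`, so an installation of `psycopg2-binary` would be reported as missing.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Pinned versions from the requirements list; Python itself must be 3.8.0 or above.
REQUIRED = {"django": "4.2.4", "sympy": "1.4", "psycopg2": "2.9.3", "numpy": "1.22.4"}


def check_environment(required=REQUIRED, get_version=version, python=sys.version_info):
    """Return a list of human-readable problems; an empty list means everything matches."""
    problems = []
    if tuple(python[:2]) < (3, 8):
        problems.append("Python 3.8.0 or above is required")
    for name, wanted in required.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            problems.append(f"{name} is not installed (need {wanted})")
            continue
        # The requirements use exact '==' pins, so compare the strings exactly.
        if found != wanted:
            problems.append(f"{name}=={found} installed, README pins {wanted}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

The `get_version` and `python` parameters exist only so the check can be exercised without touching the real environment.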
The code is organized into the following directories:
- mysite: contains the main project code.
- mysite/unmasque: contains the following subdirectories:
  - src: contains the PRISM code, i.e., the unmasque core extended with the notion of FIT-ness, together with the full implementation of the PRISM extractors.
  - test (obsolete): contains unit test cases for each extractor module, intended to check the reliability and correctness of the code.
Please explore the individual directories for more details on the code and its purpose.
Inside the mysite directory, there are two files:
- pkfkrelations.csv: contains the key (primary key/foreign key) details of the TPC-H schema. If another schema is used, update this file accordingly.
- config.ini: contains the database login credentials and flags for optional features. Update the fields accordingly:
  - database section: set your database credentials.
  - support section: give the support file name. The support file must be in the same directory as this config file.
  - logging section: set the logging level. Use DEBUG for developer mode; the other valid levels are INFO and ERROR.
  - feature section: set the flags for advanced features, as the flag names indicate. The included features are UNION, OUTER JOIN, the <> (or !=) operator in arithmetic filter predicates, and the IN operator.
  - options section: extractor options. For example, the maximum value for the LIMIT clause is 1000 by default; to use a higher value, set limit=value.
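config.ini is a standard INI file, so Python's configparser can read it. The sketch below parses a sample and extracts the logging level and the LIMIT maximum. Only the section names (database, logging, options) and the `limit` option come from this README; every other key name in the sample, including `level`, is a hypothetical placeholder, so check the real config.ini for the actual keys. The fallback of 1000 mirrors the default LIMIT maximum described above.

```python
import configparser

# Hypothetical sample: section names follow the README, but apart from
# `limit` the key names are placeholders, not the project's real keys.
SAMPLE = """
[database]
host = localhost
port = 5432
user = postgres
password = postgres
dbname = tpch

[logging]
level = DEBUG

[options]
limit = 5000
"""

VALID_LOG_LEVELS = {"DEBUG", "INFO", "ERROR"}


def read_settings(text):
    """Parse config text and return (logging level, maximum LIMIT value).

    Raises configparser.NoSectionError if [logging] is missing and
    ValueError if the level is not one of DEBUG, INFO, ERROR.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    level = parser.get("logging", "level").upper()
    if level not in VALID_LOG_LEVELS:
        raise ValueError(f"unsupported logging level: {level}")
    # The LIMIT maximum stays at 1000 unless `limit` is set in [options].
    max_limit = parser.getint("options", "limit", fallback=1000)
    return level, max_limit


if __name__ == "__main__":
    print(read_settings(SAMPLE))
```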
Open the mysite/unmasque/src/main_cmd.py file.
This script specifies one default input query.
Change this query to try PRISM on other inputs.
The test.util package has a queries.py file containing a few sample queries; any of them can be used for testing.
Change the current directory to mysite.
Use the following command:
python -m unmasque.src.main_cmd
Alternatively, the main function in main_cmd.py can be run from the IDE.
(The current code uses relative imports in the main_cmd.py script. If this causes an import-related error when running from the IDE, change the imports to absolute ones.)
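The reason for running `python -m unmasque.src.main_cmd` from mysite is that a module using relative imports (`from .x import y`) must be loaded as part of its package. The self-contained sketch below builds a throwaway package, `demo_pkg` (hypothetical, standing in for `unmasque.src`), in a temporary directory and runs the same module both ways: executing it as a file path fails with an ImportError, while `-m` from the parent directory succeeds.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_both_ways():
    """Return (exit code as file, exit code with -m, stdout with -m) for a module with a relative import."""
    with tempfile.TemporaryDirectory() as root:
        pkg = Path(root) / "demo_pkg"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "helper.py").write_text("GREETING = 'hello'\n")
        # main_cmd.py mimics a script that relies on a relative import.
        (pkg / "main_cmd.py").write_text("from .helper import GREETING\nprint(GREETING)\n")

        # Run as a file path: there is no parent package, so the relative import fails.
        as_file = subprocess.run([sys.executable, str(pkg / "main_cmd.py")],
                                 cwd=root, capture_output=True, text=True)
        # Run as a module from the parent directory: the package context is kept.
        as_module = subprocess.run([sys.executable, "-m", "demo_pkg.main_cmd"],
                                   cwd=root, capture_output=True, text=True)
        return as_file.returncode, as_module.returncode, as_module.stdout.strip()


if __name__ == "__main__":
    print(run_both_ways())
```

An IDE run configuration that executes main_cmd.py as a plain script behaves like the first call, which is why switching to absolute imports (or configuring the IDE to run it as a module) avoids the error.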
In the terminal, go inside the unmasque folder and start the Django app on port 8080 using the command: python3 manage.py runserver 8080
Once the server is up on port 8080 of localhost, the GUI can be accessed at: http://localhost:8080/unmasque/
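To check from a script whether the GUI at http://localhost:8080/unmasque/ is reachable, here is a hedged standard-library sketch. `gui_is_up` is a hypothetical helper: any HTTP answer, even an error page, counts as "up", while a refused connection or a timeout counts as "down". It deliberately bypasses proxy environment variables because the server runs on the local machine.

```python
import urllib.error
import urllib.request

GUI_URL = "http://localhost:8080/unmasque/"

# An opener with an empty ProxyHandler ignores http_proxy settings for this local check.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def gui_is_up(url=GUI_URL, timeout=2.0):
    """Return True if the URL answers with any HTTP response, False if unreachable."""
    try:
        with _OPENER.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status such as 404 or 500.
        return True
    except OSError:
        # URLError (e.g. connection refused) and socket timeouts are OSError subclasses.
        return False


if __name__ == "__main__":
    print("GUI reachable:", gui_is_up())
```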