# GpJSON - Leveraging Structural Indexes for High-Performance JSON Data Processing on GPUs
This Truffle language exposes JSONPath query execution to the polyglot [GraalVM](http://www.graalvm.org).

The goals are:
 1. Present a couple of parsing techniques based on structural indexes to quickly execute queries on JSON files
 2. Introduce a batching approach to improve performance and allow the processing of datasets bigger than the GPU’s memory
 3. Implement the above concepts into a Truffle Language to provide an engine that can be used from any host language that can run on the GraalVM.
 
## Using GpJSON in the GraalVM
To compile a JAR file containing GpJSON move to the language folder and run ```mvn package```.

Next, copy the JAR file from ```target/gpjson.jar``` into `jre/languages/gpjson` (Java 8) or `languages/gpjson` (Java 11) of the Graal installation. 

Note that `--jvm` and `--polyglot` must be specified in both cases as well.

In the examples folder, you can find a couple of files containing examples of the GpJSON's syntax.

## Benchmarks suite

To run the benchmarks provided in the [`benchmarks`](./benchmarks/) folder you first need to install the following dependencies:

- [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads)
- [GraalVM Community Edition](https://github.com/graalvm/graalvm-ce-builds/releases)
- [gcc](https://gcc.gnu.org)
- [Node.js](https://nodejs.org/en)

Then, add the following variables to your `.bashrc` (or equivalent):

```
export CUDA_DIR=[your-cuda-path]
export PATH=$PATH:$CUDA_DIR/bin
export GRAAL_DIR=[your-graalvm-path]
export PATH=$PATH:$GRAAL_DIR/bin
export NODE_DIR=[your-node-path]
```

Copy the `grcuda` and `gpjson` JARs from the [`deliverables`](./deliverables/) folder to `[your-graalvm-path]/languages/[grcuda/gpjson]/`.

Move to the benchmarks folder `cd benchmarks` and run `make setup` to install jsonpath, jsonpath-plus and simdjson.

Finally, run `./[name-of-the-benchmark].sh`. Results will be saved to `[name-of-the-benchmark].csv`.

The following options can be added to the command above:

- `-g` to exclude the GPU-based benchmarks (GpJSON only). Default is `false`
- `-w [number]` to set the number of warmup runs. Default is `5`
- `-r [number]` to set the number of runs. Default is `10`
- `-t [number]` to set the number of threads (Java JSONPath only). Default is `11`
- `-d [path]` to set the path of the dataset. Default value is `/home/ubuntu/datasets-ext/`

Datasets can be downloaded [here](https://polimi365-my.sharepoint.com/:f:/g/personal/10604455_polimi_it/ElAPYQNeE1BLtcyR_BbFGS0BcaFPp2NiF1kGM1MtjFjmLA).

For further details, such as the versions of the dependencies used or the queries executed by the benchmarks suite, please refer to the official thesis and/or publication.