You will find here all the necessary materials for the labs of the High Performance Programming Course.
For each session of the course, a notion will be introduced (Data Structures, Algorithms, Architecture) and will be applied in the following lab.
The general framework of the lab is a Maven project that processes data from the DEBS 2015 Grand Challenge. This challenge contains data from taxi trips in NYC.
The framework reads trips from the file and sends them to the different registered query processors. Query processors process the data to produce
You will be asked to answer queries on the data. Each query will reflect the notions seen during the course. The goal is to answer these queries as fast as possible.
First of all, fork this project into your own account: click on the Fork icon on this page. Clone the forked project on your computer. Import the project in Eclipse via Import->Maven Project.
Two main classes are at your disposal. The first one, MainNoNStreaming, first loads all the data in memory and then sends the data to each query processor. The second one, MainStreaming, streams the data to the query processors.
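The difference between the two modes can be sketched as follows. This is only an illustration under assumptions: DispatchSketch, dispatchAfterLoading and dispatchWhileReading are hypothetical names, records are modelled as a plain Iterator, and processors as java.util.function.Consumer. The framework's own main classes may be organised differently.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for the two main classes; not the framework's code.
class DispatchSketch {

    /** Non-streaming: read every record into memory first, then dispatch. */
    static <T> void dispatchAfterLoading(Iterator<T> source, List<Consumer<T>> processors) {
        List<T> loaded = new ArrayList<>();
        while (source.hasNext()) {
            loaded.add(source.next());
        }
        // Dispatch only starts once the whole input is held in memory.
        for (T record : loaded) {
            for (Consumer<T> processor : processors) {
                processor.accept(record);
            }
        }
    }

    /** Streaming: hand each record to the processors as soon as it is read. */
    static <T> void dispatchWhileReading(Iterator<T> source, List<Consumer<T>> processors) {
        while (source.hasNext()) {
            T record = source.next();
            for (Consumer<T> processor : processors) {
                processor.accept(record);
            }
        }
    }
}
```

Both variants deliver the same records in the same order. The non-streaming one needs memory proportional to the input size, while the streaming one only holds the current record.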
The repository contains a small data file with 1000 records. This file is sufficient for testing purposes but is too limited for large-scale processing.
You will need to download the different files from here. Unzip them in src/main/resources/data.
To create a new query processor, create a new class in the package fr.tse.fi2.hpp.labs.queries.impl. Your class must extend AbstractQueryProcessor.
An example of an empty class:
    public class SampleQueryProcessor extends AbstractQueryProcessor {

        public SampleQueryProcessor(QueryProcessorMeasure measure) {
            super(measure);
        }

        @Override
        protected void process(DebsRecord record) {
            // Process the record
        }
    }
You must complete the process method to implement the queries. This method is called for each DebsRecord that is sent by the framework. A DebsRecord contains information for one taxi trip: coordinates for pickup and dropoff, price paid, tip, ... The full list is available in the file as well as here (Data Section).
To be executed, your query processor must be registered in one (or both) of the main classes. Edit the files to add your own query processor:
List<AbstractQueryProcessor> processors = new ArrayList<>();
// Add your query processor here
processors.add(new SimpleQuerySumEvent(measure));
To add a result to the output file simply use the writeLine(String line) method. It will automatically append a line in the results/queryN.txt file, where N is the identifier of your query processor (automatically generated).
The framework includes a basic measurement system. Global execution time, per query execution time and throughput are automatically written in results/result.txt.
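As a rough illustration of what execution time and throughput mean here, the sketch below times a loop with System.nanoTime and converts the result to records per second. ThroughputSketch and recordsPerSecond are hypothetical names, and this is not the framework's measurement code, only one plausible way to compute such figures.

```java
// Hypothetical helper, not part of the lab framework.
class ThroughputSketch {

    /** Converts a record count and an elapsed time in nanoseconds to records per second. */
    static double recordsPerSecond(long records, long elapsedNanos) {
        if (elapsedNanos <= 0) {
            throw new IllegalArgumentException("elapsedNanos must be positive");
        }
        return records * 1_000_000_000.0 / elapsedNanos;
    }

    public static void main(String[] args) {
        final int records = 1000;
        long checksum = 0;

        long start = System.nanoTime();
        for (int i = 0; i < records; i++) {
            checksum += i; // stand-in for processing one record
        }
        // Guard against a zero reading on platforms with a coarse timer.
        long elapsed = Math.max(1, System.nanoTime() - start);

        System.out.println("checksum: " + checksum);
        System.out.println("elapsed (ns): " + elapsed);
        System.out.println("throughput (records/s): " + recordsPerSecond(records, elapsed));
    }
}
```

A single timed run like this is sensitive to JIT warm-up and garbage collection, so the figures it prints are only indicative.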
For some labs, specific instructions will be given to produce measurements with JMH.
Follow the installation instructions. Verify that everything is OK with a mvn install. Install the extra data in your project. Modify the main classes to parse the sorted_data.csv file.
Remove the existing query that counts the events.
To compare performance for two implementations of the same feature, create the following queries:
- StupidAveragePrice, which puts every new trip price into a list and computes the average over every number in the list.
- IncrementalAveragePrice, which uses the previous result to incrementally compute the average.
Execute both queries and measure the difference in running time and throughput, for both the streaming and non-streaming cases.
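The core idea behind the two queries can be sketched outside the framework as follows. NaiveAverage and RunningAverage are hypothetical stand-ins: they do not extend AbstractQueryProcessor and take plain double values instead of a DebsRecord, so the logic still has to be adapted into the process method of your own query processors.

```java
import java.util.ArrayList;
import java.util.List;

/** Naive variant: stores every price and rescans the whole list on each update. */
class NaiveAverage {
    private final List<Double> prices = new ArrayList<>();

    /** Adds a price and returns the average of all prices so far (O(n) per call). */
    double add(double price) {
        prices.add(price);
        double sum = 0.0;
        for (double p : prices) {
            sum += p;
        }
        return sum / prices.size();
    }
}

/** Incremental variant: keeps only a running sum and a count (O(1) per call). */
class RunningAverage {
    private double sum = 0.0;
    private long count = 0;

    /** Adds a price and returns the average of all prices so far. */
    double add(double price) {
        sum += price;
        count++;
        return sum / count;
    }
}

class AveragePriceDemo {
    public static void main(String[] args) {
        double[] prices = {10.0, 20.0, 30.0};
        NaiveAverage naive = new NaiveAverage();
        RunningAverage running = new RunningAverage();
        for (double p : prices) {
            // Both variants report the same average after each new price.
            System.out.println(naive.add(p) + " " + running.add(p));
        }
    }
}
```

The naive variant does work proportional to the number of trips seen so far on every update and keeps all prices in memory, whereas the incremental variant does constant work and uses constant memory. The gap should therefore widen as the input grows.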
TBD
TBD
Evaluation will be made based on the code available on your forked version of this project. No additional material will be accepted.