
Badly Written Tips for Fast-ish C++

Use OpenMP for very easy parallelisation


#pragma omp parallel for before a for loop parallelises it
Adding a schedule clause with some arguments after this can make things faster. There's an
extensive description in the documentation, but I found schedule(guided, 3)
worked very well on my machine (Intel, 4 cores).
Adding a reduction clause on the end (see the documentation for the argument format) does
some surprisingly effective optimisation on simple operations like increments and
decrements
Various easy mutexes and locks etc.: #pragma omp critical, #pragma omp
atomic. Use the critical directive if you're pushing back to vectors, or you'll get
sporadic errors which will take a long time to debug.
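Putting those tips together, a minimal sketch (the summing/filtering workload here is invented for illustration; without -fopenmp the pragmas are simply ignored and the code runs serially):

```cpp
#include <vector>

// Sums the data and collects indices of "big" entries in parallel.
// reduction(+:sum) gives each thread a private partial sum that is
// combined at the end; schedule(guided, 3) hands out shrinking chunks
// of iterations with a minimum chunk size of 3.
double parallelSum(const std::vector<double>& data,
                   std::vector<int>& bigIndices) {
    double sum = 0.0;
    const int n = static_cast<int>(data.size());
    #pragma omp parallel for schedule(guided, 3) reduction(+:sum)
    for (int i = 0; i < n; ++i) {
        sum += data[i];
        if (data[i] > 0.4) {
            // push_back is not thread-safe: guard it with a critical
            // section or you get sporadic, hard-to-debug errors.
            #pragma omp critical
            bigIndices.push_back(i);
        }
    }
    return sum;
}
```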
Data containers
Use the STL vectors. They're much better than C-style arrays.
Use emplace_back rather than push_back. This constructs vector elements in place
rather than constructing them and copying them in. When dealing with lots of
data/objects this gives huge performance increases. Requires C++11, which is worth
using anyway (it has std::to_string, very good random number generators, smart
pointers...)
Use shared_ptr and unique_ptr when practical (rather than plain old pointers). If
you have a vector of big objects, then combining this with the above gives you speedy
code:

vector<shared_ptr<BigObject>> someObjects;
someObjects.reserve(nOfObjects);
for (int n = 0; n < nOfObjects; ++n) {
    someObjects.emplace_back(make_shared<BigObject>(constructorArgs));
}

For matrices/tensors, it's probably best to use Blitz++. I didn't know this so I wrote my
own very bad (but quite speedy) class, and learnt the following:
Contiguous storage is key! So it's not ideal to use multidimensional arrays/vectors.
Also, if you're using nested vectors you end up with
vector<vector<vector<vector<double>>>> etc.
Solution: write a method to access a 1D vector as if it were multidimensional.
Example below.
It's probably worthwhile overloading the addition operator (at least) to do things with
your matrices.
I don't fully understand the details, but returning references (rather than copies)
when accessing values helps to minimise copying.
float& MyBadMatrix::at(const int x, const int y, const int z, const int d) {
    return dataVector[d + sizeOfD*z + sizeOfD*sizeOfZ*y
                        + sizeOfD*sizeOfZ*sizeOfY*x];
}
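A minimal self-contained version of such a class (names are invented for illustration; the real Blitz++ does all of this better) might look like:

```cpp
#include <vector>
#include <cstddef>

// A tiny 3D matrix backed by one contiguous 1D vector.
class FlatMatrix3 {
public:
    FlatMatrix3(int nx, int ny, int nz)
        : nx_(nx), ny_(ny), nz_(nz), data_(nx * ny * nz, 0.0f) {}

    // Returning a reference avoids a copy and lets callers assign:
    //   m.at(x, y, z) = value;
    float& at(int x, int y, int z) {
        return data_[z + nz_ * y + nz_ * ny_ * x];
    }
    float at(int x, int y, int z) const {
        return data_[z + nz_ * y + nz_ * ny_ * x];
    }

    // Element-wise addition: one pass over the contiguous storage.
    FlatMatrix3 operator+(const FlatMatrix3& other) const {
        FlatMatrix3 result(nx_, ny_, nz_);
        for (std::size_t i = 0; i < data_.size(); ++i)
            result.data_[i] = data_[i] + other.data_[i];
        return result;
    }

private:
    int nx_, ny_, nz_;
    std::vector<float> data_;
};
```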

Practices which seem to make a difference (but probably don't)


Loop using iterators rather than the usual (int i = 0; i < n; ++i). I'm pretty
sure this is genuinely faster, and it's definitely better practice in modern C++. Iterator
loops look like
(auto it = container.begin(); it != container.end(); ++it).
Admittedly it becomes nasty to get the iteration number within the loop, so if this is
fundamental then it's probably best to use the old-style loop.
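For what it's worth, C++11 also gives you range-based for loops, which are terser than explicit iterators. A small sketch (the summing functions are invented for illustration):

```cpp
#include <vector>

// Sum with an explicit iterator loop, as described above.
int sumWithIterators(const std::vector<int>& v) {
    int total = 0;
    for (auto it = v.begin(); it != v.end(); ++it)
        total += *it;
    return total;
}

// The range-based equivalent (also C++11);
// use auto& instead of const auto& to modify elements in place.
int sumWithRangeFor(const std::vector<int>& v) {
    int total = 0;
    for (const auto& x : v)
        total += x;
    return total;
}
```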
If you have some kind of conditional within a loop, then put the most likely outcome
first (preferably before any other code).
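A sketch of the branch-ordering tip (the "mostly non-negative data" assumption is invented for illustration):

```cpp
#include <vector>

// If the data is known to be mostly non-negative, test for that case
// first so the common path is taken through a well-predicted branch.
double sumAbs(const std::vector<double>& xs) {
    double total = 0.0;
    for (double x : xs) {
        if (x >= 0.0) {      // likely outcome first
            total += x;
        } else {             // rare case last
            total -= x;
        }
    }
    return total;
}
```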
Outputting files
CSVs and indeed any ASCII formats are really slow to write and read.
HDF5 is very very fast indeed and MATLAB gets on very well with it. Unfortunately
the C++ library is awful and the documentation is sparse, to say the least. Look at their
readdata.cpp and create.cpp examples for some kind of vague idea of how to
use it. The auto-generated documentation is useful if you persevere.
Chunking is an (unfortunately advanced) feature of HDF5 which is very useful. As I
understand it, it rearranges the physical storage in the buffer (and in the file?). This is
useful because accessing a block of memory along the first dimension is fast, whereas
access along (for example) the 3rd dimension is pretty slow, as each piece needs to be
fetched separately rather than copied as one contiguous block. If you need to access the
data like this, chunking rearranges the storage so that it is no longer laid out strictly
by dimension.
A vague guide to chunking is something like this:
Create a H5::DSetCreatPropList
Create an hsize_t array of length equal to your number of dimensions
Fill this array with values corresponding to the size of your chunks. For example,
if you wanted to access blocks of length 100 along the 2nd dimension of a 4D dataset,
you'd set all elements of the array to 1 except array[1]=100.
When you do your outHandle->createDataSet(), pass your
H5::DSetCreatPropList as the 4th argument.
When you write to this chunked DataSet, access it using hyperslabs (i.e. not
element-by-element access) that are the same shape as your chunks. If you have a 1D
vector of the right length for your hyperslab, then just pass the vector's data() to
the write() method of the DataSet. I couldn't work out how to pass in
multidimensional data directly.
Compilation
Aside from linking with whatever libraries I've mentioned (and OpenMP requires
-fopenmp), I think the following optimisation flags are useful.
-O3: Big binaries but makes big differences to speed
-ffast-math: Apparently unsafe but speeds up things a bit
-march=native: Compiles specifically for the processor of the machine you're using,
takes advantage of any special features. Quite effective.
-pipe: Maybe faster build times.
-DBZ_THREADSAFE: Makes Blitz++ thread-safe
-D_GLIBCXX_PARALLEL: Enables experimental g++ features to parallelise some STL
algorithms (e.g. std::fill, std::transform)
