Badly Written Tips For Fast-Ish C++: Use Openmp For Very Easy Parallelisation
vector<shared_ptr<BigObject>> someObjects;
someObjects.reserve(nOfObjects);
for (int n = 0; n < nOfObjects; ++n) {
    someObjects.emplace_back(make_shared<BigObject>(constructorArgs));
}
For matrices/tensors, it's probably best to use Blitz++. I didn't know this so I wrote my
own very bad (but quite speedy) class, and learnt the following:
Contiguous storage is key! So it's not ideal to use nested multidimensional arrays/vectors. Also, if you're using vectors you end up with vector<vector<vector<vector<double>>>> etc.
Solution: write a method to access a 1D vector as if it were multidimensional.
Example below.
It's probably worthwhile overloading the addition operator (at least) to do things with
your matrices.
I don't really understand this at all, but I think returning references when accessing values helps to minimise copies.
float& MyBadMatrix::at(const int x, const int y, const int z, const int d) {
    return dataVector[d + sizeOfD*z + sizeOfD*sizeOfZ*y
                      + sizeOfD*sizeOfZ*sizeOfY*x];
}
I'm not sure this is genuinely faster, but it's definitely better practice in modern C++. Iterator loops look like
for (auto it = container.begin(); it != container.end(); ++it).
Admittedly it becomes nasty to get the iteration number within the loop, so if this is fundamental then it's probably best to use the old-style loop.
If you have some kind of conditional within a loop, then put the most likely outcome
first (preferably before any other code).
Outputting files
CSVs and indeed any ASCII formats are really slow to write and read.
HDF5 is very very fast indeed and MATLAB gets on very well with it. Unfortunately
the C++ library is awful and the documentation is sparse, to say the least. Look at their
readdata.cpp and create.cpp examples for some kind of vague idea of how to
use it. The auto-generated documentation is useful if you persevere.
Chunking is an (unfortunately advanced) feature of HDF5 which is very useful. As I understand it, it rearranges the physical storage in the buffer (and in the file?). This is
useful because accessing a block of memory along the first dimension is fast, whereas
the (for example) 3rd dimension is pretty slow as each piece needs to be accessed
separately rather than by copying a contiguous block. If you need to access like this then
chunking rearranges so that the data is not stored by dimension.
A vague guide to chunking is something like this:
Create a H5::DSetCreatPropList
Create an hsize_t array of length equal to your number of dimensions
Fill this array with values corresponding to the size of your chunks. For example,
if you wanted to access blocks of length 100 along the 2nd dimension of a 4D dataset,
you'd set all elements of the array to 1 except array[1]=100.
When you do your outHandle->createDataSet(), pass your
H5::DSetCreatPropList as the 4th argument.
When you write to this chunked DataSet, access it using hyperslabs (ie not element-by-element access) that are the same shape as your chunks. If you have a 1D vector of the right length for your hyperslab, then just pass in the data() method of the vector to the write() method of the DataSet. I couldn't work out how to pass in multidimensional data directly.
Compilation
Aside from linking with whatever libraries I've mentioned (and OpenMP requires
-fopenmp), I think the following optimisation flags are useful.
-O3: Big binaries, but makes big differences to speed
-ffast-math: Apparently unsafe but speeds things up a bit
-march=native: Compiles specifically for the processor of the machine you're using,
takes advantage of any special features. Quite effective.
-pipe: Maybe faster build times.
-DBZ_THREADSAFE: Makes Blitz++ threadsafe
-D_GLIBCXX_PARALLEL: Enables experimental g++ features to parallelise some STL
algorithms (eg std::fill, std::transform)
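Putting it all together, a full compile line might look something like this (the source and output filenames are placeholders, and the library names assume you're using Blitz++ and HDF5 as above):

```sh
g++ -O3 -ffast-math -march=native -pipe \
    -fopenmp -DBZ_THREADSAFE -D_GLIBCXX_PARALLEL \
    main.cpp -o main -lblitz -lhdf5_cpp -lhdf5
```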