Using Modern C++ Techniques to Enhance Multicore Optimizations - Das's Edition
With multicore processors now commonplace in PCs, and core counts continually climbing, software
developers must adapt. By learning to tackle potential performance bottlenecks and issues with
concurrency, engineers can future-proof their code to seamlessly handle additional cores as they are
added to consumer systems.
To help with this effort, Intel software teams have created a graphics toolkit that shows how parallel
processing techniques are best applied to eight different graphics filters. The entire source code package
contains C++ files, header files, project files, filters, and database files. A DLL overlay with a simple user
interface shows the speed at which each filter can be applied, both in a single-core system and when
using parallel-processing techniques.
With applications across the computing spectrum, from database processing to image rendering,
parallel processing is a key concept for developers to understand. Readers are assumed to have some
experience and background in computer science to benefit from the concepts described here. The
source code was written for C++, but the concepts extend to other languages, and will be of interest to
anyone looking to better understand how to optimize their code for multicore processors.
An in-depth discussion of the Intel architecture is beyond the scope of this article. Software developers
should register at the Intel® Developer Zone and check out the documentation download page for Intel
architecture to read some of the following manuals:
• Combined Volume Set of Intel® 64 and IA-32 Architectures Software Developer’s Manuals
• Four-Volume Set of Intel® 64 and IA-32 Architectures Software Developer’s Manuals
• Ten-Volume Set of Intel® 64 and IA-32 Architectures Software Developer's Manuals
• Software Optimization Reference Manual
• Uncore Performance Monitoring Reference Manuals
Getting Started
The system requirements to explore the Image Filters project solution are minimal. Any multicore
system with Microsoft Windows® 10 is sufficient.
This project assumes that you have a C++ toolkit, such as Microsoft Visual Studio* with the .NET
Framework. Freebyte* offers a full set of links if you want to explore different C++ tools. To simply
look through the code, you may want to use a free code viewer such as Notepad++* or a similar product.
1. Create an empty directory on your computer with a unique title such as “Image Filters Project”
or “Intel C++ DLL Project.”
Use whatever naming strategy you are comfortable with; you can include the year, for example.
2. Download the .zip file to your new directory.
Figure 1: You must build both projects inside the Solution Explorer.
6. From the Solution Explorer, select the ImageFiltersWPF project; then hold down the CTRL key
and select the ImageProcessing project.
7. Right-click on one of the highlighted files to pull up an Actions menu and select Build Selection.
This starts the compiler for both.
8. Wait while the system quickly compiles the existing source files into the binary, .DLL, and .EXE
files.
Multithreading Technique
By default, applications run on a single processing core of a system. Because virtually all new
computing systems feature a CPU with multiple cores and threads, complex calculations can be
distributed intelligently across them, greatly reducing computation times.
OpenMP* (Open Multi-Processing) is an API, first published in 1997 as Fortran 1.0, that supports
multiplatform shared-memory multiprocessing programming in C, C++, and Fortran on most
platforms, instruction set architectures, and operating systems. It consists of a set of compiler
directives, library routines, and environment variables that influence run-time behavior.
In the case of the C++ DLL, which is where the execution is actually happening, OpenMP involves using
compiler directives to execute the filtering routines in parallel. For picture processing, each pixel of the
input image has to be processed in order to apply the routine to that image. Parallelism offers an
interesting way of optimizing the execution time by spreading out the work across multiple threads,
which can work on different areas of the image.
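The idea can be sketched with a minimal loop (the function name and image layout here are illustrative, not the toolkit's actual code): the outer loop over rows is split across the available threads, and each thread writes to a disjoint set of rows, so no locking is needed.

```cpp
#include <omp.h>
#include <cstdint>
#include <vector>

// Invert every pixel of a width x height single-channel image.
// The row loop is distributed across threads by OpenMP; each thread
// touches different rows, so the iterations are independent.
void invertImage(std::vector<uint8_t>& pixels, int width, int height)
{
    #pragma omp parallel for
    for (int y = 0; y < height; ++y)
    {
        for (int x = 0; x < width; ++x)
        {
            pixels[y * width + x] = 255 - pixels[y * width + x];
        }
    }
}
```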
If you follow the comments in the code, “BoxBlur.cpp” is setting up offsets, handling calculations when
edge conditions make convolution impossible, and applying the convolution kernel to each element for
red, blue, and green colors.
In the second example of “omp parallel for” taken from “SobelEdgeDetector.cpp”, similar filtering
operations take place, with the edge detector working with grayscale pictures.
Memory Management
In software development, developers must be careful about memory management to avoid serious
impacts on application performance. In the case of the Harris corner detector and the Shi-Tomasi corner
detector, memory management is crucial to creating three matrices and storing the results of Sx, Sy and
Sxy.
Figure 5: Setting up temporary memory for the Shi-Tomasi corner detector filter.
Allocating such matrices for every pixel of the source would require considerable memory, and
multithreading would probably not be beneficial; in fact, it could even result in slower
calculations, due to the overhead of working with a large memory space.
In order to avoid memory issues and set up a scenario where multithreading makes the calculations
faster, these three matrices can be considered as a set of matrices, with each available thread
containing its own set of matrices. The application can then be set up to allocate, outside of the parallel
section, as many sets of matrices as there are available threads for this application. To get the maximum
number of threads the following function is used: “omp_get_max_threads()” from the “omp.h” file.
This file is found with the rest of the header files in the ImageProcessing > External Dependencies
directory.
As described at the Oracle support site, this function should not be confused with the similarly named
“omp_get_num_threads()”. The “max” call returns the maximum number of threads that can be put to
use in a parallel region. The “num” call returns the number of threads that are currently active and
performing work. Obviously, those are different numbers. In a serial region “omp_get_num_threads”
returns 1; in a parallel region it returns the number of threads that are being used.
The call “omp_set_num_threads” sets the maximum number of threads that can be used (equivalent to
setting the OMP_NUM_THREADS environment variable).
In the processing loop, each thread accesses its proper set of matrices. These sets are stored in a
dynamically allocated array of sets. To access the correct set the index of the actual thread is used with
the function “omp_get_thread_num()”. Once the routine is executed, the three matrices are reset to
their initial values, so that the next time the same thread has to execute the routine for another pixel,
the matrices are already prepared for use.
Principle of Convolution
Image filtering is a good showcase for multicore processing because it involves the principle of
convolution. Convolution is a mathematical operation that accumulates effects; think of it as starting
with two functions, such as f and g, to produce a third function that represents the amount of overlap as
one function is shifted over another.
In the case of image filtering, convolution works with a kernel, which is a matrix of numbers. This kernel
is then applied to every pixel of the image, with the center element of the convolution kernel placed
over the source pixel (Px). The source pixel is then replaced by the weighted sum of itself and nearby
pixels. Multithreading helps filter the image faster by breaking the process into pieces.
Figure 6: Convolution using weighting in a 3 x 3 kernel (source: GNU Image Manipulation Program).
In this example, the convolution kernel is a 3 x 3 matrix represented by the green box. The source pixel
is 50, shown in red at the center of the 3 x 3 matrix. All local neighbors, or nearby pixels, are the pixels
directly within the green square. The larger the kernel, the larger the number of neighbors.
In this case, the only nonzero weight is the second element of the first row, represented
by a 1. All other elements in the 3 x 3 matrix are 0. The operation multiplies each element by zero,
removing it, except for the single pixel represented by 42. So, the new source pixel is 42 x 1 = 42. Thus,
the pixel just above the source pixel is overlapped by the weight 1 of the convolution kernel.
If you imagine each weighting as a fraction rather than zero, you can picture how images could be
blurred by analyzing and processing each surrounding pixel.
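The operation is straightforward to sketch in code, assuming a single-channel image stored row-major and an interior pixel so that no edge handling is needed (the function name is invented for illustration):

```cpp
#include <vector>

// Apply a 3 x 3 convolution kernel to one interior pixel (x, y) of a
// single-channel image; the kernel's center element is placed over the
// source pixel, and the result is the weighted sum of its neighborhood.
float convolvePixel(const std::vector<float>& img, int width,
                    int x, int y, const float kernel[3][3])
{
    float sum = 0.0f;
    for (int ky = -1; ky <= 1; ++ky)
        for (int kx = -1; kx <= 1; ++kx)
            sum += kernel[ky + 1][kx + 1] * img[(y + ky) * width + (x + kx)];
    return sum;
}
```

Using the kernel from Figure 6 (a single 1 in the second element of the first row) on a neighborhood whose pixel above the source is 42 reproduces the worked example: the result is 42.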
Filtering Techniques
To see the result of a filtering technique, you’ll have to build the project as described in the “Getting
Started” section. Then double-click on the Image Filters > ImageFiltersWPF > Bin > Debug >
ImageFiltersWPF.exe file.
The interface is very simple. You can select images on your system using the directory search feature in
the gray box in the upper-right corner. Use the “Stretch” button to make sure an image completely fills
the graphical user interface (GUI). Select an image, then apply a filter. Watch the “Time in seconds”
calculations at the bottom-left of the interface to see how long a filter would take to apply in a multicore
system versus a system with a single core.
There are eight filters in all; each alters the original image, but some filters create more dramatic
changes.
The weights of the convolution kernel for the box blur are all the same. Assuming a 3 x 3
matrix, there are nine elements in the kernel in total, and the weight of each element is
calculated so that the sum of all elements is 1; that is, each weight is 1/9.
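Building such a kernel is a one-liner (the helper name is invented for illustration):

```cpp
#include <vector>

// Build an N x N box blur kernel: every weight is 1 / (N * N), so the
// weights sum to 1 and the overall image brightness is preserved.
std::vector<float> makeBoxKernel(int n)
{
    return std::vector<float>(n * n, 1.0f / (n * n));
}
```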
Figure 8: The original image on the left is vivid, with good detail, while the image on the right has had the Box Blur effect
applied.
When using the app, it was calculated that a single core system would take 0.1375 seconds to apply the
Box Blur while a multicore system, in this case with an Intel® Core™ i7-4220 processor, took 0.004
seconds.
Let’s look in depth at what is going on in BoxBlur.cpp to understand the multithreading principles.
#include "stdafx.h"
#include <fstream>
#include "omp.h"
extern "C" __declspec(dllexport) int __stdcall BoxBlur(BYTE* inBGR, BYTE* outBGR, int
stride, int width, int height, KVP* arr, int nArr)
{
// Pack the following structure on one-byte boundaries
// (smallest possible alignment).
// This allows us to use the minimal memory space for this type
// (exact fit - no padding).
First, the file includes the “stdafx.h” header, a standard system include, followed by <fstream> for
file stream support. Finally, “omp.h” is the header file that brings in the OpenMP instructions.
BoxBlur then declares the exported function with extern “C” so that its name is not mangled and it
can be called from the DLL's consumers. The rest of the C++ file is devoted to multicore
functionality. First, using “#pragma pack(push, 1)”, the file defines how to efficiently handle a
BGRA (blue green red alpha) color component packing structure on one-byte boundaries using the
smallest possible alignment.
Next, the file declares “#pragma pack(pop)” to restore the default packing mode, defines the Boolean
operator for whether multiple cores have been detected, sets up the convolution kernel, and allocates
memory.
Finally, if there are multiple cores (OpenMP = true), the file uses “#pragma omp parallel for if
(openMP)”. The code determines offsets and casts, and handles situations at the edges where
convolution is not possible. Results are written to the output image, and the allocated memory is
cleared for the convolution kernel. There are similar sections of code in each of the filters.
The weights of the elements of an N x N Gaussian matrix are calculated by the following:

w(x, y) = exp(-((x - c)^2 + (y - c)^2) / (2 * sigma^2)), where c = (N - 1) / 2

Here x and y are the coordinates of the element in the convolution kernel, c is the coordinate of the
kernel's center, and sigma is the standard deviation of the Gaussian. The top-left corner element is
at the coordinates (0, 0), and the bottom-right at the coordinates (N-1, N-1).
For the same reason as the Box Blur, the sum of every element has to be 1. Thus, at the end, we need
each element of the kernel to be divided by the total sum of the weights of the kernel.
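A sketch of that computation (the function name is invented; the normalization by the total sum is folded in as the text describes):

```cpp
#include <cmath>
#include <vector>

// Build an N x N Gaussian kernel. Element (x, y) is weighted by
// exp(-(dx^2 + dy^2) / (2 * sigma^2)), where (dx, dy) is the distance to
// the kernel's center; the kernel is then divided by the total sum of
// its weights so that the weights add up to 1.
std::vector<double> makeGaussianKernel(int n, double sigma)
{
    std::vector<double> k(n * n);
    const double c = (n - 1) / 2.0;   // center coordinate
    double sum = 0.0;
    for (int y = 0; y < n; ++y)
        for (int x = 0; x < n; ++x)
        {
            double w = std::exp(-((x - c) * (x - c) + (y - c) * (y - c))
                                / (2.0 * sigma * sigma));
            k[y * n + x] = w;
            sum += w;
        }
    for (double& w : k) w /= sum;      // normalize: weights sum to 1
    return k;
}
```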
The Sobel edge detector convolves the image with two 3 x 3 kernels, Gx (left) and Gy (right):

Gx:             Gy:
 1   0  -1       1   2   1
 2   0  -2       0   0   0
 1   0  -1      -1  -2  -1
This means that for each pixel there are two results of convolution, Gx and Gy. Considering them as
the two components of a 2D vector, the magnitude G is calculated as follows:

G = sqrt(Gx^2 + Gy^2)
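In code, combining the two convolution results is a one-line computation:

```cpp
#include <cmath>

// Combine the two Sobel convolution results Gx and Gy into the
// gradient magnitude G = sqrt(Gx^2 + Gy^2).
double sobelMagnitude(double gx, double gy)
{
    return std::sqrt(gx * gx + gy * gy);
}
```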
The Laplacian edge detector uses a single 3 x 3 convolution kernel:

-1  -1  -1
-1   8  -1
-1  -1  -1
Laplacian of Gaussian
The Laplacian edge detector is particularly sensitive to noise so, to get better results, we can apply a
Gaussian Blur to the whole image before applying the Laplacian filter. This technique is named
“Laplacian of Gaussian.”
Thus, Sx and Sy are calculated for every pixel in the window. Sxy is calculated as Sxy = Sx * Sy.
These results are stored in three different matrices. These matrices respectively represent the values Sx,
Sy, and Sxy for every pixel around the source pixel (and also itself).
A Gaussian matrix (the one used for the Gaussian blur) is then applied to these three matrices, which
results in three weighted values of Sx, Sy, and Sxy. We will name them Ix, Iy, and Ixy.
Then, if k is greater than a given threshold, the source pixel is turned into white, as part of a corner. If
not, it is set to black.
Once the matrix A is calculated, in the same way as above, the eigenvalues 𝜆1 and 𝜆2 of the matrix A are
calculated. The eigenvalues of a matrix are the solutions of the equation det(A - 𝜆I) = 0, where I is
the identity matrix.
As with the Harris corner detector, depending on the value of k, this source pixel will or will not be
considered part of a corner.
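For a symmetric 2 x 2 matrix the eigenvalues have a simple closed form, so no general eigensolver is needed. A sketch, with invented function names and the Ix, Iy, Ixy values from the text as inputs:

```cpp
#include <cmath>
#include <utility>

// Eigenvalues of the symmetric 2 x 2 structure matrix
//   A = | Ix   Ixy |
//       | Ixy  Iy  |
// i.e. the solutions of det(A - lambda * I) = 0. For a symmetric matrix
// they are mean +/- d, where mean = (Ix + Iy) / 2 and
// d = sqrt(((Ix - Iy) / 2)^2 + Ixy^2).
std::pair<double, double> eigenvalues2x2(double ix, double iy, double ixy)
{
    double mean = (ix + iy) / 2.0;
    double d = std::sqrt((ix - iy) * (ix - iy) / 4.0 + ixy * ixy);
    return { mean + d, mean - d };  // lambda1 >= lambda2
}

// The Shi-Tomasi detector scores a pixel by the smaller eigenvalue.
double shiTomasiScore(double ix, double iy, double ixy)
{
    return eigenvalues2x2(ix, iy, ixy).second;
}
```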
Figure 12: Example of Shi-Tomasi corner detector filter, resulting in almost total conversion to black pixels. The filtering took 61
seconds using a single core, versus 14 seconds using multiple cores.
Developers should not expect an N-times speedup when running a program parallelized using
OpenMP on an N-processor platform. There are several reasons why this is true:
● When a dependency exists, a process must wait until the data it depends on has been computed.
● When multiple processes share a resource that cannot be accessed in parallel (like a file to
write to), their requests are executed sequentially; each thread must wait until the other
threads release the resource.
● A large part of the program may not be parallelized by OpenMP, which means that the
theoretical upper limit of speedup is limited, according to Amdahl's law.
● N processors in symmetric multiprocessing (SMP) may have N times the computation power,
but the memory bandwidth usually does not scale up N times. Quite often, the original memory
path is shared by multiple processors, and performance degradation may be observed when
they compete for the shared memory bandwidth.
● Many other common problems affecting the final speedup in parallel computing also apply to
OpenMP, like load balancing and synchronization overhead.
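Amdahl's law is easy to evaluate directly. For example, even with 95% of a program parallelized, 18 cores yield a speedup of only about 9.7x, and the limit as the core count grows is 1 / (1 - p) = 20x (the function name below is invented for illustration):

```cpp
// Amdahl's law: if a fraction p of a program can be parallelized
// across n processors, the overall speedup is 1 / ((1 - p) + p / n).
double amdahlSpeedup(double p, int n)
{
    return 1.0 / ((1.0 - p) + p / n);
}
```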
With current systems powered by processors such as the Intel® Core™ i9-7980XE Extreme Edition
processor, which has 18 cores and 36 threads, the advantages of developing code that is optimized for
multithreading are obvious. To learn more, download the app, analyze it with an integrated
development environment such as Microsoft Visual Studio, and get started on your own project.
ImageFileList.cs
This class’s primary function is to generate a list of image files that can be selected in order to apply
filters.
BitmapWrapper.cs
This class accepts a source bitmap image and turns it into a byte array that can be consumed by the
C++ DLL.
ImageProcessWrapper.cs
This class loads up the DLL and supplies the function to call, which passes the byte arrays to the DLL.
MainWindow.xaml.cs
This is the MainWindow GUI code. It gets the current filter name and sends the BitmapWrapper
instance to the DLL, twice: once for a single core, then again for multiple cores. After each run, it
updates the labels that show the number of seconds it took to process the image. Once the processing
is complete, the new image is displayed.
Resources
Intel Developer Zone: https://software.intel.com
OpenMP: http://www.openmp.org/
Notices
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is
granted by this document.
Intel disclaims all express and implied warranties, including without limitation, the implied
warranties of merchantability, fitness for a particular purpose, and non-infringement, as well
as any warranty arising from course of performance, course of dealing, or usage in trade.
This document contains information on products, services and/or processes in development.
All information provided here is subject to change without notice. Contact your Intel
representative to obtain the latest forecast, schedule, specifications and roadmaps.
The products and services described may contain defects or errors known as errata which may
cause deviations from published specifications. Current characterized errata are available on
request.
Copies of documents which have an order number and are referenced in this document may
be obtained by calling 1-800-548-4725 or by visiting www.intel.com/design/literature.htm.