
Objective

The objective of this assignment is to become familiar with OpenCL.

Problem statement

1. Measuring the benefit of using OpenCL

2. Measuring the impact of work group size on performance (a sketch of how the work group size could be varied is shown after this list)
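
A minimal sketch of how the second measurement could be set up, assuming the same kernel, queue, and N as in Appendix A and reusing its event-profiling pattern (the local sizes 32 to 256 are illustrative, and N must be divisible by each of them):

size_t local_sizes[] = { 32, 64, 128, 256 };    // illustrative work group sizes
for (int s = 0; s < 4; s++) {
    size_t local = local_sizes[s];
    size_t global = N;                          // one work item per element
    cl_event ev;
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, &local, 0, NULL, &ev);
    clWaitForEvents(1, &ev);
    cl_ulong t0 = 0, t1 = 0;
    clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_START, sizeof(t0), &t0, NULL);
    clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_END, sizeof(t1), &t1, NULL);
    cout << "local size " << local << ": " << (t1 - t0) / 1000.0 << " microsec" << endl;
    clReleaseEvent(ev);
}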

Methodology

By implementing the vector addition code given on the lecture slides, the following results were obtained.

To measure the time it takes to complete the memory copy from the device to the host, I timed the following blocking call:

clEnqueueReadBuffer(queue, c_buffer, CL_TRUE, 0, N * sizeof(cl_float), c, 0, NULL, NULL);
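
The timing harness itself is not shown above; a minimal sketch of how this blocking call could be timed with a host-side clock, using std::chrono and the same queue, c_buffer, N, and c as in Appendix A, is:

auto t_begin = std::chrono::high_resolution_clock::now();
clEnqueueReadBuffer(queue, c_buffer, CL_TRUE, 0, N * sizeof(cl_float), c, 0, NULL, NULL);
auto t_end = std::chrono::high_resolution_clock::now();
double usec = std::chrono::duration<double, std::micro>(t_end - t_begin).count();
cout << "device-to-host copy: " << usec << " microsec" << endl;

Because CL_TRUE makes the read blocking, the call does not return until the copy has completed, so the host-side clock brackets the whole transfer.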

Results and discussion

The following results were obtained by averaging the times from multiple runs of the code.

Task: time to complete the memory copy from device to host

Array length        Time (microsec)
1024                1.08
2048                1.1
4096                0.95
8192                1.016667
16384               1.016667
32768               2.55
65536               1.133333
131072              1.183333
262144              1.883333
67108864            2.53333
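
One way to interpret a row of this table, assuming the measured interval covers the full copy, is as an effective bandwidth (bytes moved divided by elapsed time); the sketch below uses time_microsec as a placeholder for a measured value from the table:

double bytes = (double)N * sizeof(cl_float);   // N elements of 4 bytes each
double seconds = time_microsec * 1e-6;         // placeholder: measured time in microseconds
double gb_per_s = bytes / seconds / 1e9;       // effective transfer bandwidth in GB/s
cout << "effective bandwidth: " << gb_per_s << " GB/s" << endl;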

Conclusion and suggestion

Appendix A
// Assignment4.cpp : This file contains the 'main' function. Program execution begins and ends there.
#include <CL/cl.h>
#include <stdio.h>
#include <stdlib.h>
#include <tchar.h>
#include <memory.h>
#include <windows.h>
#include "CL/cl_ext.h"
#include "utils.h"
#include <assert.h>
#include <iostream>
#include <chrono>
#include <ctime>
using namespace std::chrono;
using namespace std;
//====

const char* source =
    "__kernel void vec_add(__global const float *a,\n"
    "                      __global const float *b,\n"
    "                      __global float *c)\n"
    "{\n"
    "    int gid = get_global_id(0);\n"
    "    c[gid] = a[gid] + b[gid];\n"
    "}\n";

//=====

int main() {
    chrono::time_point<std::chrono::system_clock> start, end;

    int N = 67108864; // array length


    // Select a platform and a GPU device; create a context and a profiling-enabled queue.
    cl_platform_id platform;
    clGetPlatformIDs(1, &platform, NULL);
    cl_device_id device;
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
    cl_context context = clCreateContext(0, 1, &device, NULL, NULL, NULL);
    cl_command_queue queue = clCreateCommandQueue(context, device, CL_QUEUE_PROFILING_ENABLE, NULL);

    // Build the vector addition kernel from the source string above.
    cl_program program = clCreateProgramWithSource(context, 1, &source, NULL, NULL);
    clBuildProgram(program, 1, &device, NULL, NULL, NULL);


    start = std::chrono::system_clock::now();
    cl_kernel kernel = clCreateKernel(program, "vec_add", NULL);

    // Allocate and initialize the host input arrays.
    cl_float* a = (cl_float*)malloc(N * sizeof(cl_float));
    cl_float* b = (cl_float*)malloc(N * sizeof(cl_float));
    for (int i = 0; i < N; i++) {
        a[i] = (cl_float)i;
        b[i] = (cl_float)(N - i);
    }
    // Create device buffers; the input buffers copy the host data at creation time.
    cl_mem a_buffer = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
        N * sizeof(cl_float), a, NULL);
    cl_mem b_buffer = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
        N * sizeof(cl_float), b, NULL);
    // The output buffer is only written by the kernel, so no host pointer is copied here.
    cl_mem c_buffer = clCreateBuffer(context, CL_MEM_WRITE_ONLY,
        N * sizeof(cl_float), NULL, NULL);
    size_t global_work_size = N;

    clSetKernelArg(kernel, 0, sizeof(a_buffer), (void*)&a_buffer);
    clSetKernelArg(kernel, 1, sizeof(b_buffer), (void*)&b_buffer);
    clSetKernelArg(kernel, 2, sizeof(c_buffer), (void*)&c_buffer);

    // Launch one work item per element; the local work size is left to the runtime (NULL).
    cl_event event;
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_work_size, NULL, 0, NULL, &event);
    clWaitForEvents(1, &event);
    clFinish(queue);

    // Kernel execution time from the event's profiling counters.
    cl_ulong time_start;
    cl_ulong time_end;
    clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_START, sizeof(time_start), &time_start, NULL);
    clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_END, sizeof(time_end), &time_end, NULL);
    double nanoSeconds = (double)(time_end - time_start);
    cout << "kernel execution time: " << nanoSeconds / 1000000.0 << " ms" << endl;

    // Read the result buffer back from device to host memory (blocking read).
    cl_float* c = (cl_float*)malloc(N * sizeof(cl_float));
    clEnqueueReadBuffer(queue, c_buffer, CL_TRUE, 0, N * sizeof(cl_float), c, 0, NULL, NULL);

    // Release OpenCL objects and host memory.
    free(a);
    free(b);
    free(c);
    clReleaseEvent(event);
    clReleaseMemObject(a_buffer);
    clReleaseMemObject(b_buffer);
    clReleaseMemObject(c_buffer);
    clReleaseKernel(kernel);
    clReleaseProgram(program);
    clReleaseCommandQueue(queue);
    clReleaseContext(context);

    end = chrono::system_clock::now();
    chrono::duration<double> elapsed_seconds = end - start;
    cout << "elapsed time: " << elapsed_seconds.count() << " sec\n";

    system("pause");
    return 0;
}
