GPU Programming EE 4702-1 Final Examination: Exam Total
GPU Programming
EE 4702-1
Final Examination
Tuesday, 4 December 2018 12:30–14:30 CST
Good Luck!
Problem 1: [15 pts] Appearing below is a geometric figure.
[Figure: a geometric figure drawn in the plane z = 3, with x and y axes; the x axis is marked at 2, 4, and 6.]
(a) Complete the individual-triangle rendering pass below so that it renders the figure with all triangles
facing in the positive z direction and without overlapping triangles. Use the provided abbreviation glV.
Note: The part about facing the +z direction was not in the original exam.
glEnd();
(b) Complete the triangle-strip rendering pass below so that it renders the figure.
glEnd();
Problem 2: [30 pts] Appearing below is shortened host and shader code based on Homework 3, in which
text was drawn on the triangular spiral. One drawback of this code is that it uses old-fashioned, deprecated,
inefficient glVertex calls. On the following pages are routines that will implement a more efficient version
of this code in which data such as ctr are placed in buffer objects and a rendering pass is performed for the
entire chain, not just one triangular spiral.
if ( opt_shader == SO_HW03 ) {
glBegin(GL_TRIANGLE_STRIP); // Render spiral using 1 triangle strip.
glNormal3fv(n);
glTexCoord2f(tex_x,0); glVertex3fv(p + vz);
glTexCoord2f(tex_x,1); glVertex3fv(p - vz);
pprev = p;
}
glEnd();
} }
void vs_main_hw03() { // Vertex Shader Code
gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
vertex_e = gl_ModelViewMatrix * gl_Vertex;
normal_e = normalize(gl_NormalMatrix * gl_Normal);
tex_coord = gl_MultiTexCoord0.xy; }
(a) But first let c denote the value of chain_length and n denote the value of opt_n_segs. Determine
the amount of data, in bytes, sent from the CPU to the GPU for the code shown above. (Do not consider
uniforms and other hidden code.)
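As a rough aid for part (a), the sketch below tallies the floats passed through the glNormal3fv, glTexCoord2f, and glVertex3fv calls visible in the loop body. The loop header is not shown, so the sketch assumes the body runs once per segment (n times per spiral) for each of the c spirals, and that each component is a 4-byte float. The function name immediate_mode_bytes is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: bytes passed to the immediate-mode calls shown above.
// ASSUMPTION: the elided loop body runs n_segs times per spiral, there are
// chain_length spirals, and every component is a 4-byte float.
std::size_t immediate_mode_bytes(std::size_t chain_length, std::size_t n_segs)
{
  static_assert(sizeof(float) == 4, "sketch assumes 4-byte floats");
  const std::size_t normal_floats   = 3;      // one glNormal3fv per iteration
  const std::size_t texcoord_floats = 2 * 2;  // two glTexCoord2f per iteration
  const std::size_t vertex_floats   = 2 * 3;  // two glVertex3fv per iteration
  const std::size_t bytes_per_iteration =
    ( normal_floats + texcoord_floats + vertex_floats ) * sizeof(float);
  return chain_length * n_segs * bytes_per_iteration;
}
```

Under these assumptions each iteration sends 52 bytes, so the total grows as 52 c n. A different loop structure in the elided code would change the count.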
(b) In the vertex shader above, label variables with the appropriate letter as requested below.
Label uniform variables with a U, shader input variables with an I, and fixed-function shader
outputs with an O.
Problem 2, continued: Appearing below is the improved triangular spiral host code and shaders. The
host code prepares four buffer objects and then starts an instanced rendering pass with line strips as the
input primitive. The rendering pass uses a vertex shader and a geometry shader to complete the primitives.
(The fragment shader is not a part of this problem.) The only inputs to the vertex shader are the vertex
and instance IDs, which should be used to retrieve or compute information about the spiral segments.
(c) Modify the vertex and geometry shaders on the following pages to efficiently render the triangle spirals.
Make any needed changes to the interface blocks, but do not modify or assume modifications to the host
code (above). For your convenience each shader contains a copy of the host code. Cross out or modify that
code as needed. Use the handy abbreviations at the top of the page.
#ifdef _VERTEX_SHADER_
out Data { // Out of Vertex Shader to Geometry Shader
int ins_id, vtx_id;
// Add any needed declarations here.
};
#endif
#ifdef _GEOMETRY_SHADER_
in Data { // In to Geometry Shader from Vertex Shader
int ins_id, vtx_id;
// Add any needed declarations here.
} In[2];
Problem 2, continued: Vertex shader code on this page.
Cross out unneeded code. Cross out data type for shader outputs. Avoid redundant computation.
int j = gl_VertexID;
float delta_a = 0.6 / opt_n_segs;
vec4 distv = distv_a[ins_id];
vec3 vz = vz_a[ins_id].xyz;
float tex_scale = 0.2;
vec3 v = va_a[ins_id][j%3].xyz;
vec3 p = ctr_a[ins_id].xyz + j * delta_a * v;
vec3 pprev = vec3(0,0,0); // PLACEHOLDER. Won’t work.
vec3 n = cross( p - pprev, vz );
float tex_x = total_len_compute(j,delta_a,distv) * tex_scale;
Problem 2, continued: Geometry shader code on this page.
Cross out unneeded code. Cross out data type for shader outputs. Avoid redundant computation.
vec3 v = va_a[ins_id][j%3].xyz;
vec3 p = ctr_a[ins_id].xyz + j * delta_a * v;
vec3 pprev = vec3(0,0,0); // PLACEHOLDER. Won’t work.
vec3 n = cross( p - pprev, vz );
float tex_x = total_len_compute(j,delta_a,distv) * tex_scale;
Problem 3: [15 pts] Answer each CUDA question below.
(a) Both CUDA kernels below do the same thing, but one will execute much less efficiently. Explain why in
terms of the minimum request size.
__global__ void kmain_simple(float4 *d_in, float *d_out) {
const int tid = threadIdx.x + blockIdx.x * blockDim.x;
const int elt_per_thread = ( d_app.array_size + d_app.num_threads - 1 ) / d_app.num_threads;
const int start = elt_per_thread * tid;
const int stop = start + elt_per_thread;
What is the maximum request size that will avoid this inefficiency? Explain.
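The effect of the access pattern can be estimated on the host. In kmain_simple each thread starts at elt_per_thread * tid, so consecutive threads read elements that are elt_per_thread apart. The sketch below counts how many minimum-size requests one warp makes on a single loop iteration for a given stride. It assumes 32-thread warps, a 32-byte minimum request size, and 16-byte float4 elements. The function names are hypothetical, and the stride-1 case stands in for an interleaved assignment that is not shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// ASSUMPTIONS: 32 threads per warp, 32-byte minimum request size,
// and 16-byte (float4) elements.
constexpr std::size_t warp_size     = 32;
constexpr std::size_t request_bytes = 32;
constexpr std::size_t elt_bytes     = 16;

// Count the distinct 32-byte requests made by one warp on one iteration
// when thread t reads element index  t * stride_elts + offset.
std::size_t requests_per_warp(std::size_t stride_elts, std::size_t offset = 0)
{
  assert(stride_elts > 0);
  std::set<std::size_t> sectors;
  for ( std::size_t t = 0; t < warp_size; t++ )
    {
      const std::size_t addr = ( t * stride_elts + offset ) * elt_bytes;
      sectors.insert( addr / request_bytes );
    }
  return sectors.size();
}

// Fraction of the fetched bytes that the warp actually uses.
double request_efficiency(std::size_t stride_elts)
{
  const double used    = double( warp_size * elt_bytes );
  const double fetched = double( requests_per_warp(stride_elts) * request_bytes );
  return used / fetched;
}
```

With a stride of 1 the warp's 512 bytes fit in 16 requests and every fetched byte is used. With a stride of 2 or more, each thread lands in its own 32-byte request and only half of the fetched data is used.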
(b) A CUDA kernel is to run on a GPU with 8 SMs (MPs). Configuration A consists of 4 blocks of 32 threads
each. Configuration B consists of 8 blocks of 16 threads each. Neither is very good. Explain how each one
underutilizes the hardware on current NVIDIA GPUs.
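One way to reason about the two configurations is to count how many SMs receive a block and how full each warp is. The sketch below does this under the assumptions that warps have 32 threads and that blocks are distributed one per SM until every SM has one. The struct and function names are hypothetical.

```cpp
#include <cassert>

struct Utilization {
  int sms_busy;          // SMs that receive at least one block.
  double lane_fraction;  // Active threads divided by lanes in the block's warps.
};

// ASSUMPTIONS: 32-thread warps; blocks are spread across SMs, one per SM,
// before any SM receives a second block.
Utilization launch_utilization(int num_sms, int num_blocks, int threads_per_block)
{
  assert( num_sms > 0 && num_blocks > 0 && threads_per_block > 0 );
  const int warp_size = 32;
  const int warps_per_block = ( threads_per_block + warp_size - 1 ) / warp_size;
  const int sms_busy = num_blocks < num_sms ? num_blocks : num_sms;
  const double lane_fraction =
    double(threads_per_block) / ( warps_per_block * warp_size );
  return { sms_busy, lane_fraction };
}
```

For Configuration A, launch_utilization(8, 4, 32) reports 4 busy SMs with full warps. For Configuration B, launch_utilization(8, 8, 16) reports 8 busy SMs with half-empty warps. Neither number captures a further concern: a single warp per SM leaves little opportunity to hide latency.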
Problem 4: [40 pts] Answer each question below.
(a) The screenshot below is the Homework 3 triangular spiral with 37 segments per spiral. Imagine a spiral
with even more segments, say 1000.
With a large number of segments there will usually be a large computational load on both the vertex and
fragment shaders. In one of these shaders the computational load can be considered wasted, depending on
the eye location, even when the spiral is visible. In which shader is computation wasted, and why?
Sketch two views in which the spiral is visible. In one the computational load is high and mostly wasted. In
the other the computational load is lower and not wasted.
View with lower load and little waste. View with high load and waste.
(b) The OpenGL call glColor is used to specify a color, say purple: glColor3f(1,0,1);. In typical use, is
that the color that will be written to the frame buffer? Explain.
Are the arguments to glColor the exact color to be written to the frame buffer? Explain.
(c) Describe the difference between the flat, noperspective, and smooth interpolation qualifiers.
The flat qualifier's feature:
The noperspective qualifier's feature:
The smooth qualifier's feature:
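The arithmetic behind the three qualifiers can be sketched for the simplified case of a line segment with two endpoints. Here a0 and a1 are attribute values at the endpoints, w0 and w1 are their clip-space w components, and t is the fragment's position along the segment in window space. The function names are hypothetical, and the flat case assumes OpenGL's default last-vertex provoking convention.

```cpp
#include <cassert>

// flat: no interpolation; every fragment gets the provoking vertex's value.
// ASSUMPTION: the provoking vertex is the last one (OpenGL's default).
double interp_flat(double /* a0 */, double a1)
{
  return a1;
}

// noperspective: linear interpolation in window (screen) space.
double interp_noperspective(double a0, double a1, double t)
{
  return ( 1 - t ) * a0 + t * a1;
}

// smooth: perspective-correct interpolation. Interpolate a/w and 1/w
// linearly in window space, then divide.
double interp_smooth(double a0, double w0, double a1, double w1, double t)
{
  assert( w0 > 0 && w1 > 0 );
  const double num = ( 1 - t ) * a0 / w0 + t * a1 / w1;
  const double den = ( 1 - t ) / w0 + t / w1;
  return num / den;
}
```

For example, with a0 = 0, a1 = 1, w0 = 1, w1 = 3, and t = 0.5, the noperspective result is 0.5 while the smooth result is 0.25, because the screen-space midpoint corresponds to a point nearer the closer endpoint in object space.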
(d) Consider a rendering pass using triangles as the primitive. OpenGL (compatibility profile) allows one to
specify a normal for each triangle vertex, but as we all know a triangle, geometrically, has just one normal.
Why would one specify different normals for each vertex? Explain how such normals are chosen.
Why might one choose different normals for each vertex of a triangle?
(e) Describe what the inputs to the rasterization stage are, what the rasterization stage does, and what its
outputs are.
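The core of the rasterization step can be illustrated with a simplified software version. The sketch below takes a triangle's window-space vertex coordinates and emits one fragment for each pixel whose center lies inside it, using edge functions. It is an assumption-laden simplification: the names are hypothetical, the triangle must be counter-clockwise, pixel centers on an edge count as covered (real hardware applies a tie-breaking fill rule), and attribute and depth interpolation are omitted.

```cpp
#include <cassert>
#include <vector>

struct Vec2 { double x, y; };   // Window-space vertex coordinate.
struct Fragment { int x, y; };  // Pixel covered by the primitive.

// Edge function: positive when p is to the left of the directed edge a->b.
static double edge(Vec2 a, Vec2 b, Vec2 p)
{
  return ( b.x - a.x ) * ( p.y - a.y ) - ( b.y - a.y ) * ( p.x - a.x );
}

// Emit a fragment for each pixel whose center is inside the triangle.
// ASSUMPTIONS: counter-clockwise vertex order; centers lying exactly on an
// edge are treated as covered.
std::vector<Fragment>
rasterize(Vec2 v0, Vec2 v1, Vec2 v2, int width, int height)
{
  assert( edge(v0, v1, v2) > 0 );  // Counter-clockwise and non-degenerate.
  std::vector<Fragment> frags;
  for ( int y = 0; y < height; y++ )
    for ( int x = 0; x < width; x++ )
      {
        const Vec2 p = { x + 0.5, y + 0.5 };  // Pixel center.
        if ( edge(v0, v1, p) >= 0 && edge(v1, v2, p) >= 0
             && edge(v2, v0, p) >= 0 )
          frags.push_back( Fragment{ x, y } );
      }
  return frags;
}
```

For the triangle (0,0), (4.5,0), (0,4.5) on an 8 by 8 pixel grid, the covered pixel centers are those with x + y at most 3, giving 10 fragments.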
(f) The true-sphere shader used in class rendered spheres perfectly. Would it make sense to use a similar
approach to write a true-cube shader that can perfectly render a cube?
(g) The unlabeled diagram below shows how shadow volumes can be used to render shadows. On the diagram
show the location of the eye and light source, fragment(s) found to be in the shade, and fragment(s) found
to be illuminated.