GPU Programming LSU EE 4702-1 Solve-Home Final Examination: Name Solution
GPU Programming
LSU EE 4702-1
Solve-Home Final Examination
Wednesday 9 December to Friday 11 December 2020 16:30 CST
Work on this exam alone. Regular class resources, such as notes, papers,
solutions, documentation, and code, can be used to find solutions. In
addition, outside OpenGL references and tutorials, other programming
resources, and mathematical references can be consulted. Do not try
to seek out references that specifically answer any question here. Do
not discuss this exam with classmates or anyone else, except questions
or concerns about problems should be directed to Dr. Koppelman.
Warning: Unlike homework assignments, collaboration is not allowed
on exams. Suspected copying will be reported to the dean of students.
The kind of copying on a homework assignment that would result in a
comment like “See ee4702xx for grading comments” will be reported if
it occurs on an exam. Please do not take advantage of pandemic-forced
test conditions to cheat!
Good Luck!
Problem 1: [30 pts] The geometry shader below is based on the solution to the pre-final exam Problem 3.
The geometry shader renders a prism centered on the triangle with a hole cut in it. The prism is also called
a monolith in a playful reference to a recent news story. Notice that there is a vec2 array, pts, declared
in the shader. This array has the coordinates of the points on the pointy p from Pre-Final Exam Problem
1. The points are in the pointy p’s local coordinate space and are in the right order to render the p as a
triangle strip.
Modify the shader to render the p’s where the monolith faces would go.
The solution appears below.
void gs_mono_p() {
  vec4 ctr_ce = AVG(vertex_e);  // Coord at triangle center.
  vec3 ctr_ne = AVG(normal_e);  // Normal at triangle center.

  // Render the triangle-with-a-hole using a triangle strip that wraps around the hole.
  for ( int ii=0; ii<=3; ii++ ) {
      int i = ii % 3;
      normal_e = f * In[i].normal_e + (1-f) * ctr_ne;
      vertex_e = f * In[i].vertex_e + (1-f) * ctr_ce;
      gl_Position = gl_ProjectionMatrix * vertex_e;
      EmitVertex();  // Emit inner triangle vertex.
      normal_e = In[i].normal_e;
      vertex_e = In[i].vertex_e;
      gl_Position = In[i].gl_Position;
      EmitVertex();  // Emit original triangle vertex.
    }
  EndPrimitive();

  // Compute and save the coordinates of the top of the monolith, and surface normals.
  vec3 vtx_e[3], snorm_e[3];
  for ( int i=0; i<3; i++ ) {
      snorm_e[i] = size * normalize( f * In[i].normal_e + (1-f) * ctr_ne );
      vtx_e[i] = f * In[i].vertex_e.xyz + (1-f) * ctr_ce.xyz; }

  // Note: Code is missing here, lost at a page break in the original.
  // It evidently opens a loop over the monolith faces and computes va,
  // the four eye-space corners of the current face quad, presumably
  // using the pts array to place the p's on each face.

  for ( int j=0; j<4; j++ ) {
      vertex_e.xyz = va[j];
      gl_Position = gl_ProjectionMatrix * vertex_e;
      EmitVertex();
    }
  EndPrimitive();
  }  // Closes the face loop opened in the missing code above.
}
Problem 2: [15 pts] The vertex and geometry shaders below are used in rendering pipeline T , in which the
input primitives are individual triangles, and rendering pipeline S, in which the input primitive is a triangle
strip. In both cases the shaders work correctly, but there might be differences in performance.
(a) Modify the vertex and geometry shader to reduce the amount of data sent from the vertex shader to the
geometry shader. Do so by moving some of the work performed by the vertex shader to the geometry shader.
If necessary, declare new input and output variables. For this part the shaders will be run on pipeline S. Do
not make changes that result in additional computation unless those changes also reduce vertex-to-geometry
shader data transfer. Of course, your changes should not change what the shaders do.
(b) The requirement to not make changes that result in additional computation unless those changes also
reduce vertex-to-geometry shader data transfer is much easier to comply with when the shaders are run on
pipeline T . Explain why and include an example of such a change. The change should not change what the
shaders do and should be useful. (That is, move something.)
Why is it easier to avoid additional computation when the shaders are used in pipeline T than it is when
the same shaders are used in pipeline S? Note: Don't compare the absolute performance of S to T;
compare how much the change impacts computation on each pipeline.
It is easier to avoid additional computation in pipeline T because there is one execution of the i-loop body for each vertex shader
invocation and so moving a computation from the vertex shader to the i loop in the geometry shader does not change the amount
of computation (all other things being equal).
If the shaders were used with pipeline S then for each execution of the vertex shader (in all but two cases) there would be three
executions of the i-loop body (each in a different geometry shader invocation).
Provide an example.
For example, consider a rectangle rendered as 10 triangles. (Yes, 2 would suffice, but suppose there were 10.) In pipeline T the
vertex shader would be executed 10 × 3 = 30 times and the geometry shader would be executed 10 times. In each execution of the
geometry shader the i loop executes 3 iterations, and so the loop body executes a total of 10 × 3 = 30 times. In the solution to
the previous part a matrix/vector multiply is eliminated from the vertex shader and one is added to the i loop in the geometry
shader. This move does not change the total number of matrix/vector multiplications.
If that same 10-triangle rectangle were rendered in S only 10 + 2 = 12 vertices would be sent in to the rendering pipeline and
so the vertex shader would execute only 12 times. The geometry shader would still execute 10 times and the i loop body would
still execute 30 times. So moving the matrix/vector multiplication from the vertex shader to the geometry shader would increase the
amount of work by a factor of 30/12 = 2.5.
So with pipeline S the modifications from the previous problem impact performance in two ways. The reduction in data being sent
from the vertex to geometry shader will tend to improve performance but the increase in computation will tend to reduce performance.
The 4 × 4 matrix/vector multiply consists of 16 multiply/add (MADD) operations. In current NVIDIA GPUs the time needed for
the 16 MADD instructions is much less than the time needed to move 32 bytes across the GPU chip boundary, and so even with the
additional computation the change would be worthwhile.
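The invocation counts in this example can be checked with a quick arithmetic sketch (Python here purely for the arithmetic; the shaders themselves are GLSL). The triangle count, the strip vertex formula n + 2, and the 3-iteration i loop all come from the discussion above.

```python
# Vertex- and geometry-shader work for a 10-triangle rectangle rendered
# as individual triangles (pipeline T) and as a triangle strip (pipeline S).
n_tri = 10                       # triangles in the rectangle

vs_invocations_T = 3 * n_tri     # pipeline T: 3 vertices sent per triangle
vs_invocations_S = n_tri + 2     # pipeline S: a strip of n triangles has n + 2 vertices
gs_loop_iterations = 3 * n_tri   # i loop runs 3 times per geometry shader invocation

# On pipeline T, moving one matrix/vector multiply from the vertex
# shader into the i loop leaves the total multiply count unchanged ...
assert gs_loop_iterations == vs_invocations_T
# ... but on pipeline S it increases the count by a factor of 30/12.
work_factor_S = gs_loop_iterations / vs_invocations_S
print(work_factor_S)             # -> 2.5
```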
Problem 3: [15 pts] In Homework 3 we experimented with two ways to render a triangle with a hole in it.
In the one-triangle method we relied on the fragment shader to render the hole. The geometry shader just
emitted one triangle with little work. In the many-triangle method we emitted several triangles, forming a
triangle-with-a-hole shape.
Based on performance measurements we found that the one-triangle method was faster. That must mean
that the extra work done by the geometry shader in the many-triangle method had more of an impact than
the extra work done by the fragment shader in the one-triangle method.
Let t_g1 denote the time used by one invocation of the geometry shader for the one-triangle method, and let t_gm
denote the time used by one invocation of the geometry shader for the many-triangles method. Let t_f1 and
t_fm denote the times for one invocation of the respective fragment shaders.
Let n denote the number of spheres rendered, and let g denote the number of triangles in one sphere
tessellation. Finally, let f denote the fraction of the triangle covered by the hole.
(a) Based on these, find an expression for the total time used by the geometry shaders during a render pass
for each method.
Total rendering pass time for geometry shader using one-triangle method:
Each invocation takes t_g1, and there is one invocation for each tessellated triangle, so the total time is n g t_g1.
Total rendering pass time for geometry shader using many-triangle method:
Each invocation takes t_gm, and there is one invocation for each tessellated triangle, so the total time is n g t_gm.
(b) Find an expression for the time used by the fragment shaders. Use n_f1 for the total number of fragment
shader invocations in the one-triangle method. (But use it for both expressions.)
Total rendering pass time for fragment shader using one-triangle method:
This is simply n_f1 t_f1.
Total rendering pass time for fragment shader using many-triangle method:
Since no n_fm has been provided it must be estimated. The hole covers fraction f of the triangle, and the many-triangle method never generates fragments for the hole, so n_fm = (1 - f) n_f1. The total time is then (1 - f) n_f1 t_fm.
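The expressions from parts (a) and (b) can be collected into a short numeric sketch. All values below (sphere count, tessellation size, per-invocation times, n_f1, f) are made-up placeholders; only the formulas follow the solution, including the estimate that the many-triangle method generates about (1 - f) of the one-triangle method's fragments, since hole fragments are never produced.

```python
# Render-pass time expressions from Problem 3, with placeholder values.
n, g = 100, 960          # spheres, triangles per sphere tessellation (made up)
t_g1, t_gm = 1.0, 4.0    # geometry shader time per invocation (arbitrary units)
t_f1, t_fm = 2.0, 1.0    # fragment shader time per invocation (arbitrary units)
n_f1 = 500_000           # fragment shader invocations, one-triangle method
f = 0.25                 # fraction of the triangle covered by the hole

geo_one   = n * g * t_g1             # one GS invocation per tessellated triangle
geo_many  = n * g * t_gm
frag_one  = n_f1 * t_f1
frag_many = (1 - f) * n_f1 * t_fm    # hole fragments are never generated
```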
(c) What does n_f1 depend on? How can n_f1 be made larger or smaller when viewing a scene without changing
the primitives sent into the rendering pipeline? That is, one can't send fewer spheres or more spheres into
the rendering pipeline.
n_f1 is roughly the number of pixels covered by the rendered spheres, so it depends on how much screen area they occupy. Without changing the primitives it can be made larger or smaller by moving the viewer closer to or farther from the spheres, or by increasing or decreasing the window (frame buffer) resolution.
Problem 4: [40 pts] Answer each question below.
(a) The two shaders below do the same thing, though slightly differently.
void vs_plan_a() {
vertex_e = gl_ModelViewMatrix * gl_Vertex;
gl_Position = gl_ProjectionMatrix * vertex_e;
}
void vs_plan_b() {
vertex_e = gl_ModelViewMatrix * gl_Vertex;
gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
}
OpenGL is provided a modelview matrix and a projection matrix at the beginning of a rendering pass. In
both vs_plan_a and vs_plan_b there are two matrix/vector multiplies, which require 4² = 16 multiply/
add operations each. But vs_plan_b uses gl_ModelViewProjectionMatrix, which is the product of the
modelview and projection matrices. The product of these two matrices is computed using 4³ = 64 multiply/
add operations. That brings the total to 16 + 16 + 64 = 96 operations, much more than 32 for vs_plan_a,
right?
Describe a case when the argument is correct, but explain why this case does not reflect typical use.
It would be correct if a rendering pass processed just one vertex: 96 operations versus 32. But a typical rendering pass processes many vertices, and the 64-operation matrix product is computed just once per pass, not once per vertex. For n vertices plan b costs 32n + 64 operations versus 32n for plan a, a negligible difference for realistic values of n.
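The amortization argument can be made concrete with a small operation-count sketch (Python used only for the arithmetic; the 16- and 64-operation costs are those given in the question).

```python
# Multiply/add (MADD) operation counts for the two vertex shader plans.
# A 4x4 matrix * vec4 multiply costs 16 MADDs; the 4x4 * 4x4 product that
# forms gl_ModelViewProjectionMatrix costs 64, paid once per rendering pass.
def ops_plan_a(n_vertices):
    return 32 * n_vertices           # two matrix/vector multiplies per vertex

def ops_plan_b(n_vertices):
    return 64 + 32 * n_vertices      # one-time MVP product, then two per vertex

# The 96-versus-32 argument holds only for a single vertex ...
assert ops_plan_a(1) == 32 and ops_plan_b(1) == 96
# ... for a realistic pass the one-time 64 operations are negligible.
print(ops_plan_b(10_000) / ops_plan_a(10_000))   # -> 1.0002
```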
(b) Answer the following questions about view volumes.
It is easy to determine whether a vertex is in the view volume by using its coordinate in
[ ] object space,
[ ] eye space, or
[x] clip space (check one).
Given the coordinate in that space, how can one tell whether it is inside or outside the view volume?
A vertex is in the view volume if, after homogenization (dividing by w), the absolute value of each clip-space component is ≤ 1.
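The test above can be written out as a small sketch, assuming a clip-space coordinate given as components (x, y, z, w) with w > 0; the function name is illustrative.

```python
def in_view_volume(x, y, z, w):
    """Return True if a clip-space coordinate lies in the view volume:
    after homogenization (dividing by w, assumed positive) every
    component must have absolute value <= 1."""
    return all(abs(c / w) <= 1 for c in (x, y, z))

assert in_view_volume(0.5, -0.5, 0.0, 1.0)      # inside
assert not in_view_volume(3.0, 0.0, 0.0, 1.0)   # x/w = 3 -> outside
assert in_view_volume(4.0, 2.0, -4.0, 4.0)      # inside after divide by w
```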
It is easy to determine whether some triangles are in the view volume. Provide an example of such a
triangle and explain why.
It is easy if all three vertices are in the view volume. In that case no part of the triangle can be outside the view volume.
Provide an example of a triangle for which it is not so easy to determine if it is in the view volume.
Illustrate with a diagram.
One in which all three vertices are outside the view volume. Such a triangle can still intersect the volume: for example, a large triangle whose three vertices lie beyond different faces of the view volume while its interior passes through the volume. A diagram would show the view volume as a box with the triangle's edges cutting across it even though its corners remain outside.
(c) Describe how suitable an OpenGL uniform variable is for the following:
Explain whether this is a suitable use for a uniform variable: To hold the lighted color computed by a vertex
shader.
That won’t work because uniform variables cannot be written by shaders, including vertex shaders. Even if the uniform were written
by some other means, each vertex can have a different lighted color but the value of a uniform variable must be the same for every
vertex in a rendering pass.
Explain whether this is a suitable use for a uniform variable: To hold the location of a light source.
That is suitable because that would be the same for every vertex.
(d) Vertex coordinates are usually three dimensional but texture coordinates are usually two dimensional.
Why? (Ignore the w component in your answer.)
Vertex coordinates locate points in three-dimensional space, so three components are needed. A texture is an image, and a texture coordinate locates a position on that image, which is a two-dimensional surface, so two components suffice.
(e) In many of our sphere examples we put the coordinates into a buffer object.
What are the advantages of a buffer object over using individual glVertex calls to feed a rendering pipeline?
Much lower overhead compared to calling glVertex to provide one vertex coordinate.
What are the advantages of a buffer object over using a client (CPU) array to feed a rendering pipeline?
With a client array the data must be sent from the CPU to the GPU for each rendering pass, even if that data hasn't changed. A
buffer object resides in GPU memory and so can be reused from pass to pass without being re-sent.
(f) A homogeneous coordinate consists of four components, compared to just three for ordinary Cartesian
coordinates. Homogeneous coordinates increase the amount of work needed for a matrix/vector multiply
from 9 to 16 multiplications. Transformations are realized by multiplying a transformation matrix by a
coordinate. Why are homogeneous coordinates used despite this extra work?
With homogeneous coordinates, translation and projection, not just rotation and scaling, can be expressed as matrix multiplies. That allows an arbitrary sequence of transformations to be composed into a single 4 × 4 matrix, applied with one matrix/vector multiply per vertex.