Module
1) Write the C code to initialize Port G bit 6 as an input and then test whether it is 1 or 0.
Remember the I/O bit may have an analog select feature which must be disabled. Use the
structures defined in xc.h, such as LATAbits.LATA3.
ANSELGbits.ANSG6 = 0; // disable the analog function on RG6
TRISGbits.TRISG6 = 1; // make RG6 an input
if (PORTGbits.RG6 == 1)
2) Rewrite the previous question's code using the atomic access control registers (SET, CLR,
INV).
ANSELGCLR = 1 << 6;
TRISGSET = 1 << 6;
if (PORTGbits.RG6 == 1)
3) Write the C code to initialize Port F bits 8 and 12 to outputs, setting bit 8 to 1 and bit 12 to
0. Remember the I/O bit may have an analog select feature which must be disabled. Use
the structures defined in xc.h, such as LATAbits.LATA3.
TRISFbits.TRISF8 = 0;
ANSELFbits.ANSF12 = 0;
TRISFbits.TRISF12 = 0;
LATFbits.LATF8 = 1;
LATFbits.LATF12 = 0;
4) Rewrite the previous question's code using the atomic access control registers (SET, CLR,
INV).
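A minimal host-side sketch of how the SET, CLR and INV registers behave, applied to the Port F
setup from question 3. The sim_ variables and the reg_set, reg_clr and reg_inv helpers are
hypothetical stand-ins so the code runs off-target. On the MCU each helper call would
presumably be a single register write such as TRISFCLR = mask; or LATFSET = mask;.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the hardware registers, so this runs on a PC. */
uint32_t sim_TRISF  = 0xFFFFFFFFu; /* assumed reset state: all pins are inputs */
uint32_t sim_ANSELF = 0xFFFFFFFFu; /* assumed reset state: analog enabled */
uint32_t sim_LATF   = 0u;

/* Each helper mimics one atomic access register: only bits set in mask change. */
void reg_set(uint32_t *reg, uint32_t mask) { *reg |= mask; }   /* like xxxSET */
void reg_clr(uint32_t *reg, uint32_t mask) { *reg &= ~mask; }  /* like xxxCLR */
void reg_inv(uint32_t *reg, uint32_t mask) { *reg ^= mask; }   /* like xxxINV */

/* Port F bits 8 and 12 as outputs, bit 8 high, bit 12 low. */
void init_port_f_sim(void)
{
    reg_clr(&sim_ANSELF, 1u << 12);              /* disable analog on bit 12 */
    reg_clr(&sim_TRISF, (1u << 8) | (1u << 12)); /* both bits become outputs */
    reg_set(&sim_LATF, 1u << 8);                 /* bit 8 = 1 */
    reg_clr(&sim_LATF, 1u << 12);                /* bit 12 = 0 */
}
```

Because each write names only the bits to change, no read-modify-write sequence is needed and
the other bits of the register are left alone.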
Assume that each call of Delay(delay_count) takes 100 milliseconds, and all other code takes
no time at all.
5) Describe the situation which results in the shortest delay between switch 2 (connected to
BTN2_PORT_BIT) being pressed and the LEDs turning off. Draw and label a timeline to
show what code the processor executes and when.
Switch 2 is pressed immediately before the if statement. The if statement executes and
then the LEDs are cleared, so there is no delay.
6) Describe the situation which results in the longest delay between switch 2 (connected to
BTN2_PORT_BIT) being pressed and the LEDs turning off. Draw and label a timeline to
show what code the processor executes and when.
Switch 2 is pressed immediately after the if statement. There is a delay of 400 ms before
the LEDs are turned off.
7) Describe the situation which results in the longest time that switch 2 can be pressed (and
then released) without the LEDs turning off. Draw and label a timeline to show what code
the processor executes and when.
8) Describe the situation which results in the shortest delay between switch 2 (connected to
BTN2_PORT_BIT) being pressed and all four of the LEDs turning off. Draw and label a
timeline to show what code the processor executes and when.
Switch 1 is pressed, so the delay calls use FAST_DELAY (20 ms).
Switch 2 is pressed immediately before the second if statement in TASK_Read_Switches.
LD1 is turned off immediately, LD2 after 20 ms, LD3 after 40 ms, and LD4 after 60 ms.
9) Edit the code in the case statements to reduce this delay to zero.
Edit each case so that Delay is called only if G_Allow_Lit_LEDs is true. This eliminates all
delay calls as long as G_Allow_Lit_LEDs is false.
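The edit described above might look like the following sketch. The original case statements
are not reproduced here, so the structure is an assumption: scan_case, sim_led,
sim_elapsed_ms and SLOW_DELAY_MS are hypothetical names, and Delay is replaced by a stub that
only advances simulated time so the behavior can be checked on a PC.

```c
#include <stdbool.h>
#include <stdint.h>

#define SLOW_DELAY_MS 100u      /* SLOW_DELAY value quoted in the answers */

bool G_Allow_Lit_LEDs = true;   /* cleared when switch 2 is pressed */
uint32_t sim_elapsed_ms = 0;    /* hypothetical: simulated elapsed time */
bool sim_led[4];                /* hypothetical: state of LD1..LD4 */

/* Hypothetical stand-in for Delay(): advances simulated time instead of waiting. */
void Delay(uint32_t ms) { sim_elapsed_ms += ms; }

/* Shared body of every case: drive one LED, delay only while LEDs are allowed. */
static void scan_case(int led_index)
{
    sim_led[led_index] = G_Allow_Lit_LEDs;
    if (G_Allow_Lit_LEDs) {
        Delay(SLOW_DELAY_MS);
    }
}

/* One call handles one state, then advances to the next. */
void TASK_Scan_LEDs_Once_FSM(void)
{
    static int state = 1;
    switch (state) {
    case 1:  scan_case(0); state = 2; break;
    case 2:  scan_case(1); state = 3; break;
    case 3:  scan_case(2); state = 4; break;
    default: scan_case(3); state = 1; break;   /* case 4 */
    }
}
```

With G_Allow_Lit_LEDs false, four consecutive calls turn off all four LEDs without adding any
simulated delay.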
10) Describe the situation for the original code (not including your changes from question 9)
which results in the longest delay between switch 2 (connected to BTN2_PORT_BIT) being
pressed and the LEDs turning off. Draw and label a timeline to show what code the
processor executes and when.
Switch 1 is not pressed, so the delay calls use SLOW_DELAY (100 ms).
Switch 2 is pressed immediately after the start of any Delay call in a case statement. For
this discussion, assume it is case 1. There is a delay of 100 ms before TASK_Read_Switches
runs and polls the switch, changing G_Allow_Lit_LEDs. TASK_Scan_LEDs_Once_FSM runs
case 2, turning off LD2 and delaying again. At 200 ms TASK_Scan_LEDs_Once_FSM runs
case 3, turning off LD3 and delaying again. At 300 ms TASK_Scan_LEDs_Once_FSM runs
case 4, turning off LD4 and delaying again. At 400 ms TASK_Scan_LEDs_Once_FSM runs
case 1, turning off LD1 and delaying again.
So the total delay is 400 ms.
11) Now include your changes from question 9. Describe the situation for your code which
results in the longest delay between switch 2 (connected to BTN2_PORT_BIT) being
pressed and the LEDs turning off. Draw and label a timeline to show what code the
processor executes and when.
Switch 1 is not pressed, so the delay calls use SLOW_DELAY (100 ms).
Switch 2 is pressed immediately after the start of any Delay call in a case statement. For
this discussion, assume it is case 1. There is a delay of 100 ms before TASK_Read_Switches
runs and polls the switch, changing G_Allow_Lit_LEDs. TASK_Scan_LEDs_Once_FSM runs
case 2, turning off LD2 but skipping the delay. TASK_Scan_LEDs_Once_FSM runs case 3,
turning off LD3 but skipping the delay. TASK_Scan_LEDs_Once_FSM runs case 4, turning off
LD4 but skipping the delay. TASK_Scan_LEDs_Once_FSM runs case 1, turning off LD1 but
skipping the delay.
So the total delay is 100 ms.
12) Describe the situation which results in the longest time that switch 2 can be pressed (and
then released) without the LEDs turning off. Draw and label a timeline to show what code
the processor executes and when.
Switch 1 is not pressed, so the delay calls use SLOW_DELAY (100 ms).
Switch 2 is pressed immediately after the start of any Delay call in a case statement. The
Delay call takes 100 ms, and then TASK_Read_Switches will run again. So the switch must
be released before 100 ms to be missed by the code.
Module 5 Analog Interfacing
13) What code would you expect a 12-bit ADC to produce when converting a 2.2 V input?
Assume the positive reference voltage is 3.3 V and the negative reference voltage is 0 V.
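A minimal sketch of the ideal ADC transfer function, assuming the result is truncated rather
than rounded (a rounding convention can differ by one code). The function name adc_code and
the use of millivolts as the unit are choices made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Ideal ADC output code for an input voltage, all voltages in millivolts.
   code = floor((Vin - Vref-) / (Vref+ - Vref-) * 2^bits), clamped to the
   largest code the converter can produce. */
uint32_t adc_code(uint32_t vin_mv, uint32_t vref_neg_mv,
                  uint32_t vref_pos_mv, unsigned bits)
{
    assert(bits >= 1u && bits <= 16u);
    assert(vref_pos_mv > vref_neg_mv);

    if (vin_mv <= vref_neg_mv) {
        return 0u;                       /* at or below the bottom of the range */
    }
    uint32_t max_code = (1u << bits) - 1u;
    uint64_t code = ((uint64_t)(vin_mv - vref_neg_mv) << bits)
                    / (vref_pos_mv - vref_neg_mv);
    return (code > max_code) ? max_code : (uint32_t)code;
}
```

For a 2.2 V input, 3.3 V and 0 V references and 12 bits, adc_code(2200, 0, 3300, 12) returns
2730, since 2.2/3.3 * 4096 is about 2730.67.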
14) What code would you expect a 10-bit ADC to produce if converting the voltage from a
temperature sensor at 137 C? Assume the sensor's output voltage is 8 mV/C 200 mV.
Assume the positive reference voltage is 3.3 V and the negative reference voltage is 0 V.
15) Write the code to configure a PIC32MZ2048EFG100 to read the analog voltage on pin 100
with the ADC. Assume the ADC has already been configured correctly to read another
input channel. Table 1-1 in the PIC32MZ EF Data manual shows pin connections for the
ADC. Remember that this MCU uses a 100-pin TQFP package. Be sure to configure the
input's ANSEL field, select channel 18 with the ADC multiplexer, and request a
software-triggered conversion.
Module 7 Communications
16) Configure the UART to communicate at 128300 baud, with 8 data bits, no parity, and one
stop bit. Assume the peripheral bus clock is running at 100 MHz.
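The baud rate generator must divide the 100 MHz peripheral bus clock down to either 4X
(BRGH = 1) or 16X (BRGH = 0) the desired baud rate.
100 MHz/(4*128300 Hz) = 194.86
100 MHz/(16*128300 Hz) = 48.72. We will get better noise immunity with BRGH = 0.
The UxBRG register holds the divider minus one, so with BRGH = 0 the divider rounds to 49 and
UxBRG = 48.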
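The divider arithmetic above can be checked with a short sketch. The helper names uart_brg and
uart_actual_baud are made up for this example. The formula assumed is UxBRG = PBCLK /
(divisor * baud) - 1, with the divisor being 16 for BRGH = 0 and 4 for BRGH = 1, and the
quotient rounded to the nearest integer.

```c
#include <stdint.h>

/* Baud rate generator value, rounded to the nearest integer divider.
   Assumes pbclk_hz is large enough that the rounded divider is at least 1. */
uint32_t uart_brg(uint32_t pbclk_hz, uint32_t baud, uint32_t divisor)
{
    uint32_t denom = divisor * baud;
    return (pbclk_hz + denom / 2u) / denom - 1u;
}

/* Baud rate actually produced by a given generator value (truncated to Hz). */
uint32_t uart_actual_baud(uint32_t pbclk_hz, uint32_t brg, uint32_t divisor)
{
    return pbclk_hz / (divisor * (brg + 1u));
}
```

With a 100 MHz clock and a 128300 baud target, the 16X setting gives a generator value of 48
and an actual rate of 127551 baud (about 0.6% low), while the 4X setting gives 194 and 128205
baud (under 0.1% low). Either error is small, so the choice can be made on noise immunity.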
17) Which PIC32MZ EF MCU pins (for the TQFP-100 package) could be used for the UART3
transmit (U3TX) and receive (U3RX) data signals using Peripheral Pin Select? Examine Table
12-3 for the output and Table 12-2 for the input. Specify the pin name.
Output: RPD2, RPG8, RPF4, RPD10, RPF1, RPB9, RPB10, RPC14, RPB5, RPC1, RPD14, RPG1,
RPA14
Input: RPD3, RPG7, RPF5, RPD11, RPF0, RPB1, RPE5, RPC13, RPB3, RPC4, RPD15, RPG0,
RPA15
18) If we connect U3TX to RPD10 and U3RX to RPD15 using Peripheral Pin Select, what pin
numbers will these signals be on? Assume we are using the PIC32MZ EF MCU in the TQFP-
100 package.
19) Write the C code to connect U3TX to RPD10 and U3RX to RPD15 using Peripheral Pin
Select.
21) What must the application program do in order to use a watchdog timer (WDT)?
The application software must periodically refresh the watchdog timer before it expires.
22) What is the difference between a watchdog timer and a windowed watchdog timer?
A watchdog timer can be refreshed at any time before it expires, while a windowed WDT
can only be refreshed within a fixed time window before it expires.
24) Why is using DMA to copy data faster than using software?
The software copy operation needs to execute multiple instructions per word of data: (1)
read the data from the source memory location into a register, (2) store the data from the
register into a destination memory location. It may also need to (3) update pointers and (4)
decide whether to repeat the copy (if in a loop). Using DMA eliminates the need to fetch
and execute instructions to copy data, since the DMA controller performs the reads, writes,
pointer increments and counting. The only software overhead is configuring the DMA
controller, and possibly triggering it.
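The four software steps listed above can be seen in a plain word-copy loop. This is a generic
sketch (the function name copy_words is made up for this example), not code from any
particular project. Every iteration costs several fetched and executed instructions, which is
the overhead a DMA controller removes.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy n_words 32-bit words from src to dst, one word per loop iteration. */
void copy_words(uint32_t *dst, const uint32_t *src, size_t n_words)
{
    while (n_words > 0u) {       /* (4) decide whether to repeat the copy */
        uint32_t word = *src;    /* (1) read from source memory into a register */
        *dst = word;             /* (2) store the register to destination memory */
        src++;                   /* (3) update the pointers */
        dst++;
        n_words--;
    }
}
```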
25) We wish to create an embedded system which generates a square wave whose frequency
is determined by an analog input voltage. Describe how you could use DMA, two timers,
and an analog-to-digital converter to do this.
Configure the first timer to overflow at a fixed frequency. Configure the second timer to
generate a square wave. Use the first timer's overflow event to trigger the ADC to start a
conversion. Use the ADC conversion complete event to trigger a DMA copy from the ADC
result register to the second timer's period register.
Module 10 Advanced Concurrency
26) Consider two task sets and a prioritized scheduler. Set A has 10 tasks, each of which takes
40 time units to execute. Set B has 10 tasks; one takes 100 time units to execute and the
rest take 40 time units each. Which task set would benefit more from switching from a
non-preemptive scheduler to a preemptive one, and why?
Set B would benefit more, because it has more variation in task execution time. With set A
and no preemption, the longest a task can be delayed by another task is 40 time units.
With set B and no preemption, the longest time is 100 time units, which is much longer
than 40.
27) Consider a system with a prioritized, non-preemptive scheduler and three tasks (A, B and
C, in order of decreasing priority). Show in the table below the order the tasks execute if
task B starts running at time 0, and tasks A and C are released (become ready to run) at
time 2. Assume each task takes 3 time units to execute.
Time          0  1  2  3  4  5  6  7  8  9
Running task  B  B  B  A  A  A  C  C  C  (idle)
What is the response time for each task? This is the delay between a task being released
(becoming ready to run) and its completion.
Task Response Time
A 4
B 3
C 7
28) Now consider a system with a prioritized, preemptive scheduler and three tasks (A, B and
C, in order of decreasing priority). Show in the table below the order the tasks execute if
task B starts running at time 0, and tasks A and C are released (become ready to run) at
time 2. Assume each task takes 3 time units to execute. Also show the state of each task
(Ready, running, blocked, or inactive).
Time          0  1  2  3  4  5  6  7  8  9
Running task  B  B  A  A  A  B  C  C  C  (idle)
What is the response time for each task? This is the delay between a task being released
(becoming ready to run) and its completion.
A 3
B 6
C 7
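The two schedules can be reproduced with a small unit-time simulation. This is only a sketch
under the assumptions of questions 27 and 28: three tasks, fixed priorities (index 0 is the
highest), three time units of work each, and no scheduling overhead. The function name
simulate and its parameters are made up for this example. Response time is computed as
completion time minus release time.

```c
#include <stdbool.h>

#define NUM_TASKS 3
#define EXEC_TIME 3

/* Index 0 = task A (highest priority), 1 = B, 2 = C.
   release[i] is the time task i becomes ready.
   On return, response[i] = completion time - release time. */
void simulate(const int release[NUM_TASKS], bool preemptive,
              int response[NUM_TASKS])
{
    int remaining[NUM_TASKS];
    int running = -1;            /* -1 means no task holds the CPU */
    int done = 0;

    for (int i = 0; i < NUM_TASKS; i++) {
        remaining[i] = EXEC_TIME;
    }

    for (int t = 0; done < NUM_TASKS; t++) {
        /* Choose a task when the CPU is free, or on every tick if preemptive. */
        if (running < 0 || preemptive) {
            running = -1;
            for (int i = 0; i < NUM_TASKS; i++) {
                if (release[i] <= t && remaining[i] > 0) {
                    running = i;     /* highest-priority ready task */
                    break;
                }
            }
        }
        if (running < 0) {
            continue;                /* nothing is ready: idle this tick */
        }
        remaining[running]--;        /* run the chosen task for one time unit */
        if (remaining[running] == 0) {
            response[running] = (t + 1) - release[running];
            done++;
            running = -1;
        }
    }
}
```

Calling simulate with release times {2, 0, 2} and preemptive set to false, then to true, gives
the response times for the two cases.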
29) How can we use a mutex to provide mutually exclusive access to a resource or variable by
tasks?
A task which needs to use the resource must first obtain the mutex. After using the
resource, the task must release the mutex. This gives the other tasks a chance to try to
obtain the mutex in order to use the resource.
30) What happens if a task X tries to obtain a mutex which is not available? What happens
when the mutex finally does become available?
The scheduler moves task X into the blocked state and runs the highest priority ready task.
Task X remains blocked until the mutex becomes available, at which point task X may start
running again (if there are no higher priority ready tasks) or be placed into the ready state.
31) What code does a scheduler run if there are no ready tasks?
32) What happens to the contents of the CPU registers when a task is preempted?
b) register 29
c) register 31
32.
37) How is an immediate 32-bit value loaded into a MIPS register, if the instruction itself is
only 32 bits long?
The upper 16 bits are loaded with a LUI instruction, and then the lower 16 bits are loaded
with an ORI instruction.
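The split into two 16-bit immediates can be modeled in C. The function names below are made
up for this example and only mimic what the two instructions do to the register value.

```c
#include <stdint.h>

/* Immediate field for LUI: the upper 16 bits of the constant. */
uint16_t lui_imm(uint32_t value) { return (uint16_t)(value >> 16); }

/* Immediate field for ORI: the lower 16 bits of the constant. */
uint16_t ori_imm(uint32_t value) { return (uint16_t)(value & 0xFFFFu); }

/* Models the two-instruction sequence acting on one register. */
uint32_t load_imm32(uint32_t value)
{
    uint32_t reg = (uint32_t)lui_imm(value) << 16; /* lui: upper half set, lower half zero */
    reg |= ori_imm(value);                         /* ori: OR in the zero-extended lower half */
    return reg;
}
```

For the constant 0x12345678, the LUI immediate is 0x1234 and the ORI immediate is 0x5678.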
38) What does the MIPS instruction beq $t1, $t0, next do?
If the values in registers $t0 and $t1 are equal, branch to the label next, else continue
executing instructions in sequence.
39) Which registers are used to pass arguments to called functions? And which is used for the
function's return value?
Arguments go in $a0, $a1, $a2, and $a3 (if more are needed, the stack is used). The return
value is passed through $v0 (or through the stack, if too large).
40) Why are high-level languages used, if you can already write any program in assembly or
machine language?
A high-level language lets the programmer work at a higher level of abstraction compared
with assembly and machine languages, so each statement in the high-level language gets
more work done. As a result, the programmer needs to write fewer statements, finishing
the program sooner.
41) How are assembly and machine language related, and what is their most important
difference?
They both represent the same program. Machine language is a computer-readable form,
while assembly language is the human-readable form.
Module 11B MIPS Memory System
42) Why does the memory system include a cache?
If a piece of data is used, nearby data will probably also be used soon.
Compulsory: this is the first time the data item has been accessed, so it is not in the cache
yet.
Capacity: the cache is too small to hold all the data at the same time, so the data item was
in the cache but was evicted by other data later.
Conflict: data item was evicted because another data item used the same location later.
47) Consider a 1024-byte two-way set-associative cache. Each block is 16 bytes long.
a) How many sets does the cache have?
In a set-associative cache, a data item can be placed into one of multiple cache locations. In
a direct-mapped cache, a data item can only be placed in one cache location. So a direct-
mapped cache will have more conflicts than a set-associative cache.
49) Assume a memory system with cache has an access time of one cycle for a hit and 20
cycles for a miss. How long will a program take to execute if it has
Answer.
Size (bytes)          4096               16384
Block size (bytes)    16                 16
Associativity (ways)  4                  4
Number of sets        4096/(4*16) = 64   16384/(4*16) = 256
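The set count follows from dividing the cache size by the bytes held in one set (ways times
block size). A minimal sketch, with a made-up function name:

```c
#include <assert.h>
#include <stdint.h>

/* Number of sets = cache size / (associativity * block size). */
uint32_t cache_num_sets(uint32_t size_bytes, uint32_t block_bytes, uint32_t ways)
{
    assert(block_bytes > 0u && ways > 0u);
    return size_bytes / (block_bytes * ways);
}
```

The same formula gives 32 sets for a 1024-byte two-way cache with 16-byte blocks.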
52) Explain how to make the processor ignore interrupts with a priority of 0, 1, or 2.
56) What are the five stages of the PIC32 instruction execution pipeline?
I: Instruction Fetch
E: Execution
M: Memory Fetch
A: Align
W: Writeback
An instruction which depends on the result of an instruction which hasn't written its result
to the register file yet.
A conditional branch which depends on the result of an instruction which hasn't completed
yet.
61) How many pipeline stages are in the PIC32MZ EF Floating Point Unit?
Seven
62) What is the execution latency for the following single-precision floating point instructions
on a PIC32MZ EF CPU?
a) Add
4 cycles
b) Subtract
4 cycles
c) Multiply
4 cycles
d) Divide
17 cycles
e) Square root
17 cycles
f) Reciprocal
13 cycles
17 cycles
63) How long would it take a PIC32MZ EF running at 200 MHz to execute four multiplies and
three adds?
These seven instructions are pipelined, so it would take seven cycles plus three cycles for
the rest of the last add instruction, for a total of ten cycles. At 200 MHz this is 50 ns.
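The cycle count can be written as a formula: n back-to-back instructions with the same latency
L finish after n + L - 1 cycles. The sketch below assumes the instructions are independent and
that one can issue every cycle. The function names are made up for this example.

```c
#include <stdint.h>

/* Cycles to complete n_instr pipelined instructions that each have the same
   latency, assuming one instruction issues per cycle with no stalls. */
uint32_t pipelined_cycles(uint32_t n_instr, uint32_t latency)
{
    return (n_instr == 0u) ? 0u : n_instr + latency - 1u;
}

/* Convert a cycle count to nanoseconds for a clock given in MHz. */
double cycles_to_ns(uint32_t cycles, double clock_mhz)
{
    return (double)cycles * 1000.0 / clock_mhz;
}
```

For seven instructions with a 4-cycle latency this gives 10 cycles, or 50 ns at 200 MHz.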
Module 12 Performance
64) What is the compiler's primary goal when generating code?
You are likely to waste your efforts by optimizing parts of the program which don't really
matter.
66) What is a program's execution time profile, and how does it help with optimization?
Profiling shows how much time different parts of the program take, allowing you to focus
on optimizing the most important parts.
67) Why is it important to look at the object code which the compiler generates?
68) How does program counter sampling work to provide profiling information?
The program is occasionally interrupted by a sampling ISR, which examines the saved PC on
the stack to determine which function was executing. The ISR then increments a count of
how many times that function was interrupted. The relative counts provide the program's
execution profile.
69) Which three parts of the following code do you think will dominate the program's
execution time? Explain why and list the line numbers sorted by largest execution time.
Use the following assumptions:
the CPU does not have floating point math hardware support
a floating-point operation takes 1000 cycles
an integer operation takes 1 cycle
the arrays have been initialized
all memory accesses take ten cycles
int i, m[1000];
float a[1000], c[1000], sum=0;
i=0;                          // line 1
while (i<1000) {              // line 2
    if (i & 0x1) {            // line 3 (if odd)
        c[i] = 317.1 + a[i];  // line 4
    } else {                  // line 5 (even)
        m[i] += i + m[i-1];   // line 6
    }                         // line 7
    sum += m[i+1];            // line 8
    i++;                      // line 9
}                             // line 10
Line 8 has a memory read and a floating-point operation. T = 10 + 1000 = 1010 cycles.
Line 4 has a floating-point add, a memory read, and a memory write. It only runs half of the
time (for odd values of i). T = (10 + 1000 + 10)/2 = 510 cycles.
Line 6 has two integer adds, two memory reads, and a memory write, and it runs half of the
time (for even values of i). T = (1 + 30 + 1)/2 = 16 cycles.
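The estimates above all follow the same pattern: count the memory accesses and operations on
a line, weight them by their cycle costs, and scale by how often the line runs. A minimal
sketch, using the cycle costs assumed in question 69 (the function and macro names are made
up for this example):

```c
/* Cycle costs taken from the assumptions in question 69. */
#define MEM_CYCLES 10.0
#define FP_CYCLES  1000.0
#define INT_CYCLES 1.0

/* Average cycles per loop iteration for one source line.
   fraction_run is the share of iterations in which the line executes. */
double line_cost(int mem_accesses, int fp_ops, int int_ops, double fraction_run)
{
    double per_execution = mem_accesses * MEM_CYCLES
                         + fp_ops * FP_CYCLES
                         + int_ops * INT_CYCLES;
    return fraction_run * per_execution;
}
```

For example, line_cost(1, 1, 0, 1.0) gives 1010 for line 8, line_cost(2, 1, 0, 0.5) gives 510
for line 4, and line_cost(3, 0, 2, 0.5) gives 16 for line 6.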
70) Which three parts of the following code do you think will dominate the program's
execution time? Explain why and list the line numbers sorted by largest execution time.
Use the following assumptions:
Line 6 has to read memory twice and write it once. It has two math operations. T = (20 + 10
+ 2)/2 = 16
Line 8 has one memory read and two math operations, and it runs on every loop iteration.
T = 10 + 2 = 12.
Line 4 only has to read memory once and write it once. It has one math operation. T = (10 +
10 + 1)/2 = 10.5