DMD Semester Project Final Report-1
Student Paper
Author(s):
Yin, Xiaorui
Publication date:
2021
Permanent link:
https://doi.org/10.3929/ethz-b-000510370
Rights / license:
In Copyright - Non-Commercial Use Permitted
This page was generated automatically upon download from the ETH Zurich Research Collection.
High Frame-Rate and Low-Latency
Control of Digital Micromirror Devices
(DMD) using FPGA for Ultracold
Atom-based Quantum Experiments
Semester Project
Xiaorui Yin
Supervisors:
Dr. Kadir Akin (Engineering Unit in Quantum Center)
Alexander Baumgärtner (Quantum Optics Group)
Prof. Dr. Lukas Novotny (Photonics Laboratory)
May, 2021
Acknowledgements
I am very grateful to the people who helped me over the past three and a half
months. My deepest gratitude goes first to my supervisors, Dr. Kadir Akin and
Alexander Baumgärtner, for their careful and selfless guidance of my semester
project, which greatly improved my FPGA programming skills and taught me many
practical problem-solving skills. I am also deeply indebted to all the other
supervisors, Prof. Lukas Novotny, Prof. Tilman Esslinger, and Jeffrey Mohan,
for their direct and indirect help.
Abstract
Contents
Acknowledgements
Abstract
1 Introduction
4 Results
4.1 Testing
4.2 Latency Analysis
Bibliography
Chapter 1
Introduction
Chapter 2
Digital Micromirror Device (DMD)
Figure 2.1: The DMD and the micromirror array [4]. Left: the DMD, with the
micromirror array in the centre. Right: a close-up of the micromirror array, in
which some micromirrors are turned off.
Figure 2.2: The micromirror (a) and the CMOS memory (b) [4]
As mentioned above, the DMD system has two types of state: the mechanical
state of each micromirror and the digital state stored in the CMOS memory.
The operation that loads image data into the CMOS memory is called LOAD.
LOAD does not change the mechanical state. Transferring the digital state to
the mechanical state requires a second operation called RESET. The name RESET,
defined by Texas Instruments, can be misleading, since it suggests that the
operation returns everything to a default state; ACTIVATE would arguably be a
better name. The RESET operation can be performed in several modes, with the
block as the basic unit. A block is a set of consecutive rows. For XGA
resolution, for example, there are 16 blocks of 48 rows each. In single block
mode only one block is updated, whereas in dual and quad block modes two or
four blocks are updated at the same time, respectively. Updating one or several
blocks instead of the entire array is useful when new data has arrived for only
some of the blocks. Figure 2.3 shows the Quad RESET operation. After receiving
a RESET pulse, the target blocks update their micromirrors according to the
values in their CMOS memory.
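The block structure can be made concrete with a short sketch. It uses only the XGA numbers given above (768 rows, 16 blocks of 48 rows). This report does not state how blocks are grouped in dual and quad modes, so the aligned grouping below is an assumption, and the function names are hypothetical.

```python
# Reset-block geometry of an XGA DMD as described in the text:
# 768 rows split into 16 blocks of 48 consecutive rows.
ROWS = 768
NUM_BLOCKS = 16
ROWS_PER_BLOCK = ROWS // NUM_BLOCKS  # 48

# Number of blocks whose mirrors are updated by one RESET pulse per mode.
BLOCKS_PER_RESET = {"single": 1, "dual": 2, "quad": 4, "global": NUM_BLOCKS}


def block_of_row(row: int) -> int:
    """Return the index (0..15) of the reset block that contains `row`."""
    if not 0 <= row < ROWS:
        raise ValueError(f"row {row} is outside 0..{ROWS - 1}")
    return row // ROWS_PER_BLOCK


def blocks_updated(mode: str, block: int) -> list[int]:
    """Blocks updated when a RESET in `mode` targets `block`.

    Assumption (not from the report): dual and quad modes act on aligned
    groups of blocks, e.g. quad mode updates blocks 0-3, 4-7, 8-11 or 12-15.
    """
    size = BLOCKS_PER_RESET[mode]
    first = (block // size) * size
    return list(range(first, first + size))


if __name__ == "__main__":
    print(block_of_row(100))                  # row 100 lies in block 2
    print(blocks_updated("quad", 5))          # [4, 5, 6, 7]
    print(len(blocks_updated("global", 0)))   # 16
```

Under the aligned-group assumption, a global RESET is the special case in which the group covers all 16 blocks.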
The devices used in this project are DLPLCR70EVM (0.7 XGA DMD) and
(a) GUI: issuing LOAD and RESET commands for the ETH logo.
(b) ETH logo on the DMD: the micromirrors forming the ETH logo are turned off,
i.e., the light they reflect does not reach the observer's eyes.
In this section, I briefly describe how the DMD is used to generate Bragg
potentials [1]. As Figure 2.7(a) shows, the imaging system consists of the DMD
and two telescopes. The telescopes, which contain many objective lenses,
provide a large demagnification factor that determines the final image size
in the atom plane. The DMD is placed in the Fourier plane and displays a
sinusoidal pattern, namely the Fourier transform of two delta functions
convolved with a Gaussian envelope. In this report, the pattern loaded into the
micromirror array is referred to as an image, although it is not itself an
image; rather, it is used to generate an image with a laser beam. The generated
momentum and energy transfer in the atom plane are proportional to the frame
rate of the DMD. According to our calculations, the Bragg potentials require a
frame rate of up to 4.6 kHz.
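The following one-dimensional worked illustration shows why such a pattern is sinusoidal. The notation is assumed here rather than taken from the report. The transform convention is \(\mathcal{F}[h](x) = \int h(k)\,e^{ikx}\,dk\), the two delta functions sit at wave numbers \(\pm k_0\), and the Gaussian is \(g(k) = e^{-k^2/(2\sigma_k^2)}\) with a hypothetical width \(\sigma_k\). By the convolution theorem, the transform of a convolution is the product of the transforms:

```latex
\[
\mathcal{F}\big[(\delta(k-k_0)+\delta(k+k_0)) * g\big](x)
  = \big(e^{ik_0x}+e^{-ik_0x}\big)\,\mathcal{F}[g](x)
  = 2\cos(k_0 x)\,\sqrt{2\pi}\,\sigma_k\,e^{-\sigma_k^2 x^2/2}
\]
```

The result is a cosine fringe pattern with period \(2\pi/k_0\) under a Gaussian envelope. This sketch does not cover how that continuous pattern is converted into the binary on/off states of the micromirrors.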
(a) Imaging system with two telescopes: the DMD (left) and the atom plane
(right), with several objective lenses in between.
(b) Test setup including the laser beam path: the DMD is in the upper right
corner. Below the DMD is the atom plane, with a camera behind it. The laser
beam is generated by the laser source on the left and reaches the atom plane
after passing through several lenses.
The development tools used in this project are Xilinx ISE 14.7 and ChipScope Pro
Analyzer, both of which can be downloaded from the Xilinx archive download page [6].
Note that the software should be run in a Linux virtual machine (Oracle
VirtualBox is recommended).
Chapter 3
Application FPGA Design
3.1 Overview
work with the GUI. When test pattern generation is enabled, the DMD
continuously displays dummy patterns such as all zeros, all ones, and a diagonal
line. When the controller board is connected to the PC via USB, the GUI
automatically detects the DMD and the operation mode switches to GUI mode. In
GUI mode, the image data is transmitted over USB in 16-bit words with a 48 MHz
clock, which gives the frame rate

\[ f_{\mathrm{usb}} = \frac{48\,\mathrm{MHz}}{1024 \times 768\,\mathrm{bits} / 16\,\mathrm{bits}} = 976.56\,\mathrm{Hz} \tag{3.1} \]

In practice, the frame rate is much lower than 976.56 Hz: the data is not
transmitted continuously, and the gaps between consecutive data words are very
long. Projecting one image takes approximately two seconds.
The bottleneck of the system is clearly the USB communication. Preloading the
images into the DDR2 SODIMM removes this bottleneck and greatly accelerates the
system. With a 150 MHz DDR2 memory and a 128-bit memory data bus, the
theoretical frame rate becomes

\[ f_{\mathrm{ddr2}} = \frac{150\,\mathrm{MHz}}{1024 \times 768\,\mathrm{bits} / 128\,\mathrm{bits}} = 24414.06\,\mathrm{Hz} \tag{3.2} \]

In practice, the DDR2 path has bottlenecks of its own, as explained in this
chapter. Nevertheless, we show that frame rates of up to 22 kHz can be reached in
practice.
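Equations (3.1) and (3.2) are the same throughput bound: the clock rate divided by the number of bus transfers needed for one 1024 × 768 binary image. The following is a minimal sketch. It assumes that one bus word is transferred per clock cycle with no gaps, which, as noted above, is optimistic for USB. The function names are illustrative.

```python
def transfers_per_image(width_px: int, height_px: int, bus_bits: int) -> int:
    """Bus words needed to move one binary (1 bit per pixel) image."""
    total_bits = width_px * height_px
    if total_bits % bus_bits:
        raise ValueError("image size is not a multiple of the bus width")
    return total_bits // bus_bits


def max_frame_rate_hz(clock_hz: float, bus_bits: int,
                      width_px: int = 1024, height_px: int = 768) -> float:
    """Upper bound on the frame rate if one bus word moves every clock cycle."""
    return clock_hz / transfers_per_image(width_px, height_px, bus_bits)


if __name__ == "__main__":
    print(f"USB : {max_frame_rate_hz(48e6, 16):.2f} Hz")    # Eq. (3.1)
    print(f"DDR2: {max_frame_rate_hz(150e6, 128):.2f} Hz")  # Eq. (3.2)
```

The gain comes from both terms of the ratio: a faster clock (150 MHz vs 48 MHz) and a wider bus (128 bits vs 16 bits), so one image needs 6144 instead of 49152 transfers.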
The USB_IO module is responsible for receiving USB data; some of its important
I/O ports are listed in the following table.
A Cypress FX2 chip on the controller board handles the PC-FPGA communication
[9]. The FX2 controller simplifies the design on the FPGA side, since the USB
interface only needs to read data from the FIFO. In GUI mode, the GUI controls
everything: it sends not only the image data but also data such as the row
address (rowad) and the block address (blkad). All data shares the same bus,
bidir, and is distinguished by the control signals. The demultiplexing is as
follows:
The USB data and control signals are clocked by the 48 MHz USB clock ifclk,
whereas the system clock runs at 200 MHz, which leads to a slow-to-fast clock
domain crossing problem. A simple solution is the Xilinx FIFO IP [10], whose
write and read operations can be clocked independently by the USB clock and the
memory clock. Another advantage of the FIFO is that it can also convert the data
width: the memory data bus is 128 bits wide, whereas the USB data bus is 16 bits
wide, so the FIFO concatenates eight 16-bit USB words into one 128-bit memory
word.
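The width conversion can be illustrated with a small software model. This is a sketch, not the FIFO IP itself. The word order inside the 128-bit output is an assumption: here the first-written word goes to the most significant bits. Check this against the FIFO Generator documentation for the configured core. `pack_usb_words` is a hypothetical helper name.

```python
USB_BITS = 16
MEM_BITS = 128
RATIO = MEM_BITS // USB_BITS  # 8 USB words per memory word


def pack_usb_words(words16: list[int], first_in_msb: bool = True) -> list[int]:
    """Concatenate groups of eight 16-bit USB words into 128-bit memory words.

    first_in_msb=True places the first-received word in bits 127..112
    (an assumption about the FIFO's asymmetric-width word ordering).
    """
    if len(words16) % RATIO:
        raise ValueError("number of USB words must be a multiple of 8")
    packed = []
    for i in range(0, len(words16), RATIO):
        group = words16[i:i + RATIO]
        if not first_in_msb:
            group = group[::-1]
        value = 0
        for w in group:
            if not 0 <= w < (1 << USB_BITS):
                raise ValueError(f"{w:#x} does not fit in 16 bits")
            value = (value << USB_BITS) | w  # earlier words end up higher
        packed.append(value)
    return packed


if __name__ == "__main__":
    out = pack_usb_words(list(range(1, 9)))
    print(f"{out[0]:032x}")  # 00010002000300040005000600070008
    # One XGA image: 1024*768/16 USB words -> 6144 memory words.
    print(len(pack_usb_words([0] * (1024 * 768 // USB_BITS))))  # 6144
```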
Reading data from the memory works similarly: one address corresponds to four
consecutive data words (two 128-bit user data words). This is shown in Figures
3.4 and 3.5.
The memory address (app_af_addr, 31 bits) contains the bank address, the
row address, and the column address. The number of addresses needed for one
XGA image is

\[ N_{\mathrm{addr}} = \frac{1024 \times 768\,\mathrm{bits}}{2 \times 128\,\mathrm{bits}} = 3072_{10} = 110000000000_{2}\ (12\ \mathrm{bits}) \tag{3.3} \]
\[ \texttt{app\_af\_addr} = \underbrace{\mathrm{XXXXXXXXXXXXXXX}}_{\text{image selection, 15 bits}}\;\underbrace{\mathrm{XXXXXXXXXXXXXX}}_{\text{data selection, 14 bits}}\;\underbrace{\mathrm{XX}}_{\text{burst, 2 bits}} \tag{3.4} \]
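Reading the fields of (3.4) from the most significant bit down, an address can be composed with shifts and ORs. The sketch below makes the following assumptions:
- image selection occupies bits 30..16;
- data selection occupies bits 15..2;
- the burst field occupies bits 1..0 and stays zero, because one address covers a whole burst.

`make_addr` and `split_addr` are hypothetical helpers. Under these assumptions, the last address of image 0 (index 3071) is 12284 and the next one is 12288, which agrees with the addresses quoted in Figure 3.11.

```python
IMAGE_BITS, DATA_BITS, BURST_BITS = 15, 14, 2
ADDR_BITS = IMAGE_BITS + DATA_BITS + BURST_BITS  # 31 bits of app_af_addr
ADDRS_PER_IMAGE = (1024 * 768) // (2 * 128)      # Eq. (3.3): 3072


def make_addr(image: int, data_index: int, burst: int = 0) -> int:
    """Compose a 31-bit address as image | data selection | burst (MSB first)."""
    if not 0 <= image < (1 << IMAGE_BITS):
        raise ValueError("image index out of range")
    if not 0 <= data_index < (1 << DATA_BITS):
        raise ValueError("data index out of range")
    if not 0 <= burst < (1 << BURST_BITS):
        raise ValueError("burst field out of range")
    return (image << (DATA_BITS + BURST_BITS)) | (data_index << BURST_BITS) | burst


def split_addr(addr: int) -> tuple[int, int, int]:
    """Inverse of make_addr: return (image, data_index, burst)."""
    burst = addr & ((1 << BURST_BITS) - 1)
    data_index = (addr >> BURST_BITS) & ((1 << DATA_BITS) - 1)
    image = addr >> (DATA_BITS + BURST_BITS)
    return image, data_index, burst


if __name__ == "__main__":
    print(ADDRS_PER_IMAGE)                    # 3072 = 0b110000000000
    print(make_addr(0, ADDRS_PER_IMAGE - 1))  # 12284: last address of image 0
    print(make_addr(0, ADDRS_PER_IMAGE))      # 12288: first address after it
    print(split_addr(make_addr(5, 100)))      # (5, 100, 0)
```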
The memory interface is controlled by a finite state machine (FSM, Figure 3.6)
with three states: S0 (IDLE, default), S1 (WRITE), and S2 (READ).
When starting the system, the user must provide the parameter num_patterns.
The internal signal write_pattern_count indicates how many images have been
written to the memory. If write_pattern_count = num_patterns, all images have
been written and mem_preload_done goes high. Hence, if the memory
initialization is done but the memory preload process is not, the FSM moves to
S1. In S1, the controller monitors the write valid signal (wr_valid) and the
write ready signal (wr_ready). The write valid signal is high when the memory
write FIFO in the USB interface is not empty, and the write ready signal is high
when neither the address FIFO nor the data FIFO of the memory interface is
full. Once both signals are high, the controller starts reading data from the
memory write FIFO by setting mem_get_data high. The data comes with a data
valid signal (wr_data_valid), and the controller writes data to the memory
whenever this signal is high. The FSM moves to S2 if the memory preload process
is done and it receives a read request pulse (rd_en) issued by the DMD Trigger
Control Module. The memory output data is stored in the memory read FIFO
for clock domain crossing.
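The transition rules above can be summarized in a small behavioural model. This is a Python sketch, not the VHDL of the design. The report does not state when S1 and S2 return to S0, so the return conditions below are assumptions: S1 returns once preload completes, and S2 returns after one image has been read, signalled by a hypothetical `read_done` flag.

```python
from enum import Enum


class MemState(Enum):
    IDLE = "S0"
    WRITE = "S1"
    READ = "S2"


def mem_get_data(wr_valid: bool, wr_ready: bool) -> bool:
    """Pop the memory write FIFO only if it has data and the memory-interface FIFOs have room."""
    return wr_valid and wr_ready


def next_state(state: MemState, *, init_done: bool, preload_done: bool,
               rd_en: bool, read_done: bool = False) -> MemState:
    """One transition step of the memory-controller FSM."""
    if state is MemState.IDLE:
        if init_done and not preload_done:
            return MemState.WRITE          # still preloading images
        if preload_done and rd_en:
            return MemState.READ           # trigger module requested an image
        return MemState.IDLE
    if state is MemState.WRITE:
        # Assumption: go back to IDLE once mem_preload_done is high.
        return MemState.IDLE if preload_done else MemState.WRITE
    # READ. Assumption: return to IDLE after one image has been read out.
    return MemState.IDLE if read_done else MemState.READ
```

Writing the IDLE conditions as mutually exclusive (preload not done vs done) mirrors the text: the FSM can only preload before any read request is served.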
Figure 3.6: Finite state machine of the memory controller with three states and
transition conditions
In the quantum experiment, the DMD must LOAD and RESET the next image stored
in the memory whenever it receives an external trigger pulse. This is
implemented by the DMD Trigger Control Module (DMD_trigger_control.vhd).
This module also contains an FSM with two states: S0 (IDLE, default) and
S1 (OUTPUT DATA). The FSM changes to S1 only when the memory preload process
is done, the DMD initialization process is done, and a trigger pulse is
detected. Since the trigger signal is external, it is first registered to avoid
timing violations. The registered signal is then delayed by one more cycle, and
comparing the two yields a one-cycle pulse at each rising edge of the trigger.
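The following is a cycle-by-cycle sketch of this register-and-delay edge detector. It assumes the pulse is formed as "registered AND NOT delayed", the usual rising-edge detector; the report does not spell out the gate. In hardware, both registers would be clocked by the system clock. Many designs use two synchronizer stages for an asynchronous input, which this sketch does not model.

```python
def trigger_pulses(trigger: list[int]) -> list[int]:
    """Return a one-cycle pulse for each rising edge of a sampled trigger.

    Each list element is the trigger level seen at one system-clock edge.
    """
    reg = 0      # trigger after the input register
    reg_d = 0    # the registered trigger delayed by one more cycle
    pulses = []
    for level in trigger:
        reg, reg_d = level, reg          # both flip-flops update on the same edge
        pulses.append(int(reg == 1 and reg_d == 0))
    return pulses


if __name__ == "__main__":
    # A trigger held high for three cycles produces a single pulse.
    print(trigger_pulses([0, 0, 1, 1, 1, 0, 1]))  # [0, 0, 1, 0, 0, 0, 1]
```

Converting the level into a single pulse ensures that a long trigger starts only one LOAD/RESET sequence.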
Figure 3.7: Finite state machine of the DMD Trigger Controller with two states
and transition conditions
Figure 3.8: Simulation of row data loading. Data 1-8 are placed in the first
row with row mode "11". For the remaining rows, row mode "01" places the data
in the next row.
Since the DMD rewrites all pixels after every trigger pulse, only the global
RESET mode is needed. The block operation is therefore No-Op during data
loading and in the IDLE state. After loading is finished, the global reset
request is sent to the DMD and held for one row cycle. For the relationship
between block operation type and block mode, refer to Table 3.2.
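Taken together, Figure 3.8 and the paragraph above describe how one image is loaded:
- 768 rows, each consisting of eight 128-bit words (1024 bits per row);
- row mode "11" for the first row and "01" for every following row;
- a single global reset at the end.

The sketch below models that control sequence. It uses string labels instead of the actual block-mode encodings of Table 3.2, which are not reproduced here. The generator name is hypothetical.

```python
ROWS = 768
WORDS_PER_ROW = 1024 // 128  # eight 128-bit words per 1024-pixel row


def image_load_sequence():
    """Yield (row, row_mode, block_op) for each row, then the final reset step."""
    for row in range(ROWS):
        row_mode = "11" if row == 0 else "01"  # "11": first row, "01": next row
        yield row, row_mode, "No-Op"           # no block operation while loading
    # After the last row: request a global reset for one row cycle.
    yield None, None, "Global Reset"


if __name__ == "__main__":
    steps = list(image_load_sequence())
    print(steps[0], steps[1], steps[-1])
    print(ROWS * WORDS_PER_ROW)  # 6144 data words per image
```

The 6144 data words per image are the same count that reappears in the latency analysis as about 6144 memory clock cycles for data loading.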
The sections above describe how the image data is loaded into the memory
and how the controller reads the image data and sends it out. Several further
modules are needed to make the system work. The following modules originally
come from Texas Instruments, with some adjustments.
Figure 3.9: Waveform of the LVDS SerDes. During one clock cycle of clk1x, clk2x
has four edges. The parallel signals d1-d4, present at the positive edge of
clk1x, appear one by one at successive edges of clk2x.
3.6 Simulation
The USB data bus (bidir) is a bidirectional port of type inout, so it must
first be buffered with the Xilinx primitive IOBUF. During simulation, we found
that the IOBUF did not work properly. After discussing this with my supervisor,
Dr. Kadir Akin, we concluded that this is probably a problem with the simulator
(ISim) itself. To continue the simulation, the IOBUF was bypassed and the bidir
bus was changed to a normal input. This bypass was made for simulation only;
the IOBUF is still used for synthesis.
The simulations for most modules are similar, but the memory simulation
requires extra steps. In addition to creating the testbench and the required
signals, the designer must instantiate the DDR2 models that act as the physical
memory. The number of DDR2 models is determined by the memory parameters,
and the model can be found in the MIG IP folder. During the memory simulation,
we encountered a peculiar problem: when the project language of ISE is set to
VHDL, ISim returns many warnings and the memory initialization does not
complete. Our solution was to simulate the memory in a separate ISE project with
the language set to Verilog. Because the memory initialization takes a long time
in simulation, the memory was first replaced with a large FIFO for efficiency.
After confirming that the design was correct with the FIFO, it was simulated
again with the memory model.
3.7 Debugging
Good simulation results do not guarantee correct behaviour on the FPGA. Many
factors affect the actual results, and timing is particularly important among
them. We use the ChipScope ILA (Integrated Logic Analyzer) to monitor internal
FPGA signals. This tool allows us to investigate problems and observe the actual
behaviour of the firmware in real time after programming the FPGA.
In a design with multiple clock domains, clock domain crossing (CDC) signals
must be synchronized to meet the timing constraints. In the initial design,
only the data buses were synchronized, not the one-bit signals, and during
debugging these unsynchronized CDC signals led to failures. One-bit CDC signals
can be synchronized with a flip-flop synchronizer. Consider mem_read_enable as
an example. It is generated by combinational logic in the DMD Trigger Control
module, in the system clock domain. It is first registered once by the system
clock and then registered twice by the memory clock.
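The register chain for mem_read_enable can be illustrated with a small event-driven model. This is a behavioural sketch only. It cannot reproduce metastability, which is the actual reason for the two memory-clock stages. The clock periods (5 ns for 200 MHz, about 6.667 ns for 150 MHz) and the memory-clock phase offset are illustrative choices. `simulate_sync` is a hypothetical name.

```python
def simulate_sync(comb_sys, t_end_ps=60_000, sys_period_ps=5_000,
                  mem_period_ps=6_667, mem_phase_ps=1_234):
    """Model: one system-clock register followed by two memory-clock flip-flops.

    comb_sys[i] is the combinational signal captured at the i-th system-clock
    edge (the last value is held afterwards). Returns (time_ps, output) pairs,
    one per memory-clock edge, where output is the second memory-domain flop.
    """
    edges = [(t, "sys") for t in range(sys_period_ps, t_end_ps, sys_period_ps)]
    edges += [(t, "mem") for t in range(mem_phase_ps, t_end_ps, mem_period_ps)]
    edges.sort()  # process clock edges in time order

    src = 0            # register in the system clock domain
    ff1 = ff2 = 0      # two-stage synchronizer in the memory clock domain
    n_sys = 0
    trace = []
    for t, clk in edges:
        if clk == "sys":
            src = comb_sys[min(n_sys, len(comb_sys) - 1)]
            n_sys += 1
        else:
            ff2, ff1 = ff1, src  # both memory-domain flops update together
            trace.append((t, ff2))
    return trace


if __name__ == "__main__":
    trace = simulate_sync([0, 0, 1])
    first_high = next(t for t, v in trace if v)
    print(first_high)  # 27902 ps: two memory-clock edges after src goes high
```

The model shows the cost of the synchronizer: the signal reaches the memory domain two memory-clock edges after the system-domain register changes.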
While sending data from the PC to the USB interface, we found that redundant
data was received. This is not a problem with the GUI, because the GUI sends a
FIFO reset signal to clear the redundant data. However, since the GUI FIFO and
the memory FIFO are different, this FIFO reset signal cannot be used directly.
We therefore used the ILA core to locate the redundant data. Fortunately, the
redundant data only appears after row 0 and row 47, so we use a row counter to
filter it out. This problem could be eliminated later by using our own GUI
instead of the GUI provided by TI. Since we cannot modify TI's GUI, the problem
is solved by filtering out the redundant data in the FPGA.
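The row-counter filter can be sketched in software. The report does not say how many redundant words follow rows 0 and 47, so `extra_words` below is a hypothetical parameter, as is the function name. The sketch makes two further assumptions. First, each 1024-pixel row arrives as 64 16-bit USB words. Second, the redundant words arrive immediately after the last word of the affected row.

```python
ROWS = 768
WORDS_PER_ROW = 1024 // 16  # 64 16-bit USB words per 1024-pixel row


def filter_redundant(stream, extra_words=1, rows_with_extra=(0, 47)):
    """Drop `extra_words` words that follow each row listed in rows_with_extra.

    A word counter tracks the position within the current row and a row
    counter tracks which row is being received.
    """
    row = 0       # row counter
    word = 0      # word counter within the current row
    skip = 0      # redundant words still to discard
    kept = []
    for w in stream:
        if skip:
            skip -= 1
            continue              # redundant word: do not forward it
        kept.append(w)
        word += 1
        if word == WORDS_PER_ROW:
            if row in rows_with_extra:
                skip = extra_words
            word = 0
            row = (row + 1) % ROWS  # wrap at the end of each image
    return kept
```

The counter only needs to know the row length and which rows are affected. This is why the problem can be handled in the FPGA without changes to TI's GUI.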
In our initial tests with the DDR2 memory, the data read from the memory was
incomplete: there was a redundant zero at the beginning (shown in Figure 3.10),
and since the controller reads a fixed number of data words, the last data word
was lost. Investigating the cause of this issue was difficult because the
simulation indicated that the design was correct. Nevertheless, the problem was
worked around as follows. For every image write operation, one extra data word is
appended at the end to make sure that the last real data word can be read out.
A counter then discards the first data word and the appended one. Figure 3.11
shows this workaround.
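The workaround can be sketched as follows. The fault is modelled, hypothetically, as a fixed-length read in which the first word is repeated so that the last requested word is lost. This is one reading of the caption of Figure 3.10. All function names are illustrative. The counter keeps words 1..n of each read: it skips the first word and ignores anything beyond n words, such as the appended word.

```python
def write_with_pad(image: list[int], pad: int = 0) -> list[int]:
    """Words written to memory for one image: the image plus one extra word."""
    return image + [pad]


def faulty_read(stored: list[int], count: int) -> list[int]:
    """Hypothetical fault model: a read of `count` words returns the first
    word twice, so the last requested word is lost."""
    if count < 1:
        return []
    return [stored[0]] + stored[:count - 1]


def trim(raw: list[int], n: int) -> list[int]:
    """Counter-based fix: skip the first received word and keep the next n,
    discarding anything after them (e.g. the appended word)."""
    return raw[1:1 + n]


if __name__ == "__main__":
    image = [0x0000, 0x1111, 0x2222, 0x3333]
    n = len(image)
    # Without padding, a read of n words loses the last image word.
    print(trim(faulty_read(image, n), n))                               # [0, 4369, 8738]
    # With one appended word, reading n + 1 words recovers the full image.
    print(trim(faulty_read(write_with_pad(image), n + 1), n) == image)  # True
```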
Figure 3.10: ILA result of memory read data. The first line is the valid signal.
It becomes high at time 0 with the first data "0000" (marked by the red vertical
line). However, "0000" should appear for only one cycle while the valid signal
is high, not for two consecutive cycles.
Figure 3.11: Method to solve the memory read data issue. The last pattern data
is written to the address 12284, and an extra zero data is written to the address
12288.
To reproduce the design, use the project settings shown in Figure 3.12. When
importing the source files, two new libraries must be assigned to two files:
ddc4100 for appsfpga_dmd_types_pkg.vhd and ddr2 for DDR2_2GB_150MHZ_pkg.vhd
(see Figure 3.13).
Figure 4.1 shows the FPGA hardware resource utilization from the synthesis
report. The BRAM usage is very high (75%), and this is already after disabling
some unused FIFOs; otherwise, the BRAM usage would be 100%. The disabled FIFOs
are those of channels C and D in the USB IO module, which are unused at XGA
resolution. The ILA core also consumes BRAM. If no more BRAM resources are
available, the FIFOs of channels C and D in the memory IO module can be
disabled as well.
Chapter 4
Results
4.1 Testing
I tested the FPGA implementation with the six test images shown in Figure 4.2.
The trigger is generated inside the FPGA at a set frequency, which is also the
frame rate. The mem_en and num_patterns parameters are entered through the
ChipScope VIO (virtual input/output). The num_patterns parameter (VIO1) starts
from zero, i.e., for n images the parameter is n-1.
Figure 4.1: ChipScope VIO. The upper part shows the inputs and outputs in the
system clock domain (SyncIn: trigger_miss, SyncOut: mem_en). The lower part
shows the number-of-patterns parameter (num_patterns) entered by the user.
Figure 4.3 shows the results for a 1/3 Hz trigger. Driven by this trigger, the
DMD displays the images one by one in a loop, each image remaining on the DMD
for one trigger period. Figure 4.4(a) shows the result at a frame rate of
22 kHz. Because the human eye can resolve at most about 24 frames per second,
all images appear to overlap. A trigger miss signal (connected to VIO
SYNC_IN(0)) was created to check whether the frame rate really is 22 kHz. The
trigger miss signal goes high when the FPGA detects a trigger pulse but has to
discard it, i.e., when the read request is rejected. At 22 kHz, the trigger miss
signal always stays deasserted, whereas it is asserted at frame rates above
24 kHz. Hence a frame rate of 22 kHz can be guaranteed, but frame rates above
24 kHz cannot. The two complementary checkerboard images (checker1 and checker2)
are used to check that the display is complete: if the frame rate exceeds what
the eye can resolve, all pixels should appear grey, as shown in Figure 4.4(b).
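The idea behind the trigger-miss check can be sketched with a simple timing model. A trigger is accepted only if the previous LOAD/RESET operation has finished; otherwise it is counted as a miss. The busy time per image is a free parameter here, an assumption; the latency analysis below estimates it. `count_trigger_misses` is a hypothetical name.

```python
def count_trigger_misses(trigger_period: int, busy_cycles: int, n_triggers: int) -> int:
    """Count triggers that arrive while the previous image is still being output.

    Times are in clock cycles; trigger k arrives at k * trigger_period.
    """
    busy_until = 0   # first cycle at which the controller is idle again
    misses = 0
    for k in range(n_triggers):
        t = k * trigger_period
        if t < busy_until:
            misses += 1                    # trigger_miss would be asserted here
        else:
            busy_until = t + busy_cycles   # accept: start LOAD and RESET
    return misses
```

At a 200 MHz system clock, a 22 kHz trigger corresponds to a period of about 9091 system clock cycles. In this model, no triggers are missed as long as one operation finishes within that time.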
Figure 4.3: Result at 1/3Hz frame rate. Each image can be displayed clearly.
Figure 4.4: Result at 22 kHz frame rate. Because the frame rate exceeds what the
human eye can resolve, the image appears static.
4.2 Latency Analysis
The overall system latency can be roughly estimated by counting the clock cycles
consumed by one operation. The latency mainly comes from four
processes:
2. Data loading: Loading 768 rows of data takes about 6144 memory clock
cycles (8192 system clock cycles).
A total of 9150 system clock cycles is required for one operation, which gives a
maximum frame rate of 21.857 kHz. However, the reset delay is usually less than
4.5 µs, so the frame rate can be increased slightly to 22 kHz without problems.
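The arithmetic behind these numbers can be checked with a few lines. The sketch only reproduces the conversions stated above: eight 128-bit words per row, the 150 MHz memory clock, the 200 MHz system clock, and the 9150-cycle total. It does not model the individual latency contributions, which are not all listed here.

```python
from fractions import Fraction

SYS_CLK_HZ = 200_000_000     # system clock
MEM_CLK_HZ = 150_000_000     # DDR2 memory clock
ROWS = 768
WORDS_PER_ROW = 1024 // 128  # 128-bit memory words per row


def mem_to_sys_cycles(mem_cycles: int) -> Fraction:
    """Express a duration given in memory clock cycles in system clock cycles."""
    return Fraction(mem_cycles * SYS_CLK_HZ, MEM_CLK_HZ)


def frame_rate_hz(cycles_per_frame: int, clk_hz: int = SYS_CLK_HZ) -> float:
    """Maximum frame rate if one frame needs `cycles_per_frame` clock cycles."""
    return clk_hz / cycles_per_frame


if __name__ == "__main__":
    load_mem = ROWS * WORDS_PER_ROW          # one word per memory clock cycle
    print(load_mem)                          # 6144 memory clock cycles
    print(mem_to_sys_cycles(load_mem))       # 8192 system clock cycles
    print(f"{frame_rate_hz(9150):.1f}")      # 21857.9 Hz
```

`Fraction` keeps the clock-domain conversion exact, so a non-integer result would show up immediately for clock ratios other than 4:3.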
Chapter 5
In this project, I have implemented an FPGA-based solution for high-frame-rate,
low-latency control of a DMD. The proposed approach preloads the images into a
DDR2 memory and uses a trigger signal to read the data from the memory. A
maximum frame rate of 22 kHz was demonstrated with an internally generated
trigger signal.
In the future, the trigger should be an external signal provided as a TTL
(transistor-transistor logic) input. The TTL input must be configured to be
compatible with the Virtex-5 FPGA in terms of threshold voltage and drive
level. Although the memory read data issue has been worked around, its cause is
still worth investigating. I would suggest creating a separate memory test
project and testing other types of DDR2 memory. Besides the Discovery D4100 GUI,
there are two possible methods to transfer image data to the FPGA. One is to use
the ChipScope Engine Tcl (CSE/Tcl) Scripting Interface of the VIO core by
writing a Tcl script. The other is to fully customize the Cypress FX2
interface. It is also possible to increase the frame rate further. The 150 MHz
memory clock is slow compared with the 200 MHz system clock. With a DDR2 memory
that supports a 200 MHz interface clock, the data loading latency could be
shortened from 8192 to 6144 clock cycles, which would lead to a frame rate of
28.36 kHz.
Bibliography