Lecture 6 - Digital Camera Example
Chapter 7 Digital Camera Example
Courtesy: Prof. Vahid - Embedded Systems Design: A Unified Hardware/Software Introduction

Outline
• Introduction to a simple digital camera
• Designer’s perspective
• Requirements specification
• Design
– Four implementations
Charge-coupled device (CCD)
• Special sensor that captures an image
• Light-sensitive silicon solid-state device composed of many cells
• When exposed to light, each cell becomes electrically charged. This charge can then be converted to an 8-bit value where 0 represents no exposure while 255 represents very intense exposure.
• The electromechanical shutter is activated to expose the cells to light for a brief moment.
• The electronic circuitry, when ...
[Figure: CCD. Recoverable labels: lens area, covered columns, electromechanical shutter, pixel rows]

Zero-bias error
• Manufacturing errors cause cells to measure slightly above or below actual light intensity
• Error typically same across columns, but different across rows
• Some of the leftmost columns are blocked by black paint to detect zero-bias error
– Reading of other than 0 in blocked cells is zero-bias error
– Each row is corrected by subtracting the average error found in blocked cells for that row
[Figure: zero-bias adjustment. Recoverable labels: covered cells, zero-bias]
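The row-by-row zero-bias correction described above can be sketched in C. This is a minimal illustration, not the book's CCDPP code: the function name `zero_bias_correct`, the flat row-major buffer layout, and the `covered` parameter (number of blocked leftmost columns) are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: img is a row-major buffer of rows x cols readings,
   and the leftmost `covered` columns of every row are the blocked cells. */
void zero_bias_correct(int16_t *img, int rows, int cols, int covered) {
    assert(img != 0 && covered > 0 && covered < cols);
    for (int r = 0; r < rows; r++) {
        int16_t *row = img + r * cols;
        int sum = 0;
        /* Blocked cells should read 0, so whatever they read is the error. */
        for (int c = 0; c < covered; c++) {
            sum += row[c];
        }
        int bias = sum / covered;  /* integer average, truncates toward zero */
        /* Subtract this row's average error from its exposed cells only. */
        for (int c = covered; c < cols; c++) {
            row[c] = (int16_t)(row[c] - bias);
        }
    }
}
```

Signed 16-bit cells are used here so that a negative bias (cells reading below the actual intensity) can be represented; the real sensor data width and clamping policy are not stated in the slides.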
Compression DCT step
Huffman encoding example
• Pixel frequencies on left
– Pixel value –1 occurs 15 times
– Pixel value 14 occurs 1 time
• Build Huffman tree from bottom up
– Create one leaf node for each pixel value and assign frequency as node’s value
– Repeat until complete binary tree
• Traverse tree from root to leaf to ...
[Figure: pixel frequencies, Huffman tree, Huffman codes. Recoverable entries:]
Pixel   Frequency   Huffman code
-1      15x         00
0       8x          100
-3      4x          11110
-5      3x          10110
-10     2x          01110

Requirements Specification
• System’s requirements – what system should do
– Nonfunctional requirements
• Constraints on design metrics (e.g., “should use 0.001 watt or less”)
– Functional requirements
• captures and stores at least 50 low-res images and uploads to PC, ...
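The bottom-up tree construction from the Huffman encoding example can be sketched in C as follows. This is an illustrative sketch under assumptions: the names `HuffNode`, `pick_min`, and `huffman_code_lengths` are hypothetical, the merge rule used (repeatedly join the two lowest-frequency parentless nodes) is the standard Huffman step, and the routine reports only code lengths rather than the bit patterns themselves.

```c
#include <assert.h>

#define MAX_SYMBOLS 64

typedef struct {
    int freq;    /* occurrence count (leaf) or sum of the two children */
    int parent;  /* index of the parent node, -1 while still a root */
} HuffNode;

/* Index of the lowest-frequency parentless node, ignoring index `skip`. */
static int pick_min(const HuffNode *nodes, int count, int skip) {
    int best = -1;
    for (int i = 0; i < count; i++) {
        if (i == skip || nodes[i].parent != -1) continue;
        if (best < 0 || nodes[i].freq < nodes[best].freq) best = i;
    }
    return best;
}

/* Fills code_len[i] with the code length of symbol i and returns the
   total number of bits needed to encode all occurrences. */
int huffman_code_lengths(const int *freq, int n, int *code_len) {
    HuffNode nodes[2 * MAX_SYMBOLS];
    int count = n;
    assert(n >= 1 && n <= MAX_SYMBOLS);

    /* One leaf per pixel value, frequency as the node's value. */
    for (int i = 0; i < n; i++) {
        nodes[i].freq = freq[i];
        nodes[i].parent = -1;
    }
    /* Join the two smallest roots until a single tree remains. */
    for (int merged = 0; merged < n - 1; merged++) {
        int a = pick_min(nodes, count, -1);
        int b = pick_min(nodes, count, a);
        nodes[count].freq = nodes[a].freq + nodes[b].freq;
        nodes[count].parent = -1;
        nodes[a].parent = count;
        nodes[b].parent = count;
        count++;
    }
    /* A leaf's depth (number of parent links to the root) is its code length. */
    int total_bits = 0;
    for (int i = 0; i < n; i++) {
        int depth = 0;
        for (int p = nodes[i].parent; p != -1; p = nodes[p].parent) depth++;
        code_len[i] = depth;
        total_bits += depth * freq[i];
    }
    return total_bits;
}
```

Calling it with only the frequencies still legible on the slide (15, 8, 4, 3, 2, 1) will not reproduce the slide's codes, since the slide's tree is built from the full set of pixel values; frequent values still receive the shortest codes.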
• Power
– Must operate below certain temperature (cooling fan not possible)
– Therefore, constrained metric
[Figure: flowchart of the functional specification. Recoverable labels: 64 x 64, “More 8×8 blocks?” (yes/no), “Transmit serially”, “serial output e.g., 011010...”]
Executable model of digital camera
• Main initializes all modules, then uses CNTRL module to capture, compress, and transmit one image
• This system-level model can be used for extensive experimentation
– Bugs much easier to correct here rather than in later models
[Figure: executable model. Recoverable labels: CCD.C, CCDPP.C, CODEC.C, bit stream 101011010110101010010101101...]
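The control flow of the executable model's main routine (initialize every module, then capture, compress, and transmit one image through CNTRL) might look like the sketch below. Apart from `CodecInitialize`, which appears in the slides, every function name here is a hypothetical stand-in, and each stub only records that it was called so the call order can be inspected.

```c
#include <assert.h>

/* Call trace: one character per module call, in the order the calls happen. */
static char trace[16];
static int  trace_len = 0;

static void record(char tag) {
    assert(trace_len < (int)sizeof(trace) - 1);
    trace[trace_len++] = tag;
    trace[trace_len] = '\0';
}

/* Hypothetical module entry points; the real modules would do the work. */
void CcdInitialize(void)      { record('c'); }
void CcdppInitialize(void)    { record('p'); }
void CodecInitialize(void)    { record('d'); }
void UartInitialize(void)     { record('u'); }
void CntrlInitialize(void)    { record('n'); }
void CntrlCaptureImage(void)  { record('C'); }
void CntrlCompressImage(void) { record('Z'); }
void CntrlSendImage(void)     { record('S'); }

/* Stands in for the model's main(): set up all modules, then let the
   controller capture, compress, and transmit a single image. */
const char *run_camera_model(void) {
    trace_len = 0;
    trace[0] = '\0';

    CcdInitialize();
    CcdppInitialize();
    CodecInitialize();
    UartInitialize();
    CntrlInitialize();

    CntrlCaptureImage();
    CntrlCompressImage();
    CntrlSendImage();
    return trace;
}
```

Keeping the top level this thin is what makes the system-level model convenient for experimentation: a module can be swapped or instrumented without touching the control sequence.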
Design
• Determine system’s architecture
– Processors
• Any combination of single-purpose (custom or standard) or general-purpose processors
– Memories, buses
• Map functionality to that architecture
– Multiple functions on one processor
– One function on one or more processors
• Implementation
– A particular architecture and mapping
– Solution space is set of all implementations
• Starting point
– Low-end general-purpose processor connected to flash memory
• All functionality mapped to software running on processor
• Usually satisfies power, size, and time-to-market constraints
• If timing constraint not satisfied then later implementations could:
– use single-purpose processors for time-critical functions
– rewrite functional specification

Implementation 1: Microcontroller alone
• Low-end processor could be Intel 8051 microcontroller
• Total IC cost including NRE about $5
• Well below 200 mW power
• Time-to-market about 3 months
• However, one image per second not possible
– 12 MHz, 12 cycles per instruction
• Executes one million instructions per second
– CcdppCapture has nested loops resulting in 4096 (64 x 64) iterations
• ~100 assembly instructions each iteration
• 409,600 (4096 x 100) instructions per image
• Nearly half of the one-second budget for reading image alone
– Would be over budget after adding compute-intensive DCT and Huffman encoding
Implementation 2: Microcontroller and CCDPP
[Figure: SOC block diagram. Recoverable labels: EEPROM, 8051, RAM, UART, CCDPP]
– Increases NRE cost and time-to-market
– Easy to implement

Microcontroller
• Synthesizable version of Intel 8051 available
– Written in VHDL
– Captured at register transfer level (RTL)
• Fetches instruction from ROM
• Special data movement instructions used to load and store externally
[Figure: block diagram of Intel 8051 processor core. Recoverable label: To External Memory Bus]
DCT floating-point cost Fixed-point arithmetic
• Analysis of implementation 3
– Total execution time for processing one image:
• 1.5 seconds
– Power consumption:
• 0.033 watt (same as 2)
– Energy consumption:
• 0.050 joule (1.5 s x 0.033 watt)
• Battery life 6x longer!!
– Total chip area:
• 90,000 gates
• 8,000 fewer gates (less memory needed for code)

• Performance close but not good enough
• Must resort to implementing CODEC in hardware
– Single-purpose processor to perform DCT on 8 x 8 block
CODEC design
• 4 memory mapped registers
– C_DATAI_REG/C_DATAO_REG used to push/pop 8 x 8 block into and out of CODEC
– C_CMND_REG used to command CODEC
• Writing 1 to this register invokes CODEC
– C_STAT_REG indicates CODEC done and ready for next block
• Polled in software
• Direct translation of C code to VHDL for actual hardware implementation
– Fixed-point version used
• CODEC module in software changed similar to UART/CCDPP in implementation 2

Rewritten CODEC software

    /* Memory-mapped CODEC registers in 8051 external data space (Keil C51 syntax). */
    static unsigned char xdata C_STAT_REG  _at_ 65527;  /* status: polled for completion */
    static unsigned char xdata C_CMND_REG  _at_ 65528;  /* command: writing 1 invokes CODEC */
    static unsigned char xdata C_DATAI_REG _at_ 65529;  /* data read back from CODEC */
    static unsigned char xdata C_DATAO_REG _at_ 65530;  /* data written to CODEC */

    void CodecInitialize(void) {}

    /* Push one pixel of the 8 x 8 block into the CODEC. */
    void CodecPushPixel(short p) { C_DATAO_REG = (char)p; }

    /* Pop one 16-bit result as two sequential byte reads;
       the high byte is assumed to arrive first. */
    short CodecPopPixel(void) {
        short hi = C_DATAI_REG;
        short lo = C_DATAI_REG;
        return (short)((hi << 8) | lo);
    }

    /* Start the forward DCT and poll the status register until it finishes. */
    void CodecDoFdct(void) {
        C_CMND_REG = 1;
        while( C_STAT_REG == 1 ) { /* busy wait */ }
    }

Implementation 4: Microcontroller and CCDPP/DCT
• Analysis of implementation 4
– Total execution time for processing one image:
• 0.099 seconds (well under 1 sec)
– Power consumption:
• 0.040 watt
• Increase over 2 and 3 because SOC has another processor
– Energy consumption:
• 0.0040 joule (0.099 s x 0.040 watt)
• Battery life 12x longer than previous implementation!!
– Total chip area:
• 128,000 gates
• Significant increase over previous implementations
Summary of implementations
                        Implementation 2   Implementation 3   Implementation 4
Performance (second)    9.1                1.5                0.099
Power (watt)            0.033              0.033              0.040
Size (gate)             98,000             90,000             128,000
Energy (joule)          0.30               0.050              0.0040
• Implementation 3
– Close in performance
– Cheaper
– Less time to build
• Implementation 4
– Great performance and energy consumption
– More expensive and may miss time-to-market window
• If DCT designed ourselves then increased NRE cost and time-to-market
• If existing DCT purchased then increased IC cost
• Which is better?

Summary
• Digital camera example
– Specifications in English and executable language
– Design metrics: performance, power and area
• Several implementations
– Microcontroller: too slow
– Microcontroller and coprocessor: better, but still too slow
– Fixed-point arithmetic: almost fast enough
– Additional coprocessor for compression: fast enough, but expensive and hard to design
• Tradeoffs between hw/sw – the main lesson of this book!
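The energy row of the comparison table follows from the performance and power rows (energy = execution time x power), and the battery-life claims are ratios of those energies. A short C sketch, with the hypothetical helper names `energy_joules` and `battery_life_gain`, lets the figures be rechecked:

```c
#include <assert.h>

/* Energy to process one image, in joules: execution time (s) times power (W). */
double energy_joules(double seconds, double watts) {
    assert(seconds >= 0.0 && watts >= 0.0);
    return seconds * watts;
}

/* How many times more images a given battery can process when the energy
   per image drops from old_joules to new_joules. */
double battery_life_gain(double old_joules, double new_joules) {
    assert(new_joules > 0.0);
    return old_joules / new_joules;
}
```

With the table's values, `energy_joules(9.1, 0.033)` is about 0.30 J, `energy_joules(1.5, 0.033)` about 0.050 J, and `energy_joules(0.099, 0.040)` about 0.0040 J, giving gains of roughly 6x from Implementation 2 to 3 and roughly 12x from Implementation 3 to 4.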