Lecture 6 - Digital Camera Example

The document summarizes the key components and functionality of a simple digital camera: 1) It captures images using a charge-coupled device sensor that converts light exposure to digital values and stores images in internal memory. 2) When the shutter is pressed, the image is captured by the CCD, converted to digital form, compressed, and stored in memory. 3) Images can then be uploaded and transmitted serially to a PC using special software commands to the camera.

Uploaded by

teddy tigabu
Copyright
© All Rights Reserved

Embedded Systems Design: A Unified Hardware/Software Introduction
Chapter 7 Digital Camera Example

Courtesy: Prof. Vahid - Embedded Systems Design: A Unified Hardware/Software Introduction

Outline
• Introduction to a simple digital camera
• Designer’s perspective
• Requirements specification
• Design
  – Four implementations


Introduction to a simple digital camera
• Captures images
• Stores images in digital format
  – No film
  – Multiple images stored in camera
    • Number depends on amount of memory and bits used per image
• Downloads images to PC
• Only recently possible
  – Systems-on-a-chip
    • Multiple processors and memories on one IC
  – High-capacity flash memory
• Very simple description used for example
  – Many more features with real digital camera
    • Variable size images, image deletion, digital stretching, zooming in and out, etc.

Designer’s perspective
• Two key tasks
  – Processing images and storing in memory
    • When shutter pressed:
      – Image captured
      – Converted to digital form by charge-coupled device (CCD)
      – Compressed and archived in internal memory
  – Uploading images to PC
    • Digital camera attached to PC
    • Special software commands camera to transmit archived images serially


Charge-coupled device (CCD)
• Special sensor that captures an image
• Light-sensitive silicon solid-state device composed of many cells

When exposed to light, each cell becomes electrically charged. This charge can then be converted to an 8-bit value where 0 represents no exposure while 255 represents very intense exposure of that cell to light.

Some of the columns are covered with a black strip of paint. The light intensity of these pixels is used for zero-bias adjustments of all the cells.

The electromechanical shutter is activated to expose the cells to light for a brief moment.

The electronic circuitry, when commanded, discharges the cells, activates the electromechanical shutter, and then reads the 8-bit charge value of each cell. These values can be clocked out of the CCD by external logic through a standard parallel bus interface.

[Figure: CCD showing lens area, pixel rows, pixel columns, covered columns, electromechanical shutter, and electronic circuitry]

Zero-bias error
• Manufacturing errors cause cells to measure slightly above or below actual light intensity
• Error typically same across columns, but different across rows
• Some of the leftmost columns blocked by black paint to detect zero-bias error
  – Reading of other than 0 in blocked cells is zero-bias error
  – Each row is corrected by subtracting the average error found in blocked cells for that row

[Figure: pixel block with covered cells, before zero-bias adjustment]

After zero-bias adjustment (first column is the zero-bias adjustment applied to that row):

Adjustment   Pixel values
   -13       123 157 142 127 131 102  99 235
   -11       134 135 157 112 109 106 108 136
    -9       135 144 159 108 112 118 109 126
     0       176 183 161 111 186 130 132 133
    -7       137 149 154 126 185 146 131 132
    -1       121 130 127 146 205 150 130 126
    -4       117 151 160 181 250 161 134 125
    -5       168 170 171 178 183 179 112 124
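The per-row correction described on the zero-bias slide can be sketched in C as follows. This is a minimal illustration, not the original CCDPP code: the function names and the choice of two covered cells per row are assumptions. In the usage note, the "before" pixel values are reconstructed by adding 13 back to the first adjusted row of the table, and the covered-cell readings (12 and 14) are hypothetical values chosen to average 13.

```c
#include <stdint.h>

#define PIXEL_COLS   8   /* exposed pixels per row in this sketch */
#define COVERED_COLS 2   /* covered (black-painted) cells per row: an assumed count */

/* Zero-bias error of one row: the average reading of its covered cells. */
static int zero_bias_of_row(const uint8_t covered[COVERED_COLS]) {
    int sum = 0;
    for (int i = 0; i < COVERED_COLS; i++) {
        sum += covered[i];
    }
    return sum / COVERED_COLS;
}

/* Subtract the row's zero-bias error from every exposed pixel, clamping at 0. */
static void zero_bias_adjust_row(uint8_t pixels[PIXEL_COLS],
                                 const uint8_t covered[COVERED_COLS]) {
    int bias = zero_bias_of_row(covered);
    for (int i = 0; i < PIXEL_COLS; i++) {
        int adjusted = pixels[i] - bias;
        pixels[i] = (uint8_t)(adjusted < 0 ? 0 : adjusted);
    }
}
```

With covered readings of 12 and 14 the bias is 13, so a row starting 136, 170, ... becomes 123, 157, ..., which agrees with the first row of the adjusted table.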

Compression
• Store more images
• Transmit image to PC in less time
• JPEG (Joint Photographic Experts Group)
  – Popular standard compressed image format
  – Different modes of operation
  – Image data divided into blocks of 8 x 8 pixels
  – 3 steps performed on each block
    • DCT
    • Quantization
    • Huffman encoding

DCT step
• Transforms original 8 x 8 block into a cosine-frequency domain
  – Upper-left corner values represent more of the essence of the image
  – Lower-right corner values represent finer details
    • Can reduce precision of these values and retain reasonable image quality
• FDCT (Forward DCT) formula
  – C(h) = if (h == 0) then 1/sqrt(2) else 1.0
    • Auxiliary function used in main function F(u,v)
  – F(u,v) = ¼ x C(u) x C(v) x Σx=0..7 Σy=0..7 Dxy x cos(π(2x + 1)u/16) x cos(π(2y + 1)v/16)
    • Gives encoded pixel at row u, column v
    • Dxy is original pixel value at row x, column y
• IDCT (Inverse DCT)
  – Reverses process to obtain original block (not needed for this design)
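The FDCT formula above translates almost directly into C. The sketch below is a floating-point illustration only, and the names `cos_pi16` and `fdct_coeff` are made up for it. As an implementation choice of this sketch, the cosines come from a small lookup table of cos(kπ/16) values, so no math library is needed.

```c
#define N 8

/* cos(k*pi/16) for k = 0..8; all other angles follow by symmetry. */
static const double QUARTER_COS[9] = {
    1.0,              0.98078528040323, 0.92387953251129,
    0.83146961230255, 0.70710678118655, 0.55557023301960,
    0.38268343236509, 0.19509032201613, 0.0
};

/* cos(k*pi/16) for any non-negative integer k (the period is 32 steps). */
static double cos_pi16(int k) {
    k %= 32;
    if (k > 16) {
        k = 32 - k;                               /* cos(2*pi - a) = cos(a) */
    }
    return k <= 8 ? QUARTER_COS[k] : -QUARTER_COS[16 - k]; /* cos(pi - a) = -cos(a) */
}

/* Auxiliary function C(h) from the slide: 1/sqrt(2) for h == 0, else 1.0. */
static double C(int h) {
    return h == 0 ? 0.70710678118655 : 1.0;
}

/* F(u,v): encoded value at row u, column v of the 8 x 8 block D. */
static double fdct_coeff(int u, int v, double D[N][N]) {
    double sum = 0.0;
    for (int x = 0; x < N; x++) {
        for (int y = 0; y < N; y++) {
            sum += D[x][y] * cos_pi16((2 * x + 1) * u) * cos_pi16((2 * y + 1) * v);
        }
    }
    return 0.25 * C(u) * C(v) * sum;
}
```

For a block in which every pixel is 100, F(0,0) comes out at about 800 and the other coefficients are essentially zero, which illustrates why the upper-left value carries most of the image's "essence".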


Fourier Transform Analogy with JPEG DCT Process


Quantization step
• Achieve high compression ratio by reducing image quality
  – Reduce bit precision of encoded data
    • Fewer bits needed for encoding
    • One way is to divide all values by a power of 2 (8 in the example below)
      – Simple right shifts can do this
  – Dequantization would reverse process for decompression

After being encoded using DCT:

1150   39  -43  -10   26  -83   11   41
 -81   -3  115  -73   -6   -2   22   -5
  14  -11    1  -42   26   -3   17  -38
   2  -61  -13  -12   36  -23  -18    5
  44   13   37   -4   10  -21    7   -8
  36  -11   -9   -4   20  -28  -21   14
 -19   -7   21   -6    3    3   12  -21
  -5  -13  -11  -17   -4   -1    7   -4

Divide each cell’s value by 8

After quantization:

 144    5   -5   -1    3  -10    1    5
 -10    0   14   -9   -1    0    3   -1
   2   -1    0   -5    3    0    2   -5
   0   -8   -2   -2    5   -3   -2    1
   6    2    5   -1    1   -3    1   -1
   5   -1   -1   -1    3   -4   -3    2
  -2   -1    3   -1    0    0    2   -3
  -1   -2   -1   -2   -1    0    1   -1

Huffman encoding step
• Serialize 8 x 8 block of pixels
  – Values are converted into single list using zigzag pattern
• Perform Huffman encoding
  – More frequently occurring pixels assigned short binary code
  – Longer binary codes left for less frequently occurring pixels
• Each pixel in serial list converted to Huffman encoded values
  – Much shorter list, thus compression
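The "divide each cell's value by 8" quantization step above can be sketched in C. The quantized table appears consistent with rounding to the nearest integer (for example, 1150/8 = 143.75 becomes 144 and -12/8 = -1.5 becomes -2), so this sketch rounds explicitly instead of using a bare right shift, which would truncate. The function names are illustrative and do not come from the original CODEC source.

```c
/* Divide by 8, rounding to the nearest integer (halves round away from zero). */
static int quantize_by_8(int value) {
    return value >= 0 ? (value + 4) / 8 : -((-value + 4) / 8);
}

/* Quantize an 8 x 8 block of DCT-encoded values in place. */
static void quantize_block(short block[8][8]) {
    for (int row = 0; row < 8; row++) {
        for (int col = 0; col < 8; col++) {
            block[row][col] = (short)quantize_by_8(block[row][col]);
        }
    }
}
```

Applied to the first row of the DCT-encoded table (1150, 39, -43, -10, 26, -83, 11, 41), this gives 144, 5, -5, -1, 3, -10, 1, 5, matching the first row of the quantized table.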

Huffman encoding example
• Pixel frequencies on left
  – Pixel value –1 occurs 15 times
  – Pixel value 14 occurs 1 time
• Build Huffman tree from bottom up
  – Create one leaf node for each pixel value and assign frequency as node’s value
  – Create an internal node by joining any two nodes whose sum is a minimal value
    • This sum is the internal node’s value
  – Repeat until complete binary tree
• Traverse tree from root to leaf to obtain binary code for leaf’s pixel value
  – Append 0 for left traversal, 1 for right traversal
• Huffman encoding is reversible
  – No code is a prefix of another code

Pixel   Frequency   Huffman code
  -1      15x         00
   0       8x         100
  -2       6x         110
   1       5x         010
   2       5x         1110
   3       5x         1010
   5       5x         0110
  -3       4x         11110
  -5       3x         10110
 -10       2x         01110
 144       1x         111111
  -9       1x         111110
  -8       1x         101111
  -4       1x         101110
   6       1x         011111
  14       1x         011110

[Figure: Huffman tree built from the pixel frequencies; the root node has value 64]

Requirements Specification
• System’s requirements – what system should do
  – Nonfunctional requirements
    • Constraints on design metrics (e.g., “should use 0.001 watt or less”)
  – Functional requirements
    • System’s behavior (e.g., “output X should be input Y times 2”)
• Initial specification may be very general and come from marketing dept.
  – E.g., short document detailing market need for a low-end digital camera that:
    • captures and stores at least 50 low-res images and uploads to PC,
    • costs around $100 with single medium-size IC costing less than $25,
    • has as long as possible battery life (at least ),
    • has expected sales volume of 200,000 if market entry < 6 months,
    • 100,000 if between 6 and 12 months,
    • insignificant sales beyond 12 months
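For the Huffman encoding example above, the two claims on the slide (the code is reversible because no code is a prefix of another, and the encoded list is much shorter) can be checked with a short C sketch. The frequencies and codes are copied from the example's table. The struct and function names are hypothetical, and the comparison against 8 bits per pixel is an assumption based on the 8-bit cell values described earlier.

```c
#include <string.h>

struct HuffEntry {
    int pixel;          /* pixel value */
    int count;          /* occurrences in the 8 x 8 block */
    const char *code;   /* Huffman code as a string of '0'/'1' */
};

#define NUM_ENTRIES 16

static const struct HuffEntry HUFF_TABLE[NUM_ENTRIES] = {
    {  -1, 15, "00"     }, {   0, 8, "100"    }, {  -2, 6, "110"    },
    {   1,  5, "010"    }, {   2, 5, "1110"   }, {   3, 5, "1010"   },
    {   5,  5, "0110"   }, {  -3, 4, "11110"  }, {  -5, 3, "10110"  },
    { -10,  2, "01110"  }, { 144, 1, "111111" }, {  -9, 1, "111110" },
    {  -8,  1, "101111" }, {  -4, 1, "101110" }, {   6, 1, "011111" },
    {  14,  1, "011110" }
};

/* Number of pixels covered by the table (should be 64 for one block). */
static int total_pixels(void) {
    int total = 0;
    for (int i = 0; i < NUM_ENTRIES; i++) {
        total += HUFF_TABLE[i].count;
    }
    return total;
}

/* Total bits needed to encode the block with the table's codes. */
static int encoded_bits(void) {
    int bits = 0;
    for (int i = 0; i < NUM_ENTRIES; i++) {
        bits += HUFF_TABLE[i].count * (int)strlen(HUFF_TABLE[i].code);
    }
    return bits;
}

/* Returns 1 if no code is a prefix of another code, 0 otherwise. */
static int is_prefix_free(void) {
    for (int i = 0; i < NUM_ENTRIES; i++) {
        size_t len = strlen(HUFF_TABLE[i].code);
        for (int j = 0; j < NUM_ENTRIES; j++) {
            if (i != j && strncmp(HUFF_TABLE[i].code, HUFF_TABLE[j].code, len) == 0) {
                return 0;
            }
        }
    }
    return 1;
}
```

With this table the 64 pixels encode into 228 bits, compared with 512 bits at 8 bits per pixel, and the prefix check passes.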


Nonfunctional requirements (cont.)
• Performance
  – Must process image fast enough to be useful
  – 1 sec reasonable constraint
    • Slower would be annoying
    • Faster not necessary for low-end of market
  – Therefore, constrained metric
• Size
  – Must use IC that fits in reasonably sized camera
  – Constrained and optimization metric
    • Constraint may be 200,000 gates, but smaller would be cheaper
• Power
  – Must operate below certain temperature (cooling fan not possible)
  – Therefore, constrained metric
• Energy
  – Reducing power or time reduces energy
  – Optimized metric: want battery to last as long as possible

Informal functional specification
• Flowchart breaks functionality down into simpler functions
• Each function’s details could then be described in English
  – Done earlier in chapter
• Low quality image has resolution of 64 x 64
• Mapping functions to a particular processor type not done at this stage

[Flowchart steps: CCD input, Zero-bias adjust, DCT, Quantize, Archive in memory, decision "More 8×8 blocks?", decision "Done?", Transmit serially, serial output (e.g., 011010...)]


Refined functional specification
• Executable model of digital camera

[Figure: executable model made up of modules CCD.C, CCDPP.C, CODEC.C, CNTRL.C, and UART.C; it reads an image file and writes an output file]

Putting it all together
• Main initializes all modules, then uses CNTRL module to capture, compress, and transmit one image
• This system-level model can be used for extensive experimentation
  – Bugs much easier to correct here rather than in later models

    int main(int argc, char *argv[]) {
        char *uartOutputFileName = argc > 1 ? argv[1] : "uart_out.txt";
        char *imageFileName = argc > 2 ? argv[2] : "image.txt";
        /* initialize the modules */
        UartInitialize(uartOutputFileName);
        CcdInitialize(imageFileName);
        CcdppInitialize();
        CodecInitialize();
        CntrlInitialize();
        /* simulate functionality */
        CntrlCaptureImage();
        CntrlCompressImage();
        CntrlSendImage();
    }

Design
• Determine system’s architecture
  – Processors
    • Any combination of single-purpose (custom or standard) or general-purpose processors
  – Memories, buses
• Map functionality to that architecture
  – Multiple functions on one processor
  – One function on one or more processors
• Implementation
  – A particular architecture and mapping
  – Solution space is set of all implementations
• Starting point
  – Low-end general-purpose processor connected to flash memory
    • All functionality mapped to software running on processor
    • Usually satisfies power, size, and time-to-market constraints
    • If timing constraint not satisfied then later implementations could:
      – use single-purpose processors for time-critical functions
      – rewrite functional specification

Implementation 1: Microcontroller alone
• Low-end processor could be Intel 8051 microcontroller
• Total IC cost including NRE about $5
• Well below 200 mW power
• Time-to-market about 3 months
• However, one image per second not possible
  – 12 MHz, 12 cycles per instruction
    • Executes one million instructions per second
  – CcdppCapture has nested loops resulting in 4096 (64 x 64) iterations
    • ~100 assembly instructions each iteration
    • 409,600 (4096 x 100) instructions per image
    • Roughly 40% of budget for reading image alone
  – Would be over budget after adding compute-intensive DCT and Huffman encoding
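The instruction-budget argument for Implementation 1 is simple arithmetic, sketched here in C. The constants are the figures quoted on the slide. The macro and function names are illustrative, and the one-second budget is the performance constraint from the requirements.

```c
#define CLOCK_HZ               12000000L  /* 12 MHz clock */
#define CYCLES_PER_INSTRUCTION 12L        /* 12 cycles per instruction */
#define IMAGE_ROWS             64L
#define IMAGE_COLS             64L
#define INSTR_PER_ITERATION    100L       /* ~100 assembly instructions per loop iteration */

/* Instructions the processor can execute in the one-second budget. */
static long instructions_per_second(void) {
    return CLOCK_HZ / CYCLES_PER_INSTRUCTION;
}

/* Instructions spent in the capture loops for one 64 x 64 image. */
static long capture_instructions(void) {
    return IMAGE_ROWS * IMAGE_COLS * INSTR_PER_ITERATION;
}

/* Share of the one-second budget used by capture alone, in whole percent. */
static long capture_budget_percent(void) {
    return 100L * capture_instructions() / instructions_per_second();
}
```

This gives 1,000,000 instructions per second and 409,600 instructions for capture, about 40% of the budget before any DCT or Huffman work is counted.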


Implementation 2: Microcontroller and CCDPP

[Figure: SOC containing EEPROM, 8051, RAM, UART, and CCDPP]

• CCDPP function implemented on custom single-purpose processor
  – Improves performance: fewer microcontroller cycles
  – Increases NRE cost and time-to-market
  – Easy to implement
    • Simple datapath
    • Few states in controller
• Simple UART easy to implement as single-purpose processor also
• EEPROM for program memory and RAM for data memory added as well

Microcontroller
• Synthesizable version of Intel 8051 available
  – Written in VHDL
  – Captured at register transfer level (RTL)
• Fetches instruction from ROM
• Decodes using Instruction Decoder
• ALU executes arithmetic operations
  – Source and destination registers reside in RAM
• Special data movement instructions used to load and store externally
• Special program generates VHDL description of ROM from output of C compiler/linker

[Figure: Block diagram of Intel 8051 processor core with Instruction Decoder, 4K ROM, Controller, ALU, 128 RAM, and connection to external memory bus]


Implementation 2: Microcontroller and CCDPP
• Analysis of implementation 2
  – Total execution time for processing one image:
    • 9.1 seconds
  – Power consumption:
    • 0.033 watt
  – Energy consumption:
    • 0.30 joule (9.1 s x 0.033 watt)
  – Total chip area:
    • 98,000 gates

Implementation 3: Microcontroller and CCDPP/Fixed-Point DCT
• 9.1 seconds still doesn’t meet performance constraint of 1 second
• DCT operation prime candidate for improvement
  – Execution of implementation 2 shows microprocessor spends most cycles here
  – Could design custom hardware like we did for CCDPP
    • More complex so more design effort
  – Instead, will speed up DCT functionality by modifying behavior

DCT floating-point cost
• Floating-point cost
  – DCT uses ~260 floating-point operations per pixel transformation
  – 4096 (64 x 64) pixels per image
  – 1 million floating-point operations per image
• No floating-point support with Intel 8051
  – Compiler must emulate
    • Generates procedures for each floating-point operation (e.g., mult, add)
    • Each procedure uses tens of integer operations
  – Thus, > 10 million integer operations per image
  – Procedures increase code size
• Fixed-point arithmetic can improve on this

Fixed-point arithmetic
• Integer used to represent a real number
  – Constant number of integer’s bits represents fractional portion of real number
    • More bits, more accurate the representation
  – Remaining bits represent portion of real number before decimal point
• Translating a real constant to a fixed-point representation
  – Multiply real value by 2 ^ (# of bits used for fractional part)
  – Round to nearest integer
  – E.g., represent 3.14 as 8-bit integer with 4 bits for fraction
    • 2^4 = 16
    • 3.14 x 16 = 50.24 ≈ 50 = 00110010
    • 16 (2^4) possible values for fraction, each represents 0.0625 (1/16)
    • Last 4 bits (0010) = 2
    • 2 x 0.0625 = 0.125
    • 3(0011) + 0.125 = 3.125 ≈ 3.14 (more bits for fraction would increase accuracy)
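The translation steps above (multiply by 2^fractional-bits, then round to nearest) can be sketched in C. The helper names `to_fixed` and `from_fixed` are illustrative and not part of the camera code. The sketch uses a double only to describe the conversion, which on the 8051 would typically be done ahead of time when constants are prepared.

```c
/* Convert a real value to fixed point with frac_bits fractional bits,
   rounding to the nearest integer. */
static int to_fixed(double real, int frac_bits) {
    double scaled = real * (double)(1 << frac_bits);
    return (int)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

/* Recover the real value that a fixed-point integer actually represents. */
static double from_fixed(int fixed, int frac_bits) {
    return (double)fixed / (double)(1 << frac_bits);
}
```

With 4 fractional bits, 3.14 converts to 50 and converts back to 3.125, which reproduces the worked example; 2.71 converts to 43.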


Fixed-point arithmetic operations
• Addition
  – Simply add integer representations
  – E.g., 3.14 + 2.71 = 5.85
    • 3.14 → 50 = 00110010
    • 2.71 → 43 = 00101011
    • 50 + 43 = 93 = 01011101
    • 5(0101) + 13(1101) x 0.0625 = 5.8125 ≈ 5.85
• Multiply
  – Multiply integer representations
  – Shift result right by # of bits in fractional part
  – E.g., 3.14 * 2.71 = 8.5094
    • 50 * 43 = 2150 = 100001100110
    • >> 4 = 10000110
    • 8(1000) + 6(0110) x 0.0625 = 8.375 ≈ 8.5094
• Range of real values used limited by bit widths of possible resulting values

Fixed-point implementation of CODEC
• COS_TABLE gives 8-bit fixed-point representation of cosine values
• 6 bits used for fractional portion
• Result of multiplications shifted right by 6

    static const char code COS_TABLE[8][8] = {
        { 64,  62,  59,  53,  45,  35,  24,  12 },
        { 64,  53,  24, -12, -45, -62, -59, -35 },
        { 64,  35, -24, -62, -45,  12,  59,  53 },
        { 64,  12, -59, -35,  45,  53, -24, -62 },
        { 64, -12, -59,  35,  45, -53, -24,  62 },
        { 64, -35, -24,  62, -45, -12,  59, -53 },
        { 64, -53,  24,  12, -45,  62, -59,  35 },
        { 64, -62,  59, -53,  45, -35,  24, -12 }
    };

    /* 1/sqrt(2) with 6 fractional bits: 0.7071 x 64 = 45, the same value as
       COS_TABLE[0][4] (the slide shows 5, which does not fit this format) */
    static const char ONE_OVER_SQRT_TWO = 45;

    /* 64 represents 1.0 with 6 fractional bits */
    static unsigned char C(int h) { return h ? 64 : ONE_OVER_SQRT_TWO; }

    static int F(int u, int v, short img[8][8]) {
        long s[8], r = 0;
        unsigned char x, j;
        for(x=0; x<8; x++) {
            s[x] = 0;
            for(j=0; j<8; j++)
                s[x] += (img[x][j] * COS_TABLE[j][v]) >> 6;
        }
        for(x=0; x<8; x++) r += (s[x] * COS_TABLE[x][u]) >> 6;
        return (short)((((r * (((16*C(u)) >> 6) * C(v)) >> 6)) >> 6) >> 6);
    }

    static short xdata inBuffer[8][8], outBuffer[8][8], idx;

    void CodecInitialize(void) { idx = 0; }

    void CodecPushPixel(short p) {
        if( idx == 64 ) idx = 0;
        inBuffer[idx / 8][idx % 8] = p << 6; idx++;
    }

    void CodecDoFdct(void) {
        unsigned short x, y;
        for(x=0; x<8; x++)
            for(y=0; y<8; y++)
                outBuffer[x][y] = F(x, y, inBuffer);
        idx = 0;
    }
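The fixed-point addition and multiplication rules from the slide above can be written as two small C helpers. This is a sketch under stated assumptions: the helper names are made up, operands are taken to share one fixed-point format, and the product is widened to 32 bits so the intermediate value cannot overflow a 16-bit integer.

```c
#include <stdint.h>

/* Fixed-point addition: operands must use the same number of fractional bits,
   and the result keeps that format. */
static int16_t fixed_add(int16_t a, int16_t b) {
    return (int16_t)(a + b);
}

/* Fixed-point multiplication: the raw product carries twice the fractional
   bits, so shift right by frac_bits to return to the operand format. */
static int16_t fixed_mul(int16_t a, int16_t b, int frac_bits) {
    int32_t product = (int32_t)a * (int32_t)b;  /* widened intermediate */
    return (int16_t)(product >> frac_bits);
}
```

With 4 fractional bits, adding 50 and 43 gives 93 (5.8125) and multiplying them gives 134 (8.375), as in the worked examples. The same multiply with 6 fractional bits is the pattern the CODEC code uses with `>> 6`.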


Implementation 3: Microcontroller and CCDPP/Fixed-Point DCT
• Analysis of implementation 3
  – Use same analysis techniques as implementation 2
  – Total execution time for processing one image:
    • 1.5 seconds
  – Power consumption:
    • 0.033 watt (same as 2)
  – Energy consumption:
    • 0.050 joule (1.5 s x 0.033 watt)
    • Battery life 6x longer!!
  – Total chip area:
    • 90,000 gates
    • 8,000 fewer gates (less memory needed for code)

Implementation 4: Microcontroller and CCDPP/DCT

[Figure: SOC containing EEPROM, RAM, 8051, CODEC, UART, and CCDPP]

• Performance close but not good enough
• Must resort to implementing CODEC in hardware
  – Single-purpose processor to perform DCT on 8 x 8 block

CODEC design
• 4 memory mapped registers
  – C_DATAI_REG/C_DATAO_REG used to push/pop 8 x 8 block into and out of CODEC
  – C_CMND_REG used to command CODEC
    • Writing 1 to this register invokes CODEC
  – C_STAT_REG indicates CODEC done and ready for next block
    • Polled in software
• Direct translation of C code to VHDL for actual hardware implementation
  – Fixed-point version used
• CODEC module in software changed similar to UART/CCDPP in implementation 2

Rewritten CODEC software

    static unsigned char xdata C_STAT_REG _at_ 65527;
    static unsigned char xdata C_CMND_REG _at_ 65528;
    static unsigned char xdata C_DATAI_REG _at_ 65529;
    static unsigned char xdata C_DATAO_REG _at_ 65530;
    void CodecInitialize(void) {}
    void CodecPushPixel(short p) { C_DATAO_REG = (char)p; }
    short CodecPopPixel(void) {
        return ((C_DATAI_REG << 8) | C_DATAI_REG);
    }
    void CodecDoFdct(void) {
        C_CMND_REG = 1;
        while( C_STAT_REG == 1 ) { /* busy wait */ }
    }

Implementation 4: Microcontroller and CCDPP/DCT
• Analysis of implementation 4
  – Total execution time for processing one image:
    • 0.099 seconds (well under 1 sec)
  – Power consumption:
    • 0.040 watt
    • Increase over 2 and 3 because SOC has another processor
  – Energy consumption:
    • 0.0040 joule (0.099 s x 0.040 watt)
    • Battery life 12x longer than previous implementation!!
  – Total chip area:
    • 128,000 gates
    • Significant increase over previous implementations


Summary of implementations

                        Implementation 2   Implementation 3   Implementation 4
Performance (second)    9.1                1.5                0.099
Power (watt)            0.033              0.033              0.040
Size (gate)             98,000             90,000             128,000
Energy (joule)          0.30               0.050              0.0040

• Implementation 3
  – Close in performance
  – Cheaper
  – Less time to build
• Implementation 4
  – Great performance and energy consumption
  – More expensive and may miss time-to-market window
    • If DCT designed ourselves then increased NRE cost and time-to-market
    • If existing DCT purchased then increased IC cost
• Which is better?

Summary
• Digital camera example
  – Specifications in English and executable language
  – Design metrics: performance, power and area
• Several implementations
  – Microcontroller: too slow
  – Microcontroller and coprocessor: better, but still too slow
  – Fixed-point arithmetic: almost fast enough
  – Additional coprocessor for compression: fast enough, but expensive and hard to design
  – Tradeoffs between hw/sw – the main lesson of this book!
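The energy figures quoted for implementations 2 to 4 follow from energy = execution time x power. The C sketch below recomputes them from the time and power values given in the analyses. The struct and function names are illustrative only.

```c
struct Implementation {
    const char *name;
    double seconds;   /* execution time for processing one image */
    double watts;     /* power consumption */
};

static const struct Implementation IMPLEMENTATIONS[3] = {
    { "Implementation 2", 9.1,   0.033 },
    { "Implementation 3", 1.5,   0.033 },
    { "Implementation 4", 0.099, 0.040 }
};

/* Energy per image in joules: time multiplied by power. */
static double energy_joules(const struct Implementation *impl) {
    return impl->seconds * impl->watts;
}

/* How many times less energy the later implementation uses than the earlier one. */
static double energy_ratio(const struct Implementation *earlier,
                           const struct Implementation *later) {
    return energy_joules(earlier) / energy_joules(later);
}
```

The products are about 0.30, 0.050, and 0.0040 joule, so implementation 3 uses roughly 6 times less energy than implementation 2, and implementation 4 roughly 12 times less than implementation 3.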