
Chapter 10

The document discusses data reduction techniques for reducing storage space, computing time, and transmission time. It describes two main types of data reduction algorithms: significant point extraction algorithms like Turning Point (TP) and AZTEC, and variable bit length encoding like Huffman coding. It provides details on the TP algorithm including its selection strategy and an example. It also describes how the AZTEC algorithm works to piecewise linearly approximate signals like ECG data to achieve data compression.

Uploaded by

Asmaa Mosbeh
 Data Reduction Techniques


Reasons for data reduction
 Reduce storage space
 Reduce computing time
 Reduce transmission time
Data reduction algorithms
 Significant point extraction (lossy)
   TP (Turning Point)
   AZTEC (Amplitude Zone Time Epoch Coding)
   CORTES (Coordinate Reduction Time Encoding System)
   Fan
 Variable bit length (lossless)
   Huffman coding
TP (Turning Point) algorithm
[Figure: the nine possible three-sample patterns, numbered 1 to 9]

Pattern   s1 = sign(X1 – X0)   s2 = sign(X2 – X1)   NOT(s1) OR (s1 + s2)   Saved sample
1         +1                   +1                   1                      X2
2         +1                   –1                   0                      X1
3         +1                    0                   1                      X2
4         –1                   +1                   0                      X1
5         –1                   –1                   1                      X2
6         –1                    0                   1                      X2
7          0                   +1                   1                      X2
8          0                   –1                   1                      X2
9          0                    0                   1                      X2
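
The selection rule in the table can be sketched in a few lines of Python. This is a minimal illustration, not the original implementation: the function names `sign` and `turning_point` are hypothetical, and the sketch assumes the first sample is always kept as the initial reference X0 and that the remaining samples are processed in pairs (X1, X2).

```python
def sign(v):
    """Return +1, 0 or -1 according to the sign of v."""
    return (v > 0) - (v < 0)


def turning_point(x):
    """Reduce x roughly 2:1 by keeping one sample from each pair (X1, X2)."""
    if not x:
        return []
    out = [x[0]]          # assumption: the first sample seeds the reference X0
    x0 = x[0]
    # Step through the signal two samples at a time.
    for i in range(1, len(x) - 1, 2):
        x1, x2 = x[i], x[i + 1]
        s1 = sign(x1 - x0)
        s2 = sign(x2 - x1)
        # Rule from the table: NOT(s1) OR (s1 + s2) is true -> save X2,
        # otherwise X1 is a turning point and is saved instead.
        x0 = x2 if ((not s1) or (s1 + s2)) else x1
        out.append(x0)
    return out


print(turning_point([0, 1, 2, 3, 2, 1, 0]))
```

With an even number of input samples, the last sample has no partner and is dropped by this sketch; a real implementation would need to decide how to handle it.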
TP selection strategy
[Figure: two three-sample cases drawn from x(nT – 2T), x(nT – T), and x(nT). In one case the output is y(nT) = x(nT); in the other, where x(nT – T) is a turning point, the output is y(nT) = x(nT – T).]
Turning point algorithm example
[Figure: three traces compared]
Original signal: 14 data points
Discard every other point: 7 data points
Turning point: 7 data points
Percent RMS difference (PRD)
 Numerical error computation
$$\mathrm{PRD} = \left\{ \frac{\sum_{i=1}^{n} \left[ x_{org}(i) - x_{rec}(i) \right]^{2}}{\sum_{i=1}^{n} \left[ x_{org}(i) \right]^{2}} \right\}^{1/2} \times 100\,\%$$
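
The PRD definition above translates directly into code. The following is a minimal sketch; the function name `prd` is hypothetical, and it assumes the original and reconstructed signals have the same length and that the original is not all zeros.

```python
import math


def prd(x_org, x_rec):
    """Percent RMS difference between an original and a reconstructed signal."""
    if len(x_org) != len(x_rec):
        raise ValueError("signals must have the same length")
    # Numerator: energy of the reconstruction error.
    num = sum((o - r) ** 2 for o, r in zip(x_org, x_rec))
    # Denominator: energy of the original signal.
    den = sum(o ** 2 for o in x_org)
    return 100.0 * math.sqrt(num / den)


print(prd([1, 1, 1, 1], [1, 1, 1, 0]))  # one of four unit samples lost
```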


Data reduction – TP
Original signal
Data range = –75 to +85 (160 levels)

Turning point
Reduction 2:1; PRD = 3.8%
AZTEC algorithm
 Amplitude Zone Time Epoch Coding
 Piecewise linear approximation of the ECG
 Originally for preprocessing ECGs for rhythm analysis
AZTEC Zero Order Interpolation (ZOI)
[Figure: ECG sampled at 200 sps, with the threshold Vth marked]
AZTEC encoding of ECG – 3.3 to 1 reduction
AZTEC data structure
{–2, 83, 22, 83, 18, 77, 4, 101, –5, –232, –4, 141, 24, 141, 21, 164}
L, A, L, A, L, A, L, A, L, A, L, A, L, A, L, A
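
To make the alternating length (L) and amplitude (A) layout concrete, here is a hedged Python sketch of a decoder. The slide does not spell out the convention, so the following are assumptions: a non-negative length is a plateau held at the given amplitude, a negative length marks a slope of |L| samples, and a slope is drawn as a straight line from the previous amplitude to the slope's own amplitude. The function name `aztec_decode` is hypothetical.

```python
def aztec_decode(words):
    """Expand an alternating [L, A, L, A, ...] array into samples."""
    out = []
    prev = None  # amplitude at the end of the previous line, if any
    for length, amp in zip(words[0::2], words[1::2]):
        if length >= 0:
            # Plateau: hold the amplitude for `length` samples.
            out.extend([amp] * length)
        else:
            # Slope (assumed convention): ramp linearly from the previous
            # amplitude to `amp` over |length| samples.
            n = -length
            begin = amp if prev is None else prev
            out.extend(begin + (amp - begin) * k / n for k in range(1, n + 1))
        prev = amp
    return out


slide_array = [-2, 83, 22, 83, 18, 77, 4, 101, -5, -232, -4, 141, 24, 141, 21, 164]
print(len(aztec_decode(slide_array)))  # total number of reconstructed samples
```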
Data reduction – AZTEC
Original signal
Data range = –75 to +85 (160 levels)

AZTEC Threshold = 5
Reduction 2.99:1; PRD = 18.2%

AZTEC Threshold = 10
Reduction 5.4:1; PRD = 16.5%

AZTEC Threshold = 20
Reduction 8.4:1; PRD = 16.8%
AZTEC flow chart
[Figure: two-part flow chart of the AZTEC encoder, joined by connectors A and B. Labels recoverable from the chart:]
 Start: Vmxi = Vmni = ECGt; LineMode = _PLATEAU; LineLen = 1; cAZT = 1
 Wait for next sample: V = ECGt; Vmx = Vmxi; Vmn = Vmni; LineLen += 1
 Tests: LineLen > 50; Vmx < V (then Vmxi = V); Vmn > V (then Vmni = V); Vmxi – Vmni < Vth
 End of line: T1 = NUM – 1; V1 = (Vmx + Vmn)/2; Vmxi = Vmni = V; LineLen = 1
 Plateau output (LineMode = _PLATEAU, T1 > 2): *AZT++ = T1; *AZT++ = V1; cAZT += 2
 Slope handling: LineMode = _SLOPE; Tsi += T1; Vsi = V1; Sign = –1 if V1 – Vsi < 0, otherwise Sign = 1; test (V1 – Vsi) * Sign < 0
 Slope output: *AZT++ = –1 * Tsi; *AZT++ = Vsi (or V1); cAZT += 2 or cAZT += 4; Tsi = 0
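
The plateau part of the flow chart (zero-order interpolation) can be sketched as follows. This is a deliberately simplified illustration and not the full AZTEC algorithm: slopes are not generated, every line is emitted as a plateau regardless of its length, and the exact comparison used to close a line is an assumption. The function name `aztec_plateaus` is hypothetical; the 50-sample cap is taken from the LineLen > 50 test in the chart.

```python
def aztec_plateaus(samples, vth, max_len=50):
    """Return a list of (length, amplitude) plateaus for the input samples."""
    out = []
    it = iter(samples)
    try:
        first = next(it)
    except StopIteration:
        return out
    vmx = vmn = first   # running maximum and minimum of the current line
    length = 1
    for v in it:
        new_mx = max(vmx, v)
        new_mn = min(vmn, v)
        if length >= max_len or new_mx - new_mn >= vth:
            # Close the line: amplitude is the midpoint of max and min.
            out.append((length, (vmx + vmn) / 2))
            vmx = vmn = v
            length = 1
        else:
            vmx, vmn = new_mx, new_mn
            length += 1
    out.append((length, (vmx + vmn) / 2))  # flush the final line
    return out


print(aztec_plateaus([0, 1, 0, 1, 10, 11, 10], vth=3))
```

A larger threshold merges more samples into each plateau, which raises the reduction ratio at the cost of a coarser staircase.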
CORTES algorithm
 Coordinate Reduction Time Encoding System
 Hybrid of the TP and AZTEC algorithms
 Uses AZTEC zero-order-interpolation (ZOI) for lower-frequency regions (e.g., baseline)
 Uses TP for higher-frequency regions (e.g., QRS complex)
CORTES encoding of ECG – 2.3 to 1 reduction
AZTEC and CORTES encoding example
[Figure: original ECG compared with AZTEC, CORTES, AZTEC with LPF, and CORTES with LPF]
Fan algorithm
 Draws lines between pairs of starting and ending points so that all intermediate samples are within some specified error tolerance
 Final stored samples not equally spaced
 Similar algorithm is Scan-Along Approximation (SAPA)
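
A common way to realize the idea described above is to keep, from the last saved sample, the tightest pair of upper and lower slopes that still passes within ±ε of every sample seen so far. The Python sketch below follows that approach under stated assumptions: samples are uniformly spaced (time step of 1), the first and last samples are always saved, and the function name `fan_compress` is hypothetical. It returns the indices of the saved samples.

```python
def fan_compress(x, eps):
    """Return indices of samples kept by a Fan-style reduction with tolerance eps."""
    n = len(x)
    if n <= 2:
        return list(range(n))
    saved = [0]
    origin = 0
    # Slopes of the lines from the origin through x[1] + eps and x[1] - eps.
    upper = x[1] + eps - x[origin]
    lower = x[1] - eps - x[origin]
    for i in range(2, n):
        dt = i - origin
        slope = (x[i] - x[origin]) / dt
        if lower <= slope <= upper:
            # Sample lies inside the fan: narrow the fan and keep going.
            upper = min(upper, (x[i] + eps - x[origin]) / dt)
            lower = max(lower, (x[i] - eps - x[origin]) / dt)
        else:
            # Sample falls outside: save the previous sample as the new origin.
            origin = i - 1
            saved.append(origin)
            upper = x[i] + eps - x[origin]
            lower = x[i] - eps - x[origin]
    saved.append(n - 1)
    return saved


print(fan_compress([0, 0, 0, 5, 5, 5], eps=1))
```

Because the saved indices are not equally spaced, each stored sample needs its time (or the run length since the previous one) stored alongside its amplitude.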
Fan algorithm example
[Figure: amplitude versus time for samples at t0 to t4, showing upper slopes U1, U2, U3 and lower slopes L1, L2, L3 drawn at tolerance ε, with saved samples and eliminated samples marked]

Data reduction – Fan
Original signal
Data range = –75 to +85 (160 levels)

Fan Threshold = 5
Reduction 3.6:1; PRD = 4.6%

Fan Threshold = 10
Reduction 6.8:1; PRD = 11.5%
An encoded poem
Sir, I send a rhyme excelling
In sacred truth and rigid spelling
Numerical sprites elucidate
For me the lexiconʼs full weight.
An encoded poem - decoded
The number of letters in each word gives the successive digits of π = 3.14159265358979323846
Sir, I send a rhyme excelling
3 1 4 1 5 9
In sacred truth and rigid spelling
2 6 5 3 5 8
Numerical sprites elucidate
9 7 9
For me the lexiconʼs full weight.
3 2 3 8 4 6
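
The decoding rule (count the letters in each word) is easy to check mechanically. This is a small illustrative sketch; the name `word_lengths` is hypothetical, and the poem is typed with a plain ASCII apostrophe so that only letters are counted.

```python
POEM = """Sir, I send a rhyme excelling
In sacred truth and rigid spelling
Numerical sprites elucidate
For me the lexicon's full weight."""


def word_lengths(text):
    """Count the alphabetic characters in each whitespace-separated word."""
    return [sum(ch.isalpha() for ch in word) for word in text.split()]


digits = "".join(str(n) for n in word_lengths(POEM))
print(digits[0] + "." + digits[1:])
```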
Steganography
 The art and science of writing hidden messages
[Figure: image of a tree, and the image of a cat recovered from it]
By removing all but the last 2 bits of each color component, an almost completely black image results. Making the resulting image 85 times brighter results in the image of the cat.
From Wikipedia, 2005.
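
The recovery step described above amounts to masking the two least significant bits of each color component and scaling by 85, which maps the values 0 to 3 onto 0 to 255. A minimal sketch, assuming 8-bit RGB pixels held as tuples; the function name `reveal` is hypothetical.

```python
def reveal(pixel):
    """Keep the last 2 bits of each color component and brighten by 85x."""
    return tuple((component & 0b11) * 85 for component in pixel)


# A few made-up carrier pixels, only to exercise the function.
carrier = [(255, 4, 2), (120, 121, 122)]
print([reveal(p) for p in carrier])
```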


Imperfect alphanumeric coding
 y cn ndrstnd ths bcs f th mzng cmptng pwr f th hmn brn
(54 characters)
 You can understand this because of the amazing computing power of the human brain.
(82 characters)
 82 : 54 reduction ≈ 1.5 : 1
Morse code
Telegraphy code. Most frequent characters take the least time to send.
Lincolnʼs Gettysburg Address
President Abraham Lincoln spoke for two or three minutes. Lincolnʼs “few appropriate remarks” summarized the war in 10 sentences and 272 words.
November 19, 1863.
Text of Gettysburg Address
Example of Huffman coding strategy
Huffman coded message
 EAT A BEET
 ASCII - 10 characters @ 8 bits each - total of 80 bits
 Huffman - total of 38 bits
 Codes
 Space = 1
 E = 01; A = 0001; T = 001
 B = 0000 0000 0110 0010
 Coded message
01 0001 001 1 0001 1 0000000001100010 01 01 001
E A T (sp) A (sp) B E E T
 Reduction ratio - 80 : 38 ≈ 2 : 1
Huffman coded message
 Coded message
01 0001 001 1 0001 1 0000000001100010 01 01 001
E A T (sp) A (sp) B E E T
 Potential bit-error problems
01 0101 001 1 0001 1 0000000001100010 11 01 001
E E E T (sp) A (sp) B (sp)(sp)E T
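
The code table from the slides is enough to reproduce both the 38-bit message and the effect of the two bit errors. The sketch below is illustrative: the names `CODES`, `encode` and `decode` are hypothetical, and the decoder relies on the codes being prefix-free so that bits can be matched greedily.

```python
CODES = {
    " ": "1",
    "E": "01",
    "T": "001",
    "A": "0001",
    "B": "0000000001100010",
}


def encode(message):
    """Concatenate the code word of every character."""
    return "".join(CODES[ch] for ch in message)


def decode(bits):
    """Read bits until they match a code word, emit it, and start over."""
    lookup = {code: ch for ch, code in CODES.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in lookup:
            out.append(lookup[buf])
            buf = ""
    return "".join(out)


clean = encode("EAT A BEET")
# Same stream with the two bit errors shown on the slide.
corrupted = "01" "0101" "001" "1" "0001" "1" "0000000001100010" "11" "01" "001"
print(len(clean), repr(decode(clean)), repr(decode(corrupted)))
```

Because code words have different lengths, a single flipped bit can change where the following code words start, so one error may corrupt more than one character.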
Huffman example
 28 data points; 7 distinct symbols (quantized levels)
{1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 6, 7}

Pi            Si   Lists of Pi
7/28 = .25    1    .25   .25   .25   .32   .43   .57   1.0
6/28 = .21    2    .21   .21   .22   .25   .32   .43
5/28 = .18    3    .18   .18   .21   .22   .25
4/28 = .14    4    .14   .14   .18   .21
3/28 = .11    5    .11   .11   .14
2/28 = .07    6    .07   .11
1/28 = .04    7    .04
Huffman tree
[Figure: Huffman tree built from the lists of Pi, with each pair of branches labeled 1 and 0. The root (1.0) splits into .57 and .43; .57 splits into .32 and .25 (symbol 1); .43 splits into .22 and .21 (symbol 2); .32 splits into .18 (symbol 3) and .14 (symbol 4); .22 splits into .11 (symbol 5) and .11; that second .11 splits into .07 (symbol 6) and .04 (symbol 7).]

$$E(l) = \sum_{i=1}^{7} l_i P_i$$
li represents the length of the Huffman code for symbol i

Si   li   li Pi
1    2    2 × 0.25 = 0.50
2    2    2 × 0.21 = 0.42
3    3    3 × 0.18 = 0.54
4    3    3 × 0.14 = 0.42
5    3    3 × 0.11 = 0.33
6    4    4 × 0.07 = 0.28
7    4    4 × 0.04 = 0.16
          E(l) = 2.65

E(l) = 2.65 in this example, resulting in an expected reduction ratio of 3 : 2.65 (a fixed-length code for 7 symbols needs 3 bits per symbol)
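
The code lengths and the expected length for the 28-point example can be reproduced with a standard heap-based Huffman construction. This is a minimal sketch using only the Python standard library; the function name `huffman_code_lengths` is hypothetical, and it computes code lengths only, not the actual bit patterns.

```python
import heapq
from collections import Counter


def huffman_code_lengths(data):
    """Return {symbol: code length} for a Huffman code built from data."""
    counts = Counter(data)
    # Heap entries: (count, tiebreak, symbols under this node).
    heap = [(c, i, [s]) for i, (s, c) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    lengths = {s: 0 for s in counts}
    tiebreak = len(heap)
    while len(heap) > 1:
        c1, _, syms1 = heapq.heappop(heap)
        c2, _, syms2 = heapq.heappop(heap)
        # Merging two nodes adds one bit to every symbol below them.
        for s in syms1 + syms2:
            lengths[s] += 1
        heapq.heappush(heap, (c1 + c2, tiebreak, syms1 + syms2))
        tiebreak += 1
    return lengths


data = [1] * 7 + [2] * 6 + [3] * 5 + [4] * 4 + [5] * 3 + [6] * 2 + [7]
lengths = huffman_code_lengths(data)
total_bits = sum(lengths[s] for s in data)
print(lengths, total_bits, total_bits / len(data))
```

Working from exact counts gives 74 bits for 28 samples, about 2.64 bits per sample; the slide's 2.65 comes from rounding each Pi to two decimals before summing.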
Modified Huffman coding – adding infrequent symbols
 Frequent set and infrequent set
 Reduces size of the translation table
Adaptive coding
 Builds translation table as data are presented
 Example is Lempel-Ziv-Welch (LZW) algorithm
   Uses fixed-size table
   Initializes some positions of table for some chosen data sets
   When new data encountered, uninitialized positions are used so that each unique data word is assigned its own position
   When table is full, oldest or least-used position is reinitialized according to the new data
   During data reconstruction, translation table is incrementally rebuilt from the encoded data
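
The way the decoder rebuilds the table from the encoded data alone is easiest to see in code. The sketch below is the common textbook form of LZW over bytes, with the table initialized to all 256 single-byte values. It is a simplification of what the slide describes: the table is allowed to grow without limit, so the fixed-size table and the reuse of oldest or least-used positions are not modeled. The function names are hypothetical.

```python
def lzw_encode(data: bytes) -> list:
    """Encode bytes as a list of table indices."""
    table = {bytes([i]): i for i in range(256)}
    out = []
    w = b""
    for b in data:
        wc = w + bytes([b])
        if wc in table:
            w = wc                      # keep extending the current phrase
        else:
            out.append(table[w])        # emit the longest known phrase
            table[wc] = len(table)      # give the new phrase its own position
            w = bytes([b])
    if w:
        out.append(table[w])
    return out


def lzw_decode(codes: list) -> bytes:
    """Rebuild the table incrementally while decoding."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    w = table[codes[0]]
    out = [w]
    for k in codes[1:]:
        if k in table:
            entry = table[k]
        elif k == len(table):
            entry = w + w[:1]           # phrase defined by the step in progress
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[len(table)] = w + entry[:1]
        w = entry
    return b"".join(out)


message = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_encode(message)
print(len(message), len(codes), lzw_decode(codes) == message)
```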
Encoding the ECG first difference
 Typically neighboring signal amplitudes are not statistically independent
 Amplitude range of difference signal is smaller than that of the original signal, thus fewer bits are required per sample point
 First difference selectable in UW DigiScope Huffman coding module
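
Taking the first difference and undoing it with a running sum can be sketched as follows. The function names are hypothetical, and the sketch assumes the first sample is stored unchanged so that the running sum has a starting value.

```python
from itertools import accumulate


def first_difference(x):
    """Keep the first sample, then store each sample minus its predecessor."""
    if not x:
        return []
    return [x[0]] + [b - a for a, b in zip(x, x[1:])]


def reconstruct(d):
    """Undo first_difference with a running sum."""
    return list(accumulate(d))


x = [10, 12, 15, 15, 11]
d = first_difference(x)
print(d, reconstruct(d) == x)
```

With integer samples the reconstruction is exact, so the step is lossless; the gain comes from the narrower range of values that the Huffman coder then has to represent.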
First difference changes the statistics
[Figure: distribution of amplitude values in the original ECG]
Original ECG
Data range is 85 to –75
Data reduction is 3.5 to 1
[Figure: distribution of amplitude values in the first difference]
First difference
Data range is 34 to –63
Data reduction is 5.6 to 1
Signal can be exactly reconstructed.
ECG beat subtraction and encoding of residual
1. Make template of typical beat.
2. Align template with each QRS complex and subtract from signal.
3. Huffman encode residual signal.
4. After reconstruction, add the template to the signal at each QRS location.
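
Steps 2 and 4 above are mirror images of each other, which the following sketch illustrates. It is only an outline under stated assumptions: the template and the QRS start positions are taken as given (detecting and aligning beats is not shown), the Huffman coding of the residual in step 3 is omitted, and the function names are hypothetical.

```python
def subtract_template(signal, template, qrs_starts):
    """Step 2: subtract the template at each QRS location."""
    residual = list(signal)
    for start in qrs_starts:
        for k, t in enumerate(template):
            if start + k < len(residual):
                residual[start + k] -= t
    return residual


def add_template(residual, template, qrs_starts):
    """Step 4: add the template back at each QRS location."""
    signal = list(residual)
    for start in qrs_starts:
        for k, t in enumerate(template):
            if start + k < len(signal):
                signal[start + k] += t
    return signal


signal = [0, 0, 1, 5, 1, 0, 0, 1, 6, 1, 0]
template = [1, 5, 1]
starts = [2, 7]
residual = subtract_template(signal, template, starts)
print(residual, add_template(residual, template, starts) == signal)
```

The residual has a much smaller amplitude range than the original beats, which is what makes it cheaper to Huffman encode.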
Run-length encoding
 Used for FAX
 Example
{1, 1, 1, 1, 1, 3, 3, 3, 3, 0, 0, 0}
 Encoded
{1, 5, 3, 4, 0, 3}
 12 samples reduced to 6 samples – 2 : 1 data reduction
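
The example above stores each run as a value followed by its count. A minimal sketch of that layout using the standard library; the function names are hypothetical.

```python
from itertools import groupby


def rle_encode(samples):
    """Encode runs as a flat [value, count, value, count, ...] list."""
    out = []
    for value, run in groupby(samples):
        out.extend([value, len(list(run))])
    return out


def rle_decode(encoded):
    """Expand a flat [value, count, ...] list back into samples."""
    out = []
    for value, count in zip(encoded[0::2], encoded[1::2]):
        out.extend([value] * count)
    return out


samples = [1, 1, 1, 1, 1, 3, 3, 3, 3, 0, 0, 0]
encoded = rle_encode(samples)
print(encoded, rle_decode(encoded) == samples)
```

Run-length encoding only pays off when runs are long; a signal with no repeated neighbors would double in size under this layout.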
The End
