1.3 Data Storage and Compression
1 byte = 8 bits
…
Bit < Byte < Kilobyte < Megabyte < Gigabyte < Terabyte < Petabyte < Exabyte < …
Well, there is a problem…
0478
1.3 Data Storage
1 bit
1 nibble = 4 bits
1 byte = 2 nibbles = 8 bits
1 kilobyte (kB) = 1000 bytes
1 megabyte (MB) = 1000 kB
1 gigabyte (GB) = 1000 MB
1 terabyte (TB) = 1000 GB
1 petabyte (PB) = 1000 TB
1 exabyte (EB) = 1000 PB
And now we have another problem… Everyone still thinks a kilobyte is 1024 bytes, not 1000 bytes like the IEC wants.
So when someone says kilobyte, does it mean 1000 bytes like the IEC wants, or does it mean 1024 bytes?
IEC Terms
So because of this confusion the IEC wants us to learn these new words
1 bit
1 nibble = 4 bits
1 byte = 8 bits
1 kibibyte (KiB) = 1024 bytes
1 mebibyte (MiB) = 1024 KiB
1 gibibyte (GiB) = 1024 MiB
1 tebibyte (TiB) = 1024 GiB
1 pebibyte (PiB) = 1024 TiB
1 exbibyte (EiB) = 1024 PiB
1 zebibyte (ZiB) = 1024 EiB
All terms
So these are all the IEC terms. But remember: every book and website, and even our own lessons, ignore the IEC. We still need the terms for our exams.
Decimal terms                     IEC terms
1 bit                             1 bit
1 nibble = 4 bits                 1 nibble = 4 bits
1 byte = 8 bits                   1 byte = 8 bits
1 kilobyte (kB) = 1000 bytes      1 kibibyte (KiB) = 1024 bytes
1 megabyte (MB) = 1000 kB         1 mebibyte (MiB) = 1024 KiB
1 gigabyte (GB) = 1000 MB         1 gibibyte (GiB) = 1024 MiB
1 terabyte (TB) = 1000 GB         1 tebibyte (TiB) = 1024 GiB
1 petabyte (PB) = 1000 TB         1 pebibyte (PiB) = 1024 TiB
1 exabyte (EB) = 1000 PB          1 exbibyte (EiB) = 1024 PiB
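To see how far apart the two systems drift, here is a minimal Python sketch that prints the same number of bytes in both sets of units. The function names (`format_size`, `decimal_size`, `binary_size`) are made up for this example and are not from any library.

```python
# Unit symbols for the two systems in the table above.
DECIMAL_UNITS = ["bytes", "kB", "MB", "GB", "TB", "PB", "EB"]
BINARY_UNITS = ["bytes", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]


def format_size(num_bytes, base, units):
    """Divide by `base` until the value fits the largest sensible unit."""
    value = float(num_bytes)
    index = 0
    while value >= base and index < len(units) - 1:
        value /= base
        index += 1
    return f"{value:.2f} {units[index]}"


def decimal_size(num_bytes):
    """Size using steps of 1000 (kilobyte, megabyte, ...)."""
    return format_size(num_bytes, 1000, DECIMAL_UNITS)


def binary_size(num_bytes):
    """Size using steps of 1024 (kibibyte, mebibyte, ...)."""
    return format_size(num_bytes, 1024, BINARY_UNITS)


if __name__ == "__main__":
    for size in (1024, 1_048_576, 5_000_000_000):
        print(size, "bytes =", decimal_size(size), "=", binary_size(size))
```

The same file gets a different number depending on which unit you pick, which is exactly why the two sets of names exist.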
Your syllabus
4. Understand how files are compressed using lossy and lossless compression
methods
5. Lossless compression reduces the file size without permanent loss of data, e.g.
run length encoding (RLE)
Lossy: worse quality
Where is lossy compression used?
Sound
Images
Video
Lossy Sound – MP3
Sound
Your stupid human ears can hear sounds from 20 Hz to 20,000 Hz.
But as you get older, you lose the ability to hear the highest frequencies. So there is no point having frequencies inside your sound file that ‘most’ humans cannot hear.
Test your hearing: the test tone is playing now. I can start hearing it at around 120 Hz. It lasts for around 2 minutes.
Lossy Image
Your stupid human eyes cannot see all the colours and shades of colour that could be in an image. They also cannot pick out fine detail. Which one of these is lossy?
[Images: the ORIGINAL picture and the LOSSY version, shown side by side on two slides]
Okay, so you cannot see the difference between the original and the lossy image unless you zoom in… But even though my example of the guitar girl is fine, I don’t want to lie to you.
A real, 100% pure image, with nothing removed, is called a RAW image.
A JPEG image takes a RAW image, looks at the data your stupid human eyes cannot see (or cannot see well), and removes it.
RAW vs JPEG
Now a video is just one picture shown after another after another after another, so the compression is the same.
(Okay, this is a lie: there are 3 ways that a video does compression… but it’s not on your syllabus, so I’ll add them to the end of this PPT.)
Lossless
Better quality
Where do you use lossless?
You use lossless wherever you must keep all of the data.
Text
Vector**
**Vector images don’t need compression: they are built on math and coordinates, so the quality will always be the same regardless of the size.
Run length encoding (RLE) is lossless
CCCCCCCCCCWCCCCCCCCCCPPP
Can be written as
10C 1W 10C 3P
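The run length idea above can be sketched in a few lines of Python. This is only a minimal illustration: the function names (`rle_encode`, `rle_decode`, `rle_format`) are made up for this example, and real file formats store the runs in their own binary layouts.

```python
from itertools import groupby


def rle_encode(text):
    """Turn a string into a list of (count, character) runs."""
    return [(len(list(group)), char) for char, group in groupby(text)]


def rle_decode(runs):
    """Rebuild the original string from the (count, character) runs."""
    return "".join(char * count for count, char in runs)


def rle_format(runs):
    """Write the runs the way the slide does, e.g. '10C 1W'."""
    return " ".join(f"{count}{char}" for count, char in runs)


if __name__ == "__main__":
    data = "C" * 10 + "W" + "C" * 10 + "PPP"
    runs = rle_encode(data)
    print(rle_format(runs))          # the compressed form
    print(rle_decode(runs) == data)  # nothing was lost
```

Decoding gives back exactly the string you started with, which is what makes it lossless.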
These next two topics are NOT in your syllabus, but are pretty cool and not so
difficult.
We have:
Lossless
ASCII
In standard ASCII every letter is 7 bits, but really everyone uses extended ASCII, so we say every ASCII letter is 8 bits.
Well, what if we just count the number of times each letter appears? (We count spaces and punctuation too.)
How many times
Batman’s real name is Bruce Wayne

Letter     Frequency
a          5
[SPACE]    5
e          4
n          3
B          2
m          2
s          2
r          2
t          1
’          1
l          1
i          1
u          1
c          1
W          1
y          1

So instead of using 8 bits for each letter, why don’t we say the most popular letters only use 1 bit?
Then the next most popular use two bits… and so on.
a = 0
[SPACE] = 1
e = 00
And there is a problem: does 00 = e, or does 00 = ‘a’ twice?
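If you want to check the frequency table without counting by hand, Python's `collections.Counter` does it for you. This is just a sketch; the sentence is typed with a plain keyboard apostrophe.

```python
from collections import Counter

# The sentence from the slide (plain ASCII apostrophe).
sentence = "Batman's real name is Bruce Wayne"

# Count every character, including spaces and punctuation.
frequencies = Counter(sentence)

if __name__ == "__main__":
    for char, count in frequencies.most_common():
        label = "[SPACE]" if char == " " else char
        print(label, count)
    print("Characters:", len(sentence))
    print("Bits at 8 bits per character:", 8 * len(sentence))
```

The counts add up to 33 characters, so the sentence needs 264 bits when every character takes 8 bits.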
Huffman
One smart way to solve this is with Huffman encoding.
You still need the frequency.
We will use this frequency to build a binary tree: a Huffman tree.
Build a tree
Take the two least used characters: W and y.
We say how many times they are used: 1 time each.

1   1
W   y
Build a tree
Then we link these two by adding their values.

  2
1   1
W   y

Now take this and put it at the bottom of your tree.
Build a tree
Now take this and put it back in your frequency table, as a single entry (W + y) with a frequency of 2.
If you reach a letter (like ‘r’) whose frequency matches the sum of the two frequencies (W and y), then put them all on one level.
My Tree
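[Diagram: the finished tree. Each joining node is labelled with the sum of the frequencies below it. a and [SPACE] (5 each) are near the top, then e (4) and n (3), then r, s, B and m (2 each), with W, y, u, c, l, i, t and ’ (1 each) at the bottom.]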
So what
[Diagram: the finished tree, shown again]
What’s the code?
So if you want ‘B’
It was 01000010
So from the top (START)
Go Right (22) - 1
Go Right (12) - 1
Go Right (8) - 1
Go Right (4) - 1
Go Left (2) - 0
So B is now 11110
What’s the code?
B is now 11110
What if you want a more popular letter, ‘e’?
START
RIGHT (22) - 1
RIGHT (12) - 1
RIGHT (8) - 1
LEFT (e) - 0
So e is now 1110
Because e is more popular than B, it has a shorter code.
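Here is a minimal Python sketch of the textbook version of Huffman encoding: keep joining the two least used entries until one tree is left, then read each code off the path (0 for left, 1 for right). It is not the exact tree drawn on the slides, so the codes it produces will be different from 11110 and 1110, but the idea is the same: popular characters get shorter codes. The function names are made up for this example, and it assumes the text is not empty.

```python
import heapq
from collections import Counter


def build_codes(text):
    """Return a dict mapping each character to its Huffman bit string."""
    counts = Counter(text)
    # Heap entries are (frequency, tie_breaker, tree). A tree is either a
    # single character or a (left, right) pair. The unique tie_breaker
    # stops Python from ever comparing two trees directly.
    heap = [(freq, i, char) for i, (char, freq) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        freq1, _, left = heapq.heappop(heap)   # least used
        freq2, _, right = heapq.heappop(heap)  # next least used
        heapq.heappush(heap, (freq1 + freq2, next_id, (left, right)))
        next_id += 1

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")  # go left
            walk(node[1], prefix + "1")  # go right
        else:
            codes[node] = prefix or "0"  # text with one distinct character

    walk(heap[0][2], "")
    return codes


def encode(text, codes):
    return "".join(codes[char] for char in text)


def decode(bits, codes):
    lookup = {code: char for char, code in codes.items()}
    result, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:  # no code is the start of another code
            result.append(lookup[current])
            current = ""
    return "".join(result)


if __name__ == "__main__":
    sentence = "Batman's real name is Bruce Wayne"
    codes = build_codes(sentence)
    bits = encode(sentence, codes)
    print(codes)
    print(len(bits), "bits instead of", 8 * len(sentence))
    print(decode(bits, codes) == sentence)
```

Because no code is the start of another code, the "does 00 mean e or a twice" problem cannot happen, and decoding gives back the original sentence.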
+&-
± If you have a really deep/tall tree, you may have a letter that uses more than 8 bits. But this is okay, because that letter will be at the bottom of the tree, and those letters are not used often.
Today
4. Video is a series of still images that are played back at speed to imitate movement
Frames
If video is a set of images played back at speed, how quick must it be?
If you have 1 image on screen for 1 second, you say your video is 1 frame per second (fps).
Most videos are around 30 fps, and smooth videos or video games are done at 60 fps.
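The arithmetic behind fps is simple enough to check in Python. This is a small sketch with made-up function names: how long each frame stays on screen, and how many frames a clip needs.

```python
def frame_duration_ms(fps):
    """How long one frame stays on screen, in milliseconds."""
    return 1000 / fps


def frames_needed(fps, seconds):
    """How many frames a clip of the given length contains."""
    return fps * seconds


if __name__ == "__main__":
    for fps in (1, 30, 60):
        print(fps, "fps:", round(frame_duration_ms(fps), 1), "ms per frame,",
              frames_needed(fps, 120), "frames in 2 minutes")
```

Doubling the fps doubles the number of images to store, which is why video needs compression so badly.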
Interlaced encoding
Progressive encoding
Interlaced
Broadcast TV uses interlaced because it’s cheaper (both the hardware and the bandwidth cost less)
Progressive
Image artifacts
Moiré effect
Today
But how??
Interframe compression - I-frame
An I-frame is your full frame. It must be the first frame in your video, and you can have many I-frames in your video.
Interframe – P-frame
The next frame of your video is broken up into small blocks of pixels (typically 16x16) called MACROBLOCKS
If the macroblock from frame 2 is the same as, or similar to, the one in frame 1, then we will put in a P-frame (predicted frame).
A P-frame looks at an I-frame and says “you look almost the same, so instead of processing the whole thing I’ll just process the changes”.
An I-frame is fixed. It looks at your frame and removes data that you cannot see, or where you would not notice the difference (much).
Temporal Redundancy
If parts of your video don’t change, then an instruction is given to say “hey, don’t change this part”.
It is used in P-frames.
They use about half as much data as an I-frame.
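The "don't change this part" idea can be sketched as a toy in Python. Frames here are just small grids of numbers, the block size is tiny, and a block only counts as unchanged if it matches exactly. Real video codecs also accept "similar" blocks and track movement, so treat this as an illustration under those assumptions, with made-up function names.

```python
def split_blocks(frame, size):
    """Cut a frame (list of rows) into size x size blocks keyed by position."""
    blocks = {}
    for r in range(0, len(frame), size):
        for c in range(0, len(frame[0]), size):
            blocks[(r, c)] = tuple(
                tuple(row[c:c + size]) for row in frame[r:r + size]
            )
    return blocks


def make_p_frame(previous, current, size=2):
    """Keep only the blocks that differ from the previous frame."""
    old = split_blocks(previous, size)
    new = split_blocks(current, size)
    return {pos: block for pos, block in new.items() if block != old[pos]}


def apply_p_frame(previous, p_frame):
    """Rebuild the current frame from the previous frame plus the changes."""
    frame = [list(row) for row in previous]
    for (r, c), block in p_frame.items():
        for offset, block_row in enumerate(block):
            frame[r + offset][c:c + len(block_row)] = block_row
    return frame


if __name__ == "__main__":
    i_frame = [[0, 0, 0, 0] for _ in range(4)]
    next_frame = [list(row) for row in i_frame]
    next_frame[3][3] = 9  # one pixel changes
    p_frame = make_p_frame(i_frame, next_frame)
    print(len(p_frame), "of 4 blocks stored")
    print(apply_p_frame(i_frame, p_frame) == next_frame)
```

Only the block that changed is stored, and the rest of the picture is copied from the frame before it.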