
1.3 Data Storage and Compression

This document discusses data storage and compression methods. It covers: 1. Different units used to measure data storage sizes like bytes, kilobytes, megabytes, etc. 2. The concepts of lossy and lossless compression methods. Lossy compression removes redundant data and creates smaller file sizes but with lower quality, while lossless compression reduces file size without any permanent loss of data. 3. Examples of when lossy compression is used, like for images, sound, and video, since it can remove data our senses like sight and hearing cannot perceive. Lossless compression is used when perfect quality is required after decompression.

Uploaded by haotongxu14
Copyright © All Rights Reserved

1.3 Data storage and compression


0478
1.3 Data Storage

1. Understand how data storage is measured


2. Calculate the file size of an image file and a sound file, using information given
3. Understand the purpose of and need for data compression
4. Understand how files are compressed using lossy and lossless compression
methods
5. Lossless compression reduces the file size without permanent loss of data, e.g.
run length encoding (RLE)
Today

1. Understand how data storage is measured


2. Calculate the file size of an image file and a sound file, using information given
3. Understand the purpose of and need for data compression

Understand: Different ways to represent data

Able: Perform RLE

Answer: What is the difference between lossy and lossless?


Remember when we did this…..

1 bit = 1 binary digit (a 0 or a 1) = the smallest unit of data

1 byte = 8 bits

1 kilobyte = 1024 bytes

1 megabyte = 1024 kilobytes

1 gigabyte = 1024 megabytes


Bit < Byte < Kilobyte < Megabyte < Gigabyte < Terabyte < Petabyte < Exabyte < …
Well there is a problem….

Question: How many metres are in a kilometre (km)?


Answer: 1000

Question: How many grams are in a kilogram (kg)?


Answer: 1000

So why is kilobyte 1024 and not 1000?

Okay….. This next part is nonsense, but stay with me….

Cambridge wants us to use some new words.


These new words were made by the IEC, the International Electrotechnical Commission.
And no one… no one… no one uses these new words, but we need to learn them because…
IEC Sizes

The IEC says the prefixes kilo, mega, giga and so on always mean multiples of 1000, not 1024. So it goes (a capital B means bytes; a small b means bits):

1 bit
1 nibble = 4 bits
1 byte = 2 nibbles = 8 bits
1 kilobyte (kB) = 1000 bytes
1 megabyte (MB) = 1000 kB
1 gigabyte (GB) = 1000 MB
1 terabyte (TB) = 1000 GB
1 petabyte (PB) = 1000 TB
1 exabyte (EB) = 1000 PB

And now we have another problem…. Everyone still thinks a kilobyte is 1024 bytes, not 1000 like the IEC wants.

So when someone says kilobyte, do they mean 1000 bytes like the IEC wants, or 1024 bytes like everyone else thinks?
IEC Terms

So because of this confusion the IEC wants us to learn these new words

1 bit
1 nibble = 4 bits
1 byte = 8 bits
1 kibibyte (KiB) = 1024 bytes
1 mebibyte (MiB) = 1024 kibibytes (KiB)
1 gibibyte (GiB) = 1024 mebibytes (MiB)
1 tebibyte (TiB) = 1024 gibibytes (GiB)
1 pebibyte (PiB) = 1024 tebibytes (TiB)
1 exbibyte (EiB) = 1024 pebibytes (PiB)
1 zebibyte (ZiB) = 1024 exbibytes (EiB)
All terms

So these are all the IEC terms. But remember: every book, every website and even our own lessons ignore the IEC. We still need it for our exams.

IEC Terms
Decimal (powers of 1000)            Binary (powers of 1024)
1 bit                               1 bit
1 nibble = 4 bits                   1 nibble = 4 bits
1 byte = 8 bits                     1 byte = 8 bits
1 kilobyte (kB) = 1000 bytes        1 kibibyte (KiB) = 1024 bytes
1 megabyte (MB) = 1000 kB           1 mebibyte (MiB) = 1024 KiB
1 gigabyte (GB) = 1000 MB           1 gibibyte (GiB) = 1024 MiB
1 terabyte (TB) = 1000 GB           1 tebibyte (TiB) = 1024 GiB
1 petabyte (PB) = 1000 TB           1 pebibyte (PiB) = 1024 TiB
1 exabyte (EB) = 1000 PB            1 exbibyte (EiB) = 1024 PiB
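To see how far apart the two systems drift, here is a small Python sketch that converts a size into bytes using either set of prefixes. The dictionary names and the `to_bytes` helper are made up for illustration.

```python
# Decimal prefixes: each step is x1000
DECIMAL = {"kB": 1000, "MB": 1000**2, "GB": 1000**3, "TB": 1000**4}
# Binary (IEC) prefixes: each step is x1024
BINARY = {"KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}

def to_bytes(value, unit):
    """Convert a size such as (2, "GiB") into a number of bytes."""
    prefixes = {**DECIMAL, **BINARY}
    return value * prefixes[unit]

print(to_bytes(1, "GB"))    # 1000000000
print(to_bytes(1, "GiB"))   # 1073741824
# How many TiB fit in a drive sold as "1 TB"?
print(round(to_bytes(1, "TB") / to_bytes(1, "TiB"), 3))  # 0.909
```

The gap grows at every step: a kibibyte is only 24 bytes bigger than a kilobyte, but a drive sold as 1 TB shows up as only about 0.91 TiB.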
Your syllabus

Your syllabus says:


Calculate the file size of an image file and a sound file, using information given
Information given may include:
– image resolution and colour depth
– sound sample rate, resolution and length of track

Well, we did all of that in 1.2 Text, Sound and Images.
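As a quick reminder, here is that calculation as a minimal Python sketch.

- Image size = resolution (width × height) × colour depth.
- Sound size = sample rate × resolution × length of track.

The example numbers are chosen for illustration only: a 1920×1080 image at 24-bit colour, and one minute of 16-bit mono sound sampled at 44,100 Hz. Real files also contain headers and metadata, which this ignores.

```python
def image_size_bits(width, height, colour_depth):
    """Image file size = resolution (width x height) x colour depth (bits per pixel)."""
    return width * height * colour_depth

def sound_size_bits(sample_rate, resolution, seconds):
    """Sound file size = sample rate (Hz) x resolution (bits per sample) x length (s).
    Assumes a single (mono) channel."""
    return sample_rate * resolution * seconds

image_bytes = image_size_bits(1920, 1080, 24) // 8  # 8 bits in a byte
sound_bytes = sound_size_bits(44_100, 16, 60) // 8
print(image_bytes, "bytes")  # 6220800 bytes
print(sound_bytes, "bytes")  # 5292000 bytes
```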

So we can now move onto Compression


Today

4. Understand how files are compressed using lossy and lossless compression
methods
5. Lossless compression reduces the file size without permanent loss of data, e.g.
run length encoding (RLE)

Understand: What is compression?

Able: Apply different compression methods

Answer: When to use lossy and when to use lossless?


What is compression?

Make things take up less space

Many methods of compression

All of them are either lossy or lossless

But why do we need to make things take less space?


1. Less storage space needed
2. Quicker to send a smaller file than a larger file: shorter transmission time
3. Less bandwidth needed
Lossy

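Permanently removes data that is judged unnecessary (e.g. detail people are unlikely to notice)

Creates a smaller file size

Lower quality than the original, and the removed data cannot be recovered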
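Real lossy formats such as JPEG and MP3 are far more complicated than this. The core idea can still be shown with a simpler, swapped-in technique: reducing colour depth. The hypothetical `reduce_colour_depth` function below throws away the lowest bits of each 8-bit value. That saves space, but the exact original values can never be rebuilt.

```python
def reduce_colour_depth(values, bits_kept):
    """Keep only the top `bits_kept` bits of each 8-bit value (0-255).
    The low bits are thrown away for good, so this is lossy."""
    shift = 8 - bits_kept
    # Storing just (v >> shift) would need only bits_kept bits per value;
    # shifting back up shows what the decompressed values look like.
    return [(v >> shift) << shift for v in values]

original = [200, 201, 203, 90, 92]
lossy = reduce_colour_depth(original, 4)
print(lossy)              # [192, 192, 192, 80, 80]
print(lossy == original)  # False: the fine differences are gone
```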
Where is lossy compression used?

Lossy is used when you can afford to remove data

Sound
Images
Video
Lossy Sound – MP3

Sound
Your stupid human ears can hear sounds from about 20 Hz to 20,000 Hz.
But as you get older, you lose the ability to hear the highest frequencies. So there is no point keeping
frequencies in your sound file that ‘most’ humans cannot hear.
Test your hearing with the tone on this slide: I can start hearing it at around 120 Hz. It lasts for
around 2 minutes.
Lossy Image

Your stupid human eyes cannot see all the colours and shades of colour that could
be in an image. They also cannot make out very fine detail. Which one of these is lossy?

[Figures: the same image shown side by side, labelled ORIGINAL and LOSSY]

Lossy Image

Okay, so you cannot see the difference between original and lossy image unless
you zoom in…. But even though my example of the guitar girl is fine, I don’t want to
lie to you.

The original image was a JPEG image.


You’ve all seen JPEG images, but JPEG images have already been compressed.

A real, 100% pure image, with nothing removed, is called a RAW image.
A JPEG image takes a RAW image, looks at the data your stupid human eyes
cannot see (or cannot see well) and removes it.
RAW vs JPEG

You see the sky in the JPEG? It looks rubbish…. But this image is also a lie: if you did a lot of lossy compression then yes, the sky could look like this, but no one would actually do this.

RAW vs JPEG

So, with this guy, why does the RAW file look duller?

Because RAW has all the information. It keeps everything, which means it includes all the light data AND all the dark data.

RAW usually looks worse to our human eyes because we are used to JPEG. So why do we use RAW? We use it because we want to have 100% of the data, so when we use Photoshop we can play with all 100% of it.
Video - Lossy

Now, a video is just one picture shown after another after another after another, so
each picture can be compressed in the same way as an image.

(Okay, this is a lie: there are 3 ways that a video does compression… but it’s not on
your syllabus, so I’ve added it to the end of this PPT.)
Lossless

Keeps 100% of the information

Saves less space than lossy compression (the compressed file is bigger)

Better quality: the decompressed file is identical to the original
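You can check the “keeps 100% of the information” property with Python’s built-in `zlib` module. Its DEFLATE format is lossless and combines LZ77 with Huffman coding. The repeated sentence is just sample data, and the exact compressed size will depend on the input.

```python
import zlib

text = ("Batman's real name is Bruce Wayne. " * 20).encode("ascii")
packed = zlib.compress(text)      # lossless compression
unpacked = zlib.decompress(packed)

print(len(text), "->", len(packed), "bytes")
print(unpacked == text)  # True: every byte comes back exactly
```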
Where do you use lossless?

You use lossless wherever you must keep all of the data.

Text
Vector**

**Vector images usually don’t need compressing, because they are built from maths (shapes and coordinates) rather than pixels.
The quality will always be the same, whatever size they are shown at.

There are many types of lossless compression:


Run Length Encoding
Huffman (not in syllabus)
Run length encoding

Is lossless

CCCCCCCCCCWCCCCCCCCCCPPP

Can be written as

10C 1W 10C 3P

But we don’t write the spaces, because they take up space too:


10C1W10C3P
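Here is the same idea as a minimal Python sketch. The function names are made up. The decoder assumes the original text contains no digits, because `10C1W10C3P` has no separators to tell a count apart from the data.

```python
import re
from itertools import groupby

def rle_encode(text):
    """Run length encoding: each run of the same character becomes <count><character>."""
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))

def rle_decode(encoded):
    """Reverse rle_encode. Assumes the original text contained no digits."""
    return "".join(char * int(count) for count, char in re.findall(r"(\d+)(\D)", encoded))

data = "CCCCCCCCCCWCCCCCCCCCCPPP"
packed = rle_encode(data)
print(packed)                      # 10C1W10C3P
print(rle_decode(packed) == data)  # True: lossless
```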
Run Length Encoding

You can use RLE for images too.

Look at the top line of the image. You could say:
White, White, White, White, White, White, White, White

Or just say W,W,W,W,W,W,W,W

Or with RLE just say 8W

Line 2 would be: 1W 2B 1W 2B 2W

Or you can be smarter and spot the repeating pattern: WBB twice, then 2 whites:
2 WBB 2W
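A row of pixels can be treated the same way. In this sketch each pixel is written as a letter (W for white, B for black). The two rows are rebuilt from the run counts given above. This simple encoder does not spot repeated patterns like 2 WBB. The decoder is handy for checking a partner’s answer in the task below.

```python
from itertools import groupby

def rle_row(pixels):
    """Turn one row of pixels into (count, colour) pairs."""
    return [(len(list(run)), colour) for colour, run in groupby(pixels)]

def unrle_row(pairs):
    """Rebuild the row of pixels from (count, colour) pairs."""
    return [colour for count, colour in pairs for _ in range(count)]

def show(pairs):
    """Write the pairs the way the slides do, e.g. 1W 2B."""
    return " ".join(f"{count}{colour}" for count, colour in pairs)

row1 = list("WWWWWWWW")  # top line: 8 white pixels
row2 = list("WBBWBBWW")  # line 2, rebuilt from 1W 2B 1W 2B 2W

print(show(rle_row(row1)))               # 8W
print(show(rle_row(row2)))               # 1W 2B 1W 2B 2W
print(unrle_row(rle_row(row2)) == row2)  # True
```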
Task

Draw an 8x8 grid


Shade in an image
Give it to someone
They have to do the RLE on it
EXTRA FREE INFO

These next two topics are NOT in your syllabus, but are pretty cool and not so
difficult.

We have:

Lossless compression: Huffman coding


And
How videos are compressed
Huffman Coding

It’s a way to compress data; here we use it on text.

Lossless
ASCII

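Standard ASCII uses 7 bits per character, but characters are normally stored in 8 bits (1 byte, as in extended
ASCII), so we say every ASCII character is 8 bits.

Batman’s real name is Bruce Wayne

This sentence in binary is shown below. The curly apostrophe ’ is not an ASCII character, so here it takes three bytes (11100010 10000000 10011001), which is its UTF-8 encoding: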


01000010 01100001 01110100 01101101 01100001 01101110 11100010 10000000
10011001 01110011 00100000 01110010 01100101 01100001 01101100 00100000
01101110 01100001 01101101 01100101 00100000 01101001 01110011 00100000
01000010 01110010 01110101 01100011 01100101 00100000 01010111 01100001
01111001 01101110 01100101
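You can reproduce this binary with a short Python sketch. It encodes the text as UTF-8. For every plain ASCII character, UTF-8 gives the same single byte as ASCII; the curly apostrophe becomes three bytes. The `to_binary` helper name is made up.

```python
sentence = "Batman’s real name is Bruce Wayne"

def to_binary(text):
    """Encode text as UTF-8 and show each byte as 8 binary digits."""
    return " ".join(f"{byte:08b}" for byte in text.encode("utf-8"))

print(to_binary(sentence))
print(len(sentence), "characters,", len(sentence.encode("utf-8")), "bytes")  # 33 characters, 35 bytes
```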
How many times

Well, what if we just count the number of times each letter appears? (We count spaces and punctuation
too.)
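Counting is easy to automate. This sketch uses Python’s `collections.Counter`, which counts every character, including spaces and punctuation.

```python
from collections import Counter

sentence = "Batman’s real name is Bruce Wayne"
freq = Counter(sentence)  # counts every character, spaces and punctuation included

# Most popular first
for char, count in freq.most_common():
    print(repr(char), count)

print(sum(freq.values()), "characters in total")  # 33 characters in total
```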
Batman’s real name is Bruce Wayne

Letter     Frequency
a          5
[SPACE]    5
e          4
n          3
B          2
m          2
s          2
r          2
t          1
’          1
l          1
i          1
u          1
c          1
W          1
y          1

So instead of using 8 bits for each letter, why don’t we say the most popular letters use only 1 bit,
then the next most popular use two bits, and so on:

a = 0
[SPACE] = 1
e = 00

But there is a problem:
does 00 mean e, or does it mean ‘a’ twice?
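The problem is that the code for a (0) is the start of the code for e (00). A code where no code word is the start of another is called prefix-free, and a Huffman tree always gives a prefix-free code. Here is a small checker; the function name is made up.

```python
def is_prefix_free(codes):
    """True if no code is the beginning of a different symbol's code."""
    words = list(codes.values())
    for i, a in enumerate(words):
        for j, b in enumerate(words):
            if i != j and b.startswith(a):
                return False  # reading `a` could also be the start of `b`
    return True

naive = {"a": "0", " ": "1", "e": "00"}
fixed = {"a": "0", " ": "10", "e": "11"}
print(is_prefix_free(naive))  # False: "0" is the start of "00"
print(is_prefix_free(fixed))  # True
```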
Huffman
One smart way to solve this is with Huffman encoding.

You still need the frequency table.

We will use these frequencies to build a binary tree,
a Huffman tree.
Build a tree
Take the two least used characters: W and y.
(Eight characters are used only once, so any two of them would do; here we take the last two in the table.)

We write down how many times each is used: 1 time each.

    1   1
    W   y
Build a tree
Then we link these two under a new node by adding their values (1 + 1 = 2):

      2
     / \
    1   1
    W   y

Now take this and put it at the bottom of your tree.
Build a tree
Now take this new node (value 2) and put it back in your frequency table.
Build a tree

Letter     Frequency
a          5
[SPACE]    5
e          4
n          3
B          2
m          2
s          2
r          2
W+y        2
t          1
’          1
l          1
i          1
u          1
c          1

If you reach a letter (like ‘r’) whose frequency matches the sum of two frequencies (W and y), put them all on one level.
My Tree

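This is what it looks like so far:

      2
     / \
    1   1
    W   y

Then you just keep repeating it: take the two smallest values left in the table, link them under a new node, and put that node back in the table, until only one node (START, the top of the tree) is left.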
[Figure: the finished Huffman tree for “Batman’s real name is Bruce Wayne”, with START at the top and the least used characters (W, y, u, c, l, i, t, ’) at the bottom]
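The whole build-and-repeat process can be sketched in Python with a priority queue (`heapq`), which always hands back the two least used nodes. This is a generic Huffman implementation, not a transcription of the tree on the slides. When several nodes share the same value, the choice of which two to link is arbitrary, so the exact 0s and 1s (and the node values you pass through) can differ from the slides. The helper names are made up.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_codes(text):
    """Build a Huffman tree for `text` and return {character: code}."""
    tiebreak = count()  # unique numbers, so heapq never has to compare two trees
    # Heap entries are (frequency, tiebreak, node). A node is either a single
    # character (a leaf) or a (left, right) pair (a joined node).
    heap = [(freq, next(tiebreak), char) for char, freq in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Take the two least used nodes and link them, adding their values
        freq1, _, left = heapq.heappop(heap)
        freq2, _, right = heapq.heappop(heap)
        # Put the new node back in the "frequency table"
        heapq.heappush(heap, (freq1 + freq2, next(tiebreak), (left, right)))
    root = heap[0][2]  # the last node left is START

    codes = {}
    def walk(node, path):
        if isinstance(node, tuple):
            walk(node[0], path + "0")  # going left adds a 0
            walk(node[1], path + "1")  # going right adds a 1
        else:
            codes[node] = path or "0"  # a text with one distinct character still needs 1 bit
    walk(root, "")
    return codes

sentence = "Batman’s real name is Bruce Wayne"
codes = huffman_codes(sentence)
total_bits = sum(len(codes[char]) for char in sentence)
print(codes)
print(total_bits, "bits instead of", 8 * len(sentence))  # 123 bits instead of 264
```

For this sentence the sketch gives e a 3-bit code, B a 4-bit code and each of the once-used characters a 5-bit code. The whole sentence needs 123 bits instead of 33 × 8 = 264.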
So what

Now, with the tree, every time you go left, write a 0.

Every time you go right, write a 1.
What’s the code?

So, if you want ‘B’:
It was 01000010 in ASCII.

You know its frequency is 2.

You know left is 0
and right is 1.

So from the top (START):
Go right (22) - 1
Go right (12) - 1
Go right (8) - 1
Go right (4) - 1
Go left (2) - 0

So B is now 11110
What’s the code?

B is now 11110

What if you want a more popular letter, ‘e’?

From START:
Right (22) - 1
Right (12) - 1
Right (8) - 1
Left (e) - 0

So e is now 1110

Because e is more popular than B, it has a shorter code.
+&-

+ More popular letters use fewer bits

- It takes time to build a Huffman tree


- You have to store (or send) the tree along with the compressed data, so it can be decoded

± If you have a really deep/tall tree, some letters may use more than 8 bits.
But this is okay, because those letters are at the bottom of the tree, and letters
down there are not used often
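Here is why the tree (or a table made from it) must be stored: the receiver needs the same codes to turn the bits back into text. This sketch uses a small hypothetical code table, not the one from the slides. It decodes by reading bits until they match a code, which only works because the codes are prefix-free.

```python
def encode(text, codes):
    """Replace every character with its code."""
    return "".join(codes[char] for char in text)

def decode(bits, codes):
    """Read bits one at a time until they match a code, then start again."""
    lookup = {code: char for char, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError("leftover bits do not form a complete code")
    return "".join(out)

# Hypothetical prefix-free code table for a tiny alphabet
codes = {"a": "0", "n": "10", "b": "110", " ": "111"}
bits = encode("banana", codes)
print(bits)                 # 1100100100
print(decode(bits, codes))  # banana
```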
Today

Understand : What is a video?

Able : Explain encoding

Answer : What is frame rate?


What is a video?

What is the correct definition?

1. Video is a series of moving images played back at speed to imitate movement

2. Video is a series of moving images

3. Video is a series of still images

4. Video is a series of still images that are played back at speed to imitate
movement
Frames

If video is a set of still images played back at speed, how quick must it be?

Frame – This means one image.

If you show 1 image on screen every second, you say your video is 1 frame per
second (fps).

Many videos are 24 to 30 fps; smoother videos and video games often use 60 fps.

The higher the frame rate, the smoother the video playback.


Encoding

There are two main ways that a video gets displayed.

Interlaced encoding

Progressive encoding
Interlaced

It shows each frame in two halves (called fields)

Splits the image into rows

Displays the odd-numbered rows in the first field

Then displays the even-numbered rows in the second field

This happens so quickly that you see it as a single frame
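Splitting a frame into odd and even rows is just list slicing. In this sketch a frame is a hypothetical list of rows, numbered from 1 as on the slides, so the odd rows sit at list indexes 0, 2, 4 and so on. The function names are made up.

```python
def split_fields(frame):
    """Split a frame (a list of rows) into its odd rows and even rows.
    Rows are numbered from 1, so odd rows are at indexes 0, 2, 4, ..."""
    odd_rows = frame[0::2]
    even_rows = frame[1::2]
    return odd_rows, even_rows

def weave(odd_rows, even_rows):
    """Put the two halves back together into one full frame."""
    frame = []
    for i, row in enumerate(odd_rows):
        frame.append(row)
        if i < len(even_rows):
            frame.append(even_rows[i])
    return frame

frame = ["row 1", "row 2", "row 3", "row 4", "row 5"]
odd, even = split_fields(frame)
print(odd)                        # ['row 1', 'row 3', 'row 5']
print(even)                       # ['row 2', 'row 4']
print(weave(odd, even) == frame)  # True
```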


[Figures: interlacing step by step. Take a frame, split it into rows, the first frame displays the odd sections, the second frame displays the even sections]
Why?

Cheaper on bandwidth: it sends half a frame at a time rather than the whole frame

Older technology (it came before progressive)

Broadcast TV has traditionally used interlaced because it’s cheaper (both the hardware and the
bandwidth)
Progressive

Displays whole frame at once


Why?

Better quality: less ‘noise’ and fewer artifacts than interlaced

Hardware has gotten better


Find out what these are:

Image artifacts

Moiré effect
Today

Understand : What is interframe compression

Able : Understand the difference between spatial and temporal redundancy

Answer : Why do we use compression?


Interframe Compression

Video files are very large

So you need to compress the video to make it smaller
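To see just how big, take the image-size formula and multiply it by the frames per second and the length. The numbers (1920×1080, 24-bit colour, 30 fps) are example values only, and sound is ignored.

```python
def raw_video_bits(width, height, colour_depth, fps, seconds):
    """Uncompressed video size = (one frame's size) x frames per second x length."""
    frame_bits = width * height * colour_depth  # same formula as an image
    return frame_bits * fps * seconds

per_second = raw_video_bits(1920, 1080, 24, 30, 1) // 8
per_minute = raw_video_bits(1920, 1080, 24, 30, 60) // 8
print(per_second, "bytes per second")         # 186624000 bytes per second
print(per_minute / 1000**3, "GB per minute")  # 11.19744 GB per minute
```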

But how??
Interframe compression - i-frame

It starts with an I-frame

An I-frame (intra-coded frame) is a full frame: it must be the first frame in your video, and you can
have many I-frames in your video
Interframe – P Frame

The next frame of your video is broken up into small squares of pixels called MACROBLOCKS (16x16 pixels in common formats such as MPEG-2 and H.264, each made of smaller 8x8 blocks)

If the macroblocks in frame 2 are the same as, or similar to, those in frame 1, then frame 2 can be stored
as a P-frame (predicted frame)

A P-frame looks at an I-frame and says: “you look almost the same, so instead of
processing the whole thing, I’ll just process the changes”

This includes changes in colour, size and movement
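The P-frame idea can be sketched in Python, as a big simplification with made-up names. It compares a new frame with the previous one block by block and stores only the blocks that changed. Real encoders also accept “similar” blocks and record movement with motion vectors, which this sketch does not do.

```python
BLOCK = 2  # tiny block size so the example fits on screen; real codecs use bigger blocks

def blocks(frame, size=BLOCK):
    """Cut a frame (a list of equal-length rows of pixel values) into
    size x size blocks, keyed by the (row, column) of each block's top-left pixel."""
    cut = {}
    for r in range(0, len(frame), size):
        for c in range(0, len(frame[0]), size):
            cut[(r, c)] = [row[c:c + size] for row in frame[r:r + size]]
    return cut

def changed_blocks(previous, current, size=BLOCK):
    """Keep only the blocks of `current` that differ from `previous`."""
    before, after = blocks(previous, size), blocks(current, size)
    return {pos: blk for pos, blk in after.items() if blk != before[pos]}

def rebuild(previous, changes):
    """Decoder side: copy the previous frame, then paste in the changed blocks."""
    frame = [row[:] for row in previous]
    for (r, c), blk in changes.items():
        for dr, row in enumerate(blk):
            frame[r + dr][c:c + len(row)] = row
    return frame

i_frame = [[0, 0, 5, 5],
           [0, 0, 5, 5],
           [1, 1, 1, 1],
           [1, 1, 1, 1]]
next_frame = [[0, 0, 5, 5],
              [0, 0, 5, 9],  # one pixel has changed
              [1, 1, 1, 1],
              [1, 1, 1, 1]]

changes = changed_blocks(i_frame, next_frame)
print(changes)                                  # {(0, 2): [[5, 5], [5, 9]]}
print(rebuild(i_frame, changes) == next_frame)  # True
```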


Interframe – B Frame

An I-frame is fixed: it does not depend on any other frame

A P-frame looks back at an earlier I-frame (or P-frame)

A B-frame (bidirectional) looks both ways, at the I-frames and P-frames before and after it



A typical pattern of frames is IBBPBBPBBI


Spatial Redundancy

Just like when you compress an image (like a JPEG)

It happens within a single frame (usually I-frames)

It looks at your frame and removes data whose loss you cannot see, or cannot notice
much
Temporal Redundancy

If parts of your video don’t change from one frame to the next, then an instruction is given to say “hey, don’t
change this part”

So in this example the house doesn’t change.

It is used in P-frames
They usually use much less data than an I-frame (roughly half or less)
