
Understanding Data

Part 1
Representing Data Digitally
Data
Data values can be stored in variables, lists of items, or standalone constants.

Computing devices represent data (text, numbers, audio, images) digitally, meaning that the lowest-level components of any value are bits.

Bit is shorthand for binary digit and is either a 0 or a 1. This system of representing data is called binary (base 2), which uses only combinations of the digits zero and one.

name = "Mike Smith" # represented internally as 0's and 1's


grades = [87, 91, 75] # represented internally as 0's and 1's

Thus, digital data is simply a sequence of 0s and 1s representing some information.
Base 10 (Decimal numbers)
Computers store all information in the form of binary numbers. To understand binary numbers, let's first look at a more familiar system: decimal numbers.

The decimal system is base 10. It uses 10 digits {0,1,2,3,…,9}. Why do we prefer base 10? There are many reasons, but one simple one: we have ten fingers.

Base 10 (Decimal) uses 10 digits: {0,1,2,3,…,9}.
Base 2 (Binary) uses 2 digits: {0,1}.
Base 8 (Octal) uses 8 digits: {0,1,2,3,4,5,6,7}.
Base 16 (Hexadecimal) uses 16 digits: {0,1,2,3,4,…,9,A,B,C,D,E,F}.
Base 10 (Decimal numbers)
Each digit has a weight corresponding to a power of 10. Multiply each digit with
its weight and then sum.

What does 157 mean?

157 = 1 x 100 + 5 x 10 + 7 x 1
    = 1 x 10^2 + 5 x 10^1 + 7 x 10^0
Base 2 (Binary)

1011 = 1 x 2^3 + 0 x 2^2 + 1 x 2^1 + 1 x 2^0
     = 1 x 8 + 0 x 4 + 1 x 2 + 1 x 1 = 11 in base 10
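
A quick way to check a conversion like this in Python (a minimal sketch using the built-in int function with an explicit base):

value = int("1011", 2)   # parse the string "1011" as a base-2 number
print(value)             # 11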
Bits and Bytes

1 bit is a single piece of information, a 1 or a 0 (only two possible values).

1 byte is 8 bits, an 8-bit word:
• 256 possible values, from 0 to 255 in base 10
• or 00000000 to 11111111 in base 2

For example, 10100110 is a single byte.
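
In Python (a small sketch, again using int with base 2), we can confirm both the number of byte values and the value of this particular byte:

print(2 ** 8)                # 256 possible values in one byte
print(int("10100110", 2))    # 166, the decimal value of this byte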


Decimal to Binary
To convert a decimal number to binary, repeatedly take the number modulo 2 (recording the remainder) and then divide it by 2, stopping when the number reaches 0. The remainders, read from last to first, are the binary digits from left to right.
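
A minimal Python sketch of this algorithm (the function name to_binary is our own):

def to_binary(n):
    # collect remainders of repeated division by 2
    digits = ""
    while n > 0:
        digits = str(n % 2) + digits   # prepend, so later remainders land on the left
        n = n // 2
    return digits or "0"

print(to_binary(11))   # 1011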
People

There are 10 types of people in the world; those who understand binary and those who don’t.

There are 10 types of people in the world; those who understand binary and those who have friends.
Hexadecimal
Binary representations are long; hexadecimal is much shorter. Hexadecimal uses 16 digits.

Problem: we run out of numerals after 9, so letters stand in for the values ten through fifteen:

A = 10, B = 11, C = 12, D = 13, E = 14, F = 15
Hexadecimal digits: {0,1,2,3,4,…,9,A,B,C,D,E,F}

Converting a binary number to a hex number is relatively easy: every group of 4 bits converts to one hex digit.
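
For example (a quick Python check; the grouping in the comment is ours):

bits = "10100110"
print(hex(int(bits, 2)))   # 0xa6
# the two 4-bit groups: 1010 -> 10 -> A, and 0110 -> 6 -> 6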
Overflow Error

In many programming languages, integers are represented by a fixed number of bits, which limits the range of integer values and of mathematical operations on those values.

For example, in Java, the value of an integer ranges from −2,147,483,648 to +2,147,483,647. Trying to store a number bigger than these limits will result in an overflow error.

In some languages, like Python, integers do not have a fixed size limit; instead, they expand to the limit of the available memory.

Overflow Error
What Does Overflow Error Mean?

In computing, an overflow error happens when a program receives a number, value, or variable outside the scope of what it can handle. This type of error is fairly common in programming, especially when dealing with integers or other numeric types.
Overflow Error
The formula for the largest unsigned integer that can be stored using n bits is 2^n - 1.

Example 1:
With 4 bits, the largest integer that can be stored is 2^4 - 1 = 15.

Example 2:
A 4-bit integer can hold any value in {0,1,2,…,15}. Thus, storing the value of 10 + 6 = 16 would cause an overflow error, since 16 = 10000 requires at least 5 bits.

Example 3:
If x is a 3-bit integer, then x = 111 + 111 will cause an overflow error, since the sum is 1110, which requires at least 4 bits.
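
A small Python sketch of this check (the helper fits is hypothetical, not a built-in):

def fits(value, n):
    # True if value can be stored as an unsigned n-bit integer
    return 0 <= value <= 2 ** n - 1

print(fits(15, 4))      # True: 15 = 1111 is the largest 4-bit value
print(fits(10 + 6, 4))  # False: 16 = 10000 needs 5 bits, so it overflows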
Round-Off Errors

A fixed number of bits is used to store real numbers. Because of this limitation, round-off errors can occur.

Python computes 1/3 as 0.3333333333333333. This value is only an approximation of 1/3, which is an infinitely repeating decimal.

Round-off error occurs when decimals (real numbers) are rounded.
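
You can see round-off error directly at the Python prompt:

print(1 / 3)              # 0.3333333333333333, only an approximation
print(0.1 + 0.2)          # 0.30000000000000004, a classic round-off artifact
print(0.1 + 0.2 == 0.3)   # False, because of the rounding above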

Abstraction
Abstraction is the process of reducing complexity by focusing on the main idea.

By hiding details irrelevant to the question at hand and bringing together related
and useful details, abstraction reduces complexity and allows one to focus on the
idea.

One type of abstraction we saw was procedural abstraction, which provides a name for a procedure (function) and allows it to be used knowing only what it does, not how it does it.

import random
print(random.randrange(10))   # random number from 0 - 9

We don't need to know how the randrange() function is implemented to be able to use it.
Abstraction
Bits are grouped to represent abstractions. These abstractions include, but are
not limited to, numbers, characters, and color.

Sequences of 0s and 1s represent a string of characters, a numeric grade, and a color in the examples below. The color example is done using Processing.

name = "Mike Smith"        # a string of characters
grade = 87                 # a numeric grade
c = color(255, 123, 110)   # a color (Processing)
fill(c)
ellipse(250, 80, 100, 100)

Encoding
A computer cannot store "letters" or "pictures". It can only work with bits (0 or 1).

To represent anything other than bits, we need rules that allow us to convert a sequence of bits into letters or pictures. This set of rules is called an encoding scheme, or encoding for short. ASCII is one example.
ASCII
ASCII (American Standard Code for Information Interchange) is an encoding scheme that specifies the mapping between bits and characters. There are 128 characters in the scheme: 95 printable characters and 33 nonprintable (control) characters such as tab and backspace.

The printable characters include a-z, A-Z, 0-9, and punctuation. In ASCII, 65 represents A, 66 represents B, and so on. The numbers are called code points. But ASCII uses only 7 bits, so it does not have enough room to encode the characters of other languages (Chinese, French, Japanese).
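
In Python, the built-ins ord and chr expose these code points directly:

print(ord("A"))   # 65, the ASCII code point for A
print(chr(66))    # B, the character at code point 66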

Similar to ASCII, Unicode provides a table of code points for characters: 65 stands for A, 66 stands for B, and 9,731 stands for ☃. UTF-32 (Unicode Transformation Format), UTF-16, and UTF-8 are three encodings that use the Unicode table of code points.

Of the three, UTF-8 is by far the most widely used encoding. It is the standard encoding for email and webpages (HTML5).
ASCII Sample

To encode something in ASCII, follow the table from characters to bits, substituting each letter with its bit pattern.

To decode a string of bits into human-readable characters, follow the table from bits to characters, substituting each bit pattern with its letter.
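
A short Python sketch of both directions (format with "08b" writes a code point as 8 bits):

bits = " ".join(format(ord(ch), "08b") for ch in "Hi")   # encode
print(bits)                                              # 01001000 01101001
chars = "".join(chr(int(b, 2)) for b in bits.split())    # decode
print(chars)                                             # Hi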
Unicode
So, how many bits does Unicode use to encode all these characters? None.
Because Unicode is not an encoding.

Unicode first and foremost defines a table of code points for characters. That's a
fancy way of saying "65 stands for A, 66 stands for B and 9,731 stands for ☃”.

UTF-32
To represent 1,114,112 different values, two bytes aren't enough. Three bytes are,
but three bytes are often awkward to work with, so four bytes would be the
comfortable minimum.

UTF-32 (Unicode Transformation Format) is such an encoding: it encodes all Unicode code points using 32 bits, that is, four bytes per character.

UTF-32 is very simple but often wastes a lot of space. For example, if A is always encoded as 00000000 00000000 00000000 01000001, B as 00000000 00000000 00000000 01000010, and so on, documents bloat to four times their necessary size.
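
Python's encode method makes the size difference easy to see (utf-32-be is the big-endian variant, which avoids a byte-order mark):

print("A".encode("utf-32-be"))   # b'\x00\x00\x00A' -- four bytes per character
print("A".encode("utf-8"))       # b'A' -- one byte for an ASCII character
print("☃".encode("utf-8"))       # b'\xe2\x98\x83' -- three bytes for the snowman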
Digital vs. Analog
Computers can only understand digital data: discrete values like a list of integers,
sequences of 0's and 1's.

Analog data have values that change smoothly, rather than in discrete intervals,
over time. Some examples of analog data include temperatures over a period of
time, pitch and volume of music, colors of a painting, or position of a sprinter during
a race.
• analog signals are continuous and can take on infinitely many possible values (the real numbers)
• digital signals are finite. For example, 8-bit colors can take on one of 256 discrete, finite possibilities, but actual colors can take on any of infinitely many possible values or shades.
Digital vs. Analog
Analog data can be closely approximated digitally using a sampling technique, which means measuring values of the analog signal at regular intervals; the measurements are called samples.

Each sample is stored using a fixed number of bits. The number of samples measured per second is the sampling rate; the higher the rate, the better the quality.

CD-quality audio has a rate of 44,100 samples per second (44,100 Hz, or 44.1 kHz).

(Figure: an analog signal compared with a low sampling rate and a high sampling rate.)
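
A back-of-the-envelope calculation in Python shows why higher rates cost storage (assuming 16-bit stereo samples, the CD standard):

sample_rate = 44_100     # samples per second (44.1 kHz)
bytes_per_sample = 2     # 16-bit samples
channels = 2             # stereo
bytes_per_minute = sample_rate * bytes_per_sample * channels * 60
print(bytes_per_minute / 1_000_000)   # about 10.6 MB per minute of CD audio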


Digital Audio Basics
• Digital audio is music, speech, and other sounds represented in binary
format for use in digital devices
• Most digital devices have a built-in microphone and audio software, so
recording external sounds is easy
• To digitally record sound, samples of a sound wave are collected at periodic intervals and stored as numeric data in an audio file
• Sound waves are sampled many times per second by an analog-to-digital
converter
• A digital-to-analog converter transforms the digital bits into analog sound
waves

Digital Audio Basics
• Sampling rate refers to the number of times per second that a sound is measured during the recording process
• Higher sampling rates increase the quality of a sound recording but require more storage space


Digital Audio File Formats
• A digital file can be identified by its type or its file extension, such as Thriller.mp3 (an audio file)
• The most popular digital audio formats are AAC, MP3, Ogg Vorbis, WAV, and WMA
Digital Audio File Formats
• To play a digital audio file, you must use some type of audio software, such as:
  • Audio Software: general-purpose software and apps used for recording, playing, and modifying audio files, such as iTunes
  • Audio Players: small standalone software applications or mobile apps that offer tools for listening to digital audio and managing playlists, typically included with your computer's OS (operating system)
  • Audio Plugins: software that works in conjunction with your computer's browser to manage and play audio from a Web page




Digital Audio File Formats
• Ripping is a slang term that refers to the process of importing tracks from a CD or DVD to your computer's hard disk
• The technical term for ripping music tracks is digital audio extraction
