Week_02_bits_and_bytes

The lecture covers the fundamentals of how text and numbers are represented in computer memory, including the concepts of bits, bytes, and different types of memory. It explains the differences between volatile and non-volatile memory, as well as the representation of numbers in various numeral systems such as binary, decimal, and hexadecimal. Additionally, it discusses ASCII encoding for text representation and the importance of understanding file types and their interpretations in computing.

Uploaded by

Yash Soni
Copyright
© All Rights Reserved

COMP 1238
Introduction to Data Management

Lecture 2:
Representing text and numbers in computers

Monday, Sept 9
Starting at 1:05
◎ AtKlass: HPYX
◎ AtKlass reg: E2CJ

Starting Zoom recording


3
Agenda
◎ How computer memory works
◎ Short-term RAM memory vs persistent storage
◎ Representation of numbers and characters in computer memory
◎ Text and binary files - ASCII encoding

Objective: Build the foundation for understanding how information is stored in
computers. Explain that all data structures and formats, no matter how
complex, are represented by sequences of bytes

4
Computer Memory
and Storage
The “bit” – when we are limited to only two
states
Self portrait of Samuel Morse
1812

Morse developed the telegraph system together
with the physicist Joseph Henry
and the mechanical engineer
Alfred Vail around 1837

7
The “bit”
◎ The tiniest piece of information in
computing and communication
◎ Can hold one of two possible
values
○ 0/1, true/false, on/off
◎ Allegedly, “bit” is short for “Binary
Digit”
◎ What are the two “values” in
Morse code on the pic?
8
CD under a microscope

9
Bytes – groups of bits
◎ 1 bit is of limited use – so we group bits into “bytes”
◎ Groups of 6, 8 and 9 bits were common in early days
◎ Now everyone uses bytes consisting of 8 bits

10
Punched Card
12
Volatile and Non-Volatile
memory
◎ Volatile = gets erased when electricity is gone
◎ Non-volatile memory is used for long term storage
○ Hard drive, SD card, Flash drive
○ CD / DVD discs, Tape, punch cards, clay tablets …
◎ Volatile memory is used as short-term working
memory, it’s usually much faster but much more
expensive

13
RAM vs Storage
◎ Long term memory - called storage or disk
○ Usually slower and optimized to read or write large
chunks of data at a time
◎ Volatile working memory - usually called RAM
○ RAM = Random Access Memory – it’s fast for
reading from any location
◎ How much memory does your phone have?

14
RAM Storage /
Disk

15
16
How many bytes is a Gigabyte?

17
Refresher on exponentiation
◎ Powers of 10: ◎ Powers of 2:

18
Units of measurement - GB vs GiB
◎ Kilo, Mega, Giga prefixes like usual
◎ 1 Megabyte = 8 Megabits
◎ But kilo = 1000 or 1024 = 2^10 ?
◎ Both are used, difference may be important in some cases
◎ Convention that nobody really remembers:
○ MB, GB … for powers of 10
○ MiB, GiB … for powers of 2 like 1024
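
The gap between the two conventions can be checked with a few lines of Python. This is only an illustrative sketch; the constant names and the 500 GB disk size are made up for the example.

```python
# Decimal prefixes are powers of 10; binary prefixes are powers of 2.
KB, MB, GB = 10**3, 10**6, 10**9
KiB, MiB, GiB = 2**10, 2**20, 2**30

print(f"1 GB  = {GB:,} bytes")    # 1,000,000,000
print(f"1 GiB = {GiB:,} bytes")   # 1,073,741,824

# A hypothetical disk sold as "500 GB", measured in GiB.
disk_bytes = 500 * GB
print(f"500 GB = {disk_bytes / GiB:.1f} GiB")   # 465.7
```

The difference is small for kilo (2.4%) but grows with each prefix, which is why it may matter for large sizes.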
19
Representing text &
numbers with bits
We can choose arbitrarily which
sequence of bits represents
which character. That’s what
Morse code does, for example

21
22
But the engineers decided to
interpret bits as numbers in
“binary” notation
◎ They had good reasons – easier to build hardware that
does math on binary numbers than Morse digits

23
Dive into binary + disclaimer
◎ This section about binary and hex might be a bit heavy
◎ Don’t worry if you don’t get it, many people don’t
◎ You’ll hear about it on other courses in more detail

24
Why binary and hex are so
confusing

10=2
numbers -
explained by
ancient
Babylonians
Number vs
Representation of a number
◎ When talking about different representations,
the language we use is confusing, like
○ “Ten in binary is two”
○ “Ten in hex is sixteen”
◎ The number of fingers on this hand does not
change if I decide to represent it in one system
or another - , V, “five”, 0b101
◎ But to communicate a number, we must
represent it somehow – the word “five” is a
representation
27
Ambiguity in words and
symbols is the main source of
confusion here
◎ For some representations we have only one meaning,
no ambiguity:
○ The word “five”, the digit 5
○ All the digits and spoken numbers up to 9 have
only one meaning
◎ But when we say “ten” do we mean:
○ The number of fingers on two hands
or
○ The digits 1 and 0 written together as “10”
28
Positional numeral systems
◎ 345 = 3*100 + 4*10 + 5
◎ “Positional” because the meaning
of a digit depends on position
◎ The base of the system
is how many digits it uses,
also called “radix”
◎ We use 10 digits, 0 to 9,
so it’s a base-10 system
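
The idea that a number is a sum of digits times powers of the base can be sketched in Python. The helper name `to_digits` is an assumption made for this example, not something from the lecture.

```python
def to_digits(number: int, base: int) -> list[int]:
    """Split a non-negative integer into its digits in the given base,
    most significant digit first."""
    if number == 0:
        return [0]
    digits = []
    while number > 0:
        number, remainder = divmod(number, base)
        digits.append(remainder)   # the least significant digit comes out first
    return digits[::-1]

print(to_digits(345, 10))  # [3, 4, 5] -> 3*100 + 4*10 + 5
print(to_digits(345, 16))  # [1, 5, 9] -> 1*256 + 5*16 + 9
print(to_digits(345, 2))   # the same number as binary digits
```

The number itself never changes; only the list of digits depends on the base.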

30
Common options for the radix
Base = radix: how many digits the system uses.
“Positional” because the meaning of a digit depends on position – e.g. hundreds vs thousands.

Base   Fancy Name    Digits
2      Binary        0, 1
8      Octal         0 to 7
10     Decimal       0 to 9
16     Hexadecimal   0 to 9, A, B, C, D, E, F
60     Sexagesimal   to

How to specify the base in writing
Mathematical notation
Computer notation prefixes:
0b101 – binary one zero one
0o17 – octal one seven
0d19 – decimal one nine
0x1F – hex one ef
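
Many programming languages accept these prefixes directly. The sketch below uses Python, which understands `0b`, `0o` and `0x` but has no `0d` prefix, since plain digits are already read as decimal. The variable names are chosen only for this example.

```python
five = 0b101        # binary one zero one
fifteen = 0o17      # octal one seven
nineteen = 19       # decimal needs no prefix in Python
thirty_one = 0x1F   # hex one F

print(five, fifteen, nineteen, thirty_one)   # 5 15 19 31

# One number, shown in four representations.
n = 31
print(bin(n), oct(n), n, hex(n))   # 0b11111 0o37 31 0x1f
```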

32
Rules to follow to avoid
confusion
◎ Whenever you use words like “ten” or “twelve”
○ always use them to represent the number they
mean in everyday language
○ not the string of digits “10” or “12”
◎ When you want to say a binary number 10 – spell out
the digits “binary one zero” as if you are talking over a
radio

33
Converting a binary number
Let’s convert 0b1010 to decimal

1    0    1    0     digits
3    2    1    0     index
2^3  2^2  2^1  2^0   2^index
8    4    2    1     2^index = ?
1    0    1    0     digits again
8    0    2    0     digit * 2^index

8 + 0 + 2 + 0 = 10

34
Converting a hex number
Let’s convert 0x2F to decimal
Digit “F” stands for 15

0     0     2     F      digits
3     2     1     0      index
16^3  16^2  16^1  16^0   16^index
4096  256   16    1      16^index = ?
0     0     2*16  15*1   digit * 16^index

32 + 15 = 47
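
The table method works the same way for any base, so it can be written as one small function. This is a sketch for bases 2 to 16; the name `from_digits` is an assumption made for the example.

```python
def from_digits(digits: str, base: int) -> int:
    """Convert a string of digits in the given base (2 to 16) to an integer."""
    symbols = "0123456789ABCDEF"
    total = 0
    # Walk from the rightmost digit (index 0) to the leftmost.
    for index, char in enumerate(reversed(digits.upper())):
        value = symbols.index(char)          # e.g. "F" -> 15
        if value >= base:
            raise ValueError(f"digit {char!r} is not valid in base {base}")
        total += value * base**index         # digit * base^index
    return total

print(from_digits("1010", 2))   # 8 + 0 + 2 + 0 = 10
print(from_digits("2F", 16))    # 2*16 + 15*1 = 47

# Python's built-in int() performs the same conversion.
print(int("1010", 2), int("2F", 16))   # 10 47
```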

35
Exercise time

◎ Hex -> Babylonian & decimal


◎ 0b1
◎ 0b10
◎ 0b11
◎ 0b100
◎ 0x99

36
Two-byte integer
The “Least Significant Bit” on the right represents 2^0 = 1
The “Most Significant Bit” on the left represents 2^15 = 32,768
A two-byte integer can store numbers from 0 to 65,535 (2^16 - 1)
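
The range of a two-byte integer can be explored with Python's `int.to_bytes` and `int.from_bytes`. This is a minimal sketch; the value 258 is picked only because its two bytes are easy to read.

```python
BITS = 16   # two bytes

largest = 2**BITS - 1
print(largest)             # 65535
print(f"{largest:016b}")   # sixteen ones: 1111111111111111

# The two bytes that store 258, most significant byte first.
print((258).to_bytes(2, "big"))   # b'\x01\x02' because 258 = 1*256 + 2

# A value that needs a 17th bit does not fit in two bytes.
try:
    (largest + 1).to_bytes(2, "big")
except OverflowError as error:
    print("does not fit:", error)
```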

37
Memory as a series of bytes

The computer sees the memory as one long sequence of bytes. It
can look up the value in each one of those bytes

38
We’ve seen positive integer
numbers in binary.

How can we represent negative and


fractional numbers?

39
Signed numbers (positive and
negative)
◎ We could use the first bit for the sign, but the most
commonly used system (two’s complement) is where the first bit
stands for negative 2^N, with N being the index of that bit
◎ a 4-bit signed number 1010 would be -8 + 2 = -6
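
The rule where the first bit counts as a negative power of two can be sketched as follows. The function name `decode_signed` is hypothetical and the input is a string of 0s and 1s to keep the example readable.

```python
def decode_signed(bits: str) -> int:
    """Interpret a string of 0s and 1s as a signed integer
    where the first bit stands for a negative power of two."""
    top_index = len(bits) - 1
    value = -int(bits[0]) * 2**top_index      # first bit counts as negative
    for index, bit in enumerate(reversed(bits[1:])):
        value += int(bit) * 2**index          # remaining bits count as usual
    return value

print(decode_signed("1010"))   # -8 + 2 = -6
print(decode_signed("0111"))   # 4 + 2 + 1 = 7, the largest 4-bit value
print(decode_signed("1000"))   # -8, the smallest 4-bit value
print(decode_signed("1111"))   # -8 + 7 = -1
```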

40
Fractional numbers
◎ How would you represent a fractional number like
○ 3.1415
○ 0.0000000042
○ 9.99

41
Fixed point notation
◎ For prices if we assume we can only use
whole cents
○ Price can be 3.45 but not 3.455
◎ Write all the digits, but remember that the
dot separates the last two of them
◎ This is equivalent to storing the prices in
cents in this case
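
Storing prices as whole cents might look like the sketch below. The helper names `to_cents` and `format_cents` are assumptions made for this example, and only non-negative prices are handled.

```python
def to_cents(price: str) -> int:
    """Parse a non-negative price like "3.45" into a whole number of cents."""
    dollars, _, fraction = price.partition(".")
    if len(fraction) > 2:
        raise ValueError(f"{price!r} has fractions of a cent")
    fraction = fraction.ljust(2, "0")      # "3.4" means 40 cents
    return int(dollars) * 100 + int(fraction)

def format_cents(total: int) -> str:
    """Put the dot back before the last two digits."""
    return f"{total // 100}.{total % 100:02d}"

total = to_cents("3.45") + to_cents("9.99")
print(total)                 # 1344
print(format_cents(total))   # 13.44
```

All the arithmetic happens on integers, so the position of the dot only matters when reading or displaying a price.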

42
Floating point notation
1. Write out the digits: 47988
2. Say where the point is: 4 positions from the end
3. Store those two numbers and use them to represent
the fractional number
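
The three steps above can be tried out in Python with decimal digits. This is a sketch of the idea only: real hardware floating point keeps the same two pieces but works in base 2. The function name `from_parts` is made up for the example.

```python
from decimal import Decimal

def from_parts(digits: int, point_from_end: int) -> Decimal:
    """Rebuild a number from its digits and the position of the point,
    counted from the end of the digits."""
    return Decimal(digits) / (Decimal(10) ** point_from_end)

print(from_parts(47988, 4))   # 4.7988
print(from_parts(31415, 4))   # 3.1415
print(from_parts(999, 2))     # 9.99

# A very small number needs only two digits plus a large point position.
tiny = from_parts(42, 10)
print(tiny == Decimal("0.0000000042"))   # True
```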

43
How big can the numbers be?
Some early computers could only address 64KB of
memory because the address was stored as a two-byte
integer, so they could only count to 65,535

It’s common to represent time using a signed 4-byte
integer that keeps the number of seconds
since the start of January 1st, 1970. It’s called
“Unix Time”. In 2038 this representation will
overflow in what is known as the year 2038
problem
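
The exact moment of the overflow follows from the largest value a signed 4-byte integer can hold. A minimal sketch, assuming the count starts at midnight UTC:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MAX_SIGNED_32 = 2**31 - 1   # largest value of a signed 4-byte integer

last_moment = EPOCH + timedelta(seconds=MAX_SIGNED_32)
print(MAX_SIGNED_32)   # 2147483647
print(last_moment)     # 2038-01-19 03:14:07+00:00

# A two-byte unsigned address can name 2**16 locations: 0 to 65,535.
print(2**16 - 1)       # 65535
```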
44
2^N
◎ A “16-bit number” is a lot like a “6-figure salary”
◎ It’s how many digits there are - a “bit” is a “binary
digit”

45
Bits to Text

A Morse Key (1900) Siemens t37h (1933)

46
ASCII encoding
◎ ASCII = American Standard Code for Information
Interchange
◎ A table of characters and “control codes” to
standardize communication

◎ 0b1000001 = 65 => ‘A’


◎ 0b1000010 = 66 => ‘B’
◎…
47
48
◎ Control codes don’t print a character but send some
other command like line-feed or ring bell
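
Python's built-in `ord` and `chr` expose the same table, so the mapping is easy to try out. The sample text "Hi!" is arbitrary.

```python
print(ord("A"), bin(ord("A")))   # 65 0b1000001
print(chr(0b1000010))            # B

text = "Hi!"
codes = list(text.encode("ascii"))
print(codes)                          # [72, 105, 33]
print(bytes(codes).decode("ascii"))   # Hi!

# Control codes: 10 is line feed, 7 is bell. repr() shows them as escapes.
print(repr(chr(10)), repr(chr(7)))    # '\n' '\x07'
```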

49
Extended ASCII

50
Text Mode

51
1960s computer
rooms had no
screens
Then screens appeared, but before we
could control each pixel on the screen
we used “Text Mode”
◎ First screens were direct replacement for printers
◎ “Text Mode” divides the screen into a grid, usually 80
columns by 25 lines, putting a character in each cell
◎ Low-level hardware would light up the right pixels to
display each character

53
54
Old airport billboards were
also a kind of text-only
display

55
Why text mode?
Apple II was released in 1977
with 4KB of RAM and only
removable floppy drives for storage
(140KB per disk)

The video controller displayed 40
columns by 24 lines of monochrome,
upper-case-only text

40*24 = 960 characters ~ 1KB

That’s ¼ of all the RAM it had
56
Too many pixels to handle

A 640 by 480 (VGA) screen
has about 300,000 pixels.
Even at 1 bit per pixel
that’s 300,000/8 bytes =
38KB
Telling the graphics
hardware what to do with
each pixel was way too
much data

25 x 80 text grid = 2000 characters
2KB at 1 byte per character
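
The memory figures on this slide and the Apple II slide can be reproduced with plain arithmetic. The variable names are chosen for this sketch only.

```python
vga_pixels = 640 * 480
vga_bytes = vga_pixels // 8    # 1 bit per pixel, 8 pixels per byte
text_bytes = 25 * 80           # 1 byte per character
apple_bytes = 40 * 24          # Apple II text screen

print(vga_pixels)    # 307200 pixels
print(vga_bytes)     # 38400 bytes, about 38KB
print(text_bytes)    # 2000 bytes, about 2KB
print(apple_bytes)   # 960 bytes, about 1KB

print(vga_bytes / text_bytes)   # 19.2 times more data than the text grid
```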
57
58
“htop” utility on Linux – often used on modern
computers

59
“Code pages” for other
languages
◎ People started using the 128-255 range for a variety of
characters from other languages
○ Latin1 – the most common (often default) code page
○ Windows Code Pages CP-125x
◎ It was a mess. Avoid anything other than Latin1 if possible.
Use Unicode UTF-8 for other languages. We will talk about
it later
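
The mess is easy to reproduce: the same byte turns into different characters depending on which code page is assumed. A small sketch using byte value 233 and two codecs that ship with Python, `latin-1` and `cp1251`:

```python
raw = bytes([0xE9])   # one byte with value 233, outside 7-bit ASCII

print(raw.decode("latin-1"))   # é
print(raw.decode("cp1251"))    # й (Windows Cyrillic code page)

# UTF-8 uses more than one byte for characters outside ASCII.
print("é".encode("utf-8"))     # b'\xc3\xa9'
print("й".encode("utf-8"))     # b'\xd0\xb9'
```

The byte never changed; only the table used to interpret it did.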

60
encoding
for Cyrillic
letters

61
Files are sequences of bytes
◎ We decide what the bytes mean and how to interpret
them
◎ Text files are files where each byte is interpreted as a
character (or a special symbol like new-line)
○ * We will talk about Unicode later – some
characters may take more than one byte
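
To see the bytes behind a text file, write one and read it back in binary mode. This sketch creates a throwaway file in a temporary directory; the file name `hello.txt` is hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "hello.txt")   # hypothetical file name

# newline="\n" keeps the line ending as a single byte on every platform.
with open(path, "w", encoding="ascii", newline="\n") as f:
    f.write("Hi\nA\n")

with open(path, "rb") as f:   # "rb" = read the raw bytes
    raw = f.read()

print(list(raw))              # [72, 105, 10, 65, 10]
print(raw.decode("ascii"))    # the same bytes interpreted as text
```

Byte 10 is the new-line symbol; the others are the ASCII codes of the letters.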

62
Files

63
Why files?
◎ Named chunks of space on a storage device
◎ Simplest way to “organize” large amounts of data

64
Each byte can be shown as a
character for any file, not only text
files

Any file can be
displayed as a text
file, even if it looks
ugly

65
Binary files – files that look ugly
if displayed as text

◎ The data is not meant to be displayed as text, for
example audio files contain long sequences of
numbers represented with 2 or 4 bytes each
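
Python's `struct` module can pack numbers into 2-byte values the way such a file might. The sample values below are made up and this is not a real audio format, only a sketch of the idea.

```python
import struct

samples = [0, 1000, -1000, 32767]   # made-up sample values

# "<4h" = four little-endian signed 2-byte integers.
raw = struct.pack("<4h", *samples)
print(len(raw))   # 8 bytes
print(raw)        # b'\x00\x00\xe8\x03\x18\xfc\xff\x7f'

# Forcing the bytes into text gives the "ugly" view.
print(repr(raw.decode("latin-1")))

# Interpreting them as numbers again recovers the data.
print(struct.unpack("<4h", raw))   # (0, 1000, -1000, 32767)
```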

66
Questions?

67
Links

◎ TedEd video: How computer memory works


◎ Video playlist about binary numbers
◎ Binary Explained in 01100100 Seconds (fast video by Fireship)
◎ How To Count in Binary
◎ Floating point numbers
○ Video by @Computerphile
◎ Where did Bytes Come From? - Computerphile
◎ 1964 video by IBM - IBM: Once Upon A Punched Card

68
DRAFTS

69
Decimal Binary Hex Babylonian

70
Start with counting states of N bits

◎ TBD

71
Rules that help with understanding

◎ Babylonian numbers represent actual numbers


◎ Words like “ten” or “twelve” or “twenty-five” must only
◎ Anything with regular digits 1, 2, 3 … is to be
interpreted in some notation that must be clearly
specified, if unspecified it’s decimal
◎ King Hammurabi (the bearded dude from the previous
slide) has fingers on his hand

72
74
◎ In decimal system we use daily (also called base 10)

◎ In binary

75
Representing Numbers
Positional Number Systems
◎ TBD
○ Use some system like Babylonian to make it
distinct from the systems we are playing with
○ <<|| is always
○ Confusing phrases like “12 hex is 18 decimal” are probably
the biggest problem when learning number systems like hex. Don’t
use phrases like this!
○ Twelve is a number, we have twelve phalanges on the 4 fingers of
the hand, there are twelve of them no matter which system you use
to represent the number
○ Always spell out the digits
○ Prefix with 0x 0b 0d 0o (zero o), pronounce as “Octal one two”
76
