01 Intro Bits Bytes

Introduction:

algorithms (recapitulation),
bits, strings
Jesper Larsson
Course, teaching
• Me: Jesper Larsson, teaching at MAU since 2014, research
background in string algorithms and data compression

• You?

• Languages? Spoken and for programming

• Lectures, assignments, computer time

• Not based on a single book, but the material exists in books

• Small and familiar course


Lectures (preliminary)
1. Intro, bits and strings
2. Bucket and radix sorting
3. Trie (digital tree)
4. Suffix tree
5. Inverted file (search engine data structure) + regular expressions
6. Suffix array
7. Suffix data structure algorithms, supplemental
8. Information theory, codes, entropy coding
9. More on codes and their applications, Ziv-Lempel compression
10. The Burrows–Wheeler transform (BWT)
11. Substring search (KMP, BM, Karp–Rabin)
12. Catching up and summary
I expect that you know

• Sorting and searching as taught in basic algorithms
courses: binary search trees, hash tables, {selection,
insertion, quick-, merge-}sort

• Programming

• Principles of algorithm analysis, O-notation etc.


Assignments (preliminary)

1. Word frequencies + compact file
2. Radix sorting (general interface)
3. Word frequencies 2 (trie) + search engine
4. Suffix sorting
5. Entropy coding
6. BWT
Today

• Brief recapitulation of algorithm time complexity

• We take a step back: digital representation of information

• Bitwise operations, numbers


(Not what this course is supposed to teach, but you are
going to need this in your programming assignments)

• If time: counting/bucket sort (“key-indexed counting”)


Algorithmic research
• Come up with algorithms for specific problems

• Determine the “speed” of algorithms

• Find “faster” algorithms

• Prove that:
- A specific algorithm does what it’s supposed to
- A specific algorithm has a certain “speed”
- There can be no algorithm for the problem “faster”
than a certain “speed”
[Figure: Charles Babbage’s analytical engine, “programmed” by Ada Lovelace. Caption: how many times do you have to turn the crank?]
Time complexity of algorithm
• T(N) = a measure of the time it takes to run the program
on an input of size N

• Approximate with “at most proportional to”, O-notation


~, Θ, O

Ex 1. ⅙N³ + 20N + 16 ~ ⅙N³, which is Θ(N³)
Ex 2. ⅙N³ + 100N^(4/3) + 56 ~ ⅙N³ = proportional to N³
Ex 3. ⅙N³ − ½N² + ⅓N ~ ⅙N³ = cubic

Most common measure:
f(N) is O(g(N)) means: f(N) is at most proportional to g(N)
f(N) is Θ(g(N)) means: f(N) is precisely proportional to g(N),
i.e. g(N) is the “best” function such that f(N) is O(g(N))

Formally
f(N) is O(g(N)) means: ∃ constants N0 and c so that if N > N0, then |f(N)| < c·g(N)
f(N) is Ω(g(N)) means: ∃ constants N0 and c so that if N > N0, then |f(N)| > c·g(N)
f(N) is Θ(g(N)) means: f(N) is both O(g(N)) and Ω(g(N))
Time for sorting with pairwise compares (lg = log₂)

• Upper bound: ~N lg N compares (given by mergesort)

• Lower bound: ?

• Optimal algorithm: ?

Start by considering how many possible orderings there are.

Decision tree to find, using compares, which possible ordering of 3 elements is correct:

[Figure: decision tree. The root compares a<b; its children compare b<c and a<c; internal nodes are compares, and the leaves are the orderings abc, acb, cab, bac, bca, cba. Tree height = number of compares in the worst case.]

one leaf per possible ordering: 3! = 6 leaves
Decision tree for possible orderings of N values

• N values a1 to aN. Assume they are all different (equal values are a case we need to manage).
• N! = N · (N−1) · (N−2) · … · 3 · 2 · 1 different orderings
• Tree with compares (internal nodes), with orderings as leaves:
at least N! leaves, no more than 2ʰ leaves

• Worst-case time: height h. A binary tree with h levels has at most 2ʰ leaves, so
2ʰ ≥ N!
h ≥ lg(N!) ~ N lg N (Stirling’s approximation)

• Conclusion: Any algorithm must use lg(N!) ~ N lg N compares (worst case)
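(For the record, the step from lg(N!) to N lg N is Stirling’s approximation: ln N! = N ln N − N + O(ln N), hence lg(N!) = N lg N − N·lg e + O(lg N), whose leading term is N lg N.)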
Stupid question?
• Well known: computers at machine level represent
everything using only 0 and 1

• So how can computers process and output graphics,
audio, or even text, which don’t look like 0s and 1s?

• What does it even mean, “only 0 and 1”?

• (How to explain this to someone without comp sci
knowledge?)
Blondinrikard Fröberg, Listen closely https://flic.kr/p/tRbAcU
στρατός (formerly known as Michelangelo_MI), At the end of the track https://flic.kr/p/4wMSNh
Sound
• A sequence of amplitude values in binary representation

• Parameters: frequency, bits per sample…

Pictures
• Bit-mapped: PNG, JPEG, GIF, …
• Vector: PDF, Postscript, …


http://blog.xkcd.com/2010/05/03/color-survey-results/
Pictures
• Colors: RGB

• 3-byte numbers, 256×256×256 = 16,777,216 different colors

D. B. Gaston, Arabian Nights text cropped https://flic.kr/p/5QvRXv
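A quick sketch of how such a 3-byte color packs into one int, using the bitwise operations covered later in these slides (Java; the class and variable names are my own, chosen for illustration):

class RgbDemo {
    public static void main(String[] args) {
        int r = 200, g = 120, b = 40;
        int rgb = (r << 16) | (g << 8) | b;   // one byte per channel: 0xC87828
        int red   = (rgb >>> 16) & 0xff;      // 200
        int green = (rgb >>> 8) & 0xff;       // 120
        int blue  = rgb & 0xff;               // 40
        System.out.println(red + " " + green + " " + blue);
    }
}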
7-bit ASCII
International (e.g. Scandinavian) characters
• Replace some glyphs: [ ] \ { }
• Use 8 bits: Latin-1 (ISO 8859-1)
• Replace for new chars (€ etc.): Latin-9 (ISO 8859-15)
• Microsoft variant: Windows-1252
• Unicode multibyte: UTF-8, de-facto standard?
UTF-8

Bits needed | Bytes in UTF-8 | Byte 1   | Byte 2   | Byte 3   | Byte 4
7           | 1              | 0xxxxxxx |          |          |
8–11        | 2              | 110xxxxx | 10xxxxxx |          |
12–16       | 3              | 1110xxxx | 10xxxxxx | 10xxxxxx |
17–21       | 4              | 11110xxx | 10xxxxxx | 10xxxxxx | 10xxxxxx
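A rough sketch of how the table translates into code (Java; the method name utf8Encode is my own, and for real strings the standard library’s String.getBytes(StandardCharsets.UTF_8) does this for you):

class Utf8Demo {
    // Encode one code point (up to 21 bits) following the table above.
    static byte[] utf8Encode(int cp) {
        if (cp < 0x80)      // 7 bits fit in 1 byte
            return new byte[] { (byte) cp };
        if (cp < 0x800)     // 8–11 bits: 2 bytes
            return new byte[] { (byte) (0xC0 | (cp >>> 6)),
                                (byte) (0x80 | (cp & 0x3F)) };
        if (cp < 0x10000)   // 12–16 bits: 3 bytes
            return new byte[] { (byte) (0xE0 | (cp >>> 12)),
                                (byte) (0x80 | ((cp >>> 6) & 0x3F)),
                                (byte) (0x80 | (cp & 0x3F)) };
        // 17–21 bits: 4 bytes
        return new byte[] { (byte) (0xF0 | (cp >>> 18)),
                            (byte) (0x80 | ((cp >>> 12) & 0x3F)),
                            (byte) (0x80 | ((cp >>> 6) & 0x3F)),
                            (byte) (0x80 | (cp & 0x3F)) };
    }

    public static void main(String[] args) {
        for (byte x : utf8Encode('é'))        // U+00E9 -> C3 A9
            System.out.printf("%02X ", x);
    }
}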


Two approaches (which are both kind of the same)

• Recursively split the universe in two parts. Denote the
parts 1 and 0. Binary tree with what we want to represent
in the leaves

• Assign numbers to everything we want to represent.
Encode in base 2 (binary numbers)
Game: 20 questions

• How many “messages” are possible with 20 questions =
bits? (2²⁰ = 1,048,576)

• What if some messages are (much) more likely than
others?

• (How many bits would we need to distinguish any
message ever written by a human?)
Claude Shannon
X has probability p.
Optimal number of bits to represent X: log₂(1/p) bits
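For instance, a message with probability p = 1/256 optimally gets log₂ 256 = 8 bits, and each of 2²⁰ equally likely messages in the 20-questions game gets log₂ 2²⁰ = 20 bits; rarer messages get longer codes, which is what entropy codes like Huffman coding exploit.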

Unary number representation

Positional number system

عبد الله محمد بن موسى الخوارزمي
Abū ʿAbdallāh Muḥammad ibn Mūsā al-Khwārizmī
Indian positional system → “Arabic” numbers
782 = 2·10⁰ + 8·10¹ + 7·10²

[Figure: the ten digit glyphs 0–9]
Binary number representation
decimal:
782 = 2·10⁰ + 8·10¹ + 7·10²

binary:
1100001110 = 0·2⁰ + 1·2¹ + 1·2² + 1·2³ + 0·2⁴ + 0·2⁵ + 0·2⁶ + 0·2⁷ + 1·2⁸ + 1·2⁹
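A minimal sketch of this conversion in code (Java; toBinary is my own name for it, and the standard library’s Integer.toBinaryString does the same job):

class BinaryDemo {
    // Produce the binary digits of n by repeatedly extracting the lowest bit.
    static String toBinary(int n) {
        StringBuilder sb = new StringBuilder();
        do {
            sb.append(n & 1);   // digit for the current power of two
            n >>>= 1;           // move on to the next power
        } while (n != 0);
        return sb.reverse().toString();   // lowest digit was produced first
    }

    public static void main(String[] args) {
        System.out.println(toBinary(782));   // 1100001110
    }
}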
“8 Questions” for unsigned numbers (octets or “bytes”)

00000000: 0
00000001: 1
00000010: 2
00000011: 3
00000100: 4
⋮
01111110: 126
01111111: 127
10000000: 128
10000001: 129
10000010: 130
⋮
11111110: 254
11111111: 255

Signed 8-bit numbers (octets or “bytes”): two’s complement

By bit pattern:
00000000: 0
00000001: 1
00000010: 2
00000011: 3
00000100: 4
⋮
01111110: 126
01111111: 127
10000000: −128
10000001: −127
10000010: −126
⋮
11111110: −2
11111111: −1

In numeric order: 10000000 (−128) through 11111111 (−1), then 00000000 (0) through 01111111 (127).
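A small sketch of the two readings in Java (whose byte type is signed two’s complement; the class and variable names are mine):

class TwosComplementDemo {
    public static void main(String[] args) {
        int bits = 0b10000001;       // 129 when read as unsigned
        byte signed = (byte) bits;   // Java bytes are signed: -127
        System.out.println(bits + " " + signed);   // 129 -127
        // For patterns 10000000 and up: signed value = unsigned value - 256,
        // e.g. 10000000 -> 128 - 256 = -128, 11111111 -> 255 - 256 = -1.
    }
}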

Floating-point representation
0.250244140625:

Single precision, 32 bits: 1 sign, 8 exponent, 23 mantissa

0 01111101 00000000010000000000000

sign 0: positive
exponent: 01111101 = 125, subtract 127 (exponent bias): −2
mantissa: 1·2⁰ (implicit first 1 bit)
+ 0·2⁻¹ + 0·2⁻² + 0·2⁻³ + …
+ 0·2⁻⁹ + 1·2⁻¹⁰ + 0·2⁻¹¹ + …
= 1.0009765625

1.0009765625 · 2⁻² = 0.250244140625
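A sketch of extracting the three fields in Java, using the standard Float.floatToIntBits (the class and variable names are mine):

class FloatBitsDemo {
    public static void main(String[] args) {
        int bits = Float.floatToIntBits(0.250244140625f);
        int sign     = bits >>> 31;            // 1 bit:  0 (positive)
        int exponent = (bits >>> 23) & 0xFF;   // 8 bits: 125 -> 125 - 127 = -2
        int mantissa = bits & 0x7FFFFF;        // 23 bits: 8192 = 2^13, the 2^-10 bit
        System.out.println(sign + " " + exponent + " " + mantissa);
        // value = (1 + 8192/2^23) * 2^(125-127)
        //       = 1.0009765625 * 2^-2 = 0.250244140625
    }
}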
Bitwise operations
• and & ∧ ·

• or | ∨ +

• xor ^ ⊻ ⊕

• not ~ ¬ ¯

• << (shift left)

• >>, >>> (shift right: sign-extending vs. zero-filling)
a b a&b
0 0 0
0 1 0
1 0 0
1 1 1

a & b is true only if both a and b are true


a b a|b
0 0 0
0 1 1
1 0 1
1 1 1
a ∨ b is true if at least one of a and b is true
a b a^b
0 0 0
0 1 1
1 0 1
1 1 0

a ⊕ b is true when a and b are not equal


• Setting bit to 1: or 1 (or 0 means don’t change)

• Setting bit to 0: and 0 (and 1 means don’t change)

• Flipping bit: xor 1 (xor 0 means don’t change)
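In code, the three rules look like this (a Java sketch; the method names are mine, and 1 << i is the mask with only bit i set):

class BitOpsDemo {
    static int setBit(int x, int i)   { return x | (1 << i);  }   // or 1 sets
    static int clearBit(int x, int i) { return x & ~(1 << i); }   // and 0 clears
    static int flipBit(int x, int i)  { return x ^ (1 << i);  }   // xor 1 flips

    public static void main(String[] args) {
        int x = 0b1010;
        System.out.println(Integer.toBinaryString(setBit(x, 0)));    // 1011
        System.out.println(Integer.toBinaryString(clearBit(x, 1)));  // 1000
        System.out.println(Integer.toBinaryString(flipBit(x, 3)));   // 10
    }
}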


decimal: 9 + 5 = 14          decimal: 11 + 7 = 18
binary: 1001 + 101 = 1110    binary: 1011 + 111 = 10010

[Figure: column addition with a carry row C above the rows for A and B]

Each row is an operation with three input bits and two output bits:

outᵢ = Aᵢ ^ Bᵢ ^ Cᵢ
Cᵢ₊₁ = (Aᵢ & Bᵢ) | (Cᵢ & (Aᵢ ^ Bᵢ))
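A sketch of these two formulas applied bit by bit, as a ripple-carry adder in Java (the class and method names are mine; hardware evaluates the same logic in parallel rather than in a loop):

class AdderDemo {
    // Add two ints using only the bitwise formulas above.
    static int add(int a, int b) {
        int sum = 0, c = 0;                    // c = carry into bit i
        for (int i = 0; i < 32; i++) {
            int ai = (a >>> i) & 1, bi = (b >>> i) & 1;
            sum |= (ai ^ bi ^ c) << i;         // out_i = A_i ^ B_i ^ C_i
            c = (ai & bi) | (c & (ai ^ bi));   // C_(i+1) per the formula above
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(add(9, 5) + " " + add(11, 7));   // 14 18
    }
}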
Groups of bits

• Groups of 3: octal

• Groups of 4: hex[adecimal]

• Groups of 8: byte “strings”


(One strange representation: IPv4 32-bit address: “192.168.10.199”)
Let’s calculate

• 2022 · 666 = 1346652

• In binary: 111|11100110 · 10|10011010 = 10100|10001100|01011100

Split each number into base-256 digits:
2022 & 0xff = 230, 2022 >>> 8 = 7, so 2022 = (7, 230) in base 256
666 & 0xff = 154, 666 >>> 8 = 2, so 666 = (2, 154) in base 256

Long multiplication, one base-256 digit at a time:
154·230 = 35420: 35420 & 0xff = 92, carry 35420 >>> 8 = 138
154·7 + 138 = 1216: 1216 & 0xff = 192, carry 1216 >>> 8 = 4
2·230 = 460: 460 & 0xff = 204, carry 460 >>> 8 = 1
2·7 + 1 = 15

Add the two partial products, digit by digit:
192 + 204 = 396: 396 & 0xff = 140, carry 396 >>> 8 = 1
1 + 4 + 15 = 20

20·256² + 140·256¹ + 92·256⁰ = 1346652 = 666·2022
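The same procedure as general code: a sketch of schoolbook multiplication on base-256 digits in Java (the method name multiply and the least-significant-digit-first layout are my choices):

class MultiplyDemo {
    // Schoolbook multiplication on base-256 digits, least significant first.
    static int[] multiply(int[] x, int[] y) {
        int[] z = new int[x.length + y.length];
        for (int i = 0; i < x.length; i++) {
            int carry = 0;
            for (int j = 0; j < y.length; j++) {
                int t = z[i + j] + x[i] * y[j] + carry;
                z[i + j] = t & 0xff;   // low byte stays in this position
                carry = t >>> 8;       // high part carries to the next
            }
            z[i + y.length] += carry;
        }
        return z;
    }

    public static void main(String[] args) {
        // 2022 = (230, 7), 666 = (154, 2) in base 256, low digit first:
        int[] z = multiply(new int[] {230, 7}, new int[] {154, 2});
        System.out.println(java.util.Arrays.toString(z));   // [92, 140, 20, 0]
        // 92 + 140*256 + 20*256^2 = 1346652
    }
}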
Endianness

• Little-endian: byte 0 at address 0, byte 1 at address 1, …
Makes sense because it’s analogous to bit 0 being the least significant bit

• Big-endian: byte 0 at the last address of the word,
most significant byte at address 0…
Makes sense because you can sort (unsigned) numbers as if they were strings
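A sketch of the two byte orders in Java, using the standard java.nio.ByteBuffer (the class name EndianDemo is mine):

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

class EndianDemo {
    public static void main(String[] args) {
        int n = 1346652;   // 0x00148C5C, from the multiplication above
        byte[] little = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN)
                                  .putInt(n).array();
        byte[] big = ByteBuffer.allocate(4).order(ByteOrder.BIG_ENDIAN)
                               .putInt(n).array();
        System.out.println(Arrays.toString(little));  // [92, -116, 20, 0]  = 5C 8C 14 00
        System.out.println(Arrays.toString(big));     // [0, 20, -116, 92] = 00 14 8C 5C
    }
}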
Counting multi-byte ints
A first go at sorting in less than O(N log N)

Next lecture!
