
Lecture 2: Models of Computation

6.006 Fall 2011

Lecture Overview
• What is an algorithm? What is time?

• Random access machine

• Pointer machine

• Python model

• Document distance: problem & algorithms

History
Al-Khwārizmī (“al-kha-raz-mi”, c. 780–850)

• “father of algebra” with his book “The Compendious Book on Calculation by Completion & Balancing”

• linear & quadratic equation solving: some of the first algorithms

What is an Algorithm?
• Mathematical abstraction of computer program

• Computational procedure to solve a problem

Figure 1: An algorithm is the mathematical analog of a computer program: a program is written in a programming language and built on top of a computer, while an algorithm is written in pseudocode and built on top of a model of computation.

Model of computation specifies


• what operations an algorithm is allowed to perform

• cost (time, space, . . . ) of each operation

• cost of algorithm = sum of operation costs


Random Access Machine (RAM)

[Figure: memory drawn as a big array of words, addressed 0, 1, 2, 3, . . . ; each cell holds one word]

• Random Access Memory (RAM) modeled by a big array

• Θ(1) registers (each 1 word)

• In Θ(1) time, can

  – load word @ ri into register rj
  – compute (+, −, ∗, /, &, |, ˆ) on registers
  – store register rj into memory @ ri

• What’s a word? w ≥ lg (memory size) bits

– assume basic objects (e.g., int) fit in word


– unit 4 in the course deals with big numbers

• realistic and powerful → implement abstractions
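
To make this concrete, here is a tiny Python sketch (our illustration, not part of the lecture) treating a list as the RAM’s word array and a few locals as registers:

    # A toy RAM: memory is a big array of words; registers are a few locals.
    memory = [0] * 1024             # each cell holds one machine word

    def ram_demo():
        r0, r1 = 5, 7               # Θ(1) registers
        memory[0] = r0              # store register into memory @ address 0
        memory[1] = r1              # store register into memory @ address 1
        r2 = memory[0] + memory[1]  # load two words and add: each step Θ(1)
        memory[2] = r2
        return memory[2]

    print(ram_demo())               # 12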

Pointer Machine
• dynamically allocated objects (namedtuple)

• object has O(1) fields

• field = word (e.g., int) or pointer to object/null (a.k.a. reference)

• weaker than (can be implemented on) RAM


[Figure: a doubly linked list of two nodes. Node 1: val = 5, prev = null, next → node 2. Node 2: val = −1, prev → node 1, next = null.]
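
A sketch of the figure’s two-node list as Python objects (the lecture mentions namedtuple; a plain class is used here so the pointer fields can be rewired, which namedtuple’s immutability would not allow):

    class Node:
        """A pointer-machine object: O(1) fields, each a word or a pointer."""
        def __init__(self, val):
            self.val = val      # word field
            self.prev = None    # pointer field (null)
            self.next = None    # pointer field (null)

    a, b = Node(5), Node(-1)
    a.next, b.prev = b, a       # wire up the doubly linked list from the figure

    x = a
    x = x.next                  # Θ(1): follow one pointer
    print(x.val)                # -1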

Python Model
Python lets you use either mode of thinking

1. “list” is actually an array → RAM


L[i] = L[j] + 5 → Θ(1) time

2. object with O(1) attributes (including references) → pointer machine


x = x.next → Θ(1) time

Python has many other operations. To determine their cost, imagine implementation in
terms of (1) or (2):

1. list

   (a) L.append(x) → Θ(1) time
       obvious if you think of an infinite array,
       but how would you have an infinite array on the RAM?
       via table doubling [Lecture 9]

   (b) L = L1 + L2  ≡  L = []              → Θ(1)
                       for x in L1:        → Θ(|L1|) iterations
                           L.append(x)     → Θ(1)
                       for x in L2:        → Θ(|L2|) iterations
                           L.append(x)     → Θ(1)
       total: Θ(1 + |L1| + |L2|) time (see the timing sketch after this list)



3
Lecture 2 6.006 Fall 2011

   (c) L1.extend(L2)  ≡  for x in L2:              → Θ(1 + |L2|) time
       (≡ L1 += L2)          L1.append(x) → Θ(1)

   (d) L2 = L1[i:j]  ≡  L2 = []                    → Θ(j − i + 1) = O(|L1|)
                        for k in range(i, j):
                            L2.append(L1[k]) → Θ(1)

   (e) b = x in L    ≡  for y in L:                → Θ(index of x) = O(|L|)
       & L.index(x)         if x == y:     → Θ(1)
       & L.find(x)              b = True
                                break
                        else:
                            b = False

   (f) len(L) → Θ(1) time (the list stores its length in a field)

   (g) L.sort() → Θ(|L| log |L|) time, via comparison sort [Lectures 3, 4 & 7]

2. tuple, str: similar (think of them as immutable lists)

3. dict: via hashing [Unit 3 = Lectures 8–10]

   D[key] = val  → Θ(1) time w.h.p.
   key in D      → Θ(1) time w.h.p.

4. set: similar (think of as dict without vals)

5. heapq: heappush & heappop, via heaps [Lecture 4] → Θ(log n) time

6. long: via Karatsuba algorithm [Lecture 11]

   x + y → O(|x| + |y|) time, where |x| and |y| denote lengths in words
   x ∗ y → O((|x| + |y|)^lg 3) ≈ O((|x| + |y|)^1.58) time
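
To see these costs empirically, here is a minimal timing sketch (our own illustration) contrasting Θ(1) amortized append against repeated concatenation, which copies the whole list each time:

    import timeit

    def by_append(n):
        L = []
        for i in range(n):
            L.append(i)      # Θ(1) amortized, via table doubling
        return L

    def by_concat(n):
        L = []
        for i in range(n):
            L = L + [i]      # Θ(|L|) per step: copies the whole list
        return L             # total Θ(n²)

    for n in (1000, 2000, 4000):
        t_app = timeit.timeit(lambda: by_append(n), number=10)
        t_cat = timeit.timeit(lambda: by_concat(n), number=10)
        # append should scale roughly linearly in n, concat roughly quadratically
        print(f"n={n}: append {t_app:.4f}s, concat {t_cat:.4f}s")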

Document Distance Problem — compute d(D1, D2)


The document distance problem has applications in finding similar documents, detecting
duplicates (Wikipedia mirrors and Google) and plagiarism, and also in web search (D2 =
query).
Some Definitions:

• Word = sequence of alphanumeric characters

• Document = sequence of words (ignore space, punctuation, etc.)

The idea is to define distance in terms of shared words. Think of document D as a vector:
D[W] = # occurrences of word W. For example:

4
Lecture 2 6.006 Fall 2011

[Figure 2: D1 = “the cat” and D2 = “the dog” drawn as vectors on axes “the”, “cat”, and “dog”]

As a first attempt, define document distance as

d′(D1, D2) = D1 · D2 = Σ_W D1[W] · D2[W]

The problem is that this is not scale invariant: long documents with 99% of their words in
common appear farther apart than short documents with only 10% in common.
This can be fixed by normalizing by the lengths of the vectors:

d′′(D1, D2) = (D1 · D2) / (|D1| · |D2|)

where |Di| = √(Di · Di) is the Euclidean length (norm) of document vector Di. The geometric
(rescaling) interpretation is

d(D1, D2) = arccos(d′′(D1, D2))

i.e., the document distance is the angle between the vectors. An angle of 0◦ means the two
documents are identical, whereas an angle of 90◦ means they have no words in common. This
approach was introduced by [Salton, Wong, Yang 1975].
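
As a quick illustration, here is a minimal sketch (our own code, not the course’s) that computes this angle directly from two strings:

    import math
    from collections import Counter

    def doc_dist(doc1, doc2):
        """Angle in radians between the word-frequency vectors of two texts."""
        D1 = Counter(doc1.lower().split())
        D2 = Counter(doc2.lower().split())
        dot = sum(D1[w] * D2[w] for w in D1 if w in D2)     # D1 · D2
        norm1 = math.sqrt(sum(c * c for c in D1.values()))  # |D1|
        norm2 = math.sqrt(sum(c * c for c in D2.values()))  # |D2|
        return math.acos(dot / (norm1 * norm2))

    print(doc_dist("the cat", "the dog"))  # ≈ 1.047 rad = 60°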

Document Distance Algorithm


1. split each document into words

2. count word frequencies (document vectors)

3. compute dot product (& divide)


(1) re.findall(r"\w+", doc) → what cost?
    in general, re can take exponential time, so instead:

    for char in doc:                             → Θ(|doc|)
        if not alphanumeric:
            add previous word (if any) to list   → Θ(1)
            start new word
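
For example, a quick sanity check of the regex split (a sketch; the course code processes input line by line):

    import re

    doc = "The cat, the dog -- and 2 hats!"
    words = re.findall(r"\w+", doc.lower())  # runs of alphanumerics, lowercased
    print(words)  # ['the', 'cat', 'the', 'dog', 'and', '2', 'hats']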

(2) sort word list                       ← O(k log k · |word|) where k = # words

    for word in list:                    → O(Σ |word|) = O(|doc|)
        if same as last word:            ← O(|word|)
            increment counter            → Θ(1)
        else:
            add last word and count to output list
            reset counter


(3) for word, count1 in doc1:            ← Θ(k1) iterations
        if word in doc2 (as count2):     ← Θ(k2) per lookup
            total += count1 * count2     → Θ(1)
    total: O(k1 · k2)

(3)′ start at first word of each (sorted) list    → O(Σ |word|) = O(|doc|) total
     if words equal:                 ← O(|word|)
         total += count1 * count2
     if word1 ≤ word2:               ← O(|word|)
         advance list1
     else:
         advance list2
     repeat until either list is done
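
A minimal Python sketch of this merge (assuming each input is a list of (word, count) pairs already sorted by word):

    def merge_dot(list1, list2):
        """Dot product of two sorted (word, count) lists in linear time."""
        total = i = j = 0
        while i < len(list1) and j < len(list2):
            w1, c1 = list1[i]
            w2, c2 = list2[j]
            if w1 == w2:
                total += c1 * c2  # shared word contributes to the dot product
            if w1 <= w2:
                i += 1            # advance list1 (also when words are equal)
            else:
                j += 1            # advance list2
        return total

    # D1 = "the cat", D2 = "the dog": only "the" is shared
    print(merge_dot([("cat", 1), ("the", 1)], [("dog", 1), ("the", 1)]))  # 1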

Dictionary Approach

(2)′ count = {}
     for word in doc:                                → O(|doc|) w.h.p.
         if word in count:      ← Θ(|word|) + Θ(1) w.h.p.
             count[word] += 1   → Θ(1)
         else:
             count[word] = 1    → Θ(1)

(3) as above, now with Θ(1) w.h.p. dictionary lookups → O(|doc1|) w.h.p.
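
Putting the dictionary approach together (a sketch with our own helper names, mirroring (2)′ and the dictionary version of (3)):

    def count_words(words):
        """(2)′: build a word-frequency dictionary in O(|doc|) time w.h.p."""
        count = {}
        for word in words:
            if word in count:
                count[word] += 1
            else:
                count[word] = 1
        return count

    def dict_dot(D1, D2):
        """(3) with Θ(1) w.h.p. lookups: O(k1) time total."""
        return sum(c1 * D2[w] for w, c1 in D1.items() if w in D2)

    D1 = count_words("the cat".split())
    D2 = count_words("the dog".split())
    print(dict_dot(D1, D2))  # 1 (only "the" is shared)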


Code (lecture2 code.zip & data.zip on website)

t2.bobsey.txt: 268,778 chars / 49,785 words / 3,354 uniq
t3.lewis.txt: 1,031,470 chars / 182,355 words / 8,534 uniq
seconds on Pentium 4, 2.8 GHz, C-Python 2.6.2, Linux 2.6.26

• docdist1: 228.1 — (1), (2), (3) (with extra sorting): words = words + words on line

• docdist2: 164.7 — words += words on line

• docdist3: 123.1 — (3)’ . . . with insertion sort

• docdist4: 71.7 — (2)’ but still sort to use (3)’

• docdist5: 18.3 — split words via string.translate

• docdist6: 11.5 — merge sort (vs. insertion)

• docdist7: 1.8 — (3) (full dictionary)

• docdist8: 0.2 — whole doc, not line by line

MIT OpenCourseWare
http://ocw.mit.edu

6.006 Introduction to Algorithms


Fall 2011

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
