PDSP2023 Lecture09 12sep2023
Arrays
Contiguous block of memory
Typically size is declared in advance, all values are uniform
a[0] points to the first memory location in the allocated block
Locate a[i] in memory using index arithmetic
Skip i blocks of memory, each block's size determined by the type of value stored in the array
Random access -- the time to access a[i] does not depend on i
Useful for procedures like sorting, where we need to swap out-of-order values a[i] and a[j]
a[i], a[j] = a[j], a[i]
Cost of such a swap is constant, independent of where the elements to be swapped are in the array
Inserting or deleting a value is expensive
Need to shift elements right (for an insert) or left (for a delete); the cost depends on how far the modification is from the end of the array
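A small sketch of the costs above, using Python's standard `array` module as a stand-in for a raw block of memory. The helper names `element_address` and `insert_shift` are hypothetical, and `insert_shift` assumes the block already has a free slot at the end.

```python
from array import array

def element_address(base: int, i: int, itemsize: int) -> int:
    # a[i] lives i blocks past a[0]; each block is itemsize bytes
    return base + i * itemsize

def insert_shift(a, n: int, pos: int, x) -> int:
    # Insert x at index pos among the first n used slots of a, shifting
    # a[pos..n-1] one slot right. Assumes len(a) > n. Returns elements moved.
    moves = 0
    for j in range(n, pos, -1):
        a[j] = a[j - 1]
        moves += 1
    a[pos] = x
    return moves

a = array("i", [10, 20, 30, 40, 0])    # capacity 5, first 4 slots in use
base, _ = a.buffer_info()              # memory address of a[0]
print(element_address(base, 3, a.itemsize) - base)   # byte offset of a[3]

a[0], a[3] = a[3], a[0]                # constant-cost swap
moves_front = insert_shift(a, 4, 0, 5) # inserting at the front moves all 4
print(list(a), moves_front)            # [5, 40, 20, 30, 10] 4
```

Inserting at position `pos` moves `n - pos` elements, so the same call at the end of the used part would move nothing.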
Lists
Each location is a cell, consisting of a value and a link to the next cell
Think of a list as a train, made up of a linked sequence of cells
The name of the list l gives us access to l[0], the first cell
To reach cell l[i], we must traverse the links from l[0] to l[1] to l[2] … to l[i-1] to l[i]
Takes time proportional to i
Cost of swapping l[i] and l[j] varies, depending on values i and j
On the other hand, if we are already at l[i], modifying the list is easy
Insert - create a new cell and reroute the links
Delete - bypass the deleted cell by rerouting the links
Each insert/delete requires a fixed amount of local "plumbing", independent of where in the list it is performed
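The train of cells can be sketched with a small class. `Cell`, `get`, `insert_after`, `delete_after` and `to_list` are hypothetical names chosen for illustration. Only `get` walks the links; the insert and delete helpers rewire a fixed number of links wherever they are applied.

```python
class Cell:
    """One cell of the train: a value plus a link to the next cell."""
    def __init__(self, value, nxt=None):
        self.value = value
        self.next = nxt

def get(head, i):
    # Follow i links starting from l[0]: time proportional to i
    node = head
    for _ in range(i):
        node = node.next
    return node

def insert_after(node, x):
    # The new cell inherits node's old link; node now links to the new cell
    node.next = Cell(x, node.next)

def delete_after(node):
    # Bypass the cell after node by linking straight past it
    node.next = node.next.next

def to_list(head):
    # Collect the values into a Python list, for printing
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out

head = Cell(0, Cell(1, Cell(2, Cell(3))))
insert_after(get(head, 1), 99)   # list is now 0, 1, 99, 2, 3
delete_after(get(head, 2))       # drops the cell holding 2
print(to_list(head))             # [0, 1, 99, 3]
```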
Dictionaries
Values are stored in a fixed block of size $m$
Keys are mapped to $\{0, 1, \ldots, m-1\}$
Hash function $h : K \to S$ maps a large set of keys $K$ to a small range $S$
Simple hash function: interpret $k \in K$ as a bit sequence representing a number $n$ in binary, and compute $n \bmod m$, where $|S| = m$
Mismatch in sizes means that there will be collisions -- $k_1 \neq k_2$, but $h(k_1) = h(k_2)$
A good hash function maps keys "randomly" to minimize collisions
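The simple hash function can be sketched directly: read the key's bytes as one binary number $n$ and return $n \bmod m$. The name `simple_hash` and the choice of UTF-8 to turn a string into bits are assumptions for illustration.

```python
def simple_hash(k: str, m: int) -> int:
    # Read the key's bytes as one binary number n, then reduce it mod m
    n = int.from_bytes(k.encode("utf-8"), "big")
    return n % m

m = 8
# "a" is 97 and "i" is 105; both leave remainder 1 when divided by 8
print(simple_hash("a", m), simple_hash("i", m))   # 1 1 -> a collision

used = set()
for k in "abcdefghi":        # 9 distinct keys but only 8 slots
    used.add(simple_hash(k, m))
print(len(used))             # 8, so at least two keys must share a slot
```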
Hash can be used as a signature of authenticity
Modifying $k$ slightly will drastically alter $h(k)$
No easy way to reverse engineer a $k'$ that maps to a given $h(k)$
Use to check that large files have not been tampered with in transit, either due to network errors or malicious intervention
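The notes do not name a particular hash function for signatures; SHA-256 from Python's standard `hashlib` module is used below as one widely used example. The messages are made up, and `digest` is a hypothetical helper.

```python
import hashlib

def digest(data: bytes) -> str:
    # SHA-256 signature of the data, as 64 hexadecimal digits
    return hashlib.sha256(data).hexdigest()

original = b"Transfer 100 to account 12345"
tampered = b"Transfer 900 to account 12345"   # one character changed

d1, d2 = digest(original), digest(tampered)
diff = sum(c1 != c2 for c1, c2 in zip(d1, d2))
print(d1)
print(d2)
print(diff, "of 64 hex digits differ")

# Receiver's check: recompute the digest and compare with the published one
received = original
print(digest(received) == d1)   # True only if the bytes arrived unchanged
```

Publishing `d1` alongside a large file lets anyone who downloads the file recompute its digest and compare.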
Dictionary uses a hash function to map key values to storage locations
Lookup requires computing $h(k)$, which takes roughly the same time for any $k$
Compare with computing the offset of a[i] for any index i in an array
Collisions are inevitable; there are different mechanisms to manage them, which we will not discuss now
Effectively, a dictionary combines flexibility with random access
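A toy sketch of the idea: values live in a fixed block of $m$ slots and key $k$ goes to slot `hash(k) % m`. `FixedBlockDict` is a hypothetical name, and since collision handling is deferred in these notes, the sketch simply raises `KeyError` when a different key lands in an occupied slot. Integer keys are used because CPython hashes small non-negative ints to themselves, which makes the slot numbers predictable.

```python
class FixedBlockDict:
    """Toy dictionary over a fixed block of m slots. Key k is stored in
    slot hash(k) % m. Collision handling is deliberately omitted: storing
    a different key in an occupied slot raises KeyError."""

    def __init__(self, m: int):
        self.m = m
        self.slots = [None] * m            # each slot: None or a (key, value) pair

    def _slot(self, k) -> int:
        # Same amount of work for every k, like computing the offset of a[i]
        return hash(k) % self.m

    def __setitem__(self, k, v):
        i = self._slot(k)
        if self.slots[i] is not None and self.slots[i][0] != k:
            raise KeyError(f"collision: {k!r} and {self.slots[i][0]!r} share slot {i}")
        self.slots[i] = (k, v)

    def __getitem__(self, k):
        entry = self.slots[self._slot(k)]
        if entry is None or entry[0] != k:
            raise KeyError(k)
        return entry[1]

marks = FixedBlockDict(8)
marks[101] = 72          # 101 % 8 == 5
marks[102] = 45          # 102 % 8 == 6
print(marks[101], marks[102])   # 72 45

collided = False
try:
    marks[109] = 88      # 109 % 8 == 5, the slot already holding 101
except KeyError as e:
    collided = True
    print(e)
```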
Lists in Python
Flexible size, allow inserting/deleting elements in between
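However, the implementation is an array, rather than a linked list
Initially allocate a block of storage to the list
When storage runs out, double the allocation (CPython actually grows the block by a smaller factor, but any constant growth factor greater than 1 keeps appends cheap on average)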
l.append(x) is efficient, moves the right end of the list one position forward within the array
l.insert(0,x) inserts a value at the start, expensive because it requires shifting all the elements by 1
We will run experiments to validate these claims
3.1834037989901844
5.753009960986674
5.5166299150150735
Doubling and tripling the work multiplies the time by 4 and 9, respectively, so the running time is quadratic
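Besides timing, the growth policy can be modelled by counting element moves. `DynArray` below is a hypothetical toy, not CPython's implementation: it doubles its block when full, as described above, and counts every element it copies or shifts.

```python
class DynArray:
    """Toy array-backed list that doubles its block when full and counts
    every element copy or shift, so each operation's cost is visible."""

    def __init__(self):
        self.capacity = 1
        self.n = 0
        self.block = [None] * self.capacity
        self.copies = 0                       # total element moves so far

    def _grow(self):
        new_block = [None] * (2 * self.capacity)
        for j in range(self.n):               # copy every element across
            new_block[j] = self.block[j]
            self.copies += 1
        self.block, self.capacity = new_block, 2 * self.capacity

    def append(self, x):
        if self.n == self.capacity:
            self._grow()
        self.block[self.n] = x                # right end moves one slot forward
        self.n += 1

    def insert_front(self, x):
        if self.n == self.capacity:
            self._grow()
        for j in range(self.n, 0, -1):        # shift everything right by 1
            self.block[j] = self.block[j - 1]
            self.copies += 1
        self.block[0] = x
        self.n += 1

    def to_list(self):
        return self.block[:self.n]

N = 1024
a = DynArray()
for i in range(N):
    a.append(i)
print(a.copies)    # 1023

b = DynArray()
for i in range(N):
    b.insert_front(i)
print(b.copies)    # 524799
```

With N = 1024 appends, the doubling copies total 1 + 2 + ... + 512 = 1023, fewer than 2N. For `insert_front`, the shifts alone total 0 + 1 + ... + (N-1) = N(N-1)/2 = 523776 (plus the same 1023 copies from growing), which is why doubling N roughly quadruples the work.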
In [5]: import time
start = time.perf_counter()
l = []
for i in range(200000):
    l.insert(0,i)
elapsed = time.perf_counter() - start
print(elapsed)
17.979196411994053
43.46195148699917
3.8069355089974124
9.057193082000595
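The experiment in In [5] times only insertions at the front. A sketch that times both `append` and `insert(0, i)` at sizes n, 2n and 3n and reports each time relative to the first; `build` and `timed_build` are hypothetical names, and the ratios will vary between machines and runs (the fast `append` case is especially noisy).

```python
import time

def build(n: int, front: bool) -> list:
    # Build [0..n-1] by appending, or its reverse by inserting at the front
    l = []
    for i in range(n):
        if front:
            l.insert(0, i)
        else:
            l.append(i)
    return l

def timed_build(n: int, front: bool) -> float:
    start = time.perf_counter()
    build(n, front)
    return time.perf_counter() - start

n = 10000
for front in (False, True):
    times = [timed_build(k * n, front) for k in (1, 2, 3)]
    ratios = [round(t / times[0], 1) for t in times]
    # Linear cost gives ratios near [1, 2, 3]; quadratic cost near [1, 4, 9]
    print("insert(0,i)" if front else "append(i)", ratios)
```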
Implementing a "real" list using dictionaries
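def listappend(l,x):  # name and empty-list case reconstructed by analogy with listinsert
    if l == {}:
        l["value"] = x
        l["next"] = {}
        return
    node = l
    while node["next"] != {}:
        node = node["next"]
    # The last cell's "next" is an empty dict; fill it in to form the new last cell
    node["next"]["value"] = x
    node["next"]["next"] = {}
    return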
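def listinsert(l,x):
    if l == {}:
        l["value"] = x
        l["next"] = {}
        return
    # Copy the current head into a new second cell, so that l still names the head
    newnode = {}
    newnode["value"] = l["value"]
    newnode["next"] = l["next"]
    # Overwrite the head with x and link it to the copied cell
    l["value"] = x
    l["next"] = newnode
    return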
def printlist(l):
    print("{",end="")
    if l == {}:
        print("}")
        return
    node = l
    print(node["value"],end="")
    while node["next"] != {}:
        node = node["next"]
        print(",",node["value"],end="")
    print("}")
    return
0.020103318995097652
{'value': 0, 'next': {'value': 1, 'next': {'value': 2, 'next': {'value': 3, 'next': {'value': 4, 'next': {'value': 5, 'next': {'value': 6, 'next': {'value': 7, 'next': {'value': 8, 'next': {'value': 9, 'next': {}}}}}}}}}}}
3.375442454998847
6.131248404999496
9.82448883599136
2.685035665985197
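The functions above cover inserting and appending; deleting by bypassing a cell, as described for lists earlier, fits the same dictionary representation. `listdelete` and `tolist` are hypothetical helpers. Deleting the first cell copies the second cell into it, mirroring how `listinsert` keeps `l` naming the head.

```python
def listdelete(l, x):
    # Remove the first cell holding x, if there is one
    if l == {}:
        return
    if l["value"] == x:
        nxt = l["next"]
        if nxt == {}:
            l.clear()                    # the list becomes empty: {}
        else:
            l["value"] = nxt["value"]    # pull the second cell into the head
            l["next"] = nxt["next"]
        return
    node = l
    while node["next"] != {}:
        if node["next"]["value"] == x:
            node["next"] = node["next"]["next"]   # bypass the cell holding x
            return
        node = node["next"]

def tolist(l):
    # Collect the values into a Python list, for checking
    out = []
    node = l
    while node != {}:
        out.append(node["value"])
        node = node["next"]
    return out

l = {"value": 0, "next": {"value": 1, "next": {"value": 2, "next": {}}}}
listdelete(l, 1)
print(tolist(l))   # [0, 2]
listdelete(l, 0)
print(tolist(l))   # [2]
listdelete(l, 2)
print(l)           # {}
```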
Set comprehension
Defining new sets from old
$\{x^2 \mid x \in \mathbb{Z}, x \geq 0 \land (x \bmod 2) = 0\}$
$x \in \mathbb{Z}$, generating set
$x \geq 0 \land (x \bmod 2) = 0$, filtering condition
$x^2$, output transformation
More generally $\{f(x) \mid x \in S, p(x)\}$
generating set $S$
filtering predicate $p()$
transformer function $f()$
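The general form $\{f(x) \mid x \in S, p(x)\}$ maps onto three steps of a loop. `setcomp` is a hypothetical helper, and since $\mathbb{Z}$ is infinite, `range(20)` is assumed as a finite stand-in for the generating set.

```python
def setcomp(f, S, p):
    # {f(x) | x in S, p(x)}
    result = set()
    for x in S:               # generating set S
        if p(x):              # filtering predicate p
            result.add(f(x))  # output transformation f
    return result

evensq = setcomp(lambda x: x * x, range(20), lambda x: x % 2 == 0)
print(sorted(evensq))   # [0, 4, 16, 36, 64, 100, 144, 196, 256, 324]
```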
In [15]: evensqlist = []
for i in range(20):
    if i % 2 == 0:
        evensqlist.append(i*i)
print(evensqlist)
In [17]: l1
Out[17]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
In [18]: l2
In [19]: l3
Out[19]: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361]
In [20]: l4
Out[20]: [0, 4, 16, 36, 64, 100, 144, 196, 256, 324]
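The cells that define l1 to l4 are not shown in these notes. One way to produce lists with the same contents as Out[17], Out[19] and Out[20] is with Python's built-in `map` and `filter`; this is an assumption, not necessarily the lecture's code, and `square` and `iseven` are hypothetical helper names.

```python
def square(x):
    return x * x

def iseven(x):
    return x % 2 == 0

l1 = list(range(20))
l3 = list(map(square, l1))                  # transform every element
l4 = list(map(square, filter(iseven, l1)))  # filter first, then transform
print(l4)   # [0, 4, 16, 36, 64, 100, 144, 196, 256, 324]
```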
List comprehension
[ f(x) for x in ... if p(x) ]
Out[21]: [0, 4, 16, 36, 64, 100, 144, 196, 256, 324]
Out[22]: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
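The loop from In [15] and a comprehension of the form `[ f(x) for x in ... if p(x) ]` build the same list; the sketch checks this directly. The last lines show a comprehension with no filter whose output expression ignores `x`, one way (assumed, not confirmed by the notes) to get a list like Out[22].

```python
# Loop version, as in In [15]
evensqlist = []
for i in range(20):
    if i % 2 == 0:
        evensqlist.append(i * i)

# The same list as a comprehension
evensq_comp = [i * i for i in range(20) if i % 2 == 0]
print(evensq_comp == evensqlist)   # True

# No filter, and the output expression does not depend on i
zeros = [0 for i in range(20)]
print(len(zeros), set(zeros))      # 20 {0}
```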
In [25]: N = 20
triples = []
for x in range(1,N+1):
    for y in range(x,N+1):
        for z in range(y,N+1):
            if x*x + y*y == z*z:
                triples.append((x,y,z))
In [26]: triples
Out[26]: [(3, 4, 5), (5, 12, 13), (6, 8, 10), (8, 15, 17), (9, 12, 15), (12, 16, 20)]
Out[27]: [(3, 4, 5), (5, 12, 13), (6, 8, 10), (8, 15, 17), (9, 12, 15), (12, 16, 20)]
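The nested loops of In [25] translate into a single comprehension with three generators, read left to right in the same order as the loops. This is a sketch, not necessarily the code behind Out[27], though it produces the same list.

```python
N = 20
triples = [(x, y, z)
           for x in range(1, N + 1)
           for y in range(x, N + 1)      # y starts at x: avoids duplicate pairs
           for z in range(y, N + 1)
           if x * x + y * y == z * z]
print(triples)
```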
Pull out all dictionary values where the keys satisfy some property: e.g. the marks of all roll numbers in a given range
[ d[k] for k in d.keys() if p(k) ]
Symmetrically, keys whose values satisfy some property: e.g. all roll numbers where marks are below 50
[ k for k in d.keys() if p(d[k]) ]
Or, extract (key,value) pairs of interest
[ (k,d[k]) for k in d.keys() if p(d[k]) ]
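A worked instance of the three patterns, using made-up sample marks keyed by roll number; the dictionary and variable names are hypothetical.

```python
# Hypothetical sample data: roll number -> marks
marks = {101: 72, 102: 45, 103: 38, 104: 90, 105: 49}

# Values whose keys satisfy a property: marks of roll numbers up to 102
early_marks = [marks[k] for k in marks.keys() if k <= 102]

# Keys whose values satisfy a property: roll numbers with marks below 50
failing_rolls = [k for k in marks.keys() if marks[k] < 50]

# (key, value) pairs of interest
failing_pairs = [(k, marks[k]) for k in marks.keys() if marks[k] < 50]

print(early_marks)     # [72, 45]
print(failing_rolls)   # [102, 103, 105]
print(failing_pairs)   # [(102, 45), (103, 38), (105, 49)]
```

Because Python dictionaries preserve insertion order, the results come out in the order the roll numbers were added.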