DSAP-Lecture 5 - Array Based Sequences

Array-Based Sequences

Data Structures and Algorithms with Python

Lecture 5
Overview
● Python Sequence Types
● Low-level arrays
○ Referential Arrays
○ Compact Arrays
● Dynamic Arrays and Amortization
○ Implementing a dynamic array
○ Amortized analysis of dynamic array
○ Python’s list class
● Efficiency of Python’s sequence types
○ List and Tuple classes
○ String class
● Using Array-based sequences
○ Storing High scores for a game
○ Sorting a sequence
○ Simple Cryptography
● Multidimensional datasets
Python’s Sequence Types

● Sequence classes: list, tuple and str


● Common behaviour:
○ Each supports indexing to access an individual element of a sequence, e.g. seq[k].
○ Each uses a low-level concept known as an array to represent the sequence.
● Difference lies in
○ The abstraction these classes represent and how these complex data structures
are implemented.
● A deeper understanding of these data structures is important for a good
programmer.
Low-level Arrays

● The primary memory of a computer is composed of bits of information.

Figure labels: memory address; data as bytes of information.

● Those bits are typically grouped into larger units that depend upon the precise system
architecture. Such a typical unit is a byte, which is equivalent to 8 bits.

● Each byte of information in the memory is associated with a unique number that serves
as its address - memory address.

● Random Access Memory (RAM): Despite the sequential layout of memory locations,
computer hardware is designed to access any memory location efficiently using its
memory address. That is, it is just as easy to retrieve byte #8675309 as it is to retrieve
byte #309.

● Any individual byte of memory can be stored or retrieved in O(1) time.


● An array is a group of related variables stored one after another in a contiguous portion of the computer’s memory. Example: A text string is stored as an ordered sequence of individual characters.

● In this example, each Unicode character is represented by 16 bits (2 bytes).

● A six-character string such as ‘SAMPLE’ is therefore an array of 6 characters stored in 12 consecutive bytes of memory.

● Each location within the array is called a cell, and an integer index describes the location of a cell in the array.

● For example, the cell with index 4 contains ‘L’, which is stored in memory locations 2154 and 2155.

● Each cell of an array must use the same number of bytes. This allows an arbitrary cell to be accessed in constant time.

● Address of the cell with index k = start + cellsize * k
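
The address formula can be checked with a short sketch. The start address 2146 is an assumption, inferred from the example above (index 4 at address 2154 with 2-byte cells); cell_address is a hypothetical helper name.

```python
def cell_address(start, cell_size, k):
    """Return the memory address of the first byte of the cell at index k."""
    return start + cell_size * k

# Assumed layout for 'SAMPLE': 2 bytes per character, array starting at 2146.
START, CELL_SIZE = 2146, 2
for k, ch in enumerate('SAMPLE'):
    print(ch, cell_address(START, CELL_SIZE, k))
```

Because the computation is a single multiplication and addition, it takes the same time for any index k.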
Referential Arrays

● Python represents a list or tuple instance using an internal storage mechanism of an array of object references.

● So, at the lowest level, the array consists of a sequence of memory addresses at which the elements of the sequence reside.

● Each memory address is of constant size, and hence Python can provide constant-time access to a list or tuple element based on its index.

● The list or tuple simply keeps a sequence of references to the objects to be stored.

Figure: a list of strings of different sizes.
● A single list instance may include multiple references to the same object as elements of the list.

● It is also possible for a single object to be an element of two or more lists, as those lists simply store references back to that object.

● temp[2] = 15 does not change the existing integer object; it only changes the reference in cell 2 of temp so that it refers to a different immutable integer object, 15.

● backup = list(primes) creates a new list that contains the same references as primes (a shallow copy).

● A deep copy should be used when the elements of the list are mutable, so that the new list refers to copies of those elements.
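
A minimal sketch of the behaviour described above. The contents of primes and the slice used to build temp are assumptions chosen for illustration, and rows is a hypothetical list of mutable elements.

```python
import copy

primes = [2, 3, 5, 7, 11, 13, 17, 19]
temp = primes[3:6]            # new list holding references to 7, 11, 13
temp[2] = 15                  # rebinds cell 2 of temp; primes is unaffected

rows = [[1, 2], [3, 4]]       # a list whose elements are mutable lists
shallow = list(rows)          # new outer list, same inner lists
deep = copy.deepcopy(rows)    # new outer list and new inner lists

rows[0].append(99)            # visible through shallow, not through deep
print(primes[5], temp)        # 13 [7, 11, 15]
print(shallow[0], deep[0])    # [1, 2, 99] [1, 2]
```

With immutable elements such as integers, the shallow copy is sufficient, because no operation can change a shared element in place.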
● counters = [0]*8 produces a list of eight references to the same element, 0, which is an immutable object.

● counters[2] += 1 creates a new integer with value 0+1 and stores a reference to it in cell 2 of the array.

● Extending a list: the list receives references to the newly added elements.
extras = [23, 29, 31]
primes.extend(extras)
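
The statements above can be checked with the identity operator. This is a sketch only; the initial contents of primes are an assumption.

```python
counters = [0] * 8
# Every cell references the very same object.
same_object = all(c is counters[0] for c in counters)

counters[2] += 1              # builds a new int and rebinds only cell 2

primes = [2, 3, 5, 7, 11, 13, 17, 19]
extras = [23, 29, 31]
primes.extend(extras)         # primes receives references to the objects in extras
shared = all(p is e for p, e in zip(primes[-3:], extras))

print(same_object, counters, shared)
```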
Compact Arrays in Python
● A compact array stores primary data (in the form of bits) instead of references to it. Example: strings.

> from array import array
> primes = array('i', [2, 3, 5, 7, 11])

● Advantages of compact arrays:
○ Overall memory usage is much lower than with referential arrays, since there is no overhead for storing references.
○ Primary data is stored consecutively in memory, which gives higher computing performance.

● Primary support for compact arrays is in a module named array.
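
A short sketch of the array module in use. The size of type code 'i' depends on the platform, so no byte counts are assumed here.

```python
from array import array

primes = array('i', [2, 3, 5, 7, 11])    # 'i' is the type code for signed int

# Every cell has the same size, so the raw data occupies itemsize * len bytes.
raw_bytes = primes.itemsize * len(primes)
print(primes.itemsize, raw_bytes, len(primes.tobytes()))

try:
    primes.append(3.5)                    # a float does not fit type code 'i'
except TypeError as err:
    print('rejected:', err)
```

The type code fixes the cell size up front, which is what allows the data itself, rather than references, to be stored in the array.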
Dynamic Arrays
● When creating a low-level array, the precise size of that array must be specified explicitly at instantiation.

● The same applies to Python’s tuple and str instances, which are immutable: once an instance is created, its size and content cannot be altered.

● Python’s list class represents a dynamic array, in which elements can be added to or removed from the list.

● Semantics of a dynamic array
○ It maintains an underlying array that has greater capacity than the current length of the list.
○ If the reserved capacity is exhausted, the system allocates a new, larger array, copies the existing elements into it, and the older array is then reclaimed by the system.
● An empty list requires some space (64 bytes on this machine). It is needed to keep the following information:
○ The number of actual elements currently stored in the list.
○ The maximum number of elements that could be stored.
○ The reference to the currently allocated array.

● Each memory address is a 64-bit (8-byte) number, so 32 bytes are reserved for storing 4 object references.

● Reserve capacity increases from 32 bytes (4 cells) to 64 bytes (8 cells) and so on as the array is extended by adding new objects.

Figure annotations: space for 4 object references; space for 8 object references; space for 9 object references.
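
The measurements above can be reproduced with sys.getsizeof. This is a sketch; size_trace is a hypothetical helper, and the exact byte counts depend on the interpreter version and platform.

```python
import sys

def size_trace(n):
    """Return the size in bytes reported for a list after each of n appends."""
    data = []
    sizes = [sys.getsizeof(data)]
    for _ in range(n):
        data.append(None)
        sizes.append(sys.getsizeof(data))
    return sizes

for length, size in enumerate(size_trace(16)):
    print(f'Length: {length:3d}; Size in bytes: {size:4d}')
```

The reported size stays constant over several appends and then jumps, which is the reserve capacity being exhausted and a larger array being allocated.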
Implementing a dynamic array

● The key is to provide a means to grow the array A that stores the elements of a list.
● If an element is appended to a list at a time when the underlying array is full, we perform the following steps:
○ Allocate a new array B with larger capacity.
○ Set B[i] = A[i], for i = 0, ..., n-1, where n denotes the current number of items.
○ Set A = B, i.e., we use B as the array supporting the list.
○ Insert the new element in the new array.

● How large a new array should we create?
○ Common rule: the new array has twice the capacity of the existing array that has been filled.
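
The steps above can be sketched as a small class. As a simplifying assumption, a fixed-length Python list of None values stands in for the raw low-level array; the class and method names are illustrative.

```python
class DynamicArray:
    """Minimal dynamic array sketch using the doubling rule."""

    def __init__(self):
        self._n = 0                          # number of elements actually stored
        self._capacity = 1                   # size of the underlying array
        self._A = self._make_array(self._capacity)

    def __len__(self):
        return self._n

    def __getitem__(self, k):
        if not 0 <= k < self._n:
            raise IndexError('invalid index')
        return self._A[k]

    def append(self, obj):
        if self._n == self._capacity:        # underlying array is full
            self._resize(2 * self._capacity)
        self._A[self._n] = obj               # insert the new element
        self._n += 1

    def _resize(self, c):
        B = self._make_array(c)              # allocate a new array B
        for i in range(self._n):             # set B[i] = A[i]
            B[i] = self._A[i]
        self._A = B                          # use B as the supporting array
        self._capacity = c

    @staticmethod
    def _make_array(c):
        # Stand-in for a raw array of c cells (an assumption of this sketch).
        return [None] * c


d = DynamicArray()
for value in range(10):
    d.append(value)
print(len(d), d[9], d._capacity)
```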
Figure: Run-time of a series of append operations on a dynamic array (annotations: creation of new array + copying; copy operation).


Amortized Analysis

● Amortized analysis refers to determining the time-averaged running time for a sequence of operations.

● It is different from average-case analysis, as it does not make any assumption about the distribution of data values.

● Amortized analysis is a worst-case analysis for a sequence of operations rather than for individual operations.

● The motivation for amortized analysis is to better understand the running time of
certain techniques, where standard worst-case analysis provides an overly
pessimistic bound.
● Amortized analysis generally applies to a method that consists of a sequence of operations, where the vast majority of the operations are cheap but some are expensive. If we can show that the expensive operations are rare, we can charge them to the cheap operations and only bound the cheap operations.

● Assign an artificial cost to each operation in the sequence, such that the total of the artificial costs for the sequence of operations bounds the total of the real costs of the sequence.

● This artificial cost is called the amortized cost of an operation.


Amortized Analysis of Dynamic Arrays

● View the computer as a coin-operated appliance that requires the payment of 1 cyber-dollar for a constant amount of computing time.

● When an operation is executed, we should have enough cyber-dollars available in our current bank account to pay for that operation’s running time.

● Thus the total amount of cyber-dollars spent for any computation will be proportional to the total time spent on that computation.

● We can overcharge some operations in order to save up cyber-dollars to pay for others.
Proposition 5.1: Let S be a sequence implemented by means of a dynamic array with initial
capacity one, using the strategy of doubling the array size when full. The total time to
perform a series of n append operations in S, starting from S being empty, is O(n).

Justification:

● Let’s assume each append operation in S costs 1 cyber-dollar, excluding the time spent growing the array.

● Let’s assume growing the array from size k to size 2k requires k cyber-dollars.

● Let’s charge 3 cyber-dollars for each append operation that does not cause an overflow.

● This gives a profit of 2 cyber-dollars for each insertion that does not grow the array; the profit is stored in the cell where the element is inserted.

● An array overflow occurs when S has 2^i elements, for some integer i ≥ 0.

● Doubling the size of the array will then require 2^i cyber-dollars.
● These cyber-dollars are stored in cells 2^(i-1) through 2^i - 1, and they pay for doubling the array size.
● This is a valid amortization scheme in which each operation is charged 3 cyber-dollars and all computing time is paid for.
● Hence for n append operations, we need 3n cyber-dollars.
● The amortized running time of each append operation is O(1).
● Hence, the total running time of n append operations is O(n).
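
The accounting argument can be checked numerically. The sketch below uses the cost model of the justification (1 cyber-dollar per append, k cyber-dollars to grow from size k to 2k); total_append_cost is a hypothetical helper.

```python
def total_append_cost(n):
    """Real cost, in cyber-dollars, of n appends starting from capacity one."""
    capacity, size, cost = 1, 0, 0
    for _ in range(n):
        if size == capacity:      # overflow: grow from k to 2k for k cyber-dollars
            cost += capacity
            capacity *= 2
        cost += 1                 # the append itself
        size += 1
    return cost

for n in (1, 10, 100, 1000, 10_000):
    print(n, total_append_cost(n), 3 * n)
```

In every row the real cost stays below the 3n cyber-dollars collected, which is the claim of Proposition 5.1.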
● Geometric increase in capacity
○ The O(1) amortized bound per operation can be proven for any geometrically increasing progression of array sizes.

● Beware of arithmetic progression:
○ Reserving a constant number of additional cells during each array resize leads to poor performance.
○ At an extreme, adding one cell with each append operation leads to a quadratic overall cost: 1 + 2 + 3 + ... + n, which is Ω(n^2).
○ Using an increase of 2 or 3 cells at a time is slightly better but still leads to quadratic overall cost.
○ Using a fixed increment for each resize, and hence an arithmetic progression of intermediate array sizes, results in overall time that is quadratic in the number of operations.

Figure panels: increasing array size by 2 cells; increasing array size by 3 cells.
Proposition 5.2: Performing a series of n append operations on an initially empty dynamic array using a fixed increment with each resize takes Ω(n^2) time.

Justification:

● Let c > 0 represent the fixed increment in capacity that is used for each resize event.

● For a series of n append operations, there will be m = ⌈n/c⌉ resize operations, taking time proportional to c, 2c, 3c, ..., mc respectively.

● Overall time will be proportional to the sum: c + 2c + 3c + ... + mc = c·m(m+1)/2 ≥ c·(n/c)(n/c + 1)/2 ≥ n^2/(2c), which is Ω(n^2).
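
The difference between the two growth rules can be seen by counting how many elements are copied during resizes. This is a sketch under a simplified cost model (one unit per copied element); copy_cost and the two growth functions are hypothetical names.

```python
def copy_cost(n, grow):
    """Count elements copied during resizes over n appends, given a growth rule."""
    capacity, size, copies = 0, 0, 0
    for _ in range(n):
        if size == capacity:          # array is full: resize and copy everything
            capacity = grow(capacity)
            copies += size
        size += 1
    return copies

def add_one(capacity):                # arithmetic progression with increment 1
    return capacity + 1

def double(capacity):                 # geometric progression
    return max(1, 2 * capacity)

for n in (100, 1000, 10_000):
    print(n, copy_cost(n, add_one), copy_cost(n, double))
```

With a fixed increment the count grows quadratically in n, while with doubling it stays below 2n.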


Memory Usage and Shrinking an Array
● Another consequence of the rule of geometric increase in capacity when appending to
a dynamic array is that the final array size is guaranteed to be proportional to the
overall number of elements. That is, the data structure uses O(n) memory.

● If a container, such as a Python list, provides operations that cause the removal of one
or more elements, greater care must be taken to ensure that a dynamic array
guarantees O(n) memory usage.

● Repeated insertions may cause the underlying array to grow arbitrarily large; after many elements are removed, there will no longer be a proportional relationship between the actual number of elements and the array capacity.

● A robust implementation of such a data structure will shrink the underlying array, on
occasion, while maintaining O(1) amortized bound on individual operations.
Python’s List class

● Python’s implementation of the append method exhibits amortized constant-time behaviour.

● As we can see, the amortized time for each append is independent of n.

n     100     1000    10,000   100,000   1,000,000   100,000,000
µs    0.278   0.181   0.087    0.099     0.133       0.079
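
A measurement of this kind can be sketched as follows. compute_average is a hypothetical helper, and the timings obtained will differ from the table above depending on the machine and interpreter.

```python
from time import perf_counter

def compute_average(n):
    """Perform n appends to an empty list and return the average time per append."""
    data = []
    start = perf_counter()
    for _ in range(n):
        data.append(None)
    end = perf_counter()
    return (end - start) / n

for n in (100, 1000, 10_000, 100_000, 1_000_000):
    print(f'n = {n:>9,d}: {compute_average(n) * 1e6:.3f} µs per append')
```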


Efficiency of Python’s Sequence Types

● Tuples are typically more memory efficient than lists because they are immutable, so there is no need for an underlying dynamic array with surplus capacity.
● The tuple class supports the non-mutating behaviours of the list class.
● Constant-time operations:
○ len(data) and data[j]
● Searching for occurrences of a value
○ Computing count requires looping over all elements in the array.
○ The loops for checking containment of an element or determining the index of an element exit immediately once they find the leftmost occurrence of the desired value.
○ Lexicographic comparisons: in the worst case, evaluation requires an iteration time proportional to the length of the shorter of the two sequences.
○ Creating new instances (the last 3 rows of the table): running time depends on the length of the new array to be created.

Table: Asymptotic performance of the non-mutating behaviors of the list and tuple classes.
Mutating Behaviours

● The __setitem__() method requires constant time.
● Adding element to a list
● Removing element from a list
● Extending a list
● Constructing new list

Table: Asymptotic performance of the mutating behaviors of the list class.
Adding element to a list

● insert(k, value), 0 <= k <= n, requires shifting all subsequent elements back one slot to make room. Its cost has two parts:
○ Array resizing, with a worst-case run-time of Ω(n) but only O(1) amortized time per operation.
○ Shifting of elements to make room for the new item, giving O(n - k + 1) amortized performance for inserting at index k.
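
The shifting step can be sketched on a fixed-capacity array. As an assumption of this sketch, a Python list with one spare cell stands in for the underlying array and resizing is left out; insert_at is a hypothetical helper.

```python
def insert_at(cells, n, k, value):
    """Insert value at index k among the first n cells; return the new count.

    Assumes len(cells) > n, i.e. there is at least one free cell.
    """
    for j in range(n, k, -1):         # shift cells k..n-1 one slot to the right
        cells[j] = cells[j - 1]
    cells[k] = value                  # store the new element in the freed cell
    return n + 1

cells = [1, 2, 3, None]               # three elements, capacity four
count = insert_at(cells, 3, 1, 9)
print(cells, count)                   # [1, 9, 2, 3] 4
```

The loop performs n - k shifts, so inserting at index 0 is the most expensive case and inserting at index n needs no shifting at all.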
Analyzing the run-time performance of Python’s insert() operation.

● Inserting at the beginning of the list is the most expensive, requiring linear time per operation.

● Inserting at the middle takes about half the time of inserting at the beginning, but still Ω(n) time.

● Inserting at the end displays O(1) behaviour.

Table: Average running time of insert(k, val), measured in microseconds, over a sequence of N calls.
Removing Elements from a List

● pop() removes the last element from a list in O(1) amortized time.

● pop(k) removes the element at index k < n of a list in O(n-k) time, as the amount of shifting depends on the choice of k. pop(0) is the most expensive, using Ω(n) time.

● remove() removes a specified value (without specifying any index) from the list and requires Ω(n) time.
Extending a List
● Used to add all elements of one list to the end of a second list.
● data.extend(other) is equivalent to:
    for element in other:
        data.append(element)

● Running time is proportional to the length of the other list, O(n); the bound is amortized because the first list may have to be resized to accommodate the additional elements.

● The extend method is preferable to repeated calls to append. It offers three advantages:
○ Built-in Python implementations are comparatively efficient, as they are implemented natively in a compiled language (compared to interpreted Python code).
○ There is less overhead in calling a single function that accomplishes all the work than in making many individual function calls.
○ The resulting size of the updated list can be calculated in advance, so the underlying array needs to be resized at most once.
Constructing New list
● Asymptotic efficiency is linear in the length of the list that is created.

● List comprehension syntax is significantly faster than building the list by repeatedly appending.

● It is more efficient to create a list by using [0]*n than by building such a list incrementally.
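
The alternatives compared above can be sketched as follows. The list of squares is an illustrative example chosen for this sketch; all variable names are hypothetical.

```python
n = 1000

# Building a list by repeated appends.
squares_append = []
for k in range(1, n + 1):
    squares_append.append(k * k)

# The same list built with a list comprehension.
squares_comp = [k * k for k in range(1, n + 1)]

# A list of n zeros, built incrementally and with the multiplication operator.
zeros_append = []
for _ in range(n):
    zeros_append.append(0)
zeros_mult = [0] * n

print(squares_append == squares_comp, zeros_append == zeros_mult)
```

Both pairs produce equal lists; the difference is only in running time, which can be measured with the timeit module.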
Python’s String Class
Some Notations:

● Length of a string: n
● Length of a second string: m

Some common behaviours:

● Methods that produce a new string (e.g., capitalize(), center(), strip()) require time that is linear in the length of the string that is produced.

● Behaviours that test Boolean conditions on a string (e.g., islower()) take O(n) time, examining all n characters in the worst case.
● Pattern Matching - finding one string within a larger string
○ Examples: __contains__, find, index, count, replace, split
○ A naive implementation leads to O(mn) time complexity, because there are n-m+1 starting indices for the pattern and we spend O(m) time at each starting position checking whether the pattern matches.

● Composing Strings
○ Creating new string from another string.
○ Example: We have a large string named ‘document’. Our goal is to create a new string ‘letters’
that contains only alphabetic characters of the original string ( with spaces, numbers and
punctuation removed).
○ Implementation #1
○ Implementation #2
○ Implementation #3 (use list comprehension instead of repeated calls to append)
○ Implementation #4 (avoids the temporary list altogether)
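
A minimal sketch of four implementations matching the descriptions above. The bodies of #1 (repeated string concatenation) and #2 (repeated appends to a temporary list, then a single join) are assumptions inferred from the descriptions of #3 and #4; the function names and the sample document are hypothetical.

```python
def letters_v1(document):
    """#1 (assumed): repeated string concatenation."""
    letters = ''
    for c in document:
        if c.isalpha():
            letters += c              # may build a new string on each pass
    return letters

def letters_v2(document):
    """#2 (assumed): append to a temporary list, then join once."""
    temp = []
    for c in document:
        if c.isalpha():
            temp.append(c)
    return ''.join(temp)

def letters_v3(document):
    """#3: list comprehension instead of repeated calls to append."""
    return ''.join([c for c in document if c.isalpha()])

def letters_v4(document):
    """#4: generator expression, avoiding the temporary list."""
    return ''.join(c for c in document if c.isalpha())

document = 'Lecture 5: Array-Based Sequences!'
print(letters_v4(document))           # LectureArrayBasedSequences
```

Since strings are immutable, the concatenation in #1 can copy the growing result repeatedly, which is why the join-based versions are generally preferred for large documents.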


Using Array-Based Sequences

We will study the following three examples:

● Storing High scores for a game


● Sorting a Sequence
● Simple Cryptography
Storing High Scores for a game

● The objective is to store high score entries for a video game.

● Each game entry has a name and a score. The corresponding class is GameEntry.

● To maintain a sequence of high scores, we create another class called Scoreboard.

● A Scoreboard can hold a limited number of entries. A new score only qualifies for the scoreboard if it is strictly higher than the lowest of the high scores on the board.

● When a new entry is made to the scoreboard, elements are shifted to the right to create an appropriate space for the new score.

Figure: Scoreboard implemented as a dynamic array.
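
The two classes can be sketched as follows. The attribute names, the default capacity of 10, and the sample names and scores are assumptions made for illustration; a fixed-length list of None values stands in for the underlying array.

```python
class GameEntry:
    """One entry of a list of high scores."""

    def __init__(self, name, score):
        self.name = name
        self.score = score


class Scoreboard:
    """Fixed-capacity sequence of high scores in non-increasing order."""

    def __init__(self, capacity=10):
        self._board = [None] * capacity   # reserve space for future scores
        self._n = 0                       # number of actual entries

    def scores(self):
        return [self._board[j].score for j in range(self._n)]

    def add(self, entry):
        """Add entry if it qualifies; a full board needs a strictly higher score."""
        score = entry.score
        has_room = self._n < len(self._board)
        if has_room or score > self._board[-1].score:
            if has_room:
                self._n += 1              # no entry is dropped in this case
            j = self._n - 1
            # Shift lower scores one cell to the right to open a gap.
            while j > 0 and self._board[j - 1].score < score:
                self._board[j] = self._board[j - 1]
                j -= 1
            self._board[j] = entry


board = Scoreboard(capacity=3)
for name, score in [('Ann', 50), ('Bob', 70), ('Cy', 60), ('Dee', 40), ('Eve', 80)]:
    board.add(GameEntry(name, score))
print(board.scores())                     # [80, 70, 60]
```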
Sorting a Sequence

● The objective is to sort an unordered sequence of elements into non-decreasing order of their values.

● Insertion-sort Algorithm
○ Start with the first element in the array. One element by itself is sorted.
○ Consider the second element. If it is smaller than the first, we swap them.
○ Consider the third element. We swap it leftward until it is in the proper order with the first
two elements.
○ And so on ...
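
The algorithm can be sketched as follows. Instead of swapping adjacent elements repeatedly, this version shifts larger elements one cell to the right and then places the current element, which has the same effect; insertion_sort is an illustrative name.

```python
def insertion_sort(A):
    """Sort list A in place into non-decreasing order."""
    for k in range(1, len(A)):            # A[0:k] is already sorted
        cur = A[k]                        # element to insert
        j = k
        while j > 0 and A[j - 1] > cur:   # move larger elements rightward
            A[j] = A[j - 1]
            j -= 1
        A[j] = cur                        # cur is now in its proper place

data = [5, 2, 9, 1, 5]
insertion_sort(data)
print(data)                               # [1, 2, 5, 5, 9]
```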
Simple Cryptography

● Caesar Cipher: Involves replacing each letter in a message with the letter that is a certain number of letters after it in the alphabet.

● The replacement rule could be represented using another string.

Figure: plaintext, Encryption, ciphertext, Decryption.

● In order to find a replacement for a character in our Caesar cipher, we need to map the characters A to Z to the respective numbers 0 to 25.

● The formula for doing that conversion is j = ord(c) - ord('A').

● Encryption: replace each letter i with the letter (i+r) % 26, where r is the shift.

● Decryption: replace each letter i with the letter (i-r) % 26.
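
The formulas above can be sketched as a single function. It handles only the uppercase letters A to Z and leaves every other character unchanged, which is an assumption of this sketch; caesar is a hypothetical name, and decryption is done by passing the negated shift.

```python
def caesar(message, shift):
    """Shift each uppercase letter of message by shift positions, wrapping at Z."""
    out = []
    for c in message:
        if 'A' <= c <= 'Z':
            j = ord(c) - ord('A')                        # map A..Z to 0..25
            out.append(chr((j + shift) % 26 + ord('A')))
        else:
            out.append(c)                                # leave other characters
    return ''.join(out)

secret = caesar('ABC XYZ', 3)
print(secret)                  # DEF ABC
print(caesar(secret, -3))      # ABC XYZ
```

Python's % operator returns a non-negative result for a positive modulus, so (i-r) % 26 needs no special handling for negative values.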
Multi-dimensional Data Sets

● Multi-dimensional data sets have more than one dimension. A two-dimensional list may be called a matrix.

● Examples:
○ Geographical information may be naturally represented in two dimensions.
○ Range data obtained from a lidar is 3D data.
○ Monochrome images are 2D images.

Two-dimensional integer dataset (number of stores in various regions of a district):

data = [ [22, 18, 709, 5, 33], [45, 32, 830, 120, 750],
[4, 880, 45, 66, 61] ]

Each element can be accessed using the syntax data[1][3], which here refers to the value 120.
Constructing multi-dimensional list

● Quickly initialize a one-dimensional list:
data = [0] * n
Creates a 1-D list.

● Initialize a multi-dimensional list:
data = [[0] * c] * r
The problem is that all r entries of the list known as data are references to the same instance of a list of c zeros.

# Correct implementation for initializing a 2-dimensional list
data = [[0] * c for j in range(r)]
This ensures that each cell of the primary list refers to an independent instance of a secondary list.
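
The aliasing problem and its fix can be demonstrated with a short sketch. The dimensions r and c and the variable names aliased and independent are chosen for illustration.

```python
r, c = 3, 4

aliased = [[0] * c] * r                       # r references to ONE list of c zeros
aliased[0][0] = 7                             # shows up in every row

independent = [[0] * c for _ in range(r)]     # r separate lists of c zeros
independent[0][0] = 7                         # changes the first row only

print(aliased)
print(independent)
print(aliased[0] is aliased[1], independent[0] is independent[1])   # True False
```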
Summary
We studied the following in this lecture:

● Working of low-level arrays: initialization, accessing elements, etc.
● Referential arrays and compact arrays.
● Creating and managing a dynamic array.
● Understanding Python’s array-based sequences: list, tuple, string.
● Analyzing the run-time complexity of managing these array-based sequences.
● A few applications of array-based sequences.
● Creating multi-dimensional arrays.
