DSAP-Lecture 5 - Array Based Sequences
Overview
● Python Sequence Types
● Low-level arrays
○ Referential Arrays
○ Compact Arrays
● Dynamic Arrays and Amortization
○ Implementing a dynamic array
○ Amortized analysis of dynamic array
○ Python’s list class
● Efficiency of Python’s sequence types
○ List and Tuple classes
○ String class
● Using Array-based sequences
○ Storing High scores for a game
○ Sorting a sequence
○ Simple Cryptography
● Multidimensional datasets
Python’s Sequence Types
● Computer memory is composed of bits. Those bits are typically grouped into larger
units that depend upon the precise system architecture. Such a typical unit is a byte,
which is equivalent to 8 bits.
● Each byte of information in the memory is associated with a unique number that serves
as its memory address.
● Random Access Memory (RAM): Despite the sequential layout of memory locations,
computer hardware is designed to access any memory location efficiently using its
memory address. That is, it is just as easy to retrieve byte #8675309 as it is to retrieve
byte #309.
● Each location within the array is called a cell, and an integer index is used to
describe the cell location in the array.
● Each cell of an array must use the same number of bytes. This allows an arbitrary
cell to be accessed in constant time.
● As an example, the cell with index 4 contains ‘L’, which is stored in memory
locations 2154 and 2155.
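The constant-time access follows from simple arithmetic on addresses. The sketch below is illustrative only: the helper name `cell_address` is hypothetical, and the start address 2146 is an assumption derived from the example (index 4 at address 2154 with two bytes per cell), not a value stated in the text.

```python
def cell_address(start, index, cell_size):
    """Return the address of the first byte of the cell at the given index."""
    return start + index * cell_size

# Assumed values: 2 bytes per character, array starting at address 2146.
first_byte = cell_address(2146, 4, 2)
print(first_byte, first_byte + 1)
```

Because the computation is one multiplication and one addition regardless of the index, locating any cell takes the same amount of work.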
● The same applies to Python’s tuple and str instances, which are immutable:
once created, their size and content cannot be altered.
● Python’s list class represents a dynamic array where elements can be added to
or removed from the list.
○ If the reserved capacity is exhausted, the system allocates a new, larger
array, and the old array is then reclaimed by the system.
● An empty list requires some space (64 bytes on this machine). It is required for
keeping the following information:
○ Number of actual elements currently stored in the list.
○ The maximum number of elements that could be stored.
● 32 bytes are reserved for storing 4 object references.
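One way to observe this bookkeeping is to watch the size reported by `sys.getsizeof` while appending to a list. This is a minimal sketch: the helper name `list_sizes` is hypothetical, and the byte counts it prints depend on the Python implementation and the machine, so they may differ from the figures quoted above.

```python
import sys

def list_sizes(n):
    """Return (length, size in bytes) pairs observed while appending n items."""
    data = []
    observed = []
    for _ in range(n):
        observed.append((len(data), sys.getsizeof(data)))
        data.append(None)
    return observed

for length, size in list_sizes(10):
    print(f'Length: {length:3d}; size in bytes: {size:4d}')
```

On CPython the reported size typically stays flat for several appends and then jumps, which suggests that capacity is reserved in chunks rather than one cell at a time.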
● The key is to provide a means to grow the array A that stores the elements of a list.
● If an element is appended to a list at a time when the underlying array is full, we perform the
following steps:
○ Allocate a new array B with larger capacity.
○ Set B[i] = A[i], for i = 0, ..., n-1, where n denotes the current number of items.
○ Set A = B, i.e, we use B as the array supporting the list.
○ Insert the new element in the new array.
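The steps above can be sketched as a small class. This is one possible implementation, not necessarily the one used in the lecture: the class name `DynamicArray`, its method names, the initial capacity of 1, and the choice to double the capacity are assumptions made for illustration. The raw array comes from the standard `ctypes` module.

```python
import ctypes

class DynamicArray:
    """A simplified dynamic array, loosely modelled on a Python list."""

    def __init__(self):
        self._n = 0                                  # number of stored elements
        self._capacity = 1                           # assumed initial capacity
        self._A = self._make_array(self._capacity)   # low-level array

    def __len__(self):
        return self._n

    def __getitem__(self, k):
        if not 0 <= k < self._n:
            raise IndexError('invalid index')
        return self._A[k]

    def append(self, obj):
        if self._n == self._capacity:                # underlying array is full
            self._resize(2 * self._capacity)         # assumed policy: double
        self._A[self._n] = obj                       # insert the new element
        self._n += 1

    def _resize(self, c):
        B = self._make_array(c)                      # allocate larger array B
        for k in range(self._n):                     # copy: B[k] = A[k]
            B[k] = self._A[k]
        self._A = B                                  # use B from now on
        self._capacity = c

    def _make_array(self, c):
        return (c * ctypes.py_object)()              # array of c references

arr = DynamicArray()
for value in range(10):
    arr.append(value)
print(len(arr), arr[9])
```

The old array is not freed explicitly; once `self._A` refers to `B`, nothing references the old array and Python reclaims it.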
[Figure: copy operation]
● The motivation for amortized analysis is to better understand the running time of
certain techniques, where standard worst-case analysis provides an overly
pessimistic bound.
● Amortized analysis generally applies to a method that consists of a sequence
of operations, where the vast majority of the operations are cheap, but some
of the operations are expensive. If we can show that expensive operations
are rare, we can charge them to the cheap operations, and only bound the
cheap operations.
● Assign an artificial cost to each operation in the sequence, such that the total
of the artificial costs for the sequence of operations bounds the total of the
real costs of the sequence.
● Thus the total amount of cyber-dollars spent for any computation will be
proportional to the total time spent on that computation.
Justification:
● Let’s assume each append operation in S costs 1 cyber-dollar excluding time spent for
growing the array.
● Let’s assume growing the array from size k to size 2k requires k cyber-dollars.
● Let’s charge 3 cyber-dollars for each append operation, so an append that does not
cause an overflow is overcharged by 2 cyber-dollars.
● This profit of 2 cyber-dollars for each insertion that does not grow the array is
stored in the cell where the element is inserted.
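The accounting argument can be checked numerically. The sketch below simulates the cost model stated in the justification (1 cyber-dollar per append, k cyber-dollars to grow from size k to 2k) and compares the total with a budget of 3 cyber-dollars per append. The function name `doubling_cost` and the initial capacity of 1 are assumptions.

```python
def doubling_cost(n):
    """Total cyber-dollars actually spent on n appends with capacity doubling."""
    capacity, size, cost = 1, 0, 0
    for _ in range(n):
        if size == capacity:
            cost += capacity      # growing from k to 2k costs k
            capacity *= 2
        cost += 1                 # the append itself
        size += 1
    return cost

for n in (1, 10, 100, 1000):
    print(n, doubling_cost(n), 3 * n)
```

In each row the actual cost stays at or below 3n, which is consistent with an O(1) amortized cost per append.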
○ Increasing the capacity by 2 or 3 cells at a time is slightly better, but still
leads to quadratic overall cost.
Justification:
● Let c > 0 represent the fixed increment in capacity that is used for each
resize event.
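The quadratic growth can be seen with a small simulation of a fixed-increment policy. This is a sketch under assumptions: the function name `fixed_increment_cost` is hypothetical, the increment is called `c` here, the initial capacity is taken to be `c`, and each resize is charged one unit per element copied.

```python
def fixed_increment_cost(n, c):
    """Total cost of n appends when capacity grows by c cells per resize."""
    capacity, size, cost = c, 0, 0
    for _ in range(n):
        if size == capacity:
            cost += size          # copy every existing element
            capacity += c
        cost += 1                 # the append itself
        size += 1
    return cost

for n in (1000, 2000, 4000):
    print(n, fixed_increment_cost(n, 2))
```

Doubling n roughly quadruples the total, the signature of quadratic cost; a larger c lowers the constant factor but not the growth rate.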
● If a container, such as a Python list, provides operations that cause the removal of one
or more elements, greater care must be taken to ensure that a dynamic array
guarantees O(n) memory usage.
● Repeated insertions may cause the underlying array to grow arbitrarily large, and
after many removals there may no longer be a proportional relationship between
the actual number of elements and the array capacity.
● A robust implementation of such a data structure will shrink the underlying array, on
occasion, while maintaining an O(1) amortized bound on individual operations.
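One common shrinking policy, shown here as an assumption rather than as the policy Python’s list uses, is to halve the capacity whenever the number of elements falls below one quarter of it. The class name `ShrinkingArray` is hypothetical, and a plain Python list of fixed length stands in for the low-level array to keep the sketch short.

```python
class ShrinkingArray:
    """Dynamic array that doubles when full and halves when under 1/4 full."""

    def __init__(self):
        self._n = 0
        self._capacity = 1
        self._A = [None] * self._capacity   # stand-in for a low-level array

    def __len__(self):
        return self._n

    def append(self, obj):
        if self._n == self._capacity:
            self._resize(2 * self._capacity)
        self._A[self._n] = obj
        self._n += 1

    def pop(self):
        if self._n == 0:
            raise IndexError('pop from empty array')
        self._n -= 1
        obj = self._A[self._n]
        self._A[self._n] = None             # drop the stale reference
        if self._capacity > 1 and self._n < self._capacity // 4:
            self._resize(self._capacity // 2)
        return obj

    def _resize(self, c):
        B = [None] * c
        for k in range(self._n):
            B[k] = self._A[k]
        self._A = B
        self._capacity = c

arr = ShrinkingArray()
for value in range(64):
    arr.append(value)
for _ in range(60):
    arr.pop()
print(len(arr), arr._capacity)
```

Shrinking at one quarter rather than one half leaves slack after each resize, so alternating appends and pops near a boundary do not trigger a resize every time.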
Python’s List class
● The running time of extend is proportional to the length of the other list, i.e. O(n).
The bound is amortized because the first list may need to be resized to accommodate
the additional elements.
● The extend method is preferable to repeated calls to append. It offers three advantages:
○ Built-in Python implementations are comparatively efficient as they are implemented
natively in a compiled language (compared to interpreted Python code).
○ There is less overhead in calling a single function that accomplishes all the work,
versus many individual function calls.
○ The increased efficiency of extend comes from the fact that the resulting size of
the updated list can be calculated in advance.
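The two approaches produce the same list; they differ only in how the work is done. A minimal sketch, with hypothetical helper names `grow_with_append` and `grow_with_extend`:

```python
def grow_with_append(data, other):
    """Add every item of other to data, one append call per item."""
    for item in other:
        data.append(item)
    return data

def grow_with_extend(data, other):
    """Add every item of other to data with a single extend call."""
    data.extend(other)
    return data

a = grow_with_append([1, 2], range(3, 6))
b = grow_with_extend([1, 2], range(3, 6))
print(a == b)
```

To compare running times on a particular machine, the standard `timeit` module can be applied to both functions with a large `other`.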
Constructing New list
● Asymptotic efficiency is linear in length of the
list that is created.
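For illustration, here are three common ways to construct a new list, each doing work linear in the length of the result. The variable names are arbitrary.

```python
n = 5

# Repeated append inside a loop.
squares_loop = []
for k in range(1, n + 1):
    squares_loop.append(k * k)

# List comprehension producing the same list.
squares_comp = [k * k for k in range(1, n + 1)]

# Multiplication operator to initialize a list of constant values.
zeros = [0] * n

print(squares_loop, squares_comp, zeros)
```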
● Length of a string: n
● Length of a second string: m
● Composing Strings
○ Creating new string from another string.
○ Example: We have a large string named ‘document’. Our goal is to create a new string ‘letters’
that contains only alphabetic characters of the original string (with spaces, numbers and
punctuation removed).
○ Implementation #1
○ Implementation #2
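The code for the two implementations is not present in the text. The sketch below shows two approaches commonly contrasted for this task, repeated string concatenation and building a list that is joined once at the end; whether they match the lecture’s versions is an assumption, as is the sample value of `document`.

```python
document = 'Hello, World! 123'   # assumed sample input

# Approach A: repeated concatenation. Strings are immutable, so each +=
# may build a new string, which can lead to quadratic total time.
letters_concat = ''
for c in document:
    if c.isalpha():
        letters_concat += c

# Approach B: collect characters in a list, then join once (linear time).
temp = []
for c in document:
    if c.isalpha():
        temp.append(c)
letters_join = ''.join(temp)

print(letters_concat, letters_join)
```

The second approach can be shortened to `''.join(c for c in document if c.isalpha())`.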
● Insertion-sort Algorithm
○ Start with the first element in the array. One element by itself is sorted.
○ Consider the second element. If it is smaller than the first, we swap them.
○ Consider the third element. We swap it leftward until it is in the proper order with the first
two elements.
○ And so on ...
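The steps above can be sketched as follows. The function name `insertion_sort` is chosen here for illustration; the sort is in place and uses adjacent swaps, following the description.

```python
def insertion_sort(A):
    """Sort list A in place into nondecreasing order."""
    for k in range(1, len(A)):          # A[0:k] is already sorted
        j = k
        # Swap the element leftward until it is in its proper position.
        while j > 0 and A[j - 1] > A[j]:
            A[j - 1], A[j] = A[j], A[j - 1]
            j -= 1

data = [5, 2, 4, 6, 1, 3]
insertion_sort(data)
print(data)
```

In the worst case (input in reverse order) every element travels all the way to the front, giving O(n^2) running time; on an already sorted list the inner loop never runs.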
Simple Cryptography
[Figure: plaintext, Encryption, ciphertext, Decryption]
● Caesar Cipher: Involves replacing each letter in a message with the letter
that is a certain number of letters after it in the alphabet.
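A minimal sketch of the Caesar cipher described above. The function name `caesar`, the restriction to uppercase letters A to Z, the wrap-around at the end of the alphabet, and the shift of 3 in the usage lines are all assumptions made for illustration.

```python
def caesar(message, shift):
    """Shift each uppercase letter by `shift` places, wrapping around A-Z."""
    result = []
    for ch in message:
        if 'A' <= ch <= 'Z':
            offset = (ord(ch) - ord('A') + shift) % 26
            result.append(chr(ord('A') + offset))
        else:
            result.append(ch)           # leave other characters unchanged
    return ''.join(result)

secret = caesar('HELLO WORLD', 3)       # encryption
original = caesar(secret, -3)           # decryption uses the opposite shift
print(secret, original)
```

The characters are collected in a list and joined once at the end rather than concatenated one at a time, since strings are immutable.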