PDSA Week 1
Collaboration
Share your code
Collaborative development
Report your results
Documentation
Interleave with the code
Switch between different versions of code
Export and import your project
Preserve your output
Jupyter notebook
A sequence of cells
Like a one dimensional spreadsheet
Cells hold code or text
Markdown notation for formatting
https://www.markdownguide.org/
Edit and re-run individual cells to update environment
Supports different kernels
Julia, Python, R
We will use it only for Python
Widely used to document and disseminate ML projects
Solutions to problems posed on platforms like Kaggle
https://www.kaggle.com
Won ACM Software Systems Award 2017
Google Colab
Google Colaboratory (Colab)
colab.research.google.com
Free to use
Similar to jupyter notebook, online
Customized Jupyter notebook
All standard packages required for ML are preloaded
scikit-learn, tensorflow
Access to GPU hardware
Python Recap - I
03 October 2021 00:43
Computing gcd
gcd(m, n) - greatest common divisor
Largest k that divides both m and n
gcd(8, 12) = 4
gcd(18, 25) = 1
Also hcf - highest common factor
gcd(m, n) always exists
1 divides both m and n
Computing gcd(m, n)
gcd(m, n) <= min(m, n)
Compute list of common factors from 1 to min(m, n)
Return the last such common factor
Code
def gcd(m, n):
    cf = []  # List of common factors
    for i in range(1,min(m,n)+1):
        if (m%i) == 0 and (n%i) == 0:
            cf.append(i)
    return(cf[-1])
Points to note
Need to initialize cf for cf.append() to work
Variables (names) derive their type from the value they hold
Control flow
Conditionals (if)
Loops (for)
range(i,j) runs from i to j-1
List indices run from 0 to len(l) - 1 and backwards from -1 to -len(l)
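The points about range() and list indices can be checked directly. The following is a small sketch; the variable names are chosen only for illustration.

```python
# range(i, j) yields i, i+1, ..., j-1; the upper bound j is excluded.
values = list(range(1, 6))            # [1, 2, 3, 4, 5]

# Valid indices run 0 .. len(l)-1 forwards and -1 .. -len(l) backwards.
first, last = values[0], values[-1]   # 1 and 5
same_element = values[-len(values)] == values[0]   # True

# A name takes its type from the value it currently holds.
x = 10
type_before = type(x).__name__        # 'int'
x = [10]
type_after = type(x).__name__         # 'list'
```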
Eliminate the list
Since only last element of cf is needed / important
Keep track of most recent common factor (mrcf)
Recall that 1 is always a common factor
No need to initialize mrcf
Code
def gcd(m, n):
    for i in range(1,min(m,n)+1):
        if (m%i) == 0 and (n%i) == 0:
            mrcf = i
    return(mrcf)
Efficiency
Both versions of gcd take time proportional to min(m,n)
Can we do better?
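One modest improvement, sketched here as an illustration and not taken from the notes above, is to scan candidates downwards from min(m, n) and stop at the first common factor. The name gcd_downward is hypothetical, and the sketch assumes m and n are positive. In the worst case (gcd equal to 1) it still takes time proportional to min(m, n).

```python
def gcd_downward(m, n):
    # Try candidates from min(m, n) down to 1; the first common factor
    # found this way is the largest one, so it is the gcd.
    for i in range(min(m, n), 0, -1):
        if m % i == 0 and n % i == 0:
            return i

print(gcd_downward(8, 12))    # 4
print(gcd_downward(18, 25))   # 1
```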
Python Recap - II
03 October 2021 00:59
Checking primality
A prime number n has exactly two factors, 1 and n
Note that 1 is not a prime
Compute the list of factors of n
n is a prime if the list of factors is precisely [1,n]
Code
def factors(n):
    fl = []  # factor list
    for i in range(1,n+1):
        if (n%i) == 0:
            fl.append(i)
    return(fl)
def prime(n):
    return(factors(n) == [1,n])
Counting primes
List all primes up to m
def primesupto(m):
    pl = []  # prime list
    for i in range(1,m+1):
        if prime(i):
            pl.append(i)
    return(pl)
List the first m primes
def firstprimes(m):
    (count,i,pl) = (0,1,[])
    while (count < m):
        if prime(i):
            (count,pl) = (count+1,pl+[i])
        i = i+1
    return(pl)
for vs while
Is the number of iterations known in advance?
Ensure progress to guarantee termination of while
Computing primes
Directly check if n has a factor between 2 and n-1
def prime(n):
    result = True
    for i in range(2,n):
        if (n%i) == 0:
            result = False
    return(result)
def prime(n):
    result = True
    for i in range(2,n):
        if (n%i) == 0:
            result = False
            break  # Abort loop
    return(result)
def prime(n):
    (result,i) = (True,2)
    while (result and (i < n)):
        if (n%i) == 0:
            result = False
        i = i+1
    return(result)
import math
def prime(n):
    (result,i) = (True,2)
    # Use <= so that perfect squares such as 4 and 9 are rejected
    while (result and (i <= math.sqrt(n))):
        if (n%i) == 0:
            result = False
        i = i+1
    return(result)
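A quick way to gain confidence in a square-root bound is to compare it against the definition-based test on a range of small numbers. The sketch below uses hypothetical names (prime_by_factors, prime_sqrt) and assumes Python 3.8 or later for math.isqrt.

```python
import math

def prime_by_factors(n):
    # Definition-based check: the factor list must be exactly [1, n].
    return [i for i in range(1, n + 1) if n % i == 0] == [1, n]

def prime_sqrt(n):
    # Trial division up to and including floor(sqrt(n)).
    if n < 2:
        return False
    for i in range(2, math.isqrt(n) + 1):
        if n % i == 0:
            return False
    return True

# The two tests should agree everywhere. Perfect squares such as 4, 9
# and 25 are the cases that go wrong if the bound stops short of sqrt(n).
mismatches = [n for n in range(1, 500)
              if prime_by_factors(n) != prime_sqrt(n)]
print(mismatches)   # []
```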
Properties of primes
There are infinitely many primes
How are they distributed?
Twin primes: p, p + 2
In general, 2^k - 1 and 2^k + 1
Odd in general
Twin prime conjecture
Are there infinitely many twin primes?
Compute the differences between primes
Use a dictionary
Key - difference
Value - frequency
Start checking from 3, since 2 is the smallest prime
def primediffs(n):
    lastprime = 2
    pd = {}  # Dictionary for prime differences
    for i in range(3,n+1):
        if prime(i):
            d = i - lastprime
            lastprime = i
            if d in pd.keys():
                pd[d] = pd[d] + 1
            else:
                pd[d] = 1
    return(pd)
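As a small worked example of the dictionary of prime differences, here is a self-contained sketch. The names is_prime and prime_gaps are illustrative, and dict.get() is used as a shorter alternative to the explicit membership test.

```python
def is_prime(n):
    # Trial division; adequate for the small range used here.
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def prime_gaps(n):
    # Map each gap between consecutive primes <= n to how often it occurs.
    gaps = {}
    last_prime = 2
    for i in range(3, n + 1):
        if is_prime(i):
            d = i - last_prime
            gaps[d] = gaps.get(d, 0) + 1   # get() supplies 0 for a new key
            last_prime = i
    return gaps

# Primes up to 20: 2, 3, 5, 7, 11, 13, 17, 19
print(prime_gaps(20))   # {1: 1, 2: 4, 4: 2}
```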
Python Recap - III
03 October 2021 01:27
Computing gcd
Can we do better?
So far, our gcd computations have used brute force. This is called the naive approach.
def gcd(m,n):
    (a,b) = (max(m,n),min(m,n))
    if a%b == 0:
        return(b)
    else:
        return(gcd(b,a-b))
Euclid's algorithm
Suppose n does not divide m
Then m = qn + r
Suppose d divides both m and n
Then m = ad, n = bd
m = qn + r => ad = q(bd) + r
r must be of the form cd
Euclid's algorithm
If n divides m, gcd(m,n) = n
Otherwise, compute gcd(n,m mod n)
def gcd(m,n):
    (a,b) = (max(m,n),min(m,n))
    if a%b == 0:
        return(b)
    else:
        return(gcd(b,a%b))
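The argument behind Euclid's algorithm says that (m, n) and (n, m mod n) have the same common divisors. The sketch below checks this on one pair and also shows an iterative form of the algorithm. The names gcd_iterative and common_divisors are illustrative, and math.gcd from the standard library is used only as a reference.

```python
import math

def gcd_iterative(m, n):
    # Repeatedly replace (m, n) by (n, m mod n) until the remainder is 0.
    while n != 0:
        m, n = n, m % n
    return m

def common_divisors(m, n):
    return [d for d in range(1, min(m, n) + 1)
            if m % d == 0 and n % d == 0]

m, n = 48, 18
r = m % n                                  # 12
same = common_divisors(m, n) == common_divisors(n, r)
print(same)                                # True: both are [1, 2, 3, 6]
print(gcd_iterative(48, 18))               # 6
```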
Exception handling
03 October 2021 10:32
Recovering gracefully
Try to anticipate errors
Provide a contingency plan
Exception handling
Types of errors
Python flags the type of each error
Most common error is a syntax error
SyntaxError: invalid syntax
Not much you can do!
We are interested in errors when the code is running
Name used before value is defined
NameError: name 'x' is not defined
Division by zero in arithmetic expression
ZeroDivisionError: division by zero
Invalid list index
IndexError: list assignment index out of range
KeyError for dictionary keys
Terminology
Raise an exception
Run time error => signal error type, with diagnostic information
NameError: name 'x' is not defined
Handle an exception
Anticipate and take corrective action based on error type
Unhandled exception aborts execution
Handling exceptions
try:
    …
    … # Code where error may occur
except IndexError:
    … # Handle IndexError
except (NameError,KeyError):
    … # Handle multiple exception types
except:
    … # Handle all other exceptions
else:
    … # Execute if try runs without errors
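To see which branch of a try statement fires for each error type, here is a runnable sketch. The helper classify and the four small functions are hypothetical, written only to trigger the errors listed earlier. It uses except Exception where the skeleton above uses a bare except, which is a deliberate substitution.

```python
def classify(action):
    # Run action() and report which branch of the try statement handled it.
    try:
        action()
    except IndexError:
        return "IndexError"
    except (NameError, KeyError) as err:
        return type(err).__name__
    except Exception as err:        # any other run-time error
        return "other: " + type(err).__name__
    else:
        return "no error"

def use_undefined_name():
    return undefined_value + 1      # this name is never assigned

def divide_by_zero():
    return 1 / 0

def bad_list_index():
    l = [1, 2, 3]
    l[3] = 4                        # valid indices are 0..2

def missing_key():
    d = {"a": 1}
    return d["b"]

observed = [classify(f) for f in (use_undefined_name, divide_by_zero,
                                  bad_list_index, missing_key, lambda: None)]
print(observed)
```

The printed list shows, in order, NameError, a ZeroDivisionError falling through to the catch-all branch, IndexError, KeyError, and finally the else branch.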
Traditional approach
if b in scores.keys():
    scores[b].append(s)
else:
    scores[b] = [s]
Using exceptions
try:
    scores[b].append(s)
except KeyError:
    scores[b] = [s]
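Wrapped in a function, the exception-based version can be run on sample data. The function name group_scores and the sample names and scores below are made up for illustration.

```python
def group_scores(pairs):
    # pairs is a list of (b, s) tuples: a name and one score for that name.
    scores = {}
    for (b, s) in pairs:
        try:
            scores[b].append(s)     # works once b is already a key
        except KeyError:
            scores[b] = [s]         # first score for b: create the list
    return scores

print(group_scores([("asha", 52), ("ravi", 8), ("asha", 101)]))
# {'asha': [52, 101], 'ravi': [8]}
```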
Flow of control
An exception raised anywhere is passed back up the chain of function calls until some caller handles it.
For example, suppose f(x,y) calls g(x), and g(x) calls h(x). If h() raises an IndexError and does not handle
it, the error is passed back to g(); if g() does not handle it either, it is passed back to f(). A try-except
block at any level of this chain can handle the exception; if no level handles it, execution aborts.
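The chain f, g, h can be sketched as follows. The function bodies are invented for illustration; only the calling structure comes from the description above.

```python
def h(x):
    l = [1, 2, 3]
    return l[x]            # raises IndexError when x is out of range

def g(x):
    return h(x) + 1        # no handler here: the error passes through

def f(x, y):
    try:
        return g(x) * y
    except IndexError:
        return None        # contingency plan, chosen for illustration

ok = f(1, 10)       # (2 + 1) * 10 = 30
failed = f(7, 10)   # IndexError raised in h() is caught in f()
print(ok, failed)   # 30 None
```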
Classes and Objects
03 October 2021 12:15
Abstract datatype
Stores some information
Designated functions to manipulate the information
For instance, stack: last-in, first-out, push(), pop()
Class
Template for a data type
How data is stored
How public functions manipulate data
Object
Concrete instance of template
Example: 2D points
A point has coordinates (x, y)
__init__() initializes internal values x, y
First parameter is always self
Here, by default a point is at (0, 0)
Translation: shift a point by (delta x, delta y)
(x, y) => (x + deltax, y + deltay)
Distance from the origin
d = sqrt(x^2 + y^2)
import math
class Point:
    def __init__(self,a=0,b=0):
        self.x = a
        self.y = b
    def translate(self,deltax,deltay):
        self.x += deltax
        self.y += deltay
    def odistance(self):
        d = math.sqrt(self.x*self.x + self.y*self.y)
        return(d)
Change the internal representation to polar coordinates (r, theta)
Interface has not changed
User need not be aware whether representation is (x, y) or (r, theta)
import math
class Point:
    def __init__(self,a=0,b=0):
        self.r = math.sqrt(a*a + b*b)
        # atan2 picks the correct quadrant and also handles a == 0
        self.theta = math.atan2(b,a)
    def odistance(self):
        return(self.r)
    def translate(self,deltax,deltay):
        x = self.r*math.cos(self.theta)
        y = self.r*math.sin(self.theta)
        x += deltax
        y += deltay
        self.r = math.sqrt(x*x + y*y)
        self.theta = math.atan2(y,x)
Special functions
__init__() - constructor
__str__() - convert object to string
str(o) == o.__str__()
Implicitly invoked by print()
__add__()
Implicitly invoked by +
__mul__() invoked by *
__lt__() invoked by <
__ge__() invoked by >=
…
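A few of these special functions can be tried on the Point class. This is a minimal sketch: the string format and the choice to compare points by distance from the origin are assumptions made for illustration.

```python
class Point:
    def __init__(self, a=0, b=0):
        self.x = a
        self.y = b

    def __str__(self):
        # Used by str() and implicitly by print()
        return "(" + str(self.x) + "," + str(self.y) + ")"

    def __add__(self, p):
        # p1 + p2 adds the coordinates and returns a new Point
        return Point(self.x + p.x, self.y + p.y)

    def __lt__(self, p):
        # p1 < p2 compares squared distance from the origin
        return self.x**2 + self.y**2 < p.x**2 + p.y**2

p1 = Point(1, 2)
p2 = Point(3, 4)
print(p1 + p2)   # (4,6)
print(p1 < p2)   # True
```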
Timing our code
03 October 2021 12:59
import time
start = time.perf_counter()
…
# Execute some code
…
end = time.perf_counter()
elapsed = end - start
A timer object
Create a timer class
Two internal values
_start_time
_elapsed_time
start starts the timer
stop records the elapsed time
More sophisticated version in the actual code
As a rough rule of thumb, Python executes about 10^7 operations per second, whereas C++ is faster, at
about 10^8 operations per second.
import time
class Timer:
    def __init__(self):
        self._start_time = 0
        self._elapsed_time = 0
    def start(self):
        self._start_time = time.perf_counter()
    def stop(self):
        self._elapsed_time = time.perf_counter() - self._start_time
    def elapsed(self):
        return(self._elapsed_time)
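A timer of this kind might be used as follows. The class is repeated so that the sketch runs on its own, and the loop size is arbitrary; the elapsed time printed will vary from machine to machine.

```python
import time

class Timer:
    def __init__(self):
        self._start_time = 0
        self._elapsed_time = 0
    def start(self):
        self._start_time = time.perf_counter()
    def stop(self):
        self._elapsed_time = time.perf_counter() - self._start_time
    def elapsed(self):
        return self._elapsed_time

t = Timer()
t.start()
total = 0
for i in range(10**6):     # work to be timed
    total += i
t.stop()
print(t.elapsed())         # elapsed time in seconds
```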
Why efficiency matters
03 October 2021 13:07
How long will the validation process take with nested loop?
M SIM cards, N Aadhar cards
Nested loops iterate M*N times
What are M and N
Almost everyone in India has an Aadhar card: N > 10^9
Number of SIM cards registered is similar: M > 10^9
Assume M = N = 10^9
Nested loops execute 10^18 times
We calculated previously that Python can perform 10^7 operations in a second
This takes at least 10^11 seconds
10^11 / 60 = 1.6667E9 minutes
1.6667E9 / 60 = 2.7778E7 hours
2.7778E7 / 24 = 1.1574E6 days
1.1574E6 / 365 = 3,170.9589 years!
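The same estimate can be reproduced in a few lines. The figure of 10^7 operations per second is the rough estimate used in these notes, not a measured value.

```python
M = N = 10**9
ops_per_second = 10**7                   # rough estimate from these notes

seconds = (M * N) // ops_per_second      # 10**18 / 10**7 = 10**11
days = seconds / (60 * 60 * 24)
years = days / 365
print(seconds)         # 100000000000
print(round(years))    # 3171
```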
How can we fix this?
Guess my birthday
You propose a date
I answer, Yes, Earlier, Later
Suppose my birthday is 12 April
A possible sequence of questions
September 12? Earlier
February 23? Later
July 2? Earlier
…
What is the best strategy?
Interval of possibilities
Query midpoint - halves the interval
June 30? Earlier
March 31? Later
May 15? Earlier
April 22? Earlier
April 11? Later
April 16? Earlier
April 13? Earlier
April 12? Yes
Instead of up to 365 guesses, checking each day of the year in turn, just 8 queries solved the problem,
because each query halves the interval of possible dates.
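The halving strategy can be sketched by numbering the days of the year from 1 to 365. The function name count_queries and the day numbering are assumptions for illustration. Because the midpoint is computed on day numbers, the individual guesses differ slightly from the dates listed above, but the number of queries stays small.

```python
def count_queries(target, lo=1, hi=365):
    # Guess the midpoint of [lo, hi] until the target day is hit.
    queries = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        queries += 1
        if mid == target:          # answer: Yes
            return queries
        elif target < mid:         # answer: Earlier
            hi = mid - 1
        else:                      # answer: Later
            lo = mid + 1
    return queries

# Worst case over every possible birthday
worst = max(count_queries(day) for day in range(1, 366))
print(worst)   # 9
```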
Back to Aadhar and SIM cards
Assume Aadhar details are sorted by Aadhar number
Use the halving strategy to check SIM card
Halving 10 times reduces the interval by a factor of 1000, because 2^10 = 1,024
After 10 queries, interval shrinks to 10^6
After 20 queries, interval shrinks to 10^3
After 30 queries, interval shrinks to 1
Total operations = 10^9 * 30 = 3E10
Time = 3E10 / 10^7 = 3,000 seconds = 50 minutes (10^9 operations take 100 seconds, times 30)
From 3200 years to 50 minutes!
Of course, to achieve this, we have to first sort the Aadhar cards
Arranging the data results in a much more efficient solution
Both algorithms and data structures matter
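A small-scale sketch of the sorted lookup is shown below, using bisect_left from the standard library to do the halving. The function names and the tiny ID lists are made up for illustration.

```python
from bisect import bisect_left

def found_sorted(sorted_ids, key):
    # Binary search: bisect_left returns where key would be inserted.
    pos = bisect_left(sorted_ids, key)
    return pos < len(sorted_ids) and sorted_ids[pos] == key

def validate(sim_ids, aadhar_ids):
    # Sort once, then each SIM card needs about log2(N) steps.
    ids = sorted(aadhar_ids)
    return [found_sorted(ids, s) for s in sim_ids]

aadhar_ids = [42, 7, 19, 88, 63]        # made-up Aadhar numbers
sim_ids = [19, 5, 88]                   # made-up IDs attached to SIM cards
print(validate(sim_ids, aadhar_ids))    # [True, False, True]
```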
Programming Assignments
06 October 2021 22:26
PPA 1
Twin primes are pairs of prime numbers that differ by 2. For example (3, 5), (5, 7), and (11,13) are twin
primes.
Write a function Twin_Primes(n, m) where n and m are positive integers and n < m , that returns all
unique twin primes between m and n (both inclusive). The function returns a list of tuples and each
tuple (a,b) represents one unique twin prime where n <= a < b <= m.
Code:
def is_prime(x):
    if x == 1:
        return False
    elif x == 2:
        return True
    else:
        prime = True
        for i in range(2,x):
            if x%i == 0:
                prime = False
                break
        return prime
def Twin_Primes(n,m):
    tp = []
    for i in range(n,m+1):
        if is_prime(i) and is_prime(i+2) and i+2 <= m:
            tp.append((i,i+2))
    return tp
n=int(input())
m=int(input())
print(sorted(Twin_Primes(n, m)))
PPA 2
Code:
class Triangle:
    def __init__(self,a,b,c):
        self.a = a
        self.b = b
        self.c = c
    def is_valid(self):
        if self.a + self.b > self.c and self.a + self.c > self.b and self.b + self.c > self.a:
            return "Valid"
        else:
            return "Invalid"
    def Side_Classification(self):
        if self.is_valid() == "Invalid":
            return "Invalid"
        else:
            if self.a == self.b == self.c:
                return "Equilateral"
            elif self.a == self.b or self.b == self.c or self.c == self.a:
                return "Isosceles"
            else:
                return "Scalene"
    def Angle_Classification(self):
        if self.is_valid() == "Invalid":
            return "Invalid"
        else:
            sides = sorted([self.a,self.b,self.c])
            if sides[0]**2 + sides[1]**2 > sides[2]**2:
                return "Acute"
            elif sides[0]**2 + sides[1]**2 == sides[2]**2:
                return "Right"
            else:
                return "Obtuse"
    def Area(self):
        if self.is_valid() == "Invalid":
            return "Invalid"
        else:
            s = (self.a+self.b+self.c)/2
            # Heron's formula, using the object's own sides
            return (s*(s-self.a)*(s-self.b)*(s-self.c))**0.5
a=int(input())
b=int(input())
c=int(input())
T=Triangle(a,b,c)
print(T.is_valid())
print(T.Side_Classification())
print(T.Angle_Classification())
print(T.Area())
GrPA 1
Code:
def find_Min_Difference(L,P):
    l = sorted(L)
    m = 9999999999999
    for i in range(len(l)-P+1):
        c = abs(l[i]-l[i+P-1])
        if c < m:
            m = c
    return m
L=eval(input().strip())
P=int(input())
print(find_Min_Difference(L,P))
Solution:
def find_Min_Difference(L,P):
    L.sort()
    N = P
    M = len(L)
    min_diff = max(L) - min(L)
    for i in range(M-N+1):
        if L[i+N-1] - L[i] < min_diff:
            min_diff = L[i+N-1] - L[i]
    return min_diff
L=eval(input().strip())
P=int(input())
print(find_Min_Difference(L,P))
GrPA 2
Code:
def is_prime(n):
    if n == 1:
        return False
    elif n == 2:
        return True
    else:
        for i in range(2,n):
            if n%i == 0:
                return False
        return True
def primes(n):
    l = []
    for i in range(2,n+1):
        if is_prime(i):
            l.append(i)
    return l
def Goldbach(n):
    l = primes(n)
    result = []
    for i in l:
        for j in l:
            if i + j == n and (i,j) not in result and (j,i) not in result:
                result.append((i,j))
    return result
n=int(input())
print(sorted(Goldbach(n)))
Solution:
def prime(n):
    if n < 2:
        return False
    for i in range(2,n//2+1):
        if n%i==0:
            return False
    return True
def Goldbach(n):
    Res=[]
    for i in range((n//2)+1):
        if prime(i)==True:
            if prime(n-i)==True:
                Res.append((i,n-i))
    return(Res)
n=int(input())
print(sorted(Goldbach(n)))
GrPA 3
Code:
def odd_one(L):
    d = {int:0,str:0,bool:0,float:0}
    for i in L:
        d[type(i)] += 1
    for i in d:
        if d[i] == 1:
            return str(i)[8:-2]
print(odd_one(eval(input().strip())))
Solution:
def odd_one(L):
    P = {}
    for elem in L:
        if type(elem) not in P:
            P[type(elem)] = 0
        P[type(elem)] += 1
    for key, value in P.items():
        if value == 1:
            return key.__name__
print(odd_one(eval(input().strip())))
Assignments
06 October 2021 22:33
Practice Assignment
1. If n is a positive integer, which of the following statements is correct about the function check?
a. 2
b. 5
c. 4
d. A-(ii) B-(iii) C-(iv) D-(i)
a.
b.
c.
d.
Accepted Answers:
5. Which of the following options will validate whether n is a perfect square, where n is a
positive integer? [MSQ]
a.
b.
c.
d.
Accepted Answers:
Graded Assignment
1. S is a non-empty string of English letters without any space. What fun(S) will return after
execution of the above code?
c. Total number of letters that are repeated in the string S more than one time.
d. Difference of total letters in the string S and distinct letters in the string S .
2. Which of the following is/are valid reason(s) for NameError exception? [MSQ]
a. Variable is not defined.
b. Calling a function before declaration.
Accepted Answers:
(Type: Numeric) 3
4. What will be the output of the above code-snippet?
a. Syntax error
b. 2 1
c. 0 3 1
d. None of these
a. Good morning
d. Good
a.
b.
c.
d.
Accepted Answers:
7. Given above is a function that checks whether a list satisfies some property. There is an error in
this function. Select the list(s) L = [n1, n2, n3] , where n1 , n2 and n3 are all integers, for which
special3Bad(L) produces a ZeroDivisionError exception. [MSQ]
a. L = [4, 2, 8]
b. L = [4, 2, 4]
c. L = [8, 4, 16]
d. L = [48, 6, 36]
e. L = [44, 6, 36]
8. Given above is a function to check whether a list is a palindrome. There is an error in this function.
Select the list(s) L = [n1, n2,..., n2, n1] , for which isSymmetricBad(L) produces an IndexError
exception. [MSQ]
a. L = [1, 2, 3, 4, 3, 2, 1]
b. L = [2, 2, 2, 2, 2, 2]
c. L = [1, 1, 1, 1, 1, 1, 1]
d. L = [8]
e. L = [2, 4, 6]
Accepted Answers:
(Type: Numeric) 3
10. Which of the following option(s) is/are correct about the given code? [MSQ] count , name and
course are object variables.
a. name and course are class variable and count is an object variable.
b. name and course are object variables, and count is a class variable.