Python Tutorial IIT Bombay
This is an online textbook on Python and is a companion resource to the course Programming in
Python offered as a part of the Online Degree Program, IIT-Madras. For more details about the
course, check out our website.
Note to students: This book is meant to be used as a reference. You may find content that has not
been covered in the video lectures. Likewise, there may be some content that is present in the
lectures which is not covered here. Additional content appearing in this book will not be considered
for grading. In summary, please refer to chapters that you feel are relevant for the course. But you
are under no obligation to read the entire book cover to cover. Interested students are always
welcome to read the entire thing!
Programming in Python
Chapter-0: Warm-up
Chapter-1: Introduction to Python
Chapter-2: Conditionals
Chapter-3: Loops
Chapter-4: Functions
Chapter-5: Lists and Tuples
Chapter-6: Dictionaries and Sets
Chapter-7: File Handling
Chapter-0: Warm-up
Lesson-0
Lesson-1.2
Operators
Convention
Expressions
Type of Expressions
Lesson-1.3
Arithmetic Expressions
Boolean Expressions
Lesson-1.4
Replit Editor
Errors | Debugging
Exceptions
Wrong Code Snippets
Lesson-1.5
Strings
Quotes
Length
Operations
Escape Characters
Substrings
Lesson-1.6
Strings
Indexing
Slicing
Immutability
Methods
Chapter-2: Conditionals
Lesson-2.1
Lesson-2.2
Input
Type Conversion | Built-in Functions
Lesson-2.3
Conditional Statements
if
if-else
if-elif-else
Nested Conditional Statements
Lesson-2.4
System libraries
calendar
time
this
Chapter-3: Loops
Lesson-3.1
while loop
break , continue
Lesson-3.2
for loop
range()
Iterating through Strings
Lesson-3.3
Nested loops
while versus for
end
sep
Lesson-3.4
Formatted printing
f-strings
format()
Format Specifiers
Lesson-3.5
System libraries
math
random
Lesson-3.6
Limits
Recurrence relations
Rational approximation
Chapter-4: Functions
Lesson-4.1
Introduction
Examples
Lesson-4.2
Arguments
Positional Arguments
Keyword Arguments
Default Arguments
Call by Value
Lesson-4.3
Scope
Local
Global
Namespaces
locals
globals
Scope and Namespaces
Lesson-4.4
Recursion
Caution in Recursion
Lists
Introduction
Iterating through Lists
Growing a List
Operations on Lists
Useful Functions
Lesson-5.2
Lists
Mutability
Call by Reference
Lesson-5.3
Lists
Simulating an IPL Innings
Lesson-5.4
Lists
List Methods
Stack and Queue
Lesson-5.5
Lists
Nested Lists
Matrices
Shallow and Deep Copy
Lesson-5.6
Tuples
Introduction
More on Tuples
Lists and Tuples
Packing and Unpacking
Dictionaries
Introduction
Examples
Iterating over Dictionaries
Growing a Dictionary
Mutability
Lesson-6.2
Text Processing
Number of Sentences
Number of Words
Number of Unique Words
Frequent Words
Lesson-6.3
Dictionaries
Lesson-6.4
Assignment Model
Submission Model
Grader
Lesson-6.5
Sets
Introduction
Iterating over Sets
Growing Sets
Set Operations
File Handling
Why Files
File Handling
Lesson-7.2
File Handling
Lesson-7.3
File Handling
File object analogy
Mode
Lesson-7.4
File Handling
File methods
read
readline
readlines
write
writelines
Lesson-7.5
File Handling
CSV files
Lesson-0
Why learn Python?
Around 66% of the 65,000 developers who responded to the survey are currently developing with
Python and have expressed interest in continuing to develop with it. Another strong reason to
learn Python is that it lets us create beautiful things such as this animation:
Thanks to Manim Community for the source code. The code that was used to render this
animation can be found here.
Being able to create something like this is the end goal of this course. Musicians create music;
musical instruments are their tools. Painters create paintings; the brush and the canvas are their
tools. Coders create software; programming languages are their tools. Python is one of the most
versatile and accessible languages. We will start from the basics and systematically cover the
important aspects of the language.
Lessons
Organization
This web resource is organized as a sequence of lessons. Lessons will be numbered as
<chapter>.<lesson> . Each chapter will have about four lessons. These lessons are best read in
the sequence in which they appear, starting from chapter-1 and going all the way up to chapter-
12. If you are already familiar with Python, then have a look at the Table of Contents in the home
page and jump into the lesson that seems least familiar.
Each chapter introduces one important programming concept in Python. This will be that
chapter's title. This doesn't mean that all the lessons in the chapter will focus on only that
particular concept. For example, chapter-2 introduces the idea of conditionals, but built-in
functions and Python's standard libraries also feature in the same week.
How to read these lessons?
Programming courses are among the few courses where the learner has an upper hand over
instructors. No one can trick you. Code does not lie. All that is demanded of you is to make an
effort to execute every snippet of code that you see in these lessons.
Python Version
We will be using Python-3.8 or higher throughout these lessons. If some of you are already
familiar with Python and are used to Python-2, it is strongly recommended that you shift to
Python-3. This is not an arbitrary choice as Python-2 has reached the end of its life.
Setting up Replit
Replit is an online environment where we can write code. It is an ideal place to learn
programming and we will be using it extensively in this course. Head to http://www.replit.com/
and sign up using your Online Degree account. Replit provides an excellent tutorial to get you
started.
History
Python first appeared on the programming landscape 30 years ago, in February 1991. It was
created by a Dutch programmer, Guido van Rossum. He served as the “benevolent dictator for
life” of Python’s development until 2018, when he stepped down from the post.
Image-Source: Wikipedia
A popular question that gets asked often is how the language got its name. This is the answer
from the official Python documentation:
When he began implementing Python, Guido van Rossum was also reading the published
scripts from “Monty Python’s Flying Circus”, a BBC comedy series from the 1970s. Van
Rossum thought he needed a name that was short, unique, and slightly mysterious, so he
decided to call the language Python.
Python is 30 years old. Programmers who boarded the Python-bus 30 years back lovingly talk of it
as though it were a friend. This is not an exaggeration! This is a language that has been built by
people like you and me, and is being used by thousands of people around the globe. Let us jump
in with an open mind and see what it has to offer!
Explore
1. Check out the website of the Python Software Foundation and get to know more about the
organization behind Python.
2. Have a look at this interesting interview with Guido van Rossum. It is from a blog maintained by
Dropbox. A piece of trivia: Guido worked at Dropbox for six and a half years.
3. Try to watch documentaries and interviews on the web where Guido talks about how Python
came into existence. It is always good to know about some non-technical aspects of the
language, such as its history and something about the people who were behind its
development. It gives a humanistic flavor to technology. We often forget that a lot of
software is written by humans, for humans.
4. In the weeks to come, StackOverflow might become the website that most of you visit most often.
Some of you might be familiar with it, but for the others, StackOverflow is a
question-answer forum for programming related questions. It is extremely popular not just
among beginners but even experienced developers. Do check it out, but use it wisely. Refrain
from using it to get answers to assignment questions; you won't learn anything that way.
Lesson-1.1
Python shell | Replit Console
In Replit, the Python shell corresponds to the console screen on the right of the repl. This will be our
playground for quite some time:
Prompts
The orange symbol that is displayed above is called a prompt. Its role is similar to that of the
blinking cursor while editing documents. It is an invitation to type code. Code that is typed at the
prompt is executed by the interpreter. In these lessons, we will use the following symbol to refer
to the prompt: >>> .
Fire up a repl and type the code in the console. You should be getting the output on the next line.
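A first session of this kind might look like the sketch below. The greeting is an illustrative choice; any text enclosed in quotes behaves the same way.

```python
print('Hello, world!')   # line 1: the string is in single quotes
# The line above prints: Hello, world!
print("Hello, world!")   # line 3: the string is in double quotes
```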
Output
Let us take a closer look at the first line of code that we wrote. print is called a built-in function
in Python. A function is an object that accepts inputs and returns outputs. The term built-in refers
to the fact that this function is something that is readily provided by Python for our use.
The object inside the parentheses of the print function is called a string. A string is a sequence of
characters enclosed in quotes. Strings can either be in single quotes or double quotes. However, a
single quote can't be matched against a double quote to enclose a string. We have used single
quotes in line 1 and double quotes in line 3. Both lines give identical outputs. The ability to use
both single quotes and double quotes comes in handy in situations like this:
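For instance, a sentence that contains an apostrophe can be enclosed in double quotes, and a sentence that contains double quotes can be enclosed in single quotes. The two sentences below are illustrative assumptions:

```python
# The apostrophe is a single quote, so the string is wrapped in double quotes.
first = "It's a beautiful day"
# The quoted word uses double quotes, so the string is wrapped in single quotes.
second = 'He said "hello" and left'

print(first)
print(second)
```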
Run the code given above and observe the output. print can also be used to print numbers:
>>> print(1)
1
>>> print(2.0)
2.0
Multiple items can be printed on the same line in the following way:
>>> print(1, 2)
1 2
>>> print('online', 'degree', 'program')
online degree program
Note the presence of a space between successive elements. If the print command is called
without passing any input to it, then it prints a blank line:
>>> print()

>>>
What happens if we just type print without the parentheses?
>>> print
<built-in function print>
We don't get an error. Instead, the message is that print is a built-in function. But the following
code throws an error:
The interpreter hits back with a SyntaxError. Think of syntax as the grammar of a programming
language. In the code given above, we have missed the parentheses. The fourth lesson will take
up this issue in greater detail.
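The interpreter's reaction can be reproduced without stopping the program, as in the sketch below. The faulty line is an assumed example of a print call without parentheses. The compile function and the try and except keywords have not been introduced yet; they are used here only to catch the error safely.

```python
faulty_code = "print 'hello'"   # assumed example: the parentheses are missing

try:
    # compile() checks the syntax of the code without running it
    compile(faulty_code, '<example>', 'exec')
    error_name = None
except SyntaxError as error:
    error_name = type(error).__name__

print(error_name)   # SyntaxError
```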
Emojis
Before we jump into the serious stuff, let us try and print some emojis!
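A minimal sketch, assuming the grinning-face emoji as the example. An emoji is just another character, so it can be placed inside a string like any letter:

```python
smiley = '😀'        # an emoji pasted between quotes is an ordinary string
print(smiley)
print(len(smiley))   # 1
```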
Try this out in your repl! A full list of emojis can be found here.
Literals and Variables
>>> x = 1
>>> print(x)
1
>>> y = 'a string'
>>> print(y)
a string
>>> foo_bar = 123.456
>>> print(foo_bar)
123.456
Basic Data Types | type()
We will look at four basic data types: integer, float, string and boolean.
Integer
The int type represents integers. Python provides a command called type to determine the
type of an object:
>>> print(1)
1
>>> type(1)
<class 'int'>
Float
The float type represents real numbers:
>>> print(1.0)
1.0
>>> type(1.0)
<class 'float'>
A trailing decimal point is enough to make a literal a float:
>>> print(1.)
1.0
String
The str type represents strings:
>>> print('one')
one
>>> type("one")
<class 'str'>
Boolean
The bool type represents boolean values:
>>> print(True)
True
>>> type(False)
<class 'bool'>
Please note that bool values are case sensitive. That is, true and false are not bool values.
Comments
A comment is a line of text that is not executed by the interpreter. Comments begin with the #
symbol. The following are comments:
As line-2 is a comment, 1 is not printed in the next line. Comments can also come at the end of a
line of code:
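A short sketch with illustrative values, showing a full-line comment, a commented-out statement and a comment at the end of a line of code:

```python
# This entire line is a comment and is ignored by the interpreter.
# print(1)   <- this statement is commented out, so 1 is never printed
x = 10  # a comment can also follow code on the same line
print(x)
```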
Adding comments is one of the ways to make code more readable. Its use will become clear in
subsequent chapters.
Lesson-1.2
Operators
Arithmetic
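The anatomy of an operation is as follows: an operator acts on one or more operands. For example,
in 10 + 5, + is the operator, while 10 and 5 are the operands.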
The following table gives the symbols for arithmetic operators and the operations that they
correspond to:
Operator    Operation
+           Addition
-           Subtraction
*           Multiplication
/           Division
//          Floor division
%           Modulus
**          Exponentiation
All the operators in the above table are binary, i.e., they operate on two operands. Let us now
take a look at each operator:
>>> 10 + 5
15
>>> 10 - 5
5
>>> 10 * 5
50
>>> 10 / 5
2.0
>>> 10 // 5
2
>>> 10 % 5
0
>>> 10 ** 5
100000
The last three operators might be new. In more familiar terms, these are the mathematical
operations that they correspond to:
// is called the floor division operator. x // y gives the quotient when x is divided by y.
For example, 8 // 3 is 2.
% is called the modulus operator. x % y gives the remainder when x is divided by y. For
example, 10 % 3 is 1.
** is called the exponentiation operator. x ** y returns x raised to the power y. For
example, 2 ** 3 is 8.
/ and // are two different operators. / gives the complete result of division, while // returns
the quotient. For example, 5 / 2 results in 2.5 while 5 // 2 gives 2.
There are two more arithmetic operators of interest to us, unary plus and unary minus. These are
the + and - signs. Unlike the operators that we have seen so far, these two are unary operators,
i.e., they operate on one operand. For example:
>>> - 2
-2
>>> + 2
2
It is important to note that the symbols for plus and minus operators are the same as the ones
for addition and subtraction. The context determines the nature of the operator:
>>> - 1 # unary minus
-1
>>> 1 - 1 # subtraction operator
0
>>> 1 - - 1
2
>>> # The minus on the left is subtraction
>>> # The minus on the right is unary minus
In all the operations that we have seen so far, the operands have been literals. In general, the
operands can also be variables:
>>> x = 1
>>> y = x * 5
>>> print(x, y)
1 5
Relational
The following table gives the symbols for relational operators and the operations that they
correspond to:
Operator    Operation
>           greater than
<           less than
>=          greater than or equal to
<=          less than or equal to
==          equal to
!=          not equal to
All the operators in the above table are binary. Let us now take a look at each of them:
>>> 10 > 5
True
>>> 10 < 5
False
>>> 10 >= 5
True
>>> 10 <= 5
False
>>> 10 == 5
False
>>> 10 != 5
True
Relational operators are also called comparison operators. The result of any comparison
operation is a boolean value: True or False. The result of a comparison operation can be
assigned to a variable:
>>> x = 10
>>> y = 15
>>> z = y > x
>>> print(z)
True
The == symbol corresponds to the equality operator and should not be confused with = , the
assignment operator.
Logical
The following table gives the logical operators and the operations that they correspond to:
Operator    Operation
not         negation
and         logical conjunction
or          logical disjunction
and and or are binary operators; not is a unary operator. Let us now take a look at each of
them:
>>> x = True
>>> not x
False
>>> x = False
>>> not(x)
True
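The same kind of check can be made for and and or. A small sketch with illustrative operands:

```python
x = True
y = False

print(x and y)   # False: and needs both operands to be True
print(x or y)    # True: or needs at least one operand to be True
print(not y)     # True: not flips the value of its operand
```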
Convention
Consider the following lines of code:
1 >>> print(1 + 2)
2 3
3 >>> print(1+2)
4 3
Both lines 1 and 3 give the same output. Line-1 has a space before and after the + operator,
while line-3 doesn't. Both ways are syntactically correct. In this course, we will be following the
first convention: there is always a space separating the operator from the operands. This is also
true for the = operator.
Expressions
An expression is some combination of literals, variables and operators. For example, the following
are expressions:
1 + 4 / 4 ** 0
x / y + z * 2.0
3 > 4 and 1 < 10
not True and False
Each expression evaluates to some value. This value has a type. In the above examples, the first
two expressions result in a float, while the next two expressions result in a bool. In the next
few sections, we shall study two types of expressions: arithmetic expressions and boolean expressions.
Type of Expressions
Arithmetic Expressions
Let us now look at the type of simple arithmetic operations. In mathematics, the result of adding
two integers is another integer. Is this true in the case of Python? First, let us execute the
following statement in the interpreter and see what we get:
>>> 1 + 2
3
The way to check the type of this expression is to use the type() function. For example, we have:
>>> 1 + 2
3
>>> type(1 + 2)
<class 'int'>
So far the interpreter's behaviour conforms to our intuition. Let us now change this code slightly:
>>> 1.0 + 2
3.0
>>> type(1.0 + 2)
<class 'float'>
We see that the result is 3.0 which is of type float . The conclusion is that float is more
dominant than int as far as the addition operation is concerned. What about other operations?
Let us check with the help of the following examples:
>>> type(7.0 * 5)
<class 'float'>
>>> type(7.0 / 5)
<class 'float'>
>>> type(7.0 // 5)
<class 'float'>
>>> type(7.0 ** 5)
<class 'float'>
>>> type(7.0 % 5)
<class 'float'>
All the operations result in a float . From this we see that float is more dominant than int ,
irrespective of the operator involved.
Boolean Expressions
Expressions that involve a relational operator will result in a bool . For example:
>>> 2 > 1
True
>>> type(2 > 1)
<class 'bool'>
Expressions that combine boolean values using logical operators also result in a bool. For example:
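A minimal sketch with boolean operands; the variable names are illustrative:

```python
a = True and False
b = True or False
c = not True

print(a, type(a))   # False <class 'bool'>
print(b, type(b))   # True <class 'bool'>
print(c, type(c))   # False <class 'bool'>
```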
One way to analyze the outcome of boolean expressions that involve variables is to exhaustively
list down the different combinations of values that variables can take and evaluate the expression
for each such combination. For example, assume that X and Y are two boolean variables. Now,
consider the following expression:
>>> X or Y
We can take the help of a concept called a truth table to analyze the outcomes:
X       Y       X or Y
True    True    True
True    False   True
False   True    True
False   False   False
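The rows of such a truth table can also be generated with code. The sketch below uses for loops, which are introduced only in chapter-3, so treat it as a preview rather than something to memorize:

```python
rows = []
print('X', 'Y', 'X or Y')
for X in (True, False):
    for Y in (True, False):
        # each combination of X and Y gives one row of the truth table
        rows.append((X, Y, X or Y))
        print(X, Y, X or Y)
```

Only the last row, where both X and Y are False, evaluates to False.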
Lesson-1.3
Arithmetic Expressions
Precedence
Let us start looking at arithmetic expressions that involve multiple operators:
>>> 4 // 2 - 1
1
There are two ways of parenthesizing this expression:
(4 // 2) - 1 = 2 - 1 = 1
4 // (2 - 1) = 4 // 1 = 4
Clearly, we see that the interpreter is following the first way. When an expression has different operators, the
interpreter has to make a decision about the way the expression is to be parenthesized, i.e., which operator
takes precedence over the others. From the above example, we see that the floor division operator ( // ) has
greater precedence than the subtraction operator ( - ).
In general, the following table describes the precedence rules for the arithmetic operators seen so far. Those
with higher precedence come at the top of the table. Operators in a given row have the same precedence. For
example, + and - have the same precedence.
Operators       Operations
**              Exponentiation
*, /, //, %     Multiplication, division, floor division, modulus
+, -            Addition, subtraction
>>> 3 ** 2 * 4 - 4
32
Going by the precedence rules, we apply the parentheses in the following sequence:
1. (3 ** 2) * 4 - 4
2. ((3 ** 2) * 4) - 4
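The two steps can be checked in code. In this sketch the variable names are illustrative; the last line shows that a different parenthesization changes the result:

```python
expression = 3 ** 2 * 4 - 4
step_1 = (3 ** 2) * 4 - 4     # exponentiation is applied first
step_2 = ((3 ** 2) * 4) - 4   # then multiplication, and finally subtraction
print(expression, step_1, step_2)   # 32 32 32

other = 3 ** (2 * 4) - 4      # forcing the multiplication to happen first
print(other)                  # 6557
```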
Order
Consider the following example:
>>> 3 - 2 + 1
2
Again, there are two ways of parenthesizing this expression:
(3 - 2) + 1 = 1 + 1 = 2
3 - (2 + 1) = 3 - 3 = 0
The interpreter is following the first way. Does this mean that subtraction has greater precedence than
addition? No, we just saw that they have the same precedence! We have to be careful here. When operators
have the same precedence, Python evaluates the expression from left to right. There are two exceptions to
this rule, the ** and = operators, both of which are evaluated from right to left. We shall return to this
in a while. Here is another example:
>>> 4 - 3 - 1
0
(4 - 3) - 1 = 1 - 1 = 0
4 - (3 - 1) = 4 - 2 = 2
The first way is the one followed by the interpreter. Going back to the evaluation order followed by Python, we
see that this expression is evaluated from left to right.
>>> 8 % 4 % 2
0
Run the following code in the interpreter. Which of the following parenthesizations matches the expression
given above? This is left as an exercise for you to try out.
>>> (8 % 4) % 2
>>> 8 % (4 % 2)
Finally, consider an expression with two exponentiation operators:
>>> 2 ** 3 ** 0
2
(2 ** 3) ** 0
2 ** (3 ** 0)
The interpreter is following the second way, i.e., the statement is being executed from right to left. This kind of
execution happens only in the case of the exponentiation operator and the assignment operator.
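A small sketch that contrasts the two evaluation orders, using the expressions from this section:

```python
# Exponentiation groups from right to left
right_to_left = 2 ** (3 ** 0)   # 2 ** 1 = 2
left_to_right = (2 ** 3) ** 0   # 8 ** 0 = 1
print(2 ** 3 ** 0 == right_to_left)   # True
print(2 ** 3 ** 0 == left_to_right)   # False

# Subtraction groups from left to right
print(4 - 3 - 1 == (4 - 3) - 1)       # True
print(4 - 3 - 1 == 4 - (3 - 1))       # False
```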
Boolean expressions
The simplest example of an expression that results in a boolean value is given below:
1 >>> 1 > 0
2 True
3 >>> type(1 > 0)
4 <class 'bool'>
The following expression conveys the fact that 3.14 lies between 3 and 4:
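One way to write such an expression is sketched below; the variable names are illustrative. The second form adds explicit parentheses to show that the relational operators are applied before and:

```python
value = 3.14
result = 3 < value and value < 4
parenthesized = (3 < value) and (value < 4)
print(result, parenthesized)   # True True
```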
Clearly, the interpreter is following the first parenthesization. This is in accordance with the precedence rule
for logical operators. The evaluation order is from left to right. But we will return to this in more detail in the
section on short circuit evaluation. Another example, this time with and and or :
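A sketch of such an expression, with operands chosen for illustration so that the two possible parenthesizations give different results:

```python
expression = True or False and False
first_way = (True or False) and False    # or applied first: False
second_way = True or (False and False)   # and applied first: True
print(expression, first_way, second_way)   # True False True
```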
According to the precedence rules, and has greater precedence than or . So, the second way is the one
followed by Python.
Beware of float !
Execute the following expression in the interpreter:
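The comparison discussed in this section can be run as follows; the variable name is illustrative:

```python
result = 10.00000000000000000000001 > 10
print(result)                       # False
print(10.00000000000000000000001)   # 10.0, the extra digits are lost
```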
This seems surprising! 10.00000000000000000000001 > 10 is a perfectly valid mathematical statement that
evaluates to True . The reason this returns False in Python has to do with the way floating point numbers
are represented. Python, and programming languages in general, do not support arbitrary precision for
representing real numbers. When the number cannot be represented exactly, an approximate value is
returned. As a result of this behaviour, we should be careful when using float values in expressions that
involve comparisons. Another example:
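A minimal sketch of this second case, with an illustrative variable name:

```python
tiny = 0.1 ** 1000
print(tiny)        # 0.0
print(tiny == 0)   # True
```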
The above expression presents a typical case of approximation when dealing with float . The number 0.1 **
1000 is extremely small. So, the interpreter is going to represent that as 0. One more example follows:
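A sketch of the comparison in question, with illustrative variable names:

```python
left = 0.1 * 3
right = 0.3
print(left == right)   # False
```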
Let us see what is happening here by starting with the expression to the left of the == operator:
>>> 0.1 * 3
0.30000000000000004
The problem is with the way 0.1 is represented in binary - it has a non-terminating, recurring sequence of
bits after the decimal point. As the computer uses a finite number of bits to represent data, this sequence will
be truncated at some stage. This results in an approximate representation of 0.1 . For a more detailed
explanation, refer to this resource.
Short Circuit Evaluation
>>> 1 / 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: division by zero
Division by zero is not allowed, and the interpreter promptly hits back with an error message. This is not
surprising. But what is surprising is the following statement:
>>> True or (1 / 0)
True
The expression is evaluated from left to right. The operator is or . Since the operand on the left is True , the
whole expression will evaluate to True irrespective of the operand on the right. So, the interpreter skips
evaluating the operand on the right. This behaviour is called short circuit evaluation.
Let us break this down using the diagram given below. The arrows on the left give us an idea of the expression
that has to be evaluated first. If we keep following the arrows, the last expression in this image on the bottom-
left is the first to be evaluated. By following the arrows on the right, we can see that the two offending
expressions - 5 / 0 and 10 / 0 - are never evaluated.
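The same reasoning applies to and: when the operand on the left is False, the whole expression is False and the operand on the right is skipped. A minimal sketch, reusing 1 / 0 as the offending operand:

```python
first = True or (1 / 0)      # the right operand is never evaluated
second = False and (1 / 0)   # the right operand is never evaluated
print(first, second)         # True False
```

Neither line raises a ZeroDivisionError, because the division is never carried out.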
Lesson-1.4
Replit Editor
We have been using Replit's console to type code so far. We will now move to the editor in Replit. The
advantage of using the editor is that code typed there gets automatically saved. Think about it like Google
Docs for code. The window to the left of the console is the editor. After typing code, click on the green Run
button on the top. The output of the code will appear in the console on the right. We will completely shift to
the editor from now on. This means dropping the prompt symbol before each line of code.
Errors
Introduction
Enter the following line of code in the editor and run it:
print('123)
Lines 1-4 represent an error message. It is the interpreter's way of warning us that there is something wrong
with the code. Error messages usually come with some information about the errors which helps us
understand what has gone wrong. In this case, we have a SyntaxError , i.e., something is wrong with the
syntax. The remaining part of the message gives the details:
There was an issue with the end of line while scanning a string literal. The ^ sign points to the place where the
error occurred. This acts as a visual aid while trying to trace the error. Going back to the code, the ending
quote ' is missing. We can now fix it:
print('123')
Debugging
Among software professionals, there is an alternative term used for errors in code: bugs. The process of fixing
bugs is called debugging. The process usually works as follows:
Now, the coder closes the loop by running the code again. If the code throws another error, the whole process
repeats.
Debugging code forms an important part of programming. While working on huge codebases it might take
several hours or even days to fix bugs.
Exceptions
We now move to a different set of errors. These are no longer syntax errors. Let us take an example:
1 / 0
The error message is quite clear here. We are trying to divide by zero and the interpreter is rightly objecting to
it. This is called a ZeroDivisionError . But in terms of syntax, there is no error here. Such errors that are
detected by the interpreter during the program's execution are called exceptions. We will keep returning to
the idea of exceptions in every chapter. Let us look at another exception:
1 + 'one'
Notice that the first line in the code gets printed correctly. The interpreter starts executing the code from top
to bottom. The first line is error free. It is the second line which has a problem. NameError occurs when we try
to reference a variable that has not been defined before. We will look at what referencing a variable means in
the next chapter.
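The three exceptions mentioned in this section can be observed together in the sketch below. The try and except keywords have not been introduced yet; they are used here only so that the program keeps running after each error. The name undefined_variable is a hypothetical example of a variable that was never defined.

```python
observed = []

try:
    1 / 0
except ZeroDivisionError as error:
    observed.append(type(error).__name__)

try:
    1 + 'one'
except TypeError as error:
    observed.append(type(error).__name__)

try:
    print('this line runs without any problem')
    print(undefined_variable)   # hypothetical name that was never assigned
except NameError as error:
    observed.append(type(error).__name__)

print(observed)   # ['ZeroDivisionError', 'TypeError', 'NameError']
```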
Lesson-1.5
Strings
Quotes: single, double and triple
We briefly looked at strings in the first lesson. A string is any sequence of characters enclosed
within single or double quotes. Some examples:
"this is a string"
'this is also a string'
'1 + 1 = 2'
"!, ?, _, @ are special characters"
"if you need to use apostrophe ('), you can use double quotes"
It is a good practice to stick to either single or double quotes when using strings. Interestingly,
Python also supports triple quotes ''' , especially for multi-line strings, i.e., strings that span
multiple lines. Let us say that we want the following lines to be captured in a single string:
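first line
second line
third line
Using single quotes for this purpose results in an error, because a string in single quotes cannot
span multiple lines:
x = 'first line
second line
third line'
print(x)
This is where ''' comes in: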
x = '''first line
second line
third line'''
print(x)
After executing the above code, head to the console and type x. You will see the following
output:
'first line\nsecond line\nthird line'
The \n character that you see above is called a newline character. Head to the section on escape
characters in this lesson to know more about them.
Length
The length of a string is the number of characters in it. Python provides a built-in function called
len to find the length of a string:
x = 'good'
print(len(x))
The code given above will give 4 as the output. If you are familiar with other programming
languages, such as C, you might be aware of a character data type. Python doesn't have a
separate data type for characters. A character in Python is represented by a string of length 1. In
the following examples, x and y are strings of length 1.
x = 'a'
y = 'b'
The empty string contains no characters, so its length is 0:
x = ''
print(len(x))
Operations on strings
Concatenation
We can concatenate two strings using the + operator. Concatenation is just a fancy term for
joining two strings together:
string1 = 'first'
string2 = ','
string3 = 'second'
string4 = string1 + string2 + string3
print(string4)
Output:
first,second
Replication
We can make multiple copies of a string and string them all together using the * operator:
s = 'good'
five_s = s * 5
print(five_s)
Output:
goodgoodgoodgoodgood
The * operator has made the string look too good! This is a fine demonstration of that ancient
adage: "multiplication is repeated addition":
s = 'good'
s * 5 == s + s + s + s + s # This expression evaluates to True
Comparison
x = 'python'
print(x == 'python', x == 'nohtyp')
Output:
True False
Two strings are equal if and only if both of them represent exactly the same sequence of
characters. Now, consider the following lines of code:
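The comparisons below are an illustrative sketch; the strings are assumptions chosen so that the longer string is not always the greater one:

```python
first = 'good' > 'bad'       # 'g' comes after 'b'
second = 'zoo' > 'elephant'  # 'z' comes after 'e', although 'zoo' is shorter
third = 'abc' < 'abd'        # the first two characters are equal, 'c' comes before 'd'
print(first, second, third)  # True True True
```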
It is clear from the above examples that the length of the string is not a metric used by Python to
compare strings. Instead, Python uses the familiar alphabetical ordering to compare two strings.
More precisely it employs what is known as lexicographic ordering:
Lexicographic ordering
The first characters from the two strings are compared. If they differ this determines the
outcome of the comparison. If they are equal, then the second character of both the strings
are compared. This process continues until either string is exhausted. If one string is exhausted
first, it is considered the smaller of the two.
This leads to another question. How does Python compare two characters? The answer is given in
one of Python's official tutorials:
Python’s string type uses the Unicode standard for representing characters, which lets Python
programs work with different possible characters. What is the Unicode standard? Unicode is a
specification that aims to list every character used by human languages and give each character
its own unique code. The Unicode standard describes how characters are represented by code
points. Another unfamiliar term. What is a code point? A code point value is an integer.
Lexicographical ordering for strings uses the Unicode code point number to order individual
characters.
Python provides a built-in function called ord that returns the code point of any given character.
For example:
print(ord('a'), ord('b'))
print(ord('a'), ord('A'))
Output:
97 98
97 65
Now, we clearly see why 'a' < 'b' returns True. This is because the code points for 'a' and
'b' are 97 and 98 respectively. As 97 < 98, 'a' < 'b'. We can also infer that 'A' < 'a' should
return True.
Escape characters
In Python, the backslash - \ - is called the escape character. One of its uses is to represent
certain white-space characters such as tabs and newlines. We will look at them one by one using
the following examples:
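x = '\n'
print(len(x))
The output is 1: \n is typed using two symbols, but it represents a single character, the newline.
Similarly, \t represents the tab character:
print('a\tb')
Output:
a       b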
There is also a way to escape the quotes: \' . This can come in handy when using the apostrophe
symbol in strings with single quotes:
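A minimal sketch, assuming an illustrative sentence that contains an apostrophe:

```python
message = 'It\'s a beautiful day'   # \' keeps the apostrophe from ending the string
print(message)                      # It's a beautiful day
```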
Now remove the backslash from the above string and try to print it. You will be getting an error.
Why do you think that happens?
Substrings
A string is a substring of another string if the first string is contained in the second. For example,
'good' is a substring of 'very good' , whereas 'very good' is not a substring of 'verygood' .
Python provides a keyword - in - which can be used to check if a given string is a substring of
another string. For example:
a = 'good'
b = 'very good'
present = a in b
print(present)
not_present = b in a
print(not_present)
Output:
True
False
in is a powerful keyword which has several other uses. It can also be used along with not in the
following manner:
a = 'abc'
b = 'ab'
print(a not in b)
Output:
True
Lesson-1.6
Strings
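We looked at string operations in the previous lesson. A quick recap of what we have seen so far:
quotes, length, operations on strings (concatenation, replication, comparison), escape characters
and substrings.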
In this lesson, we will explore the sequential nature of strings. This will also serve as an
introduction to lists in Python. In addition, we will also look at string methods.
Indexing
A string is a sequence of characters. Sequences support indexing. What do we mean by that?
Consider the following image:
Given a word such as "world", we say that 'w' is the first letter in the word, 'o' is the second letter
and so on. What we are referring to is the position of the letter in the word. The "index" is just a
formal way of denoting the position of an element in the sequence. In computer science, starting
the index from 0 is a widespread convention. This is called zero-based numbering.
Once this is defined, we can go ahead and access characters that are at a given position in a
string:
1 word = 'world'
2 print(word[0])
3 print(word[1])
4 print(word[2])
5 print(word[3])
6 print(word[4])
1 w
2 o
3 r
4 l
5 d
Given a variable, say word , that holds a string literal, word[i] gives the character at index i in
the string. Informally, this would be the letter at position i + 1 in the string. Now, let us turn to
the following code:
1 word = 'world'
2 print(word[5])
The interpreter throws an IndexError as we are trying to access an index that is out of range.
The length of the string is 5 . Since we start the index from 0 , the last character will be at index
4 . Anything greater than that is going to throw an error. Now, let us turn to the other end of the
spectrum:
1 word = 'world'
2 print(word[-1])
1 d
Python supports negative indexing. This can be best understood using the following image:
Think about it as follows. You keep moving down a flight of stairs starting from the top most step.
When you reach the last step, you think that you cannot go down any further. At that moment,
some invisible hand magically transports you back to the top most step and you begin your
descent all over again. A good image that captures this analogy is the Penrose stairs:
Image credit: Wikipedia
An index of -1 points to the last element in the sequence. From this, we keep moving backwards
until we reach the first element in the sequence which is at index -5 .
1 word = 'world'
2 print(word[-1])
3 # ... please add the remaining lines!
4 print(word[-5])
Unlike the Penrose stairs, we cannot keep repeating this forever. print(word[-6]) will throw an
IndexError .
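One way to read negative indices, sketched here under the assumption that the index is in range: for a string of length n , the index -k refers to the same position as n - k . The names n , last and first are illustrative.

```python
word = 'world'
n = len(word)

last = word[-1]     # same character as word[n - 1]
first = word[-n]    # same character as word[0]
print(last, first)

# The two ways of writing the index agree.
print(word[-2] == word[n - 2])
```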
Slicing
Assume that you have a collection of email ids of students in IIT-M. Let us say all email ids are of
this form:
Each branch is given a two-letter code. For example, CS stands for Computer Science and ME
stands for Mechanical Engineering. The year is some two digit number that represents the year of
joining. For example, it would be 11 if the year of joining is 2011 . Finally, number is a three digit
roll number. Some sample email ids are as follows:
1 [email protected]
2 [email protected]
3 [email protected]
Given a string, we would like to extract the roll number of the student from it. How do we do this?
Python provides a way to extract this information using the concept of slicing:
1 email = 'CS_10_014@iitm.ac.in'
2 roll = email[6 : 9]
3 print(roll)
The slicing operator - start:stop - will be our knife in slicing sequences! Let us see how it works.
The substring that we want to extract is 014 . In terms of indices, this will be 6, 7, 8 in the string
email . So, we start slicing at the index 6 and stop before the index 9 . In general, email[start
: stop] will be the substring starting at index start and stopping before the index stop , i.e.,
the character at the index stop will be excluded from the substring.
1 email = 'CS_10_014@iitm.ac.in'
2 branch = email[0 : 2]
3 year = email[3 : 5]
4 roll = email[6 : 9]
5 college = email[10 : 14]
6 # Print each one of them and check the output
Slicing is quite powerful. If we want the institute roll number, including the branch, we could do
the following:
1 email = 'CS_10_014@iitm.ac.in'
2 in_roll = email[ : 9]
3 print(in_roll)
This outputs CS_10_014 . If no starting index is specified in the slice, then start will default to 0 .
Likewise, if no stopping index is specified, stop will default to the end of the string or
len(email) . Now, consider:
1 email = 'CS_10_014@iitm.ac.in'
2 domain = email[-10 : ]
3 print(domain)
This outputs iitm.ac.in . Think for a while about the output. It is just a combination of negative
indexing and slicing. Use the following visual to get a better understanding of slicing:
Using the above visual, we can now very easily process the following slices:
1 word = 'world'
2 print(word[-4 : 3])
3 print(word[1 : -2])
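A possible way to work through these two slices is to rewrite each negative index as a non-negative one first. The sketch below assumes nothing beyond the string 'world' ; the names a , b and c are illustrative.

```python
word = 'world'
n = len(word)

a = word[-4 : 3]    # -4 is the same position as n - 4 = 1
b = word[1 : -2]    # -2 is the same position as n - 2 = 3
c = word[1 : 3]     # the same slice written with non-negative indices
print(a, b, c)
```

All three slices start at index 1 and stop before index 3.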
Immutability
Execute the following code and observe the output:
The interpreter throws a TypeError with the following error message: 'str' object does not
support item assignment . We say that something is "mutable" if it can be changed, modified.
Therefore, an object is immutable if it cannot be changed or modified. Strings are immutable.
One or more characters in the string literal present in word cannot be modified in-place.
Here, we are not modifying the variable word in-place. Instead, we are assigning it an entirely
new string literal in line-2. Thus there are two different string literals - 'some string' and 'Some
string' - and the former has NOT been transformed into the latter.
The number on the arrow represents the line number in the code. word binds to the string on
top after line-1. word binds to the string on the bottom after line-2. Note that there are two
different strings here; one doesn't transform into the other. The concept of mutable and
immutable objects will be explored in considerable detail in chapter-5.
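The difference between modifying a string in-place and binding a name to a new string can be sketched as follows. The name original is a hypothetical helper that keeps a second reference to the first string.

```python
word = 'some string'
original = word    # a second name bound to the same string

# word[0] = 'S' would raise:
# TypeError: 'str' object does not support item assignment

# Build a new string instead and bind the name word to it.
word = 'S' + word[1:]
print(word)
print(original)    # the first string is unchanged
```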
Methods
Consider the following problem:
Accept a sentence as input from the user and output the same sentence with the first letter
in the sentence capitalized.
For example, if the input is 'this is a chair.' , the output should be 'This is a chair.' .
Solution
1 sentence = input()
2 cap_sentence = sentence.capitalize()
3 print(cap_sentence)
capitalize is called a method. Methods are essentially functions, but they are defined for
specific objects. So, they have to be called by using the object for which they have been defined.
In the case of capitalize , it is a method that is defined for the str data type. If we try to call it
using an int object, we will get an error:
Getting back to the previous code snippet, sentence.capitalize() returns a string, which is
then assigned to a new variable called cap_sentence . There are plenty of other methods
associated with strings. Let us look at one more method which features in the solution to this
interesting problem:
name.isalpha() returns a boolean value. If every character in the string is a letter and the
string is non-empty, it returns True , and False otherwise. A comprehensive list of string
methods can be found here.
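A short sketch of both methods in action. The sample strings are assumptions chosen for illustration, and the variable names are hypothetical.

```python
sentence = 'this is a chair.'
cap_sentence = sentence.capitalize()
print(cap_sentence)

# isalpha() is True only for non-empty strings made up entirely of letters.
only_letters = 'Kumar'.isalpha()
with_digit = 'Kumar 2'.isalpha()    # contains a space and a digit
empty = ''.isalpha()
print(only_letters, with_digit, empty)
```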
Lesson-2.1
Variables
Introduction
Assignment Operator
Dynamic Typing
Referencing versus Defining
Keywords and Naming Rules
Reusing Variables
Multiple Assignment
Assignment Shortcuts
Deleting Variables
Variables
Introduction
Variables are containers that are used to store values. Variables in Python are defined by using the
assignment operator = . For example:
1 x = 1
2 y = 100.
3 z = "good"
1 x = 1
2 print('The initial value of x is', x)
3 x = 2
4 print('The value after updating x is', x)
Assignment Operator
The syntax of the assignment statement is as follows:
<variable-name> = <expression>
The assignment operator works from right to left. That is, the expression on the right is evaluated
first. The value of this expression is assigned to the variable on the left. For example:
1 x = 1 + 2 * 3 / 2
2 print(x)
1 4.0
Having a literal to the left of the assignment operator will result in an error:
The assignment statement maps or binds the variable name on the left to an object on the right. A
closer look at the anatomy of an assignment statement:
The number on any arrow represents the line number in the code. The variable on the left binds to
the object on the right after the corresponding line is executed. For example, the variable x binds to
the object 8 - in this case an int literal - after line-1 is executed. The interesting part is line-3. Note
that y = x makes both x and y bind to the same object. When x is updated in line-4, it binds to a
new object. However, the value of y is not disturbed by this operation. It continues to be bound to
the object 18.0 even after line-4 is executed.
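The binding behaviour described above can be sketched as follows. The four statements are an assumption chosen to be consistent with the values mentioned in the text ( 8 and 18.0 ); same is a hypothetical helper name, and is checks whether two names are bound to the same object.

```python
x = 8           # x binds to the int object 8
x = x + 10.0    # x now binds to a new object, the float 18.0
y = x           # x and y bind to the same object
same = x is y
x = x * 2       # x binds to yet another object; y is not disturbed
print(same, x, y)
```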
As a final point, the assignment operator should not be confused with the equality operator:
The assignment operator must be used for creating or updating variables; the equality operator must
be used when two expressions need to be compared. They cannot be used interchangeably!
Dynamic Typing
Python supports what is called dynamic typing. In a dynamically typed language, a variable is simply a
value bound to a name; the value has a type — like int or str — but the variable itself doesn't
[refer]. For example:
1 a = 1
2 print(type(a))
3 a = 1 / 2
4 print(type(a))
1 <class 'int'>
2 <class 'float'>
In the above example, a was initially bound to a value of type int . After its update in line-3, it was
bound to a value of type float . The image in the previous section will give a clearer picture of why
this is the case.
Referencing versus Defining
1 x = 2
2 print(x * x, 'is the square of', x)
In line-2, we are referencing the variable x which was assigned a value in line-1. If a variable is
referenced before it has been assigned a value, the interpreter throws an exception called
NameError :
1 print(someVar)
Keywords and Naming Rules
1 not, and, or, if, for, while, in, is, def, class
We have already seen some of them - not, and, or . We will come across all these keywords in
upcoming chapters. Keywords cannot be used as names for variables. For example, the following line
of code will throw a SyntaxError when executed:
Along with this restriction, there are certain other rules which have to be followed while choosing the
names of variables in Python [refer]:
A variable name can only contain alphanumeric characters (letters and digits) and
underscores:
a - z
A - Z
0 - 9
_
A variable name must start with a letter or the underscore character.
Note that these are not merely conventions. Violating any one of these rules will result in a
SyntaxError . As an example, the following code will throw a SyntaxError when executed:
1 ##### Alarm! Wrong code snippet! #####
2 3a = 1
3 ##### Alarm! Wrong code snippet! #####
Reusing Variables
Variables can be used in computing the value of other variables. This is something that will routinely
come up in programming and data science. Consider the following sequence of mathematical
equations. We wish to evaluate the value of z at x = 10 :
1 x = 10
2 y = x ** 2
3 z = (x + 1) * (y + 1)
Multiple Assignment
Consider the following statements that define two variables x and y .
1 x = 1
2 y = 2
Python allows a compact way of writing this assignment on the same line. The following code assigns
1 to the variable x and 2 to the variable y :
1 x, y = 1, 2
Note that the order matters. The following code assigns 2 to the variable x and 1 to the variable y :
1 x, y = 2, 1
To understand how this works, we need to get into the concept of packing and unpacking tuples,
which we will visit in chapter-5. Treat this as a useful feature for the time being. Another way of doing
multiple assignments is to initialize multiple variables with the same value:
1 x = y = z = 10
2 print(x, y, z)
Though x , y and z start off by being equal, the equality is broken the moment even one of the
three variables is updated:
1 x = x * 1
2 y = y * 2
3 z = z * 3
4 print(x, y, z)
1 10 20 30
Assignment Shortcuts
Execute the code given below and observe the output. What do you think is happening?
1 x = 1
2 x += 1
3 print(x)
x += a
Increment the value of x by a . In other words, add a to x and store the result in x . It is
equivalent to the statement x = x + a .
This is not just limited to the addition operator. The following table gives a summary of the shortcuts
for some of the arithmetic operators:
Shortcut    Meaning
x += a      x = x + a
x -= a      x = x - a
x *= a      x = x * a
x /= a      x = x / a
x %= a      x = x % a
x **= a     x = x ** a
Note that the arithmetic operator must always come before the assignment operator in a shortcut.
Swapping them will not work:
1 x = 1
2 x =+ 1
3 print(x)
This will give 1 as the output. This is because + is treated as the unary operator here. Statements
like x =* 1 or x =/ 2 will result in errors!
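A small sketch that chains the shortcuts from the table, with the intermediate values noted in comments. The starting values are arbitrary.

```python
x = 10
x += 5     # 15
x -= 3     # 12
x *= 2     # 24
x /= 4     # 6.0, since / always produces a float
x **= 2    # 36.0
x %= 5     # 1.0
print(x)

y = 1
y =+ 1     # read as y = (+1), so y is still 1
print(y)
```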
Deleting Variables
Variables can be deleted by using the del keyword:
1 x = 100
2 print('x is a variable whose value is', x)
3 print('we are now going to delete x')
4 del x
5 print(x)
When this code is executed, line-5 throws a NameError . This is because x was deleted in line-4 and
we are trying to access a variable that is no longer defined at line-5.
Lesson-2.2
Input
Type Conversion
Built-in Functions
Input
Accepting input from the user routinely happens in programming. Any piece of software shipped
to a customer needs to have a functional interface that will let the user interact with the software.
We all have used apps like Facebook, Instagram and Twitter. These apps regularly accept input
from the user, though we seldom look at it from a programming perspective. Take the case of
commenting on a post in Facebook. The text entered in the comment-box is the input. The code
running in the backend processes this input and then displays it as a comment in a visually
appealing form.
Python provides a built-in function called input() to accept input from the user. This is simple
yet powerful:
1 x = input()
2 print('The input entered by the user is', x)
Execute the code given above and head to the console. Here the interpreter waits patiently for
you to enter text. Press enter after entering the input. This acts as a cue for the interpreter to
understand that you have completed entering your input. This text is stored in the variable x .
The way it looks in the console is as follows:
1 1
2 The input entered by the user is 1
Sometimes we may want to prompt the user to enter a particular type of input. This can be done
by passing the instruction as an argument to the input function:
Execute the above code with the following input types: int , float , str and bool . What is the
output in each case? We see that the input() function always returns a string. Even if the user
enters a number, say 123 , that is processed as the string '123' . If we want to accept an integer
as input, how do we do it? We take the help of an operation called type conversion.
Type Conversion
If we want to convert a string into an integer, Python provides a built-in function called int :
1 x = '123'
2 print('The type of x is', type(x))
3 y = int(x)
4 print('The type of y is', type(y))
The operation in line-3 is called type conversion, i.e., we are converting an object of type str into
an object of type int . The inverse operation also works. Predictably, the function needed for this
purpose is called str :
1 x = 123
2 print('The type of x is', type(x))
3 y = str(x)
4 print('The type of y is', type(y))
If we want to accept an integer input from the user, we first take a string as input and then
convert it into an integer:
Instead of writing this in two lines, we could write this in a single line:
1 x = int(input())
2 print('The integer entered by the user is', x)
What we have done in line-1 is to compose two functions. That is, pass the output of the inner
function - input() - as the input of the outer function - int() . In the above code, what happens
if the input entered is a float value?
The code will throw a ValueError . Let us take a concrete example. When the command
int('1.23') is entered, the interpreter tries to convert the string '1.23' into an integer. But
the text enclosed within the quotes represents a float , not an int . A string of this kind cannot be
converted directly into an integer, hence the error. Note that this applies only to strings: the
float 1.23 itself can be converted, and int(1.23) returns 1 .
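A sketch of one way to handle such input, assuming that dropping the fractional part is acceptable. The try - except construct is used here only to keep the program running after the error; the names text , value and truncated are illustrative.

```python
text = '1.23'    # stands in for what the user typed

# int() cannot parse a string that contains a decimal point.
try:
    value = int(text)
except ValueError:
    value = None
    print('int() could not convert', text)

# Converting to float first works; int() then drops the fractional part.
truncated = int(float(text))
print(truncated)
```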
Built-in Functions
We have been using the term built-in functions quite often. These are functions that have
already been defined. Loosely speaking, a function in Python is an object that accepts inputs and
produces outputs. For example, print is a built-in function that accepts an input and prints it to
the console.
round accepts a number as input and returns the integer closest to it. For example,
round(1.2) returns 1 , while round(1.9) returns 2 .
abs accepts a number as input and returns its absolute value. For example, abs(-1.2)
returns 1.2 .
int is a bit involved. If an integer enclosed within quotes (string) is entered as input, then
the output is that integer. We have already seen this: int('123') is 123 . If a float is entered
as input, then the decimal part is thrown away and the integer part is returned. For example,
int(1.2) returns 1 and int(-2.5) returns -2 .
pow is another useful function. pow(x, y) returns the value of $x^y$. This performs the same
function as the ** operator. In general, the ** operator is faster than the pow function. But
for small numbers, the difference is not perceptible. In fact, using the pow function increases
readability of code. An extra feature of pow is that it supports a third argument: pow(x, y,
z) returns the value of $x^y \bmod z$. That is, it gives the remainder when $x^y$ is divided by $z$.
isinstance is used to check if an object is of a specified type. For example isinstance(3,
int) returns the value True as the literal 3 is of type int . The first argument could be any
object, not just a literal. For example, if x is a variable of type str then, isinstance(x,
str) will again return True .
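The functions described above can be tried out together. This is only a sketch with arbitrarily chosen arguments; note in particular how round treats a value that lies exactly halfway between two integers.

```python
print(round(1.2), round(1.9))
tie = round(2.5)    # ties go to the nearest even integer, so this is 2
print(tie)

print(abs(-1.2))
print(int('123'), int(1.2), int(-2.5))

power = pow(2, 10)              # same as 2 ** 10
remainder = pow(2, 10, 1000)    # same as (2 ** 10) % 1000
print(power, remainder)

print(isinstance(3, int), isinstance('good', str))
```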
Lesson-2.3
Conditional Statements
if
if-else
if-elif-else
Nested conditional statements
Defining variables inside if
Conditional Statements
if
Let us explore the idea of conditional statements by solving a simple problem:
Problem: Accept an integer as input from the user. If the number is greater than or equal to
zero, print: non-negative .
Solution
1 x = int(input())
2 if x >= 0:
3     print('non-negative')
if is a keyword in Python. The text adjacent to if is a boolean expression, usually called the if-
condition or just the condition. Line-3 is the body of if . If the condition evaluates to True , then
line-3 is executed. If it is False , then line-3 doesn't get executed. The following diagram captures the
terms that have been introduced:
Note that line-3 in the solution code is indented. In this case, the indentation corresponds to four
spaces. It is very important to keep this consistent throughout the program. In all lessons, the first
level of indentation will have four spaces. To understand how indentation works and why it is
necessary, consider the following code blocks:
1 # Left                       | # Right
2 x = 1                        | x = -1
3 if x >= 0:                   | if x >= 0:
4     print('non-negative')    |     print('non-negative')
5     print('inside if')       |     print('inside if')
6 print('outside if')          | print('outside if')
1 non-negative                 | outside if
2 inside if                    |
3 outside if                   |
Lines 3-5 in the code make up the if-block. Lines 4 and 5 which are indented make up the body of
if . Whenever the if-condition evaluates to True , the interpreter enters the body of if and
executes the lines sequentially. The indentation helps in separating the body of the if-block from the
rest of the code.
Left: For the code on the left, the condition is True . So lines 4 and 5 are going to be executed. Once
we exit the if-block, the interpreter will resume execution from line-6.
Right: For the code on the right, the condition is False . So, lines 4 and 5 are not going to be executed.
The interpreter will skip the body of if and directly move to line-6.
if-else
Let us add one more level of complexity to the problem.
Problem
Accept an integer as input from the user. If the number is greater than or equal to zero, print:
non-negative . If the number is less than zero, print negative .
Solution
1 x = int(input())
2 if x >= 0:
3     print('non-negative')
4 else:
5     print('negative')
else is a keyword in Python. When the if-condition evaluates to True , the statements inside the
body of the if-block are executed. When the condition evaluates to False , the statements inside the
body of the else-block are executed.
Points to remember:
if-elif-else
Time for another bump in the level of complexity:
Accept an integer as input from the user. If the number is greater than zero, print: positive . If
the number is less than zero, print negative . If the number is equal to zero, print zero .
1 x = int(input())
2 if x > 0:
3     print('positive')
4 elif x == 0:
5     print('zero')
6 else:
7     print('negative')
8 # End of code
To understand how this works, let us consider three different inputs and the corresponding outputs.
Input     Output
x = 1     positive
x = 0     zero
x = -1    negative
This is the process followed by the interpreter in executing the if-elif-else block:
If the if-condition evaluates to True , line-3 is executed and then the control transfers to line-8.
If the if-condition evaluates to False , the control transfers to the elif-block. If the elif-condition
evaluates to True , then line-5 is executed and then the control transfers to line-8.
If the elif-condition is False , the control transfers to the else-block and line-7 is executed. As
there are no more conditions to check, control naturally transfers to line-8.
1 if <condition-1>:
2     <statement-1>
3 elif <condition-2>:
4     <statement-2>
5 else:
6     <statement-3>
Nested conditional statements
Accept three distinct integers as input from the user. If the numbers have been entered in
ascending order, print in ascending order . If not, print not in ascending order .
The problem with the above solution is that it doesn't check if y < z . So, for an input like x, y, z =
1, 3, 2 , it will print in ascending order , which is incorrect. The complete solution is given below:
 1 x = int(input())
 2 y = int(input())
 3 z = int(input())
 4
 5 if x < y:
 6     if y < z:
 7         print('in ascending order')
 8     else:
 9         print('not in ascending order')
10 else:
11     print('not in ascending order')
Whenever a new if-block is introduced, its body should have exactly one level of indentation with
respect to its if-condition. Since line-7 makes up the body of the if-block starting at line-6, it has one
level of indentation with respect to line-6. However, line-6 is already at the first level of indentation
with respect to line-5, so line-7 has two levels of indentation with respect to line-5. According to the
convention we have chosen, two levels of indentation will correspond to eight spaces.
Having a conditional statement inside another conditional statement is called nesting. The if-else
block from lines 5-11 forms the outer block. The if-else block from lines 6-9 forms the inner block. The else
in line-8 is paired with the if in line-6 as they are at the same level of indentation. For similar
reasons, the else in line-10 is paired with the if in line-5.
Defining variables inside if
1 x = int(input())
2 if x % 5 == 0:
3     output = 'the number is divisible by 5'
4 print(output)
Run the code multiple times, varying the input each time. What do you observe?
Whenever the input is a multiple of 5, the code runs without any error. When the input is not divisible
by 5, the code throws a NameError . This is because we are trying to reference a variable that has not
been defined. The variable output is created only if line-3 is executed during run-time. Its mere
presence in the code is not enough.
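One possible way to avoid this NameError is to define the variable before the if-block, so that it exists whichever way the condition turns out. This is a sketch, not the only fix; a fixed value stands in for the user's input and the default message is an assumption.

```python
x = 12    # stands in for int(input()) so that the sketch runs unattended

# Define output before the if-block so it exists on every path.
output = 'the number is not divisible by 5'
if x % 5 == 0:
    output = 'the number is divisible by 5'
print(output)
```

Using an else-block that assigns the second message would work equally well.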
Lesson-2.4
Library
calendar
time
this
Library
A library is a collection of functions that share a common theme. This is a loose definition and will
become clear when we start working with a library.
calendar
Consider the following problem:
1 import calendar
2 calendar.prmonth(3000, 8)
1     August 3000
2 Mo Tu We Th Fr Sa Su
3              1  2  3
4  4  5  6  7  8  9 10
5 11 12 13 14 15 16 17
6 18 19 20 21 22 23 24
7 25 26 27 28 29 30 31
15th of August falls on a Friday. Isn't that lovely? It took just two lines of code! calendar is one
among several libraries in Python's standard library. A comprehensive list can be found here.
Going back to the code, calendar is the name of the library and import is the keyword used to
include this library as a part of the code.
calendar is a collection of functions that are related to calendars. prmonth is one such function.
It accepts <year> and <month> as input and displays the calendar for <month> in the year
<year> . If we want to use a function in calendar , we must first import the library. Let us see
what happens if we skip this step:
1 # import calendar
2 calendar.prmonth(3000, 8)
1 <calendar>.<function>(<arguments>)
1 import calendar
2 print(calendar.weekday(3000, 8, 15))
The output of the above code is 4 . Days are mapped to numbers as follows:
Day          Number
Monday       0
Tuesday      1
Wednesday    2
Thursday     3
Friday       4
Saturday     5
Sunday       6
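The mapping in the table can be used to turn the number returned by calendar.weekday into a day name. The sketch below uses a list, a data type covered in chapter-5; day_names and number are hypothetical names introduced for illustration.

```python
import calendar

# Day names in the order used by calendar.weekday(): Monday is 0, Sunday is 6.
day_names = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
             'Friday', 'Saturday', 'Sunday']

number = calendar.weekday(3000, 8, 15)
print(number, day_names[number])
```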
time
Let us now try to answer this hypothetical question:
You are stranded on an island in the middle of the Indian Ocean. The island has a
computing device that has just one application installed in it: a Python interpreter. You wish
to know the current date and time.
Solution
The syntax of the import statement in line-1 looks different. from is a new keyword. The first line
of the code is essentially doing the following: from the library called time import the function
called ctime . This way of importing functions is useful when we need just one or two functions
from a given library:
sleep(x) is a function in time that suspends the execution of the program for x seconds. If we
are going to use several functions from a library, importing each of them individually becomes
tedious. In such cases, it is better to fall back on importing the entire library.
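A minimal sketch that imports just these two functions and uses them together. The one-second pause and the variable names are arbitrary choices; the printed values depend on the clock of the machine running the code.

```python
from time import ctime, sleep

before = ctime()    # current date and time as a string
sleep(1)            # suspend execution for one second
after = ctime()
print(before)
print(after)
```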
this
As a fun exercise, consider the following code:
1 import this
These are some nuggets of wisdom from Tim Peters, a "major contributor to the Python
programming language" [refer]. Some of the points make immediate sense, such as "readability
counts".
Lesson-3.1
Loops
Introduction
while
break , continue
Loops
Introduction
Consider the following problem:
1 print(1 + 2 + 3 + 4 + 5)
The earlier approach is not going to work. If it takes about five seconds on average to write a
number followed by the + symbol, how much time will it take to find the sum of all 1 million
numbers? Let us check:
It will take nearly 58 days to sum all 1 million integers! This is assuming that we work like
machines that don't need food or sleep. All of this just to do something as trivial as finding the
sum of numbers. This is where loops come in.
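The estimate of nearly 58 days can be reproduced with a few lines of arithmetic. The variable names are illustrative; the only inputs are the five seconds per number and the one million numbers mentioned above.

```python
seconds_per_number = 5
count = 1_000_000

total_seconds = seconds_per_number * count
seconds_per_day = 60 * 60 * 24
days = total_seconds / seconds_per_day
print(days)
```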
while
The "loopy" solution to this problem:
1 total = 0
2 num = 0
3 while num < 1_000_000:
4     num = num + 1
5     total = total + num
6 print(total)
7 # Rest of code will follow below this comment
while is a keyword in Python. The expression adjacent to while is a boolean expression, called
the while-condition, or just the condition. Lines 4 and 5 make up the body of while. If the
condition evaluates to True , control enters the body of while. The lines in the body are
sequentially executed. After the last line in the body is executed, the control loops back to line-3,
where the condition is evaluated again. As long as the condition is True , the body of while keeps
getting executed. The moment the condition becomes False , the body of the while is skipped
and control transfers to line-6. The body of the while-loop must always be indented; this helps to
separate it from the rest of the code.
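As a sanity check on the loop, its result can be compared with the well-known closed form n(n + 1)/2 for the sum of the first n positive integers. This comparison is an addition for illustration; n and closed_form are hypothetical names.

```python
n = 1_000_000

total = 0
num = 0
while num < n:
    num = num + 1
    total = total + num

# Closed form for 1 + 2 + ... + n, using integer division.
closed_form = n * (n + 1) // 2
print(total, closed_form)
```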
Keep accepting integers as input from the user until the user enters a negative number.
Print the sum of the positive numbers entered by the user. Print 0 if the user doesn't enter
any positive integer.
Keep accepting integers as input from the user until the user enters a negative number.
Print the maximum among the positive numbers entered by the user. Print 0 if the user
doesn't enter any positive integer.
Solution
 1 # Initialize
 2 num = int(input())
 3 max_num = 0
 4 # Loop
 5 while num >= 0:
 6     if num > max_num:
 7         max_num = num
 8     num = int(input())
 9 # Print output
10 print(max_num)
Note that lines 6-8 make up the body of while and are indented. Lines 1, 4 and 9 have some
comments which are meant to help the reader understand what is happening in the code that
follows them.
break , continue
break and continue are keywords in Python and are associated with loops. The break
statement is used to exit out of a loop without executing any code that comes below it. For
example:
1 num = 1
2 while True:
3     if (num % 2 == 0) and (num % 3 == 0) and (num % 4 == 0):
4         break
5     num = num + 1
6 print(num)
The above code prints the smallest positive integer that is divisible by 2, 3 and 4, which is the
same as the LCM of (2, 3, 4). The moment this number is found, the code breaks out of the loop.
The continue statement is used to move to the next iteration of the loop, skipping whatever
code comes below it. For example:
1 x = 0
2 while x < 50:
3     x = x + 1
4     if x % 3 != 0:
5         continue
6     print(x)
The code given above prints all positive integers less than or equal to 50 that are divisible by 3.
Whenever x is not divisible by 3, we do not want to print the number, so we continue to the next
iteration.
The similarity between break and continue is that whenever either statement is encountered in
a loop, all the statements that follow it in the body are skipped. The main difference is that break
exits the loop whereas continue moves to the next iteration.
break and continue are interesting features offered by Python. However, it is important to note
that both the examples that we just discussed can be written without using break or continue .
It is left as an exercise for the reader to figure out how this can be done.
Lesson-3.2
Loops
for loop
range()
Iterating through Strings
Loops
for loop
Let us look at a simple problem of printing numbers. We would like to print the first 5 non-
negative integers. We have a different kind of a loop now, the for loop:
1 for i in range(5):
2     print(i)
3 # A dummy line
1 0
2 1
3 2
4 3
5 4
for and in are keywords in Python. range is an object that represents a sequence of numbers.
Line-2 is the body of the loop. An intuitive understanding of the code given above is as follows:
In each iteration of the loop, an element in the sequence is picked up and is printed to the
console.
Assuming that the sequence is ordered from left to right, the leftmost element is the first to
be picked up.
The sequence is processed from left to right.
Once the rightmost element has been printed to the console, control returns to line-1 for
one last time. Since there are no more elements to be read in the sequence, the control exits
the loop and moves to line-3.
range()
range(5) represents the following sequence: 0, 1, 2, 3, 4 . In general, range(n) represents
the sequence: 0, 1, ..., n - 1 . range is quite versatile. The following code prints all two digit
numbers greater than zero:
range(10, 100) represents the sequence 10, 11, ..., 99 . In general, range(start, stop)
represents the sequence start, start + 1, ..., stop - 1 . Let us add another level of
complexity. The following code prints all even two digit numbers greater than 0:
range(10, 100, 2) represents the sequence 10, 12, ..., 98 . In general, range(start,
stop, step) represents the sequence start, start + step, start + 2 * step, ..., last ,
where last is the largest element in this sequence that is less than stop . This is true when the
step parameter is positive.
range(n)
range(0, n)
range(0, n, 1)
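The three forms listed above describe the same sequence. A quick way to see the elements of a range is to convert it into a list; list is used here only to make the sequences visible, and the arguments are small values chosen for illustration.

```python
first_five = list(range(5))
from_ten = list(range(10, 15))
evens = list(range(10, 20, 2))
print(first_five)
print(from_ten)
print(evens)

# range(n), range(0, n) and range(0, n, 1) produce the same elements.
same = list(range(5)) == list(range(0, 5)) == list(range(0, 5, 1))
print(same)
```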
So far we have seen only increasing sequences. With the help of a negative step size, we can also
come up with decreasing sequences. The following code prints all two-digit even numbers greater
than zero in descending order:
range(5, 5) is an empty sequence. So, the above code will not print anything. Another instance
of an empty sequence:
The point to note is that neither of these code snippets produces any error. Finally, try executing
the following snippet and observe the output.
Iterating through Strings
1 word = 'good'
2 for char in word:
3     print(char)
1 g
2 o
3 o
4 d
1 word = 'good'
2 count = 1
3 for char in word:
4     print(char, 'occurs at position', count, 'in the string', word)
5     count = count + 1
Lesson-3.3
Nested loops
while versus for
print: end , sep
end
sep
end and sep
Nested loops
Consider the following problem:
Find the number of ordered pairs of positive integers whose product is 100. Note that order
matters: (2, 50) and (50, 2) are two different pairs.
Solution
1 count = 0
2 for a in range(1, 101):
3     for b in range(1, 101):
4         if a * b == 100:
5             count = count + 1
6 print(count)
The code given above is an example of a nested loop. Lines 2-5 form the outer loop while lines 3-5
form the inner-loop. There are multiple levels of indentation here. Line-3 is the beginning of a new
for loop, so line-4 is indented with respect to line-3. As line-4 is an if statement, line-5 is indented
with respect to line-4.
This problem could have been solved without using a nested loop. The nested loop is not an efficient
solution. It is left as an exercise to the reader to come up with a more efficient solution to this
problem. Let us look at one more problem:
Find the number of prime numbers less than or equal to $n$, where $n$ is some positive integer.
Solution
n = int(input())
count = 0
for i in range(2, n + 1):
    flag = True
    for j in range(2, i):
        if i % j == 0:
            flag = False
            break
    if flag:
        count = count + 1
print(count)
The outer for loop goes through each element in the sequence 2, 3, ..., n . i is the loop
variable for this sequence.
We begin with the guess that i is prime. In code, we do this by setting flag to be True .
Now, we go through all potential divisors of i . This is represented by the sequence 2, 3, ...,
i - 1 . Variable j is the loop variable for this sequence. Notice how the sequence for the inner
loop is dependent on i , the loop variable for the outer loop.
If j divides i , then i cannot be a prime. We correct our initial assumption by updating flag
to False whenever this happens. As we know that i is not prime, there is no point in continuing
with the inner loop, so we break out of it.
If j doesn't divide i for any j in this sequence, then i is a prime. In such a situation, our
initial assumption is right, and flag stays True .
Once we are outside the inner loop, we check if flag is True . If that is the case, then we
increment count as we have hit upon a prime number.
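Nesting is not restricted to for loops. Any one of the following combinations is possible: a for
loop inside a for loop, a for loop inside a while loop, a while loop inside a while loop, or a
while loop inside a for loop.
while versus for
n = int(input())
for i in range(n):
    print(i ** 2)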
In the code given above, the number of iterations will keep varying every time the code is run with a
different input. But given the knowledge of the input, the number of iterations is fixed. On the other
hand, consider the following example:
x = int(input())
while x > 0:
    x = int(input())
The number of iterations in the above code can be determined only after it terminates. There is no
way of quantifying the number of iterations as an explicit function of user input.
Accept a positive integer n as input and print all the numbers from 1 to n in a single line
separated by commas.
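For example, for n = 9 , the expected output is:
1,2,3,4,5,6,7,8,9
A first attempt:
n = int(input())
for i in range(1, n + 1):
    print(i, ',')
For n = 9 , this prints each number on a separate line, with a space before every comma:
1 ,
2 ,
3 ,
4 ,
5 ,
6 ,
7 ,
8 ,
9 ,
Both problems are fixed by the following code:
n = int(input())
for i in range(1, n):
    print(i, end = ',')
print(n)
For n = 9 , this will give the required output:
1,2,3,4,5,6,7,8,9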
Whenever we use the print function, it prints the expression passed to it and immediately follows it
up by printing a newline. This is the default behaviour of print . It can be altered by using a special
argument called end . The default value of end is set to the newline character. So, whenever the end
argument is not explicitly specified in the print function, a newline is appended to the input
expression by default. In the code given above, by setting end to be a comma, we are forcing the
print function to insert a comma instead of a newline at the end of the expression passed to it. It is
called end because it is added at the end. To get a better picture, consider the following code:
1 print()
2 print(end = ',')
3 print(1)
4 print(1, end = ',')
5 print(2, end = ',')
6 print(3, end = ',')
1
2 ,1
3 1,2,3,
Even though nothing is being passed to the print function in the first line of code, the first line in the
output is a newline because the default value of end is a newline character ( '\n' ). No expression is
passed as input to print in the second line of code as well, but end is set to , . So, only a comma is
printed. Notice that line-3 of the code is printed in line-2 of the output. This is because end was set to
, instead of the newline character in line-2 of the code.
sep
If multiple expressions are passed to the print function, it prints all of them in the same line, by
adding a space between adjacent expressions. For example:
1 this is cool
What if we do not want the space, or if we want some other separator? This can be done using sep :
1 this,is,cool
1 thisiscool
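The three outputs above can be reproduced with calls along the following lines. This is a minimal sketch that assumes the three words are passed to print as separate string arguments.

```python
# Default separator: a single space between adjacent expressions.
print('this', 'is', 'cool')

# A comma as the separator.
print('this', 'is', 'cool', sep = ',')

# An empty string as the separator removes the gap altogether.
print('this', 'is', 'cool', sep = '')
```

sep only affects what goes between the expressions; what follows the last expression is still controlled by end .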
Accept a positive integer n , which is also a multiple of 3, as input and print the following
pattern:
1 |1,2,3|4,5,6|7,8,9|
Solution
1 n = int(input())
2 print('|', end = '')
3 for i in range(1, n + 1, 3):
4     print(i, i + 1, i + 2, sep = ',', end = '|')
5 print()
Notice that the for loop iterates in steps of 3 starting from 1. To print the comma separated triplet
i,i + 1,i + 2 , sep is set to , . After printing each triplet, the symbol | needs to be printed. This is
achieved by setting end to be equal to | . Line-2 makes sure that the symbol | is present at the
beginning of the pattern. The last print statement outside the loop is there so that the prompt can
move to the next line on the console once the pattern has been printed. You can try removing the last
line and see how that changes the output on the console.
Lesson-3.4
Formatted printing
Consider the following program:
1 name = input()
2 print('Hi,', name, '!')
When this code is executed with Sachin as the input, we get the following output:
1 Hi, Sachin !
This looks messy as there is an unwanted space after the name. This is a formatting issue. Python
provides some useful tools to format text the way we want.
f-strings
The first method that we will look at is called formatted string literals or f-strings for short. Let us
jump into the syntax:
1 name = input()
2 print(f'Hi, {name}!')
When this code is executed with Sachin as the input, we get the following output:
1 Hi, Sachin!
The messy formatting has been corrected. Let us take a closer look at the string inside the print
command:
f'Hi, {name}!'
This is called a formatted string literal or f-string. The f in front of the string differentiates f-strings
from normal strings. An f-string is an object which, when evaluated, results in a string. The value of the
variable name is inserted in place of {name} in the f-string. Two things are important for f-strings to
do our bidding: the f in front of the string, and the curly braces around the variable. Leaving out
either one gives a plain string with no substitution:
1 name = 'Sachin'
2 print('Hi, {name}!')
3 print(f'Hi, name!')
1 Hi, {name}!
2 Hi, name!
1 l, b = int(input()), int(input())
2 print(f'The length of the rectangle is {l} units')
3 print(f'The breadth of the rectangle is {b} units')
4 print(f'The area of the rectangle is {l * b} square units')
Going back to the code, lines 2 and 3 are quite clear. Notice that line-4 has an expression — l * b —
inside the curly braces and not just a variable. f-strings allow any valid Python expression inside the
curly braces. If the f-string has some {expression} in it, the interpreter will substitute the value of
expression in the place of {expression} . Another example:
1 x = int(input())
2 print(f'Multiplication table for {x}')
3 for i in range(1, 11):
4 print(f'{x} X {i} \t=\t {x * i}')
The \t is a tab character. It has been added before and after the = . Remove both the tabs and run
the code. Do you see any change in the output?
Till now we have used f-strings within the print statement. Nothing stops us from using it to define
other string variables:
name = input()
qual = input()
gender = input()
if qual == 'phd':
    name_respect = f'Dr. {name}'
elif gender == 'male':
    name_respect = f'Mr. {name}'
elif gender == 'female':
    name_respect = f'Ms. {name}'
print(f'Hello, {name_respect}')
format()
Another way to format strings is using a string method called format() .
1 name = input()
2 print('Hi, {}!'.format(name))
In the above string, the curly braces will be replaced by the value of the variable name . Another
example:
1 l, b = int(input()), int(input())
2 print('The length of the rectangle is {} units'.format(l))
3 print('The breadth of the rectangle is {} units'.format(b))
4 print('The area of the rectangle is {} square units'.format(l * b))
The output will be identical to the one we saw when we used f-strings. A string can also contain more
than one pair of curly braces. The values that go into these positions are then given as separate
arguments in the format function. Starting from the left, the first pair of curly braces
in the string is replaced by the first argument in format , the second pair by the second argument
and so on. A few more examples:
1 fruit1 = 'apple'
2 fruit2 = 'banana'
3 print('{} and {} are fruits'.format(fruit1, fruit2))
In this code, the mapping is implicit. The first pair of curly braces is mapped to the first argument and
so on. This can be made explicit by specifying which argument a particular curly braces will be
mapped to:
1 fruit1 = 'apple'
2 fruit2 = 'banana'
3 print('{0} and {1} are fruits'.format(fruit1, fruit2))
The integer inside the curly braces gives the index of the argument in the format function. The
arguments of the format function are indexed from 0 and start from the left. Changing the order of
arguments will change the output. A third way of writing this is as follows:
fruit1 = 'apple'
fruit2 = 'banana'
print('{string1} and {string2} are fruits'.format(string1 = fruit1, string2 = fruit2))
This method uses the concept of keyword arguments which we will explore in the lessons on
functions in the next chapter. Until then, let us put this last method on the back-burner.
Format specifiers
Consider the following code:
1 pi_approx = 22 / 7
2 print(f'The value of pi is approximately {pi_approx}')
1 pi_approx = 22 / 7
2 print(f'The value of pi is approximately {pi_approx:.2f}')
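Let us look at the content inside the curly braces: {pi_approx:.2f} . The first part before the : is the
variable. Nothing new here. The part after : is called a format specifier. Here, .2f means that the value
is formatted as a float ( f ) and rounded to two digits after the decimal point ( .2 ). Changing the
specifier to .3f gives three decimal places: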
1 pi_approx = 22 / 7
2 print(f'The value of pi is approximately {pi_approx:.3f}')
Let us now take another example. Let us say we want to print the marks of three students in a class:
BSC1001: 90.5
BSC1002: 100
BSC1003: 90.15
While this is not bad, we would like the marks to be right aligned and have a uniform representation
for the marks. This is what we wish to see:
BSC1001:      90.50
BSC1002:     100.00
BSC1003:      90.15
This is much neater. The following code helps us achieve this:
The part that might be confusing is the second curly braces in each of the print statements. Let us
take a closer look: {marks_1:10.2f} . The part before the : is the variable. The part after the : is
10.2f . Here again, .2f signifies that the float value should be rounded off to two decimal places.
The 10 before the decimal point is the minimum width of the column used for printing this value. If
the number has fewer than 10 characters (including the decimal point), this will be compensated by
adding spaces before the number.
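A minimal sketch of how the aligned output could be produced with the 10.2f specifier described above. The names roll_1 , roll_2 and roll_3 , and the idea of building each line as a string before printing it, are assumptions made for illustration.

```python
# Roll numbers and marks of three students (values taken from the example above).
roll_1, marks_1 = 'BSC1001', 90.5
roll_2, marks_2 = 'BSC1002', 100
roll_3, marks_3 = 'BSC1003', 90.15

# 10.2f: minimum width of 10 characters, two digits after the decimal point.
line_1 = f'{roll_1}: {marks_1:10.2f}'
line_2 = f'{roll_2}: {marks_2:10.2f}'
line_3 = f'{roll_3}: {marks_3:10.2f}'

print(line_1)
print(line_2)
print(line_3)
```

Every formatted value occupies exactly ten characters here, which is why the decimal points line up.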
For a better understanding of this concept, let us turn to printing integers with a specific formatting.
This time, we will use the format function:
print('{0:5d}'.format(1))
print('{0:5d}'.format(11))
print('{0:5d}'.format(111))
print('{:5d}'.format(1111))
print('{:5d}'.format(11111))
print('{:5d}'.format(111111))

    1
   11
  111
 1111
11111
111111
Lesson-3.5
Library
We will look at two more libraries — math and random — and use them to solve some fascinating
problems in mathematics.
math
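Consider the following sequence:
$$x_1 = \sqrt{2}, \quad x_2 = \sqrt{2 + \sqrt{2}}, \quad x_3 = \sqrt{2 + \sqrt{2 + \sqrt{2}}}, \quad \ldots$$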
Mathematically, it is known that this sequence converges or approaches a specific value. In other
words, this sequence gets closer and closer to a well defined number as more terms are added. This
number is called the limit of the sequence. What is the limit for the above sequence? Can we use
whatever we have learned so far to estimate this value?
import math
x = 0
for n in range(1, 6):
    x = math.sqrt(2 + x)
    print(f'n = {n}, x_n = {x:.3f}')
sqrt is a function in the math library that returns the square root of the number that is entered as
argument. Representing the output shown above as a table:
n    Approximate value of x_n
1    1.414
2    1.848
3    1.962
4    1.990
5    1.998
Isn't that beautiful? It looks like this sequence, the train of square roots, is approaching the value
2. Let us run the loop for more iterations this time:
import math
x = 0
for n in range(1, 20):
    x = math.sqrt(2 + x)
print(x)
After just 19 iterations, the value is very close to two: 1.9999999999910236 . But we have used trial and
error to decide when to terminate the iteration. A better way to do this is to define a tolerance: if the
difference between the previous value and the current value in the sequence is less than some
predefined value (tolerance), then we terminate the iteration.
import math
x_prev, x_curr = 0, math.sqrt(2)
tol, count = 0.00001, 0
while abs(x_curr - x_prev) >= tol:
    x_prev = x_curr
    x_curr = math.sqrt(2 + x_prev)
    count += 1
print(f'Value of x at {tol} tolerance is {x_curr}')
print(f'It took {count} iterations')
random
How do we toss a coin using Python?
1 import random
2 print(random.choice('HT'))
That is all there is to it! random is a library and choice is a function defined in it. It accepts any
sequence as input and returns an element chosen at random from this sequence. In this case, the
input is a string, which is nothing but a sequence of characters.
We know that the probability of obtaining a head on a coin toss is 0.5. This is the theory. Is there a
way to see this rule in action? Can we computationally verify if this is indeed the case? For that, we
have to set up the following experiment. Toss a coin $n$ times and count the number of heads. Dividing
the total number of heads by $n$ will give the empirical probability. As $n$ becomes large, this probability
must approach 0.5.
import random
n = int(input())
heads = 0
for i in range(n):
    toss = random.choice('HT')
    if toss == 'H':
        heads += 1
print(f'P(H) = {heads / n}')
Let us run the above code for different values of $n$ and tabulate our results:
n            P(H)
10           0.2
100          0.52
1,000        0.517
10,000       0.5033
100,000      0.49926
1,000,000    0.499983
The value is approaching 0.5 as expected! random is quite versatile. Let us now roll a die!
1 import random
2 print(random.randint(1, 6))
Lesson 3.6
Limits
Consider the following number:
$$\sqrt{2} - 1$$
It is known that $1 < \sqrt{2} < 2$. From this, it follows that $0 < \sqrt{2} - 1 < 1$. Now, consider the following
sequence:
$$a_n = (\sqrt{2} - 1)^n$$
As $n$ becomes very large, the values in this sequence will become smaller and smaller. This is
because, if you keep multiplying a fraction with itself, it becomes smaller and smaller. In
mathematical terms, the limit of this sequence as $n$ tends to infinity is zero. Let us verify this
programmatically:
import math
n = int(input())                # sequence length
CONST = math.pow(2, 0.5) - 1    # basic term in the sequence
a_n = 1                         # zeroth term
for i in range(n):
    a_n = a_n * CONST           # computing the nth term
print(a_n)
Try this out for a few values of $n$. For large $n$, the value is so small that for all
practical purposes, it is as good as zero.
Recurrence relation
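Now, here is another fact. For every positive integer $n$, there are unique integers $p_n$ and $q_n$ such that:
$$(\sqrt{2} - 1)^n = p_n + q_n \sqrt{2}$$
For $n = 1$, this is obvious: $p_1 = -1, q_1 = 1$. What about higher values of $n$? For $n = 2$, we have
$(\sqrt{2} - 1)^2 = 3 - 2\sqrt{2}$. We can prove the general case using
mathematical induction. The following is a sketch of the inductive proof. If $(\sqrt{2} - 1)^n = p_n + q_n \sqrt{2}$,
then:
$$(\sqrt{2} - 1)^{n + 1} = (\sqrt{2} - 1)(p_n + q_n \sqrt{2}) = (2 q_n - p_n) + (p_n - q_n) \sqrt{2}$$
The equation given above defines what is called a recurrence relation: each new term in the sequence
is a function of the preceding terms. In this sequence we have $p_1 = -1, q_1 = 1$. For $n \geq 1$, the pair of
equations given below forms the recurrence relation:
$$p_{n + 1} = 2 q_n - p_n$$
$$q_{n + 1} = p_n - q_n$$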
Loops are useful tools when it comes to computing terms in such sequences:
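A minimal sketch of such a loop, assuming the power $(\sqrt{2} - 1)^n$ is written as an integer plus an integer multiple of $\sqrt{2}$. The names p and q for these two integers, and the choice n = 10 , are illustrative. Multiplying p + q * sqrt(2) by sqrt(2) - 1 gives (2q - p) + (p - q) * sqrt(2), which is the update used in the loop.

```python
import math

n = 10          # index of the term to compute (illustrative choice)
p, q = -1, 1    # first term: sqrt(2) - 1 = -1 + 1 * sqrt(2)
for i in range(n - 1):
    # One multiplication by (sqrt(2) - 1) updates both integers at once.
    p, q = 2 * q - p, p - q

print(f'p_{n} = {p}, q_{n} = {q}')
# The integer pair should reproduce the power computed directly with floats.
print(p + q * math.sqrt(2), (math.sqrt(2) - 1) ** n)
```

Both integers are updated in a single tuple assignment so that the new q is computed from the old p , not the new one.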
Rational Approximation
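This in turn provides a way to approximate $\sqrt{2}$ using rational numbers. Writing
$(\sqrt{2} - 1)^n = p_n + q_n \sqrt{2}$ with integers $p_n$ and $q_n$, the left side tends to zero, so:
$$\sqrt{2} \approx -\frac{p_n}{q_n}$$
As $n$ becomes large, this approximation will become increasingly accurate. For example, here is an
approximation after 100 iterations. It is accurate up to several decimal places!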
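The approximation can be sketched as follows, under the same assumptions as before: p and q are illustrative names for the integer part and the coefficient of sqrt(2) in the expansion of (sqrt(2) - 1) ** n , updated by the rule p, q = 2q - p, p - q .

```python
import math

p, q = -1, 1            # coefficients for n = 1
for i in range(99):     # advance from n = 1 to n = 100
    p, q = 2 * q - p, p - q

# p + q * sqrt(2) is nearly zero, so sqrt(2) is nearly -p / q.
approx = -p / q
print(f'-p/q         = {approx}')
print(f'math.sqrt(2) = {math.sqrt(2)}')
```

Python integers do not overflow, so p and q remain exact even though they grow very large; only the final division produces a float.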
Is any of this useful? I don't know. But honestly, who cares? We don't do things because they are
useful. We do them because they are interesting. And all interesting things will find their use at some
point of time in the future.
Lesson-4.1
Functions
Introduction
In mathematics, a function is an object that accepts one or more inputs and produces one or
more outputs. For example, $f(x) = x^2$ is a function that accepts a number and returns the
square of that number. Functions in Python play a similar role, but are much richer than
their mathematical counterparts. Let us quickly convert the mathematical function, $f(x) = x^2$,
into a Python function:
1 def f(x):
2     y = x ** 2
3     return y
The code given above is called the definition of function f . def is the keyword used to define
functions. f is the name of the function. x is a parameter of the function. Lines 2 and 3 make up
the body of the function and are indented. The body of a function is a collection of statements
that describe what the function does. At line-3, the value stored in variable y is returned. return
is the keyword used for this purpose.
If we run the above code, we will not get any output. Functions are not executed unless they are
called. The following code demonstrates what a function call looks like:
def square(x):
    y = x ** 2
    return y

print(square(2))

4
square(2) is a function call. We use the name of the function, square , and pass the number 2
as an argument to it. The x in the function definition is called the parameter. The value that is
passed to the function in the call is called the argument. This is a convention that we will follow
throughout this lesson.
Examples
We will look at a wide variety of function definitions. The focus will be on the syntactical aspects of
function definitions.
def foo():
    return "I don't like arguments visiting me!"
Functions need not return a value either. Suppose the body of foo had a print statement in place of
the return statement. Calling foo() would then display the message directly. Note that we would not
have to type print(foo()) . We would just have to call the function, foo() , since it already has the
print statement inside it. But what happens if we type print(foo()) ? We get the message followed
by None .
If no explicit return statement is present in a function, None is the default value returned by it.
When the interpreter comes across the print(foo()) statement, first the function foo() is
evaluated. This results in the first line of the output. Since foo() has no explicit return statement,
it returns None by default. That is why the second line in the output is None .
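The default return value can be seen with a short sketch. The body of foo and its message below are hypothetical; the only assumption that matters is that the function prints something and has no return statement.

```python
def foo():
    # The body only prints; there is no return statement.
    print('I am printed from inside foo')

foo()             # prints the message once
result = foo()    # prints the message again and stores the return value
print(result)     # the stored return value is None
```

Storing the return value in a variable makes it clear that None comes from the function call itself and not from print .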
def foo():
    pass
pass is a keyword in Python. When the interpreter comes across a pass statement, it doesn't
perform any computation and moves on to the next line. This function is minimal because it
has only those features that are absolutely essential for a function definition to be syntactically
valid: a function name and at least one statement in the body.
Such functions might seem useless at first sight, but they do have their place in programming.
While writing a complex piece of code, a coder may realize that she needs to define a function to
perform a specific task. But she may not know the exact details of the implementation or it may
not be an urgent requirement. In such a scenario, she will add a minimal function like the one
given above in her code and name it appropriately. Implementing this function will become a task
on her to-do list and will be taken up as and when the need arises.
Functions could have multiple return statements, but the moment the first return is
executed, control exits from the function:
1 def foo():
2     return 1
3     return 2
foo() will always return 1. Line-3 is redundant. An example of a function having multiple returns
that are not redundant:
1 def evenOrOdd(n):
2     if n % 2 == 0:
3         return 'even'
4     else:
5         return 'odd'
6
7 print(evenOrOdd(10))
8 print(evenOrOdd(11))
1 even
2 odd
When evenOrOdd is called with an even number as argument, the return statement in line-3 is
executed. When the same function is called with an odd number as argument, the return
statement in line-5 is executed.
The exact mechanism of what happens here will become clear when we come to the lesson on
tuples. In line-8, the first value returned by bound is stored in l and the second value returned
by bound is stored in u .
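A function that returns two values might look like the sketch below. The body of bound shown here is a hypothetical stand-in, assumed to take a positive float and return the integers int(x) and int(x) + 1 ; only the names bound , l and u come from the text.

```python
def bound(x):
    """Return int(x) and int(x) + 1 for a positive float x (hypothetical helper)."""
    lower = int(x)
    upper = lower + 1
    return lower, upper     # two values separated by a comma

y = 7.3
l, u = bound(y)             # first value goes to l, second value goes to u
print(f'{l} < {y} < {u}')
```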
Functions have to be defined before they can be called. The function call cannot come before
the definition. For example:
When the above code is executed, it throws a NameError . Why does this happen? The Python
interpreter executes the code from top to bottom. At line-2, f is a name that the interpreter has
never seen before and therefore it throws a NameError . Recall that NameError occurs when we
try to reference a name that the interpreter has not seen before.
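The sketch below illustrates the same situation in a form that does not stop the program. It assumes a function named f that squares its argument, and it uses a try - except block, which is not needed to understand the point, only to keep the script running after the error.

```python
try:
    print(f(5))              # f has not been defined yet at this point
except NameError as error:
    message = str(error)
    print('NameError:', message)

def f(x):
    return x ** 2

print(f(5))                  # works now that the definition has been executed
```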
def foo():
    print('I am inside foo')

def bar():
    print('I am inside bar')
    print('I am going to call foo')
    foo()

print('I am outside both foo and bar')
bar()
print('I am outside both foo and bar')

def foo():
    def bar():
        print('bar is inside foo')
    bar()

foo()
Docstrings
Consider the following function:
def square(x):
    """Return the square of x."""
    return x ** 2
The string immediately below the function definition is called a docstring. From the Python docs:
A docstring is a string literal that occurs as the first statement in a module, function, class, or
method definition. Such a docstring becomes the __doc__ special attribute of that object.
Ignore unfamiliar terms such as "module" and "class". For now, it is sufficient to focus on
functions. Adding the docstring to functions is a good practice. It may not be needed for simple
and obvious functions like the one defined above. As the complexity of the functions you write
increases, docstrings can be a lifesaver for other programmers reading your code.
The docstring associated with a given function can be accessed using the __doc__ attribute:
1 print(square.__doc__)
Lesson-4.2
Arguments
Python offers a number of options in terms of the way arguments can be passed to functions.
Each method of argument passing tries to answer the following question:
How are the arguments in the function call passed to the parameters in the function
definition?
Positional arguments
All functions that we have seen so far have used positional arguments. Here, the position of an
argument in the function call determines the parameter to which it is passed. Let us take the
following problem:
Write a function that accepts three positive integers x , y and z . Return True if the three
integers form the sides of a right triangle with x and y as its legs and z as the hypotenuse,
and False otherwise.
Solution
Arguments are passed to the parameters of the function based on the position they occupy in the
function call. Look at the comments in the above code to get a clear picture. Positional arguments
are also called required arguments, i.e., they cannot be left out. Likewise, adding more arguments
than there are parameters will throw an error. When positional arguments are involved, there
should be exactly as many arguments in the function call as there are parameters in the function
definition. Try to execute the following code and study the error message:
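One possible version of the function and its calls is sketched below. The name isRight appears later in this lesson; the body, and the use of try - except to display the error messages without stopping the program, are assumptions made for illustration.

```python
def isRight(x, y, z):
    # x and y are the legs, z is the hypotenuse
    return x ** 2 + y ** 2 == z ** 2

print(isRight(3, 4, 5))    # x = 3, y = 4, z = 5
print(isRight(5, 4, 3))    # x = 5, y = 4, z = 3

try:
    isRight(3, 4)          # one argument too few
except TypeError as error:
    print('TypeError:', error)

try:
    isRight(3, 4, 5, 6)    # one argument too many
except TypeError as error:
    print('TypeError:', error)
```

Swapping the first and third arguments changes the result, which is what it means for arguments to be positional.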
Keyword arguments
Keyword arguments introduce more flexibility while passing arguments. Let us take up the same
problem that we saw in the previous section and just modify the function calls:
The function call in line-3 uses what are known as keyword arguments. In this method, the names
of the parameters are explicitly specified and the arguments are assigned to it using the =
operator. This is different from positional arguments where the position of the argument in the
function call determines the parameter to which it is bound. One advantage of using keyword
arguments is that it reduces the possibility of entering the arguments in an incorrect order. For
example:
1 isRight(3, y = 4, z = 5)
Positional and keyword arguments can be combined in a single call, as in the example above, provided
the positional arguments come first. A call such as isRight(x = 3, 4, 5) is rejected: the interpreter
throws a SyntaxError with the following message: positional argument follows
keyword argument . That is, in this function call, the positional arguments 4 and 5 come
after the keyword argument x = 3 . Why does the interpreter object to this? Whenever both
positional and keyword arguments are present in a function call, the keyword arguments must
always come at the end. This is quite reasonable: positional arguments are extremely sensitive to
position, so it is best to have them at the beginning.
A call that supplies the same parameter twice, once by position and once by keyword, is also
rejected; isRight(3, x = 3, y = 4, z = 5) is one such call. The interpreter objects by throwing a
TypeError with the following message: isRight() got
multiple values for argument 'x' . Objection granted! Another reasonable requirement from
the Python interpreter: there must be exactly one argument in the function call for each
parameter in the function definition, nothing more, nothing less. This could be a positional
argument or a keyword argument, but not both.
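The rules in this section can be tried out with the sketch below. The body of isRight and the specific failing calls are assumptions; the call that is a syntax error is passed to the built-in compile function as a string, since it could not otherwise appear in a runnable script.

```python
def isRight(x, y, z):
    return x ** 2 + y ** 2 == z ** 2

# Keyword arguments may appear in any order.
print(isRight(x = 3, y = 4, z = 5))
print(isRight(z = 5, x = 3, y = 4))
# Positional arguments first, keyword arguments after them.
print(isRight(3, y = 4, z = 5))

syntax_message, type_message = '', ''

# A positional argument after a keyword argument is rejected before the code runs.
try:
    compile('isRight(x = 3, 4, 5)', '<example>', 'eval')
except SyntaxError as error:
    syntax_message = error.msg
print('SyntaxError:', syntax_message)

# Passing x both by position and by keyword is rejected when the call runs.
try:
    isRight(3, x = 3, y = 4, z = 5)
except TypeError as error:
    type_message = str(error)
print('TypeError:', type_message)
```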
Default arguments
Consider the following scenario. Picture a map of your neighborhood in which the grid lines are
roads that can be used by cars. You wish to travel from one point on the map to another. There are no
restrictions if you are on foot: the easiest way is to move along the straight line joining the two
points. The length of this line is called the Euclidean distance between the two points. If you are
in a car, then you are forced to move along the grid lines. The distance you would have to cover in a
car, the sum of the horizontal and vertical displacements, is called the Manhattan distance between
the two points.
Let us say that a self-driving car startup operating in your neighborhood uses both these metrics
while computing distances. Assume that its code base invokes the Euclidean distance 10 times
and the Manhattan distance 1000 times. Since these metrics are used repeatedly, it is a good idea
to represent them as functions in the code base:
While the above code is fine, it ignores the fact that the Manhattan distance is being used
a hundred times more frequently than the Euclidean distance. Default arguments can
come in handy in such situations:
The parameter metric has 'manhattan' as the default value. Let us try calling the function
without passing any argument to the metric parameter:
1 print(distance(3, 4))
This gives 7 as the output. Since no value was provided in the function call, the default value of
'manhattan' was assigned to the metric parameter. In the code base, wherever the Manhattan
distance is invoked, we can just replace it with the function call distance(x, y) .
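A function with a default parameter could be sketched as follows. Only the name distance , the parameter metric , its default value and the result distance(3, 4) == 7 come from the text. The body is an assumption: it treats x and y as the horizontal and vertical displacements between the two points.

```python
import math

def distance(x, y, metric = 'manhattan'):
    """Distance covered for displacements x and y (assumed interpretation)."""
    if metric == 'manhattan':
        return abs(x) + abs(y)
    elif metric == 'euclidean':
        return math.sqrt(x ** 2 + y ** 2)

print(distance(3, 4))                  # metric falls back to 'manhattan'
print(distance(3, 4, 'euclidean'))     # metric passed explicitly
```

With this design, the frequent Manhattan calls stay short, while the rarer Euclidean calls name the metric explicitly.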
Parameters that are assigned a value in the function definition are called default parameters.
Default parameters always come at the end of the parameter list in a function definition.
The argument corresponding to a default parameter is optional in a function call.
An argument corresponding to a default parameter can be passed as a positional argument
or as a keyword argument.
The above code throws a SyntaxError with the following message: non-default argument
follows default argument . In the function definition, the default parameter must always come
at the end of the list of parameters. Now, for different ways of passing arguments in the presence
of default parameters:
1 distance(3, 4)
2 distance(3, 4, 'manhattan')
3 distance(3, 4, metric = 'manhattan')
All three function calls are equivalent. The first one uses default value of metric . The second call
explicitly passes 'manhattan' as the metric using a positional argument. The last call explicitly
passes 'manhattan' as a keyword argument.
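The rule that default parameters must come at the end of the parameter list can be checked without crashing the program. In the sketch below, a deliberately wrong definition is stored as a string and handed to the built-in compile function; the wrong definition itself is a hypothetical example, and the exact wording of the error message depends on the Python version.

```python
# A definition in which a default parameter comes before non-default ones.
bad_definition = """
def distance(metric = 'manhattan', x, y):
    pass
"""

raised = False
try:
    compile(bad_definition, '<example>', 'exec')
except SyntaxError as error:
    raised = True
    print('SyntaxError:', error.msg)

print(raised)
```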
Call by value
Consider the following code:
def double(x):
    x = x * 2
    return x

a = 4
print(f'before function call, a = {a}')
double(a)
print(f'after function call, a = {a}')
We see that the value of a is not disturbed by the function in any way. When the function call
double(a) is invoked, the value in a is assigned to the parameter x in the function. Arguments
are passed by assignment in Python, which means that something like x = a happens when
double(a) is invoked. This kind of a function call where the value in a variable is passed as
argument to the function is called call by value.
def square(x):
    return x * x

x = 10
x_squared = square(x)
We are using the same name for both the parameter of the function square and the argument
passed to it. This is a bad practice. It is always preferable to differentiate the names of the
parameters from the names of the arguments that are passed in the function call. This avoids
confusion and makes code more readable. At this stage, you might be wondering how the
variable x inside the function is related to the variable x outside it. This issue will be taken up in
the next lesson on scopes. The above code could be rewritten as follows:
def square(num):
    return num * num

x = 10
x_squared = square(x)
Lesson-4.3
Scope
Consider the following code:
1 def foo():
2     x = 1
3     print('This is a veritable fortress. None can enter here.')
4     print('\N{smirking face}')
5
6 foo()
7 print(x)
Why did the interpreter throw an error in line-7? It tried to look for the name x and was
unable to find it. But isn't x present in the function foo ? Is the interpreter careless or are we
missing something? The interpreter is never wrong! The region in the code where a name can be
referenced is called its scope. If we try to reference a variable outside its scope, the interpreter
will throw a NameError .
Local vs Global
In the above example, the scope of the name x is local to the function; x has a meaningful
existence only inside the function and any attempt to access it from outside the function is going
to result in an error. Think about functions as black holes: they don't let variables (light) escape
the function's definition (event-horizon)! Let us take another example:
y = 10
def foo():
    x = 1
    print('I can access both x and y')
    print(f'x = {x}, y = {y}')

foo()
The name y is accessible from within the function as well. We say that the scope of y is global.
That is, it can be referenced from anywhere within the program — even inside a function — after
it has been defined for the first time. There is a slight catch here: if another variable with the same
name is defined within the function, then things change. We will take up this case later.
At this stage, we are ready to formulate the rules for local and global variables:
Local: Whenever a variable is assigned a value anywhere within a function, its scope
becomes local to that function. In other words, whenever a variable appears on the left side
of an assignment statement anywhere within a function, it becomes a local variable.
Global: If a variable is only referenced inside a function and is never assigned a value inside
it, it is implicitly treated as a global variable.
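The two rules can be seen side by side in a short sketch. The function names read_only and assigns are illustrative and chosen only to describe what each function does with the name y .

```python
y = 10

def read_only():
    # y is only referenced here, so the global y is used.
    return y

def assigns():
    # y is assigned a value here, so this y is local to the function.
    y = 99
    return y

print(read_only())   # value of the global y
print(assigns())     # value of the local y
print(y)             # the global y is unchanged by the call to assigns
```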
The scope of the parameters in the function definition is local. The following code will throw a
NameError when executed:
def double(x):
    x = x * 2
    return x

double(2)
print(x)
Examples
Let us now look at a few more examples that bring out some fine points regarding local and global
scope:
### Variant-1
def foo():
    x = 1
    print('I can access both x and y')
    print(f'x = {x}, y = {y}')

y = 10
foo()
Notice the difference between this code and the one at the beginning of the earlier section. Here,
the variable y is defined after the function definition, while in the earlier version y was defined
before the function definition. But both versions give the same output. All that matters is for y to
be defined before the function call. What happens if y is defined after foo is called?
1 ### Variant-2
2 def foo():
3     x = 1
4     print('I can access both x and y')
5     print(f'x = {x}, y = {y}')
6
7 foo()
8 y = 10
This throws a NameError at line-5, which is reasonable as y is not defined in the main program
before foo is called. The scope of y is still global; it can be referenced anywhere in the program
once it has been defined.
1 def foo():
2     x = 10
3     print(f'x inside foo = {x}')
4
5 x = 100
6 foo()
7 print(f'x outside foo = {x}')
We have the same name — x — appearing inside the function and outside the function. Are they
the same or different? Let us check the output:
1 x inside foo = 10
2 x outside foo = 100
They are different! The x inside foo is different from the x outside foo .
The scope of the name x inside foo is local; it is a local variable. This is because of the first
rule: a variable that is assigned a value inside the function becomes a local variable. Since x
is assigned a value in line-2, it becomes a local variable.
The scope of the x outside foo is global. Though there is another x inside the function
foo , that cannot be accessed outside the function.
This may start to get a little confusing. How does Python internally manage local and global
variables? For this, we will briefly turn to the concept of namespaces. This will give a different
perspective to the problem of name resolution.
Namespaces
Consider the following snippet of code:
1 x = 1.0
2 avar = 'cool'
3 def foo():
4     pass
We have used three different names here: x , avar and foo . The first two names represent
variables that store literals. The last name represents a function. How does the Python interpreter
internally process these names? It uses a concept called namespaces. A namespace can be
thought of as a lookup table — dictionary to be precise — that maps names to objects.
globals()
There are different types of namespaces. The variables that we define in the main program are
represented in the globals namespace. For example:
1 x = 1.0
2 avar = 'cool'
3 def foo():
4     y = 2.0
5
6 foo()
7 print(globals())
The output is a dictionary with many entries. Ignore all the other details and just focus on the
entries for the names that we have defined. Notice that the names x , avar and foo are present in
the namespace. x and avar are mapped to the objects 1.0 and 'cool' respectively, while foo is
mapped to some complex looking object: <function foo at 0x7f8ecd2aa1f0> . The number
0x7f8ecd2aa1f0 is the location in the memory where the function's definition is stored [refer];
the exact number will differ from one run to another. There is another way to check whether a
given name is in a namespace:
1 print('x' in globals())
2 print('avar' in globals())
3 print('foo' in globals())
locals()
Notice something interesting in the previous code: the name y is not found in the globals
namespace! We can verify this as follows:
1 print('y' in globals())
This results in False . Variables that are assigned a value inside a function are local to the
function and cannot be accessed outside it. How does the Python interpreter handle names
inside functions? It creates a separate namespace every time a function is called. This is called a
local namespace. Now, consider the following code:
1 def foo():
2     y = 2.0
3     print('Is y in locals?', 'y' in locals())
4
5 foo()
6 print('Is y in globals?', 'y' in globals())
1 Is y in locals? True
2 Is y in globals? False
1 def foo():
2     print(y)
3     print(locals())
4     x = 1
5     print(locals())
6
7 y = 10
8 foo()
Since y is only being referenced inside foo , it doesn't become a part of the local namespace. It
remains a global variable. Since x is being assigned a value inside foo , it is a local variable and
therefore enters the local namespace. The moment control exits the function, the namespace
corresponding to it is deleted.
Whenever the interpreter comes across a name in a function it sticks to the following protocol:
First peep into the local namespace created for that function call to see if the name is
present in it. If it is present, then go ahead and use the value that this variable points to in
the local namespace.
If it is not present, then look at the global namespace. If it is present in the global
namespace, then use the value corresponding to this name.
If it is not present in the global namespace, then look into the built-in namespace. We will
come back to the built-in namespace right at the end.
If it is not present in any of these namespaces, then raise a NameError .
The following image captures this idea. The built-in namespace has been ignored for now.
Refer to the last section to get the complete image.
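The lookup order described above can be imitated in a few lines of code. This is only a sketch:
where_is is a hypothetical helper written for this illustration, and the interpreter does not
resolve names by calling such a function. It simply checks the three namespaces in the same order.

```python
import builtins

y = 10  # defined in the main program, so it lives in the global namespace

def where_is(name, local_names):
    # Hypothetical helper: check local, then global, then built-in
    if name in local_names:
        return 'local'
    if name in globals():
        return 'global'
    if hasattr(builtins, name):
        return 'built-in'
    return 'NameError'

def foo():
    x = 1                          # assigned inside foo, hence local
    local_names = dict(locals())   # snapshot of foo's local namespace
    answers = []
    for name in ['x', 'y', 'len', 'undefined_name']:
        answers.append(where_is(name, local_names))
    return answers

result = foo()
print(result)
```

The four names are resolved at four different stages: x is local, y is global, len is a built-in
and undefined_name is found nowhere.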
With this context, let us revisit the example that we saw at the end of the first section:
1 def foo():
2     x = 10
3     print(f'x inside foo = {x}')
4
5 x = 100
6 foo()
7 print(f'x outside foo = {x}')
When the function is called at line-6, the interpreter creates a local namespace for foo . At line-2,
x becomes a part of this namespace. When x is referenced at line-3, the interpreter first looks at
the local namespace for foo . Since x is present there, it is going to use the value corresponding
to it, in this case 10 . Once control exits the function, the local namespace corresponding to it is
deleted. At line-7, the interpreter will replace the name x with the value 100 which is present in
the global namespace.
global keyword
Let us revisit the scope rules:
Local: Whenever a variable is assigned a value anywhere within a function, its scope
becomes local to that function. In other words, whenever a variable appears on the left side
of an assignment statement anywhere within a function, it becomes a local variable.
Global: If a variable is only referenced inside a function and is never assigned a value inside
it, it is implicitly treated as a global variable.
1 def foo():
2     print(x)
3     x = x + 1
4
5 x = 10
6 foo()
When the above code is executed, we get the following error: UnboundLocalError: local
variable 'x' referenced before assignment [refer]. (Python 3.11 and later word this message
differently, but the error is the same.) This follows from the first rule. x is being assigned a
value in line-3 of the function; hence it becomes a local variable throughout foo . At line-2 we are
trying to reference a local variable that is yet to be assigned a value. Note that the assignment
statement in line-5 doesn't count, as the x there is not local to foo but is a global variable.
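To see that this error can be handled like any other exception, here is a small sketch that wraps
the call in a try-except block. The variable error_name is introduced only for this illustration.

```python
x = 10

def foo():
    print(x)     # x is local here because of the assignment on the next line
    x = x + 1

try:
    foo()
except UnboundLocalError as err:
    # Store the name of the exception instead of letting the program crash
    error_name = type(err).__name__
    print(error_name)

print(x)  # the global x was never touched
```

UnboundLocalError is a special kind of NameError, so catching NameError would also work here.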
But what if we want to reuse the global variable x inside the function foo ? Python provides a
keyword called global for this purpose:
1 def foo():
2     global x
3     print(f'x inside foo = {x}')
4     x = x + 1
5     print(f'x inside foo = {x}')
6
7 x = 10
8 print(f'x outside foo = {x}')
9 foo()
1 x outside foo = 10
2 x inside foo = 10
3 x inside foo = 11
By declaring x to be global inside foo , a new local variable x is not created even though it
appears to the left of an assignment statement in line-4.
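The contrast between the two behaviours can be seen side by side in the following sketch. The
names total , add_to_total and shadow_total are made up for this illustration.

```python
total = 0

def add_to_total(amount):
    global total           # refer to the global total, do not create a local one
    total = total + amount

def shadow_total(amount):
    total = amount         # no global declaration: this total is a new local variable
    return total

add_to_total(5)
add_to_total(7)
shadow = shadow_total(100)
print(total, shadow)
```

Only add_to_total changes the global variable; the assignment inside shadow_total leaves it
untouched.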
Built-ins
So far we have been freely using built-in functions like print , int , input and so on. At some
level, these are also names in Python and these also get resolved during run-time. There is a
separate namespace called builtins where these functions are defined.
Consider what happens if we assign an integer to the name print in the main program, say with the
statement print = 1 . When this is executed, we don't get an error! This is somewhat surprising. But
syntactically, there is nothing wrong here. We will, however, get into serious problems when we
try to call print after this, for example with print(1) .
This will throw a TypeError . The name print has been hijacked and is being used as an int
variable. How does Python allow this to happen?
When resolving names, the built-in namespace is the last stage in the interpreter's journey. Once
the name print is found in the global namespace, the search stops there and the built-in function
is never reached. Syntactically, nothing prevents us from using the name of a built-in function,
such as print , as the name of a variable. But this is a very bad practice that should be avoided at
any cost!
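The effect of hijacking a built-in name can be tried out safely by doing it inside a function, so
that the damage stays local. This is a sketch and not the snippet from the text; shadow_demo is a
hypothetical name.

```python
import builtins

def shadow_demo():
    print = 1                  # a local name that hides the built-in print
    try:
        print('hello')         # this now tries to call the integer 1
    except TypeError as err:
        return type(err).__name__

outcome = shadow_demo()
# Outside the function, the name print still resolves to the built-in function
still_builtin = print is builtins.print
print(outcome, still_builtin)
```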
Lesson-4.4
Function calling Function
Recursion
Caution in Recursion
Fibonacci series
Counting Function Calls
Turtles all the way down
1 def first():
2     second()
3     print('first')
4
5 def second():
6     third()
7     print('second')
8
9 def third():
10     print('third')
11
12 first()
1 third
2 second
3 first
We have already seen that a function can be called from inside another function. The code
snippet given above is a slightly more complex version of this. Let us try to understand this
visually. This method of visualization is novel and is called the traffic-signal method. You will
soon see why it has been christened this way.
Consider a simple function which doesn't call any other function within its body. Most of the
functions we have seen so far are like this. The call corresponding to this function could be in one
of these two states: ongoing or completed.
Ongoing if the control is inside the body of the function, executing one of its lines.
Completed if all the lines in the body of the function have been executed and control has
exited out of the function, either because a return statement was encountered or because
the control reached the last line in the function, in which case None is returned by default.
A function which calls another function inside it could find itself in one of three states:
ongoing, suspended or completed. They are color coded as follows. Now you see why it is called
the traffic-signal method:
Ongoing and completed have the same meaning. To understand the suspended state, consider
the following diagrams that correspond to the code given above:
Each column here is called a stack. They all represent the same stack at different instants of time,
i.e., the columns here show the state of the stack at three different time instants. The horizontal
arrow shows the passage of time. The vertical arrow indicates that each new function call gets
added onto the top of the stack.
As third() doesn't call any other function, it never enters the suspended state. Line-10 is the
first print statement to be executed; this is why we see third as the first entry in the output. The
job of the function third is done and it turns red. Now, control transfers to the most recently
suspended function, second . The execution of second resumes from the point where it got
suspended; the print statement at line-7 is executed, following which second turns red. Finally,
control transfers to first , the print statement at line-3 is executed and first turns red.
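The growth and shrinking of the stack can also be recorded in code. In the sketch below, a plain
list plays the role of the stack; the helpers enter and leave are hypothetical and exist only to
do the bookkeeping. Python maintains its own call stack internally and does not need them.

```python
stack = []       # names of calls that have started but not yet completed
snapshots = []   # a copy of the stack taken at the start of every call
finished = []    # names in the order in which the calls complete

def enter(name):
    stack.append(name)
    snapshots.append(list(stack))

def leave():
    finished.append(stack.pop())

def third():
    enter('third')
    leave()

def second():
    enter('second')
    third()
    leave()

def first():
    enter('first')
    second()
    leave()

first()
for snapshot in snapshots:
    print(snapshot)
print(finished)
```

In every snapshot, the last name is the ongoing call and all the names before it are suspended.
The calls complete in the reverse order of their arrival on the stack.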
Recursion
A recursive function is one which calls itself inside the body of the function. A typical example of
recursion is the factorial function:
1 def fact(n):
2     if n == 0:
3         return 1
4     return n * fact(n - 1)
In the fact function given above, when the interpreter comes to line-4, it sees a recursive call to
fact . In such a case, it suspends or temporarily halts the execution of fact(n) and starts
executing fact(n - 1) . Let us take a concrete example. This is what happens when fact(4) is
called:
When fact(0) is called, there are no more recursive calls. This is because the condition in line-2
evaluates to True and the value 1 is returned. This condition is called the base-case of the
recursion. In the absence of a base-case, the recursion would keep going and never terminate on
its own.
Once the base-case kicks in, fact(0) is done with its duty. So, control transfers to the most
recently suspended function. On the stack, we see that this is fact(1) . fact(1) now becomes
active. When it returns the value 1 , its life comes to an end, so control transfers to the most
recently suspended function, which is fact(2) . This goes on until we reach fact(4) . When
fact(4) returns the value 24 , all calls have been completed and we are done!
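The sequence of calls and returns can be printed with a lightly instrumented version of fact .
This is a sketch: the extra parameters depth and trace are not part of the original function and
are added only to record what happens.

```python
def fact(n, depth=0, trace=None):
    # trace is an optional list that collects one line per call and per return
    if trace is not None:
        trace.append('  ' * depth + f'fact({n}) called')
    if n == 0:
        result = 1                                   # base-case
    else:
        result = n * fact(n - 1, depth + 1, trace)   # suspended until this returns
    if trace is not None:
        trace.append('  ' * depth + f'fact({n}) returns {result}')
    return result

trace = []
value = fact(4, trace=trace)
print('\n'.join(trace))
print(value)
```

The indentation in the printed trace grows while calls are being suspended and shrinks as they
complete.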
Caution in Recursion
This section discusses some finer aspects of recursion.
Fibonacci series
Let us take another popular example, the Fibonacci series:
$$1, 1, 2, 3, 5, 8, 13, 21, \ldots$$
Each term in this series is obtained by summing the two terms immediately to its left. We can
mathematically express this as follows. If $F_1 = F_2 = 1$, then for all $n > 2$, we have the
following recurrence relation:
$$F_n = F_{n-1} + F_{n-2}$$
We can now compute the $n^{\text{th}}$ term of the Fibonacci series using a recursive function:
1 def fibo(n):
2     if n == 1 or n == 2:
3         return 1
4     return fibo(n - 1) + fibo(n - 2)
Now, try calling fibo(40) . You will notice that it takes a very long time to compute the value. Why
does this happen? This is because a lot of wasteful computation happens. Let us see why:
This is a different representation of the recursive computation and is called a recursion tree.
Notice how some function calls appear multiple times. In the tree for fibo(5) , fibo(3) and
fibo(1) are each computed twice, while fibo(2) is computed thrice. For a larger value of n such
as 50 , there would be even more wasteful computation.
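The repeated work can be counted directly. The sketch below adds a dictionary named calls (an
addition made for this illustration) that records how many times fibo is called with each
argument while computing fibo(5) .

```python
calls = {}   # maps an argument n to the number of times fibo(n) was called

def fibo(n):
    calls[n] = calls.get(n, 0) + 1
    if n == 1 or n == 2:
        return 1
    return fibo(n - 1) + fibo(n - 2)

value = fibo(5)
total_calls = sum(calls.values())
for n in sorted(calls):
    print(f'fibo({n}) was called {calls[n]} time(s)')
print(f'fibo(5) = {value}, total calls = {total_calls}')
```

Nine calls are made to compute a single value, and the number of calls grows very quickly as n
increases.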
Practically, how can we estimate the time that it takes for this program to run? One way would be
to sit in front of the computer with a stopwatch in hand. But that is so un-Pythonic. Thankfully, the
time library provides a good solution to this problem:
1 import time
2
3 def fibo(n):
4     if n == 1 or n == 2:
5         return 1
6     return fibo(n - 1) + fibo(n - 2)
7
8 start = time.time()
9 fibo(40)
10 end = time.time()
11 print(f'It took approximately {round(end - start)} seconds.')
In a standard Python repl, it takes almost a minute! Coming back to the problem of Fibonacci
series, we see that naive recursion doesn't give us an efficient solution. We can instead look at the
following iterative solution:
1 import time
2
3 def fibo(n):
4     if n == 1 or n == 2:
5         return 1
6     x_prev, x_curr = 1, 1
7     while n > 2:
8         x_prev, x_curr = x_curr, x_prev + x_curr
9         n -= 1
10     return x_curr
11
12 start = time.time()
13 fibo(40)
14 end = time.time()
15 print(f'It took approximately {round(end - start)} seconds.')
Line-8 in the above code may be a little confusing. This is nothing but multiple assignment in the
same line done simultaneously. The RHS of the assignment statement will be evaluated first,
these two values will then be simultaneously assigned to their respective containers on the LHS. A
better and more accurate explanation will be given in the next chapter when we discuss tuples.
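The difference between assigning simultaneously and assigning one after the other is easy to
check. In this sketch the starting values 2 and 3 are arbitrary, and the names y_prev and y_curr
are introduced only for the comparison.

```python
# Simultaneous assignment: the RHS is evaluated fully before any name is updated
x_prev, x_curr = 2, 3
x_prev, x_curr = x_curr, x_prev + x_curr

# Sequential assignment: y_prev is overwritten before it is used in the sum
y_prev, y_curr = 2, 3
y_prev = y_curr
y_curr = y_prev + y_curr

print(x_prev, x_curr)
print(y_prev, y_curr)
```

The simultaneous version gives 3 and 5 , which is the correct next pair. The sequential version
gives 3 and 6 because the old value of y_prev is lost too early.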
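Counting Function Calls
1 def fact(n):
2     global count
3     count = count + 1
4     if n == 0:
5         return 1
6     return n * fact(n - 1)
7
8 count = 0
9 fact(4)
10 print(count)
Here, count is a global variable that is incremented every time fact is called. For fact(4) , the
value printed is 5 : one call each for the arguments 4 , 3 , 2 , 1 and 0 .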
Turtles all the way down
Consider a function foo that calls itself and has no base-case. When such a function is called with
foo() , we get a RecursionError with the following message: maximum recursion depth exceeded .
The limit is usually set to 1000 in most systems, i.e., if the chain of nested calls grows deeper
than about 1000, then that is going to result in this error. To verify what the limit is, you can
run the following code:
1 import sys
2 print(sys.getrecursionlimit())
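A minimal sketch of such a runaway recursion is given below. The function foo here is an
assumption about what a function without a base-case might look like; the error is caught so that
the program doesn't crash.

```python
import sys

def foo():
    foo()   # no base-case: every call makes another call

try:
    foo()
except RecursionError as err:
    message = str(err)
    print('RecursionError:', message)

limit = sys.getrecursionlimit()
print('Recursion limit:', limit)
```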
Lesson-5.1
Lists
Introduction
Iterating through lists
Growing a list
Operations on Lists
Useful Functions
Lists
Introduction
A list in Python is a data structure that is used to store a sequence of objects. Some examples are
given below:
1 numbers = [1, 2, 3, 4, 5]
2 letters = ['a', 'b', 'c', 'd']
3 words = ['this', 'is', 'a', 'list']
Lists can be printed, just like the other types we have seen so far. print(numbers) will give the
following output:
1 [1, 2, 3, 4, 5]
Lists could contain objects of different types. Python permits lists such as this:
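For instance, the following sketch builds a list that holds objects of five different types. The
particular values are chosen only for illustration.

```python
mixed = [1, 'two', 3.0, True, None]
type_names = []
for item in mixed:
    # Record the name of each element's type
    type_names.append(type(item).__name__)
print(mixed)
print(type_names)
```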
Lists have a separate type - list . We can also check if a given variable holds an object of type
list :
1 numbers = [1, 2, 3]
2 print(type(numbers))
3 print(isinstance(numbers, list))
The len function can be used to find the number of elements in a list:
1 numbers = [1, 2, 3]
2 print(f'This list has {len(numbers)} elements in it')
Lists support indexing and slicing. These two operations work exactly the same way as they did
for strings:
1 numbers = [1, 2, 3, 4]
2 print(numbers[0], numbers[1], numbers[2], numbers[3])
3 print(numbers[1 : 3])
4 print(numbers[-2])
Iterating through lists
We can iterate through the elements of a list using a for loop:
1 # Method-1
2 numbers = [1, 2, 3, 4]
3 for num in numbers:
4     print(num)
The loop variable — num — picks one item at a time from the sequence. In the body of the loop, we
are just printing this item. We can rewrite the code given above using a while loop:
1 # Method-2
2 numbers = [1, 2, 3, 4]
3 index = 0
4 while index < len(numbers):
5     print(numbers[index])
6     index += 1
Finally, we can also use the for loop to iterate through the indices of the list. For this, we take the
help of the range function.
1 # Method-3
2 numbers = [1, 2, 3, 4]
3 for index in range(len(numbers)):
4     print(numbers[index])
In the example given above, len(numbers) is equal to 4 . So, the range sequence will be 0, 1, 2,
3 . index is the loop variable that iterates through this sequence.
Methods 2 and 3 are very similar. Both iterate through the sequence of indices, and use list indexing
to access the corresponding element in the list. The only difference is that method-2 uses while ,
while method-3 uses for . Method-1 stands out from the other two as it directly pulls elements from
the sequence.
Growing a list
Lists are typically used in problems where we wish to store a collection of items. Usually, we start with
an empty list. Python provides two ways to create an empty list:
1 list1 = []
2 list2 = list()
Both list1 and list2 are empty lists. The interpreter doesn't mind spaces between the opening
and closing square brackets, so list1 = [ ] also works. Given an empty list, how do we add items
to it? Python provides two ways to do this:
Both lists end up having just the one element. The first method is called list concatenation, i.e., two
lists are being concatenated or combined together. Treat concatenation like joining two
compartments of a train together. It is very similar to string concatenation. The second way uses a
method called append that is essentially a function defined for the list type. Append adds
elements at the end of the list.
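Both ways of growing a list are shown in the sketch below. The element 1 is an arbitrary choice
made for this illustration.

```python
# Way 1: list concatenation creates a new list and rebinds the name list1
list1 = []
list1 = list1 + [1]

# Way 2: append adds the element at the end of the existing list
list2 = []
list2.append(1)

print(list1)
print(list2)
```

Note that concatenation needs a list on both sides of the + operator, which is why the element is
wrapped in square brackets.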
Generate the list of positive integers less than 100 that are divisible by 3.
There are at least two ways of doing this. The first one uses while :
1 # Method-1
2 num = 3
3 nums_div = []
4 while num < 100:
5     nums_div.append(num)
6     num += 3
The second one uses for along with range :
1 # Method-2
2 nums_div = []
3 for num in range(3, 100, 3):
4     nums_div.append(num)
Operations on Lists
We have already seen how the + operator works with lists:
1 list1 = [1, 2, 3]
2 list2 = [4, 5, 6]
3 list12 = list1 + list2
4 print(list12)
5 list21 = list2 + list1
6 print(list21)
1 [1, 2, 3, 4, 5, 6]
2 [4, 5, 6, 1, 2, 3]
The order matters when two lists are being concatenated! The next is the * operator:
1 list1 = [0] * 5
2 print(list1)
3 list2 = [1, 2, 3] * 3
4 print(list2)
1 [0, 0, 0, 0, 0]
2 [1, 2, 3, 1, 2, 3, 1, 2, 3]
Two lists are equal if they have the same sequence of elements:
1 l1 = [1, 2, 3]
2 l2 = [1, 2, 3]
3 l3 = [3, 2, 1]
4 print(l1 == l2)
5 print(l2 == l3)
1 True
2 False
Finally, two lists can be compared with the > or the < operator. List comparison works very
similarly to string comparison, in that it uses lexicographic ordering. We looked at this in the
first chapter:
Lexicographic ordering
The first elements of both lists are compared. If they differ, this determines the outcome of the
comparison. If they are equal, then the second elements of both lists are compared. This
process continues until either list is exhausted. If one list is exhausted before any
difference is found, the shorter list is considered the smaller one.
Some example comparisons:
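The following comparisons are chosen for illustration; each one is decided at a different stage
of the process described above.

```python
results = []
results.append([1, 2, 3] < [1, 2, 4])   # decided at the third element
results.append([2] > [1, 100])          # decided at the first element
results.append([1, 2] < [1, 2, 3])      # the first list is exhausted first
results.append([1, 2, 3] > [1, 2, 3])   # equal lists: neither is greater
for result in results:
    print(result)
```

The first three comparisons evaluate to True and the last one to False .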
Useful Functions
Let us look at some built-in functions that operate on lists:
sum : this is used to find the sum of the elements in a list of numbers:
1 a = [1, 2, 3]
2 print(sum(a))
max and min : these two functions find the maximum and minimum value in a list respectively.
1 a = [1, 2, 3]
2 print(min(a), max(a))
What happens if a is a list of strings? What would max(a) and min(a) produce?
sorted : this returns a new list with the elements arranged in ascending order; the original list
is not modified:
1 a = [2, 1, 3]
2 print(sorted(a))
We have come across the range object and seen how useful it was in iterating through a sequence.
So far range has been associated with the for loop. Its time has come to break out of the loopy
prison:
1 numbers = range(10)
2 print(numbers)
This gives range(0, 10) as an output. This is a sequence that we can iterate over. Python provides a
way of turning this object into a list:
1 numbers = list(range(10))
2 print(numbers)
Lesson-5.2
Lists
Mutability
Call by reference
Lists
Mutability
Consider the following problem:
Assume that you work at a company that analyzes cricket matches. As a part of the data
collection process in the IPL, the data-processing team is tasked with recording the runs scored
in every ball in every match. It is your colleague's turn to do the bookkeeping for the final match
between CSK and MI. Just before the start, the "0" key on his keyboard stops functioning. As a
workaround, you cleverly suggest that he use the letter "O" instead of 0. Once the match is over,
you collect the list of runs scored. Write a program that replaces all appearances of the letter
"O" with the number 0. I leave it to your imagination to decide who won the finals!
Solution
1 runs = [1, 4, 2, 'O', 4, 'O'] # the data for one over is given here
2 print(runs)
3 for i in range(len(runs)):
4     if runs[i] == 'O':
5         runs[i] = 0
6 print(runs)
The most interesting line is the fifth one: runs[i] = 0 . We are updating a list in-place. Python
permits this operation because lists are mutable. Contrast this with strings that are immutable,
which means that they cannot be updated in-place. Mutability makes lists powerful; but reckless
exercise of power always results in instability as is demonstrated by this notorious example:
1 list1 = [1, 2, 3]
2 list2 = list1
3 list2[0] = 100
4 print(list1)
5 print(list2)
Both give the same output even though we are only modifying list2 in-place!
1 [100, 2, 3]
2 [100, 2, 3]
What is happening here? To understand this, we will take the help of a built-in function called id .
Every object in Python has a unique identity: if x is an object, then id(x) returns this object's
identity. From the Python documentation, "this is guaranteed to be unique among simultaneously
existing objects". In the implementation of the Python that we use, this unique id is nothing but the
object's memory address.
In line-2, we are not creating a new object. We are merely creating another name, also called an alias,
for the same object. Think of this like having a nickname. Your name and nickname are two different
words, but both of them refer to you. To see if two Python names point to the same object, we can
use the is keyword:
1 list1 = [1, 2, 3]
2 list2 = list1
3 list2[0] = 100
4 print(list1 is list2)
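This prints True . However, two lists that are equal need not be the same object. If list1 and
list2 are instead created separately, each as [1, 2, 3] , then list1 == list2 and list1 is list2
evaluate to the following values respectively:
1 True
2 False
This is because equality and identity are two different things. The == operator checks whether two
lists are equal, while the is keyword checks whether two names point to the same object. Here,
list1 and list2 point to two different objects and consequently have different identities. But,
they store the same sequence of items and are hence equal.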
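Equality, identity and the id function can be compared in one place. In this sketch the names
original , alias and twin are introduced only for the illustration.

```python
original = [1, 2, 3]
alias = original      # another name for the same object
twin = [1, 2, 3]      # a different object with the same items

equal_to_alias = original == alias
same_as_alias = original is alias
equal_to_twin = original == twin
same_as_twin = original is twin

print(equal_to_alias, same_as_alias)
print(equal_to_twin, same_as_twin)
# Two names refer to the same object exactly when their ids match
print(id(original) == id(alias), id(original) == id(twin))
```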
How do we create a copy of a list so that updating one doesn't end up changing both? Python
provides three ways to do this:
list(list1) : we pass list1 as an argument to the list function, which returns a new list
object with the same sequence of elements as list1 .
list1[:] : we slice the list. Slicing a list results in a new list object. As no start or stop
values are mentioned, they are going to default to 0 and len(list1) respectively. So, the
entire list is returned. However, it is a brand new object.
list1.copy() : we use a method called copy that is defined for the list object.
In each case, the result is equal to list1 but is not the same object as list1 . This can be
verified using the == operator and the is keyword.
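The three approaches can be tried out as follows. This is a sketch written for illustration; the
names list2 , list3 and list4 and the values used are assumptions.

```python
list1 = [1, 2, 3]
list2 = list(list1)    # using the list function
list3 = list1[:]       # using a slice of the entire list
list4 = list1.copy()   # using the copy method

# Each copy is a new object, so none of them is an alias of list1
all_distinct = (list2 is not list1) and (list3 is not list1) and (list4 is not list1)

# Updating the copies in-place leaves the original untouched
list2[0] = 100
list3[1] = 200
list4[2] = 300
print(list1, list2, list3, list4)
print(all_distinct)
```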
Call by reference
Mutability impacts the way lists are handled in functions. Consider these two snippets:
1 # Snippet-1
2 def foo():
3     L.append(1)
4
5 L = [0]
6 print(f'L before: {L}')
7 foo()
8 print(f'L after: {L}')
Snippet-1 doesn't have any parameters. Since L is not being assigned a new value inside foo , the
scope of L remains global.
1 # Snippet-2
2 def foo(L_foo):
3     L_foo.append(1)
4     print(L is L_foo)
5
6 L = [0]
7 print(f'L before: {L}')
8 foo(L)
9 print(f'L after: {L}')
Snippet-2 has L_foo as a parameter whose scope is local to foo . But note that modifying L_foo
within the function changes L outside the function. This is because L_foo and L point to the same
object. How did this aliasing happen? The function call at line-8 works something like an assignment
statement: L_foo = L , so L_foo is just another name that refers to the object that L is bound to.
This type of function call where a reference to an object is passed is termed call by reference.
Whenever a mutable object such as a list is passed as an argument to a function, a reference to
that object is passed, not a copy of it.
If all this seems too complicated, just remember that modifying mutable objects within a function
produces side effects outside the function. What if we don't want these side effects? We have to
create a new list object like we did before:
1 def foo(L_foo):
2     L_foo.append(1)
3     print(L is L_foo)
4
5 L = [0]
6 print(f'L before: {L}')
7 foo(list(L))
8 print(f'L after: {L}')
foo doesn't produce any side effects. Line-7 could be replaced with foo(L[:]) or foo(L.copy()) .
Lesson-5.3
Lists
Simulating an IPL Innings
Lists
Simulating an IPL Innings
Let us return to the problem of recording the number of runs scored in every ball of an IPL match. A
typical innings of a T20 match has 20 overs, each over having 6 balls. Let us assume that all balls
bowled are fair deliveries that do not concede any extras, a rather liberal assumption. This leaves us
with exactly 120 numbers that we need to record, all lying between 0 and 6. How can this information
be stored in a Python program that makes it suitable for further processing? A list is a good
candidate.
Let us now simulate an innings. For this, we take the help of the random library:
1 import random
2 runs = random.choices([0, 1, 2, 3, 4, 5, 6], k = 120)
3 print(type(runs))
4 print(len(runs))
choices is a function in the random library. It uniformly samples from the seven numbers (0 to 6)
given in the input list with replacement. If that sounded too cryptic, this is what it does:
Pick a number from the list [0, 1, 2, 3, 4, 5, 6] at random. Each of the seven numbers is
equally likely to be picked.
Add this to the output list. The original list remains undisturbed, i.e., we are not moving an
element from the input list to the output list, we are only copying it.
Repeat this process 120 times.
If we count the number of times each number appears in runs , we get something like this (the
exact counts change from one run to another):
1 0 appears 19 times
2 1 appears 20 times
3 2 appears 19 times
4 3 appears 16 times
5 4 appears 18 times
6 5 appears 11 times
7 6 appears 17 times
The counts are quite close to each other. But this is not very realistic, as some scores are far
more common than others in an actual match. The weights argument lets us account for this:
1 import random
2 # choices is distributed over multiple lines
3 # this is done to improve readability
4 runs = random.choices([0, 1, 2, 3, 4, 5, 6],
5                       weights = [30, 30, 20, 5, 10, 0, 5],
6                       k = 120)
7 for run in [0, 1, 2, 3, 4, 5, 6]:
8     print('{} appears {} times'.format(run, runs.count(run)))
9 print(f'Total number of runs scored = {sum(runs)}')
1 0 appears 32 times
2 1 appears 34 times
3 2 appears 32 times
4 3 appears 7 times
5 4 appears 12 times
6 5 appears 0 times
7 6 appears 3 times
8 Total number of runs scored = 185
We have used sum(runs) to get the sum of the elements in the list. sum is a built-in function. The
way to understand the weights keyword-argument is using the following table:
Run Weight
0 30
1 30
2 20
3 5
4 10
5 0
6 5
Total 100
The weight is the importance given to a run. From the table given above, we see that 0 and 1 are
each picked about 30% of the time, 6 is picked about 5% of the time and so on. The choices function
keeps this distribution in mind while picking items from the input list.
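The link between weights and probabilities can be made explicit with a little arithmetic. In this
sketch, the list probabilities and the call to random.seed (used only to make the run repeatable)
are additions made for illustration.

```python
import random

outcomes = [0, 1, 2, 3, 4, 5, 6]
weights = [30, 30, 20, 5, 10, 0, 5]

# Each probability is the weight divided by the sum of all the weights
total = sum(weights)
probabilities = []
for weight in weights:
    probabilities.append(weight / total)
print(probabilities)

random.seed(0)   # fix the seed so that the same innings is produced every time
runs = random.choices(outcomes, weights = weights, k = 120)
fives = runs.count(5)
print(f'5 appears {fives} times')
```

An outcome with a weight of zero, like 5 here, is never picked.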
Let us now start analyzing this innings. We have already seen how to count the number of
occurrences of singles, doubles, fours and sixes. What about the first occurrence of a six? In which
ball was the first six scored?
1 first_six_ball = runs.index(6) + 1
2 print(first_six_ball)
index is a method that accepts an element as input and returns the index of the first occurrence
of this element in the list. For example, runs.index(6) returns the first index where a six occurs
in the list runs . Since the ball number is one more than the index, 1 has been added. What happens
if we pass an input that is not present in the list?
1 first_five_ball = runs.index(5)
2 print(first_five_ball)
In this case, 5 never occurs in the list. So this throws a ValueError with the following message: 5 is
not in list . One must be careful while using the index method. We could have done this using
another method:
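One possible version, written here as a sketch, loops over the list with enumerate and stops at
the first match. The function name first_ball_with and the short runs list are assumptions made
for this illustration.

```python
def first_ball_with(runs, target):
    # Return the ball number (index + 1) of the first occurrence of target,
    # or None if target never occurs in runs
    for index, run in enumerate(runs):
        if run == target:
            return index + 1
    return None

runs = [0, 1, 4, 2, 6, 1, 6]
first_six_ball = first_ball_with(runs, 6)
first_five_ball = first_ball_with(runs, 5)
print(first_six_ball, first_five_ball)
```

Unlike the index method, this version does not raise an error when the element is missing; it
returns None instead.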
The enumerate object can be very handy when we want to access both the element and its index
while iterating through a list. The enumerate object yields pairs: (index, list[index]) . In some
sense, we have two loop variables: the first is the index of the element in the list while the second is
the element itself. Coming back to cricket, what if we want to find the number of balls it took to score
the last 50 runs in the innings? It would be easier to reverse the list and then iterate through it:
1 balls = 0
2 last_runs = 0
3 for run in reversed(runs):
4     last_runs += run
5     balls += 1
6     if last_runs >= 50:
7         print(f'It took {balls} balls to score the last 50 runs.')
8         break
The reversed object helps us iterate through the list in the reversed order. Note that it doesn't make
any changes to the original list. One final question: we wish to find if the batsmen have run three runs
at any point in the match. We don't want to know at which point in the innings this has happened.
1 three_existence = 3 in runs
2 print(three_existence)
Recall that we used the in keyword to check for the presence of one string in another. Something
similar is happening here. The code given above prints True if 3 is an element in runs and False
otherwise.
Lesson-5.4
Lists
List Methods
insert
pop
reverse
sort
remove
Stack
Queue
Strings and Lists
split
join
Lists
List Methods
insert
We have looked at list methods like append , count and index so far. There are some more
interesting methods that will come in handy. insert can be used to insert an element in a list at
a given position:
1 L = [1, 1, 2, 3, 8]
2 L.insert(4, 5)
3 print(L)
list.insert(index, object) inserts the object before index in the list . In the code given
above, the element 5 is inserted before the index 4 in the list L . Let us try a few more inserts:
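A few more inserts are sketched below, including a negative index and an index that is larger
than the length of the list. The values are chosen only for illustration.

```python
L = [1, 2, 3]
L.insert(0, 0)         # insert at the beginning
L.insert(len(L), 4)    # insert at the end, same effect as append
L.insert(-1, 3.5)      # insert before the last element
L.insert(100, 5)       # an index beyond the end places the element last
print(L)
```

The final list is [0, 1, 2, 3, 3.5, 4, 5] .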
pop
L.pop(index) removes the element at index in L and returns it. If no argument is provided to
pop , index defaults to -1. index is thus a default argument for the method pop . A default value
of -1 means that the last element in the list is removed. To see this in action, execute the
following code:
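A sketch of such an experiment is given below; the list and the indices are chosen only for
illustration.

```python
L = [1, 2, 3, 4, 5]
last = L.pop()       # no argument: removes and returns the last element
first = L.pop(0)     # removes and returns the element at index 0
middle = L.pop(1)    # removes and returns the element at index 1 of what remains
print(last, first, middle)
print(L)
```

After the three calls, last , first and middle hold 5 , 1 and 3 respectively, and L is left
with [2, 4] .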
reverse
1 L = [1, 2, 3, 4, 5]
2 print('Before:', L, id(L))
3 L.reverse()
4 print('After:', L, id(L))
reverse reverses the order of the elements of the list in-place. It is called in-place because the
list before and after the call has the same id , i.e., it is the same object. One must be careful
while using methods that perform operations in-place. A common error is to do something like this:
1 L = [1, 2, 3, 4, 5]
2 L = L.reverse()
3 print(L)
This prints None , which is expected as reverse doesn't return a list. But sometimes, one may
want to hold on to the original copy as well as its reverse. In such cases, we could do the
following:
1 L = [1, 2, 3, 4, 5]
2 L_reversed = L.copy()
3 L_reversed.reverse()
4 print('Original list:', L)
5 print('Reversed list:', L_reversed)
sort
1 L = [2, 1, 5, 6, 4, 3]
2 print('Before', L)
3 L.sort()
4 print('After', L)
sort arranges the elements of the list in ascending order, in-place. Though this appears to be
such a simple method to call, sorting is a non-trivial problem. We will be studying various
algorithms to sort a sequence of items in the next course on data structures and algorithms.
remove
1 L = [1, 2, 3, 4, 5] * 2
2 print('Before', L)
3 L.remove(1)
4 print('After', L)
L.remove(x) removes the first (leftmost) occurrence of the element x in the list L . Trying to
remove an element that is not there in the list will raise a ValueError with the message
list.remove(x): x not in list . A safe way to remove items is as follows:
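One way to do this, sketched below, is to check for membership with the in keyword before calling
remove . The function name safe_remove is hypothetical.

```python
def safe_remove(L, x):
    # Remove the first occurrence of x from L only if it is present.
    # Returns True if something was removed and False otherwise.
    if x in L:
        L.remove(x)
        return True
    return False

L = [1, 2, 3, 1]
removed_one = safe_remove(L, 1)
removed_ten = safe_remove(L, 10)
print(L, removed_one, removed_ten)
```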
Stack
A list along with the methods append and pop simulates a data structure called a stack. A stack is a
storage mechanism where the last item added to it is the first item to be removed. This is
analogous to a stack of books. The topmost book in the stack is the most recent addition. When
we want to remove books from this stack, the topmost book is the first to be removed. There is a
catchy mnemonic for this, LIFO: Last In First Out.
1 # Start with an empty stack
2 stack = [ ]
3 # Append items to end of the stack; also called a push operation
4 stack.append('Harry Potter and the Philosopher\'s Stone')
5 stack.append('Harry Potter and the Chamber of Secrets')
6 # State of the stack
7 print(stack)
8 # Remove items from the end of the stack; also called a pop operation
9 stack.pop()
10 # State of the stack
11 print(stack)
Queue
A list along with the methods insert and pop simulates a data structure called a queue. A queue is
a storage mechanism where the first item added to it is the first to be removed. This is analogous
to any queue that we encounter in real life, say at a billing counter. The first person to stand in the
queue, is the first to be served, and naturally the first to exit the queue. The mnemonic for this is
FIFO: First in First Out.
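A sketch of this idea, assuming that new arrivals are inserted at index 0 and that the person at the front is removed from the end of the list with pop (the names are illustrative):

```python
# Start with an empty queue
queue = []
# Insert items at the beginning of the list; a new person joins the queue
queue.insert(0, 'first person')
queue.insert(0, 'second person')
queue.insert(0, 'third person')
print(queue)
# Remove from the end of the list; the earliest arrival is served first
served = queue.pop()
print(served)
print(queue)
```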
Lists make a frequent appearance while processing strings. Consider the following problem:
Accept a sentence as input and find the number of words in it. Assume that it is a simple
sentence with a single space separating consecutive words. There are no other punctuation
marks in the sentence.
Solution-1
sentence = 'this sentence is false' # a simple sentence
count = 1
for char in sentence:
    if char == ' ':
        count += 1
print(count)
We just scanned the sentence character by character and checked the number of spaces. The
total number of words is one more than the number of spaces. As an aside, the sentence that we
are dealing with is an example of a paradoxical statement. It can't be true or false: if it is true then
it is false, if it is false then it is true! Back to Python, we shall look at the solution that uses lists.
Solution-2
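A minimal sketch of a list-based solution, assuming the split method is called with a single space as the delimiter:

```python
sentence = 'this sentence is false'
# Split the sentence along spaces to get a list of words
words = sentence.split(' ')
print(len(words))
```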
split is a string method that splits a string along a delimiter. A delimiter string is one or more
characters that specify where to split the string. The output of the split operation is a list of
strings that are split along the delimiter. If we print the list words , we get the following list:
['this', 'sentence', 'is', 'false'] . Let us take another example:
comma_words = 'one,two,three,four'
numbers = comma_words.split(',')
print(numbers)
We get ['one', 'two', 'three', 'four'] as the output. Note that we have specified ',' as
the delimiter. The delimiter is not limited to characters, it can be any string. For example:
some_string = 'allISwell'
words = some_string.split('IS')
print(words)
join
Just as we went from a string to a list, we can also move from a list of strings to a string. Consider
the following problem:
Solution-1
print(sentence[-1])
It is not the letter e but a space. We ended up printing an extra space at the end. This might
seem trivial, but programming is all about precision. A better solution is given below:
Solution-2
This is more accurate. But it seems clumsy as we had to iterate from the second word in the list.
The final solution uses a simple method and is quite sophisticated.
Solution-3
Isn't that a thing of beauty! Just as split chops a string along a delimiter, join stitches together
the strings in a list, and the thread it uses is a space in this case. We could also stitch them
together using any other string, let us use a comma instead:
This output is one,two,three . The stitching seems too tight. Let us give it some space:
Notice the space after the comma. The output is one, two, three .
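The three approaches discussed in this section can be sketched as follows. The list words and the variable names are illustrative assumptions and not the original code:

```python
words = ['one', 'two', 'three']

# Approach 1: append a space after every word; leaves a trailing space
sentence_1 = ''
for word in words:
    sentence_1 += word + ' '

# Approach 2: start with the first word, then add a space before each remaining word
sentence_2 = words[0]
for word in words[1:]:
    sentence_2 += ' ' + word

# Approach 3: let join do the stitching
sentence_3 = ' '.join(words)

print(repr(sentence_1))
print(sentence_2 == sentence_3)
print(','.join(words))
print(', '.join(words))
```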
Lesson-5.5
Lists
Nested Lists
Matrices
Shallow and Deep Copy
Lists
Nested Lists
Recall the runs list that we generated with the help of the random library:
1 import random
2 runs = random.choices([0, 1, 2, 3, 4, 5, 6],
3 weights = [30, 30, 20, 5, 10, 0, 5],
4 k = 120)
5 assert len(runs) == 120
An assert statement is used whenever we wish to verify if some aspect of our code is working as
intended. For example, in line-5 of the code given above, we are making sure that the length of
the list is 120 . This is a useful check to have as subsequent computation will depend upon this. If
the conditional expression following the assert keyword is True , then control transfers to the
next line. If it is False , the interpreter raises an AssertionError .
overs = list()
new_over = list()
for ball, run in enumerate(runs):
    new_over.append(run)
    if (ball + 1) % 6 == 0:
        overs.append(new_over)
        new_over = list()
overs is a nested list, which is nothing but a list of lists. Each element in overs corresponds to
an over in the match and is represented by a list that contains the runs scored in that over. The
following code does a quick check if the sizes of the outer and inner lists are 20 and 6 respectively.
assert len(overs) == 20
for over in overs:
    assert len(over) == 6
With this representation in place, how many runs were scored in the fourth ball of the third over?
The answer is overs[2][3] . The first index corresponds to the outer list while the second index
corresponds to the inner list. If this is still confusing, run the following code to convince yourself:
third_over = overs[2]
print(third_over)
fourth_ball = third_over[3]
print(fourth_ball)
assert fourth_ball == overs[2][3]
Matrices
Matrices are 2D objects. We can represent them as nested lists. Let us first populate a
3 x 3 matrix of zeros:
1 mat = [ ]
2 for i in range(3):
3     mat.append([ ]) # we are appending an empty list
4     for _ in range(3):
5         mat[i].append(0)
6 print(mat)
This gives the following output:
[[0, 0, 0], [0, 0, 0], [0, 0, 0]]
Do you find anything odd in line-4? We have used _ as a loop variable. The inner-loop variable is
insignificant and never gets used anywhere. As a convention, we use the _ to represent such
variables whose sole purpose is to uphold the syntax of the language. Let us now construct
another matrix:
mat = [ ]
num = 1
for i in range(3):
    mat.append([ ])
    for _ in range(3):
        mat[i].append(num)
        num += 1
print(mat)
The code given above to construct this matrix could be written in the following manner as well:
mat = [ ]
num = 1
for _ in range(3):
    row = [ ]
    for _ in range(3):
        row.append(num)
        num += 1
    mat.append(row)
print(mat)
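Shallow and Deep Copy
Suppose a matrix mat1 is assigned to a second name with mat2 = mat1 and mat2 is then modified.
We already know what will happen here. Lists are mutable. mat2 is just an alias for mat1 and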
both point to the same object. Modifying any one of them will modify both. We also saw three
different methods to copy lists so that modifying one doesn't modify the other. Let us try one of
them:
mat2 = mat1.copy()
mat2.append([5, 6])
print(mat1)
print(mat2)
print(mat1 is mat2)
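The append affects only mat2 and the last line prints False , so the outer lists are independent.
But if we modify an element of an inner list through mat2 , say with mat2[0][0] = 100 , then mat1
changes as well! What is happening here? Wasn't copy supposed to get rid of this difficulty? We
have a mutable object inside another mutable object. In such a case copy just does a shallow copy;
only a new outer-list object is produced. This means that the inner lists in mat1 and mat2 are still
the same objects: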
print(mat1[0] is mat2[0])
print(mat1[1] is mat2[1])
Both lines print True . In order to make a copy where both the inner and outer lists are new
objects, we turn to deepcopy:
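A minimal sketch using deepcopy from the copy module; the contents of mat1 are assumed for illustration:

```python
from copy import deepcopy

mat1 = [[1, 2], [3, 4]]
mat2 = deepcopy(mat1)
# Modifying an inner list of mat2 leaves mat1 untouched
mat2[0][0] = 100
print(mat1)
print(mat2)
# Both the outer and the inner lists are different objects
print(mat1 is mat2)
print(mat1[0] is mat2[0])
```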
Lesson-5.6
Tuples
Introduction
More on Tuples
Lists and Tuples
Packing and Unpacking
Tuples
Introduction
A tuple is an immutable sequence of values:
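For instance, a tuple is written with parentheses. The name family matches the snippets that follow; its contents are assumed here for illustration:

```python
# A tuple of strings; the values are placeholders
family = ('father', 'mother', 'child')
print(type(family))
print(len(family))
```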
Tuples share a close resemblance to lists. They can be indexed and sliced just like lists:
print(family[0])
print(family[:2])
The main point of difference between lists and tuples is that tuples cannot be updated in-place
since they are immutable. So, the following operation will throw an error:
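A sketch of such an operation, wrapped in try-except so that the error message can be inspected (the tuple contents are assumed):

```python
family = ('father', 'mother', 'child')
try:
    family[0] = 'grandfather'   # item assignment is not allowed for tuples
except TypeError as error:
    message = str(error)
    print('TypeError:', message)
```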
The interpreter throws a TypeError with the following message: TypeError: 'tuple' object
does not support item assignment . As a consequence, we cannot append or insert elements
into a tuple. Likewise, elements in a tuple cannot be deleted. count and index are the only two
methods which are defined for tuple and they carry the usual meaning:
numbers = (1, 2, 3, 1, 1)
print(numbers.count(1))
print(numbers.index(2))
We can iterate through a tuple using for :
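A short sketch of such a loop, reusing the values of the numbers tuple from above:

```python
numbers = (1, 2, 3, 1, 1)
total = 0
# Visit the elements of the tuple one by one
for num in numbers:
    total += num
print(total)
```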
Since tuples are immutable, a function cannot modify a tuple that is passed to it. In this respect
they behave as though they were passed by value, similar to other immutable types such as strings
and numbers. As for functions that operate on tuples, sum , max , min are useful ones.
More on Tuples
A few more points on tuples.
i_am_single = (1, )
print(len(i_am_single))
print(isinstance(i_am_single, tuple))
Note the presence of a comma after the element. Let us see what happens if it is removed:
i_am_single = (1)
print(isinstance(i_am_single, int))
It is an integer!
a_list = [1, 2, 3]
a_tuple = tuple(a_list)
b_tuple = (1, 2, 3)
b_list = list(b_tuple)

1 in (1, 2, 3)
'hello' not in ('some', 'random', 'sequence')
We see that the id of the element inside the tuple remains unchanged. Thus the identities of the
sequence of objects that make up a tuple can never change, and the interpreter will never allow
that to change. If the objects inside the sequence are mutable — such as lists — then the values
that they hold might change, but they continue to retain their identities.
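This behaviour can be sketched with a tuple that holds a list; the names used here are illustrative:

```python
inner = [1, 2, 3]
T = (inner, 'some string')
id_before = id(T[0])
# The list inside the tuple is mutable and can be changed in-place
T[0].append(4)
id_after = id(T[0])
print(T)
print(id_before == id_after)
```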
Lists and Tuples
List                Tuple
Mutable             Immutable
L = [1, 2, 3]       T = (1, 2, 3)
The partnership between lists and tuples is quite interesting and can be explored further with
another example.
Populate a list that contains all ordered pairs of positive integers whose product is 100. Note
that order matters: (2, 50) and (50, 2) are two different pairs.
Solution
pairs = [ ]
for a in range(1, 101):
    for b in range(1, 101):
        if a * b == 100:
            pairs.append((a, b))
print(pairs)
pairs is a list of tuples. We could have stored each pair as a list. But a tuple is the better choice
here since the two elements in the pair have a well defined relationship and we don't want to
accidentally modify them.
Packing and Unpacking
1 T = 1, 2, 3
2 print(T)
3 print(isinstance(T, tuple))
At first sight, line-1 seems to be an error. We have seen multiple assignment on the same line,
perhaps we are two variables short on the LHS? But on execution, we see that there is no error. T
is in fact the tuple (1, 2, 3) . This is called tuple packing. The values 1 , 2 and 3 are packed
into a tuple. The reverse operation is called sequence unpacking:
x, y, z = T
print(x, y, z)
Here, the tuple T is unpacked into the corresponding variables x , y and z . This is the principle
behind multiple assignment. From the Python documentation, we have [refer]:
x, y, z = 1, 2, 3
In the line given above, the RHS is first packed into a tuple and the sequence is then unpacked
into the variables x , y and z . But why does the unpacking operation have the qualifier
sequence before it? This is because any sequence can be unpacked:
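A sketch of unpacking applied to sequences other than tuples, with illustrative values:

```python
a, b = [1, 2]          # unpacking a list
c, d, e = 'xyz'        # unpacking a string
f, g, h = range(3)     # unpacking a range
print(a, b)
print(c, d, e)
print(f, g, h)
```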
That's fun! The same operations are invoked when multiple values are returned from functions:
We see that x is a tuple. In the return statements at lines 3 and 4, the multiple values are packed
into tuples. So, the function is essentially returning a tuple.
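A sketch of a function with two return statements, each returning more than one value. The function min_max is a hypothetical example and not the original code:

```python
def min_max(a, b):
    if a < b:
        return a, b   # the two values are packed into a tuple
    return b, a

x = min_max(5, 2)
print(x)
print(isinstance(x, tuple))
# The returned tuple can also be unpacked right away
low, high = min_max(5, 2)
print(low, high)
```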
Lesson-6.1
Dictionaries
Introduction
More Examples
More on Keys
Hash Tables
Iterating over Dictionaries
Growing a Dictionary
Mutability
Dictionaries
Introduction
Let us assume that we want to store the following information in Python:
Country Capital
Brazil Brasilia
Russia Moscow
India New Delhi
China Beijing
South Africa Cape Town
A minor geographical observation: South Africa has three capitals; we have only mentioned the
legislative capital for convenience. A geopolitical point: these five countries form a bloc
called BRICS [refer].
Coming back to Python, a dictionary is possibly the most interesting data structure offered by Python.
It is basically a look-up table. This is how we would store the details of the BRICS nations and their
capitals:
brics = {
    'Brazil': 'Brasilia',
    'Russia': 'Moscow',
    'India': 'New Delhi',
    'China': 'Beijing',
    'South Africa': 'Cape Town'
}
A dictionary is a collection of key-value pairs. In the code given above, brics is a dictionary. It has
countries mapped to their respective capitals. For instance, 'India' is mapped to 'New Delhi' .
Here, 'India' is the key and 'New Delhi' is the value. That is, the country is the key and its capital
is the value. A dictionary object is of type dict :
print(type(brics))
print(isinstance(brics, dict))
New key-value pairs can be added to a dictionary. Let us expand the horizons of our dictionary to
include countries outside the BRICS nations. It no longer makes sense to call this brics , so let us
create a new dictionary called globe which starts off as a copy of brics . Recall the copy method
that we used to copy lists. A similar method is defined for dictionaries:
1 brics = {
2 'Brazil': 'Brasilia',
3 'Russia': 'Moscow',
4 'India': 'New Delhi',
5 'China': 'Beijing',
6 'South Africa': 'Cape Town'
7 }
8 globe = brics.copy()
9 globe['Spain'] = 'Madrid'
Adding a new key-value pair is as simple as the statement given in line-9 of the code given above.
Keys of a dictionary are unique. This means that a dictionary cannot have two or more identical keys
mapped to different values. On the other hand, two different keys could have the same value. For
example:
some_dict = {'key_1': 0, 'key_2': 0}
Trying to access a key that is not present in the dictionary will result in a KeyError :
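A sketch of this, using a smaller version of the brics dictionary and a try-except block so that the program does not stop at the error:

```python
brics = {'Brazil': 'Brasilia', 'India': 'New Delhi'}
try:
    print(brics['Spain'])   # 'Spain' is not a key in brics
except KeyError as error:
    missing_key = error.args[0]
    print('KeyError:', missing_key)
```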
More Examples
The key of a dictionary can be any immutable object. There is a small catch here. We will return to this
constraint in the next section. Let us look at different combinations of key-value pairs that are possible
beginning with the basic types: int, str, float, bool :
Next, we have dictionaries that have list and tuple as the type of their values:
Tuples can be keys, provided they don't contain any mutable objects within them:
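The three kinds of dictionaries mentioned above can be sketched as follows. All names and values here are illustrative assumptions:

```python
# Keys and values drawn from the basic types
basic = {1: 'one', 'pi': 3.14, 2.5: True, False: 0}
# Values that are lists and tuples
containers = {'primes': [2, 3, 5], 'origin': (0, 0)}
# Tuples of immutable objects used as keys
points = {(0, 0): 'origin', (1, 0): 'unit vector along x'}
print(basic[1], containers['primes'], points[(0, 0)])
```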
Towards the end, we will look at an example where a tuple cannot be a key. Finally, the richness of
dictionaries comes out in the following example:
# mixed
report_card = {
    'name': 'Ramanujan',
    'age': 18,
    'school': 'KV',
    'marks': (75, 80, 60, 95, 100)
}
More on Keys
Earlier, it was mentioned that the keys of dictionaries have to be immutable. This statement is not
entirely accurate. In this section, we will explore why. What happens if we use a list as a key?
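A sketch of such an attempt, with the error caught so that its message can be printed (the key and value are illustrative):

```python
try:
    bad_dict = {[1, 2]: 'a list as a key'}
except TypeError as error:
    message = str(error)
    print('TypeError:', message)
```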
It throws a TypeError with the following message: unhashable type: 'list' . A list cannot be a key
in a dictionary; but the error message doesn't talk about immutability, instead it says that the list
type is unhashable. A more accurate statement about keys in a dictionary is this: the keys of a
dictionary must be hashable.
To understand what we mean by the term hashable, we shall briefly look at the way Python
implements dictionaries. The following section on hash tables is a bit involved and can be skipped.
Hash Tables
Python dictionaries are implemented using a data structure called a hash table. It is best to think
about a hash table as a book-rack that has a number of rows. Picture the key-value pairs as books
that are going to be stored in these racks. To access a book, we need to know the row number in
which it is present. This is where the idea of a hash function comes in. The hash function converts
the key to the row number.
The hash function accepts a key as input and returns a value as output. This is called the hash
value. In our analogy, the hash value is synonymous with the row number. Once we know the row
number, the book (key-value pair) stored in it can be easily retrieved. The description is somewhat
naive, but you get the point.
Now, an object in Python is hashable if it has a hash value which never changes during its lifetime and
can be compared to other objects. Most of the immutable objects that we have seen so far are
hashable: int, float, str, bool . Mutable containers such as lists are not hashable. So, can we
just go back to the original definition and claim that all immutable objects can be used as keys in
dictionaries? No! Consider the following example:
##### Alarm! Wrong code snippet #####
some_tuple = ([0, 1], [2, 3])
bad_dict = {some_tuple: 0}
##### Alarm! Wrong code snippet #####
Though some_tuple is immutable, it contains a sequence of lists which are mutable. According to the
Python documentation, immutable containers are hashable only if their elements are hashable. So,
some_tuple is not hashable, and hence it cannot be used as a key! For a better explanation, check
out the docs.
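Hashability can be checked directly with the built-in hash function. A small sketch with illustrative values:

```python
# Equal tuples of hashable objects have equal hash values
print(hash((1, 2)) == hash((1, 2)))

# A tuple that contains lists is not hashable
try:
    hash(([0, 1], [2, 3]))
    hashable = True
except TypeError:
    hashable = False
print(hashable)
```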
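Iterating over Dictionaries
For a dictionary named squares , squares.keys() returns a view of its keys over which we can
iterate. Python makes things even simpler and lets us drop the keys method: iterating over the
dictionary itself goes through its keys.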
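Both ways of iterating can be sketched as follows; the contents of squares are assumed for illustration:

```python
squares = {1: 1, 2: 4, 3: 9}

# Iterating using the keys method
for key in squares.keys():
    print(key, squares[key])

# Iterating over the dictionary directly also goes through its keys
keys_seen = []
for key in squares:
    keys_seen.append(key)
print(keys_seen)
```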
Growing a Dictionary
An empty dictionary can be defined in one of the following ways:
D1 = dict()
D1[0] = 1
D2 = { }
D2[0] = 1
Accept a list of words as input and create a dictionary that maps words to their lengths.
Solution
A piece of trivia: what is common among the words in the list words ?
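One way to solve this problem is sketched below. The word list is a placeholder and is not the list that the trivia question refers to:

```python
words = ['apple', 'kiwi', 'banana']   # placeholder input
lengths = dict()
# Grow the dictionary one key-value pair at a time
for word in words:
    lengths[word] = len(word)
print(lengths)
```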
Mutability
Like lists, dictionaries are mutable objects. To see the mutability of dict objects in action, consider
the following code:
We see that dict_2 is an alias of dict_1 and both point to the same object. If we want a new dict
object with the same contents as dict_1 , we could either use the copy method or the dict built-in
function:
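A sketch of both options; the contents of dict_1 and the name dict_3 are assumed for illustration:

```python
dict_1 = {'one': 1, 'two': 2, 'three': 3}
dict_2 = dict_1.copy()   # using the copy method
dict_3 = dict(dict_1)    # using the dict built-in
dict_2['four'] = 4       # affects only dict_2
print(dict_1)
print(dict_2)
print(dict_1 is not dict_2 and dict_1 is not dict_3)
```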
The last line prints True which confirms that we have two different objects. So modifying one doesn't
affect the other. But note that copy only produces a shallow copy. As long as the values are
immutable, this doesn't matter. But if we have mutable values, then we have a problem:
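A sketch of the problem, assuming the same dictionary contents that appear in the deepcopy example of this section:

```python
dict_1 = {'one': [1], 'two': [1, 1], 'three': [1, 1, 1]}
dict_2 = dict_1.copy()        # shallow copy
dict_2['one'].append(100)     # modifies the list shared by both dictionaries
print(dict_1)
print(dict_2)
print(dict_1['one'] is dict_2['one'])
```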
Here, we see that the value corresponding to the key 'one' in both dictionaries gets affected. This is
because dict_1['one'] and dict_2['one'] are still the same object. This can be seen from the last
statement of the code given above. To set this right, we need to do a deepcopy:
from copy import deepcopy
dict_1 = {'one': [1], 'two': [1, 1], 'three': [1, 1, 1]}
dict_2 = deepcopy(dict_1)
dict_2['one'].append(100)
print(dict_1, dict_2)
print(dict_1 is not dict_2)
print(dict_1['one'] is not dict_2['one'])
Lesson-6.2
Text processing
Number of sentences
Number of words
Number of Unique Words
Frequent Words
Summary
Text processing
The following paragraph is an excerpt from a talk given by Guido. The full text can be found here.
In reality, programming languages are how programmers express and communicate ideas —
and the audience for those ideas is other programmers, not computers. The reason: the
computer can take care of itself, but programmers are always working with other programmers,
and poorly communicated ideas can cause expensive flops. In fact, ideas expressed in a
programming language also often reach the end users of the program — people who will never
read or even know about the program, but who nevertheless are affected by it.
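Text processing plays an important role in analyzing text data. Given a piece of text, the following are
some of the basic questions that we can ask:
How many sentences are there in the text?
How many words are there in the text?
How many unique words are there in the text?
Which words occur most frequently?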
Are these meaningful questions to ask? Do they lead us anywhere? Yes, they do! Consider the task of
classifying articles. Some sample categories could be: lifestyle, science and technology, literature,
films. If we want to understand what category an article falls under, one way to go about it is to read
the entire article. We can do it for one or two articles, but what if we have to do this for hundreds of
them? A better solution would be to computationally process each article, find the top five most
common words and use that to get an idea of what the text is about.
Let's get started. The first task is to store the text as a string:
text = ("In reality, programming languages are how programmers express and "
        "communicate ideas — and the audience for those ideas is other programmers, not "
        "computers. The reason: the computer can take care of itself, but programmers are "
        "always working with other programmers, and poorly communicated ideas can cause "
        "expensive flops. In fact, ideas expressed in a programming language also often "
        "reach the end users of the program — people who will never read or even know "
        "about the program, but who nevertheless are affected by it.")
Number of sentences
Sentences could end with one of the following tokens: full stop, exclamation mark or question mark.
For simplicity, let us assume that all sentences in our text end with a full stop. We can split the string
using full stop as a delimiter to get a list of sentences:
sentences = text.split('.')
# Prints one sentence in each line
for sentence in sentences:
    print(sentence)
print(f'There are {len(sentences)} sentences in this text.')
Notice that there are only three sentences, but we get the output to be four in the last line. On closer
inspection, we see that sentences[-1] is not a sentence but an empty string. This is because, when a
string is split using a delimiter which is present in the string, two substrings get generated, one to the
left of the delimiter and the other to its right. As the full stop is the last character in the text, the
substring to its right is an empty string. One way to correct this is to remove all empty strings in
sentences :
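A sketch of this clean-up step on a short stand-in text. The text and the name proc_sentences are assumptions made for illustration:

```python
text = 'First sentence. Second sentence. Third sentence.'
sentences = text.split('.')
# Keep only the non-empty strings
proc_sentences = []
for sentence in sentences:
    if sentence != '':
        proc_sentences.append(sentence)
print(f'There are {len(proc_sentences)} sentences in this text.')
```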
Number of words
To get the number of words, we can split each sentence by space:
words = [ ]
for sentence in sentences:
    words_ = sentence.split(' ') # words_ contains words in sentence
    words.extend(words_) # words is the collection of all words
print(f'There are {len(words)} words in this text')
We get the number of words to be 86. Is that correct? wordcounter.net claims that there are 82 words
in this text. Something is wrong with our code. Let us print each word along with its index in separate
lines and see what we have:
for index, word in enumerate(words):
    print(index, word)

11 —
23
49
67 —
Indices 11 and 67 are em dashes (—) while 23 and 49 correspond to empty strings. Since we have two
different characters to remove, let us clean up the list in the following way:
proc_words = [ ]
for word in words:
    if not(word == '' or word == '—'):
        proc_words.append(word)
print(f'There are {len(proc_words)} words in this text')
Number of Unique Words
uniq_words = dict()
for word in proc_words:
    if word not in uniq_words:
        uniq_words[word] = 0
    uniq_words[word] += 1
print(f'There are {len(uniq_words)} unique words in this text')
Let us now test if our code is working as expected. Upon manual inspection, the word "programmers"
occurs four times in the text. What does our dict have to say?
print(uniq_words['programmers'])
We get 2 as the output, another wrong answer! Programming doesn't seem like magic after all. We
are making mistakes far too often. Note that this is not the exception, but the norm. The nice part of
making mistakes is that they are almost always an opportunity to learn something. An error in the
code is hidden knowledge, it is some piece of insight that we are yet to unmask. Now, back to the
drawing board. Let us search for all entries in the list proc_words that have the substring
"programmers" in them:
for word in proc_words:
    if 'programmers' in word:
        print(word)

programmers
programmers,
programmers
programmers,
So, the problem is with the special character: comma. To confirm this:
Another problem is introduced by the capitalization of words, usually at the beginning of sentences.
Now that the problems have been identified, let us go ahead and fix them. This means going back to
the list of words and then generating proc_words in the right way:
1 proc_words = [ ]
2 for word_ in words:
3     word = word_.lower()
4     if not(word == '' or word == '—'):
5         if not word.isalnum():
6             word = word[:-1]
7         proc_words.append(word)
8 print(f'There are {len(proc_words)} words in this text')
Several things are happening here. In line-3, every word is converted to lower case. In line-4, em
dashes and empty strings are being ignored. Line-5 checks if a word contains a special character. If it
does, then it is unburdened of that dangling character in line-6. Here we assume that special
characters usually appear at the end of the word. In this text, there are two cases: "programmers,"
and "reason:". All processed words are finally added to proc_words in line-7. Now that we have
cleaned up proc_words , we can go back and generate uniq_words :
uniq_words = dict()
for word in proc_words:
    if word not in uniq_words:
        uniq_words[word] = 0
    uniq_words[word] += 1
print(f'There are {len(uniq_words)} unique words in this text')
Lovely! There are 58 unique words in the text. As a test, we can also see if the sum of the counts gives
back the total number of words:
total = 0
for word in uniq_words:
    total += uniq_words[word]
assert total == len(proc_words)
Frequent Words
Finally, let us calculate the top three most frequently occurring words:
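One way to do this is to sort the key-value pairs by their counts. The sketch below uses a small dictionary of made-up counts in place of uniq_words , so its output says nothing about the actual text:

```python
# Hypothetical counts, standing in for a word-frequency dictionary
counts = {'apple': 3, 'kiwi': 7, 'mango': 5, 'plum': 1}

# Sort (word, count) pairs by count in descending order
by_count = sorted(counts.items(), key=lambda item: item[1], reverse=True)
top_three = by_count[:3]
for word, count in top_three:
    print(word, count)
```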
We see that "programmers" is the second most frequent word. First and third most frequent words
are "the" and "in" respectively. Such common words are called stop-words. If they are removed from
the text, "programmers" becomes the most frequent non-trivial word. So, without reading this text,
one can guess that it should be something about programmers, thanks to Python!
Summary
The main takeaway from this lesson is the kind of mistakes we made and the way we fixed each one
of them. In almost every problem, we started off with a solution, then tested it. We figured out that
something was wrong, so we went back and tried to fix the problem.
Lesson-6.3
Dictionaries
Pangrams and Dictionaries
Dictionary Methods
Dictionaries
Pangrams and Dictionaries
Assume that we wish to compute the following mapping between letters of the English alphabet and
numbers from 1 to 26:
Letter Number
a 1
b 2
... ...
z 26
Each letter in the alphabet is mapped to a unique number from 1 to 26. In the table given above, the
mapping is a simple linear mapping: a is mapped to 1 , b to 2 and so on. This mapping can be
computed in the most uninteresting and lousy way given below:
Let us try a roundabout but interesting way. Consider the following line:
the quick brown fox jumps over the lazy dog
This sentence is called a pangram. A pangram is a sentence that uses all the letters of the alphabet.
Does that ring a bell?
1 pangram = 'the quick brown fox jumps over the lazy dog'
2 words = pangram.split(' ') # get list of words in the sentence
3 letters = ''.join(words) # join the words back; eliminates spaces
4 sorted_letters = sorted(letters) # sort letters
5 mapping, count = dict(), 0
6 for letter in sorted_letters:
7     # check if letter is not present in dict
8     # to avoid counting same letter multiple times
9     if letter not in mapping:
10         count += 1
11         mapping[letter] = count # map the letter to count
12
13 for letter, count in mapping.items():
14     print(letter, count)
Plenty of things to learn from those 14 lines of code. Not all diversions are bad. Now that we have an
interesting dictionary in place, let us jump into some methods that are bundled along with dict .
Dictionary Methods
We have already seen keys and items . Both these are methods that return a view object over which
we can iterate. According to the Python documentation, "a view object provides a dynamic view on
the dictionary's entries, which means that when the dictionary changes, the view reflects these
changes."
keys = mapping.keys()
print(keys)

dict_keys(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',
'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'])
Using the list function, both the keys and items views can be converted into lists:
keys_list = list(mapping.keys())
print(keys_list)
items_list = list(mapping.items())
print(items_list)

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p',
'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
[('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', 5), ('f', 6), ('g', 7), ('h', 8),
('i', 9), ('j', 10), ('k', 11), ('l', 12), ('m', 13), ('n', 14), ('o', 15),
('p', 16), ('q', 17), ('r', 18), ('s', 19), ('t', 20), ('u', 21), ('v', 22),
('w', 23), ('x', 24), ('y', 25), ('z', 26)]
keys_list is a list of keys in the dictionary mapping . items_list is a list of tuples, where each tuple
is a key-value pair. Another useful method is values . This returns a view on the values:
view = mapping.values()
view_list = list(view)

print('a' in mapping.keys())
print(1 in mapping.values())
print(('a', 1) in mapping.items())
All three return True . Membership tests for keys can be done in a simpler way:
print('a' in mapping)
print('x' in mapping)
print('ab' not in mapping)
Note that we dropped the keys method and it still worked! Now, to delete a key from a dictionary,
we use the familiar pop method:
If key is a key in a dictionary D , D.pop(key) removes the key key in D and returns the value
associated with it. Removing a key naturally removes the value associated with it. Dictionaries are
aristocratic data structures: keys are higher up in the hierarchy and values depend on the keys for
their existence.
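A sketch of pop in action, using a three-letter version of mapping for brevity:

```python
mapping = {'a': 1, 'b': 2, 'c': 3}
# Remove the key 'a' and get back the value that was mapped to it
value = mapping.pop('a')
print(value)
print(mapping)
print('a' in mapping)
```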
Lesson-6.4
Dictionaries in Action: LMS
Assignment Model
Submission Model
Grader
As a user, you communicate with the frontend. The frontend is the website where you see all the
content displayed. When you make an action, say clicking the submit button in a graded
assignment, that action is fed to the backend as input. The backend processes this input and
returns some output to the frontend, which is then displayed as the outcome of your action.
Where does Python come into the picture? It features prominently in the backend.
So how do we expect grading to work? It needs two inputs. The assignment and the submission
corresponding to this assignment. It will return the result as output:
The function is incomplete. We need to decide how an assignment and its corresponding
submission are going to be modeled.
Assignment Model
Let us consider an assignment. It is essentially a list of problems. So, modeling an assignment
breaks down to modeling a problem. A problem could have the following attributes:
Attribute Type
id string
question string
type string
options list
answers tuple
marks float
For grading, we only need two attributes, the problem-id and the answers. With this, the
assignment model will look like the following. The entire assignment will now be a list of
dictionaries:
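A sketch of this model, assuming the same problem ids and answers that appear later in the grader section:

```python
# Each problem is a dictionary with just the attributes needed for grading
assignment = [
    {'id': '10001', 'answers': (0, 1)},
    {'id': '10002', 'answers': (1, )},
    {'id': '10003', 'answers': (2, )}
]
print(len(assignment))
```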
A point to note. A singleton tuple is represented as (<item>, ) . The comma cannot be ignored.
Coming back to the assignment model, we see that there are several attributes in the table that
haven't entered into the assignment dictionary since they are not relevant from the point of view
of grading. They have been mentioned so that it gives a better understanding of how assignments
can be modeled.
Submission Model
The submission model is slightly more involved. There are some global attributes like name of the
user, the user's roll number and the time of submission. And then there are local attributes like
the options selected for each problem.
Attribute Type
name string
roll_number string
timestamp string
problems list
submission = {
    'name': 'Kapil Dev',
    'roll_number': 'BSC1001',
    'time': 'Sunday 18 April 2021 10:23:30 PM IST',
    'problems': [
        {'id': '10001', 'selected': (0, 1)},
        {'id': '10002', 'selected': (1, )},
        {'id': '10003', 'selected': (3, )}
    ]
}
submission is a fairly complicated object. To begin with, it is a dictionary. The first three keys do
not pose any challenges. The value of the key 'problems' is a list of dictionaries! We could add
one more level of complexity. Since a user could make multiple submissions, we could have a list
of submissions! But for now, let us not complicate things any further.
Grader
The assignment is a list of dictionaries. While this is not a bad representation, the grader has to
search for the problem id through this list every time it has to grade a problem. Since the problem
id is unique, we can come up with a better representation for the assignment:
assignment_ = [
    {'id': '10001', 'answers': (0, 1), 'marks': 2.0},
    {'id': '10002', 'answers': (1, ), 'marks': 1.0 },
    {'id': '10003', 'answers': (2, ), 'marks': 2.0}
]
assignment = dict()
for problem in assignment_:
    problem_id = problem['id']
    answers = problem['answers']
    marks = problem['marks']
    assignment[problem_id] = {'answers': answers, 'marks': marks}
assignment = {
    '10001': {
        'answers': (0, 1),
        'marks': 2.0
    },
    '10002': {
        'answers': (1, ),
        'marks': 1.0
    },
    '10003': {
        'answers': (2, ),
        'marks': 2.0
    },
}
We are now ready to complete the grader using this new assignment model:
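One possible sketch of such a grader is given below. The function name grade and the scoring rule (full marks only when the selected options exactly match the answers, zero otherwise) are assumptions and not necessarily the rule used by the actual grader:

```python
def grade(assignment, submission):
    """Return the total marks scored in a submission."""
    total = 0.0
    for problem in submission['problems']:
        problem_id = problem['id']
        # Look up the problem directly by its id
        solution = assignment[problem_id]
        if problem['selected'] == solution['answers']:
            total += solution['marks']
    return total

assignment = {
    '10001': {'answers': (0, 1), 'marks': 2.0},
    '10002': {'answers': (1, ), 'marks': 1.0},
    '10003': {'answers': (2, ), 'marks': 2.0},
}
submission = {
    'name': 'Kapil Dev',
    'roll_number': 'BSC1001',
    'problems': [
        {'id': '10001', 'selected': (0, 1)},
        {'id': '10002', 'selected': (1, )},
        {'id': '10003', 'selected': (3, )},
    ],
}
result = grade(assignment, submission)
print(result)
```

Since the assignment is keyed by problem id, the grader reaches each problem with a single look-up instead of searching through a list.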
Lesson-6.5
Sets
Introduction
Iterating through Sets
Growing Sets
Set Operations
Other Set Methods
Mutability
Sets
Introduction
A set is an unordered collection with no duplicate elements [refer]. Unlike lists and tuples, there is
no notion of order in a set. This is why it is called an unordered collection as opposed to a
sequence. A set can be defined as follows:
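The snippet that belongs here is missing from this text. A minimal illustration, with a variable name and values chosen only for this sketch:

```python
# A set literal: comma-separated values inside curly braces.
nums = {1, 2, 3, 4, 5}
print(type(nums))
print(nums)
```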
Notice the similarity in syntax between sets and dictionaries. Both are enclosed within curly
braces. While a dictionary has key-value pairs in it, a set just has a collection of values. A set in
Python is a remarkably accurate representation of a mathematical set. Therefore, most of the
properties that you are used to seeing in mathematical sets nicely carry over to Python sets. This
connection is so strong that you can often forget that you are dealing with Python sets.
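The snippet defining nums_1 and nums_2 is missing from this text. A plausible reconstruction, with values assumed for illustration, shows how duplicates are discarded:

```python
nums_1 = {1, 2, 3}
nums_2 = {1, 1, 2, 2, 3, 3}   # duplicates are dropped when the set is built
print(nums_1 == nums_2)       # equality compares the elements
print(nums_1 is nums_2)       # identity compares the objects
```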
As stated before, sets do not support duplicate elements. We see that nums_1 and nums_2 are
equal sets. However, they don't point to the same object. Sets support membership just like lists,
tuples and dictionaries.
nums = {1, 2, 3, 4, 5}
print(1 in nums)
print(6 not in nums)
The number of elements in a set, which is the same as its cardinality, is given by the len function:
nums = {1, 2, 3, 4, 5}
print(f'Cardinality of nums is {len(nums)}')
Sets cannot be indexed. This is quite reasonable as they are not ordered collections. The following
code will throw an error:
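The failing snippet is missing from this text. A small sketch of what such an attempt looks like; the try-except wrapper is added here only so that the example runs to completion and shows the message:

```python
nums = {1, 2, 3, 4, 5}
try:
    print(nums[0])            # sets have no positions, so this fails
except TypeError as error:
    message = str(error)
    print(message)
```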
Any hashable object can be added to sets. This means most of the immutable types such as int ,
float , str and tuple can be added to sets. A small caveat as far as tuples are concerned: a
tuple of lists is unhashable and therefore cannot be added to sets.
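A short sketch of this rule, using values chosen only for illustration:

```python
items = set()
items.add(10)
items.add(2.5)
items.add('python')
items.add((1, 2))              # a tuple of immutable values is hashable
try:
    items.add(([1, 2], 3))     # a tuple that contains a list is not
except TypeError as error:
    message = str(error)
    print(message)
print(len(items))
```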
Iterating through Sets
We can iterate through the elements of a set using a for loop:
nums = {1, 2, 3, 4, 5}
for num in nums:
    print(num)
Growing Sets
How do we define an empty set?
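The snippet for this question is missing from this text. A natural first guess, by analogy with non-empty sets, is a pair of empty braces; the check below is a reconstruction:

```python
empty_set = {}
print(type(empty_set))
print(isinstance(empty_set, dict))
```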
We see that empty_set is in fact an empty dictionary. Computers are precise machines, which
makes them very faithful. A few lessons back we used { } to initialize an empty dictionary. It
hasn't changed. { } is still an empty dictionary. So, how do we define an empty set then?
empty_set = set()
print(isinstance(empty_set, set))
Simple enough! With the empty set and set-iteration defined, we can now grow sets from scratch.
Consider the first 100 powers of 7: $7^1, 7^2, \ldots, 7^{100}$. Note down the last digit of
each of these powers. How many of them are unique? What are these numbers?
This problem has a simple mathematical solution. But humor me and assume that you don't
know how to solve this problem. Let us go for a computational solution.
num = 1
digits = set()
for i in range(100):
    num *= 7
    last = num % 10
    digits.add(last)
print(digits)
add is a method used to add elements to a set. The solution to this problem is a typical use case
of sets. When you expect duplicate elements to come up often and if you are not concerned with
duplicates, then sets are ideal objects for storage. The same problem can be solved using lists:
num = 1
digits = [ ]
for i in range(100):
    num *= 7
    last = num % 10
    if last not in digits:
        digits.append(last)
print(digits)
Set Operations
Mathematical sets are friendly objects. They routinely interact with each other through one of the
following operations:
Subset
Superset
Union
Intersection
Difference
Python sets strive to be as friendly as their mathematical counterparts. We will see how each of
these operations is represented:
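Subset: a set $A$ is a subset of $B$ if every element of $A$ is also present in $B$. The snippet for this case is missing from this text; the following is a minimal reconstruction with illustrative values:

```python
A = {1, 2, 3}
B = {1, 2, 3, 4, 5}
print(A.issubset(B))   # method-1
print(A <= B)          # method-2
```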
Both lines return the value True . A set $A$ is a proper subset of $B$ if every element in $A$ is
present in $B$ and $A \neq B$. It is denoted by $A \subset B$. That is, there is at least one
element in $B$ which is not in $A$:
A = {1, 2, 3}
B = {1, 2, 3}
print(A <= B)  # subset
print(A < B)   # proper subset
The A < B operator checks if A is a proper subset of B . In this case A is not a proper subset of
B , so the second print statement prints False .
Superset: A set $B$ is a superset of $A$ if $A$ is a subset of $B$.
A = {1, 3, 5}
B = {1, 2, 3, 4, 5}
print(B.issuperset(A))  # method-1
print(B >= A)           # method-2
Union: The union of two sets $A$ and $B$ is the set of elements that are present in either $A$ or
$B$ or both. It is denoted by $A \cup B$.
A = {1, 3, 5}
B = {2, 4, 6}
C1 = A.union(B)  # method-1
C2 = A | B       # method-2
print(C1, C2)
print(C1 == C2)
A1, A2, A3, A4 = {1}, {2, 3}, {4, 5, 6}, {7, 8, 9, 10}
B1 = A1.union(A2, A3, A4)  # method-1
B2 = A1 | A2 | A3 | A4     # method-2
print(B1, B2)
print(B1 == B2)
Intersection: The intersection of two sets $A$ and $B$ is the set of elements common to both. It
is denoted by $A \cap B$.
A = {2, 4, 6}
B = {2, 4}
C1 = A.intersection(B)  # method-1
C2 = A & B              # method-2
print(C1, C2)
print(C1 == C2)
What happens if there are no elements in common? We should get the empty set:
even, odd = {2, 4, 6}, {1, 3, 5}
common = even & odd
assert common == set()
We have used an assert statement just to introduce some variation. As it doesn't raise an
AssertionError , we are right on target.
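Difference: The difference between two sets $A$ and $B$ is the set of elements present in one
set but not in the other. It is denoted by $A - B$ or $B - A$, and the two are not the same!
$A - B$ has the elements of $A$ that are not in $B$, while $B - A$ has the elements of $B$ that
are not in $A$.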
A = {1, 2, 3, 4}
B = {2, 4, 5}
C1 = A.difference(B)  # method-1
C2 = A - B            # method-2
print(C1, C2)
print(C1 == C2)
D1 = B.difference(A)  # method-1
D2 = B - A            # method-2
print(D1, D2)
print(D1 == D2)
To remove an element from the set, we can use the remove method:
If we try to remove an element that is not present in the set, the interpreter will throw a
KeyError :
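The two snippets for remove are missing from this text. A compact sketch with illustrative values covers both cases. The call to discard at the end is an addition not taken from the text above: it is a built-in set method that removes an element if present and does nothing otherwise.

```python
nums = {1, 2, 3}
nums.remove(2)                 # 2 is present, so it is removed
print(nums)
try:
    nums.remove(10)            # 10 is absent, so this raises KeyError
except KeyError as error:
    missing = error.args[0]
    print('KeyError:', missing)
nums.discard(10)               # no error even though 10 is absent
print(nums)
```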
Given a list L , extract all unique elements from it and store the result in another list,
L_uniq . The order of elements does not matter.
L = [1, 2, 3, 3, 4, 5, 6, 1, 2, 2]
S = set(L)
L_uniq = list(S)
print(L_uniq)
Passing a list to the set function removes all duplicates and returns the unique elements.
Mutability
Sets are mutable entities.
A = {1, 2, 3}
B = A
B.add(4)
print(A, B)
print(A is B)
A and B refer to the same object. As before, there are two ways to do a shallow copy:
A = {1, 2, 3}
B1 = A.copy()
B2 = set(A)
B1.add(4)
B2.add(0)
print(A, B1, B2)
print(A is not B1)
print(A is not B2)
File Handling
Why files
File handling
Why files
The best way to motivate files is to take the human example. Consider our memory. There is a
certain volume of information that we can retain in our working memory. A popular claim is that
we can retain around seven chunks of information in our short-term memory. Anything that
exceeds this volume of information, we have to resort to external aids such as notebooks.
Something similar happens in computers. Modern day computers are quite powerful and can
retain several chunks of information at a time. Though computers are machines, the amount of
short-term memory that they possess is still finite. This is where the idea of external storage
comes in. Files are to computers what books are to humans. A file is used to record information in
a permanent location so that it can be retrieved as and when needed.
File handling
We are all used to opening files in our computers by simply double clicking on an icon. Let us take
the example of a simple file having the following contents:
1 Income Expenditure
2 12,000 10,000
3 50,000 45,000
4 75,000 35,000
5 14,000 12,000
6 60,000 40,000
This file has the income-expenditure details of a family for five months. We wish to create a new
file that has the savings details added as a third column. That is, we wish to generate the following
file:
1 Income Expenditure Savings
2 12,000 10,000 2,000
3 50,000 45,000 5,000
4 75,000 35,000 40,000
5 14,000 12,000 2,000
6 60,000 40,000 20,000
This seems like a simple task. Open this file, plug the numbers in the calculator, get the result and
paste it in a new column and we are done. But what if the number of entries in the file increases?
For example, let us say we wish to perform this operation for all families in the neighborhood. If
we have 10 years' worth of data for 1000 families, we are looking at 1000 * 10 * 12 = 120,000
entries! Our calculator will break down and so will we, out of exhaustion.
This is where Python comes to our rescue. We can write a piece of code to automate the whole
process. And all it is going to take is a few lines of code! In the next few lessons, we will see how to
process files. We will learn the following operations:
File Handling
Creating a file in Replit
Opening and reading from a file
Opening and writing to a file
Each file should be given a name. Let us call our file examples.txt . Now, we shall add the following
lines to the file:
1 one
2 two
3 three
4 four
5 five
After creating the file, this is how it should look in Replit when we click on examples.txt :
examples.txt is called a text file. We can identify this from the extension — txt that comes at the
end of files. Don't worry too much about the extension. It is enough if you know that different files
come with different extensions. In fact, main.py is itself a file with py as the extension. This is why it
gets listed along with examples.txt under the Files tab in Replit.
1 f = open('examples.txt', 'r')
2 for line in f:
3     print(line)
4 f.close()
The first argument is the file name, which is 'examples.txt' in our case. The second argument
corresponds to the mode in which we want to process the file. In this case, we want to read the file.
So, we open the file in read-mode. The single character 'r' is used to denote this mode. Notice that
both arguments passed to open are strings.
The open function returns a file object. Do not worry about the terminology as yet. We will discuss it
in detail in the next lesson. For now, it is enough to know that the open function returns a file object
that we have called f in our code.
In lines 2-3, we loop through each line in the file and print it. As simple as that. Finally, in line-4, we
close the file using the method close . It is a good practice to close the file once we are done with
processing it. Let us now see the output at the end of execution of this code block:
1 one
2
3 two
4
5 three
6
7 four
8
9 five
Seems interesting! We have all the contents of the file. But, for whatever reason, there is an extra line
appearing between successive lines in the file. To suppress these new lines, we have to modify our
print function slightly:
1 f = open('examples.txt', 'r')
2 for line in f:
3     print(line, end = '') # there is NO SPACE between the quotes
4 f.close()
Note the change in line-3. By default, print appends a newline character ( \n ) at the end of
whatever it is printing. By using end = '' , we are just appending the empty string. Therefore, the
extra line that was appearing in the output will no longer bother us when we execute the code we
have just written:
1 one
2 two
3 three
4 four
5 five
f = open('writing.txt', 'w')
f.write('one ')
f.write('two ')
f.write('three ')
f.write('four ')
f.write('five')
f.close()
Here, we have opened the file in write mode. When this code is executed, it creates a file in Replit
called writing.txt .
We have used what is called the write method to write to the file. We pass the content we wish to
write as a string argument to the method. Notice that, even though we have used the write method
to write five different words on five lines in the code, all of them get written to the same line in the
file. The way to tell the file object to go to a new line is using the \n character. Let us now try the
following piece of code:
f = open('writing.txt', 'w')
f.write('one')
f.write('\n')
f.write('two')
f.write('\n')
f.write('three')
f.write('\n')
f.write('four')
f.write('\n')
f.write('five')
f.close()
A better way of achieving this in fewer lines of code is to append the \n character to every line of the
file we wish to write:
f = open('writing.txt', 'w')
f.write('one\n')
f.write('two\n')
f.write('three\n')
f.write('four\n')
f.write('five')
f.close()
This results in the same file but with fewer lines of code! In the next lesson, we will take a closer look
at the idea of a file object.
File Handling
File Object
Analogy
Mode
File Object
As mentioned earlier, the open function returns a file object. The following image gives a better
picture of the whole setup.
Analogy
You are the CEO of a tech company. Even though you are good at multi-tasking, there are simply
too many things for you to keep track of. To help you manage the mounting load of activities, you
hire a personal assistant (PA). Think about the kind of work you generally assign to a PA. Let us
say that you are meeting delegates from another company at 5:00 PM next Tuesday. The typical
instruction to your PA would be this: "make a note of this meeting". Your PA would dutifully
record this information in a file.
A few days later, you might be suddenly reminded of this important meeting. At this point, this
would be your instruction: "fetch me the details of the meeting with those delegates". In both
cases, notice that it is your PA who is interacting with a file. In the first instruction, your PA noted
down the details of a meeting in a file. In the second instruction, your PA retrieved the
information from the file.
The file object is your PA who mediates between you, the coder, and the file that resides on the
hard disk of your computer. You pass an instruction to your file object, which does the job of
reading and writing to a file. All communication between you and the file is routed through the file
object.
Mode
Read mode
The dotted line in the image given below corresponds to the mode in which you wish to process
the file. This instruction always originates from you and is directed at the file object. When you are
reading from a file, information flows from the file, through the file object and reaches you. This
is represented by the solid arrow. To read from a file, we open it in the read mode:
f = open('<file_name>', 'r')
...
f.close()
Write mode
When you are writing to a file, information flows from you, through the file object and to the file.
To write to a file, we open it in the write mode:
f = open('<file_name>', 'w')
...
f.close()
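Putting the two modes together, here is a small round trip that first writes a note and then reads it back. This is only a sketch: the file name notes.txt and the use of a temporary directory are assumptions made so that the example can run anywhere without touching existing files.

```python
import os
import tempfile

# Hypothetical file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'notes.txt')

# Write mode: information flows from us, through the file object, to the file.
f = open(path, 'w')
f.write('meeting at 5:00 PM\n')
f.close()

# Read mode: information flows from the file, through the file object, to us.
f = open(path, 'r')
content = f.read()
f.close()
print(content, end='')
```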
In the next lesson, we will see some more aspects of file handling.
File Handling
File methods
read
readline
readlines
write
writelines
File methods
read
Let us continue working with examples.txt that we created in the previous lesson. If you recall,
examples.txt has the following contents:
1 one
2 two
3 three
4 four
5 five
Let us now look at a different way of reading from a file, using the read method.
f = open('examples.txt', 'r')
content = f.read()
print(content)
f.close()
1 one
2 two
3 three
4 four
5 five
read is a method defined for the file object. When it is called without any argument, it returns a
string that contains the entire content of the file. If you head to the console (it is to the right of the
editor in Replit) and type the variable name content , this is what you get:
1 'one\ntwo\nthree\nfour\nfive'
Notice that content is a single string. It contains the contents of the file, but between consecutive
lines in the file, there is a \n or a newline character:
Except for the last line, every line in the file ends with a \n character. When this string is printed
to the console — print(content) — we get five separate lines even though we are only passing
a single string to the print function. This is because of the presence of the newline character in
the string. Whenever a newline character is encountered in the text being printed, the output
moves to the next line.
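The difference between the string itself and its printed form can be checked without a file. In this sketch the string is typed in directly rather than read from examples.txt:

```python
content = 'one\ntwo\nthree\nfour\nfive'
print(repr(content))     # the single string, with \n shown explicitly
print(content)           # the same string, printed over several lines
lines_shown = content.count('\n') + 1
print(lines_shown)
```

Four newline characters separate five lines, which is why a single string shows up as five lines in the console.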
Now, it is clear why the following piece of code printed an extra line between consecutive lines in
the file:
f = open('examples.txt', 'r')
for line in f:
    print(line)
    # line ends with a \n character for all lines except the last one
    # this is why we get an empty line between consecutive lines in the console
f.close()
readline
As its name suggests, the readline method reads from the file one line at a time:
f = open('examples.txt', 'r')
line1 = f.readline()
line2 = f.readline()
line3 = f.readline()
line4 = f.readline()
line5 = f.readline()
f.close()
The variables line1 , line2 , …, line5 will hold the following values at the end of execution of
the code given above:
Variable Value
line1 'one\n'
line2 'two\n'
line3 'three\n'
line4 'four\n'
line5 'five'
Notice that line5 doesn't have a \n at the end as it is the last line in the file. Here, we know that
there are five lines in the file. This helped us define five separate variables. But what if there are
more lines? Generally, we read a file so as to see what its contents are because we don't know
what is there in it. Clearly, we need a way to figure out when the file ends.
Now, consider the following code. What happens if we try to read the file using readline after all
the lines in the file have been read?
1 f = open('examples.txt', 'r')
2 line1 = f.readline()
3 line2 = f.readline()
4 line3 = f.readline()
5 line4 = f.readline()
6 line5 = f.readline()
7 line = f.readline()
8 f.close()
If we execute this and head to the console, we see that the variable line defined in line-7 is an
empty string! This gives us a way to determine when we have reached the end of a file (a blank
line inside the file is read as '\n' , not as an empty string):
Keep reading lines from the file until an empty string is encountered.
f = open('examples.txt', 'r')
line = f.readline()
while line != '':
    print(line, end = '')
    line = f.readline()
f.close()
Here, we have managed to read the file using just one string variable. Let us make a few more
changes to this code:
1 f = open('examples.txt', 'r')
2 line = f.readline()
3 while line:
4     print(line.strip())
5     line = f.readline()
6 f.close()
In this code, we have made two changes. One in line-3 and another in line-4. The loop condition in
line-3 checks for the empty string. If line is an empty string, it evaluates to False and the loop
will be terminated. This is a compact way of writing line != '' . Python treats empty sequences
as False . If this is confusing, execute the following code and check the output:
line = ''
if not line:
    print('It works!')
In line-4, we are using the strip method to strip the string line of all the whitespace characters
at the beginning and at the end. In this way, the trailing newline at the end of line will be
stripped. This way, we don't need to use the end argument.
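One caveat is worth a quick check: strip removes whitespace from both ends, not just the trailing newline. The sketch below uses a made-up line with leading spaces. The alternative rstrip('\n') is a standard string method, not one used in the text above, and removes only trailing newline characters.

```python
line = '  indented text \n'
stripped = line.strip()          # removes spaces and the newline at both ends
trimmed = line.rstrip('\n')      # removes only the trailing newline
print(repr(stripped))
print(repr(trimmed))
```

For a file like examples.txt the two behave the same, since its lines have no leading or trailing spaces.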
readlines
Finally, Python also provides a way to read the file and store it as a list of lines:
f = open('examples.txt', 'r')
lines = f.readlines()
for line in lines:
    print(line.strip())
f.close()
Here, lines is a list of lines. Notice that each element in lines corresponds to one line in the
file. It is always a string:
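The listing of lines is missing from this text. The sketch below recreates examples.txt inside a temporary directory (an assumption, made so that the example is self-contained) and prints the list returned by readlines:

```python
import os
import tempfile

# Recreate the five-line file in a temporary location.
path = os.path.join(tempfile.mkdtemp(), 'examples.txt')
f = open(path, 'w')
f.write('one\ntwo\nthree\nfour\nfive')
f.close()

f = open(path, 'r')
lines = f.readlines()
f.close()
print(lines)
```

Every element except the last keeps its trailing newline, which is why the loop above calls strip before printing.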
write
We already saw the write method earlier. There, we used the write method five times to write
five lines. Let us now use a loop with the help of the lines list. First, we run the code:
1 f = open('writing.txt', 'w')
2 lines = ['one', 'two', 'three', 'four', 'five']
3 for line in lines:
4     f.write(line + '\n')
5 f.close()
We see that there are six lines in the file and not five, though we seem to have written only five
lines. The problem is with line-4, where we are adding \n after every string in the list lines . We
should make sure that we don't add a \n after the last string in the list:
f = open('writing.txt', 'w')
lines = ['one', 'two', 'three', 'four', 'five']
for i in range(len(lines)):
    line = lines[i]
    if i != len(lines) - 1:
        f.write(line + '\n')
    else:
        f.write(line)
f.close()
Now, check the file, you will see that it has exactly five lines! Let us now try to write an integer to
the file:
f = open('writing.txt', 'w')
f.write(1)
f.close()
This raises a TypeError : the write method accepts only string arguments. If we want to write
integers to a file, we have to first convert them to strings:
f = open('writing.txt', 'w')
f.write(str(1))
f.close()
As an exercise, try to run the following code. What do you observe? Why do you think this
happens?
f = open('writing.txt', 'w')
f.writeline(str(1))
f.close()
writelines
We can write a list of lines to a file using the writelines method:
f = open('writing.txt', 'w')
lines = ['1\n', '2\n', '3\n', '4\n', '5']
f.writelines(lines)
f.close()
Note that the argument passed to the writelines method is a list of strings. This will create a file
having the following contents:
1 1
2 2
3 3
4 4
5 5
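Note that writelines does not add newlines on its own; the \n characters in the list above do that job. When the strings in a list carry no newlines, one possible approach is to join them with '\n' and write the result in a single call, which also avoids a trailing newline after the last line. This is a sketch using a temporary file location as an assumption:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'writing.txt')
words = ['one', 'two', 'three', 'four', 'five']

f = open(path, 'w')
f.write('\n'.join(words))   # '\n' goes between the words, not after the last
f.close()

f = open(path, 'r')
content = f.read()
f.close()
print(content)
```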
File Handling
Reading CSV files
CSV files
Reading a CSV file
Files to Collections
1 col0,col1,col2,col3
2 row1,item11,item12,item13
3 row2,item21,item22,item23
4 row3,item31,item32,item33
5 row4,item41,item42,item43
6 row5,item51,item52,item53
A CSV file is one where adjacent values in each line are separated by a comma. Such files are a
good choice for representing tabular data. For the rest of this lesson, we will assume that CSV files
are used to represent some such tabular data. The first line in the file is called the header. The
header gives information about the fields or columns in the data. The rest of the lines can be
treated as rows in the data. If this file is represented as a table, it would look like this:
Reading a CSV file
Let us create a CSV file in Replit and name it table.csv :
Opening and reading a CSV file is no different from opening a text file. Let us try to print the lines
in the file:
f = open('table.csv', 'r')
for line in f:
    print(line.strip())
f.close()
1 Name,Physics,Mathematics,Chemistry
2 Newton,100,98,90
3 Einstein,100,85,88
4 Ramanujan,70,100,70
5 Gauss,100,100,70
So far so good. Now that we are able to extract the lines from the file, let us start asking some
questions.
Print the chemistry marks scored by the students, one in each line.
This requires us to extract the last column from the file. How do we do this? Consider any one line
in the file, say the second one:
# The `\n` at the end will be present for all lines except the last one
line = 'Newton,100,98,90\n'
line = line.strip() # removes the \n character
This is a string that corresponds to one row in the file. If we need to separate it into different
columns, we need to use the split method and split the line based on a comma:
line = 'Newton,100,98,90\n'
line = line.strip()
columns = line.split(',')
print(columns)
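The output of this snippet is missing from this text. The check below confirms what split returns: a list of strings, with even the marks still stored as strings. That is why the last column has to be passed through int before it can be treated as a number.

```python
line = 'Newton,100,98,90\n'
columns = line.strip().split(',')
print(columns)
print(type(columns[-1]))    # the marks are strings at this point
```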
line = 'Newton,100,98,90'
line = line.strip()
columns = line.split(',')
chem_marks = int(columns[-1])
print(chem_marks)
That is all! We have done this for one row. We need to do this for all the rows. Enter loop:
f = open('table.csv', 'r')
for line in f:
    line = line.strip()
    columns = line.split(',')
    chem_marks = int(columns[-1])
    print(chem_marks)
f.close()
Running this code raises a ValueError . Can you see why? We have tried to convert the last column of the header into an integer as well.
The moral of the story is, when reading CSV files, we need to find a way to deal with the header.
Let us modify our code towards that end:
1 f = open('table.csv', 'r')
2 header = f.readline()
3 # The file object has finished reading the first line
4 # It is now ready to read from the second line onwards
5 for line in f:
6     line = line.strip()
7     columns = line.split(',')
8     chem_marks = int(columns[-1])
9     print(chem_marks)
10 f.close()
This works! In the second line, we read the header. Now, when the for loop starts in line-5, we are
ready to read from the second line in the file. If this seems confusing, consider the following
approach that uses the readlines method alone:
f = open('table.csv', 'r')
lines = f.readlines()
# lines[1: ] is the rest of the list
# after ignoring the header
for line in lines[1: ]:
    line = line.strip()             # strip the line of \n
    columns = line.split(',')       # split based on comma
    chem_marks = int(columns[-1])   # convert last column to int
    print(chem_marks)
f.close()
readlines is a reasonable choice for reading small files, say under 1000 lines. We get all the lines
of the file in a list. Reading a file reduces to processing a list of strings. If lines is the list of lines,
then lines[i] corresponds to the $(i + 1)^{\text{th}}$ line in the file. Going the other way, the
$i^{\text{th}}$ line in the file corresponds to the string lines[i - 1] .
IMPORTANT NOTE
However, when it comes to large files, readline is the best method to use. Processing large files
is best done by reading them one line at a time. Using readlines for large files is a dangerous idea.
This is because readlines dumps the entire content of the file into a list of strings. When the file
is large, this list will occupy a huge amount of memory. Let us try to write the same program given above using
the readline method:
f = open('table.csv', 'r')
header = f.readline().strip()       # this is for the header
line = f.readline()                 # second line; actual rows begin
while line:
    line = line.strip()             # strip the line of \n
    columns = line.split(',')       # split based on comma
    chem_marks = int(columns[-1])   # convert last column to int
    print(chem_marks)
    line = f.readline()             # read the next line in the file
f.close()
Files to Collections
It is often useful to convert a CSV file and store it in a suitable collection. We could do this in several
ways. Here, let us try to create the following list of dictionaries from the file:
data = [
    {'Name': 'Newton', 'Physics': 100, 'Mathematics': 98, 'Chemistry': 90},
    {'Name': 'Einstein', 'Physics': 100, 'Mathematics': 85, 'Chemistry': 88},
    {'Name': 'Ramanujan', 'Physics': 70, 'Mathematics': 100, 'Chemistry': 70},
    {'Name': 'Gauss', 'Physics': 100, 'Mathematics': 100, 'Chemistry': 70}]
This is a list of dictionaries. Each element in the list corresponds to one row in the file.
The elements in the header appear as keys in every dictionary.
The values of the dictionary are of different data types. Names are strings, marks are
integers.
This is going to be a fairly long code. Let us break it down. First, some basic processing to get the
list of lines from the file after stripping them of the trailing newlines:
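The code for this section is missing from this text. The following is one possible sketch of the whole conversion, not necessarily the approach the lesson takes. It recreates table.csv in a temporary directory so that it is self-contained, and it assumes that Name is the only column that is not an integer.

```python
import os
import tempfile

# Recreate table.csv in a temporary location (an assumption for this sketch).
path = os.path.join(tempfile.mkdtemp(), 'table.csv')
f = open(path, 'w')
f.write('Name,Physics,Mathematics,Chemistry\n'
        'Newton,100,98,90\n'
        'Einstein,100,85,88\n'
        'Ramanujan,70,100,70\n'
        'Gauss,100,100,70')
f.close()

# Step 1: get the list of lines, stripped of trailing newlines.
f = open(path, 'r')
lines = [line.strip() for line in f.readlines()]
f.close()

# Step 2: the first line is the header; its entries become the keys.
header = lines[0].split(',')

# Step 3: turn every remaining line into a dictionary.
data = []
for line in lines[1:]:
    columns = line.split(',')
    row = {}
    for key, value in zip(header, columns):
        # Assumption: every column other than 'Name' holds an integer.
        row[key] = value if key == 'Name' else int(value)
    data.append(row)

print(data[0])
```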
On the Way
1 print('Dear Learner!')
2 print('Lessons are on the way!')