Python For Scientists: Sejin Kim
Python for Scientists
Introduction to Python for Scientific Computing
Zeroth Edition
Sejin Kim
Kenyon College, Gambier, Ohio
Kenyon College
Department of Mathematics & Statistics - Scientific Computing
Gambier, Ohio
Credits and acknowledgements in this textbook appear on the appropriate
page within the text.
Optional Sections
Some sections are marked as optional. For students who might learn a second
programming language soon, these sections are highly recommended, since they
introduce important concepts that are crucial to many other languages, even if
they are not a part of Python.
Accessibility to Students
It is not enough for a book to simply present the right topics in the right order.
It is not even enough for it to be clear and correct when read by an instructor
or some other experienced programmer, like a tutor. It’s important that the
material is presented in a way that is accessible to beginning students. Subsequent
versions of this textbook may further improve readability, so the authors and
publishers encourage students and instructors alike to submit suggestions and
recommendations for improving the readability of the content for future editions.
Edition Compatibility
All efforts will be made to make different editions compatible. Future editions
will use version numbers with two or three parts. For example, a version of the
first edition may be numbered 1.3.1. Every version of an edition covers the same
material, so a student could read any version of the first edition. That is,
versions 1.0.0, 1.1.0, and 1.1.1 should all be intercompatible by material. Page
numbers may differ slightly. We encourage instructors to reference material by
chapter number rather than page number, especially because page numbers do not
line up between the instructor’s edition and the student edition.
Printing
When printing this book, we highly recommend printing in color and at 100%
scaling to ensure readability of code. Code snippets in this book are color-coded
with syntax highlighting, a common feature found in many IDEs, including Visual
Studio Code and Atom. At the very least, the colors will help students read and
distinguish different elements of code. The colors in your IDE might differ from
the ones in this book, but each color scheme is consistent. For example, keywords
in this book are always written in purple, operators in orange, and comments in
forest green. Your IDE might show keywords in light blue, operators in black, and
comments in grey, but it will use those colors consistently.
Support Material
There is support material available to all users of this book and additional
material available only to qualified instructors. All material is provided free of
charge and under the MIT License.
You can find the support material at https://pythonforscientists.github.io.
Contributing
I believe that the greatest strength of an open-source textbook is the very
fact that anyone can edit and improve it for the greater good. We encourage
students and instructors alike to submit their suggested changes and recom-
mendations for future editions of this book. This book will be published in an
open repository on GitHub, and we invite changes in the following ways.
Open an issue and describe what you think should be changed
Contact the authors directly with what you think should be changed
License
The material in this book is licensed under the MIT license. A copy of the
license is provided below.
Acknowledgements
Many people have assisted me by providing their suggestions, discussions, and
other help in preparing this textbook. Much of the zeroth edition of this book
was written while I was working with the wonderful people
at the Kenyon College Department of Mathematics and Statistics and in the
Integrated Program in Humane Studies.
Table of Contents
1 Introduction
1.1 Reading This Book
2 Development Basics
2.1 Introduction to Python
2.1.1 A Brief History of Python
2.1.2 Characteristics of Python Scripts
2.1.3 Layout of a Simple Python Script
2.2 The Tools
2.3 Hardware and Software
2.3.1 Hardware
2.3.2 Software
2.4 How to Program
2.4.1 A New Type of Problem
2.4.2 The Power of Pseudocode
3 Basic Datatypes
3.1 Strong and Weak Typing
3.2 Booleans
3.3 Numbers
3.3.1 Integers
3.3.2 Floats
3.4 Strings
4.4 Comments
4.4.1 Line Comments
4.4.2 Block Comments
4.5 Errors
4.5.1 Logical Versus Syntactical Errors
4.5.2 Debugging
4.5.3 Handling Errors On the Fly
4.6 Typecasting
4.7 F-strings
4.8 Statements and Expressions Review
4.9 Arithmetic Operations
5 Complex Datatypes
5.1 Lists
5.1.1 Creating Lists
5.1.2 Accessing Data Inside of Lists
5.1.3 Appending to Lists
5.2 Dictionaries
5.2.1 Lists Versus Dictionaries
5.2.2 Creating Dictionaries
5.2.3 Accessing Data Inside of Dictionaries
5.2.4 Appending to Dictionaries
5.3 Tuples
5.4 Sets
5.5 Subsetting
5.6 String Manipulation
5.6.1 Stripping
5.6.2 Regular Expression Matching
Regexes
Regex Matching in Python
5.7 Random Number Generation
Index
Chapter 1
Introduction
Don’t worry if you don’t understand this code. What you should notice are the
colors. Keywords have been written in blue and orange, operators in orange,
strings in red, and comments in green. All of this helps you, as a human, read
the code that was written for you. Your IDE, or integrated development
environment, will also color the text, though the colors might be different. Even
if the colors are different, they should be consistent throughout the document,
and as you get used to programming, you will better understand what the colors
mean.
Pay attention to the line numbers! Line numbers are marked on the left
side of the script, and they indicate single lines as they are stored in the file,
not as they are rendered. This is why a single logical line might span multiple
physical lines, in the case where we’ve run out of space on the page. We’ll cover
this (it’s called wrapping) in chapter 4.3, but for now, just be aware that you
should read the line number on the left side of this textbook.
We may also give you code that doesn’t work or that you shouldn’t run.
In these cases, we have prefaced the code block with a comment that says
# DO NOT RUN or # NOT RUN. If it says “DO NOT RUN,” that means that the
code might break your computer or is deliberately not syntactically correct. If
it says “NOT RUN,” it is your choice as to whether you run it, but the examples
provided in the book operate under the assumption that you did not run that
block of code.
Sometimes, we’ll also give you some output. Output has no color formatting,
and it just appears in a regular monospace font.
Iteration 0
Iteration 1
Iteration 2
Iteration 3
Iteration 4
Iteration 5
Iteration 6
Iteration 7
The purpose of this is to differentiate it from the regular English text and also
from the Python code. This is what Python might output.
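The code behind this output isn’t shown here. As a hedged sketch, a loop like the one below would print exactly those eight lines; don’t worry about how it works yet, since loops and f-strings come later in the book.

```python
# Print "Iteration 0" through "Iteration 7", one per line.
for i in range(8):
    print(f"Iteration {i}")
```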
We’ve also provided you with some extra notes, warnings, and fun facts.
These show up in colored boxes (one more reason to print in color or just view
the textbook on your computer as a digital edition).
Note
Notes are provided in blue boxes. These can help you under-
stand what you’re doing a little bit better, but aren’t, strictly
speaking, necessary.
Warning
Fun Fact
Fun facts are provided in green boxes. These aren’t very helpful,
but they are fun! (At least we think so...)
Chapter 2
Development Basics
for loop, documentation will use for spam in eggs instead of for foo in bar.
one learning to program in Python, including you! While Python had its own
influences, it also played roles in the development of new languages, including
JavaScript, Ruby, and Swift.
We won’t return to some of these for a while, so here’s a quick explanation of
each right now.
The first two lines are called import statements, and they tell the interpreter
about certain extra modules that are used in your program. This is incredibly
useful, since a lot of code already exists for a lot of complicated stuff, like math
functions, data science, graphics, or web scraping, so you don’t have to write
these from scratch. We’ll see how to use modules when we begin to work with
files and in data science.
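As a sketch, here are two import statements and a quick use of each. The choice of the standard-library modules math and statistics is ours, for illustration; the script described above may import different ones.

```python
# Import statements go at the top of the script.
import math                    # bring in the whole math module
from statistics import mean    # bring in just one function from statistics

print(math.sqrt(16))           # 4.0
print(mean([1, 2, 3, 4]))      # 2.5
```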
You should follow your instructor’s directions on where to place your import
statements. Most instructors have you place all of your import statements at
the very beginning of your program. This way, you can see at a glance which
modules you have imported without having to search for them, which is useful
in extremely long scripts. It also means that you can use anything in any of
the modules without having to figure out where in the script you imported the
module, since modules can only be used after they have been imported.
Next, we have function definitions. Don’t worry about exactly what a function
is yet; we’ll cover the subtleties of functions in Chapter 7.1. A function
definition starts with a header line that begins with the def keyword and ends
with a colon, and everything inside of the function is marked using indentation.
After our functions, we have the main part of the program. This is where
we can refer to functions that are in modules that we have imported or those
which we created ourselves. This is also where we’ll control the basic flow of
the program.
As your programs become more and more sophisticated, they will also de-
viate more and more from this basic structure. Advanced programs can span
across different files and folders. However, we need to start somewhere, and for
our purposes, dealing with single files is perfectly adequate.
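Putting the three parts together, a minimal script might look like the sketch below. The function circle_area and its radius parameter are hypothetical names made up for this example.

```python
# --- Import statements ---
import math

# --- Function definitions ---
def circle_area(radius):
    # The header line ends in a colon; the indented lines are the body.
    return math.pi * radius ** 2

# --- Main part of the program ---
area = circle_area(2.0)
print(f"A circle with radius 2 has area {area:.2f}")
```

Running it prints `A circle with radius 2 has area 12.57`.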
1 i = 0
2 while True:
3     print(i)
4     i += 1
are very much exceptions. These languages typically exist only as novelties.
Now, let’s take a look at what the assembly code for this might look like. It’s sloppy,
but the trained eye might be able to break this down.
1 i: DB 0
2 .loop:
3     MOV i, eax
4     ADD eax, 1
5     PSH eax
6     CALL printf
7     POP eax
8     JMP .loop
This is much closer to what the computer actually executes, even if you don’t
understand it. (On a side note, take a computer architecture class to understand
what this code is!)
Fun Fact
One of the most important things to understand about computers is that they’re
not magic. They’re not mysterious black boxes, and as much as we’d like to think
that they operate using magic blue smoke, that’s not really the case. You can
do as little or as much as you want with them, but they can do a lot for you.
Have fun on your programming journeys! It will be difficult, but rewarding!
Exercise Questions
These exercise questions cover chapters 2.1 and 2.2.
Exercise 0
1. What placeholder words does Python’s documentation use?
2. Does Python support emojis?
3. What version of Python is this book written for (and consequently what
version will you be coding in)?
4. Are Python 2 and 3 interchangeable?
Exercise 1
1. What are some file extensions that you might find in a Python script?
2. What file extension will a regular Python script have?
3. What line of code can you run to get the Zen of Python in your Python
shell?
Exercise 2
1. What is the difference between a statement and spacing?
2. Does Python execute code line by line, or are Python statements separated
by semicolons?
3. If your course has a style guide, where should you put your import
statements according to that style guide?
Exercise 3
1. What kind of code do we, as programmers, typically write: source code,
assembly code, or binary code?
2. What kind of code can a computer understand natively: source code,
assembly code, or binary code?
3. What kind of code is used to create binary code: source code or assembly
code?
Exercise 4
1. What is the difference between a text editor, a compiler, and an inte-
grated development environment?
2. Look at your course’s syllabus. Are you using a text editor, a compiler,
an integrated development environment, or some combination of these?
If it is a combination, what is the combination?
Exercise 5
(No wrong answers!)
1. Why are you choosing to learn how to program?
2. How familiar are you with computers in general (regardless of program-
ming experience)?
3. Do you know any other programming languages? If so, which ones?
At its very core, the CPU carries out something called the fetch-execute
(sometimes also referred to as the fetch-decode-execute) cycle. To run this
cycle, we need four basic components in our processor: a clock, a counter, an
instruction register, and an accumulator. We’ll also need direct access to our
memory, that RAM from above.
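To make these four pieces a little more concrete, here is a toy simulation in Python. It is a hedged sketch, not how a real processor is built: the instruction names LOAD, ADD, PRINT, and HALT are made up for this example, memory is just a Python list, and each pass through the while loop stands in for one complete fetch, decode, and execute cycle rather than a single clock tick. Don’t worry if the code itself doesn’t make sense yet.

```python
# A toy model of the fetch-decode-execute cycle.
# The instruction set (LOAD, ADD, PRINT, HALT) is invented for illustration.
memory = [
    ("LOAD", 5),      # put 5 into the accumulator
    ("ADD", 3),       # add 3 to whatever the accumulator holds
    ("PRINT", None),  # display the accumulator
    ("HALT", None),   # stop the cycle
]

program_counter = 0   # the counter: address of the next instruction
accumulator = 0       # holds the result of the latest calculation
running = True

while running:        # each pass stands in for one trip through the cycle
    # Fetch: copy the next instruction from memory into the instruction register.
    instruction_register = memory[program_counter]
    program_counter += 1

    # Decode: split the instruction into what to do and what to do it with.
    operation, operand = instruction_register

    # Execute: carry out the operation.
    if operation == "LOAD":
        accumulator = operand
    elif operation == "ADD":
        accumulator += operand
    elif operation == "PRINT":
        print(accumulator)
    elif operation == "HALT":
        running = False
```

Running the sketch prints `8`, the result of loading 5 and adding 3.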
This fetch-execute cycle is run every time the clock tells it to. When you
see processors referred to by their clock speed, this is that number, measured in
hertz. Modern processors tick billions of times per second, so their clock speeds are given in gigahertz,
but for now, we’re going to slow it way down to one clock tick per sentence
(or so). With each tick of this clock, the processor is going to do one of three
things: fetch new instructions from the memory, decode the instructions that it
just fetched, or execute the instructions that it just decoded. The instructions
and the results from any evaluations are stored in memory. Each chunk of data,
whether it’s an instruction or some data chunk, is allotted its own memory
space within the random access memory. This memory space, in theory, is not
where a process can overwrite memory that it shouldn’t have access to with malicious code.
There are also other attacks that can provide unauthorized access to a memory space that
shouldn’t be exposed.
computer has something called the basic input/output system, or BIOS for
short. The BIOS is a set of instructions in firmware which control the input
and output operations of your computer. When you press the power button
on your computer, it’s your BIOS’s job to instruct the computer’s hardware
on what exactly to do. In the last decade or so, most computers have moved from
the traditional BIOS to its successor, the Unified Extensible Firmware Interface,
or UEFI, which fills the same role with a more extensive interface. You can use
the UEFI setup screen to edit your computer’s firmware settings.
Exercise Questions
These exercise questions cover chapter 2.3.1.
Exercise 6
1. What does CPU stand for?
2. What does RAM stand for?
3. What is the CPU responsible for doing?
4. What is the RAM responsible for doing?
Exercise 7
1. What is the difference between volatile and non-volatile memory?
2. Think about a USB flash drive that you can store files on. Is this volatile
or non-volatile memory? Why?
Exercise 8
1. How is RAM different from ordinary non-volatile, secondary, or tertiary
storage?
2. What is a memory space?
3. How is a memory space different from RAM?
4. Can a program be run from ordinary non-volatile, secondary, or tertiary
storage mediums? Why or why not?
2.3.2 Software
As you might expect, you’ll be writing software in an introductory program-
ming course. More advanced programming courses might also have you cover
something called firmware, but for your purposes, you are only concerned
with software. Programmers write software using programming languages.
However, the term “programming language” is pretty broad; different languages
operate at different levels, with high-level languages being the most popular
in recent years. High-level languages are the most akin to human languages,
and they are designed to be easy for humans to write and understand. Comparatively
speaking, low-level languages are much closer to what a computer
can understand. Python, C, C++, C#, Java, Swift, and Objective-C are all
high-level languages. Assembly language (that really tough stuff to read from
chapter 2.2) and machine code (the 1’s and 0’s themselves that a computer
processor can directly interpret) are considered low-level languages.
However, computers can only read binary: the 1’s and 0’s that make up a
software program. So, how can we convert print('Hello, World!') into a
bunch of 1’s and 0’s that our processor can execute? We have to convert the
code into something that the computer can read. There are two main schools for
converting code into machine code: interpretation and compilation. Both
interpreters and compilers do the same thing (convert high-level language code
into machine code), but how and when they make that conversion differs greatly.
Since both types of conversion begin with code and end with a different kind of
code, we need some new terms to refer to what kind of code is what. Source code
is the high-level language code that you write in Python, C, C++, C#, Swift, or
any other common programming language. After the conversion is made, the
computer will produce object code, which is directly executable by the computer.
You can open a source code file in a text editor, but you cannot natively
execute it. Conversely, you can execute an object code file, but you cannot
natively open it in a text editor. Whether a language is interpreted or compiled
is decided by the people who design the language and build its tools; you cannot
choose which you’d like.
Interpreted languages are translated into object code while the actual
code is being executed. As the program is being run, the computer is interpret-
ing the code, line-by-line. This means that you don’t have to wait for the code
to be translated into your executable binary file, but it also incurs a perfor-
mance penalty on the code’s execution, since each line needs to be interpreted
as it is needed. Interpretation, while perhaps insignificant on a small project,
could add up to seconds or even minutes on a large project.
Conversely, compiled languages are translated into object code before any
code is actually executed. Before your program can even run, it needs to be
converted into a binary executable (object code) that can then be run. When
you’re ready to run your code, your computer will compile the code, then run
the compiled code, not the source code. It means that it takes longer to get the
compiled code (as the entire source code file or files are compiled), but that once
the code is compiled, it runs much faster than interpreted code, since the
computer only has to execute each instruction instead of translating it and
then executing it.
Python is an interpreted language. For your purposes, the difference in
execution time between an interpreted and a compiled language does not matter
nearly as much as writing good code in the first place. Both will seem nearly
instantaneous to
mere mortal humans like you or me.
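As an optional peek under the hood: the standard Python interpreter, CPython, actually does a little of both. It first translates your source code into an intermediate form called bytecode and then interprets that bytecode. The built-in compile() and exec() functions let you trigger those two steps yourself; the source string in this sketch is our own example.

```python
source_code = "print('Hello, World!')"   # source code is just text

# Step 1: translate the text into a code object (Python's bytecode).
code_object = compile(source_code, "<example>", "exec")

# Step 2: run the translated code.
exec(code_object)                         # prints: Hello, World!
```

You will never need to do this by hand in this book; running a script with Python performs both steps for you.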
same Python script in many different ways in pseudocode. For example, here’s
a Python script. Don’t worry, you don’t have to understand what this code
does now.
1 name = None
2 name = str(input("What is your name? "))
3 print("Your name is", name)
This might not all make sense to you now, but here’s a way to write this as
pseudocode.
Empty variable called name
Fill name with user input as a string
Output " Your name is " and the name var
This is only one type of pseudocode. Remember that pseudocode can also
be flowcharts and diagrams. When your projects get more complicated, you
should be comfortable with writing your algorithms out on paper. It will help
you understand the nuances of your algorithms, and you can then convert them
into code much more easily.
Chapter 3
Basic Datatypes
Exercise Questions
These exercise questions cover chapter 3.1.
Exercise 9
1. What are some of the advantages of a strongly typed language?
2. What are some of the disadvantages of a strongly typed language?
3. What are some of the advantages of a weakly typed language?
Exercise 10
1. What is the difference between strongly and weakly typed languages?
2. Can you typecast data with weakly typed languages?
3. What kind of language is Python: strongly or weakly typed?
3.2 Booleans
The most primitive datatype in nearly every programming language is the Boolean.
A Boolean represents either a true or a false. These values are stored
in memory as a 1 or a 0. As such, it only requires one bit to store an entire
Boolean.1 This makes Booleans extremely fast to work with. It doesn’t take as
many processor cycles to work with a Boolean variable as it does to work with
more complex datatypes.
1 while True:
2     ...
1 variable = False
If you try to write a Boolean value with all lowercase letters, Python won’t
recognize it, and your program will stop with an error (a NameError).
Python requires the first letter of both Boolean values to be capitalized, True
and False, but other languages have their own rules. For example, Java, C++, and
C# use all lowercase letters, true or false, while PHP code often uses all
uppercase letters, TRUE or FALSE.2 You don’t need to
get bogged down with how each language deals with Boolean states, only the
language that you’re actively working with. However, if you do read code in
other languages, you shouldn’t assume that it’s wrong just because a Boolean
value isn’t capitalized. That might be the correct syntax for that language, and
you should always check that language’s documentation.
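Here is a quick sketch of the Python spelling in action. The variable names are made up for illustration, and the lowercase line is commented out (and marked NOT RUN, following this book’s convention) because running it would stop the program.

```python
# In Python, the Boolean values are spelled True and False.
is_raining = True
has_umbrella = False
print(is_raining, has_umbrella)    # True False

# NOT RUN: the lowercase spelling is not a Boolean in Python.
# is_raining = true    # stops with NameError: name 'true' is not defined
```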
1 Technically speaking, it takes more than just one bit to store a Boolean value. Most
computers can’t address a single bit, so a Boolean usually occupies at least a whole byte, and
in Python, True and False are full objects that carry extra bookkeeping. However, the
information in the Boolean value itself only takes one bit to store.
2 PHP is a very funky language. Its Boolean constants are case-insensitive, so TRUE, True,
and true all work; older versions of the official documentation wrote them in all caps. There
are many, many faults and issues with PHP, but it’s not a topic for this book.
There are several ways that the term Boolean gets used in programming. In
general, it can be used in three different ways: as a statement to set a variable
value:
1 active = False
In a comparison expression (in the below expression, laps < 10 would eval-
uate to True or False):
1 laps = 0
2 while laps < 10:
3     laps += 1
Or as a control mechanism:
1 while True:
2     ...
You don’t need to understand the distinction between these three uses,
especially since we haven’t covered control logic or variables yet, but you should
be aware that you might see the term Boolean used in several different ways.
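To see a comparison expression produce a Boolean, you can store its result in a variable and inspect it. A minimal sketch, reusing the laps idea with a made-up value:

```python
laps = 3
under_limit = laps < 10        # a comparison expression evaluates to a Boolean
print(under_limit)             # True
print(type(under_limit))       # <class 'bool'>
```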
Programmers sometimes refer to Boolean values as “bools.” If you hear the
term “bool,” it’s referring to a Boolean value. In fact, bool is the name that
Python itself uses for the Boolean type.
Exercise Questions
These exercise questions cover chapter 3.2.
Exercise 11
1. What states can a Boolean hold?
2. How does Python mandate that you write out these states in code?
3. At a very basic level, how is a True value stored in memory? Only
consider the value itself; ignore the memory address.
Exercise 12
1. Do some research: How are Booleans written out in MATLAB code?
2. Do some research: How are Booleans written out in JavaScript code?
3. Do some research: How are Booleans written out in C++ code?
4. Do some research: How are Booleans written out in Swift code?
3.3 Numbers
Booleans alone are very difficult to work with, especially if you want to store
anything other than true or false. Let’s be honest, as great and as simple as
Booleans are, they’re not the right tool for a lot of jobs. Next, we’re going to
cover two ways to represent numbers in Python.
3.3.1 Integers
Think back to your algebra class. Remember when you had to distinguish counting
numbers from integers and integers from decimals? No? That’s okay, let’s jog
your memory.
An integer is simply a whole number, including zero and negative numbers.
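5 is an integer, and so are 13, 0, and -51. When writing integers in Python,
always remember to include the number and the number only. Apart from the
negative sign - (and, in Python 3.6 and later, underscores used to group digits),
no symbols belong in an integer value: write 1000, not 1,000 or $1000. Depending
on the character, Python will either report a syntax error or, worse, quietly
read the value as something you didn’t intend.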
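A short sketch of what is and isn’t allowed, with variable names made up for illustration. Note the last case: Python reads 1,000 as two separate values, 1 and 0, grouped into a pair (a tuple, covered in chapter 5.3), instead of reporting an error.

```python
population = 1_000_000   # underscores may group digits (Python 3.6 and later)
print(population)        # 1000000

temperature = -51        # the negative sign is fine
print(temperature)       # -51

oops = 1,000             # NOT one thousand: the comma makes the pair (1, 0)
print(oops)              # (1, 0)
```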
In many languages, integers have a limited size. Since they need to be stored in
a defined and limited memory space, they are commonly assigned 32 bits. This
equates to a range of -2147483648 to 2147483647. This is really, really big, but
sometimes, it isn’t big enough. Python 2 solved this with a second, secret type
called the long integer, which had unlimited capacity, and it automatically
upgraded the datatype to a long integer whenever a value got too large for a
standard integer.3 Python 3 goes one step further: it merges the two, so every
int can grow as large as your computer’s memory allows, and you never have to
think about the switch. Neat!
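You can watch Python handle a very large integer yourself. This sketch starts at the largest value a signed 32-bit integer can hold and goes past it; in Python 3, type() keeps reporting an ordinary int, because that single type has unlimited size.

```python
largest_32_bit = 2 ** 31 - 1      # the biggest value a signed 32-bit integer holds
print(largest_32_bit)             # 2147483647

bigger = largest_32_bit + 1       # in Python 3 this does not overflow
print(bigger)                     # 2147483648
print(type(bigger))               # <class 'int'>

print(2 ** 100)                   # 1267650600228229401496703205376
```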
Note that some languages have the ability to store a numerical value as an
unsigned value, meaning that the number of values it can hold stays the same,
but instead of the range being from roughly −x to x, the range is from 0 to
roughly 2x. Python 3, because its integers have unlimited size, does not have a
separate unsigned integer type.
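For an integer stored in n bits, the signed range runs from −2^(n−1) to 2^(n−1) − 1, and the unsigned range runs from 0 to 2^n − 1: the same number of values, just shifted. The two helper functions below are hypothetical (they are not built into Python); they simply compute those bounds so you can compare them.

```python
def signed_range(bits):
    """Smallest and largest values of a signed integer with this many bits."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

def unsigned_range(bits):
    """Smallest and largest values of an unsigned integer with this many bits."""
    return 0, 2 ** bits - 1

print(signed_range(8))      # (-128, 127)
print(unsigned_range(8))    # (0, 255)
print(signed_range(32))     # (-2147483648, 2147483647)
print(unsigned_range(32))   # (0, 4294967295)
```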
one way of handling this. Other languages may require you to explicitly typecast to a dif-
ferent variable type altogether, such as a double, that has the capacity for the data that the
programmer is trying to store.
Different integers?
Exercise Questions
These exercise questions cover chapter 3.3.1.
Exercise 13
1. What states can an integer hold?
2. Give three examples of integers.
3. How can you refer to an integer in speech, when talking with other
programmers?
Exercise 14
1. Does Python support unsigned integers? If it does, why does it sup-
port unsigned integers? If not, why doesn’t it need to support unsigned
integers?
2. Do some research: What is a programming language that supports un-
signed integers?
Exercise 15
1. What is the difference between a regular integer and a long integer?
2. Do you need to worry about whether a value is an integer or a long
integer in Python? Why or why not?
3.3.2 Floats
But what if we need to use decimal points? This would be handy for many
reasons, such as storing a money value, a precise mass, or many other things.
An integer doesn’t support decimal points, so instead, we can use a float or a
floating-point value. In many languages, the most common kind of float (often
called a double) is stored in 64 bits. Because some of those bits hold an
exponent, a float can represent much larger magnitudes than a 32-bit integer,
while still being able to hold decimal values.
Python 3 uses the name float for these numbers, and a Python float is typically
stored in 64 bits. Python does not, however, turn an integer into a float just
because it is large; large whole numbers stay integers.
Like integers, floats can be negative. So, 3.5 is a float, but so are 78.38323526
and -635.5465. If a number has a decimal point, Python will automatically
store the value as a float.
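You can ask Python which type it chose with the built-in type() function. A minimal sketch:

```python
print(type(5))           # <class 'int'>
print(type(5.0))         # <class 'float'>  (the decimal point alone makes it a float)
print(type(-635.5465))   # <class 'float'>
print(5 == 5.0)          # True: the same value, stored as two different types
```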
At this point, it raises the question: why should we use integers at all? Well,
integers are exact, while floats are stored in binary and can only approximate
many decimal values, so small rounding errors can creep into float arithmetic.
In many languages, floating-point arithmetic can also cost more memory or
processor time than integer arithmetic. For a concrete visual comparison, check
out this demonstration,4 in which the top line is drawn using an integer and the
bottom line using a floating-point value.
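One more reason to prefer integers when you can: floats are approximations of decimal numbers, so small rounding errors can appear. The sketch below shows the classic example. math.isclose() is the standard library’s tool for comparing floats with a tolerance.

```python
import math

print(0.1 + 0.2)                       # 0.30000000000000004
print(0.1 + 0.2 == 0.3)                # False: the float sum is only close to 0.3
print(math.isclose(0.1 + 0.2, 0.3))    # True: compare floats with a tolerance instead

print(10 + 20 == 30)                   # True: integer arithmetic is exact
```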
Exercise Questions
These exercise questions cover chapters 3.3.1 and 3.3.2.
Exercise 16
1. What states can a float hold?
2. Give three examples of floats.
3. How is a float different from an integer?
Exercise 17
1. Why shouldn’t we use floats for everything?
2. Provide three examples where we should use floats over integers.
3. Provide three examples where we should use integers over floats.
4 https://processing.org/examples/integersfloats.html
3.4 Strings
What if we want to represent anything other than a true/false value or a num-
ber, like text? Well, we have one last datatype at our disposal: the string. Why
do we call it a “string”? Imagine that you had alphabet soup and you picked
out the “alphabet” part. If you got a piece of twine and put your alphabet
on the twine in a particular order, you are representing some information as a
string of letters. In computer science terms, we refer to individual letters as
chars or characters and to specific combinations of chars as strings.
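Python itself has no separate char type: a single character is simply a string of length 1. A small sketch, using a made-up variable name and a loop (covered later) to look at each character in turn:

```python
word = "soup"
print(len(word))            # 4: the string holds four characters

for char in word:           # each character is itself a string of length 1
    print(char, type(char))
```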
In Python, strings are written using double quotes or single quotes. Python
accepts either, as long as each string opens and closes with the same kind of
quote. In Python, the following lines are syntactically valid strings:
1 "Aunt Jacky went up the hill."
2 'Aunt Jacky went up the hill.'
However, the following lines are not syntactically correct, since the opening and
closing quotes don’t match for either string:
1 # DO NOT RUN
2 "Aunt Jacky went up the hill.'
3 'Aunt Jacky went up the hill."
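Since Python treats both kinds of quotes the same, you can pick whichever one makes your string easier to write. A sketch with made-up variable names:

```python
double_quoted = "Aunt Jacky went up the hill."
single_quoted = 'Aunt Jacky went up the hill.'
print(double_quoted == single_quoted)   # True: the quote style doesn't change the string

# Using the other kind of quote lets a string contain a quote character.
contraction = "Jacky's hill"
dialogue = 'Jacky said "up we go."'
print(contraction)
print(dialogue)
```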
You can always typecast an integer or float into a string, but you can’t always
typecast a string into an integer or a float. We’ll get into exactly how typecast-
ing works in Python in chapter 4.6. For now, let’s just cover the basics of what
can and cannot be typecast.
Let’s say that you calculated the sum of two integers using Python and you
now want to display this result to your user as part of a sentence, such as
“The sum is 5.” Python won’t join a string and an integer with +, so we need to
typecast our integer into a string first. We can’t do numeric arithmetic
(covered in chapter 4.9) on a string, but we can combine it with other text and
print it.
Likewise, it’s possible that we’ll need to accept some user input. However,
all of our standard mechanisms for accepting user input in the console give us
strings, even when the user types only digits. But what if we want our user to enter the
price of something? Instead, we can accept our input as a string, then typecast
it to a float. Again, we’ll see exactly how to do this in chapter 4.6.
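Here is a small preview of the two conversions described above. It uses Python’s built-in str() and float() functions, which chapter 4.6 covers properly. The values are made up for illustration.

```python
total = 3 + 4                   # an integer
message = "The sum is " + str(total)
print(message)                  # The sum is 7

typedPrice = "2.50"             # input() would give us a string like this
price = float(typedPrice)       # typecast the string to a float
print(price + 1)                # 3.5

# Not every string can become a number.
try:
    float("two dollars")
except ValueError:
    print("That string can't be turned into a float.")
```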
You may see the term string literal. This means a string that is literally
spelled out, such as the ones that are described above. This is in contrast to
dynamically created strings, such as those that are generated directly by Python
code. If you type text between single or double quotes in your source code, you are creating a string literal.
It’s also worth mentioning that some word processors create something
called "smart quotes," which curve each quotation mark based on where it falls
in relation to the words around it (for clean opening and closing quotes).
Smart quotes are NOT compatible with Python code, so if you paste in code
from Word documents or Google Docs documents, be careful that your quota-
tion marks are not smart quotes!
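You can watch Python reject smart quotes by compiling a line that contains them. Here, compile() is used only to check the code without running it. The sketch also includes a small helper, replaceSmartQuotes (a hypothetical name, not a built-in), which swaps the curly characters for straight ones using the standard str.maketrans() and str.translate() methods.

```python
def replaceSmartQuotes(text):
    """Hypothetical helper: turn curly quotes into straight ones."""
    table = str.maketrans({
        "\u201c": '"', "\u201d": '"',   # curly double quotes
        "\u2018": "'", "\u2019": "'",   # curly single quotes
    })
    return text.translate(table)

pasted = "print(\u201cHello!\u201d)"     # as if pasted from a word processor

try:
    compile(pasted, "<pasted>", "exec")
    works = True
except SyntaxError:
    works = False
print("Smart quotes work?", works)       # False

fixed = replaceSmartQuotes(pasted)
compile(fixed, "<fixed>", "exec")        # compiles without an error
print(fixed)                             # print("Hello!")
```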
Is the string or code that you copied and pasted not working?
Check whether the quotes are actually smart quotes!
Exercise Questions
These exercise questions cover chapter 3.4.
Exercise 18
1. What states can a string hold?
2. Give three examples of strings, syntactically marked up for Python (in-
clude any symbols that denote a string).
3. Where does the term "string" come from?
Exercise 19
1. What characters do you use to enclose a string? Are they interchangeable?
2. Let’s say you copied a string from a Microsoft Word document. Why
might your quotes not be syntactically correct?
Exercise 20
1. Can a string hold data that could also be stored by a float? Justify your
answer.
2. Can a float hold data that could also be stored by a string? Justify your
answer.
Exercise 21
1. You should now have all of the basic datatypes. List them here.
2. What is the difference between each of the datatypes that you listed?
Exercise 22
Identify the datatype of the following objects from the list of datatypes that
you created in the last exercise. What gave it away?
1. 4
2. 3.14
3. "cat"
4. 'dog'
5. '4'
6. "'bread'" (single quotes inside of a set of double quotes)
7. '"banana"' (double quotes inside of a set of single quotes)
Chapter 4
General Python Programming
faces, like sensors and motors, but we won’t cover these here.
7.1.
print() only requires one argument: what you actually want to print. It
looks like this:
```python
print("Hello, World!")
```
```
Hello, World!
```
In the above example, observe that because we’re printing a string, we need to
use quotes. However, as noted in chapter 3.4, we could also write the code as
follows, with single quotes:
```python
print('Hello, World!')
```
```
Hello, World!
```
The print function can accept multiple arguments, or things inside of the paren-
theses. We pass in multiple arguments by writing them inside of the parenthe-
ses, and we distinguish between these arguments using commas. The most
common reason to pass in multiple arguments is to concatenate two or more
things. Concatenation is the act of linking two or more things together in a
chain. Consider the following example:
```python
print('Hello,', 'World!')
```
```
Hello, World!
```
This would print exactly the same thing in the terminal as the previous two
examples. The only difference is that the text is now passed in as two separate
arguments, separated by a comma. When print() joins several arguments
together, its default behavior is to put a single space between them, which is
why the above example prints the same result as the first two examples. In fact,
if we put an extra space inside one of the strings that we pass in, we end up
with a double space.
```python
print('Hello, ', 'World!')
```
```
Hello,  World!
```
It’s subtle, but in the output, there’s an extra space after the comma:
Hello,_World! versus Hello,__World! (here, each underscore stands for a space).
We can change this default behavior by adding in another argument: sep="".
The sep argument specifies how the elements that print() joins together should
be separated, and its value should be a string. This string can be empty, or it
can be full of stuff. Consider the following example, which uses an empty
separator so that only the space inside our first string remains:
```python
print('Hello, ', 'World!', sep="")
```
```
Hello, World!
```
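To see how sep changes the output, the sketch below tries a few different separators. It also sends the text to an in-memory buffer (io.StringIO, passed through print()’s file argument) instead of straight to the console, so the exact output can be inspected afterwards. That capturing trick is just for illustration here.

```python
import io

# Collect print() output in a string buffer instead of the console.
buffer = io.StringIO()

print("2024", "01", "15", sep="-", file=buffer)  # custom separator
print("a", "b", "c", sep="", file=buffer)        # no separator at all
print("a", "b", "c", file=buffer)                # default: one space

output = buffer.getvalue()
print(output, end="")
```

Running it should show 2024-01-15, then abc, then a b c, each on its own line.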
There’s one more common argument to pass into the print() function: end="".
The end argument specifies what the end of the line should look like, and it
brings up another topic: escape characters or escape sequences.
```
Hello,
World!
```
```
Hello, World!
```
```
Hello,
World!
```
You can also escape quotes inside of quotes by using the backslash (\), also
called the escape character. A backslash changes how Python reads the character
right after it: an escaped quotation mark becomes part of the text instead of
ending the string.
```python
print("Hello, \"World!\"")
```
But what if we need to print an actual backslash? Well, then we need to escape
our escape character. We can use double backslashes \\, and here’s what it
means: the first backslash does what it always does and tells the interpreter
that we need to prepare to evaluate an escape character; the second backslash
tells the interpreter that we’re actually trying to output a backslash.
```python
print("\\")
```
```
\
```
Now, we can use our escape characters to customize the end behavior of a
Python print function. Just like sep, we can pass in end as an argument.
```python
print("Hello, World!", end="\n")
print("Hello, Python!", end="")
```
```
Hello, World!
Hello, Python!
```
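The sketch below gathers the escape sequences and the end argument in one place. It uses the same io.StringIO capturing trick as before so the exact characters can be checked. The strings are made up for illustration.

```python
import io

buffer = io.StringIO()

# \n starts a new line and \t inserts a tab character.
print("Hello,\nWorld!", file=buffer)
print("Hello,\tWorld!", file=buffer)

# end="" stops print() from adding its usual newline,
# so the next print() continues on the same line.
print("Hello, ", end="", file=buffer)
print("Python!", file=buffer)

# An escaped backslash is a single character in the string.
print(len("\\"), file=buffer)

output = buffer.getvalue()
print(output, end="")
```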
Exercise Questions
These exercise questions cover chapter 4.1.
Exercise 23
1. What is the console?
2. What is the function name to output to the console?
3. What goes inside of the parentheses?
4. Name two arguments that you could pass to the console output function.
Exercise 24
1. Write some code to print the following string on one line, without spec-
ifying the end of the line: Hello, Python
2. Write some code to print "Hello", then "Python" by concatenation, but
with no space in between.
3. Write some code to print "Python" and end the line with a tab, rather
than the default newline.
Exercise 25
Are the following print() statements syntactically correct? Why or why
not?
1. print Hello, World!
2. print "Hello, World!"
3. print(Hello, World!)
4. print("Hello, World!"
5. print("Hello, "Dottie!""
6. print('Hello, World!'
7. print('Hello, "Dottie!"')
8. print('Hello, World!")
4.2 Variables
We’ve alluded to storing data a lot already, but how do we actually do this?
Well, we use variables. Just like in algebra, variables in Python can be repre-
sented by lots of different things, and they can represent a lot of other things.
We’ll use variables to store nearly everything in Python. They’re the best
way to store the state of something while you’re working on it, track how many
times you’ve run through a loop (we’ll get there later), store user information,
and so many other things. In fact, variables are foundational to every single
programming language.
All data that is worked on in variables is stored in memory when you’re
running a Python script. When you declare a new variable, your variable is
assigned a memory address in the computer’s RAM. You can do anything you
want with that memory space, including filling it with data or editing the data
that’s inside of it. For a review on what a memory space is, refer to chapter
2.2.
Since all of your variables are stored in memory, they can only persist while
the program itself is running. After your program is terminated, the mem-
ory space is marked as free by the operating system, meaning that any other
program is now free to overwrite that memory.
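Here is a sketch of four variable declarations of the kind described next. The names and values are taken from that description, and the comments note the datatype Python would choose for each.

```python
# Four variables, each initialized to a different datatype.
cashValue = 3.39               # float
accountHolder = "Stinky Pete"  # string
isBroke = True                 # Boolean
kitties = 390                  # integer (no decimal point)

print(type(cashValue), type(accountHolder), type(isBroke), type(kitties))
```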
This code is declaring the four new variables cashValue, accountHolder, isBroke,
and kitties. Each of these variables is being initialized to a different type.
cashValue is being initialized to the value 3.39, which would be stored as a
float. Likewise, accountHolder is being initialized to the value "Stinky Pete",
which is a string. isBroke is being initialized to the Boolean value True. Note
that kitties is being initialized to 390, and Python will store this value
as an integer, since there isn’t a decimal point. If we wanted to force Python to
store kitties as a float, we could just add a .0 to the end of the initialization
value:
```python
kitties = 390.0
```
2 The exception to this is if you decide to use constant values, or consts. You might set
a constant for the value of Pi or for the force of gravity. Constants are meant to stay
unchanged after initialization, although Python itself does not enforce this.
When declaring or updating the value of a variable, the name of the variable
always goes to the left side of the equal sign. You can think of a variable
statement as putting the thing on the right side of the equal sign into the
variable on the left side of the equal sign. So, the following code is syntactically
correct:
```python
accountHolder = "Stinky Pete"
```
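To see why the order matters, the sketch below asks Python to check both orders without running them. It uses the built-in compile() function only for this demonstration; you won’t normally need compile() in your own programs.

```python
# The variable name goes on the left; the value goes on the right.
goodLine = 'accountHolder = "Stinky Pete"'
badLine = '"Stinky Pete" = accountHolder'

compile(goodLine, "<example>", "exec")      # compiles without complaint

try:
    compile(badLine, "<example>", "exec")   # a string literal can't be assigned to
    badLineWorks = True
except SyntaxError:
    badLineWorks = False

print("Reversed assignment allowed?", badLineWorks)  # False
```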
In the above examples, can you guess what the variables are describing? Prob-
ably. The variable names cashValue, accountHolder, and isBroke explain
what information the variables hold pretty well. However, the variable kitties
doesn’t really make sense. Is this how many kitties I have? Is it my love or
hatred of cats on a scale from 1 to 1000? We don’t really know. In program-
ming (not just in Python), it is critically important that you get into the good
habit of declaring good variable names. Be descriptive, but concise. A better
variable name might be numKitties.
There are several rules that you MUST follow when making new Python
variables. Failing to follow these rules will result in syntax errors:
A variable name must start with a letter or the underscore character _
A variable name cannot start with a number
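A variable name can only contain alphanumeric characters and underscores
(A-Z, a-z, 0-9, and _)
Variable names are case-sensitive: age, Age, and AGE are three different
variables
Variable names cannot be reserved words (keywords) in Python, such as if, for,
or class. The names of built-in functions like print are not reserved, but
reusing one as a variable name hides the original function, so avoid that too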
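The rules above can be checked with two tools from Python’s standard library. The str.isidentifier() method tests the letter, digit, and underscore rules, and keyword.iskeyword() spots reserved words. The isValidName function below is a hypothetical helper written for this sketch, and the candidate names are made up.

```python
import keyword

def isValidName(name):
    """Hypothetical helper: could name be used as a Python variable name?"""
    # isidentifier() checks the letter/digit/underscore rules;
    # iskeyword() rejects reserved words such as "if" and "class".
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ["numKitties", "_total", "2ndPlace", "num-kitties", "class", "print"]:
    print(candidate, isValidName(candidate))
```

Notice that "print" passes the check. That is legal Python, but it would hide the built-in print() function, which is why the rules above warn against it.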
Now is also a good time to go over naming conventions. There are several
different naming conventions that you might see while programming. It doesn’t
matter terribly much which one you pick, but whichever convention you go with,
you should stick with it throughout the entire project.
snake_case is written with underscores in between each word. Python’s official
style guide, PEP 8, recommends snake case for variable and function names, so
you will see it in a lot of Python code. This book uses lowercase camel case for
variables instead, so avoid snake case for your own variable names here when
possible.
lowercaseCamelCase is written with the first word in all lowercase and all
following words capitalized. In this book, lowercase camel case is used for
variable names.
ALLCAPS is written with all letters capitalized. In Python, all caps case
is typically used for constants.
When using camel case, capitalize all letters of abbreviations (e.g. HTTPSServer
instead of HttpsServer). Never use the letters O (easily confused with 0),
I (easily confused with 1 or l), or l (lowercase L, easily confused with 1 or I)
as single-letter variable names.
In Python, you cannot declare a new variable without also initializing it: a
variable is created the first time you assign a value to it, and you must do that
before you use it. If you don’t know what type the variable will be, but you know
that you’ll need a variable, you can initialize it to something called the
Nonetype. To do so, simply set the initialization to None. This is the
practical equivalent to declaring without initializing.4 Consider the following
code.
```python
emptyVar = None
print(type(emptyVar))
```
In general, you should always try to initialize your variables, even if it’s just to
0 (if it’s an integer), 0.0 (if it’s a float), or "" (if it’s a string). If it’s a Boolean
variable, you can initialize it to True or False, depending on what makes sense.
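The sketch below puts both ideas together. It uses None as a placeholder that gets filled in later, followed by typical starting values for the other basic datatypes. All names and values are illustrative.

```python
# Placeholder: we don't know the user's age yet, so start with None.
age = None
print(type(age))      # <class 'NoneType'>

# Later, once we know the value, we can fill it in.
if age is None:
    age = 21

# Sensible starting values for the other basic datatypes.
count = 0             # integer
total = 0.0           # float
message = ""          # string
isDone = False        # Boolean
```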
4 Some other languages don’t need this trick, since they support declaring a variable
without initializing it. For example, in C++, you could run the line 'int age;', which would
declare the 'age' variable without initializing it with anything.
We’ve talked about variables, but there’s a specific kind of variable in Python
called a constant.5 The difference between a regular variable and a constant
is that a constant is not supposed to change after initialization. Python doesn’t
actually stop you from reassigning a constant. Instead, you mark a constant by
naming it in all caps, which tells anyone reading your code (including you) to
leave it alone. It might be useful to use constants to store data that you know
will never change, such as a conversion factor or a mathematical constant, like
π or e.
```python
PI = 3.1415
E = 2.71
GRAVITY = 9.8
```
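The sketch below shows that the all-caps name is only a signal: Python happily lets you reassign it. For real calculations, Python’s standard math module already provides precise values of π and e as math.pi and math.e.

```python
import math

PI = 3.1415
print(PI)         # 3.1415

# Nothing stops this reassignment; the all-caps name is only a
# signal to other programmers. Avoid doing this in real code.
PI = 3
print(PI)         # 3

# The math module provides more precise values.
print(math.pi)    # 3.141592653589793
print(math.e)     # 2.718281828459045
```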
Changing Constants
Exercise Questions
These exercise questions cover chapter 4.2.
Exercise 26
1. When you create a variable, where is it stored: primary, secondary, or
tertiary memory?
2. What are the two steps for creating a variable?
3. Can you do those two steps in one line, or must they be done in two
lines?
4. What character should you use to initialize a variable?
5 Yes, in math, variables and constants are different concepts, but in Python, a constant is
Exercise 27
1. What is the difference between a datatype and a variable?
2. What datatype can a variable be?
Exercise 28
What is a good variable name for the following pieces of data? What type
of variable would Python probably initialize it to? Provide an example of how
Python would store that data. For example, if you came up with a variable
named numCookies that was an integer, then an example of the data might be
5.
1. The name of your favorite sports team
2. What home stadium your team plays at (or home city)
3. The number of wins your team got
4. Whether your team qualified for the championship playoffs or not
5. The total number of matches or games your team played
6. The height of your favorite player on that team in meters
7. The average weight of the athletes on that team in kilograms
8. The number of players on the team
9. Whether the team is sponsored or not
Exercise 29
Are the following variables syntactically correct for Python? Why or why
not?
1. pointsReceived
2. PointsReceived
3. 10DayPointsReceived
4. _pointsReceived
5. _PointsReceived
Exercise 30
1. Consider the variables pointsReceived and PointsReceived. Are these
variables interchangeable? Why or why not?
2. Consider the variables _pointsReceived and pointsReceived. Are
these variables interchangeable? Why or why not?
```
Python is cool!
```
Observe how we are not using any quotation marks inside of our print state-
ment. This is because we are not actually printing "stringToPrint," but rather,
we are printing the string literal that is stored as the value of the variable
stringToPrint.
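The difference between printing the name and printing the variable is easiest to see side by side. The sketch below uses the stringToPrint variable named in the text above.

```python
stringToPrint = "Python is cool!"

print("stringToPrint")   # with quotes: prints the literal text stringToPrint
print(stringToPrint)     # without quotes: prints Python is cool!
```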
Recall how we created a variable and filled it with something. The process
of creating the variable was called the declaration, and the "filling" was called
the initialization. We can also reinitialize a variable by simply assigning a new
value to it, which replaces whatever the variable held before. We don’t need to
redeclare the variable, since it already exists. In fact, Python has no separate
way to redeclare a variable, so assigning a new value to an existing name is
never an error. Consider the following code.
```python
string1 = "Python is cool!"
print(string1)
string1 = "Python is rad!"
print(string1)
```
```
Python is cool!
Python is rad!
```
In this code sample, we are printing the same variable string1 twice. However,
we also observe that the output changes. This is because we’re reinitializing
the variable in the third line using string1 = "Python is rad!". Because of
this, we will get a different result when we try to print the same variable.
The same can also be said for variables that have non-string contents. For
example, consider the following code.
```python
int1 = 5
print(str(int1))
int1 = 7
print(str(int1))
```
```
5
7
```
Since we’re changing the contents of the variable, the second print statement is
simply getting the updated variable value.6
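Reassignment isn’t limited to values of the same datatype. The sketch below reuses the cashValue name from earlier in this chapter and gives it a string. Python allows this, although switching a variable’s type partway through a program can make your code harder to follow.

```python
cashValue = 3.39
print(type(cashValue))   # <class 'float'>

# Reassigning is allowed, even with a value of a different datatype.
cashValue = "three dollars and thirty-nine cents"
print(type(cashValue))   # <class 'str'>
```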
Exercise Questions
These exercise questions cover chapter 4.2.2.
Exercise 31
1. Define a new variable name and initialize it to your name.
2. Now, print "Your name is: " and concatenate it with the name variable
in the print statement.
Exercise 32
1. Using your variable naming rules, create a new variable that would de-
scribe the name of a building on your college campus. Initialize it to the
name of a building on your college campus.
2. Create a new variable that describes the discipline of that building
(math, history, psychology, English, etc.) and initialize it to that value.
can be printed by the print() function. We’ll cover typecasting in chapter 4.6.
```python
print("Your name is ", name, ".", sep="")
```
When you run this code, your program will prompt you for your name. You
can then type directly into the console using your keyboard. When you press
Enter, the contents that you typed are put into the variable name as a string.
The last line then prints some string literals and your name.
When using input(), it doesn’t matter whether you input only letters or
only numbers. The input() function will always return the input as a string.
You can, however, typecast a variable as some other datatype. See chapter 4.6
on typecasting.
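To demonstrate that input() returns a string even when only digits are typed, the sketch below pretends to be the user. It temporarily replaces the keyboard (sys.stdin) with an io.StringIO object holding "42". This substitution is a testing trick used only for illustration; in a real program you would simply type your answer.

```python
import io
import sys

# Pretend the user typed 42 and pressed Enter.
realStdin = sys.stdin
sys.stdin = io.StringIO("42\n")
try:
    answer = input("How old are you? ")
finally:
    sys.stdin = realStdin            # always put the real keyboard back

print()                              # finish the prompt's line
print(type(answer))                  # <class 'str'>
print(answer + answer)               # 4242: two strings joined, not added
print(int(answer) + int(answer))     # 84: typecast first, then add
```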
The input() function can take one argument, a string to prompt the user.
You can either pass in a variable which is of type string or just use a string
literal. So, the following code is logically the same as the previous example.
```python
name = input("What is your name? ")
print("Your name is ", name, ".", sep="")
```
Exercise Questions
These exercise questions cover chapter 4.2.3.
Exercise 33
1. Is it possible to use the input() function standalone (that is, without
a variable and assignment operator)? Why or why not?
2. What happens when you try to use the input() function standalone? If
you receive an error, what kind of error is it?
Exercise 34
1. Create a new variable age and initialize it to the Nonetype. See chapter
4.2.1 for a review on the Nonetype.
2. Use the input() function to ask for the user’s age and put the result
into age.
3. Print the datatype of age. What datatype is it? Why is this so? (Hint:
to print the datatype of a variable, you can print the type(variableName).)
Exercise 35
4.3 Whitespace
Whitespace is a crucial component of every programming language, and using
whitespace correctly is very important. Different languages handle whitespace
differently, but you should know how to correctly format your code, regardless
of which language you’re writing code in.
```python
if (a > 0):
    print(a)
else:
  if (a < 10):
        print(a)
  else:
   print("a is negative or greater than 10")
```
This code is fairly sloppy. Sure, it’s readable, but keep in mind that this is only
a short snippet of code. If our code were hundreds of lines long, it’s easy to see
how small indentation errors could add up, making it hard to find where the
beginnings and ends of functions are, where your loops are controlled, among
other things.
Instead, we could fix our code as follows.
```python
if (a > 0):
  print(a)
else:
  if (a < 10):
    print(a)
  else:
    print("a is negative or greater than 10")
```
Observe how this code uses consistent whitespace patterns for all of its inden-
tation. One indentation is, in this case, always two spaces. Indentation doesn’t
always need to be two spaces. It could just as easily be four or eight spaces, or
a tab character.
Regardless of which whitespace rules you choose to follow and those you
choose to ignore, think about how your decisions will affect the readability of
the code.
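In Python, indentation is more than a matter of looks: the lines inside one block must line up with each other. The sketch below compiles two tiny programs stored as strings. It uses compile() only so the error can be caught and reported instead of crashing the program.

```python
consistent = "if True:\n  x = 1\n  y = 2\n"
inconsistent = "if True:\n    x = 1\n  y = 2\n"

compile(consistent, "<example>", "exec")   # fine: both lines use 2 spaces

try:
    compile(inconsistent, "<example>", "exec")
    errorType = None
except IndentationError as err:
    errorType = type(err).__name__

print(errorType)                           # IndentationError
```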
Whitespace is also vertical. You can generally put empty lines at logical
breaks in the code, such as between loops, functions, or if/else statements.
Consider the following code.
```python
def function1(arg1, arg2):
    return arg1 + arg2
def function2(arg1, arg2):
    return arg1 * arg2
def function3(arg1, arg2):
    return arg1 // arg2
```
You don’t need to understand exactly what this code does, but while it does
have good horizontal whitespace, we could improve its vertical whitespace by
adding extra empty lines. Compare the previous example to the following code.
```python
def function1(arg1, arg2):
    return arg1 + arg2

def function2(arg1, arg2):
    return arg1 * arg2

def function3(arg1, arg2):
    return arg1 // arg2
```
Breaking apart each of these logical blocks by adding extra empty lines doesn’t
affect the functionality of the software, but it does make it a lot easier to read.
It’s easier to see where the block starts and where it ends.
The comparison operators (==, !=, <, >, <=, and >=) should always be
surrounded by one space on either side. For example, use cats >= 10
instead of cats>=10.
Use the same indentation standard throughout your entire project. This is
non-negotiable. Which standard you use is up to you, but stay consistent.
You can use 2 spaces, 4 spaces, 8 spaces, tab, 2 tabs, or something else
that conveys indentation, as long as you’re consistent.
Avoid splitting individual lines into two different lines if it would disrupt
the flow of the software. Instead, turn on the soft wrap functionality in
your text editor.
Different computers and different text editors have different standards for how
they’ll treat spaces between brackets and braces and how they write tabs in,
so when you change computers, you should double-check what the tab settings
are for that computer, as you might need to change them.
Soft wrap is a really handy viewing mode that you can use in your text
editor, and we mentioned it above. But what is it? Soft wrap allows your
editor to wrap lines that are too long for one line into the next line without
affecting any functionality of your source code and without changing the line
numbers. When you write in a standard word processor like Microsoft Word or
Google Docs, you’ve actually already used soft wrap. Notice how as you type,
when you approach the end of the line, your word processor makes a new line
for you without you having to explicitly hit the carriage return key. The same
thing can be done in your text editor or IDE, and the option is typically found
under the View options as Soft wrap or Toggle Soft Wrap. In advanced text
editors, there are three options: soft wrap, hard wrap, and no wrap. Most text
editors only offer soft wrap and no wrap. The difference between a soft wrap
and a hard wrap is that soft wrapping only changes how a long line is displayed
(usually breaking at spaces, so words stay whole), whereas a hard wrap inserts
actual line breaks into your file. A hard wrap therefore changes your source
code, and in Python, a line break in the wrong place can easily break your
program.
You can play around with which whitespace rules you’d like to use and which
you will ignore. The important thing is that you try to stay organized and that
you are consistent throughout your entire source code file or project.
Exercise Questions
These exercise questions cover chapters 4.3.1 and 4.3.2.
Exercise 36
1. What is whitespace?
2. What directions does whitespace apply: horizontal, vertical, or horizon-
tal and vertical?
Exercise 37
Open your text editor or IDE. Use it to answer the following questions. You
may also need to open a source code file.
1. What does the soft-wrap option do? Where is it?
2. Where is the option to change the tab setting, if you have an option to
do so? What is it currently set to: 2 spaces, 4 spaces, 8 spaces, 1 tab, 2
tabs, or something else?
3. Do all text editors or IDEs use the same default indentation or wrap
settings?
Exercise 38
1. What should surround the assignment operator to achieve good whitespace?
2. What should surround a comparison operator for good whitespace?
3. Should you put an extra space in between two characters in a comparison
operator (!= versus ! =)?
4. Is it okay to change the indentation standard or whitespace pattern in
the middle of a project? When is it okay to, if ever?
5. If you had a really long line that was wider than your text editor window,
what should you do?
Exercise 39
1. Teamwork is an important part of programming. Say you’re working on
a team with several people. One of your team members doesn’t put a
space between functions and arguments, while another does. What can
you do to rectify this situation? Should you just ignore it and let people
do what they want if it will make good code? Why or why not?
2. Sometimes, you’ll need to use someone else’s computer, including their
text editor in the way that it was set up. What are some things you
might want to check before starting to work on your project on that
computer?
Exercise 40
Fix the following lines of code by correcting the whitespace errors. What
did you fix, and why did you choose to make that fix? If there are no errors,
say so.
1.
1 if ( var ! = 3) :
2.
1 print ( " x is the smallest " )
3.
1 def addFour ( inputVal ) :
4.
1 def addFour ( inputVal ) :
2 # Do something
3 def addFive ( inputVal ) :
4 # Do something
4.4 Comments
Comments are a critical part of any program, even if they don’t contribute to
the actual functionality of your code. Just like how whitespace is so important,
your comments are the key for you being able to understand your code. Talk
to any experienced programmer, and they’ll have stories of how they spent all
night working on a chunk of code, then forgot what it meant or how it worked
the very next day and had to rewrite the entire chunk of code.
While you don’t need to comment every single print statement, commenting
with enough detail that others can understand your code is a good habit to get
into.
```
Hello, Python!
```
You can use line comments to describe a line of code as shown above. They’re
also handy if you’re testing out different lines of code and you want to only
evaluate certain lines. You can also stack pound signs. Consider the following
chunk of code.
Instead of having to delete lines of code that might or might not work, you can
just comment them out until you’re absolutely sure that you don’t need them.
They won’t evaluate or execute until the pound sign before them is removed.
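The sketch below gathers the line-comment habits described above in one place: a comment on its own line, a comment after code, a line of code that has been commented out, and stacked pound signs. The variable name is made up for illustration.

```python
# A line comment can take up its own line.
greeting = "Hello, Python!"   # A comment can also follow code on the same line.

# The line below is "commented out," so Python skips it entirely.
# greeting = "Goodbye, Python!"

## Stacked pound signs work too; everything after the first # is ignored.
print(greeting)               # Hello, Python!
```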
Note that none of the commented lines have pound signs at the beginning of
them. Since they’re enclosed by the quotation marks, none of these lines will
execute.
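The following code is valid Python, but it makes a poor block comment: the
quotation marks share lines with the comment text, which makes the start and
end of the comment easy to miss. Put the quotation marks on their own lines
instead.
```python
"""This is a hard-to-read block comment.
Its quotation marks are not on their own lines."""
```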
Block comments are a powerful tool, and they can be used to test parts of your
code without having to write pound signs before every single line. For example,
consider the following code.7
1 print ( " This code will execute . " )
7 It’s worth noting, however, that some IDEs support the ability to comment multiple lines
at once through a keyboard shortcut or menu option. When you choose an option to comment
multiple lines in the menu, the IDE will almost never make a block comment, instead inserting
pound signs before every line. The same goes for uncommenting those lines - the IDE will
remove the pound signs before the line. However, not all IDEs support this behavior. When
you are sharing code with others who might not be using the same IDE as you, you should be
very careful to use this feature, since your compatriot might have to manually remove those
same pound signs. This means that they could end up having to manually remove hundreds
of pound signs if you had mass-commented out hundreds of lines using your IDEs multiline
comment feature.
61 4.4. COMMENTS
2 """
3 print (" But this code won ’t execute . ")
4 print (" That ’s good , because there ’s a lot of these
lines . ")
5 # print (" Yay ! ")
6 """
7 print ( " So will this line of code . " )
Also note that some IDEs and text editors might color your block comments
differently from your line comments. This is nothing to be worried about.8
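As footnote 8 explains, a block comment is really a multiline string that isn’t stored anywhere. The sketch below shows a triple-quoted string keeping its line breaks as \n characters. It also shows an unstored string acting as a block comment. The poem text is made up for illustration.

```python
# A triple-quoted string can span several lines; the line breaks
# become part of the string as "\n" characters.
poem = """Roses are red,
Violets are blue."""

print(poem == "Roses are red,\nViolets are blue.")   # True

# A block comment is just such a string that isn't stored anywhere,
# so Python evaluates it and then immediately discards it.
"""
print("This line is inside a string, so it never runs.")
"""
```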
Exercise Questions
These exercise questions cover chapters 4.4.1 and 4.4.2.
Exercise 41
1. What are some reasons to write comments?
2. How can you mark a line comment in Python?
3. How can you mark a block comment in Python?
Exercise 42
1. Are the marks for line comments the same as in Perl? If not, what is
the symbol or symbol combination?
2. Are the marks for line comments the same as in C++ or C#? If not,
what is the symbol or symbol combination?
3. Are the marks for block comments the same as in Swift? If not, what is
the symbol or symbol combination?
4. Are the marks for block comments the same as in HTML? If not, what
is the symbol or symbol combination?
Exercise 43
1. Where can a line comment be placed? Is it possible to end a line com-
ment?
2. Where can a block comment be placed? How must the quotation marks
be written out, or can they just be written anywhere to begin the com-
ment?
8 The reason for this is that a block comment in Python is actually a multiline string.
Suppose you had a really big string that included lots of line breaks (new lines, carriage
returns). Writing your string as a multiline string allows you to skip the \n and \r marks.
As it turns out, multiline strings also make excellent block comments, and this has become
the de facto way of making a block comment in Python.
Exercise 44
Using what you know about Python, write a comment for the following
blocks of code. If you want to put multiple comments in, specify in between
which line numbers you would put those comments (e.g., before line 1, between
lines 3 and 4). If you’re not sure what the code does, make your best guess
(imagine that it’s your code that you need to explain to someone else), but be
descriptive.
1.
1 result = input("What is your grade? ")
2.
1 print("The result is: ", end="")
2 print(str(result * 4))
3.
1 tax = input("What was your latest in taxes? ")
2 deductible = input("What was your latest insurance deductible rate? ")
3 print("Here's some information we calculated:")
4.5 Errors
Errors are an inevitable part of programming, but it’s a critical skill to under-
stand how to read errors and what kind of errors you might get. Being able to
understand the error messages that you get means that you’ll be able to resolve
them much faster.
This is an example of a syntax error. If you tried to write this exact code in
your IDE and run it, you’d get an error on line 1. The interpreter sees an
opening quotation mark, but it can’t find a matching closing quotation mark,
so it throws an error instead. In this case, it’s clear that there’s an issue, since
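our interpreter throws the error:
SyntaxError: EOL while scanning string literal
(Python 3.10 and later describe the same problem as "SyntaxError: unterminated
string literal" and add the line number where it was detected.)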
Our IDE also gives us hints with the coloring of the text. In this text, strings
are colored red, so the fact that we see red coloring on line 2 means that we
never closed the string on line 1. There are several default error types, of which
SyntaxError is one. Some of the other types of errors are as follows.
AssertionError: raised when an assert statement fails
AttributeError: raised when an attribute reference or attribute assignment
fails; if the object doesn’t support attribute references or assignments at all,
Python will raise a TypeError instead
EOFError: raised when the input() function hits the end-of-file condition
without reading any data
OSError: raised when the host operating system (like Windows, MacOS,
or Linux) encounters a system-related error in response to a Python re-
quest, including I/O failures like ”file not found” or ”disk full” errors
There are plenty of other errors that you could encounter, including ones that
are not defined by Python itself.
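Deliberately triggering a few of these errors is a good way to get used to them. The sketch below uses a hypothetical helper, errorName, that runs a snippet of code stored in a string with the built-in exec() and reports the name of the error it raised. Note that a missing file raises FileNotFoundError, which is one specific kind of OSError. The file name is made up and assumed not to exist.

```python
def errorName(code):
    """Hypothetical helper: run code and return the name of the error it raises."""
    try:
        exec(code)
    except Exception as err:          # SyntaxError is also a subclass of Exception
        return type(err).__name__
    return None                       # the code ran without an error

print(errorName("assert 1 + 1 == 3"))                          # AssertionError
print(errorName("(5).color"))                                  # AttributeError
print(errorName('print("Hello)'))                              # SyntaxError
print(errorName("open('this_file_does_not_exist.txt')"))       # FileNotFoundError
print(errorName("x = 1"))                                      # None
```

One caveat: if Python is started with the -O option, assert statements are skipped, so the first line would print None instead.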
Now, consider the following code. Can you spot the error? (Hint: it’s in line 2.)
We’ve said it before, and we’ll say it again: never just ignore
a warning. They could be giving you some very valuable infor-
mation (including potential syntax or logical errors)!
Warning messages can occur for all sorts of reasons, including out-of-date
packages, missing functions that aren’t called, possible indentation errors, or
many other things.
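Warnings differ from errors in one important way: the program keeps running after a warning is issued. The sketch below uses the standard library's warnings module to issue a made-up DeprecationWarning (the message text is hypothetical) and record it, so we can inspect it instead of letting it scroll past. The with statement simply tells Python to collect warnings only inside that indented block.

```python
import warnings

# catch_warnings(record=True) collects warnings in a list instead of printing them
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # report every warning, even repeats
    warnings.warn("this option is out of date", DeprecationWarning)
    result = 2 + 2  # execution continues after the warning

print(len(caught), caught[0].category.__name__, caught[0].message)
print(result)
```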
Exercise Questions
These exercise questions cover chapter 4.5.1.
Exercise 45
1. Name three different types of errors that you might receive in Python.
2. For each of the three errors that you encountered in part (1), write lines
of code that will result in the error.
3. For each of the three errors that you created in part (2), give the error
code that you received and what the error code tells you.
Exercise 46
Examine the following error message.
File "<stdin>", line 1
    input("What is your name')
                             ^
SyntaxError: EOL while scanning string literal
4.5.2 Debugging
There are many, many types of errors that you might experience while program-
ming, and being able to understand what those errors are is key to remedying
them. Error messages contain useful information, and you shouldn’t dismiss
their importance.
Error messages typically contain several pieces of information, including the file that the error is occurring in, the line number of the error, the line itself, where in the line the error is occurring, and an error code. Let's take a look at a few errors to see what this looks like in practice.
Consider the following code.
1 # This is some code that won't run quite right
2 # Line 3 is a print statement
3 print('What\'s your name?")
In the first line of the error, the interpreter is offering some important information. Firstly, it's telling us that this error is occurring in the "main.py" file. This is especially important if you're working with multiple files for just one program, which can happen quite easily if you're working with classes. The interpreter is telling us that on line 3, there's an error. Next, the interpreter is copying the offending line so that we can get a glimpse of what might be causing the error. On the next line, the interpreter is trying to point out where in the line it thinks the error is coming from. Depending on the type of error, the interpreter might be close or right on the issue; other times, it'll put the caret (^) at the end of the line if it can't figure it out. Lastly, the interpreter will give you the error code.
These error codes can be quite difficult to decipher, but there are two parts to the error: the classification of the error and the actual error code. In this case, the error class is SyntaxError and the error code is EOL while scanning string literal.
Let's look at another error, this time one that isn't a syntax error.
1 name = input('What is your name?\n')
2 print('Hi,', userName, sep=' ')
Let’s break down this output. The first line of the output is working as we
expect it to. The input() is printing a prompt for our user, which is showing
up. The second line of the output is what our user entered. However, on the
third line, we see a Traceback, with the most recent call listed last. If you had
a chain of errors, which typically occurs when you’re using functions, classes,
or modules, the root of these errors would show up so that you can trace back
the issue. Next, we see the filename (main.py) and the line number that the
error is occurring in. Next, the interpreter is giving us the offending line of
code, but it doesn’t have a clue where the error is, so it doesn’t even try to
point out where it thinks the error is. Lastly, it gave us the error class and
the error code. The error class is NameError, which occurs when the code uses a variable or object name that hasn't been defined. Sure enough, the error code indicates that 'userName' is not defined. This code's author accidentally wrote
the wrong variable name in the print statement. If we replace userName with
name, the code runs properly.
In the process of programming, you'll probably run into new errors that you haven't seen before, such as TypeError, IndexError, and ImportError.
You may also get something called a traceback, especially if your error has been caused by a chain of events (like a function calling another function). Don't worry about understanding how these chains of calls are formed (this involves functions, which we'll cover in Chapter 7). As its name suggests, a traceback provides breadcrumbs for you to find the origin of the error. Python prints the breadcrumbs with the most recent call last, so it's easiest to read a traceback from the bottom up. The last breadcrumb falls on the line of code that caused the execution error. The breadcrumb above it gives you the line of code that called the code in the last breadcrumb. The one above that gives you the line that made that call, and so on. Python could just give you the line of code where the error happened, but that line alone often doesn't reveal the deeper problem that will help you diagnose it. The actual mistake may be anywhere between the first and last breadcrumb, but as you get better at reading a traceback, you'll also get better at pinpointing exactly where an error occurred.9
9 Read this footnote after you’ve read Chapter 7.1 on functions, since we’ll give you some
more information on tracebacks that rely on knowledge from that chapter. Tracebacks are
commonly found when you have a function that called another function. For example, let’s
say you had a function called double() that was responsible for doubling an integer or a float.
Then, later in the code, you called double("spot"). Obviously, "spot" is neither an integer nor a float, so if double() multiplies its argument by 2.0, this will result in an error. (If double() multiplies by the integer 2 instead, Python won't complain at all: "spot" * 2 is simply the string "spotspot".) It's not a syntax error, though: the act of calling double("spot") is syntactically valid; the act of multiplying "spot" by 2.0 is what's invalid. If Python only gave us the line of code that failed, we'd only see the multiplication inside double(), but not which call handed it "spot" or why. However, the traceback would give us the following information: the line double("spot") failed in the function double() because the double() function encountered a TypeError. Along with this information, the traceback will also give us line numbers for each line of code involved and which file each one is in.
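The footnote's scenario can be sketched in code, assuming double() multiplies its argument by the float 2.0. The report() function is a hypothetical helper that exists only so we can capture the traceback text with the standard library's traceback.format_exc() instead of letting the script stop.

```python
import traceback

def double(value):
    """Return twice value; assumes value is an int or a float."""
    # Multiplying a str by the float 2.0 raises a TypeError.
    return value * 2.0

def report():
    """Call double() with a bad argument and return the traceback text."""
    try:
        double("spot")
    except TypeError:
        # format_exc() returns the same text Python would have printed.
        return traceback.format_exc()

tb_text = report()
print(tb_text)
```

Reading the printed text from the bottom up, the last line names the TypeError, the frame above it points into double(), and the frame above that shows the call to double("spot") inside report().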
Exercise Questions
These exercise questions cover chapters 4.5.1 and 4.5.2.
Exercise 47
1. Name three different types of errors that you might receive in Python.
2. For each of the three errors that you encountered in part (1), write lines
of code that will result in the error.
3. For each of the three errors that you created in part (2), give the error
code that you received and what the error code tells you.
Exercise 48
Examine the following error message.
File "<stdin>", line 1
    input("What is your name')
                             ^
SyntaxError: EOL while scanning string literal
Based on what we've already seen, this will result in a NameError. However, we can deal with this error by using a try...except block.
try:
    print(x)
except:
    print("Something went wrong.")
If we had something outside of the exception block, the program would continue
to run. For example, consider the following code.
try:
    print(x)
except:
    print("Something went wrong.")
print("All done!")
Since print("All done!") is outside of the except block because of its indentation, it runs once the try...except statement is finished, whether the try block succeeded or the except block handled an error.
We can also handle different exceptions differently, depending on the type of error that we get. For example, consider the following block of code.
try:
    print(x)
except NameError:
    print("X isn't defined!")
except:
    print("Something else went wrong!")
X isn't defined!
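Multiple except blocks are especially handy when more than one thing can go wrong. In this sketch, the text variable is a hypothetical stand-in for something a user typed: int() can fail with a ValueError, and the division can fail with a ZeroDivisionError, so each gets its own message.

```python
text = "0"  # hypothetical stand-in for input(); try "abc" or "4" too

try:
    result = 10 / int(text)   # int() can fail, and so can the division
    message = "The division worked!"
except ValueError:
    message = "That isn't a whole number!"
except ZeroDivisionError:
    message = "You can't divide by zero!"

print(message)
```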
You can also throw (or raise) your own exceptions. To do this, you use the raise keyword. You can specify what kind of error to raise and what its error code (message) should be, or you can make a general exception by using the Exception error class.
Consider the following code.
x = "hello"
if type(x) is not int:
    raise Exception("Only integers allowed")
You don’t have to understand exactly what this code does, but do pay attention
to the exception. This code tests if the variable x is an integer. If it isn’t, then
it’ll throw a generic exception. We can also throw a TypeError, since we could
generally classify this as a datatype error. Consider the following code.
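Here is a sketch of the TypeError version. Raising a TypeError looks just like raising a general Exception; only the class name changes. The surrounding try...except is an addition for this sketch, so that it runs to the end and reports which error was raised instead of stopping.

```python
x = "hello"

try:
    if type(x) is not int:
        # TypeError is more specific than the general Exception class
        raise TypeError("Only integers allowed")
    status = "x is an integer"
except TypeError:
    status = "x was rejected with a TypeError"

print(status)
```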
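There is one final block that you can include in a try-except block in Python: finally. The finally block runs regardless of whether the code in the try block successfully executed or not, and even if an error occurred that none of the except blocks handled. Code placed after the whole try...except statement doesn't have that guarantee: if an error goes unhandled, that code never runs. Many developers skip the finally block, since in simple scripts the code after the except block usually runs anyway. However, finally is still useful whenever something must happen no matter what, such as closing a file or printing a final status message.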
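The difference between finally and code placed after the whole statement shows up when an error is not handled. In this sketch, which uses a list (Chapter 5) to record what ran, the inner except only handles ZeroDivisionError, so the ValueError slips past it. The finally block still runs, but the line after the inner statement is skipped until an outer handler catches the error.

```python
steps = []  # records which parts of the code actually ran

try:
    try:
        steps.append("start")
        int("not a number")          # raises ValueError
    except ZeroDivisionError:        # doesn't match a ValueError...
        steps.append("never runs")
    finally:
        steps.append("finally")      # ...but finally runs anyway
    steps.append("after")            # skipped: the error is still unhandled
except ValueError:
    steps.append("outer handler")    # the outer try...except catches it here

print(steps)
```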
This is especially useful if you plan on raising a lot of errors and you'd like them to be at least somewhat organized.
The careful and controlled use of exception handling is generally classified under graceful failure. Graceful failure is the ability of a system to fail without crashing, or even to recover on its own. The mode of recovery might include restarting, giving a warning, or bypassing that portion of the script.
Why is graceful failure important?
The tail command prints the last ten lines of a file, and the -f ("follow") flag keeps watching the file so you can see, in real time, when the system writes to it. The /var/log directory holds most of the logs on most Linux systems, and on Debian-based systems such as Ubuntu, the syslog file holds almost all system-related messages (except for authentication messages).
Exercise Questions
These exercise questions cover chapter 4.5.3.
Exercise 49
1. What are the keywords used for a try-catch block in Python?
Exercise 50
1. If an exception is thrown, where can one get the exception message from?
That is, let’s say we wanted to handle the exception gracefully, but we
still wanted to know what the error was before we continue. Where is
the message located, and how can we view it? Provide either theory or
a working example.
Exercise 51
1. What is an exception error class?
2. Provide five examples of possible exception error classes.
3. Is it possible to make your own exception error class?
4.6 Typecasting
Typecasting is an important part of dealing with data. Sure, we’ve talked about
typecasting in previous chapters, but what actually is it, and how can we use
it?
Recall the material that you learned in Chapter 3, on the basic datatypes
in Python. We know how to use Booleans, integers, floats, and strings, but it’d
also be really handy to be able to convert between different datatypes. This
is the essence of typecasting. Some of the functionality in Python and some
functions can only accept certain types of data, so typecasting allows us to have
the exact type of data needed to execute these operations.
Let’s look at one of the most common functions that you’ll use in Python:
the input() function. We know that input() can take one argument, but what
does it spit out?
A little bit of sleuthing in the Python documentation reveals that input()
always returns a string. Consider the following code. Assume that the variable
price has not been declared or initialized already.
price = input("Input the total price of your groceries as a decimal: ")
print(price)
What datatype would price be? No matter what kind of data we try to input in our program, price will always be a string, since input() always returns a string, and that string is what gets assigned to price.
This would be great if we could always work with strings, but this just isn't the case. For example, arithmetic operations, which we'll cover in chapter 4.9, require us to use an integer or a float, not a string.
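Here is a quick way to see this for yourself. The string "45.97" below is a hypothetical stand-in for what input() would hand back if a user typed 45.97. Multiplying it by 0.08 raises a TypeError, which this sketch catches (chapter 4.5.3) so that it can keep going.

```python
price = "45.97"     # what input() returns when the user types 45.97
print(type(price))  # <class 'str'>

try:
    tax = price * 0.08   # a str can't be multiplied by a float
except TypeError:
    tax = None           # remember that the arithmetic failed

print(tax)
```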
As you can probably guess, this is where typecasting comes in. We can try
to typecast our input into the datatypes that we need so that we can perform
our basic arithmetic operations. Consider the following code. Assume that the
variable price has not been declared or initialized already. It’s okay if you
don’t understand what’s going on in the third line yet.
1 price = input("Input the total price of your groceries as a decimal: ")
2 price = float(price)
3 tax = price * 0.08
4 print("You need an extra $", tax, " for tax.", sep="")
We can see that we're using a previously declared and initialized variable price and we're reassigning the variable's value to the typecasted value of the original variable.
In line 1, price is assigned a string, since that's what the input() function returns.
Then, in line 2, we typecast that string into a float and replaced the value
of the original string with the new float. In line 3, we did some arithmetic on
our float and put the result into the variable tax. We don’t need to typecast
tax, since the value of price * 0.08 is already a float. Finally, in line 4, we’re
printing a string literal using string concatenation.
We can also typecast variables to different datatypes. The functions are as
described here.
In the above table, the argument can be any type of variable: a Boolean, integer, float, or string. However, it's worth noting that while you could pass any of these datatypes into your typecasting function, not every value can actually be converted. For example, consider the following code.
cat = input("Input your favorite type of cat: ")
cat = int(cat)
print(cat)
This would almost certainly not work, since we can't typecast "tabby" or "Jellicle" to an integer. In this case, you'll end up with a ValueError letting you know that your typecast attempt is invalid.
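The sketch below tries each of the four basic typecasting functions, int(), float(), str(), and bool(), on made-up values, then shows two conversions that fail with a ValueError. The second failure surprises many beginners: int() won't convert a string that contains a decimal point.

```python
as_int = int("42")       # str -> int: 42
as_float = float("2.5")  # str -> float: 2.5
as_str = str(7)          # int -> str: "7"
as_bool = bool("")       # str -> bool: an empty string becomes False

failures = 0
try:
    int("tabby")         # not a number at all
except ValueError:
    failures = failures + 1
try:
    int("2.5")           # int() won't parse a decimal string; use float() first
except ValueError:
    failures = failures + 1

print(as_int, as_float, as_str, as_bool, failures)
```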
You can handle these errors in all manner of different ways, such as those
described in chapter 4.5.3. Using this, you can let your user know that they
need to input their data according to a certain format.
Let’s say that you were writing a program that attempted to calculate the
amount of tax that someone would pay on their groceries. You know that
you’ll need to typecast into a float, but your user might input something that
isn’t typecastable, like $45.97 instead of just 45.97. So, you could write the
following block instead to catch the exception and prompt the user again.
Again, assume that price and tax are undeclared and uninitialized.
price = 0.0
while price == 0.0:
    try:
        price = float(input("Input the total price of your groceries as a decimal: "))
    except ValueError:
        print("Please enter only a number, like 45.97.")
We'll cover while loops in chapter 6.3, but the gist of the code is this: Python
will declare a new variable price to be 0.0. As long as the price is 0.0, Python
will assume that the user hasn’t inputted a valid price and will keep prompting
the user until they enter a valid price that can be typecast into a float. Then,
it can carry on with the rest of the program.
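Because input() waits for a person to type something, the loop is hard to try out automatically. This sketch swaps input() for a hypothetical list of pretend answers (lists are covered in Chapter 5), so you can watch the loop reject two invalid prices before accepting a valid one.

```python
answers = ["$45.97", "forty", "45.97"]  # hypothetical user responses
attempts = 0
price = 0.0

while price == 0.0:
    text = answers[attempts]  # stands in for input(...)
    attempts = attempts + 1
    try:
        price = float(text)
    except ValueError:
        print("Not a valid price, please try again:", text)

print("Accepted", price, "after", attempts, "attempts")
```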
Exercise Questions
These exercise questions cover chapter 4.6.
Exercise 52
1. What is typecasting?
2. What are the basic datatypes in Python that can be typecast?
3. Why might one want to typecast?
Exercise 53
1. What is the syntax to typecast an integer into a string?
2. What is the syntax to typecast a string into an integer?
3. What is the syntax to typecast an integer into a float?
Exercise 54
Write the applicable code to typecast the following variables into a string.
If typecasting cannot be performed or is not applicable to the variable, explain
why.
1. v = 3
2. cats = "Tabby"
3. isBlue = True
4. r2 = 3.2209
Exercise 55
Write the applicable code to typecast the following variables into an integer.
If typecasting cannot be performed or is not applicable to the variable, explain
why.
1. v = 3
2. cats = 5.0
3. n = "three"
4. k = 9.003
Exercise 56
1. Provide an example for why you would want to typecast a string into an
integer.
2. Provide an example for why you would want to typecast an integer into
a string.
4.7 F-strings
Beginning in Python 3.6, Python included something called an f-string.10 F-strings allow you to combine text and the values of variables into one string without explicitly concatenating them. Essentially, f-strings create areas with placeholder objects without having to close your string quotes, typecast, or add a concatenation operator.
The fundamental parts of an f-string are the literal and the variable. The literal portion of the f-string is printed just as any other string is. As a review, this is a string literal printed.
print("The cat is orange.")
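The same sentence can also be built from a variable, using the multi-argument print() from earlier chapters. The variable name color is our choice for this sketch; it matches the f-string version of this example.

```python
color = "orange"
# A literal, a variable, and another literal; sep="" stops print()
# from putting a space between each piece.
print("The cat is ", color, ".", sep="")

# The same sentence built with + concatenation instead:
sentence = "The cat is " + color + "."
print(sentence)
```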
This should look familiar. We still have a string literal portion ("The cat is "), a variable ("orange"), and another string literal ("."). However, we could use an f-string to avoid having to close our quotes out at all. To do this, we can simply enclose our variable inside of curly braces {}. The curly braces tell Python that their contents are actually a variable (or, more generally, an expression to evaluate), and we want Python to use the value of that variable instead of the literal text.
The above example, as an f-string, looks like this.
color = "orange"
print(f'The cat is {color}.')
We’re not actually concatenating here, so we don’t need the sep argument
to dictate what the separator value should be between concatenated objects.
Instead, we can build our separation right into our f-string.
In the above f-string, take note of the following. The f-string is prefixed with an f. This tells Python that we want to use an f-string, rather than a traditional string. Next, there is no space between the f and the opening quote '. Putting a space between the f and the opening quote ' is not syntactically correct. Finally, our variable color is only enclosed in curly braces inside of our f-string.
10 The ability to use f-strings is standard in Python 3.6 and later, but some interpreters don't respect f-strings, instead printing them literally or even erroring out because of a non-escaped special character (a curly brace). Your instructor should be able to tell you whether the interpreter that you are using supports f-strings to avoid immense frustration!
If we had a non-string value in a variable, like an integer or float, we cannot just concatenate that value onto a string with +. As a review, we end up with a TypeError.
print("The value is " + 1)
TypeError: can only concatenate str (not "int") to str
To correct this, we must typecast our integer, then print the typecasted value.
print("The value is " + str(1))
The value is 1
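For comparison, here is the f-string version, assuming the integer is stored in a variable named value, as in the paragraph that follows.

```python
value = 1
message = f"The value is {value}"  # no str() needed inside the braces
print(message)
print(type(value))  # value itself is still an int
```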
Observe how we never typecasted value. The type of value is still an integer
when we print it in our f-string. However, it is implicitly typecast to a string
before being printed, meaning that we never need to explicitly typecast the
value.
F-strings are a great way to increase the readability of your code. They
decrease the amount of extra symbols, since we don’t need to explicitly con-
catenate, typecast, or close our string to still call a variable inside of a string.
Exercise Questions
These exercise questions cover chapter 4.7.
Exercise 57
1. What is an f-string, and how is an f-string different from a standard string?
Exercise 58
1. In a new Python script, write an input() statement that asks the user for their name and puts the return into a new variable called name.
2. What is the datatype of name?
3. Print "Your name is " and the value of name using a standard string with concatenation (not using an f-string).
4. Print "Your name is " and the value of name using an f-string.
Exercise 59
1. In a new Python script, create a variable named value and initialize the
variable to 10.60.
2. Print "The value is " and the value of value using a standard string with concatenation (not using an f-string).
3. Print "The value is " and the value of value using an f-string.
Exercise 60
1. In a new Python script, create two variables named weight and height.
Set weight to be a float and height to be an integer.
2. Initialize the value of weight to be a weight in kilograms.
3. Initialize the value of height to be a height in centimeters.
4. Print "The weight is ", the value of weight, ", and the height is ", and the value of height. Use a standard string with concatenation (not using an f-string).
5. Print "The weight is ", the value of weight, ", and the height is ", and the value of height. This time, use an f-string.
Exercise 61
1. When using string concatenation in a print() statement, what is the
default separator character?
2. When using an f-string, is there a default separator character from the string literal to the variable? Provide an example to prove your point.
4.8 Statements and Expressions Review
We can also use a function on the right side of an assignment operator, as long as that function produces (returns) something. For example, consider the following code.
cats = input("Input your favorite cat: ")
However, what if the function doesn't produce anything useful? Python won't raise an error. Every Python function returns something, and functions like print() that have nothing useful to give back return the special value None, whose datatype is NoneType. For example, consider the following code.
cats = print("tabby cat")
This code runs without an error: it prints tabby cat, but cats ends up holding None rather than the string "tabby cat", since print() displays its arguments instead of returning them. Nothing crashes, but this is a common source of logical errors.
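You can confirm this yourself by printing cats after the assignment. This is a small sketch that only assumes an ordinary Python 3 interpreter.

```python
cats = print("tabby cat")  # displays: tabby cat
print(cats)                # displays: None
print(type(cats))          # displays: <class 'NoneType'>
```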
We've also seen standalone statements, such as a call to the print() function on its own line. Statements like these don't use an assignment operator (=); they're run for what they do, like displaying text, rather than for a value. Some statements, such as import, aren't expressions at all, so putting one on the right side of an assignment operator (as in x = import math) is a syntax error.
Lastly, we've seen the block format, where certain code routines are run as part of another block of code. Most recently, we've seen this used in try...except blocks. To denote that a code routine should run as part of another line of code, like try or except, we indent it. The exact style of indentation isn't terribly important to Python, so you could use two spaces, four spaces, a tab, or two tabs (although PEP 8, Python's official style guide, recommends four spaces). According to our whitespace rules, what matters is being consistent with whichever style we choose to use.
For example, this chunk of code will run a try...except. The underscores represent spaces.11
# DO NOT RUN
try:
__price = input("Input a price: ")
__price = float(price)
except:
__print("That's not a valid price.")
11 This code won't actually run as written: Python treats underscores as part of a name (__price is a perfectly valid variable name), not as indentation, so it would report an IndentationError. Names that begin with two underscores also have a special meaning inside classes (Chapter 7), so avoid using more than one leading underscore unless you have a very good reason to do so.
This chunk of code will also run the same code, even though it uses more spaces.
The important thing is that we’re consistent with how many spaces we choose
to use.
# DO NOT RUN
try:
____price = input("Input a price: ")
____price = float(price)
except:
____print("That's not a valid price.")
Again, assume that the underscores represent spaces. This application of in-
dentation will become more apparent and necessary in later chapters, such as
when we move onto loops and control structures.
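Here is a runnable version with real spaces in place of the underscores. To keep it from waiting on a user, the hypothetical text variable stands in for input("Input a price: "), and the except block is narrowed to ValueError, the error float() raises for text it can't convert.

```python
text = "12.50"  # hypothetical stand-in for input("Input a price: ")

try:
    price = float(text)  # four spaces: this line belongs to the try block
except ValueError:
    price = None         # four spaces: these lines belong to the except block
    print("That's not a valid price.")

print(price)  # no indentation: runs after the try...except is finished
```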
4.9 Arithmetic Operations
Operation Description
+ Addition
- Subtraction
* Multiplication
** Exponentiation
/ Division
// Integer (floor) division
% Modulo
x=2+4
This is a mathematically correct equation. We’re taking some value on the
right side and putting it into the variable x. Now, consider the following in a
mathematical context:
x=x+2
This doesn’t make much sense, and in fact, it is wrong mathematically. There
is no case where x and x + 2 can be the same. However, this is perfectly okay
in Python. Consider the following in a programming context:
x = 5
x = x + 2
In this case, we're initializing the value of the variable x to be 5, then we're adding 2 to the value of x and storing the result, 7, back in x. Even though this is not mathematically correct, this is correct in programming. However, it's worth noting that the following is not syntactically correct:
x = 5
x + 2 = x  # WRONG
As we saw in chapter 4.2, our variable needs to be the only thing on the left side. Again, this might be mathematically correct, but it's incorrect in Python.
2 + 4 = x  # WRONG
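Addition itself uses +, with one value on each side, and several additions can be chained in a single expression. Here is a short sketch with three numbers.

```python
x = 2 + 3 + 4  # evaluated left to right: (2 + 3) + 4
print(x)
```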
In this case, Python will evaluate your expression from left to right. So, in the
above example, the interpreter would add 2 + 3, then add 4 to the result.
Likewise, we can use - to subtract. You need to put one thing to the left and one thing to the right of the subtraction operator, just like in addition. If your second number is larger than your first number, the result will be stored as a negative number.
y = 5 - 7
print(y)
-2
The * operator multiplies the first number by the second number. Python respects the signs of numbers, just as you were taught in math class.
z = -3 * -5
print(z)
15
How can Python know the difference between a negative number and the subtraction operator? For one, the subtraction operator must have a value on both its left and its right, whereas the minus sign of a negative number only has a value on its right (no number on its left). Secondly, by convention, negative numbers are written with no space between the minus sign and the number itself, as shown. Python would technically accept - 5, but -5 is common practice and is much easier to read.
The last of our common operators is the division operator, or /. The division operator takes two numbers, one before and one after. In Python 3, the result of / is always stored as a floating point number, even when the division comes out to a whole number.
a = 15 / 5
print(a)
b = 16 / 5
print(b)
3.0
3.2
That covers the last of the four common operators. However, there are some
operators that are very useful for programmers. One of these operators is the
exponentiation operator. This operator is written as two asterisks with no
space in between in Python: **.12 Putting a space in between, as in 2 * * 4, causes a SyntaxError. The number that comes before the operator will be treated
as the base and the number that comes after will be treated as the exponent.
So, take a look at the following code.
c = 2 ** 4
print(c)
16
$c = 2^4$
^ Versus **
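In Python, the caret ^ is not an exponent operator. It's the bitwise XOR operator, which works on the binary digits of integers, so typing ^ when you mean ** gives a wrong answer rather than an error message.

```python
power = 2 ** 4  # 2 to the 4th power
xor = 2 ^ 4     # bitwise XOR of binary 010 and 100, which is 110
print(power, xor)
```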
12 Not all languages use two asterisks. Other languages use a caret or require a separate
Another operator that's very useful is the integer division operator (also called floor division). Similar to the exponentiation operator, the integer division operator is written as two forward-slashes with no space in between in Python: //. Integer division is similar to regular division, but the result is rounded down to the nearest whole number. For positive numbers, that simply means everything after the decimal point, including the decimal point, is dropped.
b = 16 / 5
print(b)
d = 16 // 5
print(d)
3.2
3
When both numbers are integers, integer division returns an integer, unlike regular division, which always returns a float. If either number is a float, the result is still a whole number, but it's stored as a float: 16.0 // 5 gives 3.0.
As the complement of integer division, the last operator is the modulo. The modulo operator is represented with a percent sign %. While integer division returns the whole number portion of the division, the modulo returns the remainder: what's left over after dividing. For example, 5 fits into 16 three whole times (3 * 5 = 15), with 16 - 15 = 1 left over, so 16 % 5 is 1. Note that the remainder is not the digits after the decimal point of 3.2.
b = 16 / 5
print(b)
d = 16 // 5
print(d)
e = 16 % 5
print(e)
3.2
3
1
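Integer division and modulo always fit together: the quotient times the divisor, plus the remainder, gives back the original number. This sketch checks that for 16 and 5, then shows two cases worth knowing about: with a negative number, // rounds down (toward negative infinity) rather than simply dropping the decimals, and with a float operand, // returns a float.

```python
quotient = 16 // 5    # 3: 5 fits into 16 three whole times
remainder = 16 % 5    # 1: what's left over, since 3 * 5 = 15
print(quotient * 5 + remainder)  # 16, the number we started with

neg_quotient = -16 // 5     # -4: -3.2 rounded down, not -3
neg_remainder = -16 % 5     # 4: because -4 * 5 + 4 == -16
float_quotient = 16.0 // 5  # 3.0: a float operand gives a float result
print(neg_quotient, neg_remainder, float_quotient)
```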
It's also possible to combine different operations in one line. Python respects the order of operations, and it groups its operations in a fairly standard way. Integer division and modulo aren't part of the standard order of operations, so Python places them at the same level as multiplication and division.
As a review, PEMDAS stands for Parentheses, Exponents, Multiplication, Division, Addition, and Subtraction. In Python, all parentheses are evaluated first. If there are multiple operations inside a pair of parentheses, Python will evaluate them according to the order of operations, but if all of the operations fall in the same category, it will evaluate them from left to right. Python allows the nesting of parentheses, just like in regular math.
Order of Operations
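For example, here is an expression with one pair of parentheses nested inside another; it prints the value shown just below it.

```python
result = 1 + 2 * (3 + (2 + 2))  # innermost first: 2 + 2, then 3 + 4
print(result)
```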
15
In the above example, Python will evaluate the innermost parentheses first (2 + 2 = 4), then the outer parentheses (3 + 4 = 7). If there are multiple sets of parentheses at the same depth, Python works from the innermost depth outward and, within the same depth, from left to right.
print(1 + 2 * ((1 + 2) + (2 + 2)))
15
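The above code will evaluate 1 + 2 = 3 first, then 2 + 2 = 4, then use those results to evaluate 3 + 4 = 7. Next, Python will evaluate all exponents. Exponents are the one exception to left-to-right evaluation: a chain of exponents is evaluated from right to left, so 2 ** 3 ** 2 means 2 ** (3 ** 2), which is 2 ** 9 = 512.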
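A short sketch of how exponents group. It also shows a related surprise for negative numbers: ** binds more tightly than a leading minus sign, so -2 ** 2 means -(2 ** 2), and you need parentheses to square a negative number.

```python
right_to_left = 2 ** 3 ** 2     # 2 ** (3 ** 2) = 2 ** 9
left_to_right = (2 ** 3) ** 2   # parentheses force 8 ** 2
minus_outside = -2 ** 2         # -(2 ** 2)
minus_inside = (-2) ** 2        # (-2) * (-2)
print(right_to_left, left_to_right, minus_outside, minus_inside)
```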
Next, Python will evaluate multiplication, division, integer division, and
modulo, from left to right. It doesn’t matter which operation comes first within
this class of operations, all are evaluated from left to right.
print(2 * 2 / 8 * 4)
2.0
In this example, Python will evaluate 2 ∗ 2 = 4 first, since it's the first operation in the expression. Next, Python will evaluate 4/8 = 0.5, then 0.5 ∗ 4 = 2.0. Since the division in the second step produces a float (as / always does), the entire result is printed as a float.
Lastly, Python will evaluate any addition or subtraction, from left to right.
Like multiplication, division, integer division, and modulo, it doesn’t matter
whether an addition or subtraction operator comes first. Both are evaluated at
the same priority level.
print(2 + 4 - 3 + 8)
11
Since all of the operations in the above example fall in the same class (addition and subtraction), the entire expression is evaluated from left to right: first 2 + 4, then 6 − 3, then 3 + 8.
If the code were as follows, then it would be evaluated differently: the multiplication 4 * 3 = 12 happens first, and only then are the additions done from left to right, 2 + 12 + 8 = 22.
print(2 + 4 * 3 + 8)
22
Exercise Questions
These exercise questions cover chapter 4.9.
Exercise 62
1. How many primitive arithmetic operations exist in Python?
2. Write out all of the arithmetic operations that Python offers.
Exercise 63
Consider the following code snips. Are they syntactically correct? Why or
why not?
1. x = x + 9
2. x = x + x + 6
3. x + 9 = x
Exercise 64
1. Say you wanted to write a negative number. Do you have to do anything
special to make Python understand that you want a negative number
instead of a subtraction arithmetic operation?
2. Does Python respect the order of operations?
3. Write a line of Python code to prove that Python respects or does not
respect the order of operations, including what the answer should be and
the output that Python actually gives you.
Exercise 65
Write the applicable code to evaluate the following lines in Python. Imagine
that you are typing directly into the Python interpreter as a calculator. That
means there’s no need to put the result into a variable.
1. Two plus two
2. Four plus three plus nine
3. Eight subtracted from ten
4. Negative seven times positive four
5. Twenty divided by four
6. Twenty divided by three, with a decimal remainder
7. Twenty divided by three, with no remainder
8. The remainder left over when eight is divided by three (using the modulo operator)
9. Eight squared
10. Eight cubed
11. Eight to the eighth power
12. Eight plus three, three minus two, and four times seven, all multiplied
together
Chapter 5
Complex Datatypes
It may seem strange that we aren’t covering complex datatypes right next to
simple datatypes, but it’s for a reason: knowing how to use simple datatypes and
basic Python functionality is key to understanding and utilizing these complex
datatypes.
Python has four complex datatypes that are available for you to use out-of-
the-box: lists, dictionaries, tuples, and sets. We will go over all four types, but
we will spend most of our focus on lists and dictionaries.
It’s also possible for us to make our own complex datatypes, and we’ll do
that when we cover classes in Chapter 7.
5.1 Lists
Let’s imagine for a moment: you want to process the statistics for each rider
on your favorite pro cycling teams. Sure, you could create variables for name,
functional threshold power (FTP), peak power, and watts per kilogram for
every rider. What might this look like?1
ath1name = "Peter Sagan"
ath1ftp = 470
ath1peak = 1230
ath1wpk = 6.7
ath2name = "Caleb Ewan"
ath2ftp = 471
ath2peak = 1903
ath2wpk = 7.0
Here we run into an inherent problem with individual variables: things get messy fast. It would be easy to spend all day making individual variables for every single rider. Fortunately, there’s a better way: lists. Lists are one of the complex datatypes in Python, and they allow us to store multiple pieces of individual data in a single data structure.
You can think of a list as a sequence of individual pieces of data. Instead
of storing our data as individual variables, we can store all of the names in one
list, all of the FTP data in a different list, all of the peak power figures in a
third list, and all of the watts per kilogram measurements in a fourth list. Let’s
look at how to create these lists and get the data out of them.
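Here is a minimal sketch of the one-list-per-statistic model. It reuses the rider figures from the per-athlete lists later in this section. The variable names ftps, peaks, and wpks are our own choice and may not match the names used elsewhere in the book.

```python
# One list per statistic; position 0 in every list is the same rider.
names = ["Peter Sagan", "Caleb Ewan", "Mathieu Van Der Poel",
         "Chris Froome", "Mark Cavendish"]
ftps = [470, 471, 460, 480, 465]        # functional threshold power (watts)
peaks = [1230, 1903, 1653, 1403, 1109]  # peak power (watts)
wpks = [6.7, 7.0, 6.7, 6.3, 6.2]        # watts per kilogram

print(names)
print(ftps)
```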
As you can see, we’re adding elements to our lists just as we would assign
individual values to a variable. We use the same syntax to indicate a string
(double quotes) or float (decimal points).
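If your code isn’t working, you might want to double-check that you’ve placed your commas correctly. The following code has just one misplaced comma. Python won’t report an error. Instead, it silently joins two adjacent strings that have no comma between them, so the first two names end up merged into a single element.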
names = ["Peter Sagan,"
         "Caleb Ewan",
         "Mathieu Van Der Poel",
         "Chris Froome",
         "Mark Cavendish"]
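To see what actually happens, this small sketch builds the list with the misplaced comma and then checks its length. Python concatenates string literals that sit next to each other with no comma between them, so the list ends up with four elements instead of five.

```python
# The comma after "Peter Sagan" is inside the quotes, so the first two
# string literals are adjacent and Python concatenates them into one.
names = ["Peter Sagan,"
         "Caleb Ewan",
         "Mathieu Van Der Poel",
         "Chris Froome",
         "Mark Cavendish"]

print(len(names))  # 4, not 5
print(names[0])    # Peter Sagan,Caleb Ewan
```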
It’s also possible to mix the datatypes that go in your lists. For example, we
could also structure our data by creating one variable for each athlete, as shown
here.
saganData = ["Peter Sagan", 470, 1230, 6.7]
ewanData = ["Caleb Ewan", 471, 1903, 7.0]
vanDerPoelData = ["Mathieu Van Der Poel", 460, 1653, 6.7]
froomeData = ["Chris Froome", 480, 1403, 6.3]
cavendishData = ["Mark Cavendish", 465, 1109, 6.2]
When you’re writing your own programs, there isn’t necessarily a “right” and a “wrong” way to represent your data, but you should consider what kind of problem you’re trying to solve and construct your variables around that. There are certainly “worse” and “better” ways to represent data. If you were trying to create a program that calculated average stats for all of the athletes on a team, then the first model (one list per statistic) might work better. It’s a lot easier to grab all of the elements of just one list than to pull the same value out of each athlete’s variable. However, if you were trying to create a program that showed all of the data for a certain athlete, then the second model (one list per athlete) would be easier to work with. It would be a lot easier than grabbing one bit of data from a bunch of variables.
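This short sketch contrasts the two models using the figures from this section. The average uses the built-in sum() and len() functions; len() is covered later in this section. The call to round() only tidies the printed result.

```python
# Model 1: one list per statistic
ftps = [470, 471, 460, 480, 465]

# Model 2: one list per athlete
froomeData = ["Chris Froome", 480, 1403, 6.3]

# Averaging is easy with model 1: all FTP values are already together.
averageFtp = sum(ftps) / len(ftps)
print(round(averageFtp, 1))  # 469.2

# Showing one athlete is easy with model 2: everything is in one list.
print(froomeData)
```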
Exercise Questions
Exercise 66
Exercise 67
1. Provide an example of a list with all of the same datatype from earlier
in this chapter.
2. For the list that you chose, what datatypes are in the list?
3. Can you put multiple datatypes into one list? If so, provide an example
of this from earlier in this chapter. If not, provide the error message that
Python generates.
Exercise 68
Exercise 69
1. Write a Python list that has the names of five of your professors or
instructors and put the list into a variable called instructors.
2. Write a Python list that has five breeds of dogs and put the list into a
variable called dogs.
3. Write a Python list that has the names of four cities and put the list
into a variable called cities.
4. Write a Python list that has the number of stories of three residence halls
on your campus and put the list into a variable called resHallHeights.
Follow the line with a comment with the name of the residence halls,
but do not store the names in the actual list.
95 5.1. LISTS
Chris Froome
There’s nothing special about the code that we just saw. But, what if we tried
to print a list?
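For example, suppose we use the froomeData list from earlier in this section. Printing the whole list shows the square brackets, the commas, and the quotes around the string:

```python
froomeData = ["Chris Froome", 480, 1403, 6.3]
print(froomeData)  # ['Chris Froome', 480, 1403, 6.3]
```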
Python very helpfully prints everything in the list. It can’t read our minds and figure out that we only want the name, that is, the first element of the vector.3,4
Instead, we need to instruct Python to only give us a certain element in the list. An element is one individual chunk of data in a list.
3 Python uses the term “list” for this kind of sequence. In some other languages, such as C++, the closest equivalent is called a “vector.” If you see the term “vector,” understand it to mean roughly what a Python list is.
4 There is a difference between arrays and vectors: arrays are assigned a fixed length in memory when they are declared, while vectors can grow and shrink at will. Python’s built-in list does not make this distinction, so we will not cover the differences in depth in this book. You’ll find more detail in a book on a language with manual memory management, such as C++.
We can tell Python to give us the nth element in a list, where n is the index of the element in the list, starting at 0. You can think of the nth element in a list as you would think of a term in a sequence in calculus. The nth term of the Fibonacci sequence F could be denoted as $F_n$, where n is the position of the term in the sequence. Here we’re looking at lists instead of sequences, and a list can hold any type of data, not just numbers. Let’s look at our Chris Froome list in a table, along with the index5 numbers.
Index 0 1 2 3
Data "Chris Froome" 480 1403 6.3
If we wanted to get the name, then we could ask Python to give us the 0th element of the list. In Python, we do this using square brackets, though not in the same way that we used them to create the list. Instead, we write the name of the variable that holds the data, followed by square brackets containing the index number of the element. Python will tolerate a space between the variable name and the opening square bracket, but the standard convention is to leave it out.
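Here is a minimal sketch of that expression, using the froomeData list from earlier in this section:

```python
froomeData = ["Chris Froome", 480, 1403, 6.3]
# Variable name, then the index in square brackets (index 0 is the first element).
print(froomeData[0])
```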
Chris Froome
Since we’re printing a single element rather than the whole list, it prints without the square brackets, commas, or any quotation marks around strings.
5 In this book, we refer to the plural of “index” as “indices.” However, you may also see “indexes.”
It’s very easy for beginning programmers to forget that indices start at zero in Python and in most other popular programming languages.6 If you’re having issues getting a specific element of a list, you should double-check that your index number is correct.
Just as we accessed an element in a list, we can also change that element’s value. Changing the value of an element of a list is just like changing an individual variable. However, instead of referring to the variable as a whole, you refer only to the specific index whose data should be overwritten. For example, say I got Chris Froome’s FTP wrong: it should be 483. I know that the FTP is at index 1 in the froomeData list, so I can check the old value easily.
print(froomeData[1])
480
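To overwrite the value, put the same indexed expression on the left-hand side of an assignment. Here is a small sketch, again using froomeData:

```python
froomeData = ["Chris Froome", 480, 1403, 6.3]
froomeData[1] = 483  # replace only the element at index 1
print(froomeData[1])
```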
483
It might also be helpful to know how many elements are in the list. Python provides this via the len() function, which tells us the length of the list. len() takes one argument, such as a list, and returns an integer with the number of elements in it. Let’s say that we forgot how many athletes were in the names list. We could use the following to print the length of the list.
print(len(names))
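This sketch applies len() to the five-name list used in this section. It also shows that len() accepts other sequences, such as strings.

```python
names = ["Peter Sagan", "Caleb Ewan", "Mathieu Van Der Poel",
         "Chris Froome", "Mark Cavendish"]
print(len(names))  # 5

# len() also works on other sequences, such as strings (it counts characters).
print(len("Chris Froome"))  # 12
```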
gram) are real units for measuring cycling performance, even if the actual measurements are
made up. They represent the power that a cyclist can put out.
8 You can also pass in a tuple or a set, but these are topics that we won’t cover in depth
in this book.
We begin by creating our two lists as normal. Then, we can put our list of
male names into a new list called allNames. We can then extend the allNames
list using the list of female names. After this, allNames has all of the athlete
names.
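Here is a sketch of those steps. The contents of femaleNames are our own illustrative choice, since the original list isn’t shown here. One design note: we copy maleNames with .copy() rather than writing allNames = maleNames. Plain assignment would make both names refer to the same list, so a later change to one (such as clearing it) would also show up in the other.

```python
maleNames = ["Peter Sagan", "Caleb Ewan", "Mathieu Van Der Poel",
             "Chris Froome", "Mark Cavendish"]
femaleNames = ["Marianne Vos", "Annemiek van Vleuten"]  # illustrative example list

# Start allNames as an independent copy of maleNames...
allNames = maleNames.copy()
# ...then add every element of femaleNames to the end of it.
allNames.extend(femaleNames)

print(len(allNames))  # 7
print(allNames[6])    # Annemiek van Vleuten
```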
We can also remove an element by its index using the pop() method. pop() takes an integer, the index number of the element to remove, and it returns the element that it removed.
allNames.pop(4)
If you do not give pop() an argument, Python will remove (and return) the last element of the list.
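This sketch shows both forms of pop() and captures the value each one returns. The last name in the list is an illustrative addition.

```python
allNames = ["Peter Sagan", "Caleb Ewan", "Mathieu Van Der Poel",
            "Chris Froome", "Mark Cavendish", "Marianne Vos"]

removed = allNames.pop(4)  # remove the element at index 4 and return it
print(removed)             # Mark Cavendish

last = allNames.pop()      # no argument: remove and return the last element
print(last)                # Marianne Vos
print(len(allNames))       # 4
```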
Finally, we can clear all of the elements of a list. Clearing removes only the contents: the list itself still exists, but it is now empty. We do this using the clear() method, which takes no arguments.
maleNames.clear()
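This is a minimal sketch of clear(), using a short list of our own:

```python
maleNames = ["Peter Sagan", "Caleb Ewan", "Chris Froome"]
maleNames.clear()      # remove every element; the list itself remains
print(maleNames)       # []
print(len(maleNames))  # 0
```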
Exercise Questions
Exercise 70
1. Where do indices start in Python lists?
2. Lists contain multiple objects. What is the formal name for these indi-
vidual objects?
3. If you wanted to access the nth element of a list, what index number
would you give Python in terms of n?
Exercise 71
Consider the instructors list that you made in chapter 5.1.1.
1. How long is the list? Give the Python code that gave you how long the
list is.
2. What is the third element of that list? Give the Python code that gave
you the third element. Remember, indices start at 0 in Python, so make
sure you’re getting the third element, not the fourth element!
3. Add two new professors or instructors to the instructors list. What
code did you use? Also give the new length of the list, along with the
code that you used to get the length of the list.
4. Print the instructors list. Then, remove the third professor or instruc-
tor from the instructors list by the value of the element (rather than
the index). What code did you use?
5. Without printing the list again, remove the element with index number
4 (not the fourth) from the instructors list using the index number
(rather than the element value). What code did you use?
Exercise 72
Consider the cities list that you made in chapter 5.1.1.
1. Change the third element (careful about what the index number is!) to “Berlin” (it’s okay if you already have Berlin in your list; we’ll just add it again). Provide your code.
2. Run the following in the Python interpreter: cities[2]. What is the
output?
3. Clear all of the elements from the cities list, then print the list. What
is the code that you used to clear the elements from the list? What is
the output?
4. Now, what datatype is cities: list, list of strings, or NoneType?
101 5.2. DICTIONARIES
5.2 Dictionaries
Lists are just one way of storing multiple pieces of data. There are also dictionaries, and they work in a fundamentally different way from lists.
Look back at the per-athlete lists, such as saganData. At first glance, the name makes sense, but what are the numbers next to the name? We might be able to remember, but it’d be a lot easier if everything were labeled. This is the perfect opportunity to use a dictionary. We could represent the same data as shown in this code.
saganData = {
    "name": "Peter Sagan",
    "ftp": 470,
    "peak": 1230,
    "wpk": 6.7
}
ewanData = {
    "name": "Caleb Ewan",
    "ftp": 471,
    "peak": 1903,
    "wpk": 7.0
}
vanDerPoelData = {
    "name": "Mathieu Van Der Poel",
    "ftp": 460,
    "peak": 1653,
    "wpk": 6.7
}
froomeData = {
    "name": "Chris Froome",
    "ftp": 480,
    "peak": 1403,
    "wpk": 6.3
}
cavendishData = {
    "name": "Mark Cavendish",
    "ftp": 465,
    "peak": 1109,
    "wpk": 6.2
}
To create a dictionary (or five), surround the contents of each dictionary with curly braces {}. Specify the key first; here every key is a string literal, so it goes in quotes. Separate the key and its value with a colon :, then enter the value that should be stored under that key. Separate the key-value pairs from one another with commas ,. It seems complicated, but with a little practice, it’s actually pretty simple.
As you can see, you can use the same key names in multiple different dictionaries. However, you should not use the same key name twice within one dictionary. In the following code, for example, Python will not report an error. Instead, the second "wpk" entry silently replaces the first, so the value 6.7 is lost.
saganData = {
    "name": "Peter Sagan",
    "ftp": 470,
    "peak": 1230,
    "wpk": 6.7,
    "wpk": 6.9
}
If you really need to keep both values, give each one its own key.
saganData = {
    "name": "Peter Sagan",
    "ftp": 470,
    "peak": 1230,
    "wpk1": 6.7,
    "wpk2": 6.9
}
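Reading a value out of a dictionary looks like list indexing, except that the key goes inside the square brackets instead of an index number. Here is a minimal sketch using the froomeData dictionary defined above:

```python
froomeData = {
    "name": "Chris Froome",
    "ftp": 480,
    "peak": 1403,
    "wpk": 6.3
}
# Look up a value by its key rather than by its position.
print(froomeData["ftp"])
```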
480
If you’ve forgotten what you named your keys, you can also pull up the keys of any dictionary by using the keys() method, as shown below.
print(froomeData.keys())
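Strictly speaking, keys() returns a dict_keys view rather than a list. Wrapping the view in list() gives you an ordinary list if you need one. Here is a sketch with froomeData:

```python
froomeData = {"name": "Chris Froome", "ftp": 480, "peak": 1403, "wpk": 6.3}

print(froomeData.keys())        # dict_keys(['name', 'ftp', 'peak', 'wpk'])
print(list(froomeData.keys()))  # ['name', 'ftp', 'peak', 'wpk']

# The in operator checks whether a key exists before you use it.
print("ftp" in froomeData)      # True
```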
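To add data, we don’t need a method at all: assigning a value to a key that doesn’t exist yet creates a new key and data pair for the dictionary. Let’s say that we wanted to add an updated FTP value for Chris Froome in a key called updatedFTP.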
froomeData = {
    "name": "Chris Froome",
    "ftp": 480,
    "peak": 1403,
    "wpk": 6.3
}
froomeData["updatedFTP"] = 485
print(froomeData.keys())
As we can see, there is a new key named updatedFTP. We can then use this key
as we would any other.
However, dictionaries are not method-less. To modify existing data, we can use the update() method. In its simplest form, update() takes one argument, a dictionary with the new data. Let’s say that we wanted to change Chris Froome’s FTP, since it was wrong; instead of 480, it should be 483. We can use update() to pass in a dictionary specifying the key that should be changed and the value that it should be changed to.
froomeData = {
    "name": "Chris Froome",
    "ftp": 480,
    "peak": 1403,
    "wpk": 6.3
}
froomeData.update({"ftp": 483})
The brackets and braces might look confusing, but break it down and it becomes
quite simple. The outer parentheses () are for the update() method, and they
specify arguments for that method. The curly braces {} are for the dictionary,
and they specify that the data being passed in is a dictionary and what the
contents of that dictionary are.
Note that if you run the update() method with a key that the dictionary doesn’t already have, Python will create the key and populate it with the data that you specify. You can then use this key as if you had created it with the first method of adding to a dictionary, direct assignment.
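This sketch shows both behaviors of update() on the froomeData dictionary: it overwrites a key that exists and creates one that doesn’t. Plain assignment, froomeData["ftp"] = 483, would change the FTP just as well.

```python
froomeData = {"name": "Chris Froome", "ftp": 480, "peak": 1403, "wpk": 6.3}

# "ftp" already exists, so update() overwrites its value.
froomeData.update({"ftp": 483})

# "updatedFTP" doesn't exist yet, so update() creates it.
froomeData.update({"updatedFTP": 485})

print(froomeData["ftp"])         # 483
print(froomeData["updatedFTP"])  # 485
```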
Exercise Questions
These exercise questions cover chapter 5.2.
Exercise 73
1. What is a dictionary?
2. What are the differences between Python lists and dictionaries?
3. Lists store data in [?]-[?] pairs, whereas dictionaries store data in [?]-[?]
pairs. (Fill in the blanks)
4. Suppose you wanted to store the names of all of the NHL teams. Would
you use a list or a dictionary? Why?
Exercise 74
1. Consider a potential dictionary named falcons that would hold team
data from the Atlanta Falcons NFL team. What are some key names
that this dictionary might have?
2. Consider a potential dictionary that would hold information on one of
the dorms on your campus. What might you name the dictionary? What
are some key names that this dictionary might have?
Exercise 75
1. Create a new dictionary inside of a variable named eagles. Add four keys: code, city, wins, and winrate. Initialize all of the keys to None (the single value of type NoneType).
2. In the eagles dictionary, change the code value to a string PHI.
3. In the eagles dictionary, change the city value to a string
Philadelphia.
4. In the eagles dictionary, change the wins value to an integer 4.
5. The Eagles played 16 games, of which they won four, tied one, and
lost eleven. Use Python to calculate the win percentage. Win percent-
age is calculated using the following formula: (2 × wins + ties)/(2 ×
total games played) × 100. Put the value in the eagles dictionary in the
winrate key as a float. Try to do this in a single line.
5.3 Tuples
Tuples are most often used to pass data between different data structures, which we’ll cover in Chapter 7. You may have heard of tuples in precalculus, calculus, linear algebra, or some other mathematics course, where a tuple is defined as a sequence of n elements, where n is some non-negative integer. Tuples, by definition, are ordered (given that they are a sequence), and they must be finite.
For example, (2, 3, 4, 5, 6) is a valid mathematical tuple. It has n = 5, a non-negative integer, so it is finite. The size of this tuple is fixed: this is a 5-tuple, and it will always be a 5-tuple.
Tuple Notation
Tuples are very similar in Python. Like a mathematical tuple, a Python tuple is ordered and finite, and its size can be zero: () is the empty tuple. A Python tuple can also hold different datatypes. Like in a list, you can mix booleans, integers, floats, and strings in a single tuple. You can also refer to tuple elements by their index number using square brackets, just like in lists. However, you cannot make a tuple larger or smaller after you create it. If you create a 3-tuple, it will always have exactly 3 elements. Tuples are also immutable: once you create a tuple, you cannot change any of its elements.
Consider the following tuple.
leafygreens = ("Romaine Lettuce", "Iceberg Lettuce", "Arugula")
We could refer to the second element (iceberg lettuce) by its index number 1.
print(leafygreens[1])  # Indices start at zero
Iceberg Lettuce
107 5.3. TUPLES
If I want to change an element in the tuple, I need to assign a completely new tuple to the variable.
leafygreens = ("Romaine Lettuce", "Butterhead Lettuce", "Arugula")
print(leafygreens[1])
Butterhead Lettuce
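To see the immutability directly, this sketch tries to assign to one element of the tuple. Python raises a TypeError, which we catch so the program can keep going. Reassigning the whole variable, as above, works fine.

```python
leafygreens = ("Romaine Lettuce", "Iceberg Lettuce", "Arugula")

try:
    leafygreens[1] = "Butterhead Lettuce"  # not allowed: tuples are immutable
    changedInPlace = True
except TypeError as error:
    changedInPlace = False
    print("TypeError:", error)

# Reassigning the whole variable to a new tuple is fine.
leafygreens = ("Romaine Lettuce", "Butterhead Lettuce", "Arugula")
print(leafygreens[1])  # Butterhead Lettuce
```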
Exercise Questions
These exercise questions cover chapter 5.3.
Exercise 76
1. What is a tuple?
2. How is a tuple different from a list?
3. In Python, how do we create a tuple?
4. Can tuples be changed after they are created? Provide an example back-
ing up your claim.
Exercise 77
Define a 4-tuple of strings with courses that you have taken. Do not put
the tuple into a variable.
5.4 Sets
Sets are the simplest of the complex datatypes, so if you understand lists, dictionaries, and tuples, you already understand most of what there is to know about a set.
In mathematics, a set is simply a collection of elements. There is no order, sequence, or index. Sets also cannot contain duplicate entries. Any duplicate entries are ignored, since in a set, all that matters is whether an element is present in the set or not. This means that the sets {2, 3, 4} and {2, 3, 4, 3} are the same, since the values 2, 3, and 4 are in both sets. In mathematics, we write sets inside of curly braces: {2, 3, 4, 5, 6}.
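Python’s built-in set works the same way and is also written with curly braces. Here is a minimal sketch. Note that an empty set must be written set(), because {} creates an empty dictionary.

```python
a = {2, 3, 4}
b = {2, 3, 4, 3}  # the duplicate 3 is ignored

print(a == b)     # True
print(len(b))     # 3
print(3 in b)     # True: membership is all that matters
print(7 in b)     # False

empty = set()     # {} would be an empty dictionary, not a set
print(type(empty) is set)  # True
```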
Exercise Questions
These exercise questions cover chapter 5.4.
No exercise questions exist for this section.
109 5.5. SUBSETTING
5.5 Subsetting
Now that we have seen indices in Python, we can subset based on those indices.
For example, consider this list.
names = ["Peter Sagan",
         "Caleb Ewan",
         "Mathieu Van Der Poel",
         "Chris Froome",
         "Mark Cavendish"]
We know that lists are indexed at 0, so if we wanted to get Mathieu Van Der
Poel out of the list, we could refer to names[2].
print(names[2])
What if we wanted to get the first two elements from the list? We could refer to the 0th and 1st elements separately, sure. But a better tool is something called subsetting (Python’s own documentation calls it slicing). Subsetting allows us to get any number of elements from a sequence of objects, including strings and lists.9
The subsetting operator is a colon inside of square brackets. The colon works sort of like a “to” operator: [i:j] refers to the elements from index i up to, but not including, index j. Consider the names list from before. If we wanted to refer to the first three names, we want indices 0 to 3 (not including 3), so we can refer to this as the following.
print(names[0:3])
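This sketch runs two slices on the names list. In both, the element at the “to” index is left out.

```python
names = ["Peter Sagan", "Caleb Ewan", "Mathieu Van Der Poel",
         "Chris Froome", "Mark Cavendish"]

firstThree = names[0:3]  # indices 0, 1, 2; index 3 is not included
print(firstThree)        # ['Peter Sagan', 'Caleb Ewan', 'Mathieu Van Der Poel']

middle = names[1:4]      # indices 1, 2, 3
print(middle)            # ['Caleb Ewan', 'Mathieu Van Der Poel', 'Chris Froome']
```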
What if we wanted to refer to the last element of the vector? Python supports
something really cool called negative indexing. This essentially counts the
vector in reverse order, starting at -1. So, the last element is -1, the second-to-
last element is -2, the third-to-last element is -3, and so on. If we wanted to get
the last element of the names list, we could use negative indexing to get that
element.
9 They also work on non-native datatypes, like the Pandas dataframe and series or the
'Mark Cavendish'
'Chris Froome'
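Here is a self-contained sketch of negative indexing on the names list. It also confirms that index -1 refers to the same element as index len(names) - 1.

```python
names = ["Peter Sagan", "Caleb Ewan", "Mathieu Van Der Poel",
         "Chris Froome", "Mark Cavendish"]

print(names[-1])  # Mark Cavendish (last element)
print(names[-2])  # Chris Froome (second-to-last element)

# Negative index -1 is shorthand for the last positive index.
print(names[-1] == names[len(names) - 1])  # True
```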
Subsetting also works on strings, where each character in the string is considered
its own element. So, let’s say we had the string:
text = "This will begin to make things right. I’ve traveled too far, and seen too much, to ignore the despair in the galaxy. Without the Jedi, there can be no balance in the Force. Well, because of you now we have a chance."
This isn’t exactly a short string, and we might not want to print the whole string every time. If we want the first few characters of the string, we can use the third type of subsetting: implicit subsetting. Implicit subsetting means deliberately leaving out either the “from” index or the “to” index. Python will then assume that you want to subset from the beginning of the sequence to the given index, or from the given index to the end of the sequence. Where a blank exists, Python fills it in with the start of the sequence (if the “from” index is blank) or the end of the sequence (if the “to” index is blank). Consider the above string. If we only wanted to print the first 30 characters of this string, we can use implicit subsetting to tell Python that we want everything from the beginning of the string up to, but not including, index 30.
print(text[:30])
Similarly, if we wanted to print all but the first 50 characters of this string, we can use implicit subsetting to tell Python that we want everything from index 50 (the 51st character) to the end of the string.
print(text[50:])
We can also throw negative indexing into our implicit subsetting for an extra fun time! If we wanted to print only the last 50 characters, we could use negative indexing and an implicit subset. This tells Python to give us everything from the 50th-to-last character through the end of the string.
print(text[-50:])
Similarly, we could tell Python to give us everything except the last 50 characters, stopping just before the 50th-to-last character.
print(text[:-50])
Negative indexing and implicit subsetting also work with lists. Suppose we wanted everything from the third-to-last element to the last element of our names list. We could use implicit subsetting with a negative index to get those three elements.
print(names[-3:])
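Here is a sketch that checks these rules on something short enough to count by hand. It uses our own example string, "Python for Scientists", rather than the passage above. The string has 21 characters, at indices 0 to 20.

```python
title = "Python for Scientists"  # 21 characters, indices 0 to 20

print(title[:6])    # Python        (indices 0 to 5)
print(title[11:])   # Scientists    (index 11 to the end)
print(title[-10:])  # Scientists    (the last 10 characters)
print(title[:-11])  # Python for    (everything except the last 11)

names = ["Peter Sagan", "Caleb Ewan", "Mathieu Van Der Poel",
         "Chris Froome", "Mark Cavendish"]
print(names[-3:])   # ['Mathieu Van Der Poel', 'Chris Froome', 'Mark Cavendish']
```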
Exercise Questions
These exercise questions cover chapter 5.5.
Exercise 78
1. What is subsetting?
2. What kinds of objects can be subsetted?
3. What is negative indexing?
4. What is implicit subsetting?
Exercise 79
1. Create a string with the following text: “Let’s have a nice tree right here. Nothing wrong with washing your brush. If I paint something, I don’t want to have to explain what it is. Don’t be afraid to make these big decisions. Once you start, they sort of just make themselves. Everything is happy if you choose to make it that way.”
Exercise 80
1. Create a list with the following elements and put it into a variable called pokemon:
"Bulbasaur"
"Ivysaur"
"Venusaur"
"Charmander"
"Charmeleon"
"Charizard"
"Squirtle"
"Wartortle"
"Blastoise"
"Caterpie"
"Metapod"
"Butterfree"
"Weedle"
"Kakuna"
"Pikachu"
"Arbok"
"Raichu"
"Nidorino"
2. How many elements are in this list? What method did you use to get
this length?
3. Print the first element of pokemon.
4. Print the last element of pokemon.
5. Print the third element of pokemon.
6. Print the first three elements of pokemon.
7. Print the last three elements of pokemon.
8. Print the third to the last element of pokemon.
Exercise 81
1. Create a list with the following elements and put it into a variable called
villans:
"Bane"
"Black Mask"
"Catwoman"
"Clayface"
"Deadshot"
"Firefly"
"Harley Quinn"
"Professor Hugo Strange"
"Hush"
"Joker"
"Killer Croc"
"Mr. Freeze"
"Man-Bat"
2. How many elements are in this list? What method did you use to get
this length?
3. Print the first element of villans.
4. Print the last element of villans.
5. Print the last seven elements of villans.
6. Print from the 7th element to the last element of villans.
7. Print the seventh element of villans.
8. Print the seventh-from-the-last element of villans.
5.6.1 Stripping
An important feature of string handling is the ability to remove characters
from a string. For example, let’s say you got some data and it was formatted as
&&data&& or it was surrounded by spaces. How could we get rid of this cruft?
The process of stripping removes unwanted characters from a string. Python
provides us with three stripping methods: left stripping, right stripping, and
dual stripping. As their names suggest, these methods can remove characters
from the left side of the string, the right side of the string, or from both sides of
the string at the same time, respectively. The methods that do this are lstrip(), rstrip(), and strip().
By default, all three methods remove only whitespace. So, if you had extra tabs or spaces at the beginning or end of your string, you could simply call the strip() method to remove them.
print("   data   ")
print("   data   ".strip())
   data   
data
We didn’t give the strip() method anything, so it removed only whitespace.
However, we could also give it letters, numbers, symbols, or any other valid
characters. For example, consider the following two print statements.
print("fpppppppppppppppffdatapffffffppppp")
print("fpppppppppppppppffdatapffffffppppp".strip("fp"))
fpppppppppppppppffdatapffffffppppp
data
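Characters listed in the argument are removed only from the two ends of the string, never from the middle. Here is a sketch using our own example string:

```python
# strip() works inward from each end and stops at the first character
# that is not in its argument ("a" on the left, "e" on the right).
print("pppppapplepppp".strip("p"))
```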
apple
115 5.6. STRING MANIPULATION
The word “apple” has the letter p in it, but those p’s don’t get stripped out. Python stops stripping from each end as soon as it reaches a character that isn’t in the argument: the a on the left and the e on the right.
Let’s now consider left stripping. Left stripping removes characters that appear in the argument from the beginning of the string only.
print("pppppppppppppapplepppp".lstrip("p"))
applepppp
Python removed all of the p’s on the left side, up to the first non-stripped
character. It didn’t remove anything from the right side, though. If we wanted
to only remove the right-side characters, we can use the rstrip() method.
print("pppppppppppppapplepppp".rstrip("p"))
pppppppppppppapple
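This sketch applies all three methods to the &&data&& example from the start of this section. The last line shows that the argument is treated as a set of characters, not as a word. That lets spaces and ampersands be stripped in one call.

```python
raw = "&&data&&"

print(raw.lstrip("&"))  # data&&
print(raw.rstrip("&"))  # &&data
print(raw.strip("&"))   # data

# The argument is a set of characters, so their order doesn't matter.
print("  &&data&&  ".strip(" &"))  # data
```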
Exercise Questions
These exercise questions cover chapter 5.6.1.
Exercise 82
1. What were the three methods introduced?
2. What is the difference between each method?
Exercise 83
1. If you wanted to remove certain characters from the beginning of a string,
what method would you use?
2. If you wanted to remove certain characters from the end of a string, what
method would you use?
3. If you wanted to remove certain characters from the beginning and end
of a string, what method would you use?
Exercise 84
1. Define a new string named fruit and initialize it to 33333banana33333.
Exercise 85
Consider the string aaaaaaabababbbbbb for this exercise.
1. Without writing any code, strip the character a from both sides of the
string. Why did you stop removing characters where you did?
2. Without writing any code, strip the character b from both sides of the
original string (the string before part 1). Why did you stop removing
characters where you did?
3. Without writing any code, strip the characters ab from both sides of the
original string. Why did you stop removing characters where you did?
Exercise 86
1. Define a new string named actor and initialize it to
////  Chadwick Boseman    ****. There are two spaces between the forward slashes and “Chadwick”, and there are four spaces between “Boseman” and the asterisks.
2. In one line, remove the asterisk and forward slash characters from actor.
3. Remove the spaces from the beginning and end of actor. Did you specify
any arguments for the function that you used? Why or why not?
Regexes
For this section, let’s consider the following text.
He is headstrong and he has much to learn of the living Force, but he is capable. There is little more he can learn from me. Young Skywalker’s fate will be decided later. Now is not the time for this. the Senate is voting for a new Supreme Chancellor and Queen Amidala is returning home, which will put pressure on the Federation, and could widen the confrontation. And draw out the Queen’s attacker.
The simplest regex is a direct match. To make a direct match, we simply enter the string to match. Let’s try to match the word “the”. The regex parser will find 6 matches (“of the living,” “not the time,” “the Senate is,” “on the Federation,” “widen the confrontation,” and “out the Queen’s”).
If we choose a direct match that also occurs inside of other words, the regex parser will grab those as well. For example, if we try to match the string “ch”, the regex parser will find 2 matches (“has much to” and “home, which will”).
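In Python, regexes are handled by the built-in re module. This sketch checks both counts with re.findall(), which returns a list of every non-overlapping match. The passage is stored in a variable that we’ve called passage. Because matching is case sensitive, “There” and “Chancellor” are not counted.

```python
import re

passage = ("He is headstrong and he has much to learn of the living Force, "
           "but he is capable. There is little more he can learn from me. "
           "Young Skywalker's fate will be decided later. Now is not the time "
           "for this. the Senate is voting for a new Supreme Chancellor and "
           "Queen Amidala is returning home, which will put pressure on the "
           "Federation, and could widen the confrontation. And draw out the "
           "Queen's attacker.")

print(len(re.findall("the", passage)))  # 6
print(len(re.findall("ch", passage)))   # 2
```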
There are several characters that we can use to modify the regex search, with the most permissive being the period. This character is something called a wildcard, which means that it can match different things that adhere to a certain condition. The period matches any single character except a newline. That means every character in our text, apart from the newline at the end, would be matched (“H”, “e”, “ ”, “i”, “s”, “ ”, “h”, “e”, “a”, “d”, “s”, “t”, “r”, “o”, “n”, “g”, “ ”...).
What if we wanted to find a word character only? This is where the word wildcard comes in: \w. The \w wildcard matches any word character, including A-Z, a-z, 0-9, and the underscore character. In our text, the \w wildcard would match “H”, “e”, “i”, “s”, ..., “e”, “r”, but skip spaces, commas, periods, and apostrophes. Similarly, the \d wildcard matches any digit from 0-9. It would find no matches in the text above, since the text has no digits. The \s wildcard matches any whitespace, including spaces, tabs, newlines, and carriage returns.
Case Sensitive
Regexes are case sensitive, meaning that \d and \D are not the
same thing.
The \w, \d, and \s wildcards also have negative versions, which match the opposite. The negative versions are the capitals \W, \D, and \S, which find non-word, non-digit, and non-whitespace characters, respectively. That means that the \s (lowercase) wildcard catches all of the spaces and the newline at the end of the passage, while the \S (uppercase) wildcard catches all of the letters, numbers, and symbols but no spaces or newlines.
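These wildcards are easier to compare on a short string that contains digits. This sketch uses our own sample string. The r prefix makes a raw string, so Python passes the backslashes through to the regex unchanged.

```python
import re

sample = "He is 42 today."

print(re.findall(r"\d", sample))       # ['4', '2']
print(len(re.findall(r"\w", sample)))  # 11 (letters and digits)
print(len(re.findall(r"\s", sample)))  # 3 (the spaces)
print(re.findall(r"\W", sample))       # [' ', ' ', ' ', '.']
print(len(re.findall(r".", sample)))   # 15 (every character)
```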
We can also create our own custom character sets to match against. For example, if we wanted only the letters a, b, c, d, and e to be captured, we can use a custom character set. To create a custom character set, just enclose the characters that you want to include in square brackets. So, to catch a, b, c, d, and e, we would write [abcde]. This would catch every letter that is a, b, c, d, or e, lowercase only. In our passage, it would catch the e in He, but not the H. It also catches the e, a, and d in headstrong. If we wanted to include capitals, we would need to explicitly include them in our regex: [ABCDEabcde]. This would also catch the C in Chancellor.
Regexes can also span a range of characters by using the dash -. For example, if we wanted to rewrite the previous regex ([ABCDEabcde]) using dashes, we could write [A-Ea-e]. Common ranges are [A-Z], [a-z], and [0-9]. So, the following regex would capture every letter from A to Z, in either case, and every digit from 0 to 9: [A-Za-z0-9].
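This sketch tests custom character sets and ranges on a few short strings of our own, using re.findall():

```python
import re

print(re.findall("[abcde]", "He is headstrong"))  # ['e', 'e', 'a', 'd']
print(re.findall("[A-Ea-e]", "Chancellor"))       # ['C', 'a', 'c', 'e']
print(re.findall("[A-Za-z0-9]", "R2-D2!"))        # ['R', '2', 'D', '2']
```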
Up to this point, we have only matched single characters, unless we used a literal match. What if we want to match multiple characters? For that we use a quantifier, which lets us match repeats of a token together as one match. For example, if we wanted to match an entire word (instead of matching individual letters of a word), we would put a quantifier on a letter character set. There are six basic quantifiers.
? matches zero or one of the previous token
* matches zero or more of the previous token
+ matches one or more of the previous token
{n} matches exactly n of the previous token
{n,} matches n or more of the previous token
{n,m} matches between n and m of the previous token
Let’s apply this to our passage. We’ll use the [A-Za-z] regex as our token to quantify. If we apply the ? quantifier, our regex looks like [A-Za-z]?. This will match every single letter, as well as an empty match (zero characters long) wherever there is no letter, such as at the end of every word, since those positions have zero of the [A-Za-z] token. Python represents an empty match as the empty string "".
If we use the * quantifier, our regex looks like [A-Za-z]*, and this will match zero or more capital or lowercase letters. This will match every whole word, but it will also produce an empty match at the end of every word, so roughly every other match is an empty string.
Instead, we might want to use the + quantifier, which looks like [A-Za-z]+. This will match entire words separated by anything that isn’t a letter, and it never produces empty matches.
If we use the {2} quantifier, our regex will look like [A-Za-z]{2}. This will match non-overlapping pairs of letters. So, it will match He and is, but it will also match the chunks of "headstrong" as he, ad, st, ro, and ng.
If we use the {3,} quantifier, our regex will look like [A-Za-z]{3,}. This
will match all words that are three letters long or longer. So, it won’t match
He or is, but it will match headstrong and and.
If we use the {3,6} quantifier, our regex will look like [A-Za-z]{3,6}. This will match runs of letters that are at least three and at most six letters long. So, this will match and and much, but it won't match the entirety of "headstrong". However, it will match headst and rong from headstrong.
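Here is a minimal sketch of these quantifiers using re.findall(). The sample sentence is made up from words in the passage, and the expected results in the comments follow from the rules above.

```python
import re

word = "headstrong"
sentence = "He is headstrong and much like the Senate."  # made-up sample

whole_words = re.findall(r"[A-Za-z]+", sentence)     # one or more letters
pairs = re.findall(r"[A-Za-z]{2}", word)             # exactly two letters
three_plus = re.findall(r"[A-Za-z]{3,}", sentence)   # three or more letters
three_to_six = re.findall(r"[A-Za-z]{3,6}", word)    # greedy: up to six
star = re.findall(r"[A-Za-z]*", "He is")             # zero or more letters

print(whole_words)   # ['He', 'is', 'headstrong', 'and', 'much', 'like', 'the', 'Senate']
print(pairs)         # ['he', 'ad', 'st', 'ro', 'ng']
print(three_plus)    # ['headstrong', 'and', 'much', 'like', 'the', 'Senate']
print(three_to_six)  # ['headst', 'rong']
print(star)          # the words He and is, mixed with empty strings ('')
```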
If we wanted to match only entire words, we would need something called a word boundary, written \b. A word boundary does not match a character itself; it matches the position between a character matched by the \w (lowercase) wildcard and a character matched by the \W (capital) wildcard, in either order, as well as the start or end of the string when the character next to it is a \w character. So, if we wanted to find only whole words that are between 3 and 6 letters long, in either uppercase or lowercase, we would put word boundaries at the beginning and end of the pattern, so that the letters must begin and end exactly at the edges of a word. This would look like \b[A-Za-z]{3,6}\b. This regex would match words like and and Senate, but it would not match any part of headstrong, since after the sixth letter the next character is another letter rather than a \W character, so there is no word boundary there.
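The effect of word boundaries is easiest to see side by side. This sketch reuses the same made-up sentence and compares {3,6} with and without \b.

```python
import re

sentence = "He is headstrong and much like the Senate."  # made-up sample

# Without word boundaries, {3,6} also matches pieces of longer words.
no_bounds = re.findall(r"[A-Za-z]{3,6}", sentence)

# With \b on both sides, only whole words of 3 to 6 letters match.
with_bounds = re.findall(r"\b[A-Za-z]{3,6}\b", sentence)

print(no_bounds)    # ['headst', 'rong', 'and', 'much', 'like', 'the', 'Senate']
print(with_bounds)  # ['and', 'much', 'like', 'the', 'Senate']
```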
This is a very basic introduction to regexes. Regexes can become incredibly
complex. For example, something as simple as an email address can require a
doozy of a regex.
```
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
```
One of the most commonly used methods inside of re is the match() method. The match() method takes two arguments, pattern and string, which correspond to the regex and the string to be tested. Despite what its name might suggest, match() does not search the whole string: it only checks whether the regex matches at the very beginning of the string.
In Python, we usually write a regex as a raw string, which we do by placing an r at the beginning of the string, before the opening quotation mark. In a raw string, backslashes are kept exactly as typed, so sequences like \b reach the regex engine intact instead of being interpreted by Python first.
Let us consider the following string.
```python
teststr = "Traveling through hyperspace isn't like dusting crops, boy! Without precise calculations we could fly right through a star or bounce too close to a supernova and that'd end your trip real quick, wouldn't it? What's that flashing? We're losing our deflector shield. Go strap yourself in, I'm going to make the jump to light speed. We've entered the Alderaan system. Governor Tarkin, I should have expected to find you holding Vader's leash."
```
Let's say that we wanted to find all of the full words that were at least four letters long. Our regex for this would look like \b[A-Za-z]{4,}\b. This regex matches sequences of at least four uppercase or lowercase letters (A-Z and a-z) between two word boundaries. We can apply this to the match() method as follows.
```python
import re

matches = re.match(r'\b[A-Za-z]{4,}\b', teststr)
```
Notice how the regex is prefixed with an r. This makes it a raw string, so Python leaves the backslashes alone and passes them through to the regex engine unchanged.10
If the regex matches at the beginning of the string, the match() method returns a Match object describing that match; if it does not, match() returns None. We can get the matched text by looking at the 0th element of the Match object, which returns the entire match.
```python
print(matches[0])
```
```
Traveling
```
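In this case, the match is the word Traveling, which happens to sit at the very start of teststr. There is nothing useful at [1]: higher indices of a Match object refer to capture groups, which this regex does not have, so matches[1] would raise an error. The match() method only ever gives us a match at the start of the string. But what if we wanted to get all of the matches?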
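Because match() only looks at the start of the string, it returns None when the text does not begin with a qualifying word. The sketch below shows this with a made-up sentence and, for contrast, uses re.search(), another function in the re module that scans the whole string and returns the first match it finds anywhere.

```python
import re

pattern = r"\b[A-Za-z]{4,}\b"
text = "It was a dark and stormy night."  # made-up sample

# The first word, "It", is too short, so match() finds nothing at the start.
result = re.match(pattern, text)
print(result)  # None

# re.search() keeps scanning and returns the first match anywhere.
found = re.search(pattern, text)
if found is not None:
    print(found[0])  # dark
```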
This is where the findall() method comes in. The findall() method scans through the entire string and puts every non-overlapping match into a list. Here is the same regex, but using the findall() method instead of the match() method. Again, this regex matches sequences of at least four uppercase or lowercase letters between two word boundaries.
```python
matches = re.findall(r'\b[A-Za-z]{4,}\b', teststr)
print(matches)
```
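One thing to watch for with this regex: an apostrophe is a \W character, so \b finds a word boundary in the middle of a contraction. This sketch runs the same pattern on a fragment of teststr, then tries one possible tweak (our own assumption, not something from the text) that adds the apostrophe to the character class.

```python
import re

snippet = "that'd end your trip real quick, wouldn't it?"

# The apostrophe creates a word boundary, so contractions are cut short.
plain = re.findall(r"\b[A-Za-z]{4,}\b", snippet)

# Allowing ' inside the character class keeps contractions whole.
with_apostrophes = re.findall(r"\b[A-Za-z']{4,}\b", snippet)

print(plain)             # ['that', 'your', 'trip', 'real', 'quick', 'wouldn']
print(with_apostrophes)  # ["that'd", 'your', 'trip', 'real', 'quick', "wouldn't"]
```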
Now, let's suppose that we had a comma-separated file and we wanted to separate the individual elements into a list, without the commas. This is where we would use the split() method. The split() method splits a string at every match and puts everything between the matches into list elements.
10 If normal escaping were in effect, Python would process the backslashes before the regex engine ever saw them. For example, in a normal string "\b" becomes a backspace character rather than a word boundary, and your regex would likely not work correctly.
Let’s suppose that we had the following string, which represents comma-
separated values.
```python
testcommas = "172,Austin Czarnik,czarnau01,29,Eastern,Metropolitan,NYI,Off,C,12,95,86,52.5,5.4,72,61,54.1,7.1,16.9,91.8,108.8,51.7,48.3,11:02,10:22,4,4,-0.1,16,62.5,0:41,10:23,3.5,4.2,1.6,0:27,7.7,14,0,0:11,-11,0,23.7,2,4,6,6,0,0.9,2,0,0,0,4,0,0,10,20,132,11:02,5,2,23,35,39.7"
```
If we call the split() method to split on commas, we will split this string into individual cells without the commas. For this, we will use the regex r',', which matches the literal comma character.
```python
matches = re.split(r",", testcommas)
print(matches)
```
Python has split up the testcommas string at each comma without including any of the commas themselves. Our regular expression matched the literal comma character, so every match of the regular expression was treated as a delimiter.
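Because split() takes a regex rather than a fixed character, the delimiter can be flexible. In this sketch the row is a made-up example with messy spacing; the pattern \s*,\s* (zero or more whitespace characters, a comma, then zero or more whitespace characters) absorbs the spaces along with each comma.

```python
import re

row = "172, Austin Czarnik,czarnau01 , 29"  # made-up row with messy spacing

# Splitting on a bare comma keeps any spaces that sit next to the commas.
bare = re.split(r",", row)

# \s* on both sides of the comma removes the surrounding whitespace too.
trimmed = re.split(r"\s*,\s*", row)

print(bare)     # ['172', ' Austin Czarnik', 'czarnau01 ', ' 29']
print(trimmed)  # ['172', 'Austin Czarnik', 'czarnau01', '29']
```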
Exercise Questions
These exercise questions cover chapter 5.6.2.
Exercise 87
1. What is a regular expression?
2. Why might we want to write a regular expression?
Exercise 88
Write a Python-compliant regular expression that captures the following
expressions.
Exercise 89
1. What are the methods that we introduced for using regular expressions
in Python?
2. Let’s say you were given a paragraph of prose text and you were told to
split it into individual words. What method would you use? You do not
need to write out the regular expression; only provide the method.
3. Let’s say you were given a stanza of poem text and you were told to split
it into individual lines. What method would you use? You do not need
to write out the regular expression.
4. Let’s say you were given a paragraph of prose text and you were told to
find the first instance of a phrase. What method would you use? You
do not need to write out the regular expression.
5. Let’s say you were given a line of delimiter-separated data and you were
told to split the individual data elements into their own elements in a
list. What method would you use? You do not need to write out the
regular expression.
Exercise 90
For the following exercise questions, consider the following text passage.
1. Write a regular expression and choose the correct method to get the
word Rebel from the text.
2. Write a regular expression and choose the correct method to get the
phrase civil war from the text.
3. Write a regular expression and choose the correct method that puts all
words that start with a capital letter in a list.
Exercise 91
For the following exercise questions, consider the following text passage. In-
dented lines are part of the previous line (meaning that there are six total lines).
1. Write a regular expression and choose the correct method to put each
line of the poem into its own list element.
2. Write a regular expression and choose the correct method to put all of
the words that end in ing into a list. This should match words like
napping, tapping, and rapping.
3. Write a regular expression and choose the correct method to put all of
the phrases that start with a quotation mark and end with a quotation
mark into a list. For your convenience, all of the quotation marks have
been converted to standard quotes (instead of smart quotes).
4. Write a regular expression and choose the correct method to put each
phrase of the poem into its own list element. Each phrase starts with a
comma, dash, or period and ends with the next comma, dash, or period.
So, Once upon a midnight dreary would be considered one phrase, while I
pondered is another phrase, weak and weary is another phrase, and Over
many a quaint and curious volume of forgotten lore is another phrase.
Note that newlines are not necessarily an indicator of a new phrase.
125 5.7. RANDOM NUMBER GENERATION
Lava lamps
$$x = (a \cdot x + c) \bmod m$$
where a is the multiplier, c is the increment, and m is the modulus, all of which are constant values. The starting value of x is the seed. Every time a new number is needed, the current value of x is multiplied by a, increased by c, and reduced modulo m, and the result becomes the new value of x. Then, to get the number in the right range, we can use
to get a number, where $r_{\text{upper}}$ is the upper bound of our possible random numbers, $r_{\text{lower}}$ is the lower bound of our possible random numbers, and $o$ is the random number.
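This is a very primitive pseudo-random number generator. There is only a single multiply, add, and modulo step between one number and the next, and in general, generators that mix their internal state more thoroughly produce better random numbers. An LCG's output also has a simple structure: because x can only take m different values, the sequence must start repeating after at most m numbers. Because of this, LCGs are rarely used anymore when high-quality random numbers are needed, but their theory of operation makes them a useful tool for understanding why we get the random numbers that we do.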
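Here is a minimal sketch of the LCG formula above as a Python function, using a for loop (covered in Chapter 6). The function name lcg and the toy constants (seed 7, a = 5, c = 3, m = 16) are assumptions chosen only so the numbers are small enough to check by hand; practical generators use a far larger modulus. It only produces the raw x values and does not attempt the range-scaling step.

```python
def lcg(seed, a, c, m, count):
    """Return `count` values from x = (a * x + c) mod m, starting at `seed`."""
    values = []
    x = seed
    for _ in range(count):
        x = (a * x + c) % m  # one multiply, add, and modulo step
        values.append(x)
    return values

# Toy constants for illustration only.
sequence = lcg(seed=7, a=5, c=3, m=16, count=16)
print(sequence)  # [6, 1, 8, 11, 10, 5, 12, 15, 14, 9, 0, 3, 2, 13, 4, 7]
```

With these constants, the sequence visits every value from 0 to 15 exactly once and lands back on the seed, 7, after 16 steps, so the 17th number would start the same cycle again.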
Thankfully, Python has a simple library that allows us to use a much more
sophisticated pseudo-random number generator. Predictably, this library is
called random. There are three methods inside of random that will be of great
use to us: random(), randint(), and choice().
Let us start with random(). The random() method generates pseudo-random floating-point values from a uniform distribution between 0 and 1; the result can be 0.0 but is always less than 1.0. As opposed to the very simple linear congruential generator that we previously described, random() uses the Mersenne Twister generator, which is much more complex than an LCG and produces higher-quality pseudo-random numbers.
Mersenne Twisters in C
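To use the random() method in Python, we first need to import the random library. Each call to random() produces a new value, so your numbers will differ from the ones shown here.
```python
import random

print(random.random())
print(random.random())
print(random.random())
```
```
0.7592076143986661
0.22557165065717144
0.014348714265228899
```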
More often than not, we will want to generate a random integer. We could use the random() method and convert its output into an integer ourselves, but that's a lot of work. So instead, let's use the randint() method. The randint() method takes two arguments, a minimum and a maximum, and returns a random integer between them; both the minimum and the maximum are possible results.
```python
print(random.randint(1, 10))
print(random.randint(1, 10))
print(random.randint(1, 10))
```
```
3
2
7
```
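The text mentions that we could build a random integer out of random() ourselves. One possible way, shown here as a sketch rather than the book's method, is to scale the result, drop the decimal part with int(), and shift it up by one. Your values will differ on every run.

```python
import random

# random() gives [0.0, 1.0); times 10 gives [0.0, 10.0);
# int() truncates to 0..9; adding 1 shifts the range to 1..10.
manual = int(random.random() * 10) + 1

# randint() does the same job directly, with both endpoints included.
direct = random.randint(1, 10)

print(manual, direct)
```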
Now, let’s suppose that we have a list with two coin sides: heads and tails.
```python
coin = ['heads', 'tails']
```
heads
However, since we have a list already, this is the perfect scenario to use the
choice() method, which picks a pseudo-random choice from a list.
```python
print(random.choice(coin))
```
```
heads
```
If your seed was set to 215, the first random integer that you generate from 1 to 100 should be 6.
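Setting the seed with random.seed() makes the sequence repeatable. In this sketch the seed value 2024 is arbitrary; the point is only that resetting the same seed reproduces the same numbers, whatever they happen to be.

```python
import random

random.seed(2024)  # arbitrary seed for this example
first_run = [random.randint(1, 100) for _ in range(5)]

random.seed(2024)  # same seed again restarts the same sequence
second_run = [random.randint(1, 100) for _ in range(5)]

print(first_run == second_run)  # True
```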
Exercise Questions
These exercise questions cover chapter 5.7.
Exercise 92
1. We introduced two methods for generating pseudo-random numbers.
What are they?
2. Which of the two methods does the random library utilize in Python?
Exercise 93
1. What is a pseudo-randomly generated number?
2. We have been using the term pseudo-random throughout this chapter.
Why?
Exercise 94
1. What is a seed?
2. What is a situation where we might want to set the seed?
3. If we do not explicitly set a seed, what will Python use as the seed?
Exercise 95
1. In a new Python script, import the random library.
2. Create a pseudo-random floating point number from 0 to 1 and place
the result into a new variable called randomfloat1.
3. Print randomfloat1.
4. Create another pseudo-random floating point number from 0 to 1 and
place the result into a new variable called randomfloat2.
5. Print randomfloat2.
6. Sum randomfloat1 and randomfloat2 and put the result into a new
variable called randomsum.
Exercise 96
1. In a new Python script, import the random library.
2. Create a new pseudo-random integer in the range of 0 to 1000 and place
the result into a new variable called dividend.
3. Create another pseudo-random integer in the range of 1 to 100 and place
the result into a new variable called divisor.
4. Print the quotient of dividend ÷ divisor.
5. Print the remainder value of dividend ÷ divisor as an integer.
6. Print the quotient of dividend ÷ divisor without the decimal/remainder
portion.
Chapter 6
Conditional Logic and Loops
Using conditional logic to control the flow of your programs is an important part of programming. If there's something that computers are really good at doing, it's doing things procedurally, and loops and conditional logic are the first step to this procedural approach to computing.
CHAPTER 6. CONDITIONAL LOGIC AND LOOPS 132
This expression can evaluate to either True or False. To manually work out how the Boolean expression will evaluate, we can ask the question: is this statement true? Only consider whether the full statement is true, not a part of the statement.
In the above example, we can ask the question: is 4 greater than 3? Yes, it is! Therefore, the expression will evaluate to True.
Here’s another example.
```python
5 != 6
```
Here, we're using the not-equal-to comparison operator, so the question should be: is 5 not equal to 6? Yes, so the expression will evaluate to True.
If you want to test multiple things at once, you can also do that in Python. There are two main ways to combine tests in programming: "and" and "or". Using "and" requires that all of the Boolean expressions evaluate to True in order for the entire compound Boolean expression to evaluate to True. Using "or" only requires that at least one of the Boolean expressions evaluates to True in order for the entire compound Boolean expression to evaluate to True.
In Python, we write "and" with the keyword and, and "or" with the keyword or. When writing a compound Boolean expression, you can chain multiple single Boolean expressions together, as shown here.1
```python
5 != 6 and 6 != 7
```
The statement shown would evaluate to True, since both of the single Boolean
expressions would evaluate to True: yes, 5 is not equal to 6, and 6 is not equal
to 7. You can also use more than two Boolean expressions.
```python
5 != 6 and 7 != 7 and 8 == 8
```
In this case, the compound Boolean expression would evaluate to False, since the
second condition 7 != 7 evaluates to False, so the entire expression evaluates
to False.
It is also possible to combine and and or. Consider the following expression.
```python
5 != 6 and 7 != 7 or 5 != 6 and 2 != 3
```
If we evaluate the first half of the compound Boolean expression, we can see that it would evaluate to False. However, the second half of the Boolean expression would evaluate to True, since both of its conditions are True. Python evaluates and before or, so the expression is grouped as (5 != 6 and 7 != 7) or (5 != 6 and 2 != 3). The two halves are joined by an or, so only one half needs to evaluate to True for the entire thing to evaluate to True. Essentially, when you use and, only one condition needs to be false for the whole thing to evaluate to False. When you use or, only one condition needs to be true for the whole thing to evaluate to True.
1 Python is relatively special in that it uses the keywords "and" and "or". Most other languages use the symbols && (two ampersands) to mean "and" and || (two vertical pipes) to mean "or."
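The grouping rule is easy to check directly in Python. This sketch compares the expression above with an explicitly parenthesized version, then shows a made-up case where moving the parentheses changes the answer.

```python
# and is evaluated before or, so these two lines mean the same thing.
implicit = 5 != 6 and 7 != 7 or 5 != 6 and 2 != 3
explicit = (5 != 6 and 7 != 7) or (5 != 6 and 2 != 3)
print(implicit, explicit)  # True True

# Parentheses can override the default grouping and change the result.
no_parens = True or True and False     # same as True or (True and False)
with_parens = (True or True) and False
print(no_parens, with_parens)  # True False
```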
Note that some languages have an additional operator called the strict equal-
ity operator, represented by three equal signs ===. In a strict equality, not only
must the values that are being compared match, but so must the datatype.
In JavaScript, another programming language that has the strict equal-
ity operator, you can see how the strict equality can change the result of the
Boolean expression.2
```javascript
0 == '0'   // TRUE
0 === '0'  // FALSE
0 == ''    // TRUE
0 === ''   // FALSE
```
Because the datatypes don't match exactly (the first item is a number, the second is a string), JavaScript evaluates the strict equality to be false, even though the loose equality evaluates to true (in the first line, JavaScript is actually converting the string into a number in order to compare them).
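While this seems useful, Python does not have a strict equality operator. It also needs one less, because Python's == does not convert strings to numbers the way JavaScript's == does: 0 == '0' is simply False in Python. The closest thing to a strict equality in Python is a combination of Boolean expression tests:
```python
a == b and type(a) == type(b)
```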
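Here is a minimal sketch that wraps that combined test in a function. The name strict_equal is hypothetical, made up for this example. It also shows where the type check actually matters in Python: numbers of different types, such as int and float, can be equal by value.

```python
def strict_equal(a, b):
    """Hypothetical helper: mimic JavaScript's === by requiring
    equal values and identical types."""
    return a == b and type(a) == type(b)

print(0 == "0")              # False: Python does not convert the string
print(1 == 1.0)              # True: int and float compare by value
print(strict_equal(1, 1.0))  # False: int vs. float
print(strict_equal(1, 1))    # True
```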
Exercise Questions
These exercise questions cover chapter 6.1.
Exercise 97
1. What is a comparison operator?
2. What can a Boolean expression evaluate to?
3. List the six comparison operators in Python and what they indicate.
Exercise 98
What do the following Boolean expressions equate to in Python?
1. 4 > 3
2. 4 >= 4
3. 4 > 4
4. "Python" == "Is Awesome"
5. 4 < 3
6. "4" == 4
7. int("4") == 4
8. 4 != 4
9. 4 != 5
2 In JavaScript, double forward-slashes indicate a comment, not integer division.
Exercise 99
Write the following Boolean expressions with the correct Python syntax.
1. 4 is equal to 8
2. 4 is not equal to 9
3. The variable k is less than 3 or greater than or equal to 5
4. The variable m is greater than 3 and less than 20
5. The variable p is greater than 3 or less than 0 but is not exactly 1
```python
a = 5
b = 7
if (a < b):
    print("a is less than b")
```
```
a is less than b
```
Let’s break down this chunk of code. In the first two lines, we’re defining and
initializing two integer variables, a and b. Then, we’re creating an if statement.
We can read this if statement as meaning "if a is less than b, then execute this chunk of code." In an if statement, Python will evaluate the Boolean expression after the if keyword (the parentheses around it are optional in Python). If the Boolean expression evaluates to True, then the indented code inside of the if statement is run. If the Boolean expression evaluates to False, then the code inside of the if statement will not run at all. Instead, Python will skip the contents of the if statement and resume running the code that is no longer indented.
We can see how this works here.
```python
c = 7
d = 6
if (c < d):
    print("c is less than d")
print("This is outside of the if statement")
```
How can we tell that the code outside of the if statement will run? Notice how the second print statement (print("This is...) is not indented at the same level as the first print statement (print("c is less...), so it is not part of the if statement and runs whether or not the condition is True.
137 6.2. IF, ELSE, AND ELSE IF STATEMENTS
If the Boolean expression evaluates to True, then the code inside of the
if statement will run, then Python will resume any code outside of the if
statement. Take a look at the following code.
```python
a = 5
b = 7
if (a < b):
    print("a is less than b")
print("This is outside of the if statement")
```
```
a is less than b
This is outside of the if statement
```
The if statement can only dictate whether the code inside of the statement
can run. It cannot affect anything outside of the if statement.
We can also tell Python to test a condition, then run a certain chunk of
code if the condition evaluates to True or another chunk of code if the condition
evaluates to False. In Python, we do this using the if and else statements.
As their names suggest, if will test the condition, and like we’ve seen, we
must provide it with a Boolean expression to test. The else statement will
handle the code that's run if the if statement evaluates to False. Unlike the if statement, the else statement does not take a Boolean expression of its own; it is written simply as else:. Let's take a look at some code.
```python
a = 5
b = 7
if (a < b):
    print("a is less than b")
else:
    print("b is less than a")
```
```
a is less than b
```
```python
c = 7
d = 6
if (c < d):
    print("c is less than d")
else:
    print("d is less than c")
```
```
d is less than c
```
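```python
x = 3
y = 2
z = 5
if (z < x and z < y):
    print("z is the smallest")
elif (y < x and y < z):
    print("y is the smallest")
else:
    print("x is the smallest")
```
```
y is the smallest
```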
This code looks complicated, but let’s break it down. First, we’re declaring
three variables, x, y, and z. Each of these variables is being initialized to
a unique integer value. Next, we’re testing if z is the smallest of the three
variables by comparing it to both x and y. If it is the smallest, then we can
print that it’s the smallest and exit the group of if/else statements. If it’s
not, then we can test a second condition: if y is the smallest. If it is, then
we can print that it’s the smallest and exit the group of if/else statements.
Otherwise, x must be the smallest variable.
If we wanted to test more than two conditions, we can add multiple elif statements. In fact, we can use as many elif statements as we need to test all of
the conditions that we need. Take a look at the following code, which works
similarly to the previous example, but compares four numbers.
```python
x = 3
y = 2
a = 1
z = 5
if (z < x and z < y and z < a):
    print("z is the smallest")
elif (y < x and y < a and y < z):
    print("y is the smallest")
elif (a < x and a < y and a < z):
    print("a is the smallest")
else:
    print("x is the smallest")
```
```
a is the smallest
```
In this example, we’re using two elif statements inside of our if/else state-
ment group. Because these elif statements are written in between the if
and else statements, they will be evaluated within this if/else statement
group. In fact, it is syntactically illegal to have an elif statement outside of
the if/else statement group. The following code is not syntactically valid.
```python
# WRONG
x = 3
y = 2
z = 5
if (z < x and z < y):
    print("z is the smallest")
else:
    print("x is the smallest")
elif (y < x and y < z):
    print("y is the smallest")
```
Remember that within your if/else statement group, the else statement must
be the last thing that’s introduced into the group. You can think of it as the
"catch-all" in the group. You can have as many elif statements as you need,
but your else statement must be the last thing.
We can also test whether an element is present inside of a complex datatype
as covered in Chapter 5. We can directly test whether an element is in a list,
tuple, or set.
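A minimal sketch of such membership tests, using Python's in keyword (and its negation, not in) inside if statements. The variable names and values are made up for this example.

```python
pets = ["dog", "cat", "parrot"]      # list
primes = (2, 3, 5, 7)                # tuple
vowels = {"a", "e", "i", "o", "u"}   # set

if ("cat" in pets):
    print("We have a cat")
if (4 not in primes):
    print("4 is not in the tuple of primes")
if ("e" in vowels):
    print("e is a vowel")
```

The same in test works for all three datatypes, so the if statements look identical no matter which container holds the elements.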
Math Minded?
Exercise Questions
These exercise questions cover chapter 6.2.
Exercise 100
1. What are the three new keywords discussed in this section?
Exercise 101
1. Create an if statement that executes the code inside of it if the numDogs
variable is greater than or equal to 5.
2. Create an if statement that executes the code inside of it if the cats
variable is exactly equivalent to the string "orange tabby".
3. Create an if statement that executes the code inside of it if the isDone
variable is exactly equivalent to False.
Exercise 102
For this exercise, consider the if statement that tests whether the numDogs
variable is greater than or equal to 5.
1. Inside of the initial if statement, create a print statement that outputs
the following string: We have more than five dogs!
2. Create an elif statement that tests whether the numDogs variable is less
than or equal to 3. Inside of this elif statement, create a print statement
that outputs the following string: We have less than three dogs..
3. Create a final else statement that outputs the following string and ex-
plain why this would execute in context of the other if and elif state-
ments: We have exactly four dogs..
Exercise 103
For this exercise, consider the if statement that tests whether the cats
variable is exactly equivalent to the string "orange tabby".
1. Inside of the initial if statement, create a print statement that outputs
the following string: The tabby cat is orange!.
2. Create an elif statement that tests whether the cats variable is exactly
equivalent to the string "grey tabby" and if so, outputs the following
string: The tabby cat is grey!.
3. Create an elif statement that tests whether the cats variable is exactly
equivalent to the string "brown tabby" and if so, outputs the following
string: The tabby cat is brown!.
4. Create a final else statement that outputs the following string and explain why this would execute in context of the other if and elif state-
ments: "The tabby cat is something else".
Exercise 104
Write a short script that asks the user to input a number from 1 to 100.
Test whether that number is between 1 and 100, then tell the user whether the
number is greater than 50, or less than or equal to 50.
Exercise 105
Cast your mind back to when we covered arithmetic. Write the Boolean
expressions to evaluate the following. Write some code to prove your Boolean
expressions, and for each, test the following values for i, j, and k: 39, 43, 44,
50.
1. Whether the variable i is divisible by 5.
2. Whether the variable j is a multiple of 4.
3. Whether the variable k is a divisor of 3.
This range function would produce the list [0, 1, 2, 3, 4]. If we pass in two arguments, then we can specify the beginning and the end of the list. Counting will begin exactly at the first number and end at one less than the second number that you specify. This is the same rule as the one-argument form: when you only pass one argument, Python assumes that the first argument is 0, so range(5) behaves exactly like range(0, 5). Take a look at the following code.
```python
range(2, 6)
```
This range function would produce the list [2, 3, 4, 5]. Notice how there are exactly b − a elements in the list, where a is the first argument and b is the second argument.
If you do not provide a third argument, the second argument must be larger than the first argument to get any values at all, since the counting direction goes up; otherwise, the range is empty. But what if we wanted to produce a list that counted down? Well, we could provide a third argument, the step. The step tells Python how large the steps should be and in which direction they should go while generating the list. The step can be a positive or negative integer, but not zero.
```python
range(2, 6, 2)
```
This range function would produce the list [2, 4]. Notice how the start and end of the range are exactly the same as in the previous range example with two arguments. However, since we've specified the third argument as 2, Python will only use every other element from the previous list. If we had specified the third argument as 3, Python would only use every third element from the previous list, producing [2, 5].
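To see the values a range produces, we can wrap it in list(). This sketch prints the ranges discussed so far, plus one (range(6, 2)) that comes out empty because counting up from 6 never reaches 2.

```python
print(list(range(5)))        # [0, 1, 2, 3, 4]
print(list(range(2, 6)))     # [2, 3, 4, 5]
print(list(range(2, 6, 2)))  # [2, 4]
print(list(range(2, 6, 3)))  # [2, 5]
print(list(range(6, 2)))     # []  (the default step of 1 counts up)
```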
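If your third argument is positive, then your second argument must be greater than your first argument. If your third argument is negative, then your second argument must be less than your first argument. Otherwise, the range will be empty. Consider the following code and the list that it would produce.
```python
range(6, 2, -1)
```
This range function would produce the list [6, 5, 4, 3]: it starts at 6, counts down by 1, and stops before reaching 2.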
Let's break down this for loop. The first thing in this statement is the keyword for, which lets Python know that we want to use a for loop. The next thing is a variable declaration. This variable holds the current item on each pass through the loop; with range(), that also keeps track of how many times we've run the loop. For this, just write the variable name that you want to use. Python will declare a new variable named whatever you specified, so it must follow all of our variable naming rules. The type of the variable depends on the items in the list at the end of your loop definition, which we'll get to later. Next is the keyword in, which lets us know that the variable that we just declared will be iterating through something else. The last thing is the list that we'll be iterating through, followed by a colon. You can also use the variable inside of the loop. Take a look at the following loop:
```python
for i in range(5):
    print(i)
```
```
0
1
2
3
4
```
In this loop, we can see that we're using i as the variable to iterate through this loop. We can then generate a list using our range(), and
according to the range() function documentation, we know that the list that
we’re using is [0, 1, 2, 3, 4]. Since we’re using the range() function, the
variable type of i (our iterator variable) will be an int. Lastly, inside of the
for loop, we are printing the iterator variable with each iteration of the loop.
This is the most common way to run a loop a specified number of times, and it's really useful if you know exactly how many times you'll need to run a loop. For example, say you're writing a program that calculates the average of four numerical grades from 0 to 100. While you could write four input() statements that each put the user's input into its own variable, we could also use a for loop. This would allow us to store every value that the user inputs in a single list. We could do this as shown in the following sample code.
```python
grades = [0, 0, 0, 0]  # declare a new variable of type list
for i in range(4):
    grades[i] = int(
        input("Input grade number " + str(i + 1) + " of 4: ")
    )
average = (grades[0] + grades[1] + grades[2] + grades[3]) / 4
print("Average is", str(average))
```
Again, we can break this code down line-by-line. In line 1 (grades = [0...),
we’re declaring a new variable as a list with four integers in it. Next, we’re cre-
ating a for loop that will iterate through another list of integers [0, 1, 2, 3],
which has been created by our range() function. The current loop iteration
number is stored in the variable i, which is an integer, as decided by the list
from the range() function. Reminder: the iteration variable (like i) will take on the datatype of the items in the iteration list. Inside of our for loop, we're using the input() function to take in four scores. We know that the input()
function will always return a string, so we’re also typecasting the value of the
user input into an integer before storing it in the grades list. You’ll also notice
in the input() function that we’re asking the user for score number i + 1,
instead of i. Again, remember that indices in Python start at 0, not 1. If we
asked for i, we’d be asking the user for score 0, 1, 2, and 3, instead of the more
reasonable request of 1, 2, 3, and 4. Finally, we’re using some arithmetic to
calculate the average of the scores that our user gave us, storing it in the new
variable average, and printing average. In Python 3, the / operator always produces a float, so average will be a float even if the average happens to be a whole number (for example, 80.0 rather than 80).
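To check the division behavior without typing anything in, this sketch replaces the four input() calls with hard-coded grades; the values are made up. It also shows an equivalent, shorter average using the built-in sum() and len() functions.

```python
# Made-up grades stand in for the four input() calls.
grades = [90, 80, 70, 80]

average = (grades[0] + grades[1] + grades[2] + grades[3]) / 4
print("Average is", average)  # Average is 80.0
print(type(average))          # <class 'float'>

# The same average, without writing out every index.
print(sum(grades) / len(grades))  # 80.0
```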
So far, we've only used the range() function to produce a list of integers that would be iterated over in our for loop, and we've been treating the result of range() like a list. So, can we also just pass in a list directly as the sequence for a for loop? Yes, we can! Python allows us to iterate through a list, whether we use a list literal or a variable that contains a list. We also mentioned above that the iterator variable takes on the datatype of the items in the iteration list, and here's where we can put that to use.
Let’s try to create two for loops, one using the range() function and one
with a list literal filled with integers.
```python
for i in range(4):
    print(i)
```
```
0
1
2
3
```
```python
for i in [0, 1, 2, 3]:
    print(i)
```
```
0
1
2
3
```
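If it wasn’t clear what range() was doing before, it should be clear now:
range() gives the for loop the same sequence of integers as the list does, so it’s a
shortcut for writing out each element that we want to print. (Strictly speaking, in
Python 3, range() returns a range object that produces its numbers on demand rather
than an actual list, but a for loop treats the two the same way.) This is really useful
if you need to iterate a bunch of times (100, 1000, or even more!).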
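To see exactly which numbers range() hands to a for loop, we can convert its result into a list with the built-in list() function. The sketch below does that and also checks that looping over range(4) and over the list literal [0, 1, 2, 3] visits the same values; firstFour, fromRange, and fromList are just illustrative names.

```python
# range() returns a range object; list() converts it into an actual list,
# which makes it easy to see exactly which numbers a for loop will visit.
print(range(4))           # range(0, 4)
firstFour = list(range(4))
print(firstFour)          # [0, 1, 2, 3]

# Looping over the range and over the list literal visits the same values.
fromRange = []
for i in range(4):
    fromRange.append(i)

fromList = []
for i in [0, 1, 2, 3]:
    fromList.append(i)

print(fromRange == fromList)  # True
```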
However, we’re not limited to passing in integers in our lists. We can also
pass in other datatypes, including floats and strings.
for i in ["Ronaldo", "Messi", "Neymar"]:
    print(i)

Ronaldo
Messi
Neymar

for i in [1.34, 2.71, 3.14]:
    print(i)

1.34
2.71
3.14
In the first example, we’re iterating through three strings, and in the second
example, we’re iterating through three floats. Both of these are being passed
in as list literals, but there’s nothing preventing us from passing in a variable
with a list in it.
CHAPTER 6. CONDITIONAL LOGIC AND LOOPS 146
cities = ["New York", "Atlanta", "Columbus"]
for i in cities:
    print(i)

New York
Atlanta
Columbus
When iterating through an explicitly defined list (whether it’s a list literal or a
variable with a list), Python will always loop through the list in ascending order of
index number, from the first element to the last. It won’t sort the elements in
alphabetical order or by string length; if you want your Python script to do that,
you’ll need to do some extra work first.
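One way to do that extra work is Python’s built-in sorted() function, which returns a new sorted list and leaves the original list alone. The sketch below reuses the cities list from above; alphabetical and byLength are illustrative names, and the optional key=len argument (our own choice for this example) sorts by string length instead of alphabetically.

```python
cities = ["New York", "Atlanta", "Columbus"]

# sorted() returns a new list in alphabetical order; cities itself is unchanged.
alphabetical = sorted(cities)
for city in alphabetical:
    print(city)  # Atlanta, then Columbus, then New York

# key=len sorts by string length instead (shortest first). Ties keep their
# original order, so "New York" stays ahead of "Columbus".
byLength = sorted(cities, key=len)
print(byLength)  # ['Atlanta', 'New York', 'Columbus']
```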
Exercise Questions
These exercise questions cover chapter 6.3.
Exercise 106
Consider the following for loop declarations. Are they syntactically correct?
Why or why not?
1. for i in range(5):
2. for i in range(1, 5):
3. for i in range(1, 6, 2):
4. for i in 5:
5. for (i in range(5)):
6. for (i in 5):
7. for i in [1, 3, 4, 6]
8. for i in [1, 2, 3, 4]
9. for i in ["1", "2", "3", "4"]
10. for i in ["apple", "strawberry", "banana"]
11. for i in fruits where fruits is a variable with the list of strings:
["apple", "strawberry", "banana"]
Exercise 107
1. Using exactly two lines of code, write a loop that prints the following
lines:
pear
papaya
pomelo
grape
2. Using exactly two lines of code, print the numbers from 1 to 100.
3. Using exactly two lines of code and the variable k as the iterator in your
loop, print asterisks, with each set of k asterisks on its own line,
with the shortest line being just one asterisk and the longest line being
ten asterisks. It should look like this:
*
**
***
****
*****
******
*******
********
*********
**********
4. Using exactly four lines of code and the variable k as the iterator in your
loop, make a diamond out of pound signs/hashtags (#). This means that
you’ll need to add both spaces and pound signs to the beginning and end
of the print statements.
0
1
2
...
97
98
99
As you can see, this code runs exactly 100 times, starting at 0 and ending at
99. Once run is equal to 100, the condition is no longer met, so the code inside
of the while loop is no longer run. If we wanted to print a different range of
numbers, we can just change the run variable initialization and the condition
of our while loop. If we wanted to print from 1 to 100 instead of 0 to 99, we
could run the following code instead.
run = 1
while (run < 101):
    print(run)
    run = run + 1

1
2
3
...
98
99
100
Similarly, if we wanted to change how much run increased with each iteration
of our while loop, we could change the last line in the while loop.
run = 0
while (run < 101):
    print(run)
    run = run + 2

0
2
4
...
96
98
100
In each iteration of our while loop, we’re increasing the value of run by 2
instead of 1, which ends up giving us all of the even numbers from 0 to 100.
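As a sanity check on that claim, the sketch below collects the values the loop visits into a list (evens, an illustrative name) instead of printing them, and compares the result with range(0, 101, 2), which counts from 0 up to, but not including, 101 in steps of 2.

```python
# Collect the values the while loop visits instead of printing them.
evens = []
run = 0
while (run < 101):
    evens.append(run)
    run = run + 2

print(len(evens))                       # 51 values: 0, 2, 4, ..., 100
print(evens == list(range(0, 101, 2)))  # True
```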
As you can see, to change the value of run, we’ve been writing that the
value of the variable should be the value of the variable itself plus some other
number: run = run + n, where n is the number to increase by. Since increasing
(or decreasing) the value of a variable is such a common practice across so
many different applications, Python actually has a shorthand for this exact
purpose: run += n means the same thing as run = run + n (and run -= n means run = run - n).
run = 0
while (run < 100):
    print(run)
    run += 1
Other languages, like C++, C#, Java, and JavaScript, use a double plus sign ++ or a double
minus sign -- instead (variable++ or variable--). These are equivalent to variable += 1 or
variable -= 1 in Python, which has no ++ or -- operator.
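The shorthand isn’t limited to adding 1. The sketch below (with run reused only as an example name) shows that += works with any amount, and that Python has matching shorthands for subtraction and multiplication.

```python
run = 10
run += 5    # same as run = run + 5, so run is now 15
run -= 3    # same as run = run - 3, so run is now 12
run *= 2    # same as run = run * 2, so run is now 24
print(run)  # 24
```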
suggests, while loops are great at testing conditions. It’d be a more realistic
use case to test whether a specific variable is a certain value to determine if
a certain chunk of code should be run again. We can use this property of
while loops to take in an indeterminate number of values. For example,
consider the example from the for loop section where we took in four scores
and computed the average. What if we didn’t know how many scores the user
needed to input? We could use a while loop to just keep taking in new scores
until the user inputs something specific. Take a look at the following chunk of
code.
grades = []
newGrade = 0
while (newGrade != -1):
    newGrade = int(input("Input another grade or -1 to stop: "))
    if (newGrade != -1):
        grades.append(newGrade)
if len(grades) > 0:  # avoid dividing by zero if no grades were entered
    average = sum(grades) / len(grades)
    print("The average is", str(average))
Let’s break this code down line-by-line. In the first two lines, we’re creating
two new variables: grades, which is an empty list, and newGrade, which is an
integer. Next, we’re creating a while loop. We know that a grade can never be
negative, so we can let our user know to use -1 to stop asking for new scores.
Next, we are having our user input the new grade and are putting that value
into the variable newGrade. Then, we’re testing whether newGrade is our special
-1 value to figure out whether we should add the newGrade to the grades list,
as we don’t want to add the -1 value itself. This code will continue asking the
user to input another grade until they enter a -1 value. Finally, it will calculate
the sum of the grades list divided by the length of the grades list, then print
the result.
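Because input() waits for someone to type, the loop above is awkward to try out over and over. The sketch below swaps input() for a hypothetical list, typedEntries, holding the strings a user might have typed, and walks through it with an index (position, also an illustrative name). The rest of the loop follows the same pattern of stopping at the special -1 value.

```python
# Hypothetical stand-in for the user's typing: each string is what
# input() would have returned for one prompt.
typedEntries = ["90", "85", "77", "-1"]

grades = []
newGrade = 0
position = 0  # index of the next typed entry to "read"
while (newGrade != -1):
    newGrade = int(typedEntries[position])
    position += 1
    if (newGrade != -1):
        grades.append(newGrade)  # the -1 itself is never stored

average = sum(grades) / len(grades)
print("The average is", average)  # The average is 84.0
```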
It’s also worth mentioning that it’s possible to create an infinite loop in
Python. Infinite loops can be dangerous, since they can take up all of the
processor cycles and make halting execution difficult. How difficult this is
depends on the specific Python interpreter that you’re using, but it’s a good
idea not to take chances. For example, the following code creates an
infinite loop.
while True:
    print("Hello, World!")

Hello, World!
Hello, World!
Hello, World!
...
This code has nothing that could ever turn the condition false, so this loop will run
forever (or, more precisely, until you stop it yourself, for example by pressing Ctrl+C
in a terminal or using your IDE’s stop button). Avoid writing infinite loops unless you
have a really, really good reason to!
Exercise Questions
These exercise questions cover chapter 6.4.
Exercise 108
1. What is the difference between a for loop and a while loop?
2. If a condition is never met in a while loop, what will happen?
Exercise 109
1. Write an example of an infinite while loop.
2. Write an example of an infinite for loop. Consider what needs to happen
for an infinite loop to occur and replicate this in a for loop.
Exercise 110
Create a small script that asks the user for a sequence of numbers over and
over again, until they put a negative number (like -1). Store all of the results
in a list with the variable name numbers. The end list should look something
like numbers = [33, 48, 9, 6, 83].
Exercise 111
Provide two examples of why you might want to create an infinite loop. For
each example, provide a brief code snippet that demonstrates how you might
use the infinite loop to your advantage, as well as how you would control the
execution to make sure the loop stays controlled. In your code snippet, only
provide the necessary lines to demonstrate the loop itself and how you intend
to control the loop; everything else can just be a comment on what you
would put there.
6.5 Scope
One important concept that’s difficult for many introductory programming students
to grasp is scope. Scope allows us to keep our variables restricted to a
certain part of a program, and being able to effectively use scope to write clean code is an
essential skill in higher levels of programming. Misusing scope or not knowing
how it works is the source of a lot of frustration, and at worst, it can cause
some pretty serious security issues.
Consider the following code.
# MARKER A
for i in range(5):
    # MARKER B
    print(i)
# MARKER C
We can tell what this code will do, but direct your attention to the markers in
each of the comments. Where can we access the variable i? We know that we
can’t use it at marker A, since the variable hasn’t even been created yet. We
know that we can use it inside of the for loop, though; we’ve seen this done
plenty before. But what about marker C? In many other languages, such as C++ and
Java, we couldn’t use i here, since the loop variable only exists inside the loop.
Python is different: a for loop doesn’t create a new scope, so at marker C, i still
exists and holds its last value, 4.
Scope dictates where we can use a variable. In Python, a new scope is created by a
function (or a class), not by a loop or an if statement. A variable created inside of
a function can only be used inside of that function, unless it has been declared to be
a global variable. A variable created inside of a for loop, a while loop, or an if
statement, on the other hand, belongs to the surrounding scope, so it can still be used
after that block ends, even if the block is nested inside of other loops or if statements.
The one catch is that the variable only exists once the line that creates it has actually run.
Consider the following code.
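a = 1
if a < 5:
    b = 7
    c = 3
    while (c < 10):
        c += 1
print(a, b, c)

In the above code, a, b, and c all belong to the same scope, since neither the if
statement nor the while loop creates a new one. That means all three can be used
after the if statement ends, and the last line prints 1 7 10. However, if we changed
the first line to a = 9, the body of the if statement would be skipped, b and c would
never be created, and the last line would fail with a NameError.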
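For the same reason, two statements at the same level can share variables: a
variable created or changed by one loop is still available to the code after it.

a = 0
while (a < 10):
    print(a)
    a += 1
print(a)

In the above example, a is created before the while loop, changed inside of it, and
still usable after it: the last line prints 10, the value that ended the loop. Any later
loop or if statement at the same level could use a, too. Attempting to use a variable
that doesn’t exist in the current scope, such as a variable that was only ever created
inside of a function, results in a NameError, as the variable doesn’t exist outside of
its own scope.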
Reinitialization Warning
Let’s say that you define a variable in the outermost scope level, as shown above.
Then, inside of a function (covered in the next chapter), you want to change that
exact same variable. Normally, assigning to a inside of a function would create a
brand-new local variable named a that only exists inside of that function. You can use
the global keyword to specify that Python should instead use the variable a from the
outermost (global) scope.
a = 0
def someFunction():
    global a
    # do something with a
    ...
In the above function, a refers to the exact same variable inside and outside of the
function, so any change the function makes to a is visible outside of it, too. This
doesn’t matter much for loops and conditional logic, but it does for functions and classes.
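The sketch below illustrates the difference with two hypothetical functions. resetLocally() assigns to counter without the global keyword, so it only creates a local variable and the global counter stays at 0; addOne() declares counter global, so its changes stick. (Without the global line, counter += 1 inside addOne() would raise an UnboundLocalError, because Python would treat counter as a local variable that hasn’t been given a value yet.)

```python
counter = 0  # lives in the global (outermost) scope

def resetLocally():
    counter = 100  # creates a new local variable; the global counter is untouched

def addOne():
    global counter  # use the global counter instead of making a local one
    counter += 1

resetLocally()
print(counter)  # 0
addOne()
addOne()
print(counter)  # 2
```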
Exercise Questions
These exercise questions cover chapter 6.5.
Exercise 112
1. In your own words, define scope.
2. If a variable is “out of scope,” what does that mean?
3. If you try to call a variable that is out of scope, what type of error might
Python throw?
Exercise 113
1. What is global scope?
2. What are some advantages of global scope?
3. What are some disadvantages of global scope?
Exercise 114
For this exercise, consider the following code.
# MARKER A
i = int(input("Input a number: "))
# MARKER B
if (i > 5):
    # MARKER C
    print(i)
else:
    # MARKER D
    if (i == 5):
        # MARKER E
        k = True
    else:
        # MARKER F
        k = False
CHAPTER 7. PYTHON DATA STRUCTURES 158
7.1 Functions
Functions can seem somewhat useless. After all, if we wanted to run the same
code, we could just copy and paste it! However, this isn’t terribly efficient, and
code without all of that repetition is cleaner and easier to read.
Functions can keep you from writing the same code over and over again. Think
back to loops. Sure, we could just copy and paste a certain chunk of code to
be run 100 times, but that’s a lot of code for something that can be done with
literally just one extra line of code. Functions can save you a lot of time when
programming by allowing you to essentially reference another chunk of code
with a single line of code.
The easiest way to think of functions in programming is as functions in alge-
bra. When you first learned about mathematical functions, you were probably
told that something goes in and something (probably different) comes out. For
example, let’s look at the function y = 2x. This is pretty simple: the input
value x is multiplied by 2, and the output is y. If you put in the value 0, then
you’ll get 0 out, but if you put in anything else, it will be different. The input
4 will yield the output 8. The input 6 will yield the output 12. The input a will
yield the output 2a.
Let’s consider a more complex function: y = 2x + 3z − 9. In this function,
we have two coefficients (2 and 3, next to the x and z, respectively), a constant
(−9), and an output (y). If we give the values 5 for x and 42 for z, we get an
output of 127 (127 = 2 × 5 + 3 × 42 − 9). We could also set a default value,
like 0, for x or z, in case the user doesn’t tell us what x or z should be. So if we gave
only the value 5 for x, then we can assume that z should be 0, so we would get
an output of 1 (1 = 2 × 5 + 3 × 0 − 9).
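Here’s how that same function might look in Python, as a preview of syntax covered later in this chapter. The name computeY is made up for this sketch, and z=0 is how Python expresses the default value of 0 described above.

```python
# y = 2x + 3z - 9, where z falls back to 0 if it isn't given.
def computeY(x, z=0):
    return 2 * x + 3 * z - 9

print(computeY(5, 42))  # 127
print(computeY(5))      # 1
```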
Programming functions aren’t that dissimilar in principle, but their inner
functionality is a little bit more complicated than Algebra I functions. In pro-
gramming, a function can, but doesn’t always have to, take something in. Sim-
ilarly, a programming function can, but doesn’t always, have an output. Unlike
math functions, functions in Python do need to have their own unique names.
$$x = \frac{-b \pm \sqrt{b^2 - 4 \times a \times c}}{2 \times a}$$
In Python, a, b, and c would be referred to as parameters. You should make a
mental note that parentheses () are almost always associated with arguments
of a function.
There is a difference between arguments and parameters. Parameters are
put into a function’s definition, while an argument is the actual value passed
in during a function call. So, if we defined a, b, and c to be our parameters,
then the real values 4, 8, and 3 would be the arguments for the parameters.
Essentially, you define parameters in a function definition and arguments in a
function call.
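A small sketch of the distinction, using a hypothetical function discriminant() that computes the b² − 4 × a × c part of the quadratic formula: a, b, and c are the parameters, and 4, 8, and 3 are the arguments.

```python
# a, b, and c are parameters: the names listed in the definition.
def discriminant(a, b, c):
    return b ** 2 - 4 * a * c

# 4, 8, and 3 are arguments: the actual values passed in by the call.
print(discriminant(4, 8, 3))  # 16, since 8 ** 2 - 4 * 4 * 3 = 64 - 48
```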
Similarly, the x in the quadratic equation above would be the output. In
programming, the output of a function is called the return value. It’s im-
portant that you don’t confuse your return with the print() function. In the
past, when we’ve referred to “output,” it has typically meant that we were printing
something to the console. However, now we’re using “output” to mean
something different. It’s possible to return a value from a function without
printing it.
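The two hypothetical functions below compute the same value but hand it over differently. doubleAndReturn() returns its result, so the caller can store and reuse it; doubleAndPrint() only prints it, so the caller gets back Python’s special value None.

```python
def doubleAndReturn(x):
    return 2 * x   # hands the value back to the caller; prints nothing

def doubleAndPrint(x):
    print(2 * x)   # shows the value on the console; returns nothing

stored = doubleAndReturn(4)   # nothing appears on the console
shown = doubleAndPrint(4)     # 8 appears on the console
print(stored + 1)             # 9, because stored holds the number 8
print(shown)                  # None, because doubleAndPrint() returned nothing
```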
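In Python, we need to give each of our functions a name. Giving our functions a
name allows us to refer to the functions later. In this book, we name our functions
in camel case (like quadraticFormula), using letters and occasionally digits. You’ll
also see many Python programmers join lowercase words with underscores (like
quadratic_formula), which is the style recommended by Python’s official style guide,
PEP 8. Dashes, however, can never appear in a name, since Python would read them as
minus signs. Like variables, it’s important that your functions are named something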
concise yet descriptive. Use your experience in naming variables to create good
function names!
The contents of a function can manipulate any of the values passed in
as arguments. Inside of the function, any of the variables that you listed as
parameters have already been declared and initialized, so you can just start
using them!
Let’s look at a function definition. In this function, we’re going to calculate
x from the quadratic formula, as shown above.
from math import sqrt  # sqrt() comes from Python's math module

def quadraticFormula(pol, a, b, c):
    negB = b * -1
    top = 0
    if (pol == "plus"):
        top = negB + sqrt(b ** 2 - 4 * a * c)
    elif (pol == "minus"):
        top = negB - sqrt(b ** 2 - 4 * a * c)
    bottom = 2 * a
    x = top / bottom
    return x
Notice how the first thing in this function is def. This lets Python know
that we’d like to define a new function. The def is short for definition, and
writing it lets Python know that the next thing is the function’s name, followed
by the parameters. Our function quadraticFormula takes four parameters:
the polarity of the ± in the numerator (pol), a (a), b (b), and c (c). Inside of
the function, we’re creating a new variable, negB, which can’t be accessed outside of
the function and holds the negative value of the parameter b. Next, our
function calculates the top and the bottom of the fraction. The top uses the
polarity parameter to determine whether we should add or subtract the square
root. Finally, we divide the top by the bottom and return that value. This
function doesn’t print anything on its own. Instead, if we want to see its result,
we need to print the value returned by the function call, which we’ll get to in the next section.
This is also the perfect opportunity to examine how we can use functions
inside of other functions. Take a look at the lines that compute top. Inside,
we see that most of the line is just simple arithmetic, but there’s also another
function call: sqrt(). sqrt() is a function from Python’s math module (so it needs
to be imported, for example with from math import sqrt, before we can use it) that
calculates the square root, and we’re able to use this function inside of our own
function. In Python, we can actually use any function inside of our own functions,
including other functions that we’ve made ourselves. It’s even possible for a function
to call itself, a technique called recursion, although this technique is out of the
scope of this book.
You can also see in the above code that we’re returning x, which is a float, since
/ always produces a float. However, we don’t need to just return a number; we can return any
type of data that we want to, including strings, Boolean values, or compound
datatypes, like lists and dictionaries. Let’s look at a slightly simpler function
that takes in a list as an argument and returns a simple Boolean variable. This
function will test whether the list is longer or shorter than 10 elements. If it’s
longer than 10 elements, it will return True, otherwise it will return False.
def isLongerThan10(listToTest):
    if (len(listToTest) > 10):
        return True
    else:
        return False
In the previous example, we can see that our function name is isLongerThan10
and that it takes one parameter, listToTest. Inside of the function, we see
that we’re using the variable listToTest that came in as our argument.
We don’t need to declare or initialize this variable, as it is a parameter of the
function that already has a value. Within our function, we’re running an if
statement to determine whether we should return True or False.
When we return a value from a function, we usually want to do something with it
in our function call, such as storing it in a variable. We’ll see how to do this when
we learn about function calls in the next section.
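Here is a sketch of isLongerThan10() in use, rewritten slightly: because len(listToTest) > 10 already evaluates to True or False, the function can return that comparison directly instead of using an if statement. The two versions behave the same; the if/else version above just spells out each case. shortList and longList are example names.

```python
def isLongerThan10(listToTest):
    # The comparison is already True or False, so it can be returned directly.
    return len(listToTest) > 10

shortList = [1, 2, 3]
longList = list(range(11))  # 11 elements: 0 through 10

print(isLongerThan10(shortList))  # False
print(isLongerThan10(longList))   # True
```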
Functions that return something are called returning functions. It is also
possible to create a function that doesn’t return anything. Functions that don’t
return anything are called void functions. (Strictly speaking, a Python function
without a return statement hands back the special value None.) When writing a void
function, you don’t need to specify any return value. The function can still take
arguments, but it doesn’t have a return value. For example, say you wanted to write a function
that printed out an ASCII cow.1
        (__)
        (oo)
  /-------\/
 / |     ||
*  ||----||
   ^^    ^^
We could write each print statement out, line-by-line, every time we wanted to
print out a cow. This would take some time, though, if we wanted to be able to
print out a cow whenever we wanted to. Instead, we could write a void function
that prints a cow, but that doesn’t return anything. Again, remember that the
print() function and the return value from a function are two different things.
def cow():
    print("        (__)")
    print("        (oo)")
    print("  /-------\\/")
    print(" / |     ||")
    print("*  ||----||")
    print("   ^^    ^^")
In the above example, there’s not a single return statement like we’ve seen
in the previous examples that return floats, integers, or Boolean values. In-
stead, calling this function will just run the code inside of the function, literally
printing a cow to the console.
Void functions can also take arguments. Our previous example was named
cow(), and it only printed a cow, but we could create another function named
animals() that printed one of three animals, which would be passed in as an
argument of type string.
def animals(animal):
    if (animal == "cow"):
        print("        (__)")
        print("        (oo)")
        print("  /-------\\/")
        print(" / |     ||")
        print("*  ||----||")
        print("   ^^    ^^")
    elif (animal == "pig"):
        print("    n..n")
        print("e___(oo)")
        print("    (____)")
1 Reminder: Watch out for backslashes: you need to escape each backslash by adding another backslash in front of it, so \ is written as \\ inside of a string.
As you can see, the animals() function takes in one parameter, animal. Again,
this function has no return value; it’s a void function, even though it takes in
an argument.
        (__)
        (oo)
  /-------\/ - moo
 / |     ||
*  ||----||
   ^^    ^^

                          |\_/|,,_____,~~`
                          (.".)~~     )`~))
ring-ding-ding-ding-ding-  \o/\ /---~\\ ~))
                             _//    _// ~)
In this function, we’re not returning anything, so there’s no return value to
store in a variable. But what if we are
returning something? In this case, we need to do something with the return
value. We can either pass it into another function, such as print(), or we can
put its result into a variable for later use.
Consider the following function, which finds the y-value, given a slope, a
y-intercept, and an x-value.
def linear(m, x, b):
    result = m * x + b
    return result
We can then call this function, passing in the three arguments specified and
either printing the result (that is, passing it into another function) or storing it in
a variable, since the function does return something.
print(linear(7, 4, 4))
print(linear(3, 6, 8))
result1 = linear(3, 9, 2)
result2 = linear(6, 2, 2)

32
26
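The values stored in result1 and result2 aren’t printed by the code above, but they can be used later like any other variable. The sketch below repeats the definition of linear() so that it runs on its own, then combines the stored results; total is an illustrative name.

```python
def linear(m, x, b):
    result = m * x + b
    return result

result1 = linear(3, 9, 2)  # 3 * 9 + 2 = 29
result2 = linear(6, 2, 2)  # 6 * 2 + 2 = 14

# Stored return values can be reused later, like any other variable.
total = result1 + result2
print(total)  # 43
```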
Some IDEs also have the ability to gather special bits of information from
your function definition that can be useful in larger projects. This is typically
done inside of a block comment just after the function definition (in Python, this
block is called a docstring). You should consult your IDE’s documentation to learn
how your IDE can help you with your function declarations.
In Anaconda Spyder, the comment format is as follows. You can have Spyder
automatically generate the format of the block comment by typing your opening
quotes directly after the function definition.
While this block doesn’t change what the function does, it’s easier for you to read,
and Python stores it as the function’s docstring. As a bonus, any time you click on a
function call in your code and press Ctrl/Cmd + I, Spyder will show you
exactly what arguments you need to pass in its integrated help window! This
feature and its specific functionality are specific to Spyder.2
Other IDEs might handle the block comment slightly differently. repl.it, a
popular online IDE, doesn’t prepopulate your block comment with anything,
but it will show anything in your block comment if you hover over the function
call, so this is a great place to put what this function does, what its parameters
are, and what it returns. A simple template is provided for you here, if your
IDE doesn’t provide you with one.
"""
Description: Write a description here for what your function does.
Parameters:
    arg1: (int) Description of the parameter
    arg2: (string) Description of the parameter
Returns:
    returnVar: (float) Description of what the function returns
"""
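Regardless of IDE, Python itself keeps this block as the function’s docstring, which is what help(linear) displays and what the function’s __doc__ attribute holds. The sketch below fills in the template for the linear() function from earlier; the descriptions and types are our own wording.

```python
def linear(m, x, b):
    """
    Description: Computes y = m * x + b for a straight line.
    Parameters:
        m: (float) Slope of the line
        x: (float) The x-value to evaluate
        b: (float) The y-intercept of the line
    Returns:
        result: (float) The y-value at x
    """
    result = m * x + b
    return result

# Python keeps the block as the function's docstring.
print(linear.__doc__)
```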
If your IDE supports a specific template, it will auto-generate it once you create
your block comment after your function definition. So, you can simply define
2 As of Spyder v.4.1.4
Exercise Questions
These exercise questions cover chapters 7.1.1 and 7.1.2.
Exercise 115
1. What is a function?
2. What is the difference between a definition and a call?
3. What are function parameters?
4. What are function arguments?
5. What is the difference between a function parameter and argument?
Exercise 116
1. Is there a keyword to define a function? If so, what is it?
2. Is there a keyword to call a function? If so, what is it?
Exercise 117
1. What is a function return?
2. What is the difference between a function return and a print?
3. Is it possible for a function to not return anything?
4. If a function does not return anything, what can we call it?
Exercise 118
1. In a new Python script, define a function called bobross() that has no
parameters and prints the following string: I thought today we would
make a happy little stream that’s just running through the woods
here. Here we’re limited by the time we have. You can bend
rivers. But when I get home, the only thing I have power over
is the garbage. Isn’t that fantastic that you can create an
almighty tree that fast?
2. What is the function call for your function? (This isn’t a trick question!)
Exercise 119
1. In a new Python script, define a function called got() that has one
parameter language.
2. If the user passes in the string dothraki as the argument, print Shieraki
gori ha yeraan!.3
3. If the user passes in the string valyrian as the argument, print Skoros
morghot vestri?.4
4. If the user passes in the string english as the argument, print Winter
is coming..
5. If the user passes in no string or anything other than dothraki, valyrian,
or english, print a message that the user’s argument is invalid.
6. Write four function calls: one for each of the languages and one with a
nonexistent language.
Exercise 120
1. In a new Python script, define a function called got() that has one
parameter language.
2. If the user passes in the string dothraki as the argument, print either
Yer affesi anna or Me zisosh disse.5
3. If the user passes in the string valyrian as the argument, print either
Valar morghulis or Valar dohaeris.6
4. If the user passes in the string english as the argument, print either Our
fathers were evil men. All of us here. They left the world
worse than they found it. We’re not going to do that. We’re
going to leave the world better than we found it. or Give my
regards to the Night’s Watch. I’m sure it will be thrilling.
And if it’s not, it’s only for life..
5. If the user passes in no string or anything other than dothraki, valyrian,
or english, print a message that the user’s argument is invalid.
6. Write four function calls: one for each of the languages and one with a
nonexistent language.
5 The Dothraki quotes translate to You make me itch and It’s just a flesh wound, respectively.
6 The Valyrian translates to All men must die and All men must serve.
so you don’t have to type in every argument, every single time. Python sup-
ports default function parameters when we define our function, and this can be
extremely useful for quickly calling commonly used functions.
You have already taken advantage of default function arguments when you
have called functions in the past, like the print() function. You already know
how to use the print() function: you put a string argument in it and it prints
it to the console. But what if we pass in several values and want to change the
separator that goes between them? We can choose to specify the sep argument. What if we
want to change how the print() function ends a line? We can choose to specify
the end argument. sep and end have default values that will be used if these
arguments are not specified when called. Because of these default arguments,
the following two lines are equivalent.
print("Cats" + " and dogs", sep=" ", end="\n")
print("Cats" + " and dogs")
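The difference shows up once print() gets more than one value. In the sketch below, sep changes what goes between the values, and end changes what’s printed after the last one; leaving them out uses the defaults, a single space and a newline.

```python
# sep replaces the single space that print() normally puts between values.
print("Cats", "dogs", sep=" and ")    # Cats and dogs

# end replaces the newline that print() normally adds at the end,
# so the next print() continues on the same line.
print("Cats", end=" & ")
print("dogs")                         # Cats & dogs

# Leaving both out uses the defaults: sep=" " and end="\n".
print("Cats", "and", "dogs")          # Cats and dogs
```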
This means that whenever we need to call the print() function, we don’t
need to write out every single possible argument, only the ones that we want
to change from the default. When you’re writing your own functions, you
can also choose default values to be used if the function call doesn’t specify
the argument. We do this by initializing the parameter in the function definition.
That is, when we list a parameter in the function’s definition, we can also give it
a default value to be used if the function
call doesn’t specify something else. Consider the following function, which
calculates the percentage of faceoffs won, given the number of faceoffs won and
the number of faceoffs lost (this could be used to help analyze hockey or lacrosse
data).
    if (fowins + folosses) == 0:
        return 0.0
    else:
        return float(fowins / (fowins + folosses))
Python was expecting two arguments in our function call, and since neither had a
default value, this results in an error. Now, let’s redefine this function, but let’s
initialize fowins and folosses to 0 in our definition line to assign a default
value of 0 to each of these arguments.
def fopct(fowins=0, folosses=0):
    """
    Calculates faceoff win percentage.

    Parameters
    ----------
    fowins : int
        Number of faceoffs won.

    folosses : int
        Number of faceoffs lost.

    Returns
    -------
    float
        The faceoff percentage from 0 to 1.
    """

    if (fowins + folosses) == 0:
        return 0.0
    else:
        return float(fowins / (fowins + folosses))
Now, if we were to not specify either fowins or folosses when we called the
function, the function would use the default values of 0 and 0 for fowins and
folosses, respectively.
print(fopct())

0.0
In the above function, if the player argument is not specified, the function will
use the default value of "" (an empty string). However, if it is specified, the
function will use the value that is passed in.
Let’s look at some examples of this function in action. First, let’s consider
if we don’t pass in any of the required arguments.
printfopct()
We still get an error that we’re missing required arguments. However, notice
how even though we specified three possible arguments, Python is only erroring
out on two arguments that are considered required because we never specified a
default value in our function’s definition. The player argument was initialized
to "", since the argument was never specified.
Now, let’s consider if we only pass in the required arguments.
printfopct(132, 71)
Again, player was initialized to "", which was used for the remainder of the
function, since we didn’t specify a value for it in our function call. What if we
specify a value for player?
printfopct(132, 71, "TD Ierlan")
In this case, we specified that we don’t want to use the default value of "" for
player and instead specified our own player name ("TD Ierlan").
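The following hypothetical function, describeFaceoffs(), sketches the same idea: fowins and folosses are required, while player defaults to an empty string. It returns a message instead of printing one, and rounding to three decimal places is our own choice for this sketch.

```python
def describeFaceoffs(fowins, folosses, player=""):
    pct = round(fowins / (fowins + folosses), 3)
    if player == "":
        # player wasn't given, so the default "" is used
        return "Faceoff percentage: " + str(pct)
    return player + "'s faceoff percentage: " + str(pct)

print(describeFaceoffs(132, 71))               # Faceoff percentage: 0.65
print(describeFaceoffs(132, 71, "TD Ierlan"))  # TD Ierlan's faceoff percentage: 0.65
# describeFaceoffs() would raise a TypeError about missing required arguments.
```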
Exercise Questions
These exercise questions cover chapter 7.1.3.
Exercise 121
1. What is a default parameter?
2. How do we define a default parameter in a function definition?
Exercise 122
1. When calling a function with a default parameter, how do we choose to
use the default parameter?
2. When calling a function with a default parameter, how do we choose to
overwrite the default parameter and use our own argument?
Exercise 123
1. In a new Python script, define a function called got() that has one pa-
rameter language. Set the default parameter for language to valyrian.
2. If the user passes in the string dothraki as the argument, print either
Yer affesi anna or Me zisosh disse.7
3. If the user passes in the string valyrian as the argument, print either
Valar morghulis or Valar dohaeris.8
4. If the user passes in the string english as the argument, print either Our
fathers were evil men. All of us here. They left the world
worse than they found it. We’re not going to do that. We’re
going to leave the world better than we found it. or Give my
regards to the Night’s Watch. I’m sure it will be thrilling.
And if it’s not, it’s only for life..
5. If the user passes in anything other than dothraki, valyrian, or english,
print a message that the user’s argument is invalid.
6. Write five function calls: one for each of the languages, one with no
argument, and one with a nonexistent language.
In the printfopct() function, we know that the first argument is fowins, then
folosses, then player, and whenever we’ve called it before, we’ve always
used this order. But what if we wanted to change the order? Let’s say we
wanted to specify the player first. This is where we can use keyword arguments
to specify exactly which value is which.
printfopct(player="Max Adler", fowins=81, folosses=85)
7 The Dothraki quotes translate to You make me itch and It’s just a flesh wound, respectively.
8 The Valyrian translates to All men must die and All men must serve.
CHAPTER 7. PYTHON DATA STRUCTURES 172
The function call still works, even though we passed the arguments out of
order, because we named the parameter that each value belongs to. This works
for required and optional parameters alike.
You can also mix positional and keyword arguments. You’ve already done
this, actually! Whenever you’ve called the print() function and passed in
an optional argument, such as end or sep, you’ve mixed positional and keyword
arguments. The values you want to print are always passed in as positional
arguments (print() does not accept them as keyword arguments, so writing
print(value="Hello, World") raises a TypeError), while optional arguments like
end are passed by keyword. We could write
print("Hello, World", end="!\n")
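The sketch below reuses a hypothetical fopct_message() stand-in for printfopct() (not the book’s definition) to show keyword arguments in any order, a mixed call, and the one ordering rule Python enforces: once you pass an argument by keyword, every argument after it must also be passed by keyword. The last line shows print() mixing positional text with the sep and end keyword arguments.

```python
def fopct_message(fowins, folosses, player=""):
    """Return a faceoff-percentage message (hypothetical stand-in for printfopct)."""
    total = fowins + folosses
    pct = fowins / total if total > 0 else 0.0
    return f"{player}: {pct:.3f}" if player else f"{pct:.3f}"


# Keyword arguments: the order no longer matters.
by_keyword = fopct_message(player="Max Adler", fowins=81, folosses=85)

# Mixed call: positional arguments first, then keyword arguments.
mixed = fopct_message(81, 85, player="Max Adler")
print(by_keyword == mixed)  # True

# A positional argument after a keyword argument is rejected before the code
# even runs, so we ask compile() to check the call without executing it.
try:
    compile('fopct_message(player="Max Adler", 81, 85)', "<example>", "eval")
except SyntaxError:
    print("SyntaxError: positional argument follows keyword argument")

# print() mixes both kinds: the text is positional, sep and end are keywords.
print("Hello", "World", sep=", ", end="!\n")  # Hello, World!
```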
Exercise Questions
These exercise questions cover chapter 7.1.4.
Exercise 124
1. What is a positional argument?
2. What is a keyword argument?
3. What is the difference between positional and keyword arguments?
4. Can you mix positional and keyword arguments in one function call?
Provide an example to prove your point.
Exercise 125
1. Write a function definition for a new function called students(). Leave
the body of the function empty.
2. Edit your function definition to take three arguments: print, classroom,
and idnum.
3. Set the default parameter for print to be True. Set the default param-
eter for classroom to 100, and idnum to be 0.
4. What would happen if you called students() with no arguments? Would
there be errors? What behavior would you expect? What would be
7.2 Classes
However, this requires us to use two more variables than just storing the infor-
mation in a single string. We could even try to use a dictionary, but we run
into the same representation issue.
date1 = {"year": 1980, "month": 1, "day": 2}
date2 = {"year": 1980, "month": "January", "day": 2}
175 7.2. CLASSES
Strictly speaking, both of these are correct, but code written to expect one
format would break on the other unless we managed to handle every single
possible format. This is where classes come in.
Now, let’s look at a class definition that contains the function definition from above.
class Player:
    """
    A player on a lacrosse team.
    """

    def fopct(self):
        """
        Calculates faceoff win percentage.

        Returns
        -------
        float
            The faceoff percentage from 0 to 1.
        """

        if (self.fowins + self.folosses) == 0:
            return 0.0
        else:
            return float(self.fowins / (self.fowins + self.folosses))
As you can see, the process of defining classes is remarkably similar to that
of functions. We have a keyword (class) followed by the name of the class
(in this case, Player). More importantly, though, we have actually created a
new datatype: the Player datatype. That’s right: creating a class effectively
creates a datatype that you can then use throughout your script. Let’s say that
we were tabulating lots of athlete information; the Player datatype that we
just created can help us organize the data within our script much more tidily.
Consider the Player class above. We can create a new object of type Player
by calling the class as if it were a function.
ratlisc01 = Player()
Now, we have an object called ratlisc01 that we can assign attributes to.
Attributes are characteristics of an object that store information about that
object. You can think of an attribute as a variable inside of a variable. For
example, the ratlisc01 object might have some attributes name, gamesplayed,
and groundballs that are specific to the player Scott Ratliff.
ratlisc01.name = 'Scott Ratliff'
ratlisc01.gamesplayed = 9
ratlisc01.groundballs = 19
Since we set the name attribute for the ratlisc01 object, we can now
use it as if it were a regular variable.
print(ratlisc01.name)
Scott Ratliff
Putting information inside of classes lets you abstract away the details of
your program. By hiding the really gritty stuff behind a tidy facade
(essentially what classes allow you to do), you can write much cleaner code.
More importantly, if we’re consistent with our attribute names, we can get the
data for similar objects easily. If we know that we’re always creating an
attribute Player.name, then we know that we can find the player’s name in that
attribute for any object of type Player.
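Here is a small sketch of why consistent attribute names pay off. Every object below gets a name attribute, so one loop can read them all. The class is the bare minimum needed for the example, and the second player (player2, "Example Player") is made up for illustration.

```python
class Player:
    """A player on a lacrosse team (bare-bones version for this sketch)."""


ratlisc01 = Player()
ratlisc01.name = "Scott Ratliff"
ratlisc01.groundballs = 19

# Hypothetical second player, for illustration only.
player2 = Player()
player2.name = "Example Player"

roster = [ratlisc01, player2]

# Because every object has a name attribute, one loop handles them all.
for p in roster:
    print(p.name)

# Attributes belong to individual objects: only ratlisc01 has groundballs.
print(hasattr(ratlisc01, "groundballs"))  # True
print(hasattr(player2, "groundballs"))    # False
```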
Exercise Questions
These exercise questions cover chapter 7.2.1.
Exercise 126
1. What is a class?
2. When you create an object of a class, what type is the object?
Exercise 127
1. What is an attribute?
2. To call an attribute of an object, what symbol do you use?
3. What is the difference between a variable and an attribute?
4. What is the difference between a key and an attribute?
Exercise 128
1. What is the Python keyword to create a class?
2. When you create a class in Python, can the class take any parameters
like a function can?
3. How can we determine what is inside of the class?
If we wanted to get Joe Nardella’s faceoff percentage, we already have his win
and loss figures in the object with his information. They’re in the fowins and
folosses attributes inside of the nardejo01 object, so all we need to do is call
the method on his object.
print(nardejo01.fopct())
0.5746606334841629
Notice how we didn’t need to pass the fowins or folosses attributes to the
fopct() method. The method reads that information from the attributes that we
assigned to the nardejo01 object. But how did we tell the method that we wanted
to be able to access attributes inside of that method? Take a careful look at
the parameters in the method definition.
# DO NOT RUN
class Player:
    def fopct(self):
        ...  # method body omitted
Instead of giving the method individual arguments, we give the method access
to the entire object and let the method pick and choose attributes from that
object. Inside a method, we refer to that specific object through the self
parameter. (self isn’t actually a Python keyword; it’s the name that, by
convention, we give to a method’s first parameter.) That tells Python that we
don’t just want any player. We specifically want Joe Nardella’s faceoff data
when we call fopct() on the nardejo01 object. In this situation, self refers to
the nardejo01 object itself. More broadly, self represents the specific object
that the method is being called on.
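One way to see what self does: calling a method on an object is shorthand for calling the method on the class and passing the object in as the first argument. The object name demo and its numbers are hypothetical, chosen so the result is easy to check.

```python
class Player:
    """A player on a lacrosse team."""

    def fopct(self):
        """Return faceoff win percentage from 0 to 1."""
        total = self.fowins + self.folosses
        return 0.0 if total == 0 else self.fowins / total


demo = Player()  # hypothetical player for this sketch
demo.fowins = 3
demo.folosses = 1

# Usual form: Python passes demo in as self automatically.
print(demo.fopct())        # 0.75
# Equivalent form: pass the object in as self yourself.
print(Player.fopct(demo))  # 0.75
```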
So, if we make a new object for Conor Gaffney, we could call the same
fopct() on his object representation, and the fopct() method would take his
faceoff attributes.
gaffnco01 = Player()
gaffnco01.name = "Conor Gaffney"
gaffnco01.fowins = 9
gaffnco01.folosses = 14
print(gaffnco01.fopct())
0.391304347826087
Exercise Questions
These exercise questions cover chapter 7.2.2.
Exercise 129
1. What is a method?
2. Where are methods defined for a class?
3. What is the keyword to create a class method?
4. What happens if you try to call a class method on an object that the
method is not a part of?
Exercise 130
1. What does self do inside of a class method?
Exercise 131
Consider the object celery in the hypothetical Vegetable class.
1. What is the type of celery?
2. The Vegetable class has a method called weight() that takes the weight
of a vegetable and prints a string with the weight and the units. How
would you call weight() on celery?
3. If you called the weight() method on celery, what keyword refers to
the celery object inside of the class?
Exercise 132
1. Define a new class House.
2. Inside of the House class, create a new method address() that prints
the street attribute, the city attribute, the state attribute, and the
zip attribute, all concatenated together.
3. Outside of the House class, create a new object of type House and put
the object into a new variable called whitehouse.
4. Assign the whitehouse object the following attributes:
street: 1600 Pennsylvania Avenue NW
city: Washington
state: District of Columbia
zip: 20500
5. Call the address() method on whitehouse. What is the output?
Exercise 133
1. Define a new class TVShow.
2. Inside of the TVShow class, create a new method avgrating() that takes
the ratings attribute, which is a list of ratings out of five stars, and returns
the average rating.9 As a reminder, the average is calculated by summing
all of the elements and dividing the sum by the number of elements.
3. Outside of the TVShow class, create a new object of type TVShow and put
the object into a new variable called atlanta.
4. Create a new attribute for atlanta called ratings and put in five inte-
gers inside of a list into the ratings attribute.
9 Note that this is a return, not a print.
5. Create a new attribute for atlanta called title and put in Atlanta as
a string into the title attribute.
6. Create a new object of type TVShow and put the object into a new variable
called squidgame.
7. Create a new attribute for squidgame called ratings and put in five
integers inside of a list into the ratings attribute.
8. Create a new attribute of squidgame called title and put in Squid Game
as a string into the title attribute.
9. Print the title attribute from atlanta along with its rating attribute.
10. Print the title attribute from squidgame along with its rating at-
tribute.
We know that there are at least two attributes in the Player class: Player.fowins
and Player.folosses. We can see that we’re using these attributes in the
fopct() method, and if we were to call fopct() on an object of the Player
class that doesn’t have these attributes, we’ll get an error. For example, con-
sider the ratlisc01 object that we defined above. Again, object initialization
has been provided below for your convenience.
Notice how this object doesn’t actually have a fowins or a folosses attribute;
we never created these attributes, so if we call the fopct() method on it, the
method will try to read attributes that don’t exist, and Python will raise an
AttributeError.
print(ratlisc01.fopct())
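The sketch below shows what that error looks like, using the version of Player without any default attributes. Python raises an AttributeError that names the first missing attribute the method tries to read (here, fowins); hasattr() is one way to check for an attribute before relying on it.

```python
class Player:
    """A player on a lacrosse team (no __init__ yet)."""

    def fopct(self):
        total = self.fowins + self.folosses
        return 0.0 if total == 0 else self.fowins / total


ratlisc01 = Player()
ratlisc01.name = "Scott Ratliff"
ratlisc01.gamesplayed = 9
ratlisc01.groundballs = 19

try:
    print(ratlisc01.fopct())
except AttributeError as err:
    # The message names the missing attribute, fowins.
    print("AttributeError:", err)

print(hasattr(ratlisc01, "fowins"))  # False
```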
class Player:
    """
    A player on a lacrosse team.
    """

    def __init__(self, fowins=0, folosses=0):
        self.fowins = fowins
        self.folosses = folosses

    def fopct(self):
        """
        Calculates faceoff win percentage.

        Returns
        -------
        float
            The faceoff percentage from 0 to 1.
        """

        if (self.fowins + self.folosses) == 0:
            return 0.0
        else:
            return float(self.fowins / (self.fowins + self.folosses))
Notice how the parameters in the __init__ definition are different from the
object’s attributes. Sure, the attribute and the parameter have the same name,
but the attribute belongs to the object, while the parameter is a local
variable that exists only while __init__ is running. So, in our initialization
method, we need to assign the parameters (fowins and folosses) to the object
attributes (self.fowins and self.folosses).
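A quick sketch of that difference: the parameters exist only while __init__ runs, but the attributes stored on self stay with the object afterward. The object name p is just for this example.

```python
class Player:
    """A player on a lacrosse team."""

    def __init__(self, fowins=0, folosses=0):
        # fowins and folosses here are local parameters of __init__.
        # Copying them onto self makes them attributes of the new object.
        self.fowins = fowins
        self.folosses = folosses


p = Player(fowins=5)
print(p.fowins, p.folosses)  # 5 0

# Outside __init__, the parameter names no longer exist.
try:
    print(fowins)
except NameError:
    print("fowins was local to __init__; use p.fowins instead")
```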
Now, our Player class has a default constructor that will initialize
fowins and folosses to 0 if they are not set when the object is created. Let’s
consider several ways to change these values by creating some more objects.
First, let’s create an object called witheja01, which will be a Player object
for the player Jake Withers. For this example, we will not pass any arguments
when we create the object.
1 witheja01 = Player()
2 print(witheja01.fopct())
0.0
When we ran line 1 to create the Player object and put it into witheja01, the
object was created like before, but the __init__ method was also automatically
run, which set the fowins and folosses attributes to 0. Because these attributes
already exist and hold valid values, we can call the fopct() method on the
witheja01 object and get a valid output of 0.0.
Now, let’s change the attributes in the witheja01 object.
witheja01.name = "Jake Withers"
witheja01.fowins = 117
witheja01.folosses = 98
Now, when we call the fopct() method again, we’ll be using the updated
values.
print(witheja01.fopct())
0.5441860465116279
Now, let’s bypass the default values and just initialize the values when we create
the object. We’ll create a new object adlerma01 and initialize the values for
fowins and folosses directly in our object initialization.
adlerma01 = Player(fowins=81, folosses=87)
print(adlerma01.fopct())
0.48214285714285715
Because we passed the values as keyword arguments that match the parameter
names in our __init__ method, we can immediately start using these values.
We can also change these values just like we changed the values for witheja01,
by reassigning the attribute value. For example, we got the folosses attribute
wrong - it should be 86, not 87. We can just reassign it, and while we’re at it,
let’s also give adlerma01 a name attribute.
adlerma01.folosses = 86
adlerma01.name = "Max Adler"
print(adlerma01.fopct())
0.48502994011976047
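To recap the ways of setting these values, here is a compact sketch using the same constructor: positional arguments, keyword arguments in a different order, a partial call that falls back on a default, and a later reassignment. The variable names are made up for the example.

```python
class Player:
    """A player on a lacrosse team."""

    def __init__(self, fowins=0, folosses=0):
        self.fowins = fowins
        self.folosses = folosses

    def fopct(self):
        """Return faceoff win percentage from 0 to 1."""
        total = self.fowins + self.folosses
        return 0.0 if total == 0 else self.fowins / total


by_position = Player(81, 86)                 # fowins=81, folosses=86
by_keyword = Player(folosses=86, fowins=81)  # same values, any order
partial = Player(fowins=10)                  # folosses falls back to 0

print(by_position.fopct() == by_keyword.fopct())  # True
print(partial.fopct())                            # 1.0

# Attributes can still be changed after the object is created.
partial.folosses = 10
print(partial.fopct())                            # 0.5
```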
Exercise Questions
These exercise questions cover chapter 7.2.3.
Exercise 134
1. What is a default constructor?
2. What is the Python keyword to create a default constructor?
3. What symbols surround the keyword that create a default constructor?
Exercise 135
1. Consider the variable size and the attribute banana.size. Is size
referring to the same thing in these two calls? Why or why not?
2. What type of error might we get if we try to call a nonexistent attribute
of a class?
3. Suppose we create a default constructor for a class, but we don’t set any
default values for a class attribute that was defined in the constructor.
Then, we try to create a new object without defining the attribute in
our object creation. That is, we do something like this.
class ClassName:
    def __init__(self, attribute1):
        self.attribute1 = attribute1
newObject = ClassName()
Exercise 136
1. Define a new class Food.
2. Define a default constructor inside of the Food class with the name and
calories attributes. Do not set a default value for either name or
calories.
3. Create a new object of type Food with the name attribute Carrot and
the calories attribute 65, and put the object into a new variable called
carrot.
4. Create a new object of type Food with the name attribute Banana and the
calories attribute 100, and put the object into a new variable called
banana.
5. Attempt to create a new object of type Food with the name attribute
Soup, but do not give a calories attribute, and put this object into a
new variable called soup. Do you get an error message? If so, present
the error message. If not, what is the value of calories?
Exercise 137
1. Define a new class Farm.
2. Define a default constructor inside of the Farm class with the name and
the crop attributes. Set the default crop to be corn, but do not set a
default for the name.
3. Without creating a new object, which attribute is required? Which
attribute is not required?
4. Create a new object of type Farm inside of the new variable sunset.
Set the name of the farm to be Sunset Farms and set the crop to be
strawberries.
5. What is the value of sunset.name? Where did this value come from?
6. What is the value of sunset.crop? Where did this value come from?
7. Create a new object of type Farm inside of the new variable myers. Set
the name of the farm to be Myers Family Farms, but do not set the
crop.
8. What is the value of myers.name? Where did this value come from?
9. What is the value of myers.crop? Where did this value come from?
x is odd
This is a contrived example for the sake of demonstration, but the idea still
holds: conditional logic statements in Python can be nested. The same concept
holds for Python classes. For example, let’s consider the Player class from the
previous section. It is printed below for your convenience.
class Player:
    """
    A player on a lacrosse team.
    """

    def fopct(self):
        """
        Calculates faceoff win percentage.

        Returns
        -------
        float
            The faceoff percentage from 0 to 1.
        """

        if (self.fowins + self.folosses) == 0:
            return 0.0
        else:
            return float(self.fowins / (self.fowins + self.folosses))
This class has one method in it that returns the faceoff win percentage. But,
let’s say we wanted to split the first and last name apart and have a method
that puts them together. We could just have two separate attributes for first
and last name, but we could also make a nested class called Name that has its
own concatenation method that returns the full name. Consider the following
script.
class Player:
    """
    A player on a lacrosse team.
    """

    class Name:
        """
        A name of a person.
        """
        def __init__(self, first="", middle="", last=""):
            self.first = first
            self.middle = middle
            self.last = last

We still have our fopct() method in our class, but we also now have a nested
class called Name with its own method fullname(). Because fullname() is
defined inside of the Name class, calling fullname() on a Player object would
raise an AttributeError, but we can call it on a Name object. How would this
look?
Let’s consider how we’ve always seen attributes and methods before. When
something is an attribute or a method of an object, it’s been prefaced by that
object’s name and a period. This is a one-level nest, but we can have multi-level
nests, too, like our updated Player class now has. So, let’s create a new Player
object that stores a name inside of a Player.Name object.
1 simondr01 = Player()
2 simondr01.name = Player.Name()
3 simondr01.name.first = "Drew"
4 simondr01.name.last = "Simoneau"
5 simondr01.fowins = 31
6 simondr01.folosses = 43
Let’s break this down. Line 1 should look very similar by now. We’re simply
making a new Player object and assigning it to the simondr01 variable. Inside
of the Player class, there is a nested class called Name that has three attributes:
first, middle, and last. The Name class also has a default constructor that
sets any attribute we don’t pass in to an empty string. On line 2, we created a
Name object and placed it inside of the name attribute of the simondr01 object.
Then, we assigned the values "Drew" and "Simoneau" to the first and last
attributes of our Name object inside of our Player object. Finally, we assigned
fowins and folosses to the Player object, just as before.
If we want to refer to something stored directly in the Player object (which
we’ll call the outer object), we can just reference that attribute on the
Player object. If we want to refer to something inside of the Name object
(which we’ll call the inner object), we first go through the Player object’s
name attribute to reach it.
We can call the fopct() method, just like before.
print(simondr01.fopct())
0.4189189189189189
Now, however, we can also call the fullname() method on the Name object stored
inside of the Player object.
print(simondr01.name.first)
print(simondr01.name.last)
print(simondr01.name.fullname())
Drew
Simoneau
Drew Simoneau
In this chunk, we’re accessing the first and last attributes of the Name
object, then calling its fullname() method. Take note of the two periods needed
to get to the nested object. The period between simondr01 and name gets the
Name object stored in the Player object, and the period between name and first
gets the attribute stored in that Name object.
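The listing of the nested Name class above stops before its fullname() method. Here is one possible complete version, offered as a sketch: the fullname() body is an assumption (it joins whichever name parts are non-empty with single spaces), and the book’s own implementation may differ.

```python
class Player:
    """A player on a lacrosse team."""

    class Name:
        """A name of a person."""

        def __init__(self, first="", middle="", last=""):
            self.first = first
            self.middle = middle
            self.last = last

        def fullname(self):
            """Return the full name (assumed behavior: skip empty parts)."""
            parts = [self.first, self.middle, self.last]
            return " ".join(part for part in parts if part)

    def fopct(self):
        """Return faceoff win percentage from 0 to 1."""
        total = self.fowins + self.folosses
        return 0.0 if total == 0 else self.fowins / total


simondr01 = Player()
simondr01.name = Player.Name(first="Drew", last="Simoneau")
simondr01.fowins = 31
simondr01.folosses = 43

print(simondr01.name.fullname())  # Drew Simoneau
print(simondr01.fopct())

# fullname() lives on Name objects, not on Player objects.
print(hasattr(simondr01, "fullname"))  # False
```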
Exercise Questions
These exercise questions cover chapter 7.2.4.
Exercise 138
1. What is a nested class?
2. Give two reasons why we might want to nest a class.
Exercise 139
Exercise Questions
These exercise questions cover chapter 7.3.
No exercise questions exist for this section yet.
File Handling
In chapter 4.2, we went over variables, and it was noted that "all data that
is worked on in variables is stored in memory when you’re running a Python
script." "Since all of your variables are stored in memory, they can only persist
while the program itself is running. After your program is terminated, the
memory space is marked as free by the operating system, meaning that any
other program is now free to overwrite that memory." But what if we want to
persist data between sessions? The simplest way to do this is by interacting
with files that are stored at the secondary or tertiary levels. Python can both
read and write those files in several different ways, each of
which has its own advantages and disadvantages.
CHAPTER 8. FILE HANDLING 194
195 8.1. READING FILES
In this code, we’re opening the file named fileToOpen.txt in read mode, as
indicated by the second argument of the open() function. We’re storing the
file object that the open() function returned in the variable file. By itself,
file isn’t very useful, so we need to read() the contents of the file into
fileContents. We can then do whatever we’d like to fileContents. Because the
file was opened in read mode, no changes will or can be made to the original
fileToOpen.txt.
However, if you look at fileContents in the interactive shell without using
print(), you’ll probably notice that everything appears on one line, with the
line breaks shown as \n. Nothing has been discarded: read() returns the entire
file as one string, newline characters included, and print(fileContents)
displays it with its original line breaks. If you’d rather work through the
file one line at a time, you can use the readline() method instead of the
read() method. readline() reads just one line from the file object. Consider
the following code.
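Here is a small self-contained check of what read() returns: the whole file as one string, with the newline characters still in it, which repr() makes visible as \n. To avoid depending on a file you may not have, the sketch first writes a two-line sample file (the name sample.txt is made up) into a temporary folder; writing files is covered in the next section.

```python
import os
import tempfile

# The with block creates a temporary folder and deletes it when the block ends.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.txt")  # hypothetical sample file

    # Create a small two-line file to read back.
    out = open(path, "w")
    out.write("This is a file\nIt has stuff in it\n")
    out.close()

    file = open(path, "r")
    fileContents = file.read()
    file.close()

print(repr(fileContents))        # the \n line breaks are still in the string
print(fileContents)              # print() shows them as real line breaks
print(fileContents.count("\n"))  # 2
```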
file = open("fileToOpen.txt", "r")
print(file.readline())
This code would actually only read the first line of the text file, which might
be useful, but you probably want the entire text file. Instead, we can run
readline() multiple times.
file = open("fileToOpen.txt", "r")
print(file.readline())
print(file.readline())
This code will read the first two lines of the text file. Again, this might be useful
if you know that you only have two lines. However, it’d probably be easier to
read the entire file line-by-line. We can do this by iterating through the file
object using a for loop, where the iteration variable is any name of your choosing
and the sequence to iterate through is the file object itself. Each item that the
file object hands to the loop is one line of the file.
Consider the following code.
file = open("fileToOpen.txt", "r")
for i in file:
    print(i)
This will print each line in the file. We could also store the contents of the
file in a list. Let’s say that you had a list of words, with one word per line.
You could read your list of words line-by-line and put each word into the next
element of your list.
file = open("fileToOpen.txt", "r")
listOfWords = []
for i in file:
    # rstrip("\n") drops the newline character that ends each line
    listOfWords.append(i.rstrip("\n"))
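One detail worth knowing: each line that readline() or the loop hands you still ends with its newline character. That is why print(i) shows a blank line between lines (print() adds a newline of its own), and why words read from a file carry a trailing \n unless you remove it. The sketch writes a hypothetical words.txt into a temporary folder first so it can run anywhere.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "words.txt")  # hypothetical word list
    out = open(path, "w")
    out.write("apple\nbanana\ncherry\n")
    out.close()

    file = open(path, "r")
    first = file.readline()   # "apple\n": readline() keeps the newline
    second = file.readline()  # "banana\n": each call continues where the last stopped
    file.close()

    file = open(path, "r")
    listOfWords = []
    for line in file:
        # rstrip("\n") removes only the trailing newline.
        listOfWords.append(line.rstrip("\n"))
    file.close()

print(repr(first), repr(second))
print(listOfWords)  # ['apple', 'banana', 'cherry']
```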
When you’ve finished working with a file, it’s important to close the file. Even
if you open the file in read mode, closing it releases the resources that the
operating system set aside for the open file, and it’s especially important when
editing and writing to files, since closing makes sure everything you wrote is
actually saved. Regardless of how you’re working with a file, you should
get into the habit of closing a file when you’re done with it. To close the
file, use the close() method on the file object.
file.close()
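You can check whether a file is still open with the file object’s closed attribute. Python also offers the with statement, which closes the file automatically when its indented block ends, even if an error occurs inside it. The sketch shows both approaches, again using a hypothetical sample.txt in a temporary folder.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.txt")  # hypothetical sample file
    out = open(path, "w")
    out.write("This is a file\n")
    out.close()

    # Closing by hand.
    file = open(path, "r")
    print(file.closed)  # False: still open
    file.close()
    print(file.closed)  # True

    # Closing automatically with a with statement.
    with open(path, "r") as file2:
        contents = file2.read()
    print(file2.closed)  # True: closed when the with block ended
```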
It’s worth noting that Python will look for the file in the current working
directory. If you’re using a local IDE, like Anaconda Spyder, you can run the
command pwd in its IPython console to see that directory (in a plain Python
shell, run import os and then os.getcwd() instead), then put your file in that
same directory in order for it to be found by your Python script. You can also
navigate around from your current working directory by using relative filepaths:
. means your current directory and .. means one directory up. Alternatively,
you can specify an absolute path. An absolute path is one where the entire
filepath is written, from the root to the file itself. If ./fileToOpen.txt is
the relative filepath, the absolute filepath might be
/Users/Guest/Downloads/fileToOpen.txt. You should be able to view the
properties of a file in your operating system to view its absolute filepath.
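In plain Python (outside an IPython console, where pwd isn’t available), the os module gives you the same information. The sketch prints the current working directory and turns the relative path ./fileToOpen.txt into an absolute one; it only builds the path strings, so the file does not need to exist.

```python
import os

cwd = os.getcwd()  # the current working directory
print(cwd)

relative = os.path.join(".", "fileToOpen.txt")
absolute = os.path.abspath(relative)  # joins the relative path onto cwd
print(absolute)

print(os.path.isabs(relative), os.path.isabs(absolute))  # False True

# ".." refers to the folder one level up from the current one.
print(os.path.abspath(".."))
```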
If you’re using an online IDE, like repl.it, you can just reference the file
by name unless it’s in a subfolder. When using online IDEs, your file is al-
most always stored in the same directory as your Python script. If your file
is in a subfolder from your script, you can just specify the subfolder before
naming the file, along with a forward slash. For example, if your file is in the
TestDocuments directory, you can specify that your file should be read from
TestDocuments/fileToOpen.txt.
Exercise Questions
These exercise questions cover chapter 8.1.
Exercise 140
1. What function did we use to access files in Python?
2. What does the r stand for in the following line?
Exercise 141
For this exercise, consider the following text which is stored in a file called
officequote.txt.
Exercise 142
1. Download the movies.txt file from the textbook materials and save it
to your Downloads folder.1
2. Write some Python code to read the file from your Downloads folder and
put the file object into a variable called movies.
3. Write some Python code to print just the first line of the movies.txt
file.
4. Write some Python code to put each line of the movies.txt file into a
list element, then print the first 50 elements of that list.
5. Write some Python code to print the entire file.
1 https://pythonforscientists.github.io/data/data/movies.txt
This is a file
It has stuff in it
Something to add to the end
As you can see, we’re opening the file that we’ve specified in append mode.
We’re then using our read() function to read the current contents of the file
before adding a new line to the end of the file using the write() method on the
file object. Like we mentioned above, it’s important to close your file once
you’re done with it, but it’s extra important when you’re writing to a file.
Until you close it, some of what you wrote may still be waiting in a buffer
rather than saved to the file, so other programs may not see your changes yet.
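A runnable sketch of append mode follows. If you try this yourself, note one detail: a file opened with plain "a" can only be written to, so calling read() on it raises an error. The "a+" mode allows reading as well, but the file position starts at the end, so you would need seek(0) before reading. To keep things simple, the sketch reads the file back in a separate step; sample.txt is a hypothetical file in a temporary folder.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.txt")  # hypothetical file

    # Start with two lines of content.
    file = open(path, "w")
    file.write("This is a file\nIt has stuff in it\n")
    file.close()

    # "a" (append) mode: new text goes after the existing text.
    file = open(path, "a")
    file.write("Something to add to the end\n")
    file.close()

    # Read the result back in a separate step.
    file = open(path, "r")
    contents = file.read()
    file.close()

print(contents)
```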
Overwriting a file works in very much the same way. In fact, all we’re going
to do is change the file mode.
This is a file
It has stuff in it
The code is identical, except for the file mode on the first line. However,
if we were to open this file in a text editor, we’d find none of its old
contents - they’ve all been overwritten by the new line we wrote.
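The matching sketch for overwrite mode: opening an existing file with "w" empties it immediately, before anything is written, so only the new text survives. As before, sample.txt is a hypothetical file in a temporary folder.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.txt")  # hypothetical file

    file = open(path, "w")
    file.write("This is a file\nIt has stuff in it\n")
    file.close()

    # Reopening with "w" truncates the file to zero length first.
    file = open(path, "w")
    file.write("Something to add to the end\n")
    file.close()

    file = open(path, "r")
    contents = file.read()
    file.close()

print(contents)  # only the new line remains
```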
Exercise Questions
These exercise questions cover chapter 8.2.
Exercise 143
1. What is a situation where we would want to open a file in read-only
mode?
2. What is a situation where we would want to open a file in read-write
mode?
Exercise 144
For each of the following questions, provide the code you used to achieve
the answer.
1. Using Python, create a new file in overwrite mode. Name the file
candy.txt. Write the text Snickers in the file, then close the file.
Exercise 145
1. Download the pokemon.txt file from the textbook materials and save it
to your Downloads folder.2
2. Using Python, read pokemon.txt line-by-line into a list called pokemon.
3. There is a very popular Pokémon missing from the list. Who is it? (Hint:
Ryan Reynolds voiced this Pokémon in a 2019 movie.)
4. Add this missing Pokémon to this pokemon list.
5. Write out a new file to your Downloads folder called allpokemon.txt
that includes the entire pokemon list, including the one you just added.
Each Pokémon should be on its own line.
2 https://pythonforscientists.github.io/data/data/pokemon.txt
201 8.3. DIFFERENT KINDS OF FILES
Exercise Questions
There are no exercise questions for chapter 8.3.
3 Please don’t take this as an endorsement to start faking your own .gpx files and uploading
them to Strava. You’ll probably get flagged, and that’s not my problem.
Chapter 9
Jupyter Notebooks
Jupyter Notebooks are a very useful data analysis tool, especially among data
scientists and Python programmers, since they allow you to run code inline
with your document. There are many tools that allow you to create and edit
Jupyter Notebooks, such as Google Colab or Anaconda.
CHAPTER 9. JUPYTER NOTEBOOKS 204
When you’re writing in a Jupyter Notebook, you can either write in a markdown
block or a code block. Your markdown will only be rendered, not executed. Even
if you specify that a chunk of code in your markdown should have Python syntax
highlighting, it will not be executable. Essentially, markdown blocks are for
display only. Code blocks, however, can be edited and executed: you can run
any of the Python code inside of a dedicated Jupyter Notebook code block.
205 9.2. BASIC MARKDOWN SYNTAX
Jupyter Notebooks support standard markdown syntax, such as what you might
use on an Internet forum or on GitHub in .md files (like READMEs or Con-
tributing files). If you’ve never worked with markdown before, it’s not too
difficult. Markdown is just a way to format text using ordinary plain-text
symbols. When writing markdown, you’ll typically write it in a
plain-text form, then open the same document in a markdown renderer. When
you’re working with Jupyter Notebooks, your notebook will be your markdown
renderer.
The most basic markdown element is paragraph text. Paragraph text is
written with no extra or special symbols.
You can add additional emphasis styling to your paragraph text, just like
you would in any other word processor: italics and bold. To italicize something,
use single asterisks to mark the beginning and end of the italicized area. To bold
something, use double asterisks to mark the beginning and end of the bolded
area. You can also combine them (three asterisks on each side) to make text both
italicized and bolded. To start a new paragraph, leave a blank line between
blocks of text, since a single line break won’t start a new line in the
rendered output. Emphasis symbols cannot span multiple paragraphs, so if you
want more than one paragraph to be emphasized, you need to write a separate set
of symbols for each paragraph.
// C++ syntax highlighting
std::cout << "Hello, World!" << std::endl;
std::cout << "This is C++!" << std::endl;
1
The exact coloring and style of your Jupyter Notebook code will depend on
the renderer that you’re using.
At the top of your project and to divide each of the subsections, you might
want to include a header. There are six header sizes, which are determined by
how many # symbols are added before your heading text.
1 These colors have been simulated.
You can also include links in your markdown. To create a hyperlink, wrap the link
text in brackets [] followed by the URL in parentheses ().
We’re writing code in [Python](https://www.python.org/).
The best way to get to know how to use and write markdown is to just write
more text in markdown. Before long, it will become second nature!
Exercise Questions
These exercise questions cover chapter 9.2.
No exercise questions exist for this section yet.
Exercise Questions
These exercise questions cover chapter 9.3.
No exercise questions exist for this section yet.
Chapter 10
In all honesty, Python makes a bad web development language and an even
worse desktop development language.1 However, one of Python’s biggest strengths
is in scientific computing and in statistics. If you’ve used R before, you’ll find
that Python isn’t that much different. In fact, both R and Python are built on
the C programming language! It does use a different set of packages, but you’ll
find that with a little bit of statistical translation, you can do everything that
you could do with a dedicated statistics programming language, such as R or
SAS.
1 This is not a fact, the author just really hates Python for desktop and web development.
CHAPTER 10. DATA ANALYSIS WITH PANDAS 210
While R has much of its functionality baked right into Base-R (as a dedicated
statistics language), Python requires you to load in the appropriate libraries.
The hands-down most popular library is Pandas, and it’s an incredibly powerful package that’s appropriate for all sorts of data analysis. Pandas itself is built on top of NumPy (pronounced Num-Pie, not num-pee), and it works alongside many other packages, like TensorFlow, Scikit-Learn, Matplotlib, Seaborn, and many others. Pandas is also widely used in financial services, like Robinhood, Quandl, Morningstar, and Google Finance.
Pandas’s Etymology
Take your mind back to chapter 8, when we were working with a new library
called csv. Well, like csv, Pandas is also a library. In fact, we can use the same
import statement as we used with csv.
import pandas
However, when you look at most code that uses Pandas, you’ll notice that
it’s aliased. That means that the library name pandas has been assigned a
nickname pd that can be used to reference library methods at any point in that
script. The common pandas alias is pd, and we can create a library alias by
using the Python keyword as.
import pandas as pd
Now, we can access Pandas’s methods by using pd. For example, instead of typing out pandas.DataFrame(), we can just call pd.DataFrame().
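A quick way to convince yourself that an alias is only a nickname is to import the library both ways and compare the two names. This is a minimal sketch that assumes Pandas is installed; df1 and df2 are throwaway names used only for this illustration.

```python
import pandas
import pandas as pd

# Both names point at the very same module object in memory.
print(pd is pandas)  # True

# So pd.DataFrame and pandas.DataFrame are the same class.
df1 = pd.DataFrame()
df2 = pandas.DataFrame()
print(type(df1) is type(df2))  # True
```

Because the alias is just a second name for the same module, the choice between pandas and pd only affects how much you type, not what the code does.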
Exercise Questions
These exercise questions cover chapter 10.1.
Exercise 146
Consider the following block of code.
import pandas
df = pandas.DataFrame()
What are the keywords? How are these two code chunks the same? How are
they different?
Exercise 147
1. What is a library alias?
2. At least three aliases were introduced to you in this section. Give some
examples of aliases.
3. Why shouldn’t you use any alias other than pd for Pandas?
Exercise 148
1. Import a hypothetical library called Dragon with the alias dr.
2. Import a hypothetical library called LongLibraryNameGoesHere with the alias longlib.
3. Suppose there were a method inside of the Dragon library called breath().
How could you refer to the breath() method if the Dragon library were
aliased with dr?
10.3 Series
Series aren’t used nearly as much as dataframes are in Python data analysis.
Series are most akin to lists in Python, but they differ in the way that the data
is stored and the methods that are available to use. However, it is possible to
typecast from a Python list to a Pandas series and vice-versa.
That being said, series are still important to cover, as they are the fun-
damental building blocks of dataframes, which we’ll cover in the next section.
Cast your mind back to when we looked at lists. A list might look like the
following.
var1 = [56, 52, 63, 38]
If we look at the type of this list, we see that it’s of type list.
type(var1)
Now, let’s make this Python list into a Pandas series. Remember, a list looks
like a series.
var1 = pd.Series(var1)
type(var1)
<class 'pandas.core.series.Series'>
We have actually typecast the var1 object from a Python list into a Pandas
series. Observe how both series and lists are one-dimensional. However, what
makes series special is how their indices are inherent and modifiable.
To make this series, we used the Pandas Series method. Because Series is part of the Pandas library, we need to call Series through Pandas, but as shown, we’re using its alias pd instead of typing out pandas. We could just as easily type out the full library name pandas instead of pd.
var1 = pandas.Series(var1)
However, since the pd moniker is so well known, we’ll just use its alias.
In order to make a series, we will always use the Series method from the Pandas library. If we wanted to create an empty series, we would just pass nothing into the method.
emptySeries = pd.Series()
emptySeries is a series, but it contains no elements. We can also pass in a single list, as we did above.
var2 = pd.Series([122, 139, 185, 115])
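The ways of making a series described so far can be checked side by side, together with the conversion back to a list. This is a minimal sketch assuming Pandas is installed; series1 and back_to_list are names made up for this illustration.

```python
import pandas as pd

var1 = [56, 52, 63, 38]                 # an ordinary Python list
emptySeries = pd.Series()               # a series with no elements
var2 = pd.Series([122, 139, 185, 115])  # a series built straight from a list
series1 = pd.Series(var1)               # typecast the existing list to a series

print(len(emptySeries), len(var2))      # 0 4
print(isinstance(series1, pd.Series))   # True

# The conversion also works in the other direction: series back to list.
back_to_list = series1.tolist()
print(back_to_list)                     # [56, 52, 63, 38]
```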
If we had printed the list before we typecast var1, we would have seen something like this.
[56, 52, 63, 38]
It prints just like any other list, complete with square brackets and commas.
However, when we print our series, we see something a little bit different.
print(var1)
0    56
1    52
2    63
3    38
dtype: int64
In our second column, we see the values that we typecast from our list into our
series, but in the first column, we also see index values, starting at zero. Using
this index, we can actually extract data from the series just as we did when we
were working with lists. We will use square brackets to indicate which index
we want.
print(var1[1])
52
We could also create our own indices for a series. Recall how we made our
var1 series. We passed in a list only. In order to create an index, we can also
pass in the argument index, which should be of type list.
var3 = pd.Series([3, 2, 2, 3], index=['a', 'b', 'c', 'd'])
215 10.3. SERIES
Now, if we attempt to look at var3, we’ll notice how it still has our values in
the second column, but the first column has the index that we specified as a
list.
a    3
b    2
c    2
d    3
dtype: int64
Make sure that your index has the same number of elements as the data that you are putting into the series. If the lengths don’t match, Pandas will raise a ValueError.
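To see what a mismatched index looks like in practice, the sketch below deliberately passes four values but only three index labels and catches the resulting error. It assumes Pandas is installed; the variable caught exists only so the result can be inspected afterwards.

```python
import pandas as pd

caught = None
try:
    # Four values but only three index labels: the lengths don't match.
    pd.Series([3, 2, 2, 3], index=['a', 'b', 'c'])
except ValueError as err:
    caught = err
    print("ValueError:", err)
```

The exact wording of the message varies between Pandas versions, but the exception type is a ValueError rather than a SyntaxError, because the code itself is valid Python.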
Getting data out of a series with an explicitly set index is similar to how we get data out of a dictionary. We still use square brackets, and we put in the index label that we defined. For example, if I wanted the element of the var3 series whose index is c, I could just refer to it by the string 'c'.
print(var3['c'])
2
Because the label c is a string, we had to put it inside of quotes within our square brackets, similar to how we need to put string keys of a dictionary into quotes.
After learning how to change the index, you may have realized that a series can act like either a list or a dictionary! If not, now you know. When we created a series without an explicit index, we could just refer to each element by its index number.
print(var1[1])
52
1 3
Let’s look at a more concrete example. Consider this list which has been typecast to a series.
speeds = pd.Series([84, 93, 66, 89, 58, 59])
If we wanted the element at index n of the speeds series, we would just put n inside the square brackets.
n = 2  # any index from 0 to 5 works here
print(speeds[n])  # prints 66
Now, let’s consider a list which has been typecast to a series, but which has an
explicit index.
agility = pd.Series([90, 90, 96, 86],
                    index=['Cristiano Ronaldo', 'Lionel Messi', 'Neymar', 'Luis Suarez'])
In order to get any of the data out of the agility series, we need to know the
indices or we need to just print the entire series.
print(agility)
print(agility['Neymar'])
Cristiano Ronaldo    90
Lionel Messi         90
Neymar               96
Luis Suarez          86
dtype: int64
96
If we don’t want to create a dictionary-like series from two distinct lists (one list with the data, one list with the indices), we can instead pass a dictionary into the Series method and leave out the index argument altogether. Consider the following dictionary.
composure = {'Cristiano Ronaldo': 86,
             'Lionel Messi': 94,
             'Neymar': 80,
             'Luis Suarez': 84}
If we pass this dictionary into the Series method, Pandas will typecast the
dictionary into a series using the keys as the index and the values as the data.
composure = pd.Series(composure)
print(composure)
Cristiano Ronaldo    86
Lionel Messi         94
Neymar               80
Luis Suarez          84
dtype: int64
Just as with a dictionary, we can refer to a data value in this series by its index label, which plays the same role as a key in a dictionary.
print(composure['Lionel Messi'])
94
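The correspondence between dictionary keys and series labels can be checked directly. A minimal sketch, assuming Pandas is installed; composure_dict is a name chosen here so that the original dictionary is still available after the series is made.

```python
import pandas as pd

composure_dict = {'Cristiano Ronaldo': 86,
                  'Lionel Messi': 94,
                  'Neymar': 80,
                  'Luis Suarez': 84}
composure = pd.Series(composure_dict)

# The dictionary's keys become the index, in the same order.
print(list(composure.index))

# Looking up a label in the series matches looking up a key in the dictionary.
print(composure['Lionel Messi'] == composure_dict['Lionel Messi'])  # True
```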
By now, you should have also noticed that when we print a full series, we also have a little line at the bottom that starts with dtype. This tells us the type of data that is in our series. If all of the elements in a series have the same datatype, the dtype will reflect that. Pandas tries to store data using the simplest type that fits, just like Python. So, if you pass it integers, it’ll store those values as integers, rather than floats.
Like a list, a series can hold a mix of datatypes. You can mix floats with integers, strings with floats, booleans with strings, and every other combination out there. However, when we mix datatypes, the dtype that is shown somehow has to represent all of the data. If the data can’t be stored as a single numeric type, Pandas will just report a dtype of ”object.” (Mixing only integers and floats is an exception: Pandas converts the integers to floats and reports float64.) With an object dtype, Pandas maintains the datatype of each series element, meaning that an element filled with a string will stay a string, even if the dtype of the entire series is listed as object.
mixeddtype = pd.Series(['spinach', 48, 9.1])
print(mixeddtype)
print(type(mixeddtype))
print(mixeddtype[0], mixeddtype[1], mixeddtype[2])
print(type(mixeddtype[0]), type(mixeddtype[1]), type(mixeddtype[2]))
0    spinach
1         48
2        9.1
dtype: object
<class 'pandas.core.series.Series'>
spinach 48 9.1
<class 'str'> <class 'int'> <class 'float'>
Observe how in the first output (where we print the entire series), the dtype is object. This means that the elements inside of our series are mixed. However, the object itself is still a series. Just as we grabbed data out of our series before, we can do the same here by referring to individual elements by index number. We can also print the datatypes of individual elements by using the type function, where we see that all of the datatypes that we initially gave Python are maintained (string, integer, and float).
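The rule about mixed datatypes is easiest to see by building a few small series and checking their dtypes. This is a minimal sketch assuming Pandas is installed; ints_only, ints_and_floats, and mixed are names invented for the example.

```python
import pandas as pd

ints_only = pd.Series([1, 2, 3])
ints_and_floats = pd.Series([1, 2, 3.5])
mixed = pd.Series(['spinach', 48, 9.1])

print(ints_only.dtype)        # an integer dtype (int64 on most systems)
print(ints_and_floats.dtype)  # float64: the integers were converted to floats
print(ints_and_floats[0])     # 1.0
print(mixed.dtype)            # object

# With an object dtype, each element keeps its own Python type.
print(type(mixed[0]), type(mixed[1]))  # <class 'str'> <class 'int'>
```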
What if we want to add a single element to a series? On the surface, this seems like an easy enough topic. We’ve already seen the append() method used for lists, so it must not be too different, right? Pandas has a concat() function, which we call as pd.concat(), that joins objects of type Series together. (Older versions of Pandas also had a Series.append() method, but it was removed in Pandas 2.0.) However, there is a key difference between the way list.append() works and the way pd.concat() works. When we called the list’s append() method, we simply called the method on the list, which modified the list. However, pd.concat() does not modify the series that we give it. Instead, it only returns a new, combined series. If we want to save this combined series, we need to assign it to a variable. The most common way to do this is by assigning the result back to the same variable, which updates the value of that variable with the combined series. Consider the following list, which has three players on the Vancouver Canucks NHL team in the 2021-2022 season.
players = ["Justin Bailey", "Brock Boeser", "Madison Bowey"]
players.append("Guillaume Briseboise")
print(players)
If we try to do the same with a Pandas series, it won’t work quite the same way.
players = pd.Series(["Justin Bailey", "Brock Boeser", "Madison Bowey"])
pd.concat([players, "Guillaume Briseboise"])  # raises a TypeError
Above, we mentioned that pd.concat() joins objects of type Series, but we just tried to pass a string ("Guillaume Briseboise") into pd.concat(), causing a TypeError to be thrown. This is easy enough to remedy. We simply have to pass in ”Guillaume Briseboise” as an element of a Pandas series.
players = pd.Series(["Justin Bailey", "Brock Boeser", "Madison Bowey"])
pd.concat([players, pd.Series(["Guillaume Briseboise"])])
In this case, we are passing a Pandas series with one element (”Guillaume Briseboise”) into pd.concat(). This approach is especially useful if we want to add more than one element to our series. For example, consider the following.
players = pd.Series(["Justin Bailey", "Brock Boeser", "Madison Bowey"])
pd.concat([players, pd.Series(["Guillaume Briseboise", "Kyle Burroughs"])])
However, if we try to print the series in this state, it won’t look quite right.
print(players)
0 Justin Bailey
1 Brock Boeser
2 Madison Bowey
dtype : object
Where are Guillaume and Kyle? Recall that calling pd.concat() doesn’t actually change the series that we pass in; it only returns a new, combined series. Instead, we need to assign the returned series, with the new player names, back to the players variable.
players = pd.Series(["Justin Bailey", "Brock Boeser", "Madison Bowey"])
players = pd.concat([players, pd.Series(["Guillaume Briseboise", "Kyle Burroughs"])],
                    ignore_index=True)
print(players)
0 Justin Bailey
1 Brock Boeser
2 Madison Bowey
3 Guillaume Briseboise
4 Kyle Burroughs
dtype : object
Recall how series can be indexed using a manual index. In this case, we never assigned a manual index when we created either series, which is why each one was given an auto-incrementing integer index, starting at 0. By default (ignore_index=False), pd.concat() keeps the index of every piece that it joins, so the two new players would keep their original labels 0 and 1, leaving us with duplicate index labels. We don’t want that; we want Pandas to ignore the old labels and continue numbering the new rows as it has numbered all of the other rows. We do this by setting the ignore_index argument of pd.concat() to True. Without it, our output would look like this.
0 Justin Bailey
1 Brock Boeser
2 Madison Bowey
0 Guillaume Briseboise
1 Kyle Burroughs
dtype : object
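The effect of ignore_index is easiest to see by running both versions next to each other. A minimal sketch, assuming Pandas is installed; new_players, kept, and renumbered are names invented for this illustration.

```python
import pandas as pd

players = pd.Series(["Justin Bailey", "Brock Boeser", "Madison Bowey"])
new_players = pd.Series(["Guillaume Briseboise", "Kyle Burroughs"])

# Default (ignore_index=False): each piece keeps its own index labels.
kept = pd.concat([players, new_players])
print(list(kept.index))        # [0, 1, 2, 0, 1]

# ignore_index=True: Pandas renumbers every row from 0.
renumbered = pd.concat([players, new_players], ignore_index=True)
print(list(renumbered.index))  # [0, 1, 2, 3, 4]

# The original series is unchanged; pd.concat() returned new objects.
print(len(players))            # 3
```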
Exercise Questions
These exercise questions cover chapter 10.3.
Exercise 149
1. Which Python datatypes does a Pandas series mimic?
2. What is a Pandas series?
3. What is the Pandas method to create a series?
Exercise 150
1. By default, how are Pandas series indexed?
2. If you use the default index, is a Pandas series more akin to a Python
list or a Python dictionary?
3. What argument can you pass to the Series method to set the index?
4. If you set an explicit index, is a Pandas series more akin to a Python list
or a Python dictionary?
Exercise 151
1. Create a variable fruits of type list. Inside of the list, add the elements
banana, apple, and orange.
2. Typecast fruits to a Pandas series.
3. Print the second element of the series (apple).
Exercise 152
Exercise 153
1. Create a variable bikeweights of type pandas.Series. Inside of the
series, add the indices Canyon Ultimate CF, Specialized Tarmac, and
Ridley Helium with the values 5.68, 6.7, and 7.57, respectively. Do
not create a dictionary to do so; instead, add your values directly to the
Series method.
2. Print the value for a Specialized Tarmac.
3. Print the value for a Canyon Ultimate CF, and append KG to the end of
the printed value.
4. Print the index and value of a Ridley Helium.
5. Add an index Trek Domane with the value 8.82 to the bikeweights
series.
6. Print the index and value of a Trek Domane, then append KG to the end
of the printed value.
10.4 Dataframes
Now that we’ve seen series, we can begin to explore dataframes. A dataframe
is yet another complex datatype, but it is among the most powerful complex datatypes out there because of its flexibility and the sheer number of methods that exist to manipulate its data.
Most data comes to data scientists as a CSV file. Whether it’s been cleaned
or not, they tend to follow the same form.
The general form behind most datasets is that you have some ID to keep
track of your entries and a set of variables that hold the data for each entry.
This wouldn’t really fit neatly into any preexisting data structure, and while we
could probably write our own class to handle this data, we don’t have to, since
Pandas comes with its own data structure that was designed to hold CSVs and
other datasets: the Pandas dataframe.
A dataframe is made up of Pandas series (hence why we covered those first). Each column of a dataframe is a single Pandas series, and putting series side-by-side in a certain order creates a dataframe. However, there are two major things to note. First, because a dataframe is a composite structure, all of its columns share one index, so you cannot edit the index of one column without altering the index of all of the columns. Second, the columns are not separate variables the way a standalone series is. This means that if we put the var1 series from the previous section into a dataframe, the column inside the dataframe would not be the var1 variable; we would instead need to refer to the column as a part of the dataframe.
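Building a small dataframe out of two series shows both points: each column comes back out as a series, and every column shares the dataframe’s index. This is a minimal sketch assuming Pandas is installed; the column names var1 and var2 simply reuse the example data from the series section.

```python
import pandas as pd

var1 = pd.Series([56, 52, 63, 38])
var2 = pd.Series([122, 139, 185, 115])

# Put the two series side by side as the columns of a dataframe.
df = pd.DataFrame({"var1": var1, "var2": var2})

column = df["var1"]                   # refer to the column through the dataframe
print(type(column))                   # <class 'pandas.core.series.Series'>
print(column.index.equals(df.index))  # True: every column shares the dataframe's index
print(df.shape)                       # (4, 2)
```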
You can think of a dataframe as a data structure that emulates the format
of a two-dimensional spreadsheet. If you are familiar with other programming
languages such as C++ or C#, then you can associate the dataframe with a
two-dimensional array.
Pandas dataframes require you to use a certain data format in order for the data to be read and associated properly. When you import your data, you should set it up with your variable names in the first row and each of your trials or data entries in its own row, identified by the first column. This will allow Pandas to determine the datatype of an entire column. On the surface, this seems trivial, but making sure that your data is formatted correctly before you import it will mean that you’ll be able to analyze it in a somewhat standard manner.
If your data isn’t stored in the format that you require but there is some standardization to the format of the data, then consider using your already-known standard file reading and writing skills to read the file into memory,
223 10.4. DATAFRAMES
As you might have inferred, this is not that different from how we create an empty list or dictionary by calling the list() or dict() functions in Python. When we first create an empty dataframe, it has no size in either the row or the column direction: it has zero rows and zero columns, so its shape is (0, 0). It is still a two-dimensional structure, though; it just doesn’t have anything in it yet.
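The size of an empty dataframe can be checked with a few of its attributes. A minimal sketch, assuming Pandas is installed; the name empty is only for this example.

```python
import pandas as pd

empty = pd.DataFrame()
print(empty.shape)  # (0, 0): no rows and no columns
print(empty.ndim)   # 2: still a two-dimensional structure
print(empty.empty)  # True: there is no data in it
```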
The way dataframes are structured makes them conducive to adding rows rather than columns, just like how when we make a spreadsheet in Microsoft Excel or Google Sheets, we create the columns, then fill the rows with different data. Because of this, it is much easier to define the column-wise dimension of a dataframe than its row-wise dimension when we first create it. We can do this by passing a list into the columns argument of the DataFrame() method. This tells Pandas to create columns named after the elements in the list that you passed in. It won’t tell Pandas anything about the datatypes, but at least we’ll have the column-wise dimensions for our dataframe.
For example, let’s create a dataframe for some ice hockey data with the col-
umn names name, goals, and toi, which will stand for player name, number of
goals scored, and the amount of time on the ice that they’ve spent, respectively.
We know what our column names will be, so we can pass these in as elements of a list into the columns argument.
hockey = pd.DataFrame(columns=["name", "goals", "toi"])
print(hockey)
Empty DataFrame
Columns: [name, goals, toi]
Index: []
Our dataframe is still empty because there isn’t any data in the row-wise
dimension, but we now see that we have three columns when we print out the
dataframe.
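The shape attribute confirms the difference between a completely empty dataframe and one that only has columns. A minimal sketch, assuming Pandas is installed, reusing the hockey column names from the text.

```python
import pandas as pd

hockey = pd.DataFrame(columns=["name", "goals", "toi"])
print(hockey.shape)          # (0, 3): zero rows, three columns
print(list(hockey.columns))  # ['name', 'goals', 'toi']
```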
Reading a CSV
Pandas has its own method for reading in a comma-separated values, or CSV,
file. Using the Pandas method will allow you to put the data directly into a
Pandas dataframe, rather than having to shoehorn it into a Python datatype,
then typecast it to a Pandas dataframe.
The built-in Pandas method for reading a CSV file is pd.read_csv(), and it returns a Pandas dataframe. The default arguments for the read_csv() method are set up to read a CSV file that was created from Microsoft Excel, LibreOffice Calc, or Google Sheets. All three pieces of software create CSV files that follow a fairly ”typical” format, although there is no single standard that every CSV file follows. For example, some CSV files may use different demarcations for strings, cells, headers, and any number of other details.
The header default is to infer whether there is one (header="infer"). In practice, unless you supply your own column names with the names argument, Pandas treats the first row of the file as the header row.
Pandas will assume that you have no index column, and it will make its own index for you.
Because these are the default arguments for the read_csv() method, we don’t
need to explicitly set these arguments when we use the method. In fact, we
only need to pass in one argument: the location of the CSV file itself.
Because this is a required argument, we don’t even need to write out the argument’s name in the method call. We can instead just pass in the location of the file as a string in the first position of the method call. Let’s say that my file was called skaters.csv and it was located in the current working directory. We could just run the following lines to put skaters.csv into a dataframe called skaters, then print the head of the skaters dataframe.
skaters = pd.read_csv("skaters.csv")
print(skaters.head())
[5 rows x 62 columns]
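If you don’t have skaters.csv on hand, the default behavior can still be checked on a tiny CSV held in a string. This is a minimal sketch assuming Pandas is installed; the player rows are made-up placeholder data, and io.StringIO stands in for a real file because read_csv() accepts file-like objects as well as filenames and URLs.

```python
import io

import pandas as pd

# A tiny, made-up CSV held in a string instead of a file on disk.
csv_text = """player,goals,assists
Player A,12,21
Player B,20,13
"""
df = pd.read_csv(io.StringIO(csv_text))

print(list(df.columns))  # ['player', 'goals', 'assists']: first row became the header
print(list(df.index))    # [0, 1]: no index column, so Pandas made its own
```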
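Where am I?
To find your current working directory, you can run the pwd command in a Jupyter Notebook or an IPython shell (or in a Unix terminal). pwd stands for ”print working directory.” In a plain Python shell, call os.getcwd() from the os module instead.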
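In a plain Python script or shell, where pwd is not available, the standard-library os module gives the same information. A minimal sketch; cwd is just a name for the result.

```python
import os

# os.getcwd() returns the current working directory as a string.
cwd = os.getcwd()
print(cwd)
```

Any relative filename you pass to pd.read_csv(), like "skaters.csv", is looked up starting from this directory.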
Absolute Filepaths
Sometimes, it’s easier to just give the entire path to the file. This is called an absolute filepath. On Unix-like systems, this is easy: just start the path with /. On Linux, your home directory is usually /home/yourname/, and on macOS it is usually /Users/yourname/. On Windows, start with the drive letter, usually C:/, instead. Your home directory is usually C:/Users/yourname/.
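The standard-library pathlib module can tell you whether a path is relative or absolute. A minimal sketch; skaters.csv is just the example filename from the text, and the file does not need to exist for these checks.

```python
from pathlib import Path

relative = Path("skaters.csv")         # relative: looked up from the working directory
absolute = Path.cwd() / "skaters.csv"  # absolute: the full path, starting at the root

print(relative.is_absolute())  # False
print(absolute.is_absolute())  # True
print(absolute)
```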
Reading a Pickle
Pickles are a way of storing an entire Pandas object in non-volatile storage, such as a file on disk. When a Pandas object is pickled, it is stored exactly as it was at the time of pickling, and it is recalled in exactly that state. Consider some of the ways that we can fail to read a CSV file. If the data were created with tabs instead of commas, it would be possible to read the data incorrectly. Plus, a CSV file can’t record that its first column is an index, meaning that anyone who imports the data down the line might create a new index, thus breaking your code. A pickle avoids these problems by storing the object as a whole, including indices, column names, and other metadata. This data is read when someone attempts to read the pickle, and the process of reading the pickle recreates the object
Exercise Questions
These exercise questions cover chapter 10.4.1.
Exercise 154
1. What is a Pandas dataframe?
2. What is a Pandas dataframe made out of?
3. If you wanted to look at the datatype of a dataframe column, what
object’s datatype are we actually examining?
Exercise 155
1. What is the Pandas method to create a dataframe?
2. If you call this method without passing in any arguments, what is the
size of the dataframe that is created?
3. If you call this method with the columns argument but with no values,
what is the size of the dataframe that is created?
Exercise 156
1. We introduced two methods for reading two different data files. What
are these methods, and what kind of files can they read?
2. If you wanted to read something other than these two data files, what
would you do?
Exercise 157
1. What is the command to print your working directory?
2. What is the difference between a relative and an absolute filepath?
Exercise 158
1. Consider the method to read a CSV file. What is the Pandas method that reads a CSV file and returns a Pandas dataframe?
2. What is the separator or delimiter character that is used by default in this method?
3. Pandas dataframes need an index. If you do not specify an index when you create a dataframe from a CSV, will Pandas create an index or will Pandas infer the index column from the data by default?
Exercise 159
1. Consider the fifa.csv file, which is located at https://pythonforscientists.github.io/data/data/fifa.csv. The dataset contains basic scores for players in the FIFA 19 video game.
2. Read the CSV file into Pandas with the default settings into a variable
called fifa.
3. Take a peek at the dataframe. What column would you use as the index?
4. Re-read the CSV file into Pandas, but this time, tell Pandas that you
want to use that column as the index.
Dataframes Only
The first argument of the to_csv() method, which is optional, is the file to write. If no file is passed in, Pandas will not write any file. It will, however, return the data as a CSV string.
Let’s consider the skaters object, which we read in as a pickle. Suppose we
wanted to get the form of this data as if it were a CSV without actually writing
out a CSV file. To do this, let’s call the to_csv() method on the skaters
object and put the result into a variable called skaterscsvstring.
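Since the real skaters pickle may not be at hand, the sketch below uses a tiny stand-in dataframe with made-up values to show what to_csv() returns when no file is given. It assumes Pandas is installed.

```python
import pandas as pd

# Made-up stand-in for the skaters dataframe used in the text.
skaters = pd.DataFrame({"player": ["Player A", "Player B"], "goals": [12, 20]})

skaterscsvstring = skaters.to_csv()  # no file given, so a string comes back
print(type(skaterscsvstring))        # <class 'str'>
print(skaterscsvstring)
```

Notice that the index is written as an unnamed first column by default, so the header line starts with a comma.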
Now, let’s suppose that we wanted to save this object in a new file called
skaters.csv as a CSV string. Using the file handling methods that we covered
in Chapter 8, we could now write the skaterscsvstring object that we just
created into a new file called skaters.csv. However, we could also pass the filename into a new to_csv() call in Pandas.2
So, let’s pass the argument "skaters.csv" into the to_csv() method. This time, we are not going to put the return value of to_csv() into any object. Because we passed in a filename, Pandas writes the CSV text straight into that file on its own, and to_csv() itself returns None. Because we specified only a filename and no directory (relative or absolute), Pandas will save skaters.csv in our current working directory.
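Here is a sketch of writing a file with to_csv(), again using a small made-up stand-in for skaters. To avoid leaving files behind, it writes into a temporary folder created with the standard-library tempfile module; in your own work you would simply call skaters.to_csv("skaters.csv").

```python
import os
import tempfile

import pandas as pd

# Made-up stand-in for the skaters dataframe used in the text.
skaters = pd.DataFrame({"player": ["Player A", "Player B"], "goals": [12, 20]})

# A throwaway folder so this sketch doesn't clutter the working directory.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "skaters.csv")

result = skaters.to_csv(path)  # writes the file...
print(result)                  # ...and returns None
print(os.path.exists(path))    # True

# Reading the file back; index_col=0 turns the written index back into an index.
print(pd.read_csv(path, index_col=0))
```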
2 This is generally regarded as good practice, since naively writing a file is more error-prone
Permissions Pitfall
Exercise Questions
No exercise questions exist for this section.
Pickling an Object
Like writing a CSV, we may also want to write a pickle. Pickles can only be read by Python (and a pickled Pandas object needs Pandas installed to be read back), but they have some advantages over plaintext files. For one, picklefiles contain more than just the data. They also contain data about the data, like what the indices are, the datatypes, and the shape of the data. If we wanted to share a Pandas object with someone else who was also using Pandas, we could write a picklefile to save that object in the exact state that we left it.
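A round trip through to_pickle() and read_pickle() shows that a non-default index survives, which is exactly what a plain CSV cannot promise. This is a minimal sketch assuming Pandas is installed; the dataframe values and the players.pkl filename are made up, and the file goes into a temporary folder.

```python
import os
import tempfile

import pandas as pd

# A small made-up dataframe with a meaningful, non-default index.
df = pd.DataFrame({"goals": [12, 20]}, index=["Player A", "Player B"])

path = os.path.join(tempfile.mkdtemp(), "players.pkl")
df.to_pickle(path)               # write the whole object to disk
restored = pd.read_pickle(path)  # read it back

print(restored.equals(df))       # True
print(list(restored.index))      # ['Player A', 'Player B']: the index survived
```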
In this section, we will cover four main methods of viewing our data: viewing the data itself, viewing the data’s description, viewing a subset of the first few rows, and viewing a subset of the last few rows.
If we know that the size of our dataset is manageable, it may make sense to just print the entire dataset. The easiest way to do this is with the plain print() function in Python. When we print a Pandas dataframe, the dataframe is organized and tabulated before being printed, so it looks like a plaintext table. However, if a dataframe has more rows or columns than Pandas’s display settings allow, Pandas won’t print all of them; instead, it abbreviates the output, using an ellipsis (...) to stand in for the rows or columns that aren’t printed.
However, more often than not, we will be working with large datasets that
don’t make sense to print the entirety of. For example, let’s consider the
skaters dataset that we pulled in previously. As a review, we pulled this
dataset into Python as a pickle using this code.
skaters = pd.read_pickle("https://pythonforscientists.github.io/data/data/skaters.pkl")
Now that we’ve imported the data, let’s look at what shape this dataframe is
in. Conveniently, Pandas has an attribute called shape that we can call on
dataframe objects that will give us the shape, or dimensions, of our dataframe.
Cast your mind back to when we covered classes, and make sure
you’re not confusing your methods and your attributes. Meth-
ods are used to process data within a class, while an attribute
is a feature of an object. In this case, ”shape” is an attribute
of the DataFrame class, not a method, so it doesn’t get any
parentheses.
print(skaters.shape)
(943, 65)
This gave us the shape, or dimensions, of our dataframe. In this tuple, the
first figure represents the number of rows, and the second figure represents the
number of columns. We can see that in the skaters dataset, we have 943 rows
across 65 columns.
However, this doesn’t actually give us what the column names are. But,
guess what! There’s another attribute of the DataFrame class that has the
names of all of the columns, and it’s named columns (shocker, I know!).
print(skaters.columns)
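The shape and columns attributes behave the same way on any dataframe, so they can be tried on a small one first. A minimal sketch assuming Pandas is installed; the three players and their goal counts are made-up placeholder data.

```python
import pandas as pd

# A small, made-up dataframe with three rows and two columns.
df = pd.DataFrame({"player": ["Player A", "Player B", "Player C"],
                   "goals": [12, 20, 7]})

rows, cols = df.shape    # shape is an attribute: no parentheses
print(rows, cols)        # 3 2
print(list(df.columns))  # ['player', 'goals']
```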
So now, we know what our columns are, but what types are they? We could call the type() function on an individual data cell in a column (which we’ll cover how to do in a later section), or we could just look at the datatype of the column. In Pandas, we can do that by looking at the dtypes attribute of the dataframe, which gives us the datatype of every column.
print(skaters.dtypes)
sid int64
player object
playerid object
age int64
conference object
...
blocks int64
hits int64
fowin int64
foloss int64
fopct float64
Length : 65 , dtype : object
Because of how many columns we have, Pandas has truncated our output to only include the datatypes of the first and last five columns. Since we know the names of our columns (thanks to our columns attribute), we can also look at the datatype of an individual column.
print(skaters.dtypes["ff"])
int64
This is how Pandas represents the data, but it might not be how Python represents the data. For example, let’s look at what type the player column is. We would expect this to be a string. It contains entries like "Michael Amadio" and "Connor McDavid". However, when we look up the dtypes attribute for that column, we don’t see that it’s a string.
print(skaters.dtypes["player"])
object
Pandas stores columns of strings with its general-purpose object dtype. If we instead check the Python type of a single element in that column, we do see a string.
print(type(skaters.loc[0, "player"]))
<class 'str'>
We’ll get into more depth on data location in the data modification section,
but for your purposes now, all you need to know is that we’re trying to get
the type of the element from the skaters dataframe in the "player" column
in the 0th row. Now that we know what the datatypes are, let’s actually take
a look at some of this data. Remember, this is a large dataset, so we don’t
necessarily want to print all of it to the console. Thankfully for us, Pandas has
two methods for dataframes that can give us the first n rows and the last n
rows: head() and tail().
Pandas borrows from the Unix commands to view the first and
last lines of a text file, which also use head and tail.
We can call the head() and tail() methods on a dataframe object to view
the first or last five rows in a dataframe, respectively.
print(skaters.head())
[5 rows x 65 columns]
print(skaters.tail())
[5 rows x 65 columns]
The default value of n for the head() and tail() methods is 5, meaning that by default, calling either method will return 5 rows. However, we can also pass in our own value for n, and the method will return n rows. This is the only parameter that the head() or tail() method takes.
print(skaters.tail(10))
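To try the n parameter without downloading the full dataset, here is a minimal sketch on a small made-up dataframe. The name toy and its values are hypothetical stand-ins for skaters:

```python
import pandas as pd

# Hypothetical stand-in for the skaters dataframe
toy = pd.DataFrame({
    "player": ["A", "B", "C", "D", "E", "F", "G"],
    "goals": [3, 0, 12, 7, 1, 5, 9],
})

first_five = toy.head()   # no argument: n defaults to 5
last_two = toy.tail(2)    # n=2: only the last two rows

print(first_five)
print(last_two)
```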
Exercise Questions
These exercise questions cover chapter 10.4.2.
Exercise 160
Exercise 161
Recall
We can read the CSV from the Internet using the read_csv method. This
CSV is formatted as most CSVs are, meaning that we don’t need to pass any
extra arguments into the read_csv method: strings are in quotes, the delimiter
is a comma, and there is a header.
hockey = pd.read_csv("https://pythonforscientists.github.io/data/data/hockey.csv")
print(hockey)
When we print the data, we can see that the last player listed is Brayden Tracey,
and we notice that the data are organized by the player’s last name. The hockey
buffs among you may have noticed one player’s data is missing: Trevor Zegras.3
So, let’s add a new row for Zegras’s data. His data are provided below for your
reference.
3 This data is missing by design. Don't worry, Python and Pandas did their job.
Column Value
player "Trevor Zegras"
role "Off"
position "C"
gamesplayed 43
goals 12
assists 21
fowin 145
foloss 204
Consider how we added data to lists. As it turns out, Pandas has its own concat() function (called as pd.concat()) that joins dataframes together. To add a new row this way, we first build an object of type dict (a dictionary) whose keys match the column names in the dataframe, then turn that dictionary into a small dataframe of its own. So, let's make a dictionary with Zegras's data.
zegras = {
    'player': ['Trevor Zegras'],
    'role': ['Off'],
    'position': ['C'],
    'gamesplayed': [43],
    'goals': [12],
    'assists': [21],
    'fowin': [145],
    'foloss': [204]
}
Observe how the values in our dictionary are actually lists. This is because when Pandas turns the dictionary into a dataframe, each position in the lists becomes one row, so we can add multiple rows at once. If Sebastian Aho were to join the Anaheim Ducks and we wanted to add him to our dataframe, we could put his data alongside Trevor Zegras's data.
# NOT RUN
zegras = {
    'player': ['Trevor Zegras', 'Sebastian Aho'],
    'role': ['Off', 'Off'],
    'position': ['C', 'F'],
    'gamesplayed': [43, 43],
    'goals': [12, 21],
    'assists': [21, 27],
    'fowin': [145, 299],
    'foloss': [204, 268]
}
For the sake of demonstration, we won't create the dictionary with Sebastian Aho, only the one with Trevor Zegras. Now that the zegras object has Trevor's data, we can turn it into a dataframe with pd.DataFrame() and use pd.concat() to join it to the end of the hockey dataframe.4
The difference between list.append() and pd.concat() that we saw with series also applies to dataframes: pd.concat() does not change the original object, but returns a new one. Like when appending to a series, we need to assign the return value of pd.concat() back to the variable.5
hockey = pd.concat([hockey, pd.DataFrame(zegras)], ignore_index=True)
print(hockey.tail())
Now that we put the return of the concat() method into the hockey object,
hockey was updated with the appended data and now has Trevor Zegras’s data.
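The effect of ignore_index=True is easiest to see on a tiny example. In this sketch, the two-row dataframe and its player names are made up; only the Zegras values come from the table above:

```python
import pandas as pd

# Hypothetical two-row stand-in for the hockey dataframe
hockey_toy = pd.DataFrame({
    "player": ["Player A", "Player B"],
    "fowin": [0, 89],
})

new_player = {"player": ["Trevor Zegras"], "fowin": [145]}

# pd.concat() returns a NEW dataframe; hockey_toy itself is unchanged
combined = pd.concat([hockey_toy, pd.DataFrame(new_player)], ignore_index=True)

# Without ignore_index=True, the new row keeps its own index label, 0
no_reset = pd.concat([hockey_toy, pd.DataFrame(new_player)])

print(combined)
print(no_reset.index.tolist())
```

Renumbering with ignore_index=True keeps the index labels unique, which matters later when we look rows up by index.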
Now, let’s say that we wanted to calculate the faceoff percentage, which is
calculated as $n_{\text{faceoff wins}} / (n_{\text{faceoff wins}} + n_{\text{faceoff losses}})$. We have the arithmetic
skills to calculate this for any one player, but how can we make a new column
called fopct that has the faceoff percentage for that player?
The naive method to add a new column is to assign a list to a column that doesn't exist yet. So, let's make a list of everyone's faceoff percentage. We'll have to
be careful when doing our division, since some of the players have never had a
faceoff, so we would get a divide by zero error. We’ll just assign their percentage
to 0% automatically.
As you might have guessed, our best method for getting all of the data is
to iterate over the rows of the dataframe. However, iterating over a dataframe
isn't as simple as iterating over a list or a dictionary. The following looks right, and it is valid Python, but it does not do what we want.
# DO NOT RUN
for element in hockey:
    print(hockey[element]["player"])
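You'll end up with a KeyError. Iterating over a dataframe directly gives us its column labels, not its rows, so hockey[element] is an entire column, and that column has no row labeled "player".6
Instead, we need Pandas to hand us the rows one at a time, each as its own series.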
Luckily for us, Pandas comes with a method for doing so. This method can
4 Pandas also had a method called append() for series and dataframes that used different syntax from concat(). The append() method was deprecated and then removed in Pandas 2.0, so you should avoid using it and use concat() instead. You may still find examples that use append() out in the wild, but they will not run on current versions of Pandas.
5 Assume that the zegras dictionary object still exists here.
6 Depending on Pandas version, you may get a TypeError instead, stating that the
Simon Benoit
Sam Carrick
Max Comtois
Nicolas Deslauriers
Jamie Drysdale
Cam Fowler
Ryan Getzlaf
Derek Grant
Benoit-Olivier Groulx
Brendan Guhle
Adam Henrique
Max Jones
Bryce Kindopp
Jacob Larsson
Vinni Lettieri
Hampus Lindholm
Isac Lundestrom
Josh Mahura
Josh Manson
Mason Mctavish
Sonny Milano
Danny O'Regan
Greg Pateryn
Jacob Perreault
Rickard Rakell
Buddy Robinson
Kevin Shattenkirk
Jakob Silfverberg
Sam Steel
Troy Terry
Brayden Tracey
Trevor Zegras
7 From here on out, we will assume that you have already added Trevor Zegras's data back
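The difference between looping over a dataframe directly and looping with the iterrows() method (the method the next listing uses) shows up clearly on a tiny made-up dataframe. The names and numbers here are hypothetical:

```python
import pandas as pd

# Hypothetical three-row dataframe
toy = pd.DataFrame({
    "player": ["Player A", "Player B", "Player C"],
    "fowin": [0, 10, 4],
})

# Looping over the dataframe itself yields the column labels
labels = [label for label in toy]

# iterrows() yields (index label, row as a Series) pairs
names = []
for index, row in toy.iterrows():
    names.append(row["player"])
    print(index, row["player"])
```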
Now that we can iterate through a dataframe, let’s grab the fowin and foloss
columns, calculate the faceoff win percentage, and put the result into a Pandas
series for us to add into our dataframe later. Our series will use default indices.
fopct = pd.Series(dtype="float64")  # start with an empty float series
for index, row in hockey.iterrows():
    # players who never faced off get 0.0 instead of a division by zero
    if row["fowin"] + row["foloss"] == 0:
        fopct = pd.concat([fopct, pd.Series([0.0])], ignore_index=True)
    else:
        share = row["fowin"] / (row["fowin"] + row["foloss"])
        fopct = pd.concat([fopct, pd.Series([share])], ignore_index=True)
Recall that we need to put an extra check for players who have never faced off
(otherwise, we’ll end up with a divide-by-zero error), hence the if statement.
Now, we can add the series values to the dataframe by assigning them to a new column in the dataframe. To make a new column, we can simply assign to a column name that doesn't exist yet, and Pandas will make it on demand.
hockey["fopct"] = fopct.values
Note how we explicitly take the values out of the fopct series object. Remember, a series carries more data than just a list, like an index. If we assigned the series itself to the column, Pandas would line its entries up with the dataframe's rows by matching index labels rather than by position, and any row whose label had no match would get a missing value (NaN). Asking for fopct.values hands Pandas a plain array, which it places into the column in order, one value per row.
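Here is a minimal sketch of that alignment behavior, using a made-up dataframe whose index labels are 10, 11, and 12 (as might happen after dropping rows) and a series with default indices:

```python
import pandas as pd

# Hypothetical dataframe whose index labels are not 0, 1, 2
df = pd.DataFrame({"fowin": [5, 0, 8]}, index=[10, 11, 12])

# A series with default indices 0, 1, 2
pct = pd.Series([0.5, 0.0, 0.8])

df["aligned"] = pct              # matched by index label: nothing matches, all NaN
df["by_position"] = pct.values   # plain array: filled row by row, in order
print(df)
```

When the labels happen to match (as they do for hockey after ignore_index=True), both spellings give the same result, but .values makes the positional intent explicit.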
Exercise Questions
These exercise questions cover chapter 10.4.3.
Exercise 162
1. Consider the fifa.csv file, which is located at
https://pythonforscientists.github.io/data/data/fifa.csv. The dataset contains basic scores for players in the FIFA 19 video game.
2. Print just the names, separated by a newline, using dataframe row iter-
ation.
3. Print the names and the vision score, with each name separated by a
new line, using dataframe row iteration.
Exercise 163
1. Consider the cereal.csv file, which is located at
https://pythonforscientists.github.io/data/data/cereal.csv. The dataset
contains price data for wheat, rice, and corn from 1992 to 2021.
2. Read the CSV file into Pandas with the default settings into a variable
called cereal.
3. What is the last row of the cereal dataframe? Hint: Print the dataframe
using negative indexing.
4. 2021 is missing the December entry. Add the following data for each of
the columns.
Year: 2021
Month: Dec
Price wheat ton: 332.06
Price rice ton: 400.0
Price corn ton: 264.54
Inflation rate: -1.29
Price wheat ton infl: 327.78
Price rice ton infl: 394.84
Price corn ton infl: 261.13
column fopct.
The drop() syntax is very similar to the concat() syntax. We have a label
that corresponds to an index or a column label, and we have an axis which
corresponds to whether we want to delete a row (0) or a column (1). The
default value for the axis is to drop a row, so if we only wanted to remove
Benoit’s observation from our dataset, we would only have to pass in the index
number 0.
[5 rows x 9 columns]
(31, 9)
player role ... foloss fopct
1 Sam Carrick Off ... 90 0.497207
2 Max Comtois Off ... 15 0.482759
3 Nicolas Deslauriers Off ... 3 0.250000
4 Jamie Drysdale Def ... 0 0.000000
5 Cam Fowler Def ... 0 0.000000
[5 rows x 9 columns]
(30, 9)
After running the drop() method, we dropped Simon Benoit’s observation and
the size of our dataframe decreased by one.
The process to drop a column looks remarkably similar. We will still use
the drop() method on a dataframe. This time, however, we will pass in the name of the column that we want to delete as a string, and we'll specify that we want to delete on the column axis by setting the axis argument to 1.
Let’s say that we wanted to delete the fopct column that we created earlier.
print(hockey.columns)
hockey = hockey.drop("fopct", axis=1)
print(hockey.columns)
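Both uses of drop() can be compared side by side on a small made-up dataframe (the player names and numbers are hypothetical). Like concat(), drop() returns a new dataframe rather than changing the original:

```python
import pandas as pd

# Hypothetical three-row dataframe
toy = pd.DataFrame({
    "player": ["Player A", "Player B", "Player C"],
    "fowin": [0, 10, 4],
    "fopct": [0.0, 0.5, 0.4],
})

no_first_row = toy.drop(0)             # axis=0 (the default): drop the row labeled 0
no_fopct = toy.drop("fopct", axis=1)   # axis=1: drop the column labeled "fopct"

print(toy.shape, no_first_row.shape, no_fopct.shape)
```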
Exercise Questions
These exercise questions cover chapter 10.4.4.
Exercise 164
1. Consider the cereal.csv file, which is located at
https://pythonforscientists.github.io/data/data/cereal.csv. The dataset
contains price data for wheat, rice, and corn from 1992 to 2021.
2. Read the CSV file into Pandas with the default settings into a variable
called cereal.
3. Remove every row from before the year 2000.
4. Remove the Inflation_rate column.
Exercise 165
1. Consider the cycling.csv file, which is located at
https://pythonforscientists.github.io/data/data/cycling.csv. The dataset
contains race results for the 2021 Tour of Flanders cycling race.
2. Read the CSV file into Pandas with the default settings into a variable
called tourflanders21.
3. Remove all of the rows where the rider last name starts with A. The
rider column is formatted as LAST First. Hint: Use string subsetting
to determine whether the first letter of the rider column is exactly equal
to A.
str
What we did here was actually look at the type of a cell’s data using Python.
If we were to remove the type() function, we would just get the data that is
stored in that cell.
print(skaters["player"][0])
Calen Addison
This is the most primitive method of selecting data, and we can actually use
this to edit this one cell. Let’s break down how we’re referencing this cell.
There are three components to a dataframe cell call: the dataframe, the
column, and the index. To reference a cell, we provide the dataframe on its own,
then the column name (provided as a string) surrounded in square brackets,
then the row index in another set of square brackets. When we provide the row
index, our index must match whatever Pandas acknowledges the index to be.
Most of the time, the index will be numeric, but if you manually specified that
you wanted to use a custom index when you declared your dataframe, you must
continue to use this index.
So, let’s look at the skaters dataframe object again. If we wanted to get
the player ID of the player at index 80 (the 81st entry, since the index starts at 0), we would use the
following.
The object at this location is just like any other object in Python, so if we want
to output it, we need to print it. So, let’s put the result into a print statement.
boqvije01
Let's do this again. This time, let's print the number of hits that the player at index 100 recorded.
The player at index 100 is Connor Brown. Let’s say that we want to update
the value for hits for Brown. This is as simple as reassigning the value at that
location using the assignment operator, just as if the value was in a nested list.
10
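A note of caution: chained assignment such as skaters["hits"][100] = 10 indexes twice, and depending on your version of Pandas it can raise a SettingWithCopyWarning or, when copy-on-write is enabled, leave the original dataframe unchanged. A safer spelling for writing one cell is .loc[row label, column label]. A minimal sketch on a made-up two-row dataframe (the labels 99 and 100 are hypothetical):

```python
import pandas as pd

# Hypothetical dataframe with index labels 99 and 100
toy = pd.DataFrame(
    {"player": ["Player A", "Player B"], "hits": [3, 7]},
    index=[99, 100],
)

# Reading one cell: column first, then row label
print(toy["hits"][100])

# Writing one cell: .loc takes the row label first, then the column label
toy.loc[100, "hits"] = 10
print(toy.loc[100, "hits"])
```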
Equipped with our data modification skills, we can actually edit entire columns.
For example, let's say we wanted more accurate faceoff percentages (beyond one decimal place). We have the integer values for fowin and foloss, so we can calculate
the win percentage fairly easily.
With that, we have just modified an entire column of data by calculating that
value naively.
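Instead of working one cell at a time, Pandas can also do arithmetic on entire columns at once. The sketch below is an alternative to the naive approach, not the book's own method; it uses made-up numbers and replaces the 0/0 cases (which Pandas turns into NaN) with 0.0:

```python
import pandas as pd

# Hypothetical faceoff counts
toy = pd.DataFrame({"fowin": [0, 89, 3], "foloss": [0, 90, 1]})

total = toy["fowin"] + toy["foloss"]

# Column-wide division; 0/0 becomes NaN, which fillna() replaces with 0.0
toy["fopct"] = (toy["fowin"] / total).fillna(0.0)
print(toy)
```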
One of the major strengths of Python is its ability to clean data. The very
structure of Python means that it’s great at very program-oriented tasks, such
as procedurally cleaning and sorting data.
The simplest of data cleansing tasks is data normalization. Normalization
is the process of making all of the data of a certain variable into a similar form.
For example, if you had a variable called "age" that contained both integer and
floating point values, then you might want to normalize the data so that all of
the data is an integer (if you only care about the integer value of the age) or so
that all of the data is a float (if you need more specificity).
Data normalization can clarify confusing things in your data. Let's go back to our "age" variable. Let's say you had the value 7 as one of the values. Does this mean exactly 7 years, or some time between 7 and 8 years? If we instead store the value as 7.0, it is much clearer that we mean exactly 7 years.
Using your data modification skills, you can now do things like typecast
cells in a Pandas dataframe.
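One way to normalize a whole column at once is the astype() method. This sketch assumes a hypothetical "age" column that mixes integers and floats (the object dtype keeps them mixed); astype(float) makes every entry a float, and astype(int) on the float column truncates to whole years:

```python
import pandas as pd

# Hypothetical ages mixing int and float values
people = pd.DataFrame({"age": pd.Series([7, 7.5, 12, 30.25], dtype=object)})

# Normalize every entry to a float
people["age"] = people["age"].astype(float)

# Or, if only whole years matter, truncate to integers
whole_years = people["age"].astype(int)

print(people["age"].tolist())
print(whole_years.tolist())
```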
Exercise Questions
These exercise questions cover chapter 10.4.5.
No exercise questions exist for this section yet.
[8 rows x 51 columns]
The keen-eyed among you have already noticed that the return
type of describe() is a dataframe when we called the method
on a dataframe. That means that we can use naive methods to
grab any one of the cells out of this dataframe.
We do know that there are other columns hidden behind those ellipses,
and we can view the summary statistics for any column by simply calling the
describe() method on that column. When we call a column from a dataframe,
Pandas returns a series, and we are running the describe() method on that
series to return another series with the summary statistics.
count 943.000000
mean 4.774125
std 5.789375
min 0.000000
25% 0.000000
50% 3.000000
75% 7.000000
max 33.000000
Name: goals, dtype: float64
The keen-eyed among you have already noticed that the return
type of describe() is a series when we called the method on a
series. That means that we can use naive methods to grab any one of the cells out of this series.
This is nicely arranged data, but if we only wanted to get the median out of this, we'd have to pull the median out of the series that Pandas returned from the describe() method. In that series, the median is stored under the label "50%". This isn't impossible, and it looks like this.
print(skaters["goals"].describe()["50%"])
3.0
However, this just isn't very readable. We're taking the value at a non-numeric index label ("50%") from the return of the describe() method, which is being run on a single column "goals" as a series inside of the dataframe skaters. Phew! Thankfully, Pandas has a few methods that'll help us out. For all of the metrics above, there is a corresponding method that returns only that value as a single number. That means that instead of calling describe(), then picking what we need out of that series, we can just get the value by itself to begin with.
Let's start with mean, median, and standard deviation. The mean and standard deviation methods share the names of their labels in the above returned series (mean and std), while the median, labeled 50% there, has its own median() method.
print(skaters["goals"].mean())
print(skaters["goals"].median())
print(skaters["goals"].std())
4.774125132555674
3.0
5.789375349335461
The same exists for the count (n), minimum, and maximum.
print(skaters["goals"].count())
print(skaters["goals"].min())
print(skaters["goals"].max())
To get the quantiles, we need to specify exactly which quantiles we want. The
most common quantiles to request are the 25% and 75% quantiles, since the
difference between the two makes up the interquartile range. To get the quan-
tiles, we can use the quantile() method from Pandas on a series. We can pass
in a list of quantiles that we want for as many quantiles as we want.
251 10.5. CALCULATING SUMMARY STATISTICS
0.25 0.0
0.50 3.0
0.75 7.0
Name: goals, dtype: float64
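To check how the labels of describe() line up with the standalone methods, here is a sketch on a short made-up series of goal counts (the values are hypothetical, not the skaters data):

```python
import pandas as pd

goals = pd.Series([0, 0, 3, 7, 10], name="goals")  # hypothetical values

summary = goals.describe()
print(summary.index.tolist())          # the median appears under "50%"

median_from_describe = summary["50%"]
median_direct = goals.median()

# A list of quantiles returns a series indexed by those quantiles
quartiles = goals.quantile([0.25, 0.5, 0.75])
print(quartiles)
```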
Exercise Questions
These exercise questions cover chapter 10.5.
Exercise 166
1. Consider the fifa.csv dataset, which is located at
https://pythonforscientists.github.io/data/data/fifa.csv. The dataset contains basic scores for players in the FIFA 19 video game.
2. What are the mean and median of the vision column?
3. What is the lowest strength score?
4. What is the highest agility score?
5. Who is the player with the lowest acceleration score?
6. Who is the player with the highest speed score?
Prerequisites
We want to import one specific function from the statsmodels library, since the library is quite large and we don't want to have to type out the full path every single time. The proportions_ztest() function is inside of the proportion module, which is inside of the stats subpackage, which is inside of the statsmodels package. If we import only the function, we won't have access to anything else in those modules by name, but if we don't need them, we can avoid typing them out all the way.
from statsmodels.stats.proportion import proportions_ztest
What’s for?
We didn’t cover the NumPy arrays much, but you can think of them as
NumPy-fied lists or as a less sophisticated version of a Pandas series.
Pandas’s Parents
For our proportion test, there are two required arguments that proportions_ztest() takes: count and nobs. count is the number of "successes" for each independent sample (so in this case, it would be the number of people in the group whose favorite Pokémon is Pikachu). nobs is the number of observations in each sample. Optionally, we can also pass the value, which is the value of the null hypothesis
9 For this, let us assume that the conditions for normality, independence, and sample size
0.3164378424577682
If you are switching to Python, make sure you use ’two-sided’, ’smaller’, or ’larger’
instead!
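Here is a minimal sketch of a two-sample proportion test. The counts are made up for illustration (they are not the Pokémon survey data): 40 of 100 people in one group and 20 of 100 in the other. The function returns the test statistic and the p-value:

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical counts of "successes" and sample sizes for two groups
count = [40, 20]
nobs = [100, 100]

zstat, pval = proportions_ztest(count, nobs, alternative="two-sided")
print(zstat, pval)

# Identical sample proportions give a test statistic of exactly zero
zstat_same, pval_same = proportions_ztest([30, 30], [100, 100])
print(zstat_same, pval_same)
```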
11 Like with the Z-test, we will assume that the conditions for independence, sample size,
and equal variance have been met. We will check the normality condition using a normal QQ
plot below.
255
10.6. BASIC HYPOTHESIS TESTS WITH STATSMODELS (OPTIONAL)
And again, our function will be imported from a specific module in the statsmodels library. The ttest_ind() function is inside of the weightstats module, which is inside of the stats subpackage, which is inside of the statsmodels package.
from statsmodels.stats.weightstats import ttest_ind
For our T-test, there are two required arguments: x1 and x2. Each of these is an array-like collection of values (such as a list or a series) corresponding to one of our independent groups. That is, x1 will have
the data for one helmet, and x2 will have the data for the other helmet. We
can also specify an alternative hypothesis out of ’two-sided’, ’larger’, and
’smaller’.
For this test, we will be using the alternative hypothesis that the mean for
x1 (the old helmet) is greater than the mean for x2 (the new helmet), so we
will specify the ’larger’ alternative hypothesis.
Our T-test is being run under the null hypothesis H0 that the mean G-force for the old helmet and the mean G-force for the new helmet are the same, versus the alternative hypothesis HA that the mean G-force on the old helmet is higher than the mean G-force on the new helmet.
Like the Z-test for differences in proportions, the ttest_ind() function returns multiple objects. However, because we are running a T-test, we have degrees of freedom for our test statistic, so the function returns a tstat, a pval, and df. The tstat object is a float with the test statistic, and it is interpreted using the df, which is a number with the degrees of freedom used in the t-test. The pval is the p-value: the probability, assuming the null hypothesis is true, of observing a test statistic at least as extreme as the one we got.
Now, let’s run the independent T-test.
tstat, pval, df = ttest_ind(helmets["helmet1"], helmets["helmet2"], alternative='larger')
print(pval)
print(tstat, df)
0.049345648191665814
1.6839153933108315 48.0
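To see where the three return values come from, here is a sketch with made-up G-force readings (not the helmets data) chosen so the arithmetic is easy to follow. Each group has a variance of 2.5, the means differ by 5, and the default pooled-variance test then has a standard error of 1, so the test statistic should be 5 with 5 + 5 - 2 = 8 degrees of freedom:

```python
from statsmodels.stats.weightstats import ttest_ind

# Hypothetical G-force readings for two helmets
old_helmet = [6, 7, 8, 9, 10]
new_helmet = [1, 2, 3, 4, 5]

tstat, pval, df = ttest_ind(old_helmet, new_helmet, alternative="larger")
print(tstat, pval, df)
```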
Exercise Questions
There are no exercise questions for chapter 10.6.
The most glamorous part of statistics is probably the graphs. So far, almost
everything that we’ve done has been done in the console, and while it’s great
for data density, it doesn’t look great if we’re being perfectly honest. However,
we’re about to leave the realm of the console and start to play with a new library:
Matplotlib. This library will introduce us to tools that will open windows with
graphs, since trying to view graphs in a terminal just isn’t fun.
Matplotlib’s Etymology
The subplots call returns two objects, which we’ve placed into fig1 and ax1.
The fig1 object is a Figure object, and this is a top-level container for all of
our plot elements. We won't touch the Figure object right now. The ax1 object is an Axes object (when we ask for more than one subplot, it becomes an array of Axes objects), and this is what we'll be operating on to make a pie chart.
Next, we need to create our list of values. Matplotlib will calculate the
percentages for us if we give it a list of values that correspond to the relative
size of each of our pie wedges. So, let’s create a list of values.
sizes = [10, 33, 9, 31]
257 10.7. BASIC GRAPHS WITH MATPLOTLIB
The pie chart method is pie, and it can be called on a subplot object. So, let’s
make this subplot object. The subplot object is simply a place where plots can
be made.
Now, let’s actually generate a plot. There are three parts to generating a
pie plot: figuring out the sizes and generating the shapes, fixing the shapes,
then showing the shapes. These are done with the pie() method on the Axes
object, then the axis() method on the Axes object, then the show() method
on the plt object. Every time we want to create a plot, we’ll call some method
on part of our subplots objects, then we’ll update the display by calling the
show() method.
Plotting Pitfall
If we didn't run the fixing step (the axis() method), we could end up with a lopsided, oval-shaped plot. Calling axis('equal') gives the Axes an equal aspect ratio, so one unit is the same length in both directions and the pie is drawn as a true circle.
To help us read our pie plot a little easier, we can have Matplotlib insert percentages on each of the slices by specifying a format for the autopct parameter in pie(). This parameter takes a printf-style format string (the same style as Python's % string formatting), and here, we want to specify how many digits we want, along with a percent sign. We'll use the argument %1.1f%%, which gives a minimum width of one character (the 1 right after the %), a decimal point followed by exactly one decimal place (.1f), and a literal percent sign (%%, an escaped percent sign).
import matplotlib.pyplot as plt

# assumes sizes and labels have already been defined
fig1, ax1 = plt.subplots()
ax1.pie(sizes, labels=labels, autopct='%1.1f%%')
ax1.axis('equal')
plt.show()
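To check what those wedge labels will say without drawing anything, this plain-Python sketch imitates the computation, assuming each wedge's share is its value divided by the total of sizes (which is how the values above add up to a whole pie):

```python
sizes = [10, 33, 9, 31]
total = sum(sizes)

# Each wedge's share of the whole, as a percentage
percents = [100 * size / total for size in sizes]

# '%1.1f%%' is printf-style: one digit after the decimal point, then a literal %
wedge_labels = ['%1.1f%%' % p for p in percents]
print(wedge_labels)
```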
Pie plots are great for a quick comparison, but they're not very useful outside of just exploring data. Why don't we look at a histogram instead? Histograms provide more information on the distribution of the data, which is a very useful feature to statisticians.
Let’s consider the skaters dataset once again. What does the assists
distribution look like compared to the goals distribution? We could run a
statistical test to compare distributions, but we can also use our eyeballs to see
what the data look like. Histograms sort the data into bins, count how many observations fall within each bin, and display each count as a bar.
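NumPy's np.histogram() performs the same binning and counting without drawing anything, which makes it a handy way to see what a histogram's bars represent. A sketch on a handful of made-up numbers:

```python
import numpy as np

data = [1, 2, 2, 3, 3, 3]  # hypothetical values

# Three equal-width bins between the smallest and largest value
counts, edges = np.histogram(data, bins=3)
print(counts)  # how many values landed in each bin
print(edges)   # the bin boundaries (one more edge than bins)
```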
Let’s start by pulling the skaters dataset in. It is provided below for your
convenience.
Like when we generated pie plots, we will need to generate subplots, but instead
of the pie() method on the Axes object, we will use the hist() method. Let’s
make our subplots, then make our histograms for the goals and assists
columns.
We also threw in another method on ax1 and ax2 in this code chunk:
set_title(), which predictably, sets a title for the plot. It takes a single
string argument.
This gives us some idea of what our data looks like, but what if we could view our two plots side-by-side? We can do so by specifying four parameters when we create our subplots. The first two parameters are the shape that we want in rows, then columns. We want one row and two columns, so we'll specify the first two arguments to be 1 and 2. Next, we'll set the parameter sharey to True, so that both plots share the same y-axis. We still make only one call to subplots(), and both of our histograms will be reached through ax1, which is now an array of two Axes objects: ax1[0] and ax1[1]. Finally, we will ask Matplotlib to tighten the spacing so that the plots and their labels fit without overlapping by setting the tight_layout parameter to True.
We can also specify the number of bins that we want to use by specifying
the bins argument in the hist() method. bins takes an integer value.12
import matplotlib.pyplot as plt

fig1, ax1 = plt.subplots(1, 2, sharey=True, tight_layout=True)
ax1[0].set_title("Goals")
ax1[0].hist(skaters["goals"])
ax1[1].set_title("Assists")
ax1[1].hist(skaters["assists"], bins=10)
plt.show()
Histograms are great for getting an idea of the shape of our distribution, but
if we want to look at the data compared to the normal distribution, a better plot
is a normal Q-Q plot. The normal Q-Q plot is a scatterplot created by plotting
sample quantiles against theoretical quantiles from the normal distribution. If
12 In the below plot, we’ll only set bins for the Assists plot for the sake of demonstration.
the sample comes from a normal distribution, the points will fall roughly along a straight line, but if it does not, the points will stray from the line and may show bends, curves, tails, and other abnormalities.
In Python, we’ll use a combination of tools from the statsmodels library
and the matplotlib library. Let’s consider the helmets data from earlier in
this chapter. As a convenience, the data import line is provided here.
helmets = pd.read_csv("https://pythonforscientists.github.io/data/data/helmets.csv")
Then, we can use the qqplot() function from the statsmodels.api module. Then, we'll show the plot that qqplot() made using Matplotlib's show() function.
Because we have a non-zero mean, it makes sense for us to use a standardized line instead of a 45-degree line. The standardized line is created by scaling the expected order statistics by the standard deviation of the sample and having the mean added to them. If you know your data are already standardized (a mean of zero and a standard deviation of one), then you can also use '45' instead of 's'.13
import matplotlib.pyplot as plt
import statsmodels.api

helmet1norm = statsmodels.api.qqplot(helmets["helmet1"], line='s')
plt.show()
helmet2norm = statsmodels.api.qqplot(helmets["helmet2"], line='s')
plt.show()
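The pieces of a normal Q-Q plot can also be computed by hand, which shows what the 's' line means. This is a sketch under stated assumptions, not statsmodels' exact internals: the sample is made up, the plotting positions i/(n+1) are one common choice, and the standard deviation uses ddof=1:

```python
import numpy as np
from scipy import stats

# Hypothetical sample
sample = np.array([4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.6, 4.4])
n = len(sample)

# Assumed plotting positions i/(n+1) for i = 1..n
probs = np.arange(1, n + 1) / (n + 1)
theoretical = stats.norm.ppf(probs)   # standard normal quantiles (x-axis)
ordered = np.sort(sample)             # sample quantiles (y-axis)

# Standardized line: scale by the sample standard deviation, then add the mean
s_line = sample.mean() + sample.std(ddof=1) * theoretical

for q, y, line_y in zip(theoretical, ordered, s_line):
    print(f"{q:6.3f}  {y:5.2f}  {line_y:5.2f}")
```

Points close to s_line suggest the sample looks normal; systematic bends away from it suggest it does not.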
But what if our data are not normal? Let’s consider the assists variable
from the skaters dataset. The histogram looks like the following.
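13 If you create normal Q-Q plots in R, note that qqnorm() draws no reference line on its own, and qqline() draws its line through the first and third quartiles rather than a standardized line.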
When we create our normal Q-Q plot, it severely twists away from the
standardized line.
assistsnorm = statsmodels.api.qqplot(skaters["assists"], line='s')
plt.show()
Exercise Questions
There are no exercise questions for chapter 10.7.
Chapter 11
As we begin to work with Pandas and data, the need to scrape data will emerge. Data scraping is the act of pulling data off of a webpage without
having to manually copy that data over. Instead, scraping data allows you to
automate the process for a relatively low investment. Since you are now familiar
with Pandas, you can use the full power of its data structures, plus the power
of Beautiful Soup to scrape webpages efficiently.
CHAPTER 11. REQUESTS, WEB SCRAPING, AND BS4 266
11.1.1 HTML
The most obvious feature of HTML is its nested nature. Objects in HTML are nested
inside of larger objects, and these larger objects can dictate how a sub-object
is rendered. HTML describes the structure of pages using this nesting concept.
In the above markup code, take a look at the tags, written in blue. The tags
include things like <p> and <body>. Notice how all of the tags start with a < and
end with a >. Also observe how every tag that is created has a corresponding
tag at the end. This is called a closing tag, and it can be distinguished because
the first character after the opening left angle bracket < is a forward slash:
</body>, </p>, and </h3>. All of the text is shown in black.
The above code renders to the following:
Elements are usually made up of two tags: an opening tag and a closing tag.
Each tag tells the renderer something about the information that sits between
each of the tags. You can think of a tag like a container. The material that falls
between the opening tag and the closing tag is put inside of the container that
the tag created, and it is rendered according to the rules that the tag dictates.
Some common tags include:
Tag Description
<html> Encloses the entire HTML document
<head> Encloses special document details
<body> Encloses the actual rendered document
<h1> through <h6> Headers 1 through 6
<i> Italicize
<b> Bold
<a> Hyperlink
<img> Image
<ul> Unordered (bullet pointed) List
<ol> Ordered (numbered) List
<li> List element
<table> Encloses a table’s contents
<thead> Encloses a table header
<tbody> Encloses a table body
<tr> Table row
<td> Table data, or one cell in a table
Not all elements have two tags, and that’s okay. However, these are special
cases. For example, the <input> and <br> elements only have one tag: their
opening tag. This is because all of the information that is needed to render the element is included in the tag's attributes. HTML tag attributes give additional
information to the renderer about how the tag should be rendered, what the
tag is, and if there’s any data that needs to be dealt with. Attributes can be
placed on any element, regardless of whether the tag has a closing tag or not,
but they can only be placed on opening tags, never on closing tags.
There are some common attributes, including id, class, and name. How-
ever, as you can see in the above HTML code snip, you don’t strictly need to
include any attributes.
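To see opening tags, closing tags, attributes, and one-tag elements side by side, here is a small sketch using html.parser from Python's standard library (not Beautiful Soup, which this chapter builds toward). The HTML snippet and the TagLogger class name are made up for illustration:

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record every opening tag (with its attributes) and every closing tag."""

    def __init__(self):
        super().__init__()
        self.opened = []  # (tag name, attributes as a dict) for each opening tag
        self.closed = []  # tag name for each closing tag

    def handle_starttag(self, tag, attrs):
        self.opened.append((tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.closed.append(tag)


page = '<body><p id="intro">Hello<br>world</p><input name="q"></body>'
parser = TagLogger()
parser.feed(page)
print(parser.opened)
print(parser.closed)
```

Notice that <br> and <input> show up only as opening tags, while <p> and <body> are closed in reverse order of opening, reflecting the nesting.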
Exercise Questions
These exercise questions cover chapter 11.1.1.
No exercise questions exist for this section yet.
Minification
When you write inline CSS, you usually put all of your style rules on a single line, which resembles what is called "minification" (removing whitespace that the browser doesn't need). Unlike Python, CSS doesn't rely on new lines to tell when one statement has ended and another has begun. Instead, it uses semicolons. So you can just stack a bunch of style rules next to each other, separated by semicolons.
In CSS, a group of rules is often placed inside of a CSS class, which can be
applied to HTML elements using the class attribute. If all of the elements on
your page are styled in a similar way, it might be possible to select what you
need using the class attribute.
Exercise Questions
With that, it might seem like we’d only want to use GET requests. However,
the inner workings of modern web frameworks are much more complicated, and
they often involve asking for specific amounts of data. It’s a negotiated process:
you might ask for the format of the data, the web server responds with the
format, you then ask if you need permissions and the web server responds with
a yes, so you send the authentication token and the web server asks you what
data you want, you finally send a request for what you want and the web server
responds with the data. In this highly simplified scenario, we've already exchanged eight messages: four requests and four responses. Modern webpages might involve dozens of requests!
The requests library abstracts all of this for us. We don’t need to know
all of the details of which requests to place and what kind of data to expect in
response, since requests already does this for us.
address.
The only required parameter is url, which is almost always passed in as the
first positional argument. The url parameter is of type string, so we can just
pass in a string literal. An address can also contain something called a query
string. Not all addresses have a query string, but query strings can deliver state
information to a page. You can tell that an address has a query string if it
contains a question mark ?. The question mark marks the beginning of the query
string, and a query string can contain multiple queries, separated by ampersands &.
Each query is a key-value pair (just like an entry in a Python dictionary), with
the key and value separated by an equal sign =.
Let’s look at an address with a query string.
https://www.nba.com/stats/player/203999?SeasonType=Playoffs&PerMode=Per48
Can you find the question mark? Everything after the question mark is part
of the query string, which runs from the question mark to the very end of the
address. In the address above, the query string contains two queries:
SeasonType and PerMode. Nothing before the question mark is part of the
query string, including what falls after the last slash (in this case, 203999). In
this address, the keys are SeasonType and PerMode, and the values are
Playoffs and Per48, respectively.
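To check this reading of the address in code, here is a small sketch using Python's built-in urllib.parse module (not the requests library). It splits the NBA address above into its path and query string, then turns the query string into a dictionary of keys and values.

```python
from urllib.parse import urlsplit, parse_qs

url = "https://www.nba.com/stats/player/203999?SeasonType=Playoffs&PerMode=Per48"

# urlsplit separates the address into scheme, host, path, query, and fragment.
parts = urlsplit(url)
print(parts.path)   # /stats/player/203999
print(parts.query)  # SeasonType=Playoffs&PerMode=Per48

# parse_qs maps each key to a list, since a key may repeat in a query string.
queries = parse_qs(parts.query)
print(queries)  # {'SeasonType': ['Playoffs'], 'PerMode': ['Per48']}
```

Note that 203999 ends up in the path, not in the query string, just as described above.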
URL Encoding
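When it builds the query string, requests will percent-encode any special
characters (like spaces, ampersands, or non-ASCII letters) that can't appear in
an address as-is, replacing each byte with a percent sign followed by its
hexadecimal value.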
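A quick way to see percent-encoding in action is the standard library's urllib.parse.urlencode() function. The sketch below uses hypothetical query values chosen to contain a space, a non-ASCII letter, and an ampersand.

```python
from urllib.parse import urlencode

# Hypothetical query values containing characters that are not URL-safe.
params = {"team": "Bayern München", "q": "a&b"}

# Spaces become "+", "ü" becomes its UTF-8 bytes %C3%BC, and "&" becomes %26
# so it is not mistaken for the separator between queries.
encoded = urlencode(params)
print(encoded)  # team=Bayern+M%C3%BCnchen&q=a%26b
```

When you pass a dictionary like this to requests.get() through its params argument, requests performs a similar encoding for you, so you rarely need to call urlencode() yourself.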
The get() method also supports many more parameters, including
allow_redirects, which allows you to disable redirects (the default is True),
auth and cert, which are for HTTP authentication or specifying a certificate
file or key, cookies for specifying a dictionary of cookies to use when making
CHAPTER 11. REQUESTS, WEB SCRAPING, AND BS4 272
the request, proxies for proxy servers, stream for whether the response data
should be downloaded immediately or streamed (the default is to download
immediately), timeout for how many seconds the client should wait for the server
to respond before giving up (by default, there is no timeout), and verify for
certificate verification. For the most part, you can leave all of these parameters
set to their defaults by not specifying them in the function call.
One argument that you may want to specify is headers, a dictionary of HTTP
headers to send along with the request to the specified URL. We'll cover HTTP
headers in a couple of sections.
Let's consider the following URL, which we'll make a GET request to.
There is no query string in this address, so we'll just make our request as-is
after importing the requests library. We'll put the result of our request into a
variable called response.
import requests
response = requests.get("https://bundesliga.com/en/bundesliga/table")
If our request was successful, we should be able to print response and see the
response code.
print(response)
<Response [200]>
A 200 response code means that everything is good, and the request was
successful. There are several response codes that you should recognize if you
plan on doing a lot of scraping.
200: OK
410: Gone
503: Service Unavailable
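If you run into a code that you don't recognize, Python's standard library can look up its official name. This is a small sketch using http.HTTPStatus; it works offline and does not involve the requests library.

```python
from http import HTTPStatus

# Look up the standard reason phrase for each status code listed above.
for code in (200, 410, 503):
    status = HTTPStatus(code)
    print(code, status.phrase)
# 200 OK
# 410 Gone
# 503 Service Unavailable
```

With requests, the numeric code of a response is available as response.status_code, so HTTPStatus(response.status_code).phrase gives its name.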
273 11.1. WEBPAGE STRUCTURE
If we look at the type of our response variable, it’s not what you might
expect.
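print(type(response))
<class 'requests.models.Response'>
A Response object bundles several attributes together, including encoding,
status_code, url, and text. Printing each of these in turn gives output like the
following.
utf-8
200
https://www.bundesliga.com/en/bundesliga/table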
<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"><title>Bundesliga | Table | 2021-2022</title>
<script type="text/javascript">let shouldUseDarkTheme="0";const availibleThemes=["light","dark"];
localStorage&&availibleThemes.includes(localStorage.getItem("bl-force-theme"))?
shouldUseDarkTheme="dark"===localStorage.getItem("bl-force-theme")?"1":"0":window.matchMedia&&
window.matchMedia("(prefers-color-scheme: dark)").matches&&(shouldUseDarkTheme="1"),window.document.
documentElement.setAttribut
For the bulk of the work that we'll be doing with the requests library, you'll
want the text attribute, since it contains the contents of the webpage as a
string, which is what we want to parse.
HTTP Headers
We promised that we’d go over what HTTP headers are, and we have gotten
to that point at long last! Some web servers have more aggressive checking to
stop robots from accessing their contents. The more sophisticated web servers
will actually reject traffic from clients that don't meet certain requirements,
such as sending valid headers.
The header contains information about the requestor, or your computer. It
doesn’t contain any information about you, but rather about the equipment that
you’re using. This information is most often used to give you the proper data
for your platform. For example, you know how when you download a piece of
software, sometimes the website can detect whether you’re on Windows, macOS,
or Linux and offer you the correct download package for your computer? The
website is actually reading the header from the request.
Headers contain information on the request and the client that the request
is coming from. This information includes:
Whether the request is a GET or POST request
What the filepath is on the remote server
What the scheme is (typically HTTP or HTTPS for web traffic)
What kind of data is accepted (like whether the client will accept certain
types of images)
What the accepted encodings are (like UTF-8)
The system language
The operating system
The web browser
The user agent
Depending on the website, you might need to pass certain header parameters
with your request. The most common header parameter is the user agent, since
this is the most often checked by a remote server to validate regular traffic.
The user agent header bundles the application, operating system, vendor, and
version numbers of the requesting user agent. Reading the user agent string
helps remote sites identify whether the device is a mobile phone, tablet, desktop,
or even a TV before any data is sent back.
The Mozilla Firefox user agent string looks like this for Firefox version 42.0
on a modern version of macOS that's running on an Intel processor (x.y stands
in for the macOS version number).
Mozilla/5.0 (Macintosh; Intel Mac OS X x.y; rv:42.0) Gecko/20100101 Firefox/42.0
The Google Chrome user agent string looks like this for Chrome version 99 on a
modern version of Windows.
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko)
Chrome/99.0.4844.51 Safari/537.36
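To send a user agent of your own, you put it in the headers dictionary. The sketch below assumes the requests library is installed and builds (but does not send) a GET request with the Chrome user agent string shown above, so you can inspect the headers offline. requests.Request(...).prepare() is part of requests' public API; the timeout value of 10 in the commented-out line is an arbitrary choice.

```python
import requests

# The Chrome user agent string shown above.
user_agent = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36"
)
headers = {"User-Agent": user_agent}

# Build (but do not send) a GET request so we can inspect it offline.
prepared = requests.Request(
    "GET",
    "https://www.bundesliga.com/en/bundesliga/table",
    headers=headers,
).prepare()
print(prepared.method, prepared.headers["User-Agent"])

# To actually send it, you would pass the same dictionary to get():
# response = requests.get("https://www.bundesliga.com/en/bundesliga/table",
#                         headers=headers, timeout=10)
```

Whether a particular site accepts this header is up to that site; some servers check more than the user agent.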
Exercise Questions
These exercise questions cover chapter 11.1.3.
No exercise questions exist for this section yet.
277 11.2. PARSING WITH BEAUTIFUL SOUP
The next step is to strain the soup and find the things that we want in it.
We do this by calling the find() method on the soup object that
BeautifulSoup() returned. The find() method looks for HTML tags
(like <td> or <option>).
The find() method returns only the first element that matches the given
criteria, so to collect every matching child, we then use the find_all() method
on the element (a Tag, which is a kind of PageElement) that find() returned.
The find_all() method returns a list of all of the matching elements, so the
next step would be to clean up the text of those elements using stripping
methods and regular expression matching, which we covered in Chapter 3.4.
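Here is a minimal sketch of the difference between find() and find_all(), assuming bs4 is installed. The HTML fragment (a select menu of seasons) is made up for illustration, and it uses Python's built-in html.parser rather than lxml so that no extra parser is needed.

```python
from bs4 import BeautifulSoup

# A made-up HTML fragment with one select menu.
html = """
<select id="season">
  <option>2020-2021</option>
  <option>2021-2022</option>
</select>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns only the first matching element (or None if nothing matches).
menu = soup.find("select")
first = menu.find("option")
print(first.text)  # 2020-2021

# find_all() on that element returns every matching child as a list.
options = menu.find_all("option")
seasons = [opt.text.strip() for opt in options]
print(seasons)  # ['2020-2021', '2021-2022']
```

The .strip() call is one example of the cleanup step mentioned above; real pages often surround cell text with whitespace.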
Exercise Questions
These exercise questions cover chapter 11.2.1.
No exercise questions exist for this section yet.
If we grab just the first 500 characters of our data, we can see what form it’s
in.
print(response.text[:500])
<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"><title>Bundesliga | Table | 2021-2022</title>
<script type="text/javascript">let shouldUseDarkTheme="0";const availibleThemes=["light","dark"];
localStorage&&availibleThemes.includes(localStorage.getItem("bl-force-theme"))?
shouldUseDarkTheme="dark"===localStorage.getItem("bl-force-theme")?"1":"0":window.matchMedia&&
window.matchMedia("(prefers-color-scheme: dark)").matches&&(shouldUseDarkTheme="1"),window.document.
documentElement.setAttribut
It looks like we have a valid response, so let’s move on. We start by soupifying
our text using the lxml parser.
from bs4 import BeautifulSoup
soup = BeautifulSoup(response.text, 'lxml')
Note that for our argument with the response, we want to give Beautiful Soup
the text attribute, not the response itself. The response is a Response object
rather than a string of HTML, so Beautiful Soup's parser can't work with it
directly.
The BeautifulSoup() constructor returns a BeautifulSoup object that can then be
searched through to find a table, then table rows, then table cells. We're also
going to set a condition that the class must match 'table'. We start by finding
the table.
table = soup.find('table', attrs={'class': 'table'})
Now that we have a table, we can find all of the rows inside of this table using
the find_all() method. Had we used find_all() on the entire document, it might
have picked up rows from other tables on the page, and it would have searched
more of the document than necessary.
rows = table.find_all('tr')
This has returned all of the rows in the table and placed the result inside of the
rows variable. Now, we can find the cells. However, calling find_all('td') on the
whole table would detabulate our data, since it would return every cell in one
flat list and lose track of which row each cell came from. Instead, we're going
to work through the table one row at a time. Let's start by making a place for
our data called data.
data = []
Now we can iterate through all of our rows and pull out the cells. We’ll use a
nested loop to pull out the contents of our cells within our rows. Essentially,
we want to iterate over our rows, and in each row iteration, we want to iterate
through each cell horizontally.
Why are we going this way? Recall that HTML tables are made of rows
of individual cells, not columns of cells. So, we have to go row by row.
If we want to get just one column, we have to parse through the entire
table, then grab that column from the final dataframe.
for tr in rows:
    # Each row holds its cells in <td> tags
    td = tr.find_all('td')
    # Keep each cell's text; use a new name so we don't shadow tr
    row = [cell.text for cell in td]
    data.append(row)
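As a self-contained sketch of the row-by-row loop above, assuming bs4 is installed, the snippet below runs the same logic on a tiny, made-up table (the markup and team names are hypothetical). It uses Python's built-in html.parser instead of lxml. In this sketch, the header row holds th cells rather than td cells, so it turns into an empty list; that is one plausible reason a scraped table ends up with a row worth dropping, though the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# A tiny, made-up table: a header row (th cells) and two data rows (td cells).
html = """
<table class="table">
  <tr><th>Rank</th><th>Team</th></tr>
  <tr><td>1</td><td>Bayern</td></tr>
  <tr><td>2</td><td>Dortmund</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", attrs={"class": "table"})

data = []
for tr in table.find_all("tr"):
    # One list per row; the header row has no td cells, so it becomes [].
    row = [cell.text for cell in tr.find_all("td")]
    data.append(row)

print(data)  # [[], ['1', 'Bayern'], ['2', 'Dortmund']]
```

Because each row becomes its own list, the result keeps the table's shape and is ready to hand to pd.DataFrame().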
Now that we have a list of lists, we can put the results into a Pandas dataframe,
specifying the source data and the columns when we make the dataframe. It'll
be easier to drop columns from our dataframe than to drop every useless cell
one-by-one, so let's shove everything into a dataframe with some placeholder
column names, then drop those columns and the first row, and print the final
dataframe.
import pandas as pd
columns = ['drop1', 'rank', 'drop2', 'drop3', 'team', 'drop4', 'played',
           'points', 'w', 'd', 'l', 'goals', 'goaldiff']
bundesliga = pd.DataFrame(data, columns=columns)
# Remove the placeholder columns in one call
bundesliga = bundesliga.drop(columns=['drop1', 'drop2', 'drop3', 'drop4'])
# Remove the first row (index label 0)
bundesliga = bundesliga.drop(index=0)
print(bundesliga)
Exercise Questions
These exercise questions cover chapter 11.2.2.
No exercise questions exist for this section yet.
11.3 APIs
An application programming interface, or API, returns formatted data
based on a request that is made to it. While on the surface, an API call looks
like any other web resource call, what it returns is very different: typically data
in XML or JSON format rather than a webpage. You can experiment with open
APIs using an API testing tool, like Postman. Postman can also help you form
authentication strings, if your API of choice requires authentication.
If an API will give you the information you need, use it instead of scraping,
since it's much faster to develop for.
There is an API for everything, and many of these APIs are free (though
some still require you to get a free authentication key)! Here, we’ll look at
Animechan, an API that returns a random quote from a list of animes. Ani-
mechan doesn’t require any authentication, so we can simply make a request to
https://animechan.vercel.app/api/random. If you were to just type this into a
web browser, you'd get a response in the form of JSON, which can be parsed
using the json library in Python. We'll cover this in the next section.
Here’s an example response (responses are random, but you should get a re-
sponse with the same form).
{"anime":"Naruto","character":"Kiba Inuzuka","quote":"Akamaru, what's wrong
boy? Have you forgotten my scent? We've always been together haven't we? We
grew up together. Akamaru please, somewhere in there, there has to be a part of
you that remembers. Show me that you remember. AKAMARU! Forgive me. Can you?
I know that I've brought you nothing but pain and suffering. I broke my word.
I swore I'd always protect you. Akamaru I'm sorry. Sorry I wasn't a better
master. I'm here. Here for you. Forever."}
Observe in the request how we added the .json() method to the end of the
requests.get() statement. Again, because we know that the API returns its
data as a JSON object, we can just tell Python that we want to automatically
convert the data into a Python dictionary. The requests library will parse
the JSON in the response into a Python dictionary. This saves us a step down
the line: explicitly parsing a JSON string into a Python dictionary. However,
we still include that step as an option below, since sometimes, JSON parsing
using the requests library doesn't work as we might expect.
We printed the data in its dictionary form, which is why we got the curly
braces, keys, and values. Just like with any other Python dictionary, we can
use all of the usual dictionary operations on this one. In our case, the
dictionary is stored in a variable called response.
print(response["anime"])
print(response["character"])
Naruto
Kiba Inuzuka
print(response)
This time, our response is left in the string form that it was given to us by the
API. We can parse this using the loads() function in the json library. loads()
typically takes only one argument: the JSON-formatted string.
import json
response = json.loads(response)
print(type(response))
print(response)
We can now see that the type is no longer str but dict instead.
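To try json.loads() without making a web request, here is a small sketch that parses a shortened version of the Animechan response shown above. The string is trimmed by hand for illustration; only the anime and character keys are kept.

```python
import json

# A shortened version of the Animechan response shown above.
text = '{"anime": "Naruto", "character": "Kiba Inuzuka"}'
print(type(text))  # <class 'str'>

# loads() parses the JSON-formatted string into a dictionary.
quote = json.loads(text)
print(type(quote))         # <class 'dict'>
print(quote["character"])  # Kiba Inuzuka
```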
The key thing about the json library's load() and loads() functions is that they
work on any JSON-formatted data, not just data that comes from a web request:
load() reads from an open file object, while loads() parses a string. For
example, suppose we read a JSON file into a string variable, as we showed in
Chapter 8. We could then use loads() to parse this string into a Python
dictionary.
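The sketch below compares the two approaches on a local file. To stay self-contained, it first writes a small, made-up JSON file (quote.json is a hypothetical name) into a temporary folder, then parses it once with loads() on the file's contents and once with load() on the open file.

```python
import json
import os
import tempfile

# Write a small, made-up JSON file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "quote.json")
with open(path, "w", encoding="utf-8") as f:
    f.write('{"anime": "Naruto", "character": "Kiba Inuzuka"}')

# Option 1: read the file into a string, then parse it with loads().
with open(path, encoding="utf-8") as f:
    text = f.read()
from_string = json.loads(text)

# Option 2: let load() read directly from the open file object.
with open(path, encoding="utf-8") as f:
    from_file = json.load(f)

print(from_string == from_file)  # True
```

Both give the same dictionary; load() simply saves you the separate read() step.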
Exercise Questions
These exercise questions cover chapter 11.3.
No exercise questions exist for this section yet.
Labs
This chapter contains labs and rubrics for each lab. Your instructor might
assign certain labs to you and not others.
Any modifications that your instructor makes should take precedence over
the lab provided here. You should take care to follow your course’s style guide, if
your instructor has one. This style guide should give you important information
regarding naming conventions, line spacing, whitespace, and other notes like
these. Remember to follow your course’s style guide!
LABS 286
Task
You will be guided through this lab step-by-step.
Lab Preparation
1. In another tab, go to colab.research.google.com. If this is your first time
using Google Colab, you'll probably see a Welcome to Colaboratory document.
You can just ignore this. Instead, within your web browser, go to
File > New notebook. This will make a new Jupyter Notebook and open
it for you. You’ll do your lab in this notebook.
2. Now, at the top of the notebook, you should see that your document is
probably named Untitled0.ipynb. This is normal, and you can change
the name of your notebook by just clicking on the document’s name. Go
ahead and change it to your username, followed by Lab 0.ipynb.
3. Hint: Your Jupyter Notebook must always end in .ipynb for it to be recog-
nized as an actual Notebook. ipynb stands for Interactive PYthon Note-
Book.
Markdown
The default first block of your new Jupyter Notebook is a code block. However,
we actually don’t want to use a code block - we want to use a markdown or
text block. So, hover over the existing code block and at the very right side, in
the menu that shows up, click on the Trash can icon. This will delete the code
block.
Now, at the top, underneath the menu bar, you probably see two buttons:
one to add code and one to add text.
Click on the button to add a new text block.
i. Text
When you add text, even if you add Python code, it cannot and will not
be executed.
ii. Links
Great! The next most important thing that you'll need to do is put in
links. The usual way to put in links for Jupyter Notebooks is to put the link
label in square brackets, followed immediately by the URL in parentheses:
If you’re not sure what to put inside of the brackets, you can always just
put the URL in there. That might look like this:
Hint: Why don't we specify the http: or https: at the beginning? We'll let
our web browser figure that out for us. Instead, just put two slashes, and
the browser will use the same protocol as the page you're currently on.
Your task: Make a new text block. In this text block, make a link that
goes to your favorite YouTube video, with the link label stating the name
of the video. Put this in a new text block in your lab notebook.
iii. Lists
You’ll probably also have to make lists. Lists are pretty simple, too. You
can create either unordered lists (bullet points) or ordered lists.
Let’s start with unordered lists. To create an unordered list, just put a
dash and a space at the beginning of each new line with new material.
- Item
- Item
- Item
You can also make nested lists. If you are making a nested unordered list,
use plus signs for the next level in, then dashes for the next level in, then
pluses, and so on and so forth. For each level in, add an indent.
- Item
  + Subitem
    - Sub-subitem
      + Sub-sub-subitem
    - Sub-subitem
      + Sub-sub-subitem
      + Sub-sub-subitem
    - Sub-subitem
  + Subitem
  + Subitem
- Item
- Item
Your task: Now, in your notebook, make a new text block. In this new
text block, make an unordered list with the courses you are taking this
semester, and make an ordered list of your top five favorite buildings on
your campus. If you don’t know, just make some buildings up.
iv. Emphasis
It’s often really handy to emphasize something. Here’s how to emphasize
stuff in Markdown.
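Emphasis, aka italics, with *asterisks* or _underscores_.
Strong emphasis, aka bold, with **asterisks** or __underscores__.
Strikethrough uses two tildes. ~~Scratch this.~~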
Your task: Now, in your notebook, make a new text block. In this new
text block, make some egregious statement like "The earth is flat" and
strike it out. Then, make a statement on what your favorite candy is, and
bold the candy’s name.
v. Code
Sometimes, you’ll want to put code into your Markdown, like if you want
to show what function you’re defining or let your reader know what your
variable is named.
If you are highlighting code inline (within paragraph text), then you can
just use a backquote at the beginning and end of your code chunk. For
example, `this is some code`. On a US keyboard, the backquote character is
at the top-left corner, next to the 1/! key.
Inline `code` has `back-ticks around` it.
Sometimes, you need to list more read-only code than can comfortably
fit inside of an inline code mark. Instead, you can write a fenced code
block. This is still markdown, and it's not executable, but it'll still have
Python syntax highlighting.
```python
s = "Python syntax highlighting"
print(s)
```
Your task: Now, make a fenced code block that contains the following
code, marked down in Python. Add one more line of code, based on
something that you’ve seen in class.
This is the interactive Python shell, and it’s great for running small or large
chunks of Python code. One of the other cool things about the Python environ-
ment is that your variables are maintained throughout your entire notebook.
Now, try to define a new variable students and assign it some integer value.
Now, create a text block explaining where you got the number for students.
Create a new code block (not the code block that you used above) and print
the value of students.
Turn In
Turn in your Jupyter Notebook as an .ipynb file.
Task
Begin with the following code. Confirm that it works correctly.
1 import pandas as pd
2 def main():
3     ser = pd.Series()
4     n_ethereum = int()
5     value_per_coin = float()
6     total_value = float()
7     n_ethereum = int(input("Enter the number of Ethereum in wallet."))
8     print('You entered: ' + str(n_ethereum) + '\n')
9
10     value_per_coin = float(input("Enter the dollar value of one Ethereum."))
11     print('You entered: ' + str(value_per_coin) + '\n')
12
13     total_value = value_per_coin * n_ethereum
14     print('Total value in wallet is ' + str(total_value) + ' dollars.')
15
16     ser[0] = total_value
17
18 if __name__ == "__main__":
19     main()
Attempt the following tasks. In between each task, revert your program back
to its original state.
Put an extra space in between pandas and as in line 1.
Remove as pd from line 1.
Remove the opening quote from line 7 inside of the input function.
Replace the opening quote on line 7 with a single quote.
Remove the backslash \ on line 8.
For each task, you will report on what happened and why you think that hap-
pened.
Turn In
You will submit a lab writeup in a Jupyter Notebook. Your lab writeup should
contain the following sections.
Introduction: What were you given? What was the purpose of the lab?
Code Description: What does the code (as given to you) do when it is
executed?
Tasks: What did each of the items for the list of tasks do? What happened
when you made the change as directed? Why do you think that happened?
Conclusion: What did you learn during this lab?
Task
Write a program that will prompt the user for the following information.
The number of hours worked in a week
Turn In
You will submit a lab writeup in a Jupyter Notebook. Your lab writeup should
contain the following sections.
Introduction: What is the program supposed to do? If you were provided
with any code, where did it come from?
Code Description: In your head, break down your program into logical
sections. Then, write a subheading for each of your code's sections, and under
it include the code (in a code block), what it does, and why you included it.
Issues: Did you experience any issues while writing this lab? If so, what
issues did you run into? If not, what are some issues that you could
foresee another student making, and how did you avoid these issues? If
you were creating this lab, how would you change or improve it?
Completed Program: Include one big code block that contains your pro-
gram in its entirety.
Test Runs: Provide the entire output from your program when you ran
it using the trial data from below.
Conclusion: What did you learn during this lab? How did you apply some
of the skills that you’ve learned to this lab?
Trial Data
Trial Hours Dependents
1 15 1
2 40 4
3 53 3
4 2 5
Sample Output
This sample output is provided to guide you to your solution. You should follow
the instructions provided to include all of the functionality that is shown below.
This program will ask you how many hours you worked, and calculate your
taxes, dues, gross pay, and net pay.
Grading Table
Requirement Possible Points
Correct output on required trial data 60
Dollar amounts have the right float value (2 decimal places) 10
Appropriate code formatting, good use of whitespace 5
Meaningful variable names 5
Descriptive comments at the top 5
Descriptive comments to label sections 5
Jupyter Notebook is constructed well and with care 10
Total 100
Description
For many programming projects in the real world, you’ll be using or working
with code that someone else wrote, rather than writing code from scratch. A
good programmer will be able to intelligently break down someone else’s code,
and if you can read someone else’s bad code, it’ll be a piece of cake to read
someone else’s good code.
For this lab, you will be writing code using terrible programming practices,
then you'll be trying to decipher a program written by someone else, who has
also used terrible programming practices.
Task
For this lab, you will need a partner.
On your own (without your partner), write a program that takes a student’s
name, year (as an integer), major(s), minor(s), and dormitory. Optional: make
the majors and minors multiple choice questions. You can write this in any
programming language you’d like as long as it’s Python 3.6. Then, print a
summary of that student using the format: Your name is [name] and you are
a [year]. Your major(s) is in/are [major1] and your minor(s) is in/are [minor1
and minor2]. You live in [dormitory].
Your program should correctly choose whether to use the term "major" or
"majors," "minor" or "minors," "is" or "are," and whether you need to use
commas or an "and" for multiple majors or minors.
Now, here’s the kicker. You know what good programming practices are.
Now, break every good programming practice you can without actually break-
ing valid syntax. That means that your program should properly execute, but
it shouldn’t be readable by a human. Use bad indentation practices (with-
out breaking Python), bad/nonexistent comments, confoundingly constructed
if/then statements and loops, the wrong data structures, obfuscation, needless
complexity, and anything else to make your program difficult to read. The only thing
you can’t do is hide a file. All of your script must fall in one file. If anyone
except for you can read your program, you have not done a good job.
Trade programs with your partner. You should’ve left them with a horrible
mess that is virtually unusable. Your partner’s job is simple: comment the
code that you wrote, and write a summary of the code that you wrote, all
without asking you any questions. This summary should contain things like
what datatypes are being used for what variables, the logical flow of the program
(like where decisions are being made), and how this code might be fixed (though
you don’t actually have to fix it).
Turn In
You will submit this program inside of a Jupyter Notebook. You should include
both your script and your partner's script (unmodified) inside of code blocks. You
should also include your partner’s commented and annotated script, as well as
your best summary of their code.
Trial Data
Trial  Name             Year       Majors              Minors            Dormitory
1      Alice Alexander  Sophomore  Economics           Statistics        Pless Hall
2      Ben Loch         Junior     Env. Science        (none)            McAlester Apartment
3      Sam Williams     Senior     Biology, Chemistry  Computer Science  Lewis Dormitory
Grading Table
Requirement Possible Points
Correct output on required trial data on your script 20
Inappropriate code formatting, bad use of whitespace in your script 30
Poor variable names in your script 5
Poor commenting or no comments, other challenges to readability in your script 5
Excellent commenting of partner’s code 10
Correct description and interpretation of partner’s code 20
Jupyter Notebook is constructed well and with care 10
Total 100
Note: This means that you will receive points for writing bad (but syntactically
correct) code yourself and for properly analyzing and breaking down your partner’s
bad code.
However, seven-segment displays cannot display every character. For example, there
is no way to display the letter g without it being confused for a 9. Seven-segment
displays also don't have enough segments to display the letter m. Your goal for this
lab is to find the longest word that can be displayed by a seven-segment display.
Task
Use the requests library to download the words.txt file from
https://pythonforscientists.github.io/data/data/words.txt. This file contains a list of
words of varying length, organized alphabetically. You will use Python to find the
longest word in this list that can be displayed on a seven-segment display.
Seven-segment displays cannot display the following letters.
Turn In
Turn in a program inside of a Jupyter Notebook. You should include your script, along
with the longest word that you found. Include downsides of the approach that you
chose and how you might choose to change things if you were to write your program
again.
Trial Data
You should use the words.txt file.
Grading Table
Requirement Possible Points
Correct longest word 10
Appropriate whitespace and formatting 10
Good variable names 5
Good comments and inline documentation 5
Downsides identified and explained 10
Jupyter Notebook is constructed well 10
Total 50
Lab 5: Wordle
Description
Wordle is a simple word guessing game. Your user will be guessing words that are
five letters long. They have six chances to get the correct word; otherwise they lose.
If they guess a letter that is in the word but in the wrong slot, it is typically marked
yellow, and if they guess a correct letter in the correct position, it is typically marked
green.
Try the Wordle here.
Task
You will program a simplified Wordle, since programming with color is much more
difficult. Use the requests library to get the wordle.txt file from
https://pythonforscientists.github.io/data/data/wordle.txt. The wordle.txt file has
5,757 lower-case words that are five letters long.
At random, pick a word, but hide it from the user. Prompt the user to guess what
the word is. If they guess a letter in the correct place, display it in its place in a
five-letter blank. For example, if they incorrectly guessed herby but the third letter
is r, then display the r to confirm the user’s guess is correct.
__r__
If they guess a letter, but it’s not in the correct place, display the letter next to the
five-letter blank. For example, if they incorrectly guessed beret but the third letter is r
and the fourth letter is t, then display the r to confirm that letter is correct, and display
the t next to the blanks to confirm that the letter is correct, but in the wrong place.
__r__
In the wrong spot: t
Guess:
If the user guesses the secret word in six tries or fewer, then they win and the game
ends.
Guess: yurts
yurts
You found the word in 4 guesses! Congratulations!
If the user runs out of guesses, then they lose and the game ends.
Guess: stone
_____
You ran out of guesses! The word was yurts. Better luck next time!
If the user tries to enter a word that’s not on the list of words (like ’abcde’ or ’xxxxx’),
tell the user that the word isn’t a valid word, but do not deduct any guesses.
Guess: abcde
That's not a word. No guesses used. 6 guesses remain.
Sample Output
This sample output is provided to guide you to your solution. You should follow the
instructions provided to include all of the functionality that is shown below.
Guess: deans
_____
In the wrong spot: a, e, s
Guess: cleat
__ea_
In the wrong spot:
Guess: wreak
__ea_
In the wrong spot: r
Guess: smear
smear
You found the word in 5 guesses! Nice!
Guess: spare
s_are
In the wrong spot:
Guess: scare
scare
You found the word in 3 guesses! Nice!
Guess: abcde
That's not a word. No guesses used. 5 guesses remain.
Guess: scare
scare
Guess: deans
_____
In the wrong spot: a, e, s
Guess: cleat
__ea_
In the wrong spot:
Guess: wreak
__ea_
In the wrong spot: r
Guess: mount
_____
In the wrong spot: m
Guess: swear
s_ear
You ran out of guesses! The word was smear. Better luck next time!
Bonus
For bonus points, have the program ask the user whether they’d like to play again. If
so, you should keep track of these statistics for the session:
What percentage of the time have they gotten the word on their first, second,
third, fourth, fifth, and sixth try?
How many times have they played?
How many times have they won?
What’s their win percentage?
After every round, display these statistics to the user. This bonus section is worth
20 extra points, but it must be completed in entirety to be eligible for the bonus. A
partially functional bonus is not worth anything.
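The bonus statistics can be computed from a list of round results. This sketch assumes each round is recorded as the number of guesses it took to win, or None for a loss, and it interprets “percentage of the time” as a percentage of all rounds played (a percentage of wins only is another reasonable reading). The names session_stats and percentByTry are hypothetical.

```python
MAXGUESSES = 6  # one statistic per possible try


def session_stats(results):
  """Summarize one playing session.

  Each entry in results is the number of guesses a won round took
  (1 through MAXGUESSES), or None for a lost round.
  """
  played = len(results)
  wins = sum(1 for result in results if result is not None)
  winPercent = 100 * wins / played if played else 0.0

  # Percentage of all rounds played that were won on each try
  percentByTry = {}
  for tryNumber in range(1, MAXGUESSES + 1):
    count = results.count(tryNumber)
    percentByTry[tryNumber] = 100 * count / played if played else 0.0

  return played, wins, winPercent, percentByTry


# Hypothetical session: won in 4 guesses, won in 2 guesses, then lost
played, wins, winPercent, percentByTry = session_stats([4, 2, None])
print(f"Played: {played}  Won: {wins}  Win percentage: {winPercent:.1f}%")
for tryNumber, percent in percentByTry.items():
  print(f"Try {tryNumber}: {percent:.1f}%")
```

Keeping the raw results in a list and recomputing the summary after each round avoids having to update several running totals by hand.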
Turn In
Turn in your program in a Jupyter Notebook, along with at least four trial runs. Include at least one win, one loss, and one run showing what happens when an invalid word is entered. Also include a short conclusion on what you would change about your program if you were to write it again; that is, how would you rearrange (refactor) your program?
Grading Table
Requirement Possible Points
Correctly functioning program 20
Appropriate whitespace and formatting 20
Good variable names 20
Good comments and inline documentation 20
Program refactoring explained 10
Jupyter Notebook is constructed well 10
Optional bonus section 20
Total 100 (120 with the bonus)
End Matter
Conclusion
With this book, you have only started to scratch the surface of what is possible with
Python. It is an incredibly versatile and powerful programming language with many
features that will prove useful to you as you embark further on your programming
journey. Furthermore, it prepares you to learn other programming languages. If
you decide to pick up a language like C++, Swift, JavaScript, PHP, or any other
language, you’ll immediately begin to notice similarities. Sure, the syntax is different,
but the ideas and structures that exist in Python also exist in nearly every other
programming language. Now that you know what a variable is, what the datatypes
are, how a function works, and other concepts like these, you are well equipped to
adapt this knowledge to new languages: a function in C++ fundamentally does the
same thing as a Python function, even though they might look different.
Programming is useful for other reasons, too. For one, it teaches you how to be
a critical thinker and problem solver. With your programming mindset, you have
become tuned to hunting down and fixing issues and developing cohesive solutions to
complicated problems, and those skills are increasingly important in our modern age
of information. Even if you don’t choose to continue programming formally, consider
continuing to work on these skills, and consider using programming as a means to
achieve great things.
Further Projects
As mentioned, we have only scratched the surface of Python’s immense capabilities.
Here are some projects to try and exercise your newfound Python skills, as well as
some new things to explore.
Code a Tic-Tac-Toe game in the Python console. You’ll have to figure out how
to make your program interactive.
Learn how to use Turtle. Turtle is a graphics library that allows you to draw
using Python.
Learn how to use Tkinter (after Turtle). Tkinter will allow you to develop
desktop applications using Python.
Learn how to use Flask. Flask is a lightweight web framework for Python, and it’ll allow you to flex your new HTML and CSS skills.
Code a Towers of Hanoi simulation. This is a great way to learn how to use
recursion.
Learn JavaScript or R. Both are interpreted languages (like Python), but their syntax differs from Python’s. Each of these languages has advantages and disadvantages, but either will make you a better programmer and problem solver.
Learn Java, C, C++, C#, Swift, or Objective-C. All are compiled languages with similar syntax, and each has some advantages over Python. Learning any of them will make you a much better programmer.
Learn how to build models using scikit-learn.
Capitalization Conventions
Variables and objects: camelCase (count, numRuns)
Functions: snake_case (sum, sum_of, get_result)
Classes: CapCase (GameScore, Runs)
Constants: ALLCAPS (PI, FIELDLENGTH)
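The short program below applies all four conventions at once. The names GameScore, get_result, and FIELDLENGTH come from the examples above, but their contents are hypothetical and only illustrate the naming style (with the two-space indentation this book recommends).

```python
# Constant: ALLCAPS (defined here only to show the style)
FIELDLENGTH = 100


# Class: CapCase
class GameScore:
  def __init__(self, homeRuns, awayRuns):
    # Variables and attributes: camelCase
    self.homeRuns = homeRuns
    self.awayRuns = awayRuns


# Function: snake_case
def get_result(score):
  """Return 'home', 'away', or 'tie' for a GameScore."""
  if score.homeRuns > score.awayRuns:
    return "home"
  if score.awayRuns > score.homeRuns:
    return "away"
  return "tie"


finalScore = GameScore(5, 3)  # variable: camelCase
print(get_result(finalScore))
```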
Whitespace
Use whitespace wisely. Remember, whitespace takes the form of both horizontal
whitespace (spaces and indentation) and vertical whitespace (blank lines). Both too
much and too little whitespace make your source code difficult to read.
Leave one space around initializations and comparison operators.
1 runs = 1 # Good
2 if (runs >= 10): # Good
3   runs=3 # Bad
Observe how the equal sign in line 1 is surrounded by spaces. This is an example of space around initialization. By contrast, line 3 has no space around its equal sign.
Also observe how the greater-than-or-equal-to sign in line 2 is surrounded by spaces, with no space between the two characters of the operator itself. The >= is a single operator, not a separate > and = (writing > = is a syntax error), so keeping it together keeps the syntax correct while still providing adequate whitespace. This is an example of space around a comparison operator.
Also leave a space before and after the comment marker, as shown in lines 1-3. In Python, the comment marker is #.
Leave one space after each comma between function arguments. Do not put a space between the function name and its opening parenthesis, or just inside the parentheses.
atlRuns = get_runs('ATL') # Good
ariWins = get_wins('ARI', 'away') # Good

bosRuns = get_runs( 'BOS' ) # Bad, too much space around args
chiWins = get_wins( 'CHI', 'home' ) # Bad, too much space around args
dalWins = get_wins('DAL','away') # Bad, no space between args
Indentation
In connection with whitespace, make sure you follow the indentation conventions for your language. Python uses indentation to define code blocks, so indentation is part of the syntax and must be consistent. Indent each level with one press of the Tab key, with your editor set to insert two spaces.
Indent anything nested, including function contents, logic statement bodies, loops,
and nested objects (mainly arrays, lists, and dictionaries).
Do not put a space before a colon in a conditional or logic statement.
Soft-wrap long lines in your editor rather than manually splitting a line into multiple lines. Not everyone’s editor window size and font size are the same as yours.
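Here is a small, hypothetical example of these indentation rules, using two spaces per level. The names teamStats and total_runs are made up for illustration. The nested dictionary, the function body, the loop body, and the if body each sit one level deeper than the line that opens them, and no colon has a space before it.

```python
# Nested dictionary: each level is indented two more spaces
teamStats = {
  "ATL": {
    "runs": 5,
    "wins": 3,
  },
  "BOS": {
    "runs": 2,
    "wins": 1,
  },
}


# Function body, loop body, and if body are each indented one more level
def total_runs(stats):
  total = 0
  for record in stats.values():
    if record["runs"] > 0:  # no space before the colon
      total += record["runs"]
  return total


print(total_runs(teamStats))
```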
Errata
No errata exist for the previous edition.
Index
object code, 24
object-oriented programming, 174
regex, 116
wildcard, 117
software, 23
source code, 24
statement, 15, 20
statements, 16
strongly typed, 28
weakly typed, 28
whitespace, 15, 54–57, 59, 81, 85
line breaks, 15
spacing, 15
Sejin Kim ’22 was a student at Kenyon College studying scientific computing. He
served as a lead tutor for introductory programming courses in the Department of
Scientific Computing and is now a systems analyst using Python. He can be contacted
at kim3[AT]kenyon.edu.
Version Number: 0.1.9040
License: MIT License
Citation: The suggested BibTeX entry is as follows.
@book{scientificpython,
title = {Python for Scientists, version 0.1.9040},
author = {Kim, Sejin},
edition = {Zeroth},
year = {2022},
address = {Gambier},
}
Python for Scientists
Sejin Kim