Part 1 Fundamentals Python for Data Science

The document outlines a course on Python fundamentals for data science, covering topics such as Python syntax, data structures, and data analysis using Pandas. It emphasizes the importance of Python in data analytics and includes course objectives, evaluation methods, and installation instructions for Python 3 and Anaconda. The course aims to equip students with practical skills to write Python programs for data-related tasks.

Uploaded by

Anh Vũ Nguyễn

Information Systems Concepts (Unitec Institute of Technology)


Python for data science

Dieudonné TCHUENTE
PhD. Senior IT/Data Consultant & Big Data Architect

Ass Professor in Computer Science and Big Data

www.tbs-education.fr


Motivation: incredible growth of Python!

Video link: https://www.youtube.com/watch?v=7Hll55GCyvI

Reference document, 13/01/2022


Main uses of Python: data analytics …

https://www.quora.com/What-are-the-top-Python-trends-of-2019


The big picture: top libraries for data science in Python

Library categories shown on the slide:
• Data visualization
• Data analysis and exploration
• Machine learning and statistics
• Deep learning
• Optimization and scientific computing (SciPy)
• Fundamentals (in this course…): syntax and data structures: files, lists, strings, dictionaries, tuples, etc.


Course Outline

Part I – Python Fundamentals


- Introduction
- Understand Python Syntax (Variables, Expressions, Statements, Conditional
Execution, Loops and Iterations, Functions)
- Understand Python Data Structures (Strings, Files, Lists, Tuples,
Dictionaries)
- Apply them to real-world use cases

Part II – Data Analysis with Pandas


- Create, Load and inspect data with Pandas Dataframes
- Modify Dataframes and apply functions
- Aggregate data from Dataframes
- Visualize data from Dataframes with Seaborn
- Apply them to real-world use cases


Course Objectives

By the end of this class, you will know how to:

- Write a complete Python program to answer a business question
- Import existing data into the Python environment
- Perform data cleaning using Python
- Perform data transformation using Python
- Perform data exploration using Python

This is useful because:

- Python is among the most widely used languages for data analysis in industry
- Data preparation is often estimated to account for about 80% of the work of data scientists…


Evaluation

Principle:
• To learn a new programming language, you need to be curious (search the
documentation, forums, …), collaborate, and practice, practice, practice…

Group project at the end of the course (50%)

• Goal: solve a real-world use case with the notions learned in class

Final exam, MCQ (50%): to be confirmed!

• Goal: validate each student's practice of the examples seen in class


Introduction


Python: a programming language

• First version in 1991

• Easy to learn even for non-programmers (intuitive)

• Free and open source

• Multi-platform (Windows, Linux, Mac, Android, PC, tablet, smartphone, …)

• Interpreted language (no compilation)

• Wide support (very large user communities)

• Wide adoption for data analytics and big data analytics

• Current version 3.x (not backward compatible with versions 2.x)


Python vs R: both for data analytics, but…

                 R                            Python
Objective        Statistics, data analysis    General purpose: data analysis, deployment and production
Primary users    Scholars and R&D             Programmers and developers
Learning curve   Difficult at the beginning   Linear and smooth
Popularity       4.23% in 2018                21.69% in 2018

[Figure: job opportunities trend, Python vs R]


Python 2 vs Python 3

• Python 2 was released in 2000; its last version is 2.7 (no longer maintained since January 2020), and it is not forward-compatible with Python 3

• Python 3 was released in 2008; the newest version when these slides were written was 3.8.0 (the future of Python)

• Python 3 adoption is growing quickly

→ We use Python 3 in this course


Python 3 Installation (with Anaconda)


• Anaconda is a widely used Python distribution. When you install Anaconda, it installs
Python along with many other useful libraries and tools that help you develop Python
programs easily.

• To install the individual edition of Anaconda, go to
https://www.anaconda.com/products/individual and, at the bottom of the page, download
the graphical installer for your system (e.g. Windows or Mac OS)

• After the download, run the installer file and follow the steps to install it on your
computer (follow the installation guide provided with the course)


Python 3 command line


• For Windows

• For Mac OS: python3 --version (for the version) and python3 (to
launch the interpreter)

Python Interactive Shell

• Interactive Python is good for experiments and programs 3-4 lines long
• Most programs are much longer, so we type them into a file and tell Python to run
the commands in the file
• In a sense, we are “giving Python a script”
• As a convention, we add “.py” as the suffix on the end of these files to indicate
they contain Python

Interactive vs Script
• Interactive: You type directly to Python one line at a time and it responds
• Script: You enter a sequence of statements (lines) into a file using a text
editor and tell Python to execute the statements in the file


Python in an IDE
• Many IDEs (Integrated Development Environments) exist for editing Python
code files: PyCharm, Spyder, PyDev, Atom, …
• In this course we use Spyder (it is installed by default with
Anaconda)

• Create a new file: File → New File
• In the file, add the Python instruction: print('Hello World from a python file')
• Save the file in a directory with the name hello.py and run the file!


Part 1: Python Fundamentals


Variables, Expressions, and Statements

Chapter 1


Constants

• Fixed values such as numbers, letters, and strings are called “constants” because their value does not change

• Numeric constants are as you expect

• String constants use single quotes (') or double quotes (")


Variables
• A variable is a named place in the memory where a programmer
can store data and later retrieve the data using the variable
“name”
• Programmers get to choose the names of the variables
• You can change the contents of a variable in a later statement

X = 12.2    (X holds 12.2)
Y = 14      (Y holds 14)
X = 100     (X now holds 100 instead of 12.2)
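
The three assignments above can be tried as a short script. This is a minimal sketch; lowercase names are used here only by convention, and the values are the ones on the slide.

```python
# A variable is a name bound to a value; a later assignment rebinds the name.
x = 12.2      # x refers to 12.2
y = 14        # y refers to 14
x = 100       # x now refers to 100; the earlier 12.2 is no longer reachable via x

print(x)      # 100
print(y)      # 14
```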


Python Variable Name Rules

• Must start with a letter or underscore _
• Must consist of letters, numbers, and underscores
• Case sensitive

Good: spam eggs spam23 _speed
Bad: 23spam #sign var.12
Different: spam Spam SPAM
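
One way to check these rules without triggering a syntax error is the string method isidentifier(). The sketch below applies it to the names on the slide; note that isidentifier() only checks the spelling rules and also returns True for reserved words.

```python
# str.isidentifier() reports whether a string follows the naming rules.
candidates = ["spam", "eggs", "spam23", "_speed", "23spam", "#sign", "var.12"]
validity = {name: name.isidentifier() for name in candidates}

for name, ok in validity.items():
    print(name, "is a valid name:", ok)

# Names are case sensitive: these are three distinct variables.
spam, Spam, SPAM = 1, 2, 3
print(spam, Spam, SPAM)
```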


Reserved Words

• You cannot use reserved words as variable names / identifiers
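
The list of reserved words depends on the Python version, so rather than copying it, it can be printed with the standard keyword module. A minimal sketch:

```python
import keyword

# keyword.kwlist is the list of reserved words for the running Python version.
print(keyword.kwlist)

# keyword.iskeyword() checks a single name.
print(keyword.iskeyword("class"))   # True: cannot be used as a variable name
print(keyword.iskeyword("spam"))    # False: free to use
```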


Sentences or lines

x = 2          Assignment statement
x = x + 2      Assignment with expression
print(x)       Print statement

Elements: variable (x), operator (=, +), constant (2), function (print)


Assignment Statement

• We assign a value to a variable using the assignment statement (=)


• An assignment statement consists of an expression on the
right-hand side and a variable to store the result

X = 3.9 * X * (1 - X)
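
To see the order of evaluation, the statement can be run with a starting value. The sketch below assumes a starting value of 0.6, which is not given on the slide, and uses a lowercase name.

```python
x = 0.6                      # assumed starting value, for illustration only

# The right-hand side is evaluated first, using the current value of x (0.6);
# the result is then stored back into x.
x = 3.9 * x * (1 - x)

print(x)                     # approximately 0.936
```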


Numeric Expressions
• Because of the lack of mathematical symbols on computer keyboards, we use “computer-speak” to express the classic math operations
• Asterisk is multiplication
• Exponentiation (raise to a power) looks different than in math

Operator   Operation
+          Addition
-          Subtraction
*          Multiplication
/          Division
**         Power
%          Remainder
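
A short sketch exercising each operator in the table. The last line goes slightly beyond the slide: it illustrates the usual precedence rules (power first, then * / %, then + -), with operand values chosen only for illustration.

```python
print(7 + 3)     # 10    addition
print(7 - 3)     # 4     subtraction
print(7 * 3)     # 21    multiplication
print(7 / 2)     # 3.5   division
print(2 ** 10)   # 1024  power
print(23 % 5)    # 3     remainder

# Precedence: 2 ** 3 is 8, then 8 / 4 is 2.0, then 2.0 * 5 is 10.0, then 1 + 10.0
result = 1 + 2 ** 3 / 4 * 5
print(result)    # 11.0
```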


What does “type” mean?

• In Python, variables, literals, and constants have a “type”
• Python knows the difference between an integer number and a string
• For example, “+” means “addition” if the operands are numbers and
“concatenate” if they are strings (concatenate means put together)


Type matters
• Python knows what “type” everything is

• Some operations are prohibited

• You cannot “add 1” to a string

• We can ask Python what type something is by using the type() function
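
The points above can be sketched as follows. The variable name eee and the sample values are illustrative; the try/except construct used to show the error without stopping the script is introduced later in the course.

```python
eee = "hello " + "there"     # + between two strings concatenates
print(eee)                   # hello there
print(1 + 4)                 # 5: + between two numbers adds

print(type(eee))             # <class 'str'>
print(type(1))               # <class 'int'>
print(type(98.6))            # <class 'float'>

try:
    eee = eee + 1            # mixing a string and an integer with + is prohibited
except TypeError as err:
    message = str(err)
    print("TypeError:", message)
```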


Several Types for Numbers


• Numbers have two main types
Integers are whole numbers:
-14, -2, 0, 1, 100, 401233
Floating Point Numbers have decimal parts:
-2.5 , 0.0, 98.6, 14.0
• There are other number types (they are variations on float and
integer)


Types conversions
• When you put an integer and a floating-point number in an expression, the
integer is implicitly converted to a float

• You can control this with the built-in functions int() and float()
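
A minimal sketch of implicit and explicit conversion; the values are chosen only for illustration.

```python
print(99 + 100.0)        # 199.0: the int 99 is converted to a float first

i = 42
f = float(i)             # explicit int -> float
print(f, type(f))        # 42.0 <class 'float'>

n = int(98.6)            # explicit float -> int drops the decimal part
print(n, type(n))        # 98 <class 'int'>
```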


Integer Division
• Dividing two integers with / produces a floating-point result (e.g. 9 / 2 = 4.5)

• This was different in Python 2.x (e.g. 9 / 2 = 4); in Python 3, the // operator gives that integer result
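
The behaviour of division in Python 3 can be checked directly:

```python
print(9 / 2)     # 4.5
print(10 / 2)    # 5.0  (a float, even though the division is exact)
print(9 // 2)    # 4    (floor division: what / did for integers in Python 2)
print(9 % 2)     # 1    (remainder)
```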


String Conversion

• You can also use int() and float() to convert between strings and
integers
• You will get an error if the string does not contain numeric
characters
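
A minimal sketch of both cases; the variable names and sample strings are illustrative.

```python
sval = "123"
ival = int(sval)             # the string "123" becomes the integer 123
print(ival + 1)              # 124

try:
    int("hello bob")         # not numeric: int() raises a ValueError
    converted = True
except ValueError as err:
    converted = False
    print("ValueError:", err)
```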


User Input

• We can instruct Python to pause and read data from the user
using the input() function
• The input() function returns a string


Converting User Input


• If we want to read a number from the user, we must convert it
from a string to a number using a type conversion function

• Later we will deal with bad input data
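
The conversion step can be sketched as below. To keep the example runnable without a keyboard, the typed text is hard-coded; in a real script that line would call input(), as the comment indicates. The prompt text is illustrative.

```python
# input() pauses the program and returns what the user typed, always as a string.
# In a real script this line would be: text = input("Enter Hours: ")
text = "35"

hours = float(text)          # convert the string to a number before doing math
print("Hours:", hours)       # Hours: 35.0
print(type(text), type(hours))
```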


Comments in Python
• Anything after a # is ignored by Python
• Why comment?
o Describe what is going to happen in a sequence of code
o Document who wrote the code or other ancillary
information
o Turn off a line of code - perhaps temporarily


Summary

• Types
• Reserved words
• Variables
• Operators
• Integer Division
• Conversion between types
• User input
• Comments (#)


Exercise

Write a program to prompt the user for hours and rate per hour to
compute gross pay. Write this program using a file named pay.py and
execute it.

An output can be:

Enter Hours: 35
Enter Rate: 2.75

Pay: 96.25

MCQ Example

What will be the output after the print statement ?

A) Hello1
B) Hello 1
C) A TypeError


MCQ Example

We are using Python 3, what is the type of a ?

A) An integer
B) A String
C) A floating point number
D) A List


Conditional Execution

Chapter 2


Conditional Steps

Output :
Smaller than 10
Finish
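
The code for this slide is not in the extracted text. The sketch below is one program that would print the output above; the value x = 5 and the second test are assumptions.

```python
x = 5                            # assumed value

if x < 10:
    print("Smaller than 10")     # runs: the condition is True
if x > 20:
    print("Bigger than 20")      # skipped: the condition is False
print("Finish")                  # always runs
```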


Comparison Operators

• Boolean expressions ask a question and produce a Yes or No result, which we use to control program flow
• Boolean expressions using comparison operators evaluate to True / False or Yes / No
• Comparison operators look at variables but do not change the variables

Python   Meaning
<        Less than
<=       Less than or equal to
==       Equal to
>=       Greater than or equal to
>        Greater than
!=       Not equal

Remember: “=” is used for assignment
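
A short sketch of the comparison operators; the value of x is illustrative.

```python
x = 5

print(x == 5)    # True
print(x != 5)    # False
print(x < 10)    # True
print(x >= 6)    # False

# A comparison only looks at x; the result can itself be stored in a variable.
is_small = x <= 5
print(is_small)  # True
print(x)         # 5: unchanged by the comparisons
```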


One way decisions


Output :
Before 5

Is 5
Is Still 5
Third 5
Afterwards 5
Before 6

Afterwards 6

Nested block with indentation (4 spaces), added automatically by the Spyder editor after a “:”
You will get an error or abnormal execution in case of bad indentation
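
The code for this slide is not in the extracted text. The sketch below would produce the output listed above, assuming x = 5; the message inside the second if is hypothetical since it is never printed.

```python
x = 5                        # assumed value

print("Before 5")
if x == 5:
    # The indented lines form the block that runs only when the condition is True.
    print("Is 5")
    print("Is Still 5")
    print("Third 5")
print("Afterwards 5")

print("Before 6")
if x == 6:
    print("Is 6")            # never printed here: x is 5
print("Afterwards 6")
```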


Indentation

Good indentation

Output :
Bigger than 2
Still bigger
Done with 2

Bad indentation

Output :
print('Still bigger')
^
IndentationError: unindent does not match any outer indentation level


Nested Decisions

Output :
More than one
Less than 100
All done

Try also with x = 101 …
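
A sketch of a nested decision that prints the output above. The starting value x = 42 is an assumption; any value strictly between 1 and 100 gives the same result.

```python
x = 42                           # assumed value

if x > 1:
    print("More than one")
    if x < 100:                  # only tested when x > 1 is True
        print("Less than 100")
print("All done")
```

With x = 101, the inner test fails, so only "More than one" and "All done" are printed.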


Two-Way Decisions with else:

Output :
Bigger
All done
Try also with x = 1 …


Multi-Way Decisions with elif and else:

Output :
Medium
All done
Try also with x = 1 and x = 11 …
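
A sketch of a multi-way decision that prints the output above. The thresholds (2 and 10), the starting value x = 5, and the "Small" and "LARGE" labels are assumptions chosen so that x = 1 and x = 11 reach the other two branches.

```python
x = 5                    # assumed value

if x < 2:
    label = "Small"
elif x < 10:             # tested only if the first condition was False
    label = "Medium"
else:                    # runs when none of the conditions above was True
    label = "LARGE"

print(label)             # Medium
print("All done")
```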


Multi-Way Puzzles

• Which will never print regardless of the value for x?


The Try/Except structure

• What happens in this code ?

Output :
Traceback (most recent call last):
  File "C:\Users\d.tchuente\Documents\code\notry.py", line 2, in <module>
    istr = int(astr)
ValueError: invalid literal for int() with base 10: 'Hello Bob'

The program stops here


The Try/Except structure

• You surround a dangerous section of code with try and except
• If the code in the try works, the except is skipped
• If the code in the try fails, it jumps to the except section

When the first conversion fails, it just drops into the except: clause
and the program continues.

When the second conversion succeeds, it just skips the except: clause
and the program continues.
Output :
First -1
Second 123
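
The code for this slide is not in the extracted text. The sketch below would produce the output above; the second input string "123" and the choice to catch ValueError specifically (rather than a bare except) are assumptions.

```python
astr = "Hello Bob"
try:
    istr = int(astr)         # fails: raises ValueError
except ValueError:
    istr = -1                # fallback value used when the conversion fails
print("First", istr)         # First -1

astr = "123"
try:
    istr = int(astr)         # succeeds: the except block is skipped
except ValueError:
    istr = -1
print("Second", istr)        # Second 123
```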

Summary

• Comparison operators
== <= >= > < !=
• One-way Decisions
• Nested Decisions
• Two-way decisions: if: and else:
• Multi-way decisions using elif
• Indentation
• try / except to compensate for errors


Exercise

Rewrite your pay program using try and except so that your program handles
non-numeric input gracefully.
Write this program using a file named pay2.py and execute it.

An output can be:


Enter Hours: 20
Enter Rate: nine
Error, please enter numeric input

Or :
Enter Hours: forty
Error, please enter numeric input


Exercise 2
Write a program to prompt the user for hours and rate per hour using input to
compute gross pay.
Pay the hourly rate for the hours up to 40 and 1.5 times the hourly rate for all hours
worked above 40 hours.

Use 45 hours and a rate of 10.50 per hour to test the program (the pay should be
498.75).

You should use input() to read a string and float() to convert the string to a number.

Use try and except so that your program handles non-numeric input gracefully.

Write this program using a file named pay3.py and execute it.
An output can be:
Enter Hours: 45
Enter Rate: 10.5
Pay: 498.75

Or :
Enter Hours: forty
Error, please enter numeric input


Functions

Chapter 3


Stored (and reused) Steps

Output :
Welcome
D2M
Another Invocation
Welcome
D2M

We call these reusable pieces of code “functions”
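
A sketch of a program that would print the output above. The function name welcome is hypothetical; the slide's own code is not in the extracted text.

```python
def welcome():
    # The stored steps: they run each time the function is called.
    print("Welcome")
    print("D2M")

welcome()                        # first invocation
print("Another Invocation")
welcome()                        # the same steps, reused
```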


Python Functions

• There are two kinds of functions in Python

o Built-in functions that are provided as part of Python: print(), input(), type(), float(), int() ...

o Functions that we define ourselves and then use

• We treat the built-in function names as “new” reserved words
(i.e., we avoid them as variable names)


Function Definition

• In Python, a function is some reusable code that takes argument(s) as input, does some computation, and then returns a result or results

• We define a function using the def reserved word

• We call/invoke the function by using the function name, parentheses, and arguments in an expression


Built-in function example

size = len("Hello world")

• "Hello world" is the argument
• The result, 11, is assigned to size


Building your own function

• We create a new function using the def keyword, followed by the function name and optional parameters in parentheses

• We indent the body of the function after “:”

• This defines the function but does not execute the body of the function


Calling the function

• Once we have defined a function, we can call (or invoke) it as many times as we like

The slide shows the definition first, then the call (invocation).

Output example :
Enter Hours: 45
Enter Rate: 10
Pay: 450.0


Argument

• An argument is a value we pass into the function as its input when we call the function
• We use arguments so we can direct the function to do different kinds of
work when we call it at different times
• We put the arguments in parentheses after the name of the function; multiple
arguments are separated by commas
• When calling, match the number and order of arguments (or use
argument=value for each argument)

Output :
Pay: 450.0
Pay: 450.0
Pay: 450.0
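
A sketch of three equivalent calls that would print the output above. The function name compute_pay and the parameter names hours and rate are hypothetical.

```python
def compute_pay(hours, rate):
    print("Pay:", hours * rate)

compute_pay(45, 10.0)                # positional: matched by order
compute_pay(hours=45, rate=10.0)     # keyword: matched by name
compute_pay(rate=10.0, hours=45)     # keyword arguments may come in any order
```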

Argument with default (optional) values

• An argument can have a default value (to use if this argument is not
provided when calling), it is an optional argument

Output :
Pay: 400.0
Pay: 498.75
Pay: 472.5
Pay: 472.5
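
The slide's code is not in the extracted text. The sketch below is one hypothetical function whose calls reproduce the four outputs above: the name compute_pay, the three parameters, their default values, and the 40-hour threshold are all assumptions.

```python
def compute_pay(hours=40, rate=10.0, overtime=1.0):
    """Pay `rate` for the first 40 hours and `rate * overtime` beyond that."""
    regular = min(hours, 40)
    extra = max(hours - 40, 0)
    pay = regular * rate + extra * rate * overtime
    print("Pay:", pay)

compute_pay()                        # all defaults used       -> Pay: 400.0
compute_pay(45, 10.5, 1.5)           # every argument provided -> Pay: 498.75
compute_pay(45, 10.5)                # overtime uses default   -> Pay: 472.5
compute_pay(hours=45, rate=10.5)     # same call, by keyword   -> Pay: 472.5
```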

Return values
• Often a function will take its arguments, do some computation, and return a
value to be used as the value of the function call in the calling expression.
The return keyword is used for this.

• The function returns a value
• The call assigns the returned value to a variable

Output example :
Enter Hours: 45
Enter Rate: 10
Pay: 450.0
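
A sketch of a function that returns its result instead of printing it. The name compute_pay is hypothetical, and the two inputs are hard-coded so the example runs without a keyboard; in a script they would come from input(), as the comment shows.

```python
def compute_pay(hours, rate):
    return hours * rate              # hand the result back to the caller

# In a script: hours = float(input("Enter Hours: ")), and likewise for rate.
hours = 45.0
rate = 10.0

pay = compute_pay(hours, rate)       # the call evaluates to the returned value
print("Pay:", pay)                   # Pay: 450.0
```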

To function or not to function…

• Organize your code into “paragraphs”: capture a complete thought and “name it”

• Don’t repeat yourself - make it work once and then reuse it

• If something gets too long or complex, break it up into logical chunks and
put those chunks in functions

• Make a library of common stuff that you do over and over - perhaps share
this with your friends...


Summary

• Functions

• Built-in Functions

• Functions definition and invocation

• Arguments

• Default (optional) arguments

• Functions with return value


Exercise

Write a Python function (named max_of_three) that finds and returns the maximum
of three numbers.

Use, for instance, 4, 6, -1 as inputs to this function


Loops and Iterations

Chapter 4


Repeated Steps
• Loops (repeated steps) have iteration variables that change each time through a
loop. Often these iteration variables go through a sequence of numbers.

Output :
5
4
3
2
1
Out of the while loop!
Last value of n = 0
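
A sketch of a while loop that would print the output above; the slide's code is not in the extracted text, so the variable name n and starting value 5 are inferred from that output.

```python
n = 5
while n > 0:                 # the test is evaluated before every pass
    print(n)
    n = n - 1                # the iteration variable must change, or the loop never ends
print("Out of the while loop!")
print("Last value of n =", n)
```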

An infinite loop
• What is wrong with this loop ?
• Which code line will never execute ?


Breaking out of a loop


• The break statement ends the current loop and jumps to the statement
immediately following the loop
• It is like a loop test that can happen anywhere in the body of the loop
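
A sketch of a loop ended by break. To keep it runnable without a keyboard, the typed lines come from a hypothetical list instead of input(); the sample lines and the word "done" as the stop signal are assumptions.

```python
# Stands in for what a user would type, one line per input() call.
typed = iter(["hello there", "finished", "done", "never read"])

echoed = []
while True:                      # would loop forever without break
    line = next(typed)           # in a script: line = input("> ")
    if line == "done":
        break                    # leave the loop immediately
    print(line)
    echoed.append(line)
print("Done!")
```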


Finishing an iteration with continue


• The continue statement ends the current iteration and jumps to the top of
the loop and starts the next iteration

The test in the example means: if the first character of line equals # (to be seen later …)
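
A sketch of continue in the same style. The typed lines come from a hypothetical list instead of input(), and the sample lines are assumptions.

```python
# Stands in for what a user would type, one line per input() call.
typed = iter(["# a comment line", "print this", "# another one", "done"])

printed = []
while True:
    line = next(typed)           # in a script: line = input("> ")
    if line[0] == "#":           # first character of line is '#'
        continue                 # skip the rest of this pass, go back to the top
    if line == "done":
        break
    print(line)
    printed.append(line)
print("Done!")
```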


Definite loop with for


• Definite loops (for loops) execute an exact number of times (they iterate over a finite set of
things…)
• Definite loops have explicit iteration variables that change each time
through the loop. These iteration variables move through the sequence or
set.

Output :
5
4
3
2
1
End !
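
A sketch of a for loop that would print the output above; the variable name i and the use of a list literal are assumptions.

```python
for i in [5, 4, 3, 2, 1]:    # i takes each value of the list in turn
    print(i)
print("End !")
```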

Definite loop use case


• What does this code do?
• What is the value of the variable largest_so_far at the end?

• We make a variable that contains the largest value we have seen so far. If the current number we are looking at is larger, it is the new largest value we have seen so far.
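
The slide's code is not in the extracted text. A sketch of the pattern described above, with an assumed list of numbers and an assumed starting value of -1 (which only works when all values are non-negative):

```python
largest_so_far = -1
print("Before", largest_so_far)
for the_num in [9, 41, 12, 3, 74, 15]:
    if the_num > largest_so_far:
        largest_so_far = the_num         # a new largest value was found
    print(largest_so_far, the_num)
print("After", largest_so_far)
```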


Definite loop use case

• What does this code do?
• What is the value of the variable sum at the end?

• To add up the values we encounter in a loop, we introduce a sum variable that starts at 0, and we add each value to the sum each time through the loop.
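
A sketch of the summing pattern with an assumed list of numbers. The variable is named total here rather than sum, so that Python's built-in sum() function is not hidden.

```python
total = 0
print("Before", total)
for value in [9, 41, 12, 3, 74, 15]:
    total = total + value        # running total
    print(total, value)
print("After", total)
```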


Constant “None”, “is” and “is not” operators

• What does this code do?
• What is the value of the variable smallest at the end?
• The None constant is of type NoneType
• None means “no value” (null)

• The is operator can be used in a logical expression
o Implies “is the same as”
o Similar to, but stronger than, ==
• is not is also a logical operator

• We have a variable that is the smallest so far. The first time through the loop,
smallest is None, so we take the first value to be the smallest.
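
The slide's code is not in the extracted text. A sketch of the pattern described above, with an assumed list of numbers:

```python
smallest = None                      # no value seen yet
print("Before", smallest)
for value in [9, 41, 12, 3, 74, 15]:
    if smallest is None:             # first pass: take the first value
        smallest = value
    elif value < smallest:
        smallest = value
    print(smallest, value)
print("After", smallest)
```

Starting from None avoids having to guess an initial value larger than every number in the list.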


Summary

• While loops (indefinite)

• Infinite loops

• Using break

• Using continue

• None constants and variables

• For loops (definite)

• Iteration variables

• Some loops use cases


Exercise
Write a program that repeatedly prompts a user for integer numbers until
the user enters 'done'. Once 'done' is entered, print out the largest and
smallest of the numbers.
If the user enters anything other than a valid number catch it with a
try/except and put out the message ‘Invalid input’ and ignore the number.
Enter 7, 2, bob, 10, 4, done and match the output below.
Output Example


Strings

Chapter 5


String Data Type

• A string is a sequence of characters

• A string literal uses quotes: 'Hello' or "Hello"

• For strings, + means “concatenate”

• When a string contains numbers, it is still a string

• We can convert numbers in a string into a number using int()
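
A minimal sketch of these points; the variable names and values are illustrative.

```python
str1 = "Hello"
str2 = "there"
bob = str1 + str2            # concatenation: no space is added automatically
print(bob)                   # Hellothere

str3 = "123"
# str3 + 1 would raise a TypeError: str3 is text, even though it contains digits
x = int(str3) + 1            # convert first, then add
print(x)                     # 124
```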


Reading and Converting


• We prefer to read data in using strings and then parse and convert the data as we need

• This gives us more control over error situations and/or bad user input

• Input numbers must be converted from strings


Looking inside Strings


• We can get at any single character in a string using an index specified
in square brackets
• The index value must be an integer and starts at zero
• The index value can be an expression that is computed
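
A short sketch of indexing; the string and variable names are illustrative.

```python
fruit = "banana"

letter = fruit[1]            # indexes start at 0, so index 1 is the second character
print(letter)                # a

x = 3
w = fruit[x - 1]             # the index can be any expression that gives an integer
print(w)                     # n
```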


A Character Too Far

• You will get a Python error (IndexError) if you attempt to index beyond the end of a
string
• So be careful when constructing index values and slices
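
A sketch of the error, caught with try/except so the script keeps running; the string is illustrative.

```python
zot = "abc"
print(zot[2])                    # c: the last valid index is len(zot) - 1

try:
    zot[5]                       # beyond the end of the string
    in_range = True
except IndexError as err:
    in_range = False
    print("IndexError:", err)
```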


Strings have length

• The built-in function len gives us the length of a string


Looping through Strings

• We can use a definite loop using a for statement


Looping through Strings

• We can loop with a while statement or with a for statement (more elegant)

• The iteration variable is completely taken care of by the for loop
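
The two styles can be compared in a sketch like this one (the example string is assumed):

```python
fruit = 'banana'

# while loop: we manage the index ourselves
letters_while = []
index = 0
while index < len(fruit):
    letters_while.append(fruit[index])
    index = index + 1

# for loop: the iteration variable is handled for us
letters_for = []
for letter in fruit:
    letters_for.append(letter)

print(letters_while == letters_for)
```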


Slicing Strings
• We can also look at any continuous section of a string using a colon operator
• The second number is one beyond the end of the slice - “up to but not including”
• If the second number is beyond the end of the string, it stops at the end


Slicing Strings

• If we leave off the first number or the last number of the slice, it is assumed to be the beginning or end of the string respectively
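
A sketch of the slicing rules, using an assumed example string:

```python
s = 'Monty Python'
a = s[0:4]     # characters 0..3, "up to but not including" 4
b = s[6:20]    # the end is past the string, so the slice stops at the end
c = s[:2]      # first number omitted: start from the beginning
d = s[8:]      # last number omitted: go to the end
e = s[:]       # both omitted: the whole string
print(a, b, c, d, e)
```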


Using « in » as a logical operator

• The in keyword can also be used to check to see if one string is “in”
another string
• The in expression is a logical expression that returns True or False and
can be used in an if statement
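
For instance, assuming the string 'banana':

```python
fruit = 'banana'
has_nan = 'nan' in fruit     # True: 'nan' occurs inside 'banana'
has_m = 'm' in fruit         # False

if 'a' in fruit:
    message = 'Found it!'
else:
    message = 'Not found'
print(has_nan, has_m, message)
```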


String Library
• Python has a number of string functions (methods) which are in the string library
• These functions are already built into every string - we invoke them by appending the function to the string variable
• These functions do not modify the original string; instead, they return a new string that has been altered
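
A small sketch showing that the original string is left untouched (the example value is assumed):

```python
greet = 'Hello Bob'
lowered = greet.lower()      # the method is appended to the string variable
print(lowered)               # the altered copy
print(greet)                 # the original is unchanged
```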


String Library
• To get the list of built-in functions that apply to a variable (that is, to the type of
the variable), use dir()
>>> stuff = 'Hello world'
>>> type(stuff)
<class 'str'>
>>> dir(stuff)
['capitalize', 'casefold', 'center', 'count', 'encode',
'endswith', 'expandtabs', 'find', 'format', 'format_map',
'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit',
'isidentifier', 'islower', 'isnumeric', 'isprintable',
'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower',
'lstrip', 'maketrans', 'partition', 'replace', 'rfind',
'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip',
'split', 'splitlines', 'startswith', 'strip', 'swapcase',
'title', 'translate', 'upper', 'zfill']

• The full list of built-in functions (methods) for strings is accessible in the Python documentation:
https://docs.python.org/3/library/stdtypes.html#string-methods

String Library

• Examples

str.capitalize()
str.center(width[, fillchar])
str.endswith(suffix[, start[, end]])
str.find(sub[, start[, end]])
str.lstrip([chars])
str.replace(old, new[, count])
str.lower()
str.rstrip([chars])
str.strip([chars])
str.upper()


String Library (Searching a String)

• We use the find() function to search for a substring within another string
• find() finds the first occurrence of the substring
• If the substring is not found, find() returns -1
• Remember that string position starts at zero
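
As an illustration, with an assumed example string:

```python
fruit = 'banana'
pos = fruit.find('na')       # position of the first occurrence
missing = fruit.find('z')    # -1 when the substring is not found
print(pos, missing)
```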


String Library (Case)

• You can make a copy of a string in lower case with lower() or upper case
with upper()
• Often when we are searching for a string using find() we first convert the
string to lower case so we can search a string regardless of case
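
A sketch of a case-insensitive search, assuming the example string below:

```python
greet = 'Hello Bob'
upper = greet.upper()
direct = greet.find('bob')             # fails: 'Bob' has a capital B
relaxed = greet.lower().find('bob')    # lower-case first, then search
print(upper, direct, relaxed)
```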


String Library (Search and Replace)

• The replace() function is like a “search and replace” operation in a word processor
• It replaces all occurrences of the search string with the replacement string
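
For example (values assumed for illustration):

```python
greet = 'Hello Bob'
renamed = greet.replace('Bob', 'Jane')
all_o = greet.replace('o', 'X')        # every occurrence is replaced
print(renamed, all_o)
```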


String Library (Stripping Whitespace)

• Sometimes we want to take a string and remove whitespace at the beginning and/or end
• lstrip() and rstrip() remove whitespace at the left or right
• strip() removes both beginning and ending whitespace
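
A brief sketch with an assumed padded string:

```python
raw = '   Hello Bob  '
left = raw.lstrip()      # whitespace removed on the left only
right = raw.rstrip()     # whitespace removed on the right only
both = raw.strip()       # whitespace removed on both sides
print(repr(left), repr(right), repr(both))
```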


String Library (Prefixes/Suffixes)

• startswith() and endswith() return a Boolean
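
For instance, assuming the line below:

```python
line = 'Please have a nice day'
starts = line.startswith('Please')    # True
starts_lower = line.startswith('p')   # False: the test is case sensitive
ends = line.endswith('day')           # True
print(starts, starts_lower, ends)
```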


String Library (Parsing and extracting)

e.g. extracting the host or domain name from an address


atpos =12 sppos=30

From [email protected] Sat Jan 5 09:14:16 2019
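
The technique can be sketched with find() and slicing. The address user@example.org is a placeholder assumption, so the positions differ from the atpos and sppos values shown above.

```python
data = 'From user@example.org Sat Jan  5 09:14:16 2019'

atpos = data.find('@')           # position of the at-sign
sppos = data.find(' ', atpos)    # first space after the at-sign
host = data[atpos + 1:sppos]     # everything between the two
print(atpos, sppos, host)
```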


String Library

• String type
• Read/Convert
• Indexing strings []
• Slicing strings [2:4]
• Looping through strings with for and while
• Concatenating strings with +
• String operations
• String library
• String comparisons
• Searching in strings
• Replacing text
• Stripping white space


Exercise

Write code using find() and string slicing to extract the number at the end of
the line below.

text = "X-DSPAM-Confidence: 0.8475"

Convert the extracted value to a floating point number and print it out.


Files

Chapter 6


File Processing
• It is time to go find some Data to mess with!

• A text file can be thought of as a sequence of lines

From [email protected] Sat Jan 5 09:14:16 2008
Return-Path: <[email protected]>
Date: Sat, 5 Jan 2008 09:12:18 -0500
To: [email protected]
From: [email protected]
Subject: [sakai] svn commit: r39772 - content/branches/

Details:
http://source.sakaiproject.org/viewsvn/?view=rev&rev=39772

Download the whole file mbox-short.txt on Campus


Open a File

• Before we can read the contents of the file, we must tell Python which
file we are going to work with and what we will be doing with the file

• This is done with the open() function

• open() returns a “file handle” - a variable used to perform operations on the file

• Similar to “File -> Open” in a Word Processor


Using open()

fhand = open('mbox.txt', 'r')

• handle = open(filename, mode)

• returns a handle we will use to manipulate the file

• filename is a string

• mode is optional and should be 'r' if we are planning to read the file
and 'w' if we are going to write to the file (by default, mode is 'r')
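
A self-contained sketch of both modes. The file name sample.txt and the temporary folder are assumptions made so that the example does not depend on mbox.txt being present.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'sample.txt')   # hypothetical file name

    out = open(path, 'w')                       # 'w': open for writing
    out.write('first line\nsecond line\n')
    out.close()

    fhand = open(path, 'r')                     # 'r': open for reading (the default)
    content = fhand.read()
    fhand.close()

print(content)
```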


When files are missing…

A FileNotFoundError is raised …
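
One way to handle this situation is sketched below; the file name is an assumption and is expected not to exist.

```python
try:
    fhand = open('no-such-file-here.txt')   # assumed to be missing
    status = 'opened'
    fhand.close()
except FileNotFoundError:
    status = 'missing'
print('File status:', status)
```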


The newline Character

• We use a special character called the “newline” to indicate when a line ends
• We represent it as \n in strings
• Newline is still one character - not two
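
A quick check (the example string is assumed):

```python
stuff = 'X\nY'
print(stuff)             # printed on two lines
length = len(stuff)      # X, newline, Y: three characters
print(length)
```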


File processing
• A text file can be thought of as a sequence of lines
• It has a newline at the end of each line!

From [email protected] Sat Jan 5 09:14:16 2008 \n
Return-Path: <[email protected]> \n
Date: Sat, 5 Jan 2008 09:12:18 -0500 \n
To: [email protected] \n
From: [email protected] \n
Subject: [sakai] svn commit: r39772 - content/branches/\n
\n
Details:\n
http://source.sakaiproject.org/viewsvn/?view=rev&rev=39772 \n

• If we read the line "Details:" for example, the corresponding string length will
be 9 (not 8), because the newline counts as one character


File Handle as a Sequence

• A file handle open for read can be treated as a sequence of strings where each line in the file is a string in the sequence
• We can use the for statement to iterate through a sequence
• Remember - a sequence is an ordered set


Counting lines in a file

• Open a file read-only
• Use a for loop to read each line
• Count the lines and print out the number of lines

Output :
Line Count: 132045
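
The steps above can be sketched as follows. Here io.StringIO is an assumed stand-in for open('mbox.txt'), so the count differs from the output shown above.

```python
import io

# Stand-in for: fhand = open('mbox.txt')
fhand = io.StringIO('line one\nline two\nline three\n')

count = 0
for line in fhand:           # each iteration reads one line
    count = count + 1
print('Line Count:', count)
```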


Searching through a file

• For example, we can put an if statement in our for loop to only print
lines that meet some criteria
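
A sketch of such a filter. The criterion (lines starting with 'From:') and the sample lines are assumptions; io.StringIO stands in for an opened file.

```python
import io

fhand = io.StringIO(
    'From: alice@example.org\n'
    'Subject: hello\n'
    'From: bob@example.org\n'
)

matches = []
for line in fhand:
    line = line.rstrip()             # drop the trailing newline
    if line.startswith('From:'):     # the selection criterion
        matches.append(line)
        print(line)
```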


Searching through a file


• For example, we can look for a string anywhere in a line as our
selection criteria (lines containing the keyword nakamura)
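
This could look like the sketch below; the sample lines are made up and io.StringIO stands in for an opened file.

```python
import io

fhand = io.StringIO(
    'From: nakamura@example.org\n'
    'Subject: meeting\n'
    'Cc: k.nakamura@example.org\n'
)

matches = []
for line in fhand:
    line = line.rstrip()
    if 'nakamura' not in line:       # skip lines without the keyword
        continue
    matches.append(line)
print(matches)
```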


Summary

• Opening a file - file handle
• File structure - newline character
• Reading a file line by line with a for loop
• Searching for lines
• Reading file names
• Dealing with bad files


Exercise

Write a program that prompts for a file name, then opens that file and reads
through the file, looking for lines of the form:
X-DSPAM-Confidence: 0.8475
Look in the file mbox-short.txt for instance.
These lines probably indicate spam.
Count these lines, extract the floating point value from each of them,
compute the average of those values (the average spam confidence), and
print it.

For testing, use the mbox-short.txt file.
The average should be: 0.7507185185185187


Lists

Chapter 7


What is not a collection ?

• Most of our variables have one value in them
• When we put a new value in the variable, the old value is overwritten
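
For example (values assumed):

```python
x = 2
x = 4        # the old value 2 is overwritten
print(x)
```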


A list is a kind of collection

• A collection allows us to put many values in a single “variable”
• A collection is nice because we can carry many values around in one convenient package.


List Constants

• List constants are surrounded by square brackets and the elements in the
list are separated by commas
• A list element can be any Python object – even another list
• A list can be empty
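
A few list constants as a sketch; the values are illustrative assumptions.

```python
numbers = [1, 24, 76]
mixed = ['red', 24, 98.6]        # elements can have different types
nested = [1, [5, 6], 7]          # an element can be another list
empty = []
print(numbers, mixed, nested, empty)
```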


We already use Lists!

Output :
5
4
3
2
1
End !
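
Output like the above could come from a definite loop over a list constant. The original code is not available, so this is a reconstruction under that assumption.

```python
countdown = [5, 4, 3, 2, 1]      # the for loop iterates over this list
for i in countdown:
    print(i)
print('End !')
```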


Looking inside Lists

• Just like strings, we can get at any single element in a list using an index
specified in square brackets

• IndexError in case of index out of range
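
A sketch with an assumed list of names:

```python
friends = ['Joseph', 'Glenn', 'Sally']
second = friends[1]              # indexes start at zero

try:
    friends[3]                   # only indexes 0..2 exist
    out_of_range = False
except IndexError:
    out_of_range = True
print(second, out_of_range)
```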


Lists are Mutable

• Recall: Strings are “immutable” - we cannot change the contents of a string - we must make a new string to make any change
• Lists are “mutable” - we can change an element of a list using the index operator
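
The difference can be seen in a sketch like this one (values assumed):

```python
fruit = 'Banana'
try:
    fruit[0] = 'b'               # strings do not support item assignment
    string_changed = True
except TypeError:
    string_changed = False
lowered = fruit.lower()          # a new string has to be made instead

lotto = [2, 14, 26, 41, 63]
lotto[2] = 28                    # lists can be changed in place
print(string_changed, lowered, lotto)
```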


How Long is a List ?

• The len() function takes a list as a parameter and returns the number of
elements in the list
• Actually len() tells us the number of elements of any set or sequence
(such as a string...)
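
For example (values assumed):

```python
greet = 'Hello Bob'
x = [1, 2, 'joe', 99]
print(len(greet))    # number of characters
print(len(x))        # number of elements
```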


Using the « range » function

• The range function returns a sequence of numbers that range from zero to one
less than the parameter
• We can construct an index loop using for and an integer iterator
• We can use the function list to obtain the list from the range
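
A sketch of an index loop; the list of names and the greeting text are assumptions.

```python
friends = ['Joseph', 'Glenn', 'Sally']

numbers = list(range(4))               # turn the range into a list
greetings = []
for i in range(len(friends)):          # i takes the values 0, 1, 2
    greetings.append('Happy New Year: ' + friends[i])
print(numbers, greetings)
```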


Concatenating a list using « + »

• We can create a new list by adding two existing lists together
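
For instance (values assumed):

```python
a = [1, 2, 3]
b = [4, 5, 6]
c = a + b        # a new list; a and b are unchanged
print(c, a)
```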


Building a list from scratch

• We can create an empty list and then add elements using the append()
method
• The list stays in order and new elements are added at the end of the list
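
A minimal sketch (values assumed):

```python
stuff = list()           # an empty list
stuff.append('book')
stuff.append(99)
stuff.append('cookie')   # new elements go at the end
print(stuff)
```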


Lists can be Sliced using « : »

• Remember: Just like in strings, the second number is “up to but not
including”
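
For example, with an assumed list:

```python
t = [9, 41, 12, 3, 74, 15]
a = t[1:3]       # elements at positions 1 and 2
b = t[:4]        # from the beginning up to but not including 4
c = t[3:]        # from position 3 to the end
print(a, b, c)
```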


List Methods

>>> x = list()
>>> type(x)
<class 'list'>
>>> dir(x)
['append', 'clear', 'copy', 'count', 'extend', 'index', 'insert',
'pop', 'remove', 'reverse', 'sort']
>>>

https://docs.python.org/3/tutorial/datastructures.html


Is something in a List ?

• Python provides two operators (in and not in) that let you check if an
item is in a list
• These are logical operators that return True or False
• They do not modify the list
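
A sketch with an assumed list:

```python
some = [1, 9, 21, 10, 16]
has_9 = 9 in some            # True
has_15 = 15 in some          # False
no_20 = 20 not in some       # True
print(has_9, has_15, no_20, some)
```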


Lists are in order

• A list can hold many items and keeps those items in the order until we do
something to change the order
• A list can be sorted (i.e., change its order)
• The sort method means “sort yourself” and the list is modified
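
For instance (names assumed):

```python
friends = ['Joseph', 'Glenn', 'Sally']
friends.sort()           # sorts the list in place and returns None
print(friends)
```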


Built-in functions and Lists

• There are a number of functions built into Python that take lists as
parameters (e.g. len, min, max, sum)
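
A short sketch using these functions on an assumed list of numbers:

```python
nums = [3, 41, 12, 9, 74, 15]
count = len(nums)
largest = max(nums)
smallest = min(nums)
total = sum(nums)
average = total / count
print(count, largest, smallest, total, average)
```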


The « split » method on a String returns a List

• split breaks a string into parts and produces a list of strings. We think of
these as words. We can access a particular word or loop through all the
words.
• By default, split uses whitespace as the separator
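
For example (the sentence is assumed):

```python
abc = 'With three words'
stuff = abc.split()          # split on whitespace
first = stuff[0]
for word in stuff:
    print(word)
```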


The « split » method on a String returns a List

• When you do not specify a delimiter, multiple spaces are treated like one
delimiter
• You can specify what delimiter character to use in the splitting
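
Both behaviours are sketched below with assumed strings:

```python
line = 'A lot               of spaces'
words = line.split()                 # runs of spaces count as one delimiter

record = 'first;second;third'
unsplit = record.split()             # no whitespace, so a single item
fields = record.split(';')           # explicit delimiter
print(words, unsplit, fields)
```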


The Double Split Pattern

• Sometimes we split a line one way, and then grab one of the pieces of
the line and split that piece again
• e.g. extract host from the line "From [email protected] Sat Jan
5 09:14:16 2019"
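
The pattern can be sketched as follows; user@example.org is a placeholder address, not the one from the original line.

```python
line = 'From user@example.org Sat Jan  5 09:14:16 2019'

words = line.split()         # first split: on whitespace
email = words[1]             # the second word is the address
pieces = email.split('@')    # second split: on the at-sign
host = pieces[1]
print(host)
```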


Summary

• Concept of a collection
• Lists and definite loops
• Indexing and lookup
• List mutability
• Functions: len, min, max, sum
• Slicing lists
• List methods: append, remove
• Sorting lists
• Splitting strings into lists of words
• Using split to parse strings


Exercise

Open the file romeo.txt and read it line by line. For each line, split the line
into a list of words using the split() function. The program should build a list of
words. For each word on each line check to see if the word is already in the
list and if not append it to the list. When the program completes, sort and
print the resulting words in alphabetical order.

output:
['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'kill', 'light', 'moon',
'pale', 'sick', 'soft', 'sun', 'the', 'through', 'what', 'window', 'with', 'yonder']


Dictionaries

Chapter 8


A story of two collections


• List
o A linear collection of values that stay in order
o Lists index their entries based on the position in the list

• Dictionary
o A “bag” of values, each with its own label (key)
o Entries are indexed with a key (could be of any hashable data type)
o Values could also be of any data type
o No positional order: entries are looked up by key, not by position (since Python 3.7 a dictionary does remember insertion order)

key Value
0 Joseph
2 Sally
1 Glenn

• Dictionaries are Python’s most powerful collection
• Dictionaries allow us to do fast database-like operations in Python


Comparing Lists and Dictionaries

• Dictionaries are like lists except that they use keys instead of index
numbers to look up values

dict() to construct an empty dictionary
dico[0]='Joseph' to add the key 0 with the value Joseph
Curly braces with a set of key:value pairs separated by commas
Get a value from a key
Add a new key test with the value Blabla
Get a value from a key
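
The annotated steps might correspond to code like the following. This is a reconstruction from the captions, so the exact values are assumptions.

```python
dico = dict()                                   # construct an empty dictionary
dico[0] = 'Joseph'                              # add the key 0 with the value Joseph

dico = {0: 'Joseph', 2: 'Sally', 1: 'Glenn'}    # curly braces with key:value pairs
name = dico[2]                                  # get a value from a key

dico['test'] = 'Blabla'                         # add a new key test
value = dico['test']                            # get a value from a key
print(name, value, dico)
```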


Dictionary Literals (Constants)

• Dictionary literals use curly braces and contain a list of key:value pairs
• You can also make an empty dictionary using empty curly braces
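
For example (keys and values assumed):

```python
marks = {'Jean': 15, 'Claude': 12, 'Marie': 17}   # key:value pairs
empty = {}                                        # an empty dictionary
print(marks, empty)
```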


Dictionary Tracebacks

• It is an error (a KeyError) to reference a key which is not in the dictionary
• We can use the in operator to see if a key is in the dictionary

4 is not in the dictionary keys
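
A sketch of both points, assuming the dictionary below:

```python
dico = {0: 'Joseph', 2: 'Sally', 1: 'Glenn'}

try:
    dico[4]                  # 4 is not a key
    error = None
except KeyError:
    error = 'KeyError'

present = 4 in dico          # check first, without raising an error
print(error, present)
```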


Modifying a Value for a Key

• We can modify the value for a key in a dictionary by assigning a new value
for this key

Change the mark of Jean to 18
Add one to the mark of Claude
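
These two changes could be written as below; the starting marks are assumptions.

```python
marks = {'Jean': 15, 'Claude': 12}

marks['Jean'] = 18                          # change the mark of Jean to 18
marks['Claude'] = marks['Claude'] + 1       # add one to the mark of Claude
print(marks)
```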


Counting with Dictionaries

• One common use of dictionaries is counting how often we “see” something
• e.g. counting name occurrences in a list:

If the name is not in the dictionary, we add a new key with a count of 1
If the name is already in the dictionary, we just add one to its count

Output : {'Jean': 2, 'Pierre': 1, 'Paul': 1, 'Jacques': 1}
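
A sketch of the counting pattern. The input list is an assumption chosen to be consistent with the output shown above.

```python
counts = dict()
names = ['Jean', 'Pierre', 'Paul', 'Jean', 'Jacques']   # assumed input

for name in names:
    if name not in counts:
        counts[name] = 1                    # first time we see this name
    else:
        counts[name] = counts[name] + 1     # seen before: add one
print(counts)
```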


The « get » method for Dictionaries

• The pattern of checking to see if a key is already in a dictionary and assuming a default value if the key is not there is so common that there is a method called get() that does this for us

counts.get('Bob', 0): if the key Bob doesn't exist in the dictionary, this returns 0 (no traceback!)
The key Jean exists in the dictionary, thus get() returns the value for the key Jean
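
For illustration, assuming the counts below:

```python
counts = {'Jean': 2, 'Pierre': 1}

bob = counts.get('Bob', 0)       # key missing: the default 0 is returned
jean = counts.get('Jean', 0)     # key present: its value is returned
print(bob, jean)
```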


Simplify counting with get()

o If the name is not already in the dictionary, it is added to the dictionary with the value (count) of 1 (0+1).
o If the name is already in the dictionary, its count is merely incremented by 1
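
The simplified loop can be sketched like this (the list of names is assumed):

```python
counts = dict()
names = ['Jean', 'Pierre', 'Paul', 'Jean', 'Jacques']

for name in names:
    # get() returns 0 for a new name, or the current count otherwise
    counts[name] = counts.get(name, 0) + 1
print(counts)
```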

Counting examples (tweets)


This could be a preliminary step for tweet analysis (topics, sentiment, etc.).
A more complete analysis could also include steps such as stop-word removal,
grouping similar words (synonyms), topic modelling, …

Tweet in a variable (string)
Removing punctuation
Split the string into a list of words
Counting words
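
These steps might look like the sketch below. The tweet text is made up, and using str.translate with string.punctuation is one possible way to strip punctuation, not necessarily the one used in the original slide.

```python
import string

tweet = 'Python is fun, and learning Python is useful!'   # hypothetical tweet

# Remove punctuation characters, then normalise the case
table = str.maketrans('', '', string.punctuation)
cleaned = tweet.translate(table).lower()

words = cleaned.split()              # split the string into a list of words

counts = {}
for word in words:                   # count the words
    counts[word] = counts.get(word, 0) + 1
print(counts)
```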


Definite loops in Dictionaries

• Even though dictionaries are not indexed by position, we can write a for
loop that goes through all the entries in a dictionary - actually it
goes through all of the keys in the dictionary and looks up the values
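
A sketch with an assumed dictionary:

```python
counts = {'chuck': 1, 'fred': 42, 'jan': 100}

seen = []
for key in counts:                   # the loop variable takes each key
    seen.append((key, counts[key]))  # look up the value for that key
    print(key, counts[key])
```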


Retrieving Lists of Keys and Values

• You can get a list of keys, values, or items (both) from a dictionary by passing keys(), values(), or items() to list()

items() gives a list of (key, value) tuples. What is a tuple? Coming soon …
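
For example, with an assumed dictionary:

```python
jjj = {'chuck': 1, 'fred': 42, 'jan': 100}

keys = list(jjj.keys())
values = list(jjj.values())
items = list(jjj.items())        # a list of (key, value) tuples
print(keys, values, items)
```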


Bonus: Two Iteration Variables

• We loop through the key-value pairs in a dictionary using *two* iteration variables
• Each iteration, the first variable is the key and the second variable is
the corresponding value for the key
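
The sketch below compares the two-variable loop with the equivalent key-only loop; the dictionary is an assumption.

```python
jjj = {'chuck': 1, 'fred': 42, 'jan': 100}

pairs_two_vars = []
for key, value in jjj.items():       # two iteration variables
    pairs_two_vars.append((key, value))

pairs_lookup = []
for key in jjj:                      # same result with a key lookup
    pairs_lookup.append((key, jjj[key]))

print(pairs_two_vars == pairs_lookup)
```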


Summary

• Lists versus Dictionaries
• Dictionary constants
• Counting with Dictionaries
• Using get() method
• Retrieving list of keys and values
• Writing dictionary loops
• Sneak peek: tuples


Exercise
Write a program to read through mbox-short.txt and figure out who has sent
the greatest number of mail messages.
The program looks for lines starting with 'From ' and takes the second word of those
lines as the person who sent the mail.
The program creates a Python dictionary that maps the sender's mail address to
a count of the number of times they appear in the file. After the dictionary is
produced, the program reads through the dictionary using a maximum loop to
find the most prolific committer.

The output should be: [email protected] with 5 mails sent


Tuples

Chapter 9


Tuples are like Lists

• Tuples are another kind of sequence that functions much like a list -
they have elements which are indexed starting at 0

Note the use of parentheses rather than square brackets (for lists)
Indexing like lists
Functions like lists (e.g. max function)
Iteration like lists
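
A sketch of these list-like behaviours, with assumed values:

```python
x = ('Glenn', 'Sally', 'Joseph')     # parentheses, not square brackets
third = x[2]                         # indexing like lists

y = (1, 9, 2)
biggest = max(y)                     # functions like lists

collected = []
for item in y:                       # iteration like lists
    collected.append(item)
print(third, biggest, collected)
```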


But Tuples are … « immutable »

• Unlike a list, once you create a tuple, you cannot alter its contents -
similar to a string

You can alter a list after its creation (lists are mutable)
You cannot alter a string after its creation (strings are immutable)
You cannot alter a tuple after its creation (tuples are immutable)
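
The three cases can be checked with a sketch like this (values assumed):

```python
x = [9, 8, 7]
x[2] = 6                             # allowed: lists are mutable

s = 'ABC'
t = (5, 4, 3)
failures = []
for name, value in (('string', s), ('tuple', t)):
    try:
        value[2] = 0                 # item assignment is not supported
    except TypeError:
        failures.append(name)
print(x, failures)
```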


Things not to do with Tuples
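
The original code for this slide is missing. As an assumed illustration, list methods that modify a sequence in place do not exist on tuples:

```python
x = (3, 2, 1)

missing = []
for method in ('sort', 'append', 'reverse'):
    if not hasattr(x, method):       # tuples have none of these methods
        missing.append(method)

try:
    x.append(5)                      # raises AttributeError
    appended = True
except AttributeError:
    appended = False
print(missing, appended)
```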


A Tale of Two Sequences

• Only two methods (count and index) for Tuples

>>> l = list()
>>> dir(l)
['append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']

>>> t = tuple()
>>> dir(t)
['count', 'index']


Tuples are more efficient

• Since Python does not have to build tuple structures to be modifiable, they are simpler and more efficient in terms of memory use and performance than lists
• So in our program when we are making “temporary variables” we prefer tuples over lists


Tuples and assignments

• We can also put a tuple on the left-hand side of an assignment statement
• We can even omit the parentheses
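
For example (values assumed):

```python
(x, y) = (4, 'fred')     # tuple on the left-hand side
a, b = 99, 98            # parentheses omitted
a, b = b, a              # a common use: swapping two variables
print(x, y, a, b)
```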


Tuples and Dictionaries

• The items() method in dictionaries returns a sequence of (key, value) tuples (a view object that list() turns into a list)
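
A sketch with an assumed dictionary:

```python
d = {'csev': 2, 'cwen': 4}

tups = list(d.items())           # a list of (key, value) tuples
for (k, v) in d.items():         # each tuple is unpacked into k and v
    print(k, v)
```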


Sorting Lists of Tuples

• We can take advantage of the ability to sort a list of tuples to get a sorted version of a dictionary
• First we sort the dictionary by the key using the items() method and sorted() function


Using « sorted() »

• We can do this even more directly using the built-in function sorted
that takes a sequence as a parameter and returns a sorted
list
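
Sorting by key can be sketched as follows (the dictionary is assumed):

```python
d = {'b': 1, 'a': 10, 'c': 22}

by_key = sorted(d.items())           # tuples compare on the key first

lines = []
for k, v in sorted(d.items()):       # loop directly over the sorted pairs
    lines.append(k + ' ' + str(v))
print(by_key, lines)
```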


Sort by Values instead of keys

• If we could construct a list of tuples of the form (value, key) we could sort by value
• We do this with a for loop that creates a list of tuples
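
A sketch of this approach, assuming the dictionary below and a descending sort:

```python
c = {'a': 10, 'b': 1, 'c': 22}

tmp = list()
for k, v in c.items():
    tmp.append((v, k))               # value first, so sorting uses the value

tmp = sorted(tmp, reverse=True)      # largest value first
print(tmp)
```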


Even shorter version

• List comprehension creates a dynamic list. In this case, we make a list of reversed tuples and then sort it.

https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions
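
The shorter version might look like this (the dictionary is assumed):

```python
c = {'a': 10, 'b': 1, 'c': 22}

# Build the list of (value, key) tuples and sort it in one expression
by_value = sorted([(v, k) for k, v in c.items()])
print(by_value)
```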


Exercise
Write a Python program that reads the mbox.txt file, parses it, and prints
the number of mails sent for each hour of the day (in descending order
of the number of mails sent).

What is the most used hour of the day for sending mail?

Note: to extract the hour of the day, consider the lines starting with “From ” like
“From [email protected] Fri Jan 4 16:10:39 2008”, and for instance extract 16 as the
hour of the day in this case.

Output: the most used hour of the day will be 10 am with 198 mails sent
