Part 1 Fundamentals Python for Data Science
Dieudonné TCHUENTE
PhD. Senior IT/Data Consultant & Big Data Architect
[email protected]
Ass Professor in Computer Science and Big Data
www.tbs-education.fr
Video Link: https://www.youtube.com/watch?v=7Hll55GCyvI
[Figure: overview of the Python ecosystem, including Scipy (optimization and scientific computing) and Deep Learning]
Fundamentals
In this course…
Syntax and Data structures: files, lists,
strings, dictionaries, tuples, etc.
Course Outlines
Course Objectives
- Nowadays, Python is one of the most widely used languages for data analysis in industry
- Data preparation is often estimated to account for about 80% of the work of data scientists…
Evaluation
Principle:
• To learn a new programming language, you need to be curious (search the
documentation, forums, …), collaborate, and practice, practice, practice…
Introduction
                 R                            Python
Objective        Statistics, data analysis    General purpose, data analysis, deployment and production
Primary users    Scholars and R&D             Programmers and developers
Learning curve   Difficult at the beginning   Linear and smooth
Popularity       4.23% in 2018                21.69% in 2018
Python
Python 2 vs Python 3
• After download, execute the installer file and follow the steps to install it on your
computer (follow the provided installation guide of the course)
• For Mac OS: python3 --version (for the version) and python3 (to
launch the interpreter)
Reference document, 13/01/2022
• Interactive Python is good for experiments and programs that are 3-4 lines long
• Most programs are much longer, so we type them into a file and tell Python to run
the commands in the file
• In a sense, we are “giving Python a script”
• As a convention, we add “.py” as the suffix on the end of these files to indicate
they contain Python
Interactive vs Script
• Interactive: You type directly to Python one line at a time and it responds
• Script: You enter a sequence of statements (lines) into a file using a text
editor and tell Python to execute the statements in the file
Python in an IDE
• Many IDEs (Integrated Development Environments) are available for editing
Python code files: PyCharm, Spyder, PyDev, Atom …
• In this course we use Spyder (you have it by default after installing
Anaconda)
Variables, Expressions, and Statements
Chapter 1
Constants
Variables
• A variable is a named place in the memory where a programmer
can store data and later retrieve the data using the variable
“name”
• Programmers get to choose the names of the variables
• You can change the contents of a variable in a later statement
Y = 14
Y 14
X = 100
Reserved Words
Sentences or lines
Assignment Statement
X = 3.9 * X * (1 - X)
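As a minimal sketch of how this assignment statement is evaluated: the right-hand side is computed first using the old value, then the result is stored back in the variable. The starting value 0.6 is an assumption chosen only for illustration.

```python
# Assumed starting value, for illustration only
x = 0.6

# The right-hand side uses the OLD value of x (0.6);
# the result then replaces the content of x
x = 3.9 * x * (1 - x)

print(x)  # a value very close to 0.936
```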
Numeric Expressions
• Because of the lack of mathematical symbols on computer
keyboards - we use “computer-speak” to express the classic math
operations
• Asterisk is multiplication
• Exponentiation (raise to a power) looks different than in math
Operator   Operation
+          Addition
-          Subtraction
*          Multiplication
/          Division
**         Power
%          Remainder
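A short sketch of these operators in action. The variable names and values are arbitrary and chosen only for illustration.

```python
# Two illustrative integers
a = 7
b = 2

total = a + b        # addition
difference = a - b   # subtraction
product = a * b      # multiplication
quotient = a / b     # division (a float in Python 3)
power = a ** b       # exponentiation: 7 to the power 2
remainder = a % b    # remainder of 7 divided by 2

print(total, difference, product, quotient, power, remainder)
# prints: 9 5 14 3.5 49 1
```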
Type matters
• Python knows what “type” everything is
Types conversions
• When you put an integer and floating point in an expression, the
integer is implicitly converted to a float
• You can control this with the built-in functions int() and float()
Integer Division
• In Python 3, dividing two integers with / produces a floating point result
(use // for floor division, which returns an integer)
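The conversion and division rules above can be checked with a few lines. This is only a sketch; the values are arbitrary.

```python
# Mixing an integer and a float: the integer is converted implicitly
mixed = 1 + 2.5
print(mixed, type(mixed))      # 3.5 <class 'float'>

# Explicit conversions with the built-in functions
as_float = float(99)           # 99.0
as_int = int(3.9)              # 3 (the fractional part is dropped)
print(as_float, as_int)

# Division of two integers with / gives a float in Python 3
true_division = 10 / 2         # 5.0
# Floor division with // keeps an integer result
floor_division = 10 // 3       # 3
print(true_division, floor_division)
```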
String Conversion
• You can also use int() and float() to convert strings to numbers
• You will get an error (a ValueError) if the string does not represent a
valid number
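A small sketch of string conversion, including the failing case. The try/except construct used to capture the error message is covered in Chapter 2; it is used here only so that the example runs to the end.

```python
text = '123'
number = int(text) + 1      # the string '123' becomes the integer 123
print(number)               # 124

price = float('2.75')       # the string '2.75' becomes the float 2.75
print(price)                # 2.75

# A string without a valid number cannot be converted
try:
    int('Hello Bob')
except ValueError as err:
    message = str(err)
print(message)
# invalid literal for int() with base 10: 'Hello Bob'
```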
User Input
• We can instruct Python to pause and read data from the user
using the input() function
• The input() function returns a string
Comments in Python
• Anything after a # is ignored by Python
• Why comment?
o Describe what is going to happen in a sequence of code
o Document who wrote the code or other ancillary
information
o Turn off a line of code - perhaps temporarily
Summary
• Types
• Reserved words
• Variables
• Operators
• Integer Division
• Conversion between types
• User input
• Comments (#)
Exercise
Write a program to prompt the user for hours and rate per hour to
compute gross pay. Write this program using a file named pay.py and
execute it.
Enter Hours: 35
Enter Rate: 2.75
Pay: 96.25
MCQ Example
A) Hello1
B) Hello 1
C) A TypeError
MCQ Example
A) An integer
B) A String
C) A floating point number
D) A List
Conditional Execution
Chapter 2
Conditional Steps
Output :
Smaller than 10
Finish
Comparison Operators
<    Less than
<=   Less than or equal to
==   Equal to
>=   Greater than or equal to
>    Greater than
!=   Not equal
Is 5
Is Still 5
Third 5
Afterwards 5
Before 6
Afterwards 6
Indentation
Output (consistent indentation):
Bigger than 2
Still bigger
Done with 2
Output (inconsistent indentation):
    print('Still bigger')
    ^
IndentationError: unindent does not match any outer indentation level
Nested Decisions
Output :
More than one
Less than 100
All done
Output :
Bigger
All done
Try also with x = 1 …
Output :
Medium
All done
Try also with x = 1 and x = 11 …
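A multi-way decision consistent with the output above might look like the sketch below. The thresholds (2 and 10), the labels 'Small' and 'Large', and the variable name label are assumptions; only 'Medium' and 'All done' appear in the slide output.

```python
x = 5

# Exactly one of the three branches is executed
if x < 2:
    label = 'Small'
elif x < 10:
    label = 'Medium'
else:
    label = 'Large'

print(label)        # Medium
print('All done')
```

With x = 1 the first branch runs, and with x = 11 the else branch runs.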
MultiWay Puzzles
Output :
Traceback (most recent call last):
  File "C:\Users\d.tchuente\Documents\code\notry.py", line 2, in <module>
    istr = int(astr)
ValueError: invalid literal for int() with base 10: 'Hello Bob'
Output :
Summary
• Comparison operators
== <= >= > < !=
• One-way Decisions
• Nested Decisions
• Two-way decisions: if: and else:
• Multi-way decisions using elif
• Indentation
• try / except to compensate for errors
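The try / except pattern listed above can be sketched as follows. The variable names astr and istr come from the traceback shown earlier; using -1 as a fallback value is an assumption made for illustration.

```python
astr = 'Hello Bob'
try:
    istr = int(astr)     # fails: the string is not a number
except ValueError:
    istr = -1            # fallback value used when the conversion fails
first = istr
print('First', first)    # First -1

astr = '123'
try:
    istr = int(astr)     # succeeds: the except block is skipped
except ValueError:
    istr = -1
second = istr
print('Second', second)  # Second 123
```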
Exercise
Rewrite your pay program using try and except so that your program handles
non-numeric input gracefully.
Write this program using a file named pay2.py and execute it.
Or :
Enter Hours: forty
Error, please enter numeric input
Exercise 2
Write a program to prompt the user for hours and rate per hour using input to
compute gross pay.
Pay the hourly rate for the hours up to 40 and 1.5 times the hourly rate for all hours
worked above 40 hours.
Use 45 hours and a rate of 10.50 per hour to test the program (the pay should be
498.75).
You should use input() to read a string and float() to convert the string to a number.
Use try and except so that your program handles non-numeric input gracefully.
Write this program using a file named pay3.py and execute it.
An output can be:
Enter Hours: 45
Enter Rate: 10.5
Pay: 498.75
Or :
Enter Hours: forty
Error, please enter numeric input
MCQ Example
Functions
Chapter 3
Output :
Welcome
D2M
Another Invocation
Welcome
D2M
Python Functions
Function Definition
• This defines the function but does not execute the body of the function
Definition
Call (Invocation)
Argument
Output :
Pay: 450.0
Pay: 450.0
Pay: 450.0
• A parameter can have a default value, which is used when the corresponding
argument is not provided in the call; such an argument is optional
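A minimal sketch of a default value. The function name print_pay and the default rate of 10.0 are hypothetical; they are not taken from the slides.

```python
def print_pay(hours, rate=10.0):
    # rate is optional: 10.0 is used when the caller does not provide it
    pay = hours * rate
    print('Pay:', pay)

print_pay(45)             # Pay: 450.0  (default rate)
print_pay(45, 10.5)       # Pay: 472.5  (rate given by position)
print_pay(45, rate=10.5)  # Pay: 472.5  (rate given by name)
```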
Output :
Pay: 400.0
Pay: 498.75
Pay: 472.5
Pay: 472.5
Return values
• Often a function will take its arguments, do some computation, and return a
value to be used as the value of the function call in the calling expression.
The return keyword is used for this.
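A sketch of a function that returns its result instead of printing it. The name compute_pay is hypothetical, and the input values are hard-coded here rather than read with input().

```python
def compute_pay(hours, rate):
    # The returned value becomes the value of the call expression
    return hours * rate

pay = compute_pay(45.0, 10.0)
print('Pay:', pay)   # Pay: 450.0
```

Because the function returns a value, the caller is free to print it, store it, or use it in a larger expression.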
Output example :
Enter Hours: 45
Enter Rate: 10
Pay: 450.0
• If something gets too long or complex, break it up into logical chunks and
put those chunks in functions
• Make a library of common stuff that you do over and over - perhaps share
this with your friends...
Summary
• Functions
• Built-in Functions
• Arguments
Exercise
Write a Python function (named max_of_three) that finds and returns the
maximum of three numbers.
MCQ Example
Chapter 4
Repeated Steps
• Loops (repeated steps) have iteration variables that change each time through a
loop. Often these iteration variables go through a sequence of numbers.
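A while loop with an iteration variable can be sketched as below. This is a reconstruction under the assumption that the loop starts at n = 5 and counts down; the original slide code is not available.

```python
n = 5                      # iteration variable
while n > 0:               # the condition is checked before each pass
    print(n)
    n = n - 1              # the iteration variable changes on every pass
print('Out of the while loop!')
print('Last value of n =', n)
```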
Output :
5
4
3
2
1
Out of the while loop!
Last value of n = 0
An infinite loop
• What is wrong with this loop ?
• Which code line will never execute ?
Output example :
this means if the first character of line equals # (to be seen later …)
Output example :
Output :
5
4
3
2
1
End !
Output :
Output :
Output :
• We have a variable that is the smallest so far. The first time through the loop
smallest is None, so we take the first value to be the smallest.
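The "smallest so far" idea can be sketched as follows. The list of values is an arbitrary illustration.

```python
smallest = None                       # nothing seen yet
for value in [9, 41, 12, 3, 74, 15]:
    if smallest is None:
        smallest = value              # first pass: take the first value
    elif value < smallest:
        smallest = value              # a smaller value was found
print('Smallest:', smallest)          # Smallest: 3
```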
Output :
Summary
• Infinite loops
• Using break
• Using continue
• Iteration variables
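A sketch of break and continue. To keep the example runnable without user input, the lines are taken from a small list (lists are covered in Chapter 7); the contents of that list are made up for illustration.

```python
lines = ['# a comment', 'print this', 'done', 'never reached']

kept = 0
for line in lines:
    if line[0] == '#':     # skip lines whose first character is #
        continue           # go directly to the next iteration
    if line == 'done':
        break              # leave the loop immediately
    print(line)
    kept = kept + 1
print('Done!')
```

Only 'print this' is printed: the comment line is skipped by continue, and the loop stops at 'done'.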
Exercise
Write a program that repeatedly prompts a user for integer numbers until
the user enters 'done'. Once 'done' is entered, print out the largest and
smallest of the numbers.
If the user enters anything other than a valid number catch it with a
try/except and put out the message ‘Invalid input’ and ignore the number.
Enter 7, 2, bob, 10, 4, done and match the output below.
Output Example
MCQ Example
Strings
Chapter 5
• You will get a Python error (an IndexError) if you attempt to index beyond
the end of a string
• So be careful when constructing index values and slices
Slicing Strings
• We can also look at any continuous section
of a string using a colon operator
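A few slices of an example string. The string 'Monty Python' is an arbitrary choice.

```python
s = 'Monty Python'

print(s[0:4])    # Mont   (positions 0 to 3; position 4 is not included)
print(s[6:7])    # P
print(s[6:20])   # Python (a slice may run past the end without error)
print(s[:2])     # Mo     (from the beginning)
print(s[8:])     # thon   (to the end)
```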
Slicing Strings
• The in keyword can also be used to check to see if one string is “in”
another string
• The in expression is a logical expression that returns True or False and
can be used in an if statement
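A short sketch of the in keyword applied to strings. The example word is arbitrary.

```python
fruit = 'banana'

has_n = 'n' in fruit     # True
has_m = 'm' in fruit     # False
print(has_n, has_m)

# The expression can be used directly in an if statement
if 'nan' in fruit:
    print('Found it!')
```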
String Library
• Python has a number of string functions
which are in the string library
String Library
• To list the built-in methods available for a variable (that is, for its
type), use dir()
>>> stuff = 'Hello world'
>>> type(stuff)
<class 'str'>
>>> dir(stuff)
['capitalize', 'casefold', 'center', 'count', 'encode',
'endswith', 'expandtabs', 'find', 'format', 'format_map',
'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit',
'isidentifier', 'islower', 'isnumeric', 'isprintable',
'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower',
'lstrip', 'maketrans', 'partition', 'replace', 'rfind',
'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip',
'split', 'splitlines', 'startswith', 'strip', 'swapcase',
'title', 'translate', 'upper', 'zfill']
String Library
• The full list of built-in functions (methods) for strings is available in
the Python documentation:
https://docs.python.org/3/library/stdtypes.html#string-methods
• Documentation example
String Library
• Examples
• We use the find() function to search for a substring within another string
• find() finds the first occurrence of the substring
• If the substring is not found, find() returns -1
• Remember that string position starts at zero
• You can make a copy of a string in lower case with lower() or upper case
with upper()
• Often when we are searching for a string using find() we first convert the
string to lower case so we can search a string regardless of case
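The find(), lower() and upper() methods described above can be tried as follows. The example strings are arbitrary.

```python
fruit = 'banana'
pos = fruit.find('na')        # 2: first occurrence, counting from zero
missing = fruit.find('z')     # -1: substring not found
print(pos, missing)

greet = 'Hello Bob'
lowered = greet.lower()       # 'hello bob' (greet itself is unchanged)
uppered = greet.upper()       # 'HELLO BOB'
print(lowered, uppered)

# Searching regardless of case: convert to lower case first
pos_any_case = greet.lower().find('bob')   # 6
print(pos_any_case)
```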
String Library
Exercise
Write code using find() and string slicing to extract the number at the end of
the line below.
Convert the extracted value to a floating point number and print it out.
MCQ Example
Files
Chapter 6
File Processing
• It is time to go find some Data to mess with!
Details:
http://source.sakaiproject.org/viewsvn/?view=rev&rev=39772
Open a File
• Before we can read the contents of the file, we must tell Python which
file we are going to work with and what we will be doing with the file
Using open()
• filename is a string
• mode is optional and should be 'r' if we are planning to read the file
and 'w' if we are going to write to the file (by default mode is 'r')
A FileNotFoundError is raised …
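A sketch of how a missing file can be handled with try / except. The file name is hypothetical and is assumed not to exist in the current folder.

```python
# Hypothetical file name, assumed NOT to exist
missing_name = 'no_such_file_for_this_demo.txt'

try:
    fhand = open(missing_name)   # mode defaults to 'r'
    opened = True
    fhand.close()
except FileNotFoundError:
    opened = False
    print('File cannot be opened:', missing_name)
```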
File processing
• A text file can be thought of as a sequence of lines
• Each line ends with a newline character (\n)
• The newline counts as one character: for a line showing 8 visible
characters, for example, the corresponding string length will be 9 (not 8)
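The effect of the newline on the length of a line can be checked with a small temporary file. The file name sample.txt and its contents are made up for this sketch; the file is written to a temporary folder so the example is self-contained.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'sample.txt')   # hypothetical file name

    fhand = open(path, 'w')                     # 'w': write mode
    fhand.write('Line one\nLine two\n')
    fhand.close()

    fhand = open(path)                          # mode defaults to 'r'
    first = fhand.readline()                    # includes the trailing newline
    fhand.close()

print(len('Line one'))        # 8 visible characters
print(len(first))             # 9: the newline is counted
print(len(first.rstrip()))    # 8 again once the newline is stripped
```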
Output :
Line Count: 132045
• For example, we can put an if statement in our for loop to only print
lines that meet some criteria
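A sketch of filtering lines inside the for loop. The file contents and the addresses are invented for illustration, and the file is created in a temporary folder so the example runs on its own.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'mail.txt')     # hypothetical file name

    fhand = open(path, 'w')
    fhand.write('From: alice@example.org\n')    # made-up addresses
    fhand.write('Subject: hello\n')
    fhand.write('From: bob@example.org\n')
    fhand.close()

    count = 0
    fhand = open(path)
    for line in fhand:
        line = line.rstrip()            # drop the trailing newline
        if line.startswith('From:'):    # keep only lines meeting the criterion
            print(line)
            count = count + 1
    fhand.close()

print('Matching lines:', count)         # Matching lines: 2
```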
Output Example
Summary
Exercise
Write a program that prompts for a file name, then opens that file and reads
through the file, looking for lines starting with the form:
X-DSPAM-Confidence: 0.8475
Look in the file mbox-short.txt for instance.
These lines report a spam confidence value for a message.
Count these lines, extract the floating point values from each of these lines
and compute the average of those values (the average spam confidence) and
print it.
MCQ Example
Lists
Chapter 7
List Constants
• List constants are surrounded by square brackets and the elements in the
list are separated by commas
• A list element can be any Python object – even another list
• A list can be empty
Output :
5
4
3
2
1
End !
• Just like strings, we can get at any single element in a list using an index
specified in square brackets
• Lists are “mutable” - we can change an element of a list using the index
operator
• The len() function takes a list as a parameter and returns the number of
elements in the list
• Actually len() tells us the number of elements of any set or sequence
(such as a string...)
• In Python 3, the range function returns a sequence of numbers (a range
object, not a list) that goes from zero to one less than the parameter
• We can construct an index loop using for and an integer iterator
• We can create an empty list and then add elements using the append()
method
• The list stays in order and new elements are added at the end of the list
• Lists can be sliced. Remember: just like in strings, the second number is
“up to but not including”
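The list operations described above can be sketched together. All names and values are arbitrary illustrations.

```python
friends = ['Joseph', 'Glenn', 'Sally']
print(friends[1])              # Glenn (indexes start at zero)

lotto = [2, 14, 26, 41, 63]
lotto[2] = 28                  # lists are mutable
print(lotto)                   # [2, 14, 28, 41, 63]

print(len(friends))            # 3

# Index loop using range() and an integer iterator
for i in range(len(friends)):
    print(i, friends[i])

stuff = list()                 # empty list
stuff.append('book')
stuff.append(99)               # new elements go at the end
print(stuff)                   # ['book', 99]

t = [9, 41, 12, 3, 74, 15]
print(t[1:3])                  # [41, 12] (position 3 is not included)
```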
List Methods
>>> x = list()
>>> type(x)
<class 'list'>
>>> dir(x)
['append', 'count', 'extend', 'index', 'insert', 'pop',
'remove', 'reverse', 'sort']
>>>
https://docs.python.org/3/tutorial/datastructures.html
Is something in a List ?
• Python provides two operators (in and not in) that let you check if an
item is in a list
• These are logical operators that return True or False
• They do not modify the list
• A list can hold many items and keeps those items in the order until we do
something to change the order
• A list can be sorted (i.e., change its order)
• The sort method means “sort yourself” and the list is modified
• There are a number of functions built into Python that take lists as
parameters (e.g. len, min, max, sum)
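A sketch of membership tests, sorting, and the built-in functions that take lists. The values are arbitrary.

```python
some = [1, 9, 21, 10, 16]
print(9 in some)          # True
print(15 in some)         # False
print(20 not in some)     # True

friends = ['Joseph', 'Glenn', 'Sally']
friends.sort()            # the list itself is modified
print(friends)            # ['Glenn', 'Joseph', 'Sally']

nums = [3, 41, 12, 9, 74, 15]
print(len(nums))          # 6
print(max(nums))          # 74
print(min(nums))          # 3
print(sum(nums))          # 154
average = sum(nums) / len(nums)
print(average)
```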
• split breaks a string into parts and produces a list of strings. We think of
these as words. We can access a particular word or loop through all the
words.
• By default, split() uses whitespace as the separator
• When you do not specify a delimiter, multiple spaces are treated like one
delimiter
• You can specify what delimiter character to use in the splitting
• Sometimes we split a line one way, and then grab one of the pieces of
the line and split that piece again
• e.g. extract host from the line "From [email protected] Sat Jan
5 09:14:16 2019"
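A sketch of split(), including the "double split" used to extract a host name. The address alice@example.org is a made-up placeholder, because the address in the original line is not legible.

```python
abc = 'With three words'
stuff = abc.split()
print(stuff)                   # ['With', 'three', 'words']

spaced = 'A lot               of spaces'
print(spaced.split())          # ['A', 'lot', 'of', 'spaces']

semi = 'first;second;third'
parts = semi.split(';')        # explicit delimiter
print(parts)                   # ['first', 'second', 'third']

# Double split: first on whitespace, then on '@'
line = 'From alice@example.org Sat Jan  5 09:14:16 2019'
words = line.split()
email = words[1]               # 'alice@example.org'
pieces = email.split('@')
host = pieces[1]
print(host)                    # example.org
```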
Summary
Exercise
Open the file romeo.txt and read it line by line. For each line, split the line
into a list of words using the split() function. The program should build a list of
words. For each word on each line check to see if the word is already in the
list and if not append it to the list. When the program completes, sort and
print the resulting words in alphabetical order.
output:
['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'kill', 'light', 'moon',
'pale', 'sick', 'soft', 'sun', 'the', 'through', 'what', 'window', 'with', 'yonder']
MCQ Example
Dictionaries
Chapter 8
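• Dictionary
o A “bag” of values, each with its own label (key)
o Entries are indexed by a key (any hashable type, such as a string, a number or a tuple)
o Values can be of any data type
o Entries are looked up by key, not by position (since Python 3.7, insertion order is preserved)
key   value
0     Joseph
2     Sally
1     Glenn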
• Dictionaries are Python’s most powerful collection
• Dictionaries allow us to do fast database-like operations in Python
• Dictionaries are like lists except that they use keys instead of index
numbers to look up values
• Dictionary literals use curly braces and contain a list of key: value pairs
• You can also make an empty dictionary using empty curly braces
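A minimal sketch of creating dictionaries and looking values up by key. The keys and values are arbitrary.

```python
# Dictionary literal: key: value pairs inside curly braces
jjj = {'chuck': 1, 'fred': 42, 'jan': 100}
print(jjj['fred'])        # 42: lookup by key instead of by index

# Empty dictionary, then entries added one by one
purse = {}
purse['money'] = 12
purse['candy'] = 3
print(purse)              # {'money': 12, 'candy': 3}
print(len(purse))         # 2
```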
Dictionary Tracebacks
• We can modify the value for a key in dictionary by assigning a new value
for this key
counts.get('Bob', 0):
if the key 'Bob' does not exist in the dictionary,
this returns 0 (no traceback!)
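A sketch of counting with get(). The list of names is invented for illustration.

```python
counts = dict()
names = ['csev', 'cwen', 'csev', 'zqian', 'cwen']

for name in names:
    # get() returns the current count, or 0 if the key is not there yet
    counts[name] = counts.get(name, 0) + 1

print(counts)                  # {'csev': 2, 'cwen': 2, 'zqian': 1}

bob = counts.get('Bob', 0)     # missing key: 0 is returned, no traceback
print(bob)                     # 0
```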
Counting words
Output:
• Dictionaries are accessed by key rather than by position, but we can still
write a for loop that goes through all the entries in a dictionary: it
actually goes through all of the keys in the dictionary and looks up the values
• You can get a list of keys, values, or items (both) from a dictionary
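A sketch of looping over a dictionary and retrieving its keys, values, and items. The contents are arbitrary.

```python
counts = {'chuck': 1, 'fred': 42, 'jan': 100}

# A for loop over a dictionary goes through its keys
for key in counts:
    print(key, counts[key])

keys = list(counts.keys())
values = list(counts.values())
items = list(counts.items())       # list of (key, value) tuples
print(keys)      # ['chuck', 'fred', 'jan']
print(values)    # [1, 42, 100]
print(items)     # [('chuck', 1), ('fred', 42), ('jan', 100)]

# Two iteration variables: key and value at the same time
for key, value in counts.items():
    print(key, value)
```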
Same as
Summary
Exercise
Write a program to read through the mbox-short.txt and figure out who has sent
the greatest number of mail messages.
The program looks for lines starting with 'From ' and takes the second word of
those lines as the person who sent the mail.
The program creates a Python dictionary that maps the sender's mail address to
a count of the number of times they appear in the file. After the dictionary is
produced, the program reads through the dictionary using a maximum loop to
find the most prolific committer.
MCQ Example
Tuples
Chapter 9
• Tuples are another kind of sequence that functions much like a list -
they have elements which are indexed starting at 0
You can alter a list after its creation (lists are mutable); a tuple cannot be
changed once it is created (tuples are immutable)
>>> l = list()
>>> dir(l)
['append', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
>>> t = tuple()
>>> dir(t)
['count', 'index']
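The difference between a list and a tuple can be sketched as follows. The values are arbitrary.

```python
x = ('Glenn', 'Sally', 'Joseph')
print(x[2])            # Joseph: tuples are indexed like lists

y = [9, 8, 7]
y[2] = 6               # allowed: lists are mutable
print(y)               # [9, 8, 6]

z = (5, 4, 3)
try:
    z[2] = 0           # not allowed: tuples are immutable
    changed = True
except TypeError as err:
    changed = False
    print('Tuples cannot be modified:', err)
```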
Using sorted()
• We can do this even more directly using the built-in function sorted,
which takes a sequence as a parameter and returns a new sorted list
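A sketch of sorted() applied to the items of a dictionary, first by key and then by value. The dictionary is an arbitrary example, and the list comprehension used to swap keys and values is one possible way to sort by value, not necessarily the one shown in the course.

```python
d = {'a': 10, 'c': 22, 'b': 1}

# Sorting the (key, value) tuples orders them by key
by_key = sorted(d.items())
print(by_key)      # [('a', 10), ('b', 1), ('c', 22)]

# Build (value, key) tuples, then sort from largest to smallest value
by_value = sorted([(v, k) for k, v in d.items()], reverse=True)
print(by_value)    # [(22, 'c'), (10, 'a'), (1, 'b')]
```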
https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions
Exercise
Write a Python program that reads the mbox.txt file, parses it, and prints
the number of mails sent for each hour of the day (in descending order
of the number of mails sent).
What is the most used hour of the day for sending mail?
Note: for extracting the hour of the day, consider the lines starting with “From ” like
“From [email protected] Fri Jan 4 16:10:39 2008”, and for instance extract 16 as the
hour of the day in this case.
Output: the most used hour of the day will be 10 am with 198 mails sent
MCQ Example