Python - Basic - 3 - Jupyter Notebook (Student)
We start with reading and writing from files, learn how to transport data objects in the JSON format
and how to browse the filesystem.
Then we move on to exception handling. We learn about the different types of exceptions, what they
mean and how to resolve them. Then we learn how to handle exceptions in our code using the
try-except statement.
Last but not least, we learn about Object Oriented Programming, including class inheritance and
how to define and use your own classes in your code.
Let's go!
Table of Contents
File I/O
Reading and writing lines with open
JSON
File browsing with glob
Summary
open
JSON
import
glob
RUN ME
Exercises
Exercise 1 - Read in stocks
Exercise 2 - First ten names
Exercise 3 - Inc only
Exercise 4 - Average PE
Exception Handling
Exceptions
Try except
Summary
try except
try except else finally
RUN ME
Exercises
Exercise 1 - Fix it Multiply
Exercise 2 - Fix it Numbers
Exercise 3 - Fix it Open
Exercise 4 - Try salaries
Object Oriented Programming
Summary
class
RUN ME
Exercises
Exercise 1 - Get pokemon info
Exercise 2 - Create a pokemon class
Exercise 3 - Load all pokemons
Exercise 4 - Provide insights
File I/O
With the open() built-in function we can read, write and append to files.
First check what your working directory is. The file that we will read and write can be found
there.
In [ ]: # MC
%pwd
We now create a file object which we will use to write something to file.
In [ ]: # MC
f = open('test.txt', 'w')
f.write("Hey!")
Now check the file in your folder. Can you see Hey! in test.txt?
In [ ]: # MC
f.close()
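Now it is there. Written text is buffered, and closing the file flushes it to disk.
open() returns a file object and is most commonly used with two arguments: open(file, mode).
The mode is 'r' for reading, 'w' for writing (overwriting an existing file) and 'a' for appending.
Notice the bold green with ... as statement. Inside the with block the file object is available as
f. After the block the file object is closed automatically.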
In [ ]: # MC
with open('test2.txt', 'r') as f:
    print(f.read())
In [ ]: # MC
with open('test2.txt', 'a') as f:
    f.write("\n")
    f.write("Hello!")
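The three modes can be put side by side. The sketch below is a minimal illustration, assuming a throwaway temporary folder (created with tempfile, which the notebook itself does not use) so that nothing in your working directory is touched; the file name demo.txt is hypothetical.

```python
import os
import tempfile

# Work in a temporary folder so no real files are overwritten (assumption for this demo).
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

    # 'w' creates the file, or empties it if it already exists.
    with open(path, "w") as f:
        f.write("Hey!")

    # 'a' appends to the end of the existing content.
    with open(path, "a") as f:
        f.write("\n")
        f.write("Hello!")

    # 'r' opens the file for reading.
    with open(path, "r") as f:
        content = f.read()

    # After the with block the file object has been closed automatically.
    file_is_closed = f.closed

print(content)
print(file_is_closed)
```

Because every open() sits inside a with block, no explicit f.close() call is needed.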
Use strip() to remove all whitespace characters from the beginning and the end of the string.
In [ ]: # MC
import os
file_names = os.path.join("..","data", "names_raw.txt")
file_names
The import statement allows you to import libraries which you can then use.
Here we use os.path.join to safely join the path of the data folder and the names_raw.txt file so
that it works on any operating system (Windows/Mac).
You can iterate over the file object and do something line by line.
In [ ]: # MC
with open(file_names, 'r') as f:
    for line in f:
        print(line)
Let's say now we want to transform this file into a list of names.
In [ ]: # MC - str.strip()
with open(file_names, 'r') as f:
    lines_list = []
    for line in f:
        lines_list.append(line.strip())
lines_list[2:]
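To see why strip() is needed, it helps to look at the raw lines. The sketch below uses io.StringIO as a stand-in for the opened file, so it runs without the data folder; the text is an assumption, taken from the string that the later "Fix it Open" test expects.

```python
import io

# Stand-in for the opened file (assumed content, see the lead-in above).
fake_file = io.StringIO("The names are:\n\nJeremy\nJan\nAkmal")

raw_lines = []
names = []
for line in fake_file:
    raw_lines.append(line)        # each line keeps its trailing "\n"
    names.append(line.strip())    # strip() removes surrounding whitespace

print(raw_lines)
print(names[2:])                  # skip the header line and the empty line
```

The trailing newline in each raw line is also why print(line) shows an empty line between the names.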
JSON
JSON (JavaScript Object Notation) is an open-standard format that uses human-readable text to
transmit data objects consisting of attribute–value pairs.
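The notation is almost the same as a dictionary in Python, except that:
true and false are written in lowercase (Python uses True and False)
null is used instead of None
strings, including keys, must be written with double quotes
{
  "firstName": "John",
  "lastName": "Smith",
  "isAlive": true,
  "age": 25,
  "address": {
    "state": "NY",
    "postalCode": "10021-3100"
  },
  "phoneNumbers": [
    {"type": "home"},
    {"type": "office"},
    {"type": "mobile"}
  ],
  "children": [],
  "spouse": null
}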
In [ ]: # MC
import json
We can transform a data object into a JSON string and transform it back.
Notice the difference in true and null. Also notice the additional quotes at the start and the end,
as this is a string.
Now we delete data with del, then load it again from the JSON file.
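A minimal sketch of that round trip, using json.dumps and json.loads on a small hypothetical object (the name person and its values are not from the notebook):

```python
import json

# Hypothetical data object, chosen to contain True and None.
person = {"name": "John", "isAlive": True, "spouse": None, "grades": [10, 15]}

json_str = json.dumps(person)      # Python object -> JSON string
print(json_str)                    # true and null appear in JSON notation
print(type(json_str))

restored = json.loads(json_str)    # JSON string -> Python object
print(restored == person)
```

In the notebook, the string is displayed with surrounding quotes because the cell shows its repr.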
In [ ]: # MC
del data
Now we can dump and load data from text files that contain json directly.
In [ ]: data = [{'name': 'Jeremy', 'grades': [10, 15], 'teacher': True, 'complaints': Non
{'name': 'Akmal', 'grades': [3, 4], 'teacher': True, 'complaints': "Many"
{'name': 'Jan', 'grades': [8, 13, 20], 'teacher': True, 'complaints': Non
{'name': 'Heng', 'grades': [5, 5, 5], 'teacher': False, 'complaints': "Ma
]
In [ ]: # MC
with open('cads.json', 'w') as f:
    json.dump(data, f)
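The counterpart of json.dump is json.load, which reads the file back into Python objects. The sketch below shows both directions under stated assumptions: it writes to a temporary folder instead of cads.json, and records and sample.json are hypothetical names.

```python
import json
import os
import tempfile

# Hypothetical sample in the same shape as the data list above.
records = [{"name": "Jeremy", "grades": [10, 15], "teacher": True}]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.json")  # hypothetical file name

    with open(path, "w") as f:
        json.dump(records, f)      # write the object as JSON text

    with open(path, "r") as f:
        raw_text = f.read()        # what the file looks like on disk

    with open(path, "r") as f:
        loaded = json.load(f)      # parse the file back into Python objects

print(raw_text)
print(loaded == records)
```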
Let's look at the resulting JSON file using !cat on Mac/Linux or !type on Windows.
And recursively.
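As a minimal sketch of file browsing with glob, including the recursive variant, the example below builds a tiny folder tree in a temporary directory and searches it. The folder layout and file names are hypothetical and exist only for this demonstration.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical tree: root/a.txt, root/sub/b.txt, root/sub/c.csv
    os.makedirs(os.path.join(root, "sub"))
    for rel in ["a.txt", os.path.join("sub", "b.txt"), os.path.join("sub", "c.csv")]:
        with open(os.path.join(root, rel), "w") as f:
            f.write("x")

    # "*.txt" only matches files directly inside root.
    top_level = glob.glob(os.path.join(root, "*.txt"))

    # "**" combined with recursive=True also descends into subfolders.
    recursive = glob.glob(os.path.join(root, "**", "*.txt"), recursive=True)

    top_names = sorted(os.path.basename(p) for p in top_level)
    all_names = sorted(os.path.basename(p) for p in recursive)

print(top_names)
print(all_names)
```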
Summary
open
f.write("Hello")
data = f.readlines()
f.readline()
data = []
for line in f:
    data.append(line)
JSON
import json
json.dumps(data)
json.loads(json_str)
json.dump(data, f)
data = json.load(f)
import
import json
import os
glob
print(filename)
RUN ME
Please run the below code snippet. It is required for running tests for your solution.
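def test(got, expected):
    # Width of the OK/FAIL banner. The original value is not shown; 20 is an assumption.
    max_width_print = 20
    if got == expected:
        prefix = ' OK '.center(max_width_print, '=')
    else:
        prefix = ' FAIL '.center(max_width_print, '=')
    print(('%s\n got: %s \nexpected: %s' % (prefix, repr(got), repr(expected))),
          end = '\n\n')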
Exercises
Each line of this file is a JSON object, but the file content as a whole is not valid JSON, since the
objects are not separated by commas and there are no brackets surrounding them.
So you need to read the file line by line and transform each string into a dict using json.loads.
# TEST
print("read_in_stocks")
test(len(stocks), 6756)
test(type(stocks[0]), dict)
In [ ]: # MC
import os
import json
stocks_file = os.path.join("..", "data", "stocks.json")
stocks = []
with open(stocks_file, 'r') as f:
    for line in f:
        stocks.append(json.loads(line))
# TEST
print("read_in_stocks")
test(len(stocks), 6756)
test(type(stocks[0]), dict)
test(type(stocks[-1]), dict)
In [ ]: # MC
# or
import os
import json
stocks_file = os.path.join("..", "data", "stocks.json")
with open(stocks_file, 'r') as f:
    stocks = [json.loads(line) for line in f]
# TEST
print("read_in_stocks")
test(len(stocks), 6756)
test(type(stocks[0]), dict)
test(type(stocks[-1]), dict)
Hints: inspect one dict to see which key contains the name.
In [ ]: # MC
names = [stock["Company"] for stock in stocks[:10] ]
# TEST
print("first_ten_names")
from answers import ten_companies
test(names, ten_companies)
From the top 10 companies, now only show the names that contain the word 'Inc.'
In [ ]: # MC
names = [n for n in names if 'Inc.' in n]
# TEST
from answers import inc_companies
print("inc_only")
test(names, inc_companies)
Exercise 4 - Average PE
Now show the average P/E for all the data. Round it to 2 decimal places.
Not all the stocks have a P/E reported, so you need to handle that.
In [ ]: stocks[0]['Ticker']
In [ ]: # MC
pe = [s['P/E'] for s in stocks if 'P/E' in s] # not every stock has a 'P/E'
avg_pe = round(sum(pe) / len(pe), 2)
perc_with_pe = round(len(pe) / len(stocks), 2)
# TEST
print("average_pe")
test(avg_pe, 41.71)
test(perc_with_pe, 0.5)
Exception Handling
Exceptions
Here is an example Exception.
Arrow ^ pointing at the earliest point in the line where the error was detected
The error is caused by the token preceding the arrow
File name and line number are printed
The last line of the error message indicates what happened. Let's go over the most common
exception types. Find more information here (https://docs.python.org/3/library/exceptions.html).
In [ ]: 10 * (1/0)
In [ ]: 4 + result*3
In [ ]: '1' + 1
In [ ]: d = {}
d[0]
In [ ]: l = []
l[0]
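Each of the cells above stops with a different exception type. The sketch below is one way to see them side by side: it wraps each failing expression in a small function (the function names are hypothetical) and records the name of the exception class that is raised.

```python
# Each function reproduces one of the failing cells above.
def divide_by_zero():
    return 10 * (1 / 0)

def undefined_name():
    return 4 + result * 3   # 'result' has never been defined

def add_str_and_int():
    return '1' + 1

def missing_key():
    d = {}
    return d[0]

def index_out_of_range():
    l = []
    return l[0]

exception_names = []
for func in [divide_by_zero, undefined_name, add_str_and_int,
             missing_key, index_out_of_range]:
    try:
        func()
    except Exception as e:
        # type(e).__name__ is the text before the colon on the last line of a traceback
        exception_names.append(type(e).__name__)
        print("{}: {}".format(type(e).__name__, e))
```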
The preceding part of the error message shows the context where the exception happened.
Here is an example.
Try except
Sometimes exceptions are expected to happen and we want our code to handle those exceptions
in a certain way. For this there is the try except statement.
Run the cell that asks for a number several times and try:
typing a number
typing a non-numeric string
typing a zero
ending the cell by interrupting the kernel
In [ ]: try:
    print(var)
except:
    print("An exception occurred")
In [ ]: try:
    x = float(input("Please enter a number: "))
    print("inverse is:", 1/x)
except:
    print("Oops! That was no valid number.")
In this case we have used a bare except statement. This means all exceptions are caught,
including KeyboardInterrupt.
You can specify which exceptions you want to catch and how you want to handle each of them.
Let's try typing a number, a non-numeric string and a zero, and interrupting the kernel, again.
The raise statement allows the programmer to force a specified exception to occur. For example:
raise NameError('HiThere')
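A small sketch of raise in context, assuming a hypothetical helper function inverse that is not part of the notebook: it rejects a zero argument with a descriptive ValueError, and the caller handles that exception with try except.

```python
def inverse(x):
    # Hypothetical helper: refuse zero with an explicit, descriptive exception.
    if x == 0:
        raise ValueError("x must not be zero")
    return 1 / x

try:
    inverse(0)
except ValueError as e:
    message = str(e)            # the text passed to ValueError(...)
    print("Caught:", message)

print(inverse(4))
```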
In [ ]: try:
    x = float(input("Please enter a number: "))
    print("inverse is:", 1/x)
except ValueError:
    print("Oops! That was no valid number. Try again...")
except ZeroDivisionError:
    print("Oops! Cannot divide by 0!")
except:
    print("Something else went wrong")
    # raise # return error message
divide(2,0)
In [ ]: divide(2,1)
In [ ]: divide("a","b")
In [ ]: total = 0
numbers = [1,2,3,"a", "b", "c", 5, 6, 7]
for number in numbers:
    total += number
total
If we want to have the total of the numbers in the list we can ignore the strings.
total
In [ ]: # MC
total = 0
for number in numbers:
    try:
        total += number
    except TypeError as e:
        print("Error: {}".format(e))
total
You can use a bare except statement, as long as you keep in mind that it catches all exceptions,
including KeyboardInterrupt. For a script you operate yourself this can be fine.
In [ ]: # MC
total = 0
for number in numbers:
    try:
        total += number
    except:
        pass
total
Summary
try except
try:
    total += number
except:
    pass
In addition to an except block after the try block, you can also use else and finally blocks.
The code in the finally block is executed regardless of whether an exception occurs.
The code in the else block is executed only if the try block raised no exception.
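try:
    result = x / y
except ZeroDivisionError:
    print("division by zero!")
else:
    print("result is", result)
finally:
    # The body of this block is missing in the source; this line is an assumed placeholder.
    print("executing finally clause")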
RUN ME
Please run the below code snippet. It is required for running tests for your solution.
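def test(got, expected):
    # Width of the OK/FAIL banner. The original value is not shown; 20 is an assumption.
    max_width_print = 20
    if got == expected:
        prefix = ' OK '.center(max_width_print, '=')
    else:
        prefix = ' FAIL '.center(max_width_print, '=')
    print(('%s\n got: %s \nexpected: %s' % (prefix, repr(got), repr(expected))),
          end = '\n\n')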
Exercises
In [ ]: # MC
# "a" is the key, a is a variable that does not exist
# data not dala
data = {"a": 10, "b": 20}
answer = data["a"] * data["b"]
# TEST
print("fix_it")
test(answer, 200)
We want to extend numbers with another list and then get the sum.
In [ ]: # MC
# extend is an in-place operation!
numbers = [1,2,3,4]
numbers.extend([5,6,7,8])
answer = sum(numbers)
# TEST
print("fix_it_numbers")
test(answer, 36)
In [ ]: import os
# TEST
print("fix_it_open")
test(content, 'The names are:\n\nJeremy\nJan\nAkmal')
In [ ]: # MC
# data is a string
# read mode needs to be 'r' for reading
with open(os.path.join("data", "names_raw.txt"), 'r') as f:
    content = f.read()
# TEST
print("fix_it_open")
test(content, 'The names are:\n\nJeremy\nJan\nAkmal')
Load it into a list of dictionaries and use try except to handle the non-dictionary lines.
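The general pattern can be sketched as follows. This is not the exercise solution: the salaries file is not shown here, so the example assumes that each valid line is a JSON object and uses a hypothetical in-memory list of lines instead of a file.

```python
import json

# Hypothetical lines; the format of the real file is an assumption.
lines = [
    '{"name": "Jeremy", "salary": 1000}',
    'this line is not a dictionary',
    '{"name": "Jan", "salary": 2000}',
    '',
]

parsed = []
skipped = 0
for line in lines:
    try:
        parsed.append(json.loads(line))
    except json.JSONDecodeError:
        # json.loads raises JSONDecodeError for text that is not valid JSON
        skipped += 1

print(parsed)
print("skipped lines:", skipped)
```

Catching only json.JSONDecodeError, rather than using a bare except, keeps unrelated errors such as a KeyboardInterrupt visible.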
# TEST
print("try_salaries")
from answers import the_salaries
test(salaries, the_salaries)
# TEST
print("try_salaries")
from answers import the_salaries
test(salaries, the_salaries)
We will define a class Student. A class is a blueprint for an object. After defining the class Student
there is only one Student class, and you can create many Student objects from that class.
Then we create a child class PythonStudent. The child class inherits all the properties of its
parent and adds its own functionality.
In [ ]: # MC
class Student:
    def __init__(self, name, grades):
        self.name = name
        self.grades = grades
Instantiating objects
In [ ]: print(jeremy.grades)
print(jeremy.name)
You can copy paste again and add the average function.
In [ ]: # MC
class Student:
    def __init__(self, name, grades):
        self.name = name
        self.grades = grades
    def __repr__(self):
        return "{}:{}".format(self.name, self.grades)
    def average(self):
        return round(sum(self.grades) / len(self.grades), 2)
In OOP there is the concept of inheritance. You can make a class that is the child of another class.
In [ ]: # MC
class PythonStudent(Student):
    def can_program(self):
        return True
In [ ]: # MC
narjes.average()
In [ ]: # MC
narjes.can_program()
In [ ]: students = [jeremy, amin, narjes] # jeremy, amin, narjes are objects.
print(students)
In [ ]: # MC
[n for n in students if isinstance(n, Student)]
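The pieces of this section can be combined into one self-contained sketch. The class definitions follow the cells above; the grades given to jeremy and narjes are hypothetical values chosen for illustration.

```python
class Student:
    def __init__(self, name, grades):
        self.name = name
        self.grades = grades

    def __repr__(self):
        return "{}:{}".format(self.name, self.grades)

    def average(self):
        return round(sum(self.grades) / len(self.grades), 2)


class PythonStudent(Student):
    # Inherits __init__, __repr__ and average from Student.
    def can_program(self):
        return True


# Hypothetical grades, for illustration only.
jeremy = Student("Jeremy", [10, 15])
narjes = PythonStudent("Narjes", [8, 13, 20])

print(jeremy)                              # uses __repr__
print(narjes.average())                    # method inherited from Student
print(narjes.can_program())                # method defined on the child class
print(isinstance(narjes, Student))         # a PythonStudent is also a Student
print(isinstance(jeremy, PythonStudent))   # but not the other way round
```

This is why filtering a list with isinstance(n, Student) keeps both plain Student objects and PythonStudent objects.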
Well done!
Summary
class
class Student:
    def __init__(self, name, grades):
        self.name = name
        self.grades = grades
    def average(self):
        return round(sum(self.grades) / len(self.grades), 2)
class PythonStudent(Student):
    def can_program(self):
        return True
RUN ME
Please run the below code snippet. It is required for running tests for your solution.
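def test(got, expected):
    # Width of the OK/FAIL banner. The original value is not shown; 20 is an assumption.
    max_width_print = 20
    if got == expected:
        prefix = ' OK '.center(max_width_print, '=')
    else:
        prefix = ' FAIL '.center(max_width_print, '=')
    print(('%s\n got: %s \nexpected: %s' % (prefix, repr(got), repr(expected))),
          end = '\n\n')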
Exercises
The CEO of your company happens to have a kid who is totally into Pokemon! For this reason he
is very interested in finding out more about these Pokemon for himself. The thing is.. all these
Pokemons are hidden away in files and spread out over different continents and countries folders.
Oh no, what a bummer.
Ambitious as you are and on the verge of locking down that next promotion you walk in and say:
Use your Python skills in browsing folder structure, lists, dictionaries, reading from csv and writing
to Excel to provide those insights and totally save the day!
Have a look at the pokeworld folder inside the data folder.
Each file represents a pokemon and inside the file is the description.
Get the continent, country, name and description for the pokemon in
pokeworld/Asia/Malaysia/Wooper.txt
In [ ]: import os
pokefile = os.path.join("data", "pokeworld/Asia/Malaysia/Wooper.txt")
pokefile
In [ ]: # MC
lst = pokefile.split("/")
lst[-1].split('.')[0]
In [ ]: # MC
continent = ## your code
country = ## your code
name = ## your code
description = ## your code
# TEST
print("get_pokemon_info")
from answers import wooper_desc
test(continent, 'Asia')
test(country, 'Malaysia')
test(name, 'Wooper')
test(description, wooper_desc)
In [ ]: # MC
continent = pokefile.split("/")[-3]
country = pokefile.split("/")[-2]
name = pokefile.split("/")[-1].split(".")[0]
# this is for later test(description, wooper_desc) purpose
with open(pokefile, 'r') as f:
    description = f.read()
# TEST
print("get_pokemon_info")
from answers import wooper_desc
test(continent, 'Asia')
test(country, 'Malaysia')
test(name, 'Wooper')
test(description, wooper_desc)
# TEST
print("create_pokemon_class")
pokefile = os.path.join("data", "pokeworld/Asia/Malaysia/Wooper.txt")
pokefile2 = os.path.join("data", "pokeworld/Asia/Thailand/Hive.txt")
wooper = Pokemon(pokefile)
hive = Pokemon(pokefile2)
test(wooper.__repr__(), 'Wooper from Malaysia, Asia')
test(wooper.is_big_data(), False)
test(hive.is_big_data(), True)
In [ ]: # MC
class Pokemon():
    def __init__(self, pokefile):
        self.continent = pokefile.split("/")[-3]
        self.country = pokefile.split("/")[-2]
        self.name = pokefile.split("/")[-1].split(".")[0]
        with open(pokefile, 'r') as f:
            self.description = f.read()
    def is_big_data(self):
        return 'Hadoop' in self.description or 'Apache' in self.description
    def __repr__(self):
        return "{} from {}, {}".format(self.name, self.country, self.continent)
# TEST
print("create_pokemon_class")
pokefile = os.path.join("data", "pokeworld/Asia/Malaysia/Wooper.txt")
pokefile2 = os.path.join("data", "pokeworld/Asia/Thailand/Hive.txt")
wooper = Pokemon(pokefile)
hive = Pokemon(pokefile2)
test(wooper.__repr__(), 'Wooper from Malaysia, Asia')
test(wooper.is_big_data(), False)
test(hive.is_big_data(), True)
Now that we have defined the Pokemon class we can make a list of all the pokemon.
In [ ]: # MC
import glob
folder = "data/pokeworld"
pokefiles = glob.glob('{}/**/*.txt'.format(folder), recursive=True)
pokemons = [Pokemon(f) for f in pokefiles] # remember that 'Pokemon' is a class d
# TEST
print("load_all_pokemons")
test(len(pokemons), 750) # pokemons is a list of 750 objects
test(type(pokemons[0]), Pokemon) # check if the first item in 'pokemons' has the
In [ ]: # MC
pikachu = [p for p in pokemons if p.name == "Pikachu"][0] # get the first item w
pikachu.country
In [ ]: # MC
malaysians = [p for p in pokemons if p.country == "Malaysia"] # remember that 'po
len(malaysians)
In [ ]: # MC
asians = [p for p in pokemons if p.continent == "Asia"]
len(asians)
Hints:
The setdefault() method returns the value of a key if the key is in the dictionary. If not, it inserts
the key with the given default value into the dictionary and returns that value.
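A minimal sketch of counting with setdefault(), using a hypothetical list of country names in place of the pokemon objects:

```python
# Hypothetical input; in the exercise these values come from the pokemon objects.
countries = ["Malaysia", "Thailand", "Malaysia", "Japan", "Malaysia"]

counts = {}
for country in countries:
    counts.setdefault(country, 0)   # inserts 0 only the first time a country is seen
    counts[country] += 1

# For an existing key, setdefault returns the stored value and changes nothing.
existing = counts.setdefault("Malaysia", 0)

# Sort the (country, count) pairs by count, largest first, and take the top one.
most_common = sorted(counts.items(), key=lambda x: x[-1], reverse=True)[0]

print(counts)
print(existing)
print(most_common)
```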
In [ ]: # MC
continents = {}
for p in pokemons:
    continents.setdefault(p.continent, 0) # if p.continent not found, insert 0
    continents[p.continent] += 1
sorted(continents.items(), key=lambda x: x[-1], reverse=True)[0]
In [ ]: # MC
countries = {}
for p in pokemons:
    countries.setdefault(p.country, 0)
    countries[p.country] += 1
In [ ]: # MC
bigdata = set([p.country for p in pokemons if p.is_big_data()])
bigdata