Pyecon
© 2019 PyEcon.org
Learning Python for econometrics

Knowledge after completing this course:

- You have acquired a basic understanding of programming in general with Python and a special knowledge of working with standard numerical packages.
- You are able to study Python in depth and absorb new knowledge for your scientific work with Python.
- You know the capabilities and further possibilities to use Python in econometrics.

Learning Python for econometrics

What you should not expect from this course:

- A guide on how to install or maintain an application.
- An introduction to programming for beginners.
- An introduction to professional development tools.
- Non-scientific, general purpose programming (beyond the language essentials).
- Little content and little effort...

Course organisation

This course can be seen as an applied lecture:

Lecture:
We try to explain the partly theoretical knowledge on Python by simple, easy to understand examples. You can learn the programming language’s subtleties by reading literature.

Exercises:
Digital work sheets in the form of Jupyter notebooks with applied tasks are available for each chapter. Sample solutions for all exercises are available in separate notebooks.

Self-tests:
At the end of each of the five chapters there are typical exam questions.

Written exam:
There will be a final exam. This will be a pure multiple choice exam: 60 questions, 90 minutes.

After successfully passing the exam you will receive 6 ECTS credits.

Literature

The programming language Python is well established and very popular for numerical applications. Some keywords:

- Data science,
- Data wrangling,
- Machine learning,
- Numerical statistics,
- ...

Software: Python 3

We are using Python 3. The migration from Python 2 to version 3 was a major revision, and the new version is no longer backwards compatible with the old one.

Python 3 running [command line]
python3 --version

## Python 3.6.7

The normal execution mode is that the Python interpreter processes the instructions in the background – in other numeric programming languages such as R this is known as batch mode. It executes program code that is usually located in a source code file.

The interpreter can also be started in an interactive mode. It is used for testing and analytical purposes in order to obtain fast results when performing simple applications.

Software: IDEs

For everyday work with Python it would be extremely tedious to make all edits in interactive mode.

There are a number of excellent integrated development environments (IDEs) for Python, with three being emphasized here:

- Jupyter (and IPython)
- Spyder (scientific IDE)
- PyCharm (by JetBrains)

Of course, you can also use a simple text editor. However, you would probably miss the comfort of an IDE.

Installing, extending and maintaining Python is not trivial at the beginning. Therefore, as a beginner, you are well advised to download and install the Python distribution Anaconda. Bonus: Many standard packages are supplied directly or can conveniently be installed later.

Following this course

In this course – in a numerical and analytical context – we use only Jupyter with the IPython kernel.

That is why we have combined

1 all the code from the slides, and
2 all the exercises and solutions

into interactive Jupyter notebooks that you can use online without having to install software locally on your computer. The GWDG has set up a cloud-based Jupyter-Hub for you.

You can access the working environment with your university credentials at

https://jupyter.gwdg.de/

create a profile and get started right away, even on your smart devices. However, so far you are still asked to upload the course notebooks yourself or rewrite the code from scratch.

Notebook workflow

A Jupyter notebook is divided into individual, vertically arranged cells, which can be executed separately.

The notebook approach is not novel and comes from the field of computer algebra software.

Notebook workflow

Actually, an interactive Python interpreter called IPython is started “in the core”.

... magic commands.

Following this course

Finally, we wish you a lot of fun and success with and in this course!

Practice makes perfect!

Contribution and credits:

- Fabian H. C. Raters
- Eike Manßen
- GWDG for the Jupyter-Hub

Table of contents

1 Essential concepts
1.1 Getting started
1.2 Procedural programming
1.3 Object-orientation

2 Numerical programming
2.1 NumPy package
2.2 Array basics
2.3 Linear algebra

3 Data formats and handling
3.1 Pandas package
3.2 Series
3.3 DataFrame
3.4 Import/Export data

4 Visual illustrations
4.1 Matplotlib package
4.2 Figures and subplots
4.3 Plot types and styles
4.4 Pandas layers

5 Applications
5.1 Time series
5.2 Moving window
5.3 Financial applications

Chapter 1

Essential concepts

1.1 Getting started

Section 1.1

Essential concepts

Motivation for learning Python

Python can be described as

- a dynamic, strongly typed, multi-paradigm and object-oriented programming language,
- for versatile, powerful, elegant and clear programming,
- with a general, high-level, multi-platform application scope,
- which is being used very successfully in the data science sector and is very popular.

Moreover, Python is relatively easy to learn, and its successful language design supports everyone from novices to professional developers.

A short history of time

... of the Python era:

The language was first released in 1991 by Guido van Rossum. Its name is based on Monty Python’s Flying Circus. Its most distinctive feature is the novel markup of code blocks – by indentation:

Indentation example
password = input("I am your bank. Password please: ")

## I am your bank. Password please: sparkasse

if password == "sparkasse":
    print("You successfully logged in!")
else:
    print("Fail. Will call the police!")

## You successfully logged in!

This increases the readability of code and should at the same time encourage the programmer to write tidy code. Since source code can be written more compactly with Python, increased efficiency in daily work can be expected.

A short history of time

Overview of the Python development by versions and dates:

[Timeline figure, 1990 to 2020, with the annotations "Python 2.7 lives forever", "Python 2.7 will die" and "Python 3.6".]

In comparison

Comparing the way Python works with common programming languages, we briefly discuss a selection of popular competitors:

C/C++:
- CPython is interpreted, not compiled to machine code ahead of time.
- C/C++ are statically typed, complex languages.

Java:
- CPython is not compiled just-in-time.
- Java has a C-type syntax.

MATLAB:
- In Python you primarily follow a scalar way of thinking, while in MATLAB you write matrix-based programs.
- In the numerical context, the matrix view and syntax are very similar to those of MATLAB.
- MATLAB is partially compiled just-in-time.

Here, CPython is the reference implementation – the “Original Python” – which is itself implemented in C.

In comparison

R:
- In Python you primarily follow a scalar way of thinking, while in R you write vector-based programs.
- R has a C-type syntax, extended by some novel language concepts.

Stata:
- Any comparison would inadequately describe the differences.

Reference semantics

An extremely important difference separates the first two languages, C/C++ and Java, together with Python itself, from the last three: the former support reference semantics, so a function can work on the caller's object rather than on a copy, while MATLAB, R and Stata follow call-by-copy semantics.

Further specific differences and similarities to MATLAB and R will be addressed in other parts of this course.

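The reference semantics described above can be illustrated with a minimal sketch. The function name `append_zero` and the variable names are made up for this illustration and do not come from the slides:

```python
def append_zero(values):
    # The function receives a reference to the caller's list; no copy is made,
    # so this change is visible outside the function.
    values.append(0)


original = [1, 2, 3]
alias = original            # a second name for the same list object
append_zero(alias)

copied = original.copy()    # an explicit (shallow) copy is an independent object
copied.append(99)

print(original)             # includes the 0 appended inside the function
print(alias is original)    # both names refer to one object
print(copied is original)   # the copy is a different object
```

If a function should not modify the caller's data, pass a copy explicitly, as with `original.copy()` here.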
Versatility – diversity

Python has become extremely popular:

Source: https://stackoverflow.blog/2017/09/06/incredible-growth-python/

Versatility – diversity

So, you’re on the right track – because who wants to bet on the wrong hoRse?

Source: https://stackoverflow.blog/2017/09/06/incredible-growth-python/

Versatility – diversity

Areas in which Python is used with great success:

- Scripts,
- Console applications,
- GUI applications,
- Game development,
- Website development, and
- Numerical programming.

Places where Python is used:

Yet another outline

In this course we will successively gain the following insights:

1 General basics of the language.
2 Numerical programming and handling of data sets.
3 Application to economic and analytical questions.

Section 1.2

Essential concepts

The first program

Programs can be implemented very quickly – this is a pretty minimal example. You can write this command to a text file of your choice and run it directly on your system:

Hello there
print("Hello there!")

## Hello there!

- The function print() displays its argument (a string) on screen,
- Arguments are passed to the function in parentheses,
- A string must be wrapped in " " or ' ',
- No semicolon at the end.

User input

Let’s add a user input to the program:

Hello you
name = input("Please enter your name: ")

## Please enter your name: Angela Merkel

print("Hello " + name + "!")

## Hello Angela Merkel!

- The function input() is used for interactive text input,
- You can use the equal sign = to assign variables (here: name),
- Strings can be joined by the (overloaded) operator +.

Determining weekdays

We are now trying to find out on which weekday a person was born (Merkel’s birthday is 17-07-1954):

Weekday of birth
from datetime import datetime
answer = input("Your birthday (DD-MM-YYYY): ")

## Your birthday (DD-MM-YYYY): 17-07-1954

birthday = datetime.strptime(answer, "%d-%m-%Y")
print("Your birthday was on a " + birthday.strftime("%A") + "!")

Time since birth

And how many days have passed since then (until Merkel’s 4th swearing-in as Federal Chancellor)?

Age in days
someday = datetime.strptime("14-03-2018", "%d-%m-%Y")
print("You are " + str((someday - birthday).days) + " days old!")

## You are 23251 days old!

- You can create time differences, i.e., the operator - is overloaded,
- The difference represents a new object, with its own attributes, such as days,
- When using the overloaded operator +, you have to explicitly convert the number of days by means of str() into a string.

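As a small sketch of the point that a time difference is an object of its own, the following snippet repeats the calculation without user input and inspects the result. The split into weeks via `divmod` is an addition for illustration only:

```python
from datetime import datetime

birthday = datetime.strptime("17-07-1954", "%d-%m-%Y")
someday = datetime.strptime("14-03-2018", "%d-%m-%Y")

# The overloaded operator - returns a new object of type timedelta.
delta = someday - birthday

# divmod() returns the quotient and the remainder in one step.
weeks, days = divmod(delta.days, 7)

print(type(delta).__name__)
print(delta.days, "days are", weeks, "weeks and", days, "days")
```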
Time since birth

How many years, months and days do you think that is?

Human readable age
from dateutil.relativedelta import relativedelta  # third-party package python-dateutil
delta = relativedelta(someday, birthday)
print(f"That's {delta.years} years, {delta.months} months "
      f"and {delta.days} days!!")

## That's 63 years, 7 months and 25 days!!

- You don’t have to keep reinventing the wheel – a wealth of packages and individual modules are freely available,
- A lowercase f before "..." provides convenient formatting – there are other options as well,
- Two strings in sequence are implicitly joined together – "That" "'s nice"!

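The "other options" for string formatting mentioned above can be sketched as follows. The variable names are chosen for this illustration; all three variants build the same string:

```python
name = "Angela Merkel"
days = 23251

with_fstring = f"{name} is {days} days old"                # f-string (Python 3.6+)
with_format = "{} is {} days old".format(name, days)       # str.format()
with_percent = "%s is %d days old" % (name, days)          # printf-style formatting

# Adjacent string literals are joined implicitly into one string.
adjacent = "That" "'s nice"

print(with_fstring)
print(f"{days:,}")   # a format specifier, here a thousands separator
print(adjacent)
```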
Getting help

When working with the interactive interpreter, i.e., in a notebook, you can quickly get useful information about Python objects:

Help system
help(len)

## Help on built-in function len in module builtins:
##
## len(obj, /)
##     Return the number of items in a container.

Lexical structure

As with natural language, programming languages have a lexical structure. Source code consists of the smallest possible, indivisible elements, the tokens. In Python you can find the following groups of elements:

- Literals
- Variables
- Operators
- Delimiters
- Keywords
- Comments

These terms give us a rock-solid foundation for exploring the heart of a programming language.

Literals and variables

Basically, we distinguish between literals and variables:

Assigning variables with literals
myint = 7
myfloat = 4.0
myboat = "nice"
mybool = True
myfloat = myboat

- In this course, we will work with four kinds of literals: integer (7), float (4.0), string ("nice") and boolean (True),
- Literals are assigned to variables at runtime,
- In Python the data type is derived from the literal and does not have to be declared explicitly,
- It is allowed to assign values of different data types to the same variable (name) sequentially,
- If we don’t assign a literal to any variable, it is lost.

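A short sketch of how the data type follows the literal and how a variable name can be rebound to a value of another type. The built-in function type() is used here only for inspection:

```python
myfloat = 4.0
print(type(myfloat).__name__)   # the type is derived from the literal 4.0

myfloat = "nice"                # the same name now refers to a string
print(type(myfloat).__name__)

# The four kinds of literals used in this course and their types.
for literal in (7, 4.0, "nice", True):
    print(repr(literal), type(literal).__name__)
```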
Operators and delimiters

Most operators and delimiters will be introduced to you during this course. Here is an overview of the operators:

Overview of operators
## + - * / ** //
## % @ << >> & |

Arithmetic operators

All regular arithmetic operations involving numbers are possible:

Pocket calculator
10 + 5
100 - 20
8 / 2
4 * (10 + 20)
2**3

## 15
## 80
## 4.0
## 120
## 8

- The result of dividing two integers is a floating point number,
- The conventional rules apply: Parentheses first, then multiplication and division, etc.,
- The operator ** is used for exponentiation.

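The operators // and % from the overview are not part of the pocket calculator example. As a brief sketch with variable names chosen for illustration, they perform floor division and compute the remainder:

```python
true_division = 17 / 5     # always a float
quotient = 17 // 5         # floor division: largest integer not above the result
remainder = 17 % 5         # remainder of the floor division
both = divmod(17, 5)       # quotient and remainder in one call

# Floor division rounds towards negative infinity, not towards zero.
negative_floor = -17 // 5

print(true_division, quotient, remainder, both, negative_floor)
```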
Boolean operators

In order to demonstrate the use of logical operators (and formatted strings and for-loops), we create a handy table summarizing some important results from boolean algebra:

Logical table
# Create table head
print("a   b   a and b   a or b   not a\n"
      "--------------------------------")

# Loop through the rows
for a in [False, True]:
    for b in [False, True]:
        print(f"{a:1} {b:3} {a and b:6} {a or b:8} {not a:7}")

## a   b   a and b   a or b   not a
## --------------------------------
## 0   0      0        0       1
## 0   1      0        1       1
## 1   0      0        1       0
## 1   1      1        1       0

Keywords and comments

The programmer explains the structure of his/her program to the interpreter via a restricted set of short commands, the keywords:

There are two ways to make comments:

Provide some comments
# Set variable to something - or nothing?
something = None

"""
I am a docstring!
A multiline string comment hybrid.
I will be useful for describing classes and methods.
"""

Data types

Python offers the following basic data types, which we will use in this course:

Each data type has its own methods, that is, functions that are applicable specifically to an object of this type.

You will gradually get to know new and more complex data types or object classes.

Lists

A list is an ordered array of objects, accessible via an index:

Listing tech companies
stocks = ["Google", "Amazon", "Facebook", "Apple"]
stocks[1]
stocks.append("Twitter")
stocks.insert(2, "Microsoft")

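The list operations above can be followed step by step in this sketch. The calls to remove() and sorted() and the membership test are additions for illustration and are not taken from the slide:

```python
stocks = ["Google", "Amazon", "Facebook", "Apple"]

second = stocks[1]               # indexing starts at 0, so this is "Amazon"
stocks.append("Twitter")         # add an element at the end
stocks.insert(2, "Microsoft")    # insert before the element at index 2
print(stocks)

stocks.remove("Twitter")         # remove the first matching element
alphabetical = sorted(stocks)    # returns a new list, stocks stays unchanged
print("Apple" in stocks, len(stocks))
print(alphabetical)
```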
Tuples

Tuples are immutable sequences related to lists; for example, they cannot be extended. The drawbacks in flexibility are compensated by advantages in speed and memory usage:

Selecting elements in sequences
lottery = (1, 8, 9, 12, 24, 28)
len(lottery)
lottery[1:3]
lottery[:4]
lottery[-1]
lottery[-2:]

## (1, 8, 9, 12, 24, 28)
## 6
## (8, 9)
## (1, 8, 9, 12)
## 28
## (24, 28)

The same operations are also supported when using lists.

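What immutability means in practice can be shown with a brief sketch. The names `numbers` and `extended` are chosen for illustration:

```python
lottery = (1, 8, 9, 12, 24, 28)

# Item assignment is not allowed on a tuple and raises a TypeError.
try:
    lottery[0] = 2
except TypeError as error:
    message = str(error)
    print("TypeError:", message)

# A list built from the tuple is mutable.
numbers = list(lottery)
numbers[0] = 2

# "Extending" a tuple creates a new tuple; the original stays unchanged.
extended = lottery + (31,)
print(lottery, extended)
```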
Dictionaries

Dictionaries are associative collections of key-value pairs. The key must be immutable and unique:

Internet slang dictionary
slang = {"imho": "in my humble opinion",
         "lol": "laughing out loud",

- A dictionary literal is written with { } and : between key and value,
- Dictionaries are iterable; since Python 3.7 they preserve insertion order.

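A minimal sketch of typical dictionary operations. Only the first two entries come from the slide; the entry "afk" and the looked-up key "brb" are made up for illustration:

```python
slang = {"imho": "in my humble opinion",
         "lol": "laughing out loud"}

slang["afk"] = "away from keyboard"     # illustrative entry, not from the slides

meaning = slang["lol"]                  # lookup by key
missing = slang.get("brb", "unknown")   # get() returns a default for missing keys

# Iterating over key-value pairs.
for key, value in slang.items():
    print(f"{key}: {value}")
```

Accessing a missing key with `slang["brb"]` would raise a KeyError, which is why get() with a default is often more convenient.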
Sets

A set is an unordered collection of objects without duplicates:

Set operations
x = {"o", "n", "y", "t"}
y = {"p", "h", "o", "n"}
x & y
x | y

- The set type defines its own operators, which overload existing ones.
- An empty set is created via set(), because {} already creates an empty dict.

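The overloaded set operators can be sketched as follows. The difference and symmetric difference are additions for illustration; sorted() is used for printing because sets have no defined order:

```python
x = {"o", "n", "y", "t"}
y = {"p", "h", "o", "n"}

intersection = x & y    # elements in both sets
union = x | y           # elements in at least one set
difference = x - y      # elements in x but not in y
symmetric = x ^ y       # elements in exactly one of the sets

empty = set()           # {} would create an empty dict instead

print(sorted(intersection), sorted(union))
print(sorted(difference), sorted(symmetric))
print(type({}).__name__, type(empty).__name__)
```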
Comparison operators

The <, <=, >, >=, ==, != operators compare the values of two objects and return True or False.

Comparison operators

Comparing examples
x, y = 5, 8
print("x < y is", x < y)

## x < y is True

print("x > y is", x > y)

## x > y is False

print("x == y is", x == y)

## x == y is False

print("x != y is", x != y)

## x != y is True

print("This is", "Name" == "Name", "and not", "Name" == "name")

## This is True and not False

2 < x and x < 10  # unchained expression

## True

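The unchained expression above has a chained counterpart, which Python evaluates as the same pair of comparisons joined by and. A short sketch:

```python
x = 5

unchained = 2 < x and x < 10   # two comparisons joined explicitly
chained = 2 < x < 10           # the same test written as a chain

out_of_range = 2 < x < 4       # False, because x < 4 does not hold

print(unchained, chained, out_of_range)
```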
Logical operators

There are three logical operators: not, and, or.

Op.       Description
not x     Returns True only if x is False
x and y   Returns True only if both x and y are True
x or y    Returns True if x or y or both are True

Logical operators examples
x, y = 5, 8
(x == 5) and (y == 9)

## False

(x == 5) or (y == 8)

## True

not (x == 4) or (y == 9)

## True

Exclusive or

In some situations, you need a logical operation that is True only when the operands differ (one is True, the other is False). This task can be solved by combining the logical operators not, and and or, or simply by using !=.

Exclusive or
x, y = 5, 8

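Since the example on this slide is incomplete, here is a minimal sketch of both approaches named above. The conditions `p` and `q` are made up for illustration:

```python
x, y = 5, 8

p = (x == 5)   # True
q = (y == 9)   # False

# Exclusive or spelled out with not, and, or.
xor_long = (p and not q) or (not p and q)

# For two boolean operands, != gives the same result.
xor_short = p != q

print(xor_long, xor_short)
```

Note that != only acts as an exclusive or when both operands are real booleans; for other values, convert them with bool() first.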
Binary numbers

How to convert binary numbers to integers (the unknown keywords and language structures will be introduced soon):

Binary to integer
def bintoint(binary):
    binary = binary[::-1]
    num = 0
    for i in range(len(binary)):
        num += int(binary[i]) * 2**i
    return num

bintoint("1101001")

## 105

int("1101001", 2)  # compare with built-in function

## 105

Binary numbers

How to convert integers to binary numbers:

Integers to binary
def inttobin(num):
    binary = ""
    if num != 0:
        while num >= 1:
            if num % 2 == 0:
                binary += "0"
                num = num // 2  # integer division keeps num an int
            else:
                binary += "1"
                num = (num - 1) // 2
    else:
        binary = "0"
    return binary[::-1]

inttobin(105)

## '1101001'

bin(105)[2:]  # compare with built-in function

## '1101001'

Bitwise operators

Python offers distinct bitwise operators. Some of them are redefined with an entirely different meaning by extensions, e.g., for vectorization.

## a: 101
## b: 111
## c: 101

print(c)

## 5

Bitwise operators

Bitwise operators
a, b = 5, 7
c = a | b  # bitwise or
## a: 101
## b: 111
## c: 111
print(c)
## 7

a = 13
b = a << 2  # bitwise shift
## a: 1101
## b: 110100

a, b = 35, 37
c = a ^ b  # bitwise exclusive or
## a: 100011
## b: 100101
## c: 000110

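The operators shown on these slides can be compared side by side. The following is a small sketch; the dictionary results and the chosen formatting are assumptions made for illustration only.

```python
a, b = 5, 7   # binary 101 and 111

# Collect the result of each bitwise operator under a readable label.
results = {
    "a & b": a & b,     # bitwise and
    "a | b": a | b,     # bitwise or
    "a ^ b": a ^ b,     # bitwise exclusive or
    "a << 2": a << 2,   # shift left by two bits
    "a >> 1": a >> 1,   # shift right by one bit
}

# Print each result as a decimal and as a zero-padded binary number.
for expr, value in results.items():
    print(f"{expr:7} = {value:3d} = {value:06b}")
```
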
Control flow: Conditional statements

Python has only one kind of conditional statement – if-elif-else:

Computer data sizes
bytes = 100000000 / 8  # e.g. DSL 100000
if bytes >= 1e9:
    print(f"{bytes/1e9:6.2f} GByte")
elif bytes >= 1e6:

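The example above is cut off in this copy. A complete if-elif-else chain could look like the following sketch. The function name format_size, the MByte, kByte and Byte branches, and the variable name size (chosen to avoid shadowing the built-in type bytes) are assumptions, not the original slide content.

```python
def format_size(size):
    """Return a human-readable string for a size given in bytes."""
    if size >= 1e9:
        return f"{size / 1e9:6.2f} GByte"
    elif size >= 1e6:
        return f"{size / 1e6:6.2f} MByte"   # assumed branch
    elif size >= 1e3:
        return f"{size / 1e3:6.2f} kByte"   # assumed branch
    else:
        return f"{size:6.0f} Byte"          # assumed fallback


print(format_size(100_000_000 / 8))   # 12.5 million bytes
```

Only the first condition that evaluates to True is executed; the else branch catches everything that did not match before.
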
Control flow: continue and break

Loops can skip iterations (continue):

Continue the loop
for x in ["a", "b", "c"]:
    a = x.upper()
    continue
    print(x)
print(a)
## C

Or a loop can be aborted instantly (break):

Breaking the habit
y = 0
for i in [7, 3, 4, "x", 6, 15]:
    if not isinstance(i, int):
        break
    y += i
print(f"The total sum is {y}.")
## The total sum is 14.

Have you already noticed the keyword else? Python only executes the
else branch of a loop if the loop was not terminated by break:

Favorite lottery number
import random

n = 0
favorite = 7
while n < 100:
    n += 1
    draw = random.randint(1, 49)  # e.g. German lottery
    if draw == favorite:
        print("Got my number! :)")
        break
else:
    print("My favorite did not show up! :(")
print(f"I tried {n} times!")
## Got my number! :)
## I tried 10 times!

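The else branch works the same way for for loops. Because the lottery example is random, here is a deterministic sketch; the function is_prime is a hypothetical helper chosen for illustration.

```python
def is_prime(n):
    """Return True if n is a prime number."""
    if n < 2:
        return False
    for d in range(2, int(n**0.5) + 1):
        if n % d == 0:
            break            # a divisor was found, leave the loop
    else:
        return True          # loop finished without break: no divisor
    return False             # reached only after break


primes = [n for n in range(50) if is_prime(n)]
print(primes)
```
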
Functions

Functions are defined using the keyword def. The structure of function
signature and body is specified by indentation, too:

Drawing lottery numbers
def draw_sample(n, first=1, last=49):
    numbers = list(range(first, last + 1))

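The function body is truncated in this copy. One way to complete it is sketched below. The use of random.sample and the sorted return value are assumptions and may differ from the original slide.

```python
import random


def draw_sample(n, first=1, last=49):
    """Draw n distinct numbers from first..last (both included)."""
    numbers = list(range(first, last + 1))
    # Assumed completion: draw without replacement and sort the result.
    return sorted(random.sample(numbers, n))


print(draw_sample(6))            # uses the default range 1..49
print(draw_sample(3, last=10))   # keyword argument overrides a default
```

Parameters with default values (first, last) can be omitted in the call or set by name.
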
Functions

Functions are callable objects (callable(f) returns True for a function
f), are defined as closures, and can be created and used like other
objects:

## [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]

Seems weird? We discuss namespaces in the next section.

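To illustrate that functions can be handled like other objects, here is a small sketch. All names (square, ops, squares) are hypothetical and not taken from the original example.

```python
def square(x):
    return x * x


f = square                                        # second name, same function object
ops = {"square": square, "negate": lambda x: -x}  # functions stored in a dict

is_callable = callable(f)
same_object = f is square
squares = [ops["square"](v) for v in range(4)]

print(is_callable, same_object, squares)
```
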
Section 1.3

Essential concepts

Python is object-oriented

There are three widely known programming paradigms: procedural,
functional and object-oriented programming (OOP). Python supports
them all.

You have learned how to handle predefined data types in Python.
Actually, we have already encountered classes and instances, take for
example dict().

In this section you will learn the basics of dealing with (your own)
classes:

1 References
2 Classes
3 Instances
4 Main principles
5 Garbage collection

OOP is a wide field and challenging for beginners. Don’t get discouraged
and, if you find gaps in your knowledge, read the literature.

References

When you assign a variable, a reference to an object is set:

Equal but not identical
a = ["Star", "Trek"]
b = ["Star", "Trek"]
c = a
a == b
a == c
a is b
a is c
## ['Star', 'Trek']
## ['Star', 'Trek']
## ['Star', 'Trek']
## True
## True
## False
## True

Two equal but not identical objects are created,
Variables a and c link to the same object.

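The practical consequence of shared references can be sketched as follows. The appended value and the use of id() are illustrative assumptions.

```python
a = ["Star", "Trek"]
b = ["Star", "Trek"]
c = a                      # c refers to the same list object as a

c.append("Discovery")      # modifies the object that a and c share

print(a)                   # ['Star', 'Trek', 'Discovery']
print(b)                   # ['Star', 'Trek']
print(id(a) == id(c))      # True: identical objects
print(id(a) == id(b))      # False: equal content at creation, different objects
```
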
Copying objects

When we introduced lists, we initially did not mention that they are a
prime example of mutable objects:

Referenced mutable objects might be modified,
Referenced immutable objects might be copied.

Copying objects

We are able to make an exact copy of the object:

Copying
def last_element(x):
    y = x.copy()
    return y.pop(-1)

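The effect of the copy becomes visible when the function is compared with a variant that works on the passed list directly. The name last_element_inplace and the sample data are hypothetical.

```python
def last_element_inplace(x):
    return x.pop(-1)       # removes the element from the caller's list


def last_element(x):
    y = x.copy()           # work on a copy, the caller's list stays intact
    return y.pop(-1)


data = [1, 2, 3]
print(last_element(data), data)           # 3 [1, 2, 3]
print(last_element_inplace(data), data)   # 3 [1, 2]
```
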
Deep and shallow copying

However, keep in mind that, in most cases, a method copy() will
create shallow copies, while only deep copying will also duplicate the
contents of a mutable object with a complex structure:

Cloning fast food
fastfood = [["burgers", "hot dogs"], ["pizza", "pasta"]]
italian = fastfood.copy()
italian.pop(0)
american = list(fastfood)
american.pop(1)
american[0] = american[0].copy()
fastfood[0][1] = "chicken wings"
fastfood[1][0] = "risotto"
italian
american
## [['risotto', 'pasta']]
## [['burgers', 'hot dogs']]

Both approaches, copy() and list(), create new list objects
containing new references to the original sub-lists. But for a deep copy,
you have to recursively create duplicates of all its objects.

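The standard library module copy provides deepcopy() for exactly this recursive duplication. A short sketch, reusing the fast food data from above (the variable names shallow and deep are assumptions):

```python
import copy

fastfood = [["burgers", "hot dogs"], ["pizza", "pasta"]]
shallow = copy.copy(fastfood)       # new outer list, shared sub-lists
deep = copy.deepcopy(fastfood)      # new outer list and new sub-lists

fastfood[0][1] = "chicken wings"

print(shallow[0])   # ['burgers', 'chicken wings']
print(deep[0])      # ['burgers', 'hot dogs']
```
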
Classes

In Python everything is an object and more complex objects consist of
several other objects.

Class definition

Specifically, we want to create a “rectangle object” and define a separate
Rectangle class for it:

Rectangle class
class Rectangle:
    width = 0
    height = 0

    def area(self):
        return self.width * self.height

myrectangle = Rectangle()
myrectangle.width = 10
myrectangle.height = 20
myrectangle.area()
## 200

New classes are defined using the keyword class,
The variable self always refers to the instance itself.

Class constructor

We add a constructor (method) __init__(), that is called to initialize
an object of Rectangle:

Rectangle class with constructor
class Rectangle:
    width = 0
    height = 0

    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height

myrectangle = Rectangle(15, 30)
myrectangle.area()
## 450

In our example, we use the constructor to set the attributes. Methods
with names matching __fun__() have a special, standardized meaning
in Python.

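Two further special methods are sketched below: __repr__() controls the printed representation and __eq__() defines the behavior of ==. This variant of the Rectangle class is an illustration under these assumptions and not part of the original slides.

```python
class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def __repr__(self):
        # Called by print() and by the interactive interpreter.
        return f"Rectangle({self.width}, {self.height})"

    def __eq__(self, other):
        # Called when two objects are compared with ==.
        if not isinstance(other, Rectangle):
            return NotImplemented
        return (self.width, self.height) == (other.width, other.height)


r = Rectangle(15, 30)
print(r)                        # Rectangle(15, 30)
print(r == Rectangle(15, 30))   # True
print(r == Rectangle(1, 2))     # False
```
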
Class inheritance

One of the most important concepts of OOP is inheritance. A class
inherits all attributes and methods of its parent class and can add new
or overwrite existing ones:

Square inherits Rectangle
class Square(Rectangle):
    def __init__(self, length):
        super().__init__(length, length)

    def diagonal(self):
        return (self.width**2 + self.height**2)**0.5

mysquare = Square(15)
print(f"Area: {mysquare.area()}")
print(f"Diagonal length: {mysquare.diagonal():7.4f}")
## Area: 225
## Diagonal length: 21.2132

The methods of the parent class, including the constructor, may be
referenced by super().

Garbage collection

You do not have to worry about memory management in Python. The
garbage collector will tidy up for you.

Namespaces

Reference names from the local namespace mask the same names in
an outer or in the global namespace:

Namespaces
def multiplier(x):
    x = 4 * x

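The example is truncated in this copy. The masking effect can be sketched as follows; the return statement and the global value "OH" are assumptions made to obtain a runnable example.

```python
x = "OH"                  # name in the global namespace


def multiplier(x):        # the parameter x is local and masks the global x
    x = 4 * x             # rebinds only the local name
    return x


print(multiplier(x))      # OHOHOHOH
print(x)                  # OH: the global name is unchanged
```
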
Namespaces

In fact, functions defined in Python are themselves objects that remember
and can access their own context where they were created. This
concept comes from functional programming and is called closure:

Closures
def gen_multiplier(a):
    def fun(x):
        return a * x
    return fun

multi1 = gen_multiplier(4)
multi2 = gen_multiplier(5)
multi1
multi1("EH")
multi2("EH")
## <function gen_multiplier.<locals>.fun at 0x7fe838606f28>
## EHEHEHEH
## EHEHEHEHEH

Managing code

In order to provide, maintain and extend modular functionality with
Python, its code-containing components can be described hierarchically:

Packages
Modules
Classes
Functions

dt.date.today()
dt.timedelta.days

date.today()
timedelta.days

datetime.now()

In the latter case, all classes and functions, but no instances, are
imported from the datetime namespace.

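The import statements that belong to the calls above are missing in this copy. The following sketch shows two common import styles that would make such calls work; the exact statements on the original slide are an assumption. A third style, from datetime import *, imports all public names of the module at once.

```python
# Style 1: import the module under an alias, access names via dot notation.
import datetime as dt

today_a = dt.date.today()
week = dt.timedelta(days=7)

# Style 2: import selected names directly into the current namespace.
from datetime import date, timedelta

today_b = date.today()

print(today_a + week)
print(today_b + timedelta(days=7))
```
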
Built-in modules

A Python installation ships with a standard library consisting of built-in
modules. These modules provide standardized solutions for many
problems that occur in everyday programming - “batteries included”.
For example, they provide access to system functionality such as file
management. The Python Docs give an overview of all built-in modules.

## 120

randint(10, 20)
## 18

Installing modules

Often you might want to use extended functionality. Python has a large
and active community of users who make their developments publicly
available under open source license terms. Packages are containers of
modules which can be imported and used within your Python code.

These third-party packages can be installed comfortably by using the
(command line) package manager pip. The Python Package Index
provides an overview of the thousands of packages available. Basic
commands for maintaining, for example, the installation of the package
“numpy”:

Uninstalling the package: pip uninstall numpy

Installing modules

Example: OpenCV is a package for image processing in Python. Here
you can see how the installation proceeds in a Unix terminal.

Writing modules

Your Python projects will become complex and you will need to maintain
the code properly. Therefore, one can break a large, unwieldy
programming task into separate, more manageable modules. Modules
can be written in Python itself or in C, but here we keep focusing on
the Python language.

Creating modules in Python is very straightforward: a Python module
is a file containing Python code, for example:

File: mymodule.py
s = "Hello world!"
l = [1, 2, 3, 5, 5]

def add_one(n):
    return n + 1

Working with modules

If you import the module mymodule, the interpreter looks in the
current working directory (among the other locations on its module
search path sys.path) for a file mymodule.py, reads and interprets
its contents and makes its namespace available:

Usage of own modules
import mymodule
mymodule.s
mymodule.l
mymodule.add_one(5)
## Hello world!
## [1, 2, 3, 5, 5]
## 6

Python packages

Large projects could require more than one module. Packages allow
you to structure the modules and their namespaces hierarchically by using
the dot notation. They are simple folders containing modules and
(sub-)packages. Consider the following structure:

mypackage/
    mymodule.py
    somemodule.py

The directory mypackage contains two modules which we can import
separately:

Usage of own package
import mypackage.mymodule
import mypackage.somemodule
mypackage.mymodule.add_one(4)
## 5

Package initialization

If a package directory contains a file __init__.py, its code is invoked
when the package gets imported. The directory mypackage now
contains the two modules and the initialization file:

mypackage/
    __init__.py
    mymodule.py
    somemodule.py

The file __init__.py can be empty but can also be used for package
initialization purposes.

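The following self-contained sketch builds a small package in a temporary directory and imports it, to show that __init__.py is executed on import. The package name demo_pkg and the variable greeting are hypothetical; in a real project you would create the files by hand.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "demo_pkg"                 # hypothetical package folder
    pkg.mkdir()
    (pkg / "__init__.py").write_text("greeting = 'initialized'\n")
    (pkg / "mymodule.py").write_text("def add_one(n):\n    return n + 1\n")

    sys.path.insert(0, tmp)                      # make the folder importable
    try:
        # Importing the submodule first imports the package itself,
        # which runs the code in __init__.py.
        mymodule = importlib.import_module("demo_pkg.mymodule")
        demo_pkg = sys.modules["demo_pkg"]
        result = mymodule.add_one(4)
        greeting = demo_pkg.greeting
    finally:
        sys.path.remove(tmp)

print(result)     # 5
print(greeting)   # initialized
```
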
The Zen of Python

The Zen of Python
import this
## The Zen of Python, by Tim Peters
##
##
## Beautiful is better than ugly.
## Explicit is better than implicit.
## Simple is better than complex.
## Complex is better than complicated.
## Flat is better than nested.
## Sparse is better than dense.
## Readability counts.
## Special cases aren't special enough to break the rules.
## Although practicality beats purity.
## Errors should never pass silently.
## Unless explicitly silenced.
## In the face of ambiguity, refuse the temptation to guess.
## ...

Further topics

A selection of exciting topics that are among the advanced basics but
are not covered in this lecture:

Dynamic language concepts, such as duck typing,
Further, complex type classes, such as ChainMap or OrderedDict,
Iterators and generators in detail,
Exception handling, raising exceptions, catching errors,
Debugging, introspection and annotations.

Chapter 2

Numerical programming

2.1 NumPy package

Section 2.1

Numerical programming

The NumPy package

Motivation

Element-wise addition
import numpy as np

vec1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
vec2 = np.array(vec1)
vec1 + vec1  # lists are concatenated
## [1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9]

vec2 + vec2  # arrays are added element-wise
## array([ 2,  4,  6,  8, 10, 12, 14, 16, 18])

for i in range(len(vec1)):
    vec1[i] += vec1[i]
vec1
## [2, 4, 6, 8, 10, 12, 14, 16, 18]

Motivation

Matrix multiplication
mat1 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
mat2 = np.array(mat1)
np.dot(mat2, mat2)
## array([[ 30,  36,  42],
##        [ 66,  81,  96],
##        [102, 126, 150]])

mat3 = np.zeros([3, 3])
for i in range(3):
    for k in range(3):
        for j in range(3):
            mat3[i][k] = mat3[i][k] + mat1[i][j] * mat1[j][k]
mat3
## array([[ 30.,  36.,  42.],
##        [ 66.,  81.,  96.],
##        [102., 126., 150.]])

Motivation

Time comparison
import time

mat1 = np.random.rand(50, 50)
mat2 = np.array(mat1)
t = time.perf_counter()  # high-resolution timer for short durations
mat3 = np.dot(mat2, mat2)
nptime = time.perf_counter() - t

mat3 = np.zeros([50, 50])
t = time.perf_counter()
for i in range(50):
    for k in range(50):
        for j in range(50):
            mat3[i][k] = mat3[i][k] + mat1[i][j] * mat1[j][k]
pytime = time.perf_counter() - t

times = str(pytime / nptime)
print("NumPy is " + times + " times faster!")
## NumPy is 17.29180230837526 times faster!

Section 2.2

Numerical programming

Creating NumPy arrays

np.array(list): Converts a Python list into a NumPy array.
array.ndim: Returns the number of dimensions of the array.
array.shape: Returns the shape of the array as a tuple.

Creation
arr1 = [4, 8, 2]
arr1 = np.array(arr1)
arr2 = np.array([24.3, 0., 8.9, 4.4, 1.65, 45])
arr3 = np.array([[4, 8, 5], [9, 3, 4], [1, 0, 6]])
arr1.ndim
## 1

arr3.shape
## (3, 3)

From now on, the name array refers to an np.array().

Array creation functions

np.arange(start, stop, step): Creates a vector of values from start
up to, but not including, stop with step width step.
np.zeros((rows, columns)): Creates an array with all values set to 0.
np.identity(n): Creates the n x n identity matrix.

Creation functions
np.zeros((4, 3))
## array([[0., 0., 0.],
##        [0., 0., 0.],
##        [0., 0., 0.],
##        [0., 0., 0.]])

np.arange(6)
## array([0, 1, 2, 3, 4, 5])

np.identity(3)
## array([[1., 0., 0.],
##        [0., 1., 0.],
##        [0., 0., 1.]])

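The example above calls np.arange() with a single argument only. A short sketch with explicit start, stop and step values (chosen here for illustration) shows that stop itself is not part of the result:

```python
import numpy as np

a = np.arange(2, 10, 3)      # start 2, step 3, stop 10 is excluded
b = np.arange(0, 1, 0.25)    # non-integer steps are possible as well

print(a)   # [2 5 8]
print(b)   # [0.   0.25 0.5  0.75]
```
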
Array creation functions

np.linspace(start, stop, n): Creates a vector of n evenly spaced
values from start to stop (both included).
np.full((rows, columns), k): Creates an array with all values set to k.

Array creation
np.linspace(0, 80, 5)
## array([ 0., 20., 40., 60., 80.])

np.full((5, 4), 7)
## array([[7, 7, 7, 7],
##        [7, 7, 7, 7],
##        [7, 7, 7, 7],
##        [7, 7, 7, 7],
##        [7, 7, 7, 7]])

Array creation functions

np.random.rand(rows, columns): Creates an array of random floats
between zero (included) and one (excluded).
np.random.randint(k, size=(rows, columns)): Creates an array of
random integers between 0 and k-1.

Array of random numbers
np.random.rand(3, 3)
## array([[0.01014591, 0.55955228, 0.48103055],
##        [0.30368877, 0.99078572, 0.61537046],
##        [0.83572553, 0.45976471, 0.63241975]])

np.random.randint(10, size=(5, 4))
## array([[7, 9, 7, 8],
##        [0, 6, 7, 5],
##        [7, 3, 4, 7],
##        [9, 4, 4, 8],
##        [8, 0, 6, 1]])

Copy arrays

Reference
arr3
## array([[4, 8, 5],
##        [9, 3, 4],
##        [1, 0, 6]])

call-by-reference
arr = arr3 binds arr to the existing arr3. They both refer to the
same object.

Copy array

array.copy(): Creates an independent copy of an array that shares no
data with the original (call-by-value).

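The difference between binding a second name and calling copy() can be sketched as follows. The names ref and cop and the value 100 are assumptions chosen for illustration.

```python
import numpy as np

arr3 = np.array([[4, 8, 5], [9, 3, 4], [1, 0, 6]])
ref = arr3            # second name for the same array object
cop = arr3.copy()     # independent array with its own data

ref[0, 0] = 100       # changes the object shared by arr3 and ref

print(arr3[0, 0])     # 100
print(cop[0, 0])      # 4
print(ref is arr3, cop is arr3)
```
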
Overview: Array creation functions

Function                    Description
array                       Converts input data (e. g., a list) into a NumPy array
arange(start, stop, step)   Creates an array of values from start up to, but not including, stop
ones                        Creates an array containing only ones
zeros                       Creates an array containing only zeros
empty                       Allocates memory without initializing the values
eye, identity               Creates an N x N identity matrix
linspace                    Creates an array of evenly spaced values
full                        Creates an array with all values set to one number
random.rand                 Creates an array of random floats
random.randint              Creates an array of random integers

Data types of arrays

array.dtype: Returns the data type of the array elements.
array.astype(np.type): Conducts a manual typecast and returns a new array.

Data types
arr1.dtype
## dtype('int64')

arr2.dtype
## dtype('float64')

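A manual typecast with astype() can be sketched as follows, reusing the values of arr2 from above. The target type np.int64 and the name ints are assumptions. Note that the cast from float to int drops the fractional part.

```python
import numpy as np

arr2 = np.array([24.3, 0., 8.9, 4.4, 1.65, 45])
ints = arr2.astype(np.int64)     # new array, fractional parts are truncated

print(ints)                      # [24  0  8  4  1 45]
print(arr2.dtype, ints.dtype)    # float64 int64
```

The original array arr2 keeps its data type, since astype() returns a new array.
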
Array operations

Element-wise operations
Calculation operators on NumPy arrays operate element-wise.

Element-wise operations
arr3
## array([[4, 8, 5],
##        [9, 3, 4],
##        [1, 0, 6]])

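The operations themselves are cut off in this copy. A short sketch of element-wise arithmetic on arr3 follows; the selection of operators is an assumption. In particular, * multiplies element by element, whereas @ computes the matrix product.

```python
import numpy as np

arr3 = np.array([[4, 8, 5], [9, 3, 4], [1, 0, 6]])

total = arr3 + arr3       # element-wise sum
product = arr3 * arr3     # element-wise product (each entry squared)
matprod = arr3 @ arr3     # matrix product, shown for comparison

print(total)
print(product)
print(matprod)
```
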
Integer indexing

array[index]: Selects the value at position index from the data.

Indexing with an integer
arr = np.arange(10)
arr
## array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

arr[4]
## 4

arr[-1]
## 9

Slicing

array[start : stop : step]: Selects a subset of the data.

Slicing in one dimension
arr = np.arange(10)
arr
## array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

arr[3:7]
## array([3, 4, 5, 6])

arr[1:]
## array([1, 2, 3, 4, 5, 6, 7, 8, 9])

Slicing

Slicing in one dimension with steps
arr[:7]
## array([0, 1, 2, 3, 4, 5, 6])

arr[-3:]
## array([7, 8, 9])

arr[::2]
## array([0, 2, 4, 6, 8])

arr[:5:-1]
## array([9, 8, 7, 6])

Slicing

Slicing in higher dimensions
In n-dimensional arrays the element at each index is an
(n − 1)-dimensional array.

Indexing rows
arr3
## array([[4, 8, 5],
##        [9, 3, 4],
##        [1, 0, 6]])

vec = arr3[1]
vec
## array([9, 3, 4])

arr3[-1]
## array([1, 0, 6])

Slicing

Slicing in two dimensions
arr3
## array([[4, 8, 5],
##        [9, 3, 4],
##        [1, 0, 6]])

arr3[0:2, 0:2]
## array([[4, 8],
##        [9, 3]])

arr3[2:, :]
## array([[1, 0, 6]])

Views on arrays 108
Essential
concepts
So far, selecting by index numbers or slicing belongs to basic indexing
in NumPy. With basic indexing you get NO COPY of your data but a
so-called view on the existing data set, i. e., a different perspective.
A view on an array can be seen as a reference to a rectangular memory
area of its values. The view is intended to
edit a rectangular part of a matrix, e. g., a sub-matrix, a column, or a single value,
change the shape of the matrix or the arrangement of its elements, e. g., transpose or reshape a matrix,
change the visual representation of values, e. g., to cast a float array into an int array,
map the values in other program areas.
The crucial point here is that, for efficiency reasons, data arrays in
working memory do not have to be copied again and again for simple
index operations, which would require excessive additional writes to
the computer memory.
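The view behaviour can be checked directly. The following is a minimal sketch, assuming a small example matrix named data (a hypothetical name that does not appear on the slides). np.shares_memory reports whether two arrays use the same memory.

```python
import numpy as np

data = np.arange(12).reshape((3, 4))  # hypothetical example matrix

sub = data[0:2, 1:3]                  # basic indexing (slices): a view, nothing is copied
shares = np.shares_memory(data, sub)  # True: both arrays use the same memory

sub[0, 0] = -1                        # writing through the view ...
changed = data[0, 1]                  # ... changes the original array as well

independent = data[0:2, 1:3].copy()   # an explicit copy owns its own data
independent[0, 0] = 99                # does not affect data
```

If independent data is needed, calling copy() on the slice is the usual way to obtain it.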
© 2019 PyEcon.org
Creating views implicitly 109
Essential
concepts
Getting started A view is created automatically when you do basic indexing such as
Procedural
programming slicing:
Object-orientation
Numerical
programming
Create a view by slicing
NumPy package
column = arr3[:, 1]
column
Array basics
Linear algebra
© 2019 PyEcon.org
Creating views implicitly 110
Essential
concepts
Getting started
Procedural Create a view by slicing
programming
Object-orientation elem = column[1:2]
Numerical elem.base
programming
NumPy package
Array basics
## array([[ 4, 8, 5],
Linear algebra ## [ 9, 100, 4],
Data formats and
## [ 1, 0, 6]])
handling
Pandas package elem[0] = 3
Series
arr3
DataFrame
Import/Export data
## array([[4, 8, 5],
Visual
illustrations ## [9, 3, 4],
Matplotlib package ## [1, 0, 6]])
Figures and subplots
Plot types and styles
Pandas layers
Applications The middle column is a view of the base array referenced by arr3,
Time series
Moving window Any changes to the values of a view directly affect the base data,
Financial applications
A view of a view is another view on the same base matrix.
© 2019 PyEcon.org
Obtaining views explicitly 111
Essential
concepts
Getting started In addition, an array contains methods and attributes that return a
Procedural
programming view of its data:
Object-orientation
## False
© 2019 PyEcon.org
Obtaining views explicitly 112
Essential
concepts
Getting started
Procedural Obtain a view
programming
Object-orientation arr3_v = arr3.view()
Numerical arr3_v.flags.owndata
programming
NumPy package ## False
Array basics
Linear algebra
Applications
Time series
Moving window
Financial applications
© 2019 PyEcon.org
Fancy indexing 113
Essential
concepts
Getting started The behavior described above changes with advanced indexing, i. e., if
Procedural
programming at least one component of the index tuple is not a scalar index number
Object-orientation
or slice. The case of fancy indexing is described below:
Numerical
programming
NumPy package Advanced and basic indexing
Array basics
Linear algebra arr3
Data formats and
handling ## array([[4, 8, 5],
Pandas package ## [9, 3, 4],
Series
DataFrame
## [1, 0, 6]])
Import/Export data
Visual
arr = arr3[[0, 2], [0, 2]]
illustrations arr
Matplotlib package
Figures and subplots
## array([4, 6])
Plot types and styles
Pandas layers
arr.base
Applications
Time series
Moving window
Financial applications
© 2019 PyEcon.org
Fancy indexing 114
Essential
concepts
Getting started
Procedural
Advanced and basic indexing
programming
Object-orientation arr = arr3[0:3:2, 0:3:2]
Numerical arr
programming
NumPy package
## array([[4, 5],
Array basics
Linear algebra
## [1, 6]])
Data formats and
handling arr.base
Pandas package
Series ## array([[4, 8, 5],
DataFrame
Import/Export data
## [9, 3, 4],
## [1, 0, 6]])
Visual
illustrations
Matplotlib package
Figures and subplots
Plot types and styles
Contrary to intuition, fancy indexing does not return a (2 × 2)-matrix,
but a vector of the matrix elements (0, 0) and (2, 2). This is a
complete copy, i. e., a new object and not a view of the original matrix.
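A short sketch of the copy behaviour of fancy indexing, using the matrix arr3 from the slides. The call to np.ix_ is an addition that is not part of the slides: it is one way to obtain the (2 × 2) sub-matrix from two index lists.

```python
import numpy as np

arr3 = np.array([[4, 8, 5], [9, 3, 4], [1, 0, 6]])

picked = arr3[[0, 2], [0, 2]]         # fancy indexing: elements (0, 0) and (2, 2)
picked[0] = -1                        # modifies the copy only
untouched = arr3[0, 0]                # the original still holds 4

block = arr3[np.ix_([0, 2], [0, 2])]  # cross product of the index lists: a 2 x 2 matrix
is_copy = not np.shares_memory(arr3, block)
```

Both results are copies, so later changes to them do not reach arr3.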
Financial applications
Applications
Time series
Moving window
Boolean arrays
Financial applications
Logical operations on NumPy arrays work element-wise, in a similar way
to bitwise operators.
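As a small illustration (the arrays x, y and values are hypothetical and not taken from the slides), the operators &, |, ~ and ^ act element by element on boolean arrays:

```python
import numpy as np

x = np.array([True, True, False, False])
y = np.array([True, False, True, False])

both = x & y        # element-wise AND
either = x | y      # element-wise OR
negated = ~x        # element-wise NOT
exclusive = x ^ y   # element-wise XOR

# Comparisons need parentheses, because & and | bind more tightly than < or >
values = np.array([1, 4, 7, 10])
in_range = (values > 2) & (values < 9)
```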
© 2019 PyEcon.org
Indexing with boolean arrays 117
Essential
concepts
Boolean arrays can be used to select elements of other NumPy arrays.
If x is an array and y is a boolean array of the same dimension, then
x[y] selects all the elements of x for which the corresponding value (at
the same position) of y is True.
NumPy package
Array basics
Linear algebra
Indexing with boolean arrays
Data formats and arr3
handling
## array([[4, 8, 5],
Pandas package
Series
DataFrame ## [9, 3, 4],
Import/Export data ## [1, 0, 6]])
Visual
illustrations y = arr3 % 2 == 0
y
Matplotlib package
Figures and subplots
Plot types and styles
Pandas layers ## array([[ True, True, False],
Applications ## [False, False, True],
Time series ## [False, True, True]])
Moving window
Financial applications
arr3[y]
## array([4, 8, 4, 0, 6])
© 2019 PyEcon.org
Conditional indexing 118
Essential
concepts
Conditional indexing allows you to use boolean arrays to select subsets
of values and to avoid loops. When a comparison operator is applied to
an array, every element of the array is tested against the logical
condition. Consider an application setting all even numbers to 5:
NumPy package
Array basics
Linear algebra
Find and replace values in arrays
Data formats and a, b = arr3.copy(), arr3.copy()
handling
for i in range(a.shape[0]):
Pandas package
Series
for j in range(a.shape[1]):
DataFrame if a[i, j] % 2 == 0:
Import/Export data a[i, j] = 5
Visual
illustrations
Matplotlib package
b[b % 2 == 0] = 5
Figures and subplots b
Plot types and styles
Pandas layers ## array([[5, 5, 5],
Applications ## [9, 3, 5],
Time series
## [1, 5, 5]])
Moving window
Financial applications
np.allclose(a, b)
## True
© 2019 PyEcon.org
Conditional indexing 119
Essential
concepts
Getting started
Procedural Find and replace values in arrays, condition: equal
programming
Object-orientation arr3
Numerical
programming ## array([[4, 8, 5],
NumPy package
## [9, 3, 4],
Array basics
Linear algebra
## [1, 0, 6]])
Data formats and
handling arr = arr3.copy()
Pandas package arr[arr == 4] = 100
Series
arr
DataFrame
Import/Export data
## array([[100, 8, 5],
Visual
illustrations ## [ 9, 3, 100],
Matplotlib package ## [ 1, 0, 6]])
Figures and subplots
Plot types and styles
Pandas layers
Keep in mind that, in this case only, the results are not arrays but values!
Best practice: Indexing arrays 121
Essential
concepts
Getting started Step 1b
Procedural
Integer indexing array[row index]: In n-dimensional arrays, the element
at each index is an (n − 1)-dimensional array.
Numerical
programming
NumPy package Best practice Step 1b
Array basics
Linear algebra mat = np.arange(12).reshape((3, 4))
Data formats and mat
handling
Pandas package
## array([[ 0, 1, 2, 3],
Series
DataFrame
## [ 4, 5, 6, 7],
Import/Export data ## [ 8, 9, 10, 11]])
Visual
illustrations mat[2]
Matplotlib package
Figures and subplots
Plot types and styles
## array([ 8, 9, 10, 11])
Pandas layers
mat[0]
Applications
Time series
Moving window ## array([0, 1, 2, 3])
Financial applications
By specifying the row index only, we create arrays which are views.
© 2019 PyEcon.org
Best practice: Indexing arrays 122
Essential
concepts
Getting started Step 2a
Procedural
programming Slicing array[start : stop : step]: Slicing can be used separately
Object-orientation
for rows and columns.
Numerical
programming
NumPy package
Best practice Step 2a
Array basics
Linear algebra
mat = np.arange(12).reshape((3, 4))
Data formats and
mat
handling
Pandas package ## array([[ 0, 1, 2, 3],
Series
## [ 4, 5, 6, 7],
DataFrame
Import/Export data
## [ 8, 9, 10, 11]])
Visual
illustrations mat[0:2]
Matplotlib package
Figures and subplots ## array([[0, 1, 2, 3],
Plot types and styles
## [4, 5, 6, 7]])
Pandas layers
Applications
mat[0:2, ::2]
Time series
Moving window
Financial applications
## array([[0, 2],
## [4, 6]])
© 2019 PyEcon.org
Best practice: Indexing arrays 123
Essential
concepts
Getting started Step 2b
Procedural
programming A frequent task is to get a specific row or column of an array. This can
Object-orientation
be done easily by slicing.
Numerical
programming
NumPy package Best practice Step 2b
Array basics
Linear algebra mat
Data formats and
handling ## array([[ 0, 1, 2, 3],
Pandas package ## [ 4, 5, 6, 7],
Series
DataFrame
## [ 8, 9, 10, 11]])
Import/Export data
row = mat[1] # get second row
Visual
illustrations column = mat[:, 2] # get third column
Matplotlib package row
Figures and subplots
## array([4, 5, 6, 7])
Plot types and styles
Pandas layers
Applications
Time series
column
Moving window
Financial applications ## array([ 2, 6, 10])
Slicing with [:] means to take every element from the first to the last.
© 2019 PyEcon.org
Best practice: Indexing arrays 124
Essential
concepts
Getting started Step 3
Procedural
Fancy indexing array[rows list, columns list]: Returns a one-dimensional
array with the values at the index tuples specified elementwise
by the index lists.
NumPy package
Array basics
Linear algebra
Best practice Step 3
Data formats and mat = np.arange(12).reshape((3, 4))
handling
Pandas package
mat
Series
DataFrame ## array([[ 0, 1, 2, 3],
Import/Export data
## [ 4, 5, 6, 7],
Visual ## [ 8, 9, 10, 11]])
illustrations
Applications
## [111, 111, 111, 111],
Time series
## [111, 111, 111, 111]])
Moving window
Financial applications
© 2019 PyEcon.org
Best practice: Indexing arrays 126
Essential
concepts
Getting started Step 5
Procedural
Replacing values in arrays. When assigning new values to a slice of an
array, the shape of the slice must be taken into account.
Numerical
programming
NumPy package Best practice Step 5
Array basics
Linear algebra mat[0] = np.array([3, 2, 1]) # Fails because the shapes do not fit
Data formats and
handling
## Error: could not broadcast array from shape (3) into shape (4)
Pandas package
Series
mat[2, 3] = 100
DataFrame mat[:, 0] = np.array([3, 3, 3])
Import/Export data mat
Visual
illustrations ## array([[ 3, 111, 111, 111],
Matplotlib package
## [ 3, 111, 111, 111],
Figures and subplots
Plot types and styles ## [ 3, 111, 111, 100]])
Pandas layers
© 2019 PyEcon.org
Adding and removing elements of arrays 128
Essential
concepts
np.append(array, value): Appends value to the end of array.
np.insert(array, index, value): Inserts value before index.
np.delete(array, index, axis): Deletes the row or column at index.
Each of these functions returns a new array.
Numerical
programming
NumPy package Naming
Array basics
Linear algebra a = np.arange(5)
Data formats and a = np.append(a, 8)
handling a = np.insert(a, 3, 77)
Pandas package
Series
print(a)
DataFrame
Import/Export data ## [ 0 1 2 77 3 4 8]
Visual
illustrations a.resize((3, 3))
Matplotlib package
np.delete(a, 1, axis=0)
Figures and subplots
Plot types and styles
Pandas layers ## array([[0, 1, 2],
Applications
## [8, 0, 0]])
Time series
Moving window
Financial applications
© 2019 PyEcon.org
Combining and splitting 129
Essential
concepts
np.concatenate((arr1, arr2), axis): Joins a sequence of arrays
along an existing axis.
np.split(array, n): Splits an array into multiple sub-arrays.
np.hsplit(array, n): Splits an array into multiple sub-arrays horizontally.
Linear algebra
Visual
illustrations
## [ 8, 0, 0],
Matplotlib package ## [ 0, 1, 2],
Figures and subplots ## [ 3, 4, 5]])
Plot types and styles
np.split(np.arange(8), 4)
Pandas layers
Applications
Time series
Moving window
## [array([0, 1]), array([2, 3]), array([4, 5]), array([6, 7])]
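The three functions can be sketched together. The inputs top and bottom are hypothetical arrays chosen for illustration, not the ones used on the slide.

```python
import numpy as np

top = np.arange(6).reshape((2, 3))         # hypothetical inputs
bottom = np.arange(6, 12).reshape((2, 3))

stacked = np.concatenate((top, bottom), axis=0)  # shape (4, 3): rows appended
side = np.concatenate((top, bottom), axis=1)     # shape (2, 6): columns appended

upper, lower = np.split(stacked, 2)        # two equal parts along axis 0
left, right = np.hsplit(side, 2)           # two equal parts along axis 1 (columns)
```

Splitting reverses the concatenation here: upper and left equal top, lower and right equal bottom.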
Financial applications
© 2019 PyEcon.org
Transposing array 130
Essential
concepts
Getting started array.T: Returns the transposed array (as a view).
Procedural
programming
Object-orientation Transpose
Numerical
programming arr3
NumPy package
Array basics ## array([[4, 8, 5],
Linear algebra
## [9, 3, 4],
Data formats and ## [1, 0, 6]])
handling
Pandas package
Series arr3.T
DataFrame
Import/Export data ## array([[4, 9, 1],
Visual ## [8, 3, 0],
illustrations
## [5, 4, 6]])
Matplotlib package
Figures and subplots
Plot types and styles np.eye(3).T
Pandas layers
© 2019 PyEcon.org
Matrix multiplication 131
Essential
concepts
Getting started np.dot(arr1, arr2): Conducts a matrix multiplication of arr1 and
Procedural
programming arr2. The @ operator can be used instead of the np.dot() function.
Object-orientation
Numerical
programming
Matrix multiplication
NumPy package
res = np.dot(arr3, np.arange(18).reshape((3, 6)))
Array basics
Linear algebra
res
Data formats and
handling ## array([[108, 125, 142, 159, 176, 193],
Pandas package ## [ 66, 82, 98, 114, 130, 146],
Series
## [ 72, 79, 86, 93, 100, 107]])
DataFrame
Import/Export data
res2 = arr3 @ np.arange(18).reshape((3, 6))
Visual
illustrations res2
Matplotlib package
Figures and subplots ## array([[108, 125, 142, 159, 176, 193],
Plot types and styles
## [ 66, 82, 98, 114, 130, 146],
Pandas layers
## [ 72, 79, 86, 93, 100, 107]])
Applications
np.allclose(res, res2)
Time series
Moving window
Financial applications
## True
© 2019 PyEcon.org
Array functions 132
Essential
concepts
Getting started
Procedural Element-wise functions
programming
Object-orientation arr3
Numerical
programming ## array([[4, 8, 5],
NumPy package ## [9, 3, 4],
Array basics
## [1, 0, 6]])
Linear algebra
© 2019 PyEcon.org
Overview: Element-wise array functions 133
Essential
concepts
Getting started
Procedural
Function                  Description
abs                       Absolute value of integer and floating point numbers
sqrt                      Square root
exp                       Exponential function
log, log10, log2          Natural logarithm, log base 10, log base 2
sign                      Sign (1: positive, 0: zero, -1: negative)
ceil                      Round up to integer
floor                     Round down to integer
rint                      Round to nearest integer
modf                      Returns fractional and integral parts
sin, cos, tan, sinh, cosh, tanh, arcsin, ...   Trigonometric and hyperbolic functions
Plot types and styles
Pandas layers
Applications
Time series
Moving window
Financial applications
© 2019 PyEcon.org
Binary functions 134
Essential
concepts
Getting started
Procedural Binary
programming
Object-orientation x = np.array([3, -6, 8, 4, 3, 5])
Numerical y = np.array([3, 5, 7, 3, 5, 9])
programming
np.maximum(x, y)
NumPy package
Array basics
Linear algebra ## array([3, 5, 8, 4, 5, 9])
Data formats and
handling np.greater_equal(x, y)
Pandas package
Series ## array([ True, False, True, True, False, False])
DataFrame
np.add(x, y)
Import/Export data
Visual
illustrations
Matplotlib package
## array([ 6, -1, 15, 7, 8, 14])
Figures and subplots
Plot types and styles np.mod(x, y)
Pandas layers
© 2019 PyEcon.org
Overview: Binary functions 135
Essential
concepts
Getting started
Procedural
Function              Description
add                   Add elements of arrays
subtract              Subtract elements in the second from the first array
multiply              Multiply elements
divide                Divide elements
power                 Raise elements in first array to powers in second
maximum               Element-wise maximum
minimum               Element-wise minimum
mod                   Element-wise modulus
greater, less, equal  Element-wise comparison, returns a boolean array
Figures and subplots
Plot types and styles
Pandas layers
Applications
Time series
Moving window
Financial applications
© 2019 PyEcon.org
Data processing 136
Essential
concepts
Getting started np.meshgrid(array1, array2): Returns coordinate matrices from
Procedural
programming coordinate arrays.
Object-orientation
Evaluate the function $f(x, y) = \sqrt{x^2 + y^2}$ on a 10 x 10 grid
NumPy package
Array basics p = np.arange(-5, 5, 0.01)
Linear algebra x, y = np.meshgrid(p, p)
Data formats and x
handling
Pandas package
Series
## array([[-5. , -4.99, -4.98, ..., 4.97, 4.98, 4.99],
DataFrame ## [-5. , -4.99, -4.98, ..., 4.97, 4.98, 4.99],
Import/Export data ## [-5. , -4.99, -4.98, ..., 4.97, 4.98, 4.99],
Visual ## ...,
illustrations
## [-5. , -4.99, -4.98, ..., 4.97, 4.98, 4.99],
## [-5. , -4.99, -4.98, ..., 4.97, 4.98, 4.99],
Matplotlib package
Figures and subplots
Plot types and styles ## [-5. , -4.99, -4.98, ..., 4.97, 4.98, 4.99]])
Pandas layers
Applications
Time series
Moving window
Financial applications
© 2019 PyEcon.org
Data processing 137
Essential
concepts
Evaluate the function $f(x, y) = \sqrt{x^2 + y^2}$ on a 10 x 10 grid.
Object-orientation
import matplotlib.pyplot as plt
Numerical
programming
val = np.sqrt(x**2 + y**2)
NumPy package plt.figure(figsize=(2, 2))
Array basics plt.imshow(val, cmap="hot")
Linear algebra
plt.colorbar()
Data formats and
handling
## <matplotlib.colorbar.Colorbar object at 0x7fe8375f8160>
Pandas package
Series
DataFrame
Import/Export data
Visual
illustrations
Matplotlib package
Figures and subplots
Plot types and styles
Pandas layers
Applications
Time series
Moving window
Financial applications
© 2019 PyEcon.org
Data processing 138
Essential
concepts
Evaluate the function $f(x, y) = \sqrt{x^2 + y^2}$ on a 10 x 10 grid.
plt.show()
[Figure: heat map of val over the grid, drawn with the "hot" colormap and a color bar]
© 2019 PyEcon.org
Conditional logic 139
Essential
concepts
Getting started np.where(condition, a, b): If condition is True, returns value a,
Procedural
programming otherwise returns b.
Object-orientation
Visual
illustrations
res = np.where(a <= b, b, a)
Matplotlib package
res
Figures and subplots
Plot types and styles ## array([4, 9, 8, 3, 9, 3])
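A self-contained sketch of np.where. The arrays a and b below are hypothetical stand-ins chosen for illustration. The selection is made element by element, and scalars are broadcast.

```python
import numpy as np

a = np.array([1, 9, 3, 7])   # hypothetical inputs
b = np.array([4, 2, 8, 7])

larger = np.where(a >= b, a, b)   # take a where the condition holds, otherwise b
capped = np.where(a > 5, 5, a)    # scalar 5 is broadcast: cap all values at 5
```

In this particular case larger gives the same result as np.maximum(a, b).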
Pandas layers
Applications
Time series
Moving window
Financial applications
© 2019 PyEcon.org
Conditional logic 140
Essential
concepts
Getting started
Procedural Conditional logic, examples
programming
Object-orientation arr3
Numerical
programming ## array([[4, 8, 5],
NumPy package
## [9, 3, 4],
Array basics
Linear algebra
## [1, 0, 6]])
Data formats and
handling res = np.where(arr3 < 5, 0, arr3)
Pandas package res
Series
DataFrame
## array([[0, 8, 5],
Import/Export data
## [9, 0, 0],
Visual
illustrations ## [0, 0, 6]])
Matplotlib package
Figures and subplots even = np.where(arr3 % 2 == 0, arr3, arr3 + 1)
Plot types and styles
even
Pandas layers
Applications
## array([[ 4, 8, 6],
## [10, 4, 4],
Time series
Moving window
Financial applications ## [ 2, 0, 6]])
© 2019 PyEcon.org
Statistical methods 141
Essential
concepts
Getting started array.mean(): Computes the mean of all array elements.
Procedural
programming array.sum(): Computes the sum of all array elements.
Object-orientation
Numerical
programming
Statistical methods
NumPy package arr3
Array basics
Linear algebra
## array([[4, 8, 5],
Data formats and ## [9, 3, 4],
handling
Pandas package
## [1, 0, 6]])
Series
DataFrame arr3.mean()
Import/Export data
Visual ## 4.444444444444445
illustrations
Matplotlib package
Figures and subplots
arr3.sum()
Plot types and styles
Pandas layers ## 40
Applications
Time series arr3.argmin()
Moving window
Financial applications ## 7
© 2019 PyEcon.org
Overview: Statistical methods 142
Essential
concepts
Getting started
Procedural
Method          Description
sum             Sum of all array elements
mean            Mean of all array elements
std, var        Standard deviation, variance
min, max        Minimum and maximum value in array
argmin, argmax  Indices of minimum and maximum value
Pandas package
Series
DataFrame
Import/Export data
Visual
illustrations
Matplotlib package
Figures and subplots
Plot types and styles
Pandas layers
Applications
Time series
Moving window
Financial applications
© 2019 PyEcon.org
Axis 143
Essential
concepts
Axes are defined for arrays with more than one dimension. A two-dimensional
array has two axes. The first one runs vertically downwards across the
rows (axis=0), the second one runs horizontally across the columns (axis=1).
NumPy package
Array basics
Linear algebra
Axis
Data formats and arr3
handling
Pandas package
## array([[4, 8, 5],
Series
DataFrame
## [9, 3, 4],
Import/Export data ## [1, 0, 6]])
Visual
illustrations arr3.sum(axis=0)
Matplotlib package
Figures and subplots
Plot types and styles
## array([14, 11, 15])
Pandas layers
arr3.sum(axis=1)
Applications
Time series
Moving window ## array([17, 16, 7])
Financial applications
© 2019 PyEcon.org
Sorting 144
Essential
concepts
array.sort(axis): Sorts the array in place along the given axis.
Procedural
programming
Object-orientation Sorting one-dimensional arrays
Numerical
programming arr2
NumPy package
Array basics ## array([24.3 , 0. , 8.9 , 4.4 , 1.65, 45. ])
Linear algebra
Visual
illustrations
Matplotlib package
Figures and subplots
Plot types and styles
Pandas layers
Applications
Time series
Moving window
Financial applications
© 2019 PyEcon.org
Sorting 145
Essential
concepts
Getting started
Procedural
Sorting two-dimensional arrays
programming
Object-orientation
arr3
Numerical
programming ## array([[4, 8, 5],
NumPy package ## [9, 3, 4],
Array basics
## [1, 0, 6]])
Linear algebra
The default axis using sort() is -1, which means to sort along the
last axis (in this case axis 1).
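A minimal sketch of sorting along different axes, using the matrix arr3 from the slides. np.sort is used alongside the sort() method here because it returns a sorted copy and leaves the input untouched, whereas the method sorts in place.

```python
import numpy as np

arr3 = np.array([[4, 8, 5], [9, 3, 4], [1, 0, 6]])

by_row = np.sort(arr3)           # default axis=-1: sorts within each row, returns a copy
by_col = np.sort(arr3, axis=0)   # sorts within each column

work = arr3.copy()
result = work.sort(axis=1)       # the method sorts work in place and returns None
```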
Section 2.3 146
Essential
concepts
Getting started
Procedural
programming
Object-orientation
Numerical programming
Numerical
programming
NumPy package
Array basics
Linear algebra
Visual
illustrations
Matplotlib package
Figures and subplots
Plot types and styles
Pandas layers
Applications
Time series
Moving window
Financial applications
© 2019 PyEcon.org
Inverse matrix 147
Essential
concepts
Getting started
Procedural Import numpy.linalg
programming
Object-orientation import numpy.linalg as nplin
Numerical
programming
NumPy package nplin.inv(array): Computes the inverse matrix.
Array basics
Linear algebra
np.allclose(array1, array2): Returns True if two arrays are
element-wise equal within a tolerance.
handling
Pandas package
Series
Inverse
DataFrame inv = nplin.inv(arr3)
Import/Export data
inv
Visual
illustrations
Matplotlib package
## array([[ 4., -21., 16.],
Figures and subplots ## [ -5., 24., -18.],
Plot types and styles ## [ 1., -4., 3.]])
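An inverse can be checked with np.allclose: a matrix times its inverse should give the identity up to rounding. The matrix M below is a hypothetical example, not the arr3 used on the slide.

```python
import numpy as np
import numpy.linalg as nplin

M = np.array([[2.0, 1.0], [5.0, 3.0]])   # hypothetical invertible matrix (determinant 1)
M_inv = nplin.inv(M)

# M times its inverse equals the identity, up to floating-point rounding
is_identity = np.allclose(M @ M_inv, np.eye(2))
```

An exact comparison with == may fail because of rounding, which is why a tolerance-based check is used.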
Pandas layers
© 2019 PyEcon.org
Matrix functions 148
Essential
concepts
Getting started nplin.det(array): Computes the determinant.
Procedural
programming np.trace(array): Computes the trace.
Object-orientation
np.diag(array): Returns the diagonal elements as an array.
Numerical
programming
NumPy package Linear algebra functions
Array basics
Linear algebra nplin.det(arr3)
Data formats and
handling ## -1.0
Pandas package
Series
DataFrame
np.trace(arr3)
Import/Export data
## 13
Visual
illustrations
Matplotlib package np.diag(arr3)
Figures and subplots
Plot types and styles
## array([0, 4, 9])
Pandas layers
Applications
Time series
Moving window
Financial applications
© 2019 PyEcon.org
Eigenvalues and eigenvectors 149
Essential
concepts
nplin.eig(array): Returns the array of eigenvalues and the array of
eigenvectors (one eigenvector per column) as a tuple.
Object-orientation
Numerical
programming
Get eigenvalues and eigenvectors
NumPy package
A = np.array([[3, -1, 0], [2, 0, 0], [-2, 2, -1]])
eigenval, eigenvec = nplin.eig(A)
Array basics
Linear algebra
Visual
illustrations
## array([[ 0. , -0.40824829, -0.70710678],
Matplotlib package ## [ 0. , -0.81649658, -0.70710678],
Figures and subplots ## [ 1. , -0.40824829, 0. ]])
Plot types and styles
Pandas layers
Applications
Time series
Moving window
Financial applications
© 2019 PyEcon.org
Eigenvalues and eigenvectors 150
Essential
concepts
Getting started
Procedural Check eigenvalues and eigenvectors
programming
Object-orientation eigenval * eigenvec
Numerical
programming ## array([[-0. , -0.40824829, -1.41421356],
NumPy package
## [-0. , -0.81649658, -1.41421356],
Array basics
Linear algebra
## [-1. , -0.40824829, 0. ]])
Data formats and
handling np.dot(A, eigenvec)
Pandas package
Series ## array([[ 0. , -0.40824829, -1.41421356],
DataFrame
## [ 0. , -0.81649658, -1.41421356],
Import/Export data
## [-1. , -0.40824829, 0. ]])
Visual
illustrations
Matplotlib package
Figures and subplots
Plot types and styles
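$$
\begin{pmatrix} 3 & -1 & 0 \\ 2 & 0 & 0 \\ -2 & 2 & -1 \end{pmatrix}
\cdot \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}
= (-1) \cdot \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ -1 \end{pmatrix}
$$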
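The same check can be done for all eigenpairs at once. This is a sketch using the matrix A from the slides. Column i of eigenvec belongs to eigenval[i], so multiplying eigenvec by eigenval scales each column by its eigenvalue.

```python
import numpy as np
import numpy.linalg as nplin

A = np.array([[3, -1, 0], [2, 0, 0], [-2, 2, -1]])
eigenval, eigenvec = nplin.eig(A)

lhs = A @ eigenvec            # A applied to every eigenvector (column)
rhs = eigenvec * eigenval     # broadcasting: column i is scaled by eigenval[i]
holds = np.allclose(lhs, rhs)
```

The order in which the eigenvalues are returned is not guaranteed, so the comparison is made column by column and not against fixed numbers.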
Financial applications
© 2019 PyEcon.org
QR decomposition 151
Essential
concepts
nplin.qr(array): Conducts a QR decomposition and returns Q and
R as arrays.
Object-orientation
Numerical QR decomposition
programming
NumPy package Q, R = nplin.qr(arr3)
Array basics
Q
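The defining properties of the decomposition can be verified numerically. The sketch below assumes the (3 × 3) matrix arr3 as it appears in the array-basics slides; any square matrix would serve equally well.

```python
import numpy as np
import numpy.linalg as nplin

arr3 = np.array([[4, 8, 5], [9, 3, 4], [1, 0, 6]])  # assumed input matrix
Q, R = nplin.qr(arr3)

reconstructs = np.allclose(Q @ R, arr3)        # Q times R gives back the input
orthogonal = np.allclose(Q.T @ Q, np.eye(3))   # the columns of Q are orthonormal
upper = np.allclose(R, np.triu(R))             # R is upper triangular
```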
Linear algebra
© 2019 PyEcon.org
Linearsystem 152
Essential
concepts
nplin.solve(A, b): Returns the solution of the linear system Ax = b.
Solve linear systems
Numerical
programming b = np.array([7, 4, 8])
NumPy package x = nplin.solve(A, b)
Array basics
x
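A solution can be verified by substituting it back into the system. The sketch assumes that A is the matrix defined on the eigenvalue slides.

```python
import numpy as np
import numpy.linalg as nplin

A = np.array([[3, -1, 0], [2, 0, 0], [-2, 2, -1]])  # assumed: A from the eigenvalue example
b = np.array([7, 4, 8])

x = nplin.solve(A, b)
solves = np.allclose(A @ x, b)   # substituting x back reproduces b
```

For a single right-hand side, solve is generally preferable to computing the inverse and multiplying by it.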
Linear algebra
© 2019 PyEcon.org
Overview: Linear algebra 153
Essential
concepts
Getting started
Procedural
Function      Description
np.dot        Matrix multiplication
np.trace      Sum of the diagonal elements
np.diag       Diagonal elements as an array
nplin.det     Matrix determinant
nplin.eig     Eigenvalues and eigenvectors
nplin.inv     Inverse matrix
nplin.qr      QR decomposition
nplin.solve   Solve linear system
illustrations
Matplotlib package
Figures and subplots
Plot types and styles
Pandas layers
Applications
Time series
Moving window
Financial applications
© 2019 PyEcon.org
Chapter 3 154
Essential
concepts
Getting started
Procedural
programming
Object-orientation
Data formats and handling
Numerical
programming
NumPy package
Array basics
3.1 Pandas package
Linear algebra
Applications
Time series
Moving window
Financial applications
© 2019 PyEcon.org
Section 3.1 155
Essential
concepts
Getting started
Procedural
programming
Object-orientation
Data formats and handling
Numerical
programming
NumPy package
Array basics
Linear algebra
Visual
illustrations
Matplotlib package
Figures and subplots
Plot types and styles
Pandas layers
Applications
Time series
Moving window
Financial applications
© 2019 PyEcon.org
Motivation 157
Essential
concepts
Getting started With pandas you can import and visualize financial data in only a few
Procedural
programming lines of code.
Object-orientation
Numerical Motivation
programming
NumPy package
import pandas as pd
Array basics import matplotlib.pyplot as plt
Linear algebra
Applications
Time series
Moving window
Financial applications
© 2019 PyEcon.org
Motivation 158
Essential
concepts
Getting started
Procedural
programming
[Figure: line plot of DJI over Date, 2006 to 2018; y-axis ticks from 7500 to 27500]
© 2019 PyEcon.org
Section 3.2 159
Essential
concepts
Getting started
Procedural
programming
Object-orientation
Data formats and handling
Numerical
programming
NumPy package
Array basics
Linear algebra
Visual
illustrations
Matplotlib package
Figures and subplots
Plot types and styles
Pandas layers
Applications
Time series
Moving window
Financial applications
© 2019 PyEcon.org
Series 160
Essential
concepts
Series are a data structure in pandas.
One-dimensional array-like object,
Containing a sequence of values and a corresponding array of labels, called the index,
The string representation of a Series displays the index on the left and the values on the right,
The default index consists of the integers 0 through N-1.
DataFrame
Import/Export data
Visual
illustrations String representation of a Series
## 0 3
Matplotlib package
Figures and subplots
Plot types and styles ## 1 7
Pandas layers ## 2 -8
Applications ## 3 4
Time series
## 4 26
## dtype: int64
Moving window
Financial applications
© 2019 PyEcon.org
Create Series 161
Essential
concepts
pd.Series(): Creates a one-dimensional array-like object including
values and an index.
Object-orientation
Applications
Time series Simple Series formed only from a list,
Moving window
Financial applications An index is added automatically.
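A minimal sketch of creating a Series from a list. The values are those of the string representation shown earlier; the variable name obj is an assumption.

```python
import pandas as pd

obj = pd.Series([3, 7, -8, 4, 26])   # index 0..4 is added automatically

first = obj[0]               # select by index label 0
labels = list(obj.index)     # the automatically created index
n = len(obj)
```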
© 2019 PyEcon.org
Create Series 162
Essential
concepts
Getting started
Procedural
Series indexing vs. Numpy indexing
programming
Object-orientation obj2 = pd.Series([2, -5, 9, 4], index=["a", "b", "c", "d"])
Numerical npobj = np.array([2, -5, 9, 4])
programming obj2
NumPy package
## a 2
Array basics
Linear algebra
Visual
obj2["b"]
illustrations
Matplotlib package ## -5
Figures and subplots
Plot types and styles
npobj[1]
Pandas layers
Applications ## -5
Time series
Moving window
Financial applications
Visual ## a 2
illustrations
Matplotlib package
## b -5
Figures and subplots ## c 9
Plot types and styles ## d 4
Pandas layers
## dtype: int64
Applications
Time series
Moving window
Financial applications
© 2019 PyEcon.org
Create Series 164
Essential
concepts
Getting started
Procedural Series from dicts
programming
Object-orientation dictdata = {"Göttingen": 117665, "Northeim": 28920,
Numerical "Hannover": 532163, "Berlin": 3574830}
programming obj3 = pd.Series(dictdata)
NumPy package
obj3
Array basics
Linear algebra
## Göttingen 117665
Data formats and
handling ## Northeim 28920
Pandas package ## Hannover 532163
Series
## Berlin 3574830
DataFrame
Import/Export data
## dtype: int64
Visual
illustrations
Matplotlib package
The index of the Series can be set manually,
In contrast to a NumPy array, you can use the set index to select single values,
Data contained in a dict can be passed to a Series. The index of
the resulting Series consists of the dict’s keys.
© 2019 PyEcon.org
Create Series 165
Essential
concepts
Getting started
Procedural Dict to Series with manual index
programming
Object-orientation cities = ["Hamburg", "Göttingen", "Berlin", "Hannover"]
Numerical obj4 = pd.Series(dictdata, index=cities)
programming obj4
NumPy package
Array basics
Linear algebra
## Hamburg NaN
## Göttingen 117665.0
Data formats and
handling ## Berlin 3574830.0
Pandas package ## Hannover 532163.0
Series
## dtype: float64
DataFrame
Import/Export data
Visual
illustrations
Matplotlib package
When passing a dict to a Series, the index can be set manually,
NaN (not a number) marks missing values where the index and the
dict do not match.
Applications
Time series
Moving window
Financial applications
© 2019 PyEcon.org
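The matching between a manual index and the dict keys can be checked directly. The following is a minimal sketch that reuses the values from the slides; the helper names `missing` and `dropped` are introduced here for illustration only.

```python
import pandas as pd

dictdata = {"Göttingen": 117665, "Northeim": 28920,
            "Hannover": 532163, "Berlin": 3574830}
cities = ["Hamburg", "Göttingen", "Berlin", "Hannover"]
obj4 = pd.Series(dictdata, index=cities)

# Index labels without a matching dict key are filled with NaN ...
missing = obj4[obj4.isna()].index.tolist()
# ... and dict keys that are not part of the index are left out.
dropped = [key for key in dictdata if key not in obj4.index]

print(missing)
print(dropped)
print(obj4.dtype)  # NaN forces a float dtype
```

Because NaN is a floating-point value, the integer populations are converted to floats as soon as one entry is missing.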
Series properties

Series.values: Returns the values of a Series.
Series.index: Returns the index of a Series.

obj2.index
## Index(['a', 'b', 'c', 'd'], dtype='object')

The values and the index of a Series can be printed separately.
The default index, if none was explicitly specified, is a RangeIndex.

Selecting and manipulating values

Series manipulation
obj2[["c", "d", "a"]]
## c    9
## d    4
## a    2
## dtype: int64
obj2[obj2 < 0]
## b   -5
## dtype: int64

NumPy-like operations can be applied to a Series:
to filter data,
to do scalar multiplication or apply math functions.
The index-value link is preserved.

Selecting and manipulating values

Series functions
obj2 * 2
## a     4
## b   -10
## c    18
## d     8
## dtype: int64
np.exp(obj2)["a":"c"]
## a       7.389056
## b       0.006738
## c    8103.083928
## dtype: float64
"c" in obj2
## True

NaN
pd.isnull(obj4)
## Hamburg      False
## Göttingen    False
## Berlin       False
## Hannover     False
## dtype: bool
pd.notnull(obj4)
## Hamburg      True
## Göttingen    True
## Berlin       True
## Hannover     True
## dtype: bool

Align differently indexed data

There are not two values to align for Hamburg and Northeim, so they are marked with NaN (not a number).

Data 1: obj3
Data 2: obj4
Align data: obj3 + obj4

Naming
obj4.name = "population"
obj4.index.name = "city"
obj4
## Hannover    1100000.0
## Name: population, dtype: float64

The attribute name changes the name of the existing Series.
By default, neither the Series nor the index has a name.

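Index alignment can be reproduced with the two Series from the earlier slides. This is only a sketch: it assumes the original dict values for both Series (the slides appear to modify `obj4` in between), and the `fill_value` variant is an addition that the slides do not show.

```python
import pandas as pd

dictdata = {"Göttingen": 117665, "Northeim": 28920,
            "Hannover": 532163, "Berlin": 3574830}
obj3 = pd.Series(dictdata)
obj4 = pd.Series(dictdata, index=["Hamburg", "Göttingen", "Berlin", "Hannover"])

# Values are matched by label; the result covers the union of both indexes.
total = obj3 + obj4
print(total)

# Series.add with fill_value treats a label missing on ONE side as 0.
# Hamburg stays NaN because it has no value on either side.
filled = obj3.add(obj4, fill_value=0)
print(filled)
```

Northeim exists only in `obj3` and Hamburg has no value in either Series, so both end up as NaN in the plain sum.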
Series vs. NumPy arrays

NumPy arrays are accessed by their integer positions.
Series can be accessed by a user-defined index, including letters and numbers.
Different Series can be aligned efficiently by the index.
Series can work with missing values, so operations do not automatically fail.

Section 3.3

Data formats and handling

DataFrame

DataFrames are the primary data structure of pandas.
A DataFrame represents a table of data with an ordered collection of columns.
Each column can have a different data type.
A DataFrame can be thought of as a dict of Series sharing the same index.
Physically, a DataFrame is two-dimensional, but with hierarchical indexing it can represent higher-dimensional data.

String representation of a DataFrame
##    company   price   volume
## 0  Daimler   69.20  4456290
## 1     E.ON    8.11  3667975
## 2  Siemens  110.92  3669487
## 3     BASF   87.28  1778058
## 4      BMW   87.81  1824582

DataFrame

pd.DataFrame(): Creates a DataFrame, a two-dimensional, table-like structure with labeled axes (rows and columns).

Inputs to DataFrame constructor

Type                              Description
2D NumPy array                    A matrix of data
Dict of arrays, lists, or tuples  Each sequence becomes a column
Dict of Series                    Each value becomes a column
Dict of dicts                     Each inner dict becomes a column
List of dicts or Series           Each item becomes a row
List of lists or tuples           Treated like a 2D NumPy array
Another DataFrame                 Its indexes are reused

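A few of the input types from the table can be compared side by side. The sketch below uses two rows of the company data shown earlier; the variable names are illustrative and not taken from the course.

```python
import pandas as pd

# Dict of lists: each list becomes a column.
by_columns = pd.DataFrame({"company": ["Daimler", "E.ON"],
                           "price": [69.20, 8.11]})

# List of dicts: each dict becomes a row.
by_rows = pd.DataFrame([{"company": "Daimler", "price": 69.20},
                        {"company": "E.ON", "price": 8.11}])

# Dict of dicts: outer keys become columns, inner keys become the row index.
nested = pd.DataFrame({"price": {"Daimler": 69.20, "E.ON": 8.11},
                       "volume": {"Daimler": 4456290, "E.ON": 3667975}})

print(by_columns.equals(by_rows))
print(nested)
```

Both the column-wise and the row-wise input describe the same table, so the choice mostly depends on how the raw data arrives.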
Indexing and adding DataFrames

Add data to DataFrame
frame2["change"] = [1.2, -3.2, 0.4, -0.12, 2.4]
frame2["change"]
## 0    1.20
## 1   -3.20
## 2    0.40
## 3   -0.12
## 4    2.40
## Name: change, dtype: float64

Selecting a column of a DataFrame returns a Series.
An attribute-like access, e.g. frame2.change, is also possible.
The returned Series has the same index as the initial DataFrame.

Indexing DataFrames

Indexing DataFrames
frame2[["company", "change"]]
##    company  change
## 0  Daimler    1.20
## 1     E.ON   -3.20
## 2  Siemens    0.40
## 3     BASF   -0.12
## 4      BMW    2.40

Changing DataFrames

del DataFrame[column]: Deletes column from DataFrame.

DataFrame delete column
del frame2["volume"]
frame2
##    company   price  change
## 0  Daimler   69.20    1.20
## 1     E.ON    8.11   -3.20
## 2  Siemens  110.92    0.40
## 3     BASF   87.28   -0.12
## 4      BMW   87.81    2.40
frame2.columns
## Index(['company', 'price', 'change'], dtype='object')

Naming DataFrames

Naming properties
frame2.index.name = "number:"
frame2.columns.name = "feature:"
frame2
## feature:  company   price  change
## number:
## 0         Daimler   69.20    1.20
## 1            E.ON    8.11   -3.20
## 2         Siemens  110.92    0.40
## 3            BASF   87.28   -0.12
## 4             BMW   87.81    2.40

In DataFrames there is no default name for the index or the columns.

Reindexing

DataFrame.reindex(): Creates a new DataFrame with data conformed to a new index, while the initial DataFrame will not be changed.

Reindexing
frame3 = frame.reindex([0, 2, 3, 4])
frame3

Index values that are not already present will be filled with NaN by default.
There are many options for filling missing values.

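The effect of reindexing on labels that are dropped or newly introduced can be sketched as follows. The `data` dict is an assumed reconstruction of the course data from the printed tables, and the extra label 5 and the `method="ffill"` variant are added here for illustration.

```python
import pandas as pd

# Assumed reconstruction of `frame` from the table printed earlier.
data = {"company": ["Daimler", "E.ON", "Siemens", "BASF", "BMW"],
        "price": [69.20, 8.11, 110.92, 87.28, 87.81],
        "volume": [4456290, 3667975, 3669487, 1778058, 1824582]}
frame = pd.DataFrame(data)

# Label 1 is left out; label 5 does not exist yet and is filled with NaN.
frame3 = frame.reindex([0, 2, 3, 4, 5])
print(frame3)

# One of the filling options: carry the last valid row forward.
frame_ffill = frame.reindex([0, 2, 3, 4, 5], method="ffill")
print(frame_ffill.loc[5])

# The original DataFrame is unchanged.
print(len(frame))
```

Forward filling needs a sorted (monotonic) original index, which holds for the default RangeIndex used here.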
Reindexing

Filling missing values
frame4 = frame.reindex(index=[0, 2, 3, 4, 5], fill_value=0,
                       columns=["company", "price", "market cap"])
frame4
##    company   price  market cap
## 0  Daimler   69.20           0
## 2  Siemens  110.92           0
## 3     BASF   87.28           0
## 4      BMW   87.81           0
## 5        0    0.00           0
frame4 = frame.reindex(index=[0, 2, 3, 4], fill_value=np.nan,
                       columns=["company", "price", "market cap"])
frame4
##    company   price  market cap
## 0  Daimler   69.20         NaN
## 2  Siemens  110.92         NaN
## 3     BASF   87.28         NaN
## 4      BMW   87.81         NaN

Fill NaN

DataFrame.fillna(value): Fills NaNs with value.

Filling NaN
frame4[:3]
##    company   price  market cap
## 0  Daimler   69.20         NaN
## 2  Siemens  110.92         NaN
## 3     BASF   87.28         NaN
frame4.fillna(1000000, inplace=True)
frame4[:3]
##    company   price  market cap
## 0  Daimler   69.20   1000000.0
## 2  Siemens  110.92   1000000.0
## 3     BASF   87.28   1000000.0

The option inplace=True fills the current DataFrame (here frame4). Without inplace, fillna returns a new DataFrame with the NaN values filled and leaves the original unchanged.

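The difference between filling in place and getting a new object back can be verified with a small example. This sketch rebuilds a three-row version of `frame4` by hand; the name `filled` is introduced here.

```python
import numpy as np
import pandas as pd

frame4 = pd.DataFrame({"company": ["Daimler", "Siemens", "BASF"],
                       "price": [69.20, 110.92, 87.28],
                       "market cap": [np.nan, np.nan, np.nan]},
                      index=[0, 2, 3])

# Without inplace=True, fillna returns a new DataFrame ...
filled = frame4.fillna(1000000)
print(filled)

# ... and the original still contains the NaN values.
still_missing = bool(frame4["market cap"].isna().all())
print(still_missing)
```

Assigning the result to a new name keeps the unfilled data available for later checks, which is often preferable to modifying in place.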
Dropping entries

DataFrame.drop(index, axis): Returns a new object with the labels on the requested axis removed.

Dropping index
frame5 = frame
frame5
##    company   price   volume
## 0  Daimler   69.20  4456290
## 1     E.ON    8.11  3667975
## 2  Siemens  110.92  3669487
## 3     BASF   87.28  1778058
## 4      BMW   87.81  1824582
frame5.drop([1, 2])
##    company  price   volume
## 0  Daimler  69.20  4456290
## 3     BASF  87.28  1778058
## 4      BMW  87.81  1824582

Dropping entries

Dropping column
frame5[:2]
##    company  price   volume
## 0  Daimler  69.20  4456290
## 1     E.ON   8.11  3667975
frame5.drop("price", axis=1)[:3]
##    company   volume
## 0  Daimler  4456290
## 1     E.ON  3667975
## 2  Siemens  3669487
frame5.drop(2, axis=0)
##    company  price   volume
## 0  Daimler  69.20  4456290
## 1     E.ON   8.11  3667975
## 3     BASF  87.28  1778058
## 4      BMW  87.81  1824582

Indexing, selecting and filtering

Indexing a DataFrame works like indexing a NumPy array: you can use the default index values or a manually set index.

Indexing
frame
## 3     BASF  87.28  1778058
## 4      BMW  87.81  1824582

Indexing, selecting and filtering

Indexing
frame6 = pd.DataFrame(data, index=["a", "b", "c", "d", "e"])
frame6
##    company   price   volume
## a  Daimler   69.20  4456290
## b     E.ON    8.11  3667975
## c  Siemens  110.92  3669487
## d     BASF   87.28  1778058
## e      BMW   87.81  1824582
frame6["b":"d"]
##    company   price   volume
## b     E.ON    8.11  3667975
## c  Siemens  110.92  3669487
## d     BASF   87.28  1778058

When slicing with labels, the end element is inclusive.

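The inclusive end of a label slice is easiest to see next to an integer slice. This is a minimal sketch in which `data` is an assumed reconstruction of the course data from the printed tables.

```python
import pandas as pd

# Assumed reconstruction of the course data from the printed tables.
data = {"company": ["Daimler", "E.ON", "Siemens", "BASF", "BMW"],
        "price": [69.20, 8.11, 110.92, 87.28, 87.81],
        "volume": [4456290, 3667975, 3669487, 1778058, 1824582]}
frame6 = pd.DataFrame(data, index=["a", "b", "c", "d", "e"])

by_label = frame6.loc["b":"d"]   # label slice: "d" is included
by_position = frame6.iloc[1:3]   # position slice: position 3 is excluded

print(list(by_label.index))
print(list(by_position.index))
```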
Indexing, selecting and filtering

DataFrame.loc[]: Selects a subset of rows and columns from a DataFrame using axis labels.
DataFrame.iloc[]: Selects a subset of rows and columns from a DataFrame using integer positions.

Selection with loc and iloc
frame6.loc["c", ["company", "price"]]
## company    Siemens
## price       110.92
## Name: c, dtype: object
frame6.iloc[2, [0, 1]]
## company    Siemens
## price       110.92
## Name: c, dtype: object

Indexing, selecting and filtering

Selection with loc and iloc
frame6.loc[["c", "d", "e"], ["volume", "price", "company"]]
##     volume   price  company
## c  3669487  110.92  Siemens
## d  1778058   87.28     BASF
## e  1824582   87.81      BMW
frame6.iloc[2:, ::-1]
##     volume   price  company
## c  3669487  110.92  Siemens
## d  1778058   87.28     BASF
## e  1824582   87.81      BMW

Both indexers work with slices or lists of labels.
There are many ways to select and rearrange pandas objects.

DataFrame indexing options

Type                Description
df[val]             Select single column or set of columns
df.loc[val]         Select single row or set of rows
df.loc[:, val]      Select single column or set of columns
df.loc[val1, val2]  Select row and column by label
df.iloc[where]      Select row or set of rows by integer position
df.iloc[:, where]   Select column or set of columns by integer position
df.iloc[w1, w2]     Select row and column by integer position

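Several rows of the table can be tried on a small frame. The sketch below uses the first three rows of the company data with a letter index; the variable names are chosen here for illustration.

```python
import pandas as pd

frame6 = pd.DataFrame({"company": ["Daimler", "E.ON", "Siemens"],
                       "price": [69.20, 8.11, 110.92],
                       "volume": [4456290, 3667975, 3669487]},
                      index=["a", "b", "c"])

column = frame6["price"]               # df[val]: a single column as Series
rows = frame6.loc[["a", "c"]]          # df.loc[val]: rows by label
cell = frame6.loc["c", "price"]        # df.loc[val1, val2]: row and column by label
same_cell = frame6.iloc[2, 1]          # df.iloc[w1, w2]: the same cell by position
first_last = frame6.iloc[:, [0, 2]]    # df.iloc[:, where]: columns by position

print(cell, same_cell)
print(first_last)
```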
Hierarchical indexing

Hierarchical indexing enables you to have multiple index levels.

Multiindex
ind = [["a", "a", "a", "b", "b"], [1, 2, 3, 1, 2]]
frame6 = pd.DataFrame(np.arange(15).reshape((5, 3)),
                      index=ind,
                      columns=["first", "second", "third"])
frame6
##      first  second  third
## a 1      0       1      2
##   2      3       4      5
##   3      6       7      8
## b 1      9      10     11
##   2     12      13     14
frame6.index.names = ["index1", "index2"]
frame6.index
## MultiIndex(levels=[['a', 'b'], [1, 2, 3]],
##            labels=[[0, 0, 0, 1, 1], [0, 1, 2, 0, 1]],
##            names=['index1', 'index2'])

Hierarchical indexing

Selecting from a MultiIndex
frame6.loc["a"]
##         first  second  third
## index2
## 1           0       1      2
## 2           3       4      5
## 3           6       7      8
frame6.loc["b", 1]
## first      9
## second    10
## third     11
## Name: (b, 1), dtype: int64

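Selections on a hierarchical index can also be written with an explicit tuple, or by addressing the inner level directly. The following sketch rebuilds the frame from the slides; the use of `xs` is an addition that the slides do not show.

```python
import numpy as np
import pandas as pd

ind = [["a", "a", "a", "b", "b"], [1, 2, 3, 1, 2]]
frame6 = pd.DataFrame(np.arange(15).reshape((5, 3)),
                      index=ind,
                      columns=["first", "second", "third"])
frame6.index.names = ["index1", "index2"]

# A tuple addresses one entry per index level; the trailing ":" makes clear
# that the tuple refers to the rows and all columns are kept.
row = frame6.loc[("b", 1), :]
print(row)

# xs selects on an inner level, here every row with index2 == 2.
inner = frame6.xs(2, level="index2")
print(inner)
```

Writing the row key as a tuple avoids any ambiguity between "second index level" and "column label".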
Operations between DataFrame and Series

Series and DataFrames
frame7 = frame[["price", "volume"]]
frame7.index = ["Daimler", "E.ON", "Siemens", "BASF", "BMW"]
series = frame7.iloc[2]
frame7
##           price   volume
## Daimler   69.20  4456290
## E.ON       8.11  3667975
## Siemens  110.92  3669487
## BASF      87.28  1778058
## BMW       87.81  1824582
series
## price         110.92
## volume    3669487.00
## Name: Siemens, dtype: float64

Here the Series was generated from the third row of the DataFrame (integer position 2, Siemens).

Operations between DataFrames and Series

Operations between Series and DataFrames down the rows
frame7 + series
##           price     volume
## Daimler  180.12  8125777.0
## E.ON     119.03  7337462.0
## Siemens  221.84  7338974.0
## BASF     198.20  5447545.0
## BMW      198.73  5494069.0

Operations between DataFrames and Series

Operations between Series and DataFrames down the columns
series2 = frame7["price"]
frame7.add(series2, axis=0)
##           price      volume
## Daimler  138.40  4456359.20
## E.ON      16.22  3667983.11
## Siemens  221.84  3669597.92
## BASF     174.56  1778145.28
## BMW      175.62  1824669.81

Here, the Series was generated from the price column.
The arithmetic operation is broadcast along the columns, matching the DataFrame’s row index (axis=0).

Operations between DataFrames and Series

Pandas vs NumPy
nparr = np.arange(12.).reshape((3, 4))
row = nparr[0]
nparr - row
## array([[0., 0., 0., 0.],
##        [4., 4., 4., 4.],
##        [8., 8., 8., 8.]])

Operations between DataFrames and Series are similar to operations between one- and two-dimensional NumPy arrays.
As with DataFrames and Series, the arithmetic operations are broadcast along the rows.

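The two broadcasting directions can be contrasted on a reduced version of `frame7`. This is a sketch with three of the five companies; the names `by_row` and `by_col` are introduced here.

```python
import pandas as pd

frame7 = pd.DataFrame({"price": [69.20, 8.11, 110.92],
                       "volume": [4456290, 3667975, 3669487]},
                      index=["Daimler", "E.ON", "Siemens"])

# Default: the Series index is matched against the column labels,
# and the operation is repeated for every row.
row = frame7.loc["Siemens"]
by_row = frame7 - row

# axis=0: the Series index is matched against the row labels,
# and the operation is repeated for every column.
col = frame7["price"]
by_col = frame7.sub(col, axis=0)

print(by_row)
print(by_col)
```

In `by_row` the Siemens row becomes zero, while in `by_col` the price column becomes zero.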
NumPy functions on DataFrames

DataFrame.apply(np.function, axis): Applies a NumPy function on the DataFrame axis. See also statistical and mathematical NumPy functions.

NumPy functions on DataFrames
frame7[:2]
##          price   volume
## Daimler  69.20  4456290
## E.ON      8.11  3667975
frame7.apply(np.mean)
## price          72.664
## volume    3079278.400
## dtype: float64
frame7.apply(np.sqrt)[:2]
##             price       volume
## Daimler  8.318654  2110.992657
## E.ON     2.847806  1915.195812

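Whether `apply` returns one value per column or a frame of the same shape depends on the function passed in. The sketch below rebuilds `frame7` from the printed values (an assumed reconstruction); the range function is a hypothetical example that is not part of the course.

```python
import numpy as np
import pandas as pd

frame7 = pd.DataFrame({"price": [69.20, 8.11, 110.92, 87.28, 87.81],
                       "volume": [4456290, 3667975, 3669487, 1778058, 1824582]},
                      index=["Daimler", "E.ON", "Siemens", "BASF", "BMW"])

# A reducing function yields one value per column.
col_means = frame7.apply(np.mean)

# An element-wise function keeps the shape of the DataFrame.
roots = frame7.apply(np.sqrt)

# Any callable that accepts a Series works, e.g. the range of each column.
col_ranges = frame7.apply(lambda col: col.max() - col.min())

print(col_means)
print(roots.shape)
print(col_ranges)
```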
Grouping DataFrames

DataFrame.groupby([col1, col2]): Groups a DataFrame by columns (grouping by one or more than two columns is also possible). See also how to import data from CSV files.

Groupby
vote = pd.read_csv("data/vote.csv")[["Party", "Member", "Vote"]]
vote.head()
##      Party     Member    Vote
## 0  CDU/CSU   Abercron     yes
## 1  CDU/CSU     Albani     yes
## 2  CDU/CSU  Altenkamp     yes
## 3  CDU/CSU   Altmaier  absent
## 4  CDU/CSU     Amthor     yes

Appending count() or mean() to groupby() returns the number of entries or the mean of the grouped columns.

Grouping DataFrames

Groupby
res = vote.groupby(["Party", "Vote"]).count()
res
##                      Member
## Party        Vote
## AfD          absent       6
##              no          86
## BÜ90/GR      absent       9
##              no          58
## CDU/CSU      absent       7
##              yes        239
## DIE LINKE.   absent       7
##              no          62
## FDP          absent       5
##              no          75
## Fraktionslos absent       1
##              no           1
## SPD          absent       6
##              yes        147

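Since `data/vote.csv` is not reproduced here, the grouping step can be tried on a small stand-in table. The parties, members, and votes below are hypothetical and only mimic the structure of the course data.

```python
import pandas as pd

# Hypothetical stand-in for data/vote.csv with the same three columns.
vote = pd.DataFrame({"Party": ["A", "A", "A", "B", "B"],
                     "Member": ["m1", "m2", "m3", "m4", "m5"],
                     "Vote": ["yes", "yes", "absent", "no", "no"]})

# count() returns the number of non-missing entries per group
# for every remaining column (here only "Member").
counts = vote.groupby(["Party", "Vote"]).count()
print(counts)

# size() returns the number of rows per group as a Series.
sizes = vote.groupby(["Party", "Vote"]).size()
print(sizes)
```

The result carries a hierarchical index built from the grouping columns, as in the output on the slide.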
Section 3.4

Data formats and handling

Reading data in text format

ex1.csv
a, b, c, d, hello
1, 2, 3, 4, world
5, 6, 7, 8, python
2, 3, 5, 7, pandas

Reading data in text format

tab.txt
a| b| c| d| hello
1| 2| 3| 4| world
5| 6| 7| 8| python
2| 3| 5| 7| pandas

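The reading code for these files is not part of this text. One possible way to read a file such as tab.txt is sketched below; it reads the same content from an in-memory string instead of a file, and the options `sep` and `skipinitialspace` are this sketch's choice rather than necessarily the course's.

```python
import io

import pandas as pd

# Same content as tab.txt, held in memory so the example is self-contained.
text = ("a| b| c| d| hello\n"
        "1| 2| 3| 4| world\n"
        "5| 6| 7| 8| python\n"
        "2| 3| 5| 7| pandas\n")

# sep sets the delimiter; skipinitialspace drops the blank after each "|".
tab = pd.read_csv(io.StringIO(text), sep="|", skipinitialspace=True)
print(tab)
print(list(tab.columns))
```

Without `skipinitialspace=True`, the column names and the text values would keep their leading blank.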
Reading data in text format

ex2.csv
1, 2, 3, 4, world
5, 6, 7, 8, python
2, 3, 5, 7, pandas

Reading data in text format

ex3.csv
1, 2, 3, 4, world
#+#-.,.-'*'-.,
5, 6, 7, 8, python
87646756754456978
2, 3, 5, 7, pandas

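Unusable lines such as the second and fourth line of ex3.csv can be skipped while reading. The sketch below works on an in-memory copy of the file; it assumes a fifth line `2, 3, 5, 7, pandas`, inferred from the out1.csv output shown later, and uses `header=None`, which is this sketch's choice.

```python
import io

import pandas as pd

# In-memory copy of ex3.csv; the last line is inferred from out1.csv.
text = ("1, 2, 3, 4, world\n"
        "#+#-.,.-'*'-.,\n"
        "5, 6, 7, 8, python\n"
        "87646756754456978\n"
        "2, 3, 5, 7, pandas\n")

# skiprows takes zero-based line numbers; header=None keeps the first
# line as data and numbers the columns 0, 1, 2, ...
df = pd.read_csv(io.StringIO(text), skiprows=[1, 3], header=None,
                 skipinitialspace=True)
print(df)
```

If `header=None` is omitted, the first remaining line is used as the column header, which is what produces the `,1, 2, 3, 4, world` line in out1.csv.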
Writing data to text file

DataFrame.to_csv("filename"): Writes DataFrame to CSV.

Write to CSV
df = pd.read_csv("data/ex3.csv", skiprows=[1, 3])
df.to_csv("out/out1.csv")

out1.csv
,1, 2, 3, 4, world
0,5,6,7,8, python
1,2,3,5,7, pandas

The index and the header are both written to the .csv file, which is why the first line starts with ",1".

Writing data to text file

Write to CSV and settings
df = pd.read_csv("data/ex3.csv", skiprows=[1, 3])
df.to_csv("out/out2.csv", index=False, header=False)

out2.csv
5,6,7,8, python
2,3,5,7, pandas

Writing data to text file

Write to CSV and specify header
df = pd.read_csv("data/ex3.csv", skiprows=[1, 3, 4])
df.to_csv("out/out3.csv", index=False,
          header=["a", "b", "c", "d", "e"])

out3.csv
a,b,c,d,e
5,6,7,8, python

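The effect of the `index` and `header` options can be inspected without touching the file system by writing into a text buffer. This is a minimal sketch with a hypothetical two-column frame.

```python
import io

import pandas as pd

# Hypothetical two-column frame used only for this illustration.
df = pd.DataFrame({"x": [5, 2], "y": ["python", "pandas"]})

# Default: index and header are written.
with_index = io.StringIO()
df.to_csv(with_index)

# index=False drops the index column; header accepts replacement names.
renamed = io.StringIO()
df.to_csv(renamed, index=False, header=["a", "b"])

print(with_index.getvalue())
print(renamed.getvalue())
```

The first header line starts with a comma because the index column has no name.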
Reading Excel files

pd.read_excel("file.xls"): Reads .xls files.

Reading Excel files

Excel as a DataFrame
xls_frame[["Adj Close", "Volume", "High"]]
##      Adj Close   Volume         High
## 0  1169.939941  1538700  1173.000000
## 1  1167.699951  2412100  1174.000000
## 2  1111.900024  4857900  1123.069946
## 3  1055.800049  3798300  1110.000000
## 4  1080.599976  3448000  1081.709961
## 5  1048.579956  2341700  1081.780029

Remote data access

Extract financial data from Internet sources into a DataFrame. Different sources offer different kinds of data. Some sources are:
Robinhood
IEX
Yahoo Finance
World Bank
OECD
Eurostat

A complete list of the sources and their usage can be found in the pandas-datareader documentation.

Import pandas-datareader
from pandas_datareader import data

Data access: IEX

data.DataReader("symbol", "source", "start", "end"): Returns financial data of a stock in a certain time period.

IEX get data
ford = data.DataReader("F", "iex", "2017-01-01", "2018-01-31")
ford.head()[["close", "volume"]]

Data access: IEX

IEX handle data
ford.index
## Index(['2017-01-03', '2017-01-04',...
##       dtype='object', name='date',...
ford.loc["2018-01-26"]
## Name: 2018-01-26, dtype: float64

DataFrame index
The index of the DataFrame differs between sources. Always check DataFrame.index!

Data access: IEX

IEX
sap = data.DataReader("SAP", "iex", "2017-01-01", "2018-01-31")
sap[25:27]
##                open     high      low    close  volume
## date
## 2017-02-08  89.5382  90.0263  89.4405  89.6065  653804
## 2017-02-09  89.7139  89.9738  89.5284  89.5284  548787
sap.loc["2017-02-08"]
## open          89.5382
## high          90.0263
## low           89.4405
## close         89.6065
## volume    653804.0000
## Name: 2017-02-08, dtype: float64

Data access: Eurostat

Eurostat
population = data.DataReader("tps00001", "eurostat", "2007-01-01",
                             "2018-01-01")
population.columns
## MultiIndex(levels=[[Population on 1 January - total], [Albania,
## Andorra, Armenia, Austria, Azerbaijan, Belarus, Belgium, ...
population["Population on 1 January - total", "France"][0:5]
## FREQ             Annual
## TIME_PERIOD
## 2007-01-01   63645065.0
## 2008-01-01   64007193.0
## 2009-01-01   64350226.0
## 2010-01-01   64658856.0
## 2011-01-01   64978721.0

Eurostat Database

Read data from HTML
Website used for the example: Econometrics

Beautiful Soup
from bs4 import BeautifulSoup
import requests
url = "www.uni-goettingen.de/de/applied-econometrics/412565.html"
r = requests.get("https://" + url)
d = r.text
soup = BeautifulSoup(d, "lxml")
soup.title

A detailed treatment of reading data from HTML is beyond the scope of this course.
If you are interested in this way of importing data, you can find detailed
information in the Beautiful Soup documentation.
Motivation

Bollinger
sap = data.DataReader("SAP", "iex", "2017-01-01", "2018-08-31")
sap.index = pd.to_datetime(sap.index)
boll = sap["close"].rolling(window=20, center=False).mean()
std = sap["close"].rolling(window=20, center=False).std()
upp = boll + std * 2
low = boll - std * 2
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
boll.plot(ax=ax, label="20 days Rolling mean")
upp.plot(ax=ax, label="Upper Band")
low.plot(ax=ax, label="Lower Band")
sap["close"].plot(ax=ax, label="SAP Price")
ax.legend(loc="best")
fig.savefig("out/boll.pdf")
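
The band computation in the Bollinger listing can be tried without a data provider. The following is a minimal sketch, assuming synthetic prices generated with NumPy (the series `close` and the seed are made up for illustration and are not SAP quotes). It shows that the first 19 values of a 20-day rolling mean are undefined and that the bands lie two rolling standard deviations above and below the mean.

```python
import numpy as np
import pandas as pd

# Synthetic daily closing prices (hypothetical data, not real SAP quotes).
rng = np.random.default_rng(0)
dates = pd.date_range("2017-01-02", periods=250, freq="B")
close = pd.Series(100 + rng.standard_normal(250).cumsum(), index=dates)

window = 20
mid = close.rolling(window=window).mean()   # rolling mean (middle band)
std = close.rolling(window=window).std()    # rolling sample standard deviation
upper = mid + 2 * std                       # upper band
lower = mid - 2 * std                       # lower band

bands = pd.DataFrame({"close": close, "mid": mid,
                      "upper": upper, "lower": lower})
# The first complete window ends at position window - 1.
print(bands.iloc[window - 2:window + 1])
```

The rolling window needs 20 observations before it returns a value, which is why the plotted bands start later than the price series.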
[Figure: SAP closing price with its 20-day rolling mean and the upper and lower Bollinger bands, plotted against the date.]
Chapter 4
Visual illustrations
4.1 Matplotlib package
Section 4.1
Visual illustrations: Matplotlib package
matplotlib
Image plots, Contour plots, Scatter plots, Polar plots, Line plots, 3D plots,
Variety of hardcopy formats,
Works in Python scripts, the Python and IPython shell and the Jupyter notebook,
Interactive environments.
matplotlib
Usage of matplotlib
matplotlib has a vast number of functions and options, which are hard
to remember. For almost every task, however, there is an example you can
take code from. A great source of information is the examples gallery
on the matplotlib homepage. Also note the best practice quick start
Simple plot
plt.plot(array): Plots the values of a list or array. By default, the X-axis
takes the values [0, 1, ..., n-1].

Import matplotlib and simple example
import matplotlib.pyplot as plt
import numpy as np
plt.plot(np.arange(10))
plt.savefig("out/list.pdf")

[Figure: line plot of the values 0 to 9 against their positions.]
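
The default X-axis can be checked directly on the line object that plt.plot() returns. This is a small sketch, assuming the non-interactive "Agg" backend so that it runs without a display; the array `values` is made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np

values = np.array([3.0, 1.0, 4.0, 1.0, 5.0])
(line,) = plt.plot(values)      # only Y-values are passed

x_used = line.get_xdata()       # X-values that matplotlib filled in
y_used = line.get_ydata()
print(x_used)
print(y_used)
```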
Section 4.2
Visual illustrations: Figures and subplots
Figures
Plots in matplotlib reside in a Figure object:
plt.figure(...): Creates a new Figure object and accepts several optional parameters.
plt.gcf(): Returns a reference to the active figure.

Create Figures
fig = plt.figure(figsize=(16, 8))
print(plt.gcf())
## Figure(1600x800)

A Figure object can be considered as an empty window,
The Figure object has a number of options, such as the size or the aspect ratio,
You cannot draw a plot in a blank figure. There has to be a subplot in the Figure object.
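
The relation between a Figure object and its subplots can be inspected through the fig.axes list. A minimal sketch, again assuming the "Agg" backend: a new figure becomes the active one, starts without subplots, and only receives one once add_subplot() is called.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(16, 8))
is_active = plt.gcf() is fig        # the new figure is the active figure
n_before = len(fig.axes)            # a blank figure holds no subplots

ax = fig.add_subplot(1, 1, 1)       # add a subplot to draw into
ax.plot([1, 2, 3])
n_after = len(fig.axes)

size = tuple(fig.get_size_inches()) # figure size in inches
print(is_active, n_before, n_after, size)
```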
Saving plots to file
plt.savefig("filename"): Saves the active figure to a file.
Available file formats include:

Filename extension   Description
.png                 Portable Network Graphics
.pdf                 Portable Document Format
.svg                 Scalable Vector Graphics
.jpeg                JPEG File Interchange Format
.jpg                 JPEG File Interchange Format
.ps                  PostScript
.raw                 Raw Image Format
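
savefig() normally derives the format from the filename extension. As a sketch of the same mechanism without touching the file system, the example below writes into in-memory buffers and names the format explicitly through the format argument. The buffers and the plotted numbers are illustrative assumptions.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

# Raster output: PNG data starts with a fixed 8-byte signature.
png_buffer = io.BytesIO()
fig.savefig(png_buffer, format="png")
png_bytes = png_buffer.getvalue()

# Vector output: SVG is plain XML text.
svg_buffer = io.BytesIO()
fig.savefig(svg_buffer, format="svg")
svg_text = svg_buffer.getvalue().decode("utf-8")

# Formats offered by the active backend, keyed by extension.
supported = fig.canvas.get_supported_filetypes()
print(sorted(supported))
```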
Subplots
fig.add_subplot(): Adds a subplot to the Figure fig.
Example: fig.add_subplot(2, 2, 1) divides the figure into a 2x2 grid of four
subplot positions and creates the subplot at the first position.

Adding subplots
ax1 = fig.add_subplot(2, 2, 1)
ax2 = fig.add_subplot(2, 2, 2)
ax3 = fig.add_subplot(2, 2, 3)
ax4 = fig.add_subplot(2, 2, 4)
fig.savefig("out/subplots.pdf")

The Figure object is filled with subplots in which the plots reside,
If plt.plot() is used without creating a subplot in advance, matplotlib creates a Figure object and a subplot automatically,
The Figure object and its subplots can be created in one line.
[Figure: 2x2 grid of four empty subplots.]
Subplots

Filling subplots with content
from numpy.random import randn
ax1.plot([5, 7, 4, 3, 1])
ax2.hist(randn(100), bins=20, color="r")
ax3.scatter(np.arange(30), np.arange(30) * randn(30))
ax4.plot(randn(40), "k--")
fig.savefig("out/content.pdf")

The subplots in one Figure object can be filled with different plot types,
If only plt.plot() is used, matplotlib draws the plot in the most recently selected Figure object and subplot.
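
Which subplot plt.plot() draws into can be made visible by counting the lines in each subplot. A minimal sketch, assuming the "Agg" backend; plt.sca() is used here to change the current subplot explicitly.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

fig = plt.figure()
ax1 = fig.add_subplot(1, 2, 1)
ax2 = fig.add_subplot(1, 2, 2)   # created last, so it is the current subplot

plt.plot([1, 2, 3])              # drawn into the current subplot
lines_ax1 = len(ax1.lines)
lines_ax2 = len(ax2.lines)

plt.sca(ax1)                     # make ax1 the current subplot
plt.plot([3, 2, 1])
lines_ax1_after = len(ax1.lines)
print(lines_ax1, lines_ax2, lines_ax1_after)
```

Calling the plot methods on the subplot objects themselves (ax1.plot(...)) avoids this implicit state.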
[Figure: 2x2 grid with a line plot, a red histogram, a scatter plot and a dashed black line plot.]
Standard creation of plots
plt.subplots(nrows, ncols, sharex, sharey): Creates a figure and its
subplots in one line. If sharex or sharey is True, all subplots share
the same X- or Y-ticks.

Standard creation
fig, axes = plt.subplots(2, 3, figsize=(16, 8), sharey=True)
axes[1, 1].plot(np.arange(7), color="r")
axes[0, 2].plot(np.arange(10, 0, -1))
fig.savefig("out/standard.pdf")

[Figure: 2x3 grid of subplots with a shared Y-axis; a decreasing line in the top right subplot and a red increasing line in the bottom middle subplot.]
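
The object returned by plt.subplots() for a grid is a NumPy array of subplots, indexed by row and column. The sketch below, assuming the "Agg" backend, checks the shape of that array and that sharey=True gives every subplot the same Y-limits.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(2, 3, sharey=True)
grid_shape = axes.shape                  # (rows, columns)

axes[0, 2].plot(np.arange(10, 0, -1))    # top right subplot
axes[1, 1].plot(np.arange(7))            # bottom middle subplot

# With sharey=True every subplot reports the same Y-limits.
ylims = {ax.get_ylim() for ax in axes.flat}
print(grid_shape, ylims)
```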
Section 4.3
Visual illustrations: Plot types and styles
Plot types
ax.scatter(x, y): Creates a scatter plot of x vs y.
ax.hist(x, bins): Creates a histogram.
ax.fill_between(x, y, a): Plots x vs y and fills the area between y and a.

Types
fig, ax = plt.subplots(1, 3, figsize=(16, 8))
ax[0].hist([1, 2, 3, 4, 5, 4, 3, 2, 3, 4, 2, 3, 4, 4],
           bins=5, color="yellow")
x = np.arange(0, 10, 0.1)
y = np.sin(x)
ax[1].fill_between(x, y, 0, color="green")
ax[2].scatter(x, y)
fig.savefig("out/types.pdf")

A vast number of plot types can be found in the examples gallery.
[Figure: three subplots side by side: a yellow histogram, a sine curve filled in green, and a scatter plot of the sine curve.]
Adjusting the spacing around subplots
plt.subplots_adjust(left, bottom, ..., hspace): Sets the spacing around and
between the subplots. wspace and hspace set the horizontal and vertical
spacing between subplots, expressed as a fraction of the average subplot
width and height, respectively.

Adjust spacing
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True)
for i in range(2):
    for j in range(2):
        axes[i][j].plot(randn(10))
plt.subplots_adjust(wspace=0, hspace=0)
fig.savefig("out/spacing.pdf")
[Figure: 2x2 grid of line plots with shared axes and no spacing between the subplots.]
Colors, markers and line styles
ax.plot(data, linestyle, color, marker): Sets data and styles of subplot ax.

Styles
fig, ax = plt.subplots(1, figsize=(15, 6))
ax.plot(randn(10), linestyle="--", color="darkcyan", marker="p")
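
Besides the keyword arguments, matplotlib accepts a short format string such as "k--", which an earlier listing already uses. The following sketch, assuming the "Agg" backend and made-up data, draws the same style once with keywords and once with a format string and compares the resulting line properties.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
data = [1, 3, 2, 4]

# Keyword form: each style element is named explicitly.
(long_form,) = ax.plot(data, linestyle="--", color="k", marker="o")
# Format string: color, marker and line style in one string.
(short_form,) = ax.plot(data, "ko--")

for line in (long_form, short_form):
    print(line.get_color(), line.get_marker(), line.get_linestyle())
```

The format string only covers single-letter colors; named colors such as "darkcyan" need the color keyword.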
Plot colors
[Figure: plot colors.]
Plot line styles
[Figure: plot line styles.]
Plot markers

Marker   Description
"."      point
","      pixel
"o"      circle
"v"      triangle_down
Ticks and labels
ax.set_xticks(): Sets the list of X-ticks, analogously for the Y-axis.
ax.set_xlabel(): Sets the X-label.
ax.set_title(): Sets the subplot title.

Ticks and labels - default
fig, ax = plt.subplots(1, figsize=(15, 10))
ax.plot(randn(1000).cumsum())
fig.savefig("out/withoutlabls.pdf")

Here, we create a Figure object as well as a subplot and fill it with a line plot of a random walk,
By default, matplotlib distributes the ticks evenly along the data range. Individual ticks can be set as follows,
By default, there is no axis label or title.
[Figure: line plot of the random walk with default ticks and without axis labels or title.]
Ticks and labels

Set ticks and labels
ax.set_xticks([0, 250, 500, 750, 1000])
ax.set_xlabel("Days", fontsize=20)
ax.set_ylabel("Change", fontsize=20)
ax.set_title("Simulation", fontsize=30)
fig.savefig("out/labels.pdf")
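
The setter methods have matching getters, which is useful for checking what a subplot currently shows. A minimal sketch, assuming the "Agg" backend and a seeded random walk; the Y-tick values are chosen arbitrarily for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
fig, ax = plt.subplots()
ax.plot(rng.standard_normal(1000).cumsum())   # random walk

ax.set_xticks([0, 250, 500, 750, 1000])
ax.set_yticks([-40, 0, 40])                   # Y-axis counterpart of set_xticks
ax.set_xlabel("Days")
ax.set_ylabel("Change")
ax.set_title("Simulation")

xticks = [int(t) for t in ax.get_xticks()]
labels = (ax.get_xlabel(), ax.get_ylabel(), ax.get_title())
print(xticks, labels)
```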
[Figure: random walk titled "Simulation" with X-ticks at 0, 250, 500, 750 and 1000 and the X-label "Days".]
Legends
When several plots share one subplot, a legend is needed to tell them apart.
ax.legend(loc): Shows the legend at location loc.
Some options: "best", "upper right", "center left", ...

Set legend
fig = plt.figure(figsize=(15, 10))
ax = fig.add_subplot(1, 1, 1)
ax.plot(randn(1000).cumsum(), label="first")
ax.plot(randn(1000).cumsum(), label="second")
ax.plot(randn(1000).cumsum(), label="third")
ax.legend(loc="best", fontsize=20)
fig.savefig("out/legend.pdf")

The legend displays the label and the color of the associated plot,
With the option "best", the legend is placed in a corner where it does not interfere with the plots.
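
The legend entries come from the label argument of each plot call. The sketch below, assuming the "Agg" backend and made-up data, reads the entries back from the legend object and shows that a line without a label does not appear in the legend.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], label="first")
ax.plot([2, 1, 0], label="second")
ax.plot([1, 1, 1])                      # no label: left out of the legend

legend = ax.legend(loc="upper right")
entries = [text.get_text() for text in legend.get_texts()]
print(entries)
```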
[Figure: three random walks in one subplot with a legend listing "first", "second" and "third".]
Annotations on a subplot
ax.text(x, y, "text", fontsize): Inserts a text into a subplot.
ax.annotate("text", xy, xytext, arrowprops): Inserts an arrow with annotations.

Annotations
ax.text(400, -30, "here", fontsize=50)
ax.annotate("there",
            fontsize=40,
            xy=(0, 0),
            xytext=(400, 8),
            arrowprops=dict(facecolor="black",
                            shrink=0.05))
ax.set_yticks([-40, -30, -20, -10, 0, 10, 20, 30, 40])
fig.savefig("out/arrow.pdf")
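
In ax.annotate(), xy is the point the arrow points to and xytext is the position of the text. A minimal sketch, assuming the "Agg" backend; the data and the words "peak" and "start" are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [0, 1, 0, 2])

# Arrow from the text at xytext to the data point at xy.
note = ax.annotate("peak",
                   xy=(3, 2),
                   xytext=(1, 1.8),
                   arrowprops=dict(facecolor="black", shrink=0.05))
# Plain text without an arrow, positioned in data coordinates.
label = ax.text(0.2, 0.5, "start")

print(note.get_text(), note.xy, label.get_position())
```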
[Figure: the three random walks with the text "here" and an arrow annotation "there"; Y-ticks from -40 to 40 in steps of 10.]
Annotations

Annotation Lehman
import pandas as pd
from datetime import datetime
date = datetime(2008, 9, 15)
fig = plt.figure(figsize=(16, 8))
[Figure: time series chart with an annotation "Lehman Bankruptcy".]
Drawing on a subplot
plt.Rectangle((x, y), width, height, angle): Creates a rectangle.
plt.Circle((x, y), radius): Creates a circle.

Drawing
fig = plt.figure(figsize=(6, 6))
ax = fig.add_subplot(1, 1, 1)
ax.set_xticks([0, 1, 2, 3, 4, 5])
ax.set_yticks([0, 1, 2, 3, 4, 5])
rectangle = plt.Rectangle((1.5, 1),
                          width=0.8, height=2,
                          color="red", angle=30)
circ = plt.Circle((3, 3),
                  radius=1, color="blue")
ax.add_patch(rectangle)
ax.add_patch(circ)
fig.savefig("out/draw.pdf")

A list of all available patches can be found here: matplotlib-patches
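
Patches are ordinary objects that are attached to a subplot with ax.add_patch(). The sketch below imports the classes from matplotlib.patches instead of using the plt shortcuts and assumes the "Agg" backend. Setting an equal aspect ratio is an addition made here so that the circle is not drawn as an ellipse.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
from matplotlib.patches import Circle, Rectangle

fig, ax = plt.subplots(figsize=(6, 6))
ax.set_xlim(0, 5)
ax.set_ylim(0, 5)
ax.set_aspect("equal")   # same scale on both axes

# Rectangle anchored at its lower left corner and rotated by 30 degrees.
rect = Rectangle((1.5, 1), width=0.8, height=2, angle=30, color="red")
# Circle defined by its center and radius.
circ = Circle((3, 3), radius=1, color="blue")
ax.add_patch(rect)
ax.add_patch(circ)

n_patches = len(ax.patches)
corner = rect.get_xy()
print(n_patches, corner, circ.get_radius())
```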
[Figure: a rotated red rectangle and a blue circle drawn on a subplot with ticks from 0 to 5 on both axes.]
Best practice: Visual illustrations
Step 1
Create a Figure object and subplots

Best practice Step 1
fig, ax = plt.subplots(1, 1, figsize=(16, 8))

Best practice Step 2
x = np.arange(0, 10, 0.1)
y = np.sin(x)
ax.scatter(x, y)
[Figure: scatter plot of sin(x) for x between 0 and 10.]
Best practice: Visual illustrations
Step 3
Set colors, markers and line styles

Best practice Step 3
ax.scatter(x, y, color="green", marker="s")
[Figure: scatter plot titled "Sine wave" with the X-label "x-value".]
Best practice: Visual illustrations
Step 5
Set labels

[Figure: plot titled "Sine wave" with the X-label "x-value" and a legend with the entries "Linear" and "Sine".]
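
The best practice steps can be combined into one short script. This is only a sketch under stated assumptions: the "Agg" backend is used, the title "Sine wave", the X-label "x-value" and the legend entry "Sine" are taken from the figures, and the order of the title and legend calls is a guess because the extracted text does not show every step.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

# Step 1: create a Figure object and a subplot.
fig, ax = plt.subplots(1, 1, figsize=(16, 8))

# Steps 2 and 3: plot the data and set color and marker.
x = np.arange(0, 10, 0.1)
y = np.sin(x)
ax.scatter(x, y, color="green", marker="s", label="Sine")

# Remaining steps (assumed order): title, axis label and legend.
ax.set_title("Sine wave")
ax.set_xlabel("x-value")
ax.legend(loc="best")

print(ax.get_title(), ax.get_xlabel())
```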
Section 4.4
Visual illustrations: Pandas layers
Line plots
DataFrame/Series.plot(): Plots a DataFrame or a Series.

Simple line plot
plt.close("all")
p = pd.Series(np.random.rand(10).cumsum(),
              index=np.arange(0, 1000, 100))
p
## 400    2.151883
## 500    2.776987
## 600    2.839751
## 700    3.188431
## 800    4.169061
## 900    4.923286
## dtype: float64

p.plot()
plt.savefig("out/line.pdf")
[Figure: line plot of the Series p.]
Line plots

Line plots
df = pd.DataFrame(np.random.randn(10, 3), index=np.arange(10),
                  columns=["a", "b", "c"])
df
##           a         b         c
## 0  1.703615 -1.376905 -1.336154
## 1 -1.402924  0.812501  1.739143
## 2  0.593504  0.699582  0.423217
## 3  1.140647 -1.454363  0.250578
## 4 -0.044809  0.438279 -0.821514
## 5  1.897959 -0.254581  0.157704
## 6  0.782639  1.196116  0.763081
## 7  0.577947  1.815039  1.175842
## 8 -0.278585 -0.538956  0.102930
## 9 -0.091891  0.310788 -0.857167

df.plot(figsize=(15, 12))
plt.savefig("out/line2.pdf")
[Figure: line plot of the DataFrame columns a, b and c with a legend.]
Plotting and pandas
The plot method applied to a DataFrame plots each column as a
different line and shows the legend automatically. When plotting DataFrames,
several arguments are available to change the style of the plot:

Argument    Description
kind        "line", "bar", etc.
logy        Logarithmic scale on Y-axis
use_index   If True, use index for tick labels
rot         Rotation of tick labels
xticks      Values for X-ticks
yticks      Values for Y-ticks
grid        Set grid True or False
xlim        X-axis limits
ylim        Y-axis limits
subplots    Plot each DataFrame column in a new subplot
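
A few of these arguments can be combined in one call. The sketch below assumes the "Agg" backend and a seeded random DataFrame. It shows that subplots=True returns one subplot per column and that kind="bar" draws one bar per value.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
df = pd.DataFrame(rng.standard_normal((10, 3)), columns=["a", "b", "c"])

# One subplot per column, with grid lines, rotated tick labels
# and fixed Y-axis limits.
axes = df.plot(subplots=True, grid=True, rot=45, ylim=(-4, 4))
n_axes = len(axes)
ylims = [ax.get_ylim() for ax in axes]

# kind="bar" switches the plot type; a single subplot is returned.
bar_ax = df.plot(kind="bar")
n_bars = len(bar_ax.patches)    # 10 rows x 3 columns
print(n_axes, ylims, n_bars)
```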
Pandas plot

Separated line plots
df.plot(grid=True, rot=45, subplots=True, title="Example",
        figsize=(15, 10))
plt.savefig("out/pandas.pdf")

[Figure: three stacked subplots titled "Example", one for each of the columns a, b and c, with grid lines.]
Standard creation of plots and pandas
dataframe.plot(ax=subplot): Plots a dataframe into subplot.

Standard creation
fig = plt.figure(figsize=(6, 6))
ax = fig.add_subplot(1, 1, 1)
guests = np.array([[1334, 456], [1243, 597], [1477, 505],
                   [1502, 404], [854, 512], [682, 0]])
canteen = pd.DataFrame(guests,
                       index=["Mon", "Tue", "Wed",
                              "Thu", "Fri", "Sat"],
                       columns=["Zentral", "Turm"])
canteen
##      Zentral  Turm
## Mon     1334   456
## Tue     1243   597
## Wed     1477   505
## Thu     1502   404
## Fri      854   512
## Sat      682     0
Standard creation of plots and pandas

Bar plot
canteen.plot(ax=ax, kind="bar")
ax.set_ylabel("guests", fontsize=20)
ax.set_title("Canteen use in Göttingen", fontsize=20)
fig.savefig("out/canteen.pdf")
[Figure: grouped bar chart "Canteen use in Göttingen" with the columns Zentral and Turm for Mon to Sat; Y-axis: guests.]
Bar plot

Bar plot - stacked
canteen.plot(ax=ax, kind="bar", stacked=True)
ax.set_ylabel("guests", fontsize=20)
ax.set_title("Canteen use in Göttingen", fontsize=20)
fig.savefig("out/canteenstacked.pdf")
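
Because the stacked listing draws into the same subplot as the grouped bar plot before it, both sets of bars and both legends end up in one picture. The following sketch, assuming the "Agg" backend, rebuilds the canteen DataFrame and draws the stacked version into a fresh subplot. It then checks that the Turm bar for Monday starts where the Zentral bar ends.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

guests = np.array([[1334, 456], [1243, 597], [1477, 505],
                   [1502, 404], [854, 512], [682, 0]])
canteen = pd.DataFrame(guests,
                       index=["Mon", "Tue", "Wed", "Thu", "Fri", "Sat"],
                       columns=["Zentral", "Turm"])

fig, ax = plt.subplots(figsize=(6, 6))   # fresh subplot, nothing drawn yet
canteen.plot(ax=ax, kind="bar", stacked=True)
ax.set_ylabel("guests")

# Each column is one bar container; element 0 is the bar for Monday.
zentral_mon = ax.containers[0][0]
turm_mon = ax.containers[1][0]
stack_top = turm_mon.get_y() + turm_mon.get_height()
print(zentral_mon.get_height(), turm_mon.get_y(), stack_top)
```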
[Figure: bar chart "Canteen use in Göttingen" with the stacked bars drawn over the earlier grouped bars; the legend lists Zentral and Turm twice; Y-axis: guests.]
Plot financial data

BTC chart
fig = plt.figure(figsize=(16, 8))
ax = fig.add_subplot(1, 1, 1)
ax.set_ylabel("price", fontsize=20)
ax.set_xlabel("Date", fontsize=20)
BTC = pd.read_csv("data/btc-eur.csv", index_col=0, parse_dates=True)
BTCclose = BTC["Close"]
BTCclose.plot(ax=ax)
ax.set_title("BTC-EUR", fontsize=20)
fig.savefig("out/btc.pdf")
[Figure: line plot titled "BTC-EUR" of the closing price against the date; Y-axis: price.]
Compare - bad illustration
amazon = pd.read_csv("data/amzn.csv", index_col=0,
                     parse_dates=True)["Close"]
siemens = pd.read_csv("data/sie.de.csv", index_col=0,
                      parse_dates=True)["Close"]
fig = plt.figure(figsize=(16, 8))
ax = fig.add_subplot(1, 1, 1)
ax.set_ylabel("price")
amazon.plot(ax=ax, label="Amazon")
siemens.plot(ax=ax, label="Siemens")
ax.legend(loc="best")
fig.savefig("out/compare.pdf")

In this illustration you can hardly compare the trends of the two stocks, because their price levels differ widely.
Using pandas, you can rebase each series to a common starting value in one line.

[Figure: Amazon and Siemens closing prices plotted on one axis; y-axis: price; x-axis: Date; legend: Amazon, Siemens]

Compare - good illustration
# Rebase both series to 100 at the first observation.
# .iloc[0] selects by position; plain [0] on a date-indexed Series is deprecated.
amazon = amazon / amazon.iloc[0] * 100
siemens = siemens / siemens.iloc[0] * 100
fig = plt.figure(figsize=(16, 8))
ax = fig.add_subplot(1, 1, 1)
ax.set_ylabel("percentage")
amazon.plot(ax=ax, label="Amazon")
siemens.plot(ax=ax, label="Siemens")
ax.legend(loc="best")
fig.savefig("out/comparenew.pdf")

[Figure: Amazon and Siemens rebased to 100 at the first observation; y-axis: percentage; x-axis: Date; legend: Amazon, Siemens]

Chapter 5
Applications

5.1 Time series

Date and time data types

Data types for date and time are included in the Python standard library.

Datetime creation
from datetime import datetime
now = datetime.now()
now.day
## 28
now.hour
## 16

A datetime object provides the attributes year, month, day, hour, minute, second and microsecond.

Set datetime

datetime(year, month, day, ..., microsecond): Sets date and time.

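To make the constructor concrete, here is a minimal sketch. The date and the name exam are purely illustrative assumptions and are not taken from the course material; only year, month and day are required, the remaining fields default to zero.

```python
from datetime import datetime

# Hypothetical exam date (illustrative values only): 30 July 2019, 10:00
exam = datetime(2019, 7, 30, 10, 0)

# Fields that are not passed (here: second, microsecond) default to zero
print(exam)
print(exam.year, exam.month, exam.day, exam.hour, exam.second)
```

The individual components are then available as attributes, just as for datetime.now().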
Time difference

timedelta(days, seconds, microseconds): Represents the difference between two datetime objects.

Datetime difference
from datetime import timedelta
delta = exam - now
delta

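Since the result of the subtraction depends on the current time, the following sketch uses two fixed timestamps instead. Both dates and the names start and exam are illustrative assumptions. It shows that subtracting two datetime objects yields a timedelta, and that a timedelta can in turn be added to a datetime.

```python
from datetime import datetime, timedelta

# Fixed, illustrative timestamps so that the result is reproducible
start = datetime(2019, 4, 28, 16, 0)
exam = datetime(2019, 7, 30, 10, 0)

# Subtracting two datetime objects yields a timedelta
delta = exam - start
print(delta.days, delta.seconds)

# A timedelta can also be added to a datetime
one_week_later = start + timedelta(days=7)
print(one_week_later)
```

A timedelta stores only days, seconds and microseconds; hours and minutes are folded into the seconds attribute.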
Convert string and datetime

datetime.strftime("format"): Converts a datetime object into a string.
datetime.strptime(datestring, "format"): Converts a date given as a string into a datetime object.

Convert Datetime
stamp = datetime(2018, 4, 12)
stamp
## datetime.datetime(2018, 4, 12, 0, 0)
print("German date format: " + stamp.strftime("%d.%m.%Y"))
## German date format: 12.04.2018
val = "2018-5-5"
d = datetime.strptime(val, "%Y-%m-%d")
d
## datetime.datetime(2018, 5, 5, 0, 0)

Converting examples
val = "31.01.2012"
d = datetime.strptime(val, "%d.%m.%Y")
d
## datetime.datetime(2012, 1, 31, 0, 0)
now.strftime("Today is %A and we are in week %W of the year %Y.")
## 'Today is Sunday and we are in week 16 of the year 2019.'
now.strftime("%c")
## 'Sun 28 Apr 2019 04:26:48 PM '

Overview: Datetime formats

Type   Description
%Y     4-digit year
%m     2-digit month [01, 12]
%d     2-digit day [01, 31]
%H     Hour (24-hour clock) [00, 23]
%I     Hour (12-hour clock) [01, 12]
%M     2-digit minute [00, 59]
%S     Second [00, 61]
%W     Week number of the year [00, 53]
%F     Shortcut for %Y-%m-%d

Type   Description
%a     Abbreviated weekday name
%A     Full weekday name
%b     Abbreviated month name
%B     Full month name
%c     Full date and time
%x     Locale-appropriate formatted date

Generating date ranges with pandas

pd.date_range(start, end, freq): Generates a date range.

Date ranges
import pandas as pd
index = pd.date_range("2018-01-01", now)
index[0:2]
## DatetimeIndex(['2018-01-01', '2...ype='datetime64[ns]', freq='D')
index[15:16]
## DatetimeIndex(['2018-01-16'], dtype='datetime64[ns]', freq='D')
# "M" means month end; pandas 2.2 and later spell this alias "ME"
index = pd.date_range("2018-01-01", now, freq="M")
index[0:2]
## DatetimeIndex(['2018-01-31', '2...ype='datetime64[ns]', freq='M')

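Because the ranges above end at the current time, their length changes from day to day. The following sketch uses fixed, illustrative end dates to show how the freq argument and the alternative periods argument shape the result. The variable names are assumptions chosen for this example.

```python
import pandas as pd

# Daily range between two fixed dates; both endpoints are included
days = pd.date_range("2018-01-01", "2018-01-10", freq="D")
print(len(days))

# Business days only: Saturdays and Sundays are skipped
bdays = pd.date_range("2018-01-01", "2018-01-10", freq="B")
print(len(bdays))

# A range can also be defined by a start date and a number of periods
weeks = pd.date_range("2018-01-01", periods=3, freq="W-MON")
print(list(weeks.strftime("%Y-%m-%d")))
```

Note that the business-day frequency knows about weekends only; public holidays are not removed.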
Overview: Time series frequencies

Alias                    Offset type
D                        Day
B                        Business day
H                        Hour
T                        Minute
S                        Second
M                        Month end
BM                       Business month end
Q-JAN, Q-FEB, ...        Quarter end
A-JAN, A-FEB, ...        Year end
AS-JAN, AS-FEB, ...      Year begin
BA-JAN, BA-FEB, ...      Business year end
BAS-JAN, BAS-FEB, ...    Business year begin

Note: pandas 2.2 and later deprecate several of these spellings in favour of new ones, e.g. h (hour), min (minute), s (second), ME (month end), BME (business month end), QE (quarter end) and YE (year end).

Resample date ranges

DataFrame.resample("frequency"): Resamples time series by a specified frequency.

Resample date ranges
import numpy as np
start = datetime(2016, 1, 1)
ind = pd.date_range(start, now)
numbers = np.arange((now - start).days + 1)
df = pd.DataFrame(numbers, index=ind)

df.head()
##             0
## 2016-01-01  0
## 2016-01-02  1
## 2016-01-03  2
## 2016-01-04  3
## 2016-01-05  4

df.resample("3BM").sum().head()
##                 0
## 2016-01-29    406
## 2016-04-29   6734
## 2016-07-29  15015
## 2016-10-31  24205
## 2017-01-31  32246

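Resampling groups the observations into bins of the new frequency and then aggregates each bin. A small sketch with two weeks of illustrative daily data makes the binning easy to verify by hand. The data and names are assumptions, and the weekly alias "W" (weeks ending on Sunday) is used here instead of "3BM".

```python
import pandas as pd

# 14 consecutive days starting on Monday, 1 January 2018, with values 0..13
ind = pd.date_range("2018-01-01", periods=14, freq="D")
daily = pd.Series(range(14), index=ind)

# Weekly bins ending on Sunday; each bin is labelled with its last day
weekly = daily.resample("W").sum()
print(weekly)
```

The first bin contains the values 0 to 6, the second the values 7 to 13, so the two weekly sums can be checked directly against the input.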
5.2 Moving window

Moving window functions

DataFrame.rolling(window): Conducts rolling window computations.

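As a minimal sketch of what a rolling window computation does, consider a rolling mean over three observations. The prices are illustrative and not taken from the course data. Positions with fewer than three available observations yield NaN unless min_periods is lowered.

```python
import pandas as pd

# Illustrative prices, not taken from the course data
prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])

# Mean over a sliding window of 3 observations; the first two positions
# have fewer than 3 observations and are therefore NaN
rolling_mean = prices.rolling(window=3).mean()
print(rolling_mean.tolist())

# min_periods relaxes that requirement at the start of the series
partial = prices.rolling(window=3, min_periods=1).mean()
print(partial.tolist())
```

The same pattern works on a DataFrame, where the window is applied to every column separately.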
[Figure: "Amazon price and rolling mean"; y-axis: price; x-axis: Date; legend: Amazon, Rolling mean]

Standard deviation
fig = plt.figure(figsize=(16, 8))
ax = fig.add_subplot(1, 1, 1)
pfizer = pd.read_csv("data/pfe.csv", index_col=0,
                     parse_dates=True)["Adj Close"]
pg = pd.read_csv("data/pg.csv", index_col=0,
                 parse_dates=True)["Adj Close"]
# Collect the three series in one DataFrame, aligned on Amazon's dates
prices = pd.DataFrame(index=amazon.index)
prices["amazon"] = amazon
prices["pfizer"] = pfizer
prices["pg"] = pg
prices_std = prices.rolling(window=20).std()
prices_std.plot(ax=ax)
ax.set_title("Standard deviation", fontsize=25)
fig.savefig("out/std.pdf")

[Figure: "Standard deviation"; x-axis: Date; legend: amazon, pfizer, pg]

Logarithmic standard deviation
fig = plt.figure(figsize=(16, 8))
ax = fig.add_subplot(1, 1, 1)
prices_std.plot(ax=ax, logy=True)
ax.set_title("Logarithmic standard deviation", fontsize=25)
fig.savefig("out/std_log.pdf")

[Figure: "Logarithmic standard deviation" with logarithmic y-axis; x-axis: Date; legend: amazon, pfizer, pg]

Exponentially weighted functions

DataFrame.ewm(span): Computes exponentially weighted rolling window functions.

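To see how the span argument translates into weights, the following sketch computes an exponentially weighted mean for three illustrative values. It passes adjust=False, which is an assumption made here to obtain the simple recursive form; the pandas default is adjust=True, which normalises the weights at the start of the series instead.

```python
import pandas as pd

# Illustrative values, not taken from the course data
prices = pd.Series([1.0, 2.0, 3.0])

# span=3 corresponds to a smoothing factor alpha = 2 / (span + 1) = 0.5.
# adjust=False selects the recursive form
#   y[0] = x[0],  y[t] = (1 - alpha) * y[t-1] + alpha * x[t]
exp_mean = prices.ewm(span=3, adjust=False).mean()
print(exp_mean.tolist())
```

Compared with a plain rolling mean, recent observations receive more weight, so the exponentially weighted mean reacts faster to changes.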
[Figure: "Exponentially weighted functions"; x-axis: Date; legend: Rolling mean, Exp mean, Amazon price]

Binary moving window functions

DataFrame.pct_change(): Computes the relative change from one period to the next, expressed as a fraction (0.05 corresponds to 5 %).

Percentage change
fig = plt.figure(figsize=(16, 8))
ax = fig.add_subplot(1, 1, 1)

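A short sketch with illustrative prices shows what pct_change() returns. The later slides refer to an object named returns; presumably it is obtained in the same way from prices, but that is an assumption, since the defining line is not shown here.

```python
import pandas as pd

# Illustrative prices, not taken from the course data
prices = pd.Series([100.0, 110.0, 99.0])

# Relative change to the previous observation; the first entry has no
# predecessor and is therefore NaN
returns = prices.pct_change()
print(returns.round(4).tolist())
```

The result is a fraction, so a rise from 100 to 110 appears as roughly 0.10 and not as 10.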
[Figure: "Returns"; x-axis: Date; legend: amazon, pfizer, pg]

DataFrame.rolling().corr(benchmark): Computes the rolling correlation between two time series.

Correlation
fig = plt.figure(figsize=(16, 8))
ax = fig.add_subplot(1, 1, 1)
DJI = pd.read_csv("data/dji.csv", index_col=0,
                  parse_dates=True)["Adj Close"]
DJI_ret = DJI.pct_change()
# Correlation of each stock's returns with the index returns over 20 days
corr = returns.rolling(window=20).corr(DJI_ret)
corr.plot(ax=ax)
ax.grid()
ax.set_title("20 days correlation", fontsize=25)
fig.savefig("out/corr.pdf")

[Figure: "20 days correlation"; x-axis: Date; legend: amazon, pfizer, pg]

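The behaviour of the rolling correlation is easiest to check on constructed data. In the sketch below, all series and column names are illustrative assumptions: one column is an increasing linear function of the benchmark and the other is its negative, so the rolling correlations should be close to +1 and -1.

```python
import pandas as pd

# Illustrative benchmark series
benchmark = pd.Series([1.0, 2.0, 4.0, 7.0, 11.0])

# Two constructed columns: one moves exactly with the benchmark,
# the other exactly against it
data = pd.DataFrame({"up": 2.0 * benchmark + 1.0,
                     "down": -benchmark})

# Correlation of each column with the benchmark over a window of 3
corr = data.rolling(window=3).corr(benchmark)
print(corr.round(4))
```

As with other rolling computations, the first window - 1 rows are NaN because the window is not yet complete.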
5.3 Financial applications

Cumulative returns

Returns
fig = plt.figure(figsize=(16, 8))
ax = fig.add_subplot(1, 1, 1)
ret_index = (1 + returns).cumprod()
stocks = ["amazon", "pfizer", "pg"]
# The first return is NaN, so start every index at 1.
# A single .loc assignment avoids chained indexing such as ret_index[i][0].
ret_index.loc[ret_index.index[0], stocks] = 1
ret_index.tail()
##               amazon    pfizer        pg
## Date
## 2018-02-15  1.715298  1.088693  0.932322
## 2018-02-16  1.699961  1.105461  0.934471
## 2018-02-20  1.723031  1.097840  0.920217
## 2018-02-21  1.740128  1.090218  0.907772
## 2018-02-22  1.742968  1.090218  0.914560

ret_index.plot(ax=ax)
ax.set_title("Cumulative returns", fontsize=25)
fig.savefig("out/cumret.pdf")

[Figure: "Cumulative returns"; x-axis: Date; legend: amazon, pfizer, pg]

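The cumulative product of (1 + return) reproduces the price path relative to its starting value. The following sketch checks this on three illustrative prices. Filling the first missing return with zero is a choice made for this example; it has the same effect as setting the first index value to 1 afterwards.

```python
import pandas as pd

# Illustrative prices, not taken from the course data
prices = pd.Series([100.0, 110.0, 99.0])

# Treat the first period as a zero return so that the index starts at 1
returns = prices.pct_change().fillna(0)
ret_index = (1 + returns).cumprod()
print(ret_index.round(4).tolist())

# The return index equals each price divided by the first price
relative = prices / prices.iloc[0]
print(relative.round(4).tolist())
```

Both printed lists agree, which is why a return index is a convenient way to compare assets with very different price levels.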
Monthly returns
returns_m = ret_index.resample("BM").last().pct_change()
returns_m.head()
##               amazon    pfizer        pg
## Date
## 2017-02-28       NaN       NaN       NaN
## 2017-03-31  0.049110  0.002638 -0.013396
## 2017-04-28  0.043371 -0.008477 -0.020604
## 2017-05-31  0.075276 -0.028124  0.008703
## 2017-06-30 -0.026764  0.028790 -0.010671

Volatility calculation

Volatility
fig = plt.figure(figsize=(16, 8))
ax = fig.add_subplot(1, 1, 1)
# 20-day rolling standard deviation of the returns, scaled by sqrt(20)
# to express it over a 20-day horizon
vola = returns.rolling(window=20).std() * np.sqrt(20)
vola.plot(ax=ax)
ax.set_title("Volatility", fontsize=25)
fig.savefig("out/vola.pdf")

[Figure: "Volatility"; legend: amazon, pfizer, pg]

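The factor np.sqrt(20) can be read as the square-root-of-time rule: if daily returns are assumed to be uncorrelated with constant variance, the variance over 20 days is 20 times the daily variance, so the standard deviation scales with the square root of 20. The sketch below applies the same computation to illustrative returns whose standard deviation can be verified by hand; the data and names are assumptions.

```python
import numpy as np
import pandas as pd

# Illustrative daily returns alternating between +1 % and -1 %
returns = pd.Series([0.01, -0.01] * 20)

window = 20
# Sample standard deviation (ddof=1) over the last 20 observations
daily_std = returns.rolling(window=window).std()
# Scaled to a horizon of 20 days
vola = daily_std * np.sqrt(window)

print(round(daily_std.iloc[-1], 6), round(vola.iloc[-1], 6))
```

Each full window contains ten values of +0.01 and ten of -0.01, so its sample standard deviation is 0.01 * sqrt(20 / 19).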
Group analysis

DataFrame.describe(): Shows a statistical summary.

Describe
prices.describe()
##             amazon      pfizer          pg
## count   252.000000  251.000000  252.000000
## mean   1044.521903   33.892665   87.934304
## std     158.041844    1.694680    2.728659
## min     843.200012   30.872143   79.919998
## 25%     953.567474   32.593733   86.241475
## 50%     988.680023   33.147469   87.863598
## 75%    1136.952484   35.331834   90.363035
## max    1485.339966   38.661823   92.988976

Return analysis

Histogram
fig, ax = plt.subplots(3, 1, figsize=(10, 8), sharex=True)
for i in range(3):
    ax[i].set_title(stocks[i])
    returns[stocks[i]].hist(ax=ax[i], bins=50)
fig.savefig("out/return_hist.pdf")

[Figure: histograms of the returns in three stacked panels (amazon, pfizer, pg) with a shared x-axis]

Ordinary Least Squares

Using the statsmodels module to estimate regressions:
Series.tolist(): Returns a list containing the Series values.
sm.OLS(Y, X).fit(): Computes the OLS fit of the data (X, Y).

Regression data
import statsmodels.api as sm
fig = plt.figure(figsize=(16, 8))
ax = fig.add_subplot(1, 1, 1)
Y = np.array(amazon.loc["2018-1-1":"2018-1-15"].tolist())
X = np.arange(len(Y))
ax.scatter(x=X, y=Y, marker="o", color="red")
fig.savefig("out/reg_data.pdf")

[Figure: scatter plot of the regression data, Amazon prices against the observation index 0 to 8]

Regression
# Prepend a column of ones so that the model includes an intercept
X_reg = sm.add_constant(X)
res = sm.OLS(Y, X_reg).fit()
# params holds the intercept first, then the slope
b, a = res.params
ax.plot(X, a * X + b)
fig.savefig("out/ols.pdf")

Summary of the OLS regression. To print it in Python, use res.summary().

[Figure: regression data with the fitted OLS line]

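To illustrate what the OLS fit computes, the following sketch solves the same least-squares problem with NumPy's np.linalg.lstsq instead of statsmodels. This is a swapped-in technique for illustration, not the course's method. The data are constructed to lie exactly on a line, so the estimated intercept and slope can be checked directly.

```python
import numpy as np

# Illustrative data lying exactly on the line Y = 1 + 2 * X
X = np.arange(5, dtype=float)
Y = 2.0 * X + 1.0

# Design matrix with a leading column of ones for the intercept,
# mirroring what sm.add_constant(X) produces
X_reg = np.column_stack([np.ones_like(X), X])

# Least-squares solution of X_reg @ [b, a] = Y
(b, a), *_ = np.linalg.lstsq(X_reg, Y, rcond=None)
print(f"intercept b = {b:.4f}, slope a = {a:.4f}")
```

Unlike sm.OLS, this returns only the coefficients; standard errors, t-statistics and the other entries of res.summary() are not computed.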
Newton-Raphson

The Newton-Raphson method is an algorithm for finding successively better approximations to the roots of real-valued functions.

Let $F: \mathbb{R}^k \to \mathbb{R}^k$ be a continuously differentiable function and $J_F(x_n)$ the Jacobian matrix of $F$. The recursive Newton-Raphson method to find the root of $F$ is given by:

$$x_{n+1} = x_n - J_F(x_n)^{-1} F(x_n)$$

Accordingly, we can determine the optimum of the function $f$ by applying the method instead to $f' = df/dx$.

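The recursion for a vector-valued function can be sketched with NumPy as follows. The system F, its Jacobian J, the starting point and the function name newton_system are illustrative assumptions: F describes the intersection of a circle of radius 2 with the line x = y. Instead of inverting the Jacobian, the sketch solves a linear system in each step, which is numerically preferable.

```python
import numpy as np

def F(v):
    """Illustrative system: circle of radius 2 intersected with the line x = y."""
    x, y = v
    return np.array([x**2 + y**2 - 4.0, x - y])

def J(v):
    """Jacobian matrix of F."""
    x, y = v
    return np.array([[2.0 * x, 2.0 * y],
                     [1.0, -1.0]])

def newton_system(F, J, x0, tol=1e-10, max_iter=50):
    """Iterate x_{n+1} = x_n - J(x_n)^{-1} F(x_n) until ||F(x_n)|| <= tol."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(Fx) <= tol:
            return x
        # Solve J(x) * step = F(x) rather than forming the inverse explicitly
        x = x - np.linalg.solve(J(x), Fx)
    raise RuntimeError("Newton-Raphson did not converge")

root = newton_system(F, J, x0=[1.0, 2.0])
print(root)
```

Starting from (1, 2), the iteration approaches the intersection point in the positive quadrant, where both coordinates equal the square root of 2.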
As an illustrative application, we consider the function

$$f(x) = 3x^3 + 3x^2 - 5x, \quad x \in \mathbb{R},$$

which is represented by the blue line in the following diagram. The figure depicts the iterative solution path of the Newton-Raphson method for finding a root, i.e., an $x$ solving $f(x) = 0$, by tangent points and tangents starting from the initial guess $x_0 = -1$.

[Figure: f(x) on the interval from -1.5 to 1.5 with the tangent points x0, x1, x2 and x3 of the Newton-Raphson iteration]

Newton-Raphson implementation

The first step involves the definition of the function $f(x)$ and its derivative $f'(x)$ in Python:

Newton-Raphson requirements
def f(x):
    return 3*x**3 + 3*x**2 - 5*x

def df(x):
    return 9*x**2 + 6*x - 5

Newton-Raphson
def newton_raphson(fun, dfun, x0, e):
    # Iterate until |fun(x0)| falls below the tolerance e.
    # The markers are drawn on the graph of f, using the global axes ax.
    delta = abs(fun(x0))
    while delta > e:
        ax.scatter(x0, f(x0), color="red", s=80)
        x0 = x0 - fun(x0) / dfun(x0)
        delta = abs(fun(x0))
    ax.scatter(x0, f(x0), color="black", s=80)
    return x0

fig = plt.figure(figsize=(16, 8))
ax = fig.add_subplot(1, 1, 1)
x = np.arange(-1.5, 1.7, 0.001)
ax.plot(x, f(x))
ax.grid()
x_root = newton_raphson(f, df, -1, 0.1)
fig.savefig("out/newton_raphson_root.pdf")
print(f"Root at: {x_root:.4f}")
## Root at: 0.8878

[Figure: f(x) with the Newton-Raphson iterates marked; red: intermediate points, black: final point]

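The implementation above stops as soon as |f(x)| drops below 0.1, so the reported root is only a rough approximation. The following variant is a sketch without plotting that uses a much tighter tolerance and guards against a vanishing derivative and against endless loops. The function name newton_raphson_plain and the default values of tol and max_iter are assumptions made for this example.

```python
def f(x):
    return 3*x**3 + 3*x**2 - 5*x

def df(x):
    return 9*x**2 + 6*x - 5

def newton_raphson_plain(fun, dfun, x0, tol=1e-10, max_iter=100):
    """Newton-Raphson without plotting; returns the root and the iteration count."""
    x = x0
    for n in range(max_iter):
        fx = fun(x)
        if abs(fx) <= tol:
            return x, n
        slope = dfun(x)
        if slope == 0:
            raise ZeroDivisionError("derivative vanished; try another initial guess")
        x = x - fx / slope
    raise RuntimeError("no convergence within max_iter iterations")

x_root, steps = newton_raphson_plain(f, df, -1.0)
print(f"Root at: {x_root:.6f} after {steps} iterations")
```

Since f(x) = x(3x^2 + 3x - 5), the positive root can be written in closed form as (-3 + sqrt(69)) / 6, which gives a direct check of the numerical result.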
Newton-Raphson optimization

With the definition of the second derivative $f''$, i.e. the derivative of the derivative, we can employ the Newton-Raphson method to obtain an optimum of the target function $f(x)$ numerically. Hence, the previous example needs only minimal modifications:

Newton-Raphson
def ddf(x):
    return 18*x + 6

fig = plt.figure(figsize=(16, 8))
ax = fig.add_subplot(1, 1, 1)
x = np.arange(-1.5, 1.7, 0.001)
ax.plot(x, f(x))
ax.grid()
# Find a root of f' by passing df as the function and ddf as its derivative
x_opt = newton_raphson(df, ddf, 1, 0.1)
fig.savefig("out/newton_raphson_optimum.pdf")
print(f"Minimum at: {x_opt:.4f}")
## Minimum at: 0.4886

[Figure: f(x) with the Newton-Raphson iterates towards the minimum marked]

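A root of the first derivative is only a stationary point; whether it is a minimum or a maximum follows from the sign of the second derivative there. The sketch below repeats the optimization without plotting and with a tight tolerance, and then classifies the result. The tolerance, the iteration cap and the variable names are assumptions for this example.

```python
def df(x):
    return 9*x**2 + 6*x - 5      # first derivative of f(x) = 3x^3 + 3x^2 - 5x

def ddf(x):
    return 18*x + 6              # second derivative

x = 1.0                          # initial guess used on the slide
for _ in range(100):
    if abs(df(x)) <= 1e-12:
        break
    x = x - df(x) / ddf(x)       # Newton step applied to f'

# The sign of the second derivative classifies the stationary point
if ddf(x) > 0:
    kind = "minimum"
elif ddf(x) < 0:
    kind = "maximum"
else:
    kind = "undetermined"
print(f"Stationary point at {x:.6f} is a local {kind}")
```

Solving 9x^2 + 6x - 5 = 0 by hand gives x = (-1 + sqrt(6)) / 3 for the positive stationary point, which the iteration approaches from the initial guess 1.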
The End... but not finally