Python For Finance

Uploaded by

Eduardo Antonio
Copyright
© © All Rights Reserved
Python for Finance
Analyze Big Financial Data
Yves Hilpisch
Beijing • Cambridge • Farnham • Köln • Sebastopol • Tokyo
Preface
Not too long ago, Python as a programming language and platform technology was considered exotic, if not completely irrelevant, in the financial industry. By contrast, in 2014 there are many examples of large financial institutions (like Bank of America Merrill Lynch with its Quartz project, or JP Morgan Chase with the Athena project) that strategically use Python alongside other established technologies to build, enhance, and maintain some of their core IT systems. There is also a multitude of larger and smaller hedge funds that make heavy use of Python's capabilities when it comes to efficient financial application development and productive financial analytics efforts.
Similarly, many of today's Master of Financial Engineering programs (or programs awarding similar degrees) use Python as one of the core languages for teaching the translation of quantitative finance theory into executable computer code. Educational programs and trainings targeted to finance professionals are also increasingly incorporating Python into their curricula. Some now teach it as the main implementation language.
There are many reasons why Python has had such recent success and why it seems it will continue to do so in the future. Among these reasons are its syntax, the ecosystem of scientific and data analytics libraries available to developers using Python, its ease of integration with almost any other technology, and its status as open source. (See Chapter 1 for a few more insights in this regard.)
For that reason, there is an abundance of good books available that teach Python from different angles and with different focuses. This book is one of the first to introduce and teach Python for finance, in particular for quantitative finance and for financial analytics. The approach is a practical one, in that implementation and illustration come before theoretical details, and the big picture is generally given more attention than the most arcane parameterization options of a certain class or function.
Most of this book has been written in the powerful, interactive, browser-based IPython Notebook environment (explained in more detail in Chapter 2). This makes it possible to provide the reader with executable, interactive versions of almost all examples used in this book.
Those who want to immediately get started with a full-fledged, interactive financial analytics environment for Python (and, for instance, R and Julia) should go to http://oreilly.quant-platform.com and try out the Python Quant Platform (in combination with the IPython Notebook files and code that come with this book). You should also have a look at DX analytics, a Python-based financial analytics library. My other book, Derivatives Analytics with Python (Wiley Finance), presents more details on the theory and numerical methods for advanced derivatives analytics. It also provides a wealth of readily usable Python code. Further material, and, in particular, slide decks and videos of talks about Python for Quant Finance can be found on my private website.
If you want to get involved in Python for Quant Finance community events, there are opportunities in the financial centers of the world. For example, I myself (co)organize meetup groups with this focus in London (cf. http://www.meetup.com/Python-for-Quant-Finance-London/) and New York City (cf. http://www.meetup.com/Python-for-Quant-Finance-NYC/). There are also For Python Quants conferences and workshops several times a year (cf. http://forpythonquants.com and http://pythonquants.com).
I am really excited that Python has established itself as an important technology in the financial industry. I am also sure that it will play an even more important role there in the future, in fields like derivatives and risk analytics or high performance computing. My hope is that this book will help professionals, researchers, and students alike make the most of Python when facing the challenges of this fascinating field.
Conventions Used in This Book
The following typographical conventions are used in this book:
Italic
Indicates new terms, URLs, and email addresses.
Constant width
Used for program listings, as well as within paragraphs to refer to software packages, programming languages, file extensions, filenames, program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.
Constant width italic
Shows text that should be replaced with user-supplied values or by values determined by context.
TIP
This element signifies a tip or suggestion.
WARNING
This element indicates a warning or caution.
Using Code Examples
Supplemental material (in particular, IPython Notebooks and Python scripts/modules) is available for download at http://oreilly.quant-platform.com.
This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you're reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O'Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product's documentation does require permission.
We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: "Python for Finance by Yves Hilpisch (O'Reilly). Copyright 2015 Yves Hilpisch, 978-1-491-94528-5."
If you feel your use of code examples falls outside fair use or the permission given above, contact us at [email protected].
Safari® Books Online
NOTE
Safari Books Online is an on-demand digital library that delivers expert content in both book and video form from the world's leading authors in technology and business.
Technology professionals, software developers, web designers, and business and creative professionals use Safari Books Online as their primary resource for research, problem solving, learning, and certification training.
Safari Books Online offers a range of plans and pricing for enterprise, government, education, and individuals.
Members have access to thousands of books, training videos, and prepublication manuscripts in one fully searchable database from publishers like O'Reilly Media, Prentice Hall Professional, Addison-Wesley Professional, Microsoft Press, Sams, Que, Peachpit Press, Focal Press, Cisco Press, John Wiley & Sons, Syngress, Morgan Kaufmann, IBM Redbooks, Packt, Adobe Press, FT Press, Apress, Manning, New Riders, McGraw-Hill, Jones & Bartlett, Course Technology, and hundreds more. For more information about Safari Books Online, please visit us online.
How to Contact Us
Please address comments and questions concerning this book to the publisher:
O'Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)
We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at http://bit.ly/python-finance.
To comment or ask technical questions about this book, send email to [email protected].
For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com.
Find us on Facebook: http://facebook.com/oreilly
Follow us on Twitter: http://twitter.com/oreillymedia
Watch us on YouTube: http://www.youtube.com/oreillymedia
Acknowledgments
I want to thank all those who helped to make this book a reality, in particular those who have provided honest feedback or even completely worked out examples, like Ben Lerner, James Powell, Michael Schwed, Thomas Wiecki or Felix Zumstein. Similarly, I would like to thank reviewers Hugh Brown, Jennifer Pierce, Kevin Sheppard, and Galen Wilkerson. The book benefited from their valuable feedback and the many suggestions.
The book has also benefited significantly as a result of feedback I received from the participants of the many conferences and workshops I was able to present at in 2013 and 2014: PyData, For Python Quants, Big Data in Quant Finance, EuroPython, EuroScipy, PyCon DE, PyCon Ireland, Parallel Data Analysis, Budapest BI Forum and CodeJam. I also got valuable feedback during my many presentations at Python meetups in Berlin, London, and New York City.
Last but not least, I want to thank my family, which fully accepts that I do what I love doing most and this, in general, rather intensively. Writing and finishing a book of this length over the course of a year requires a large time commitment (on top of my usually heavy workload and packed travel schedule) and makes it necessary to sit sometimes more hours in solitude in front of the computer than expected. Therefore, thank you Sandra, Lilli, and Henry for your understanding and support. I dedicate this book to my lovely wife Sandra, who is the heart of our family.
Yves
Saarland, November 2014
Part I. Python and Finance
This part introduces Python for finance. It consists of three chapters:
Chapter 1 briefly discusses Python in general and argues why Python is indeed well suited to address the technological challenges in the finance industry and in financial (data) analytics.
Chapter 2, on Python infrastructure and tools, is meant to provide a concise overview of the most important things you have to know to get started with interactive analytics and application development in Python; the related Appendix A surveys some selected best practices for Python development.
Chapter 3 immediately dives into three specific financial examples; it illustrates how to calculate implied volatilities of options with Python, how to simulate a financial model with Python and the array library NumPy, and how to implement a backtesting for a trend-based investment strategy. This chapter should give the reader a feeling for what it means to use Python for financial analytics; details are not that important at this stage, since they are all explained in Part II.
Chapter 1. Why Python for Finance?
Banks are essentially technology firms.
-- Hugo Banziger
What Is Python?
Python is a high-level, multipurpose programming language that is used in a wide range of domains and technical fields. On the Python website you find the following executive summary (cf. https://www.python.org/doc/essays/blurb):
Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Its high-level built in data structures, combined with dynamic typing and dynamic binding, make it very attractive for Rapid Application Development, as well as for use as a scripting or glue language to connect existing components together. Python's simple, easy to learn syntax emphasizes readability and therefore reduces the cost of program maintenance. Python supports modules and packages, which encourages program modularity and code reuse. The Python interpreter and the extensive standard library are available in source or binary form without charge for all major platforms, and can be freely distributed.
This pretty well describes why Python has evolved into one of the major programming languages as of today. Nowadays, Python is used by the beginner programmer as well as by the highly skilled expert developer, at schools, in universities, at web companies, in large corporations and financial institutions, as well as in any scientific field.
Among others, Python is characterized by the following features:
Open source
Python and the majority of supporting libraries and tools available are open source and generally come with quite flexible and open licenses.
Interpreted
The reference CPython implementation is an interpreter of the language that translates Python code at runtime to executable byte code.
Multiparadigm
Python supports different programming and implementation paradigms, such as object orientation and imperative, functional, or procedural programming.
Multipurpose
Python can be used for rapid, interactive code development as well as for building large applications; it can be used for low-level systems operations as well as for high-level analytics tasks.
Cross-platform
Python is available for the most important operating systems, such as Windows, Linux, and Mac OS; it is used to build desktop as well as web applications; it can be used on the largest clusters and most powerful servers as well as on such small devices as the Raspberry Pi (cf. http://www.raspberrypi.org).
Dynamically typed
Types in Python are in general inferred during runtime and not statically declared as in most compiled languages.
Indentation aware
In contrast to the majority of other programming languages, Python uses indentation for marking code blocks instead of parentheses, brackets, or semicolons.
Garbage collecting
Python has automated garbage collection, avoiding the need for the programmer to manage memory.
When it comes to Python syntax and what Python is all about, Python Enhancement Proposal 20 (i.e., the so-called "Zen of Python") provides the major guidelines. It can be accessed from every interactive shell with the command import this:
$ ipython
Python 2.7.6 |Anaconda 1.9.1 (x86_64)| (default, Jan 10 2014, 11:23:15)
Type "copyright", "credits" or "license" for more information.

IPython 2.0.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
Brief History of Python
Although Python might still have the appeal of something new to some people, it has been around for quite a long time. In fact, development efforts began in the 1980s by Guido van Rossum from the Netherlands. He is still active in Python development and has been awarded the title of Benevolent Dictator for Life by the Python community (cf. http://en.wikipedia.org/wiki/History_of_Python). The following can be considered milestones in the development of Python:
Python 0.9.0 released in 1991 (first release)
Python 1.0 released in 1994
Python 2.0 released in 2000
Python 2.6 released in 2008
Python 2.7 released in 2010
Python 3.0 released in 2008
Python 3.3 released in 2012
Python 3.4 released in 2014
It is remarkable, and sometimes confusing to Python newcomers, that there are two major versions available, still being developed and, more importantly, in parallel use since 2008. As of this writing, this will keep on for quite a while since neither is there 100% code compatibility between the versions, nor are all popular libraries available for Python 3.x. The majority of code available and in production is still Python 2.6/2.7, and this book is based on the 2.7.x version, although the majority of code examples should work with versions 3.x as well.
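To make the compatibility remark concrete, here is a minimal sketch of two well-known differences between the 2.7.x and 3.x lines: printing and integer division. The variable names and the value assigned to C0 are placeholders chosen for illustration; they are not taken from the book's code files.

```python
# Placeholder value standing in for a computed option price.
C0 = 8.019

# A print call with a single parenthesized argument behaves the same
# under Python 2.7 and 3.x; the bare statement form `print "..."` is
# a syntax error under 3.x.
print("Value of the European Call Option %5.3f" % C0)

# 1 / 2 yields 0 under Python 2.7 (integer floor division) but 0.5
# under 3.x. Using a float operand, or the explicit // operator,
# gives the same result on both lines.
true_ratio = 1.0 / 2   # 0.5 on both versions
floor_ratio = 1 // 2   # 0 on both versions
print(true_ratio, floor_ratio)
```

Writing examples in this version-neutral style is one way to keep them usable on either line; it is a suggestion, not a convention the book itself states.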
The Python Ecosystem
A major feature of Python as an ecosystem, compared to just being a programming language, is the availability of a large number of libraries and tools. These libraries and tools generally have to be imported when needed (e.g., a plotting library) or have to be started as a separate system process (e.g., a Python development environment). Importing means making a library available to the current namespace and the current Python interpreter process.
Python itself already comes with a large set of libraries that enhance the basic interpreter in different directions. For example, basic mathematical calculations can be done without any importing, while more complex mathematical functions need to be imported through the math library:
In [2]: 100 * 2.5 + 50
Out[2]: 300.0
In [3]: log(1)
...
NameError: name 'log' is not defined
In [4]: from math import *
In [5]: log(1)
Out[5]: 0.0
Although the so-called "star import" (i.e., the practice of importing everything from a library via from library import *) is sometimes convenient, one should generally use an alternative approach that avoids ambiguity with regard to namespaces and relationships of functions to libraries. This then takes on the form:
In [6]: import math
In [7]: math.log(1)
Out[7]: 0.0
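The ambiguity mentioned above can be illustrated with two standard library modules that define functions of the same name. The sketch below is an illustration added here, not an example from the book: it uses math and cmath, both of which provide sqrt, and shows that qualified names make it obvious which implementation is called.

```python
import cmath  # complex-number counterpart of math (standard library)
import math

# Both modules define sqrt; the module prefix keeps them apart.
real_root = math.sqrt(4.0)       # float result
complex_root = cmath.sqrt(-4.0)  # complex result

print(real_root)
print(complex_root)

# math.sqrt rejects negative input, so it matters which sqrt a bare
# name would refer to after two star imports.
try:
    math.sqrt(-4.0)
except ValueError as err:
    print("math.sqrt failed:", err)
```

With star imports from both modules, the bare name sqrt would silently refer to whichever module was imported last.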
While math is a standard Python library available with any installation, there are many more libraries that can be installed optionally and that can be used in the very same fashion as the standard libraries. Such libraries are available from different (web) sources. However, it is generally advisable to use a Python distribution that makes sure that all libraries are consistent with each other (see Chapter 2 for more on this topic).
The code examples presented so far all use IPython (cf. http://www.ipython.org), which is probably the most popular interactive development environment (IDE) for Python. Although it started out as an enhanced shell only, it today has many features typically found in IDEs (e.g., support for profiling and debugging). Those features missing are typically provided by advanced text/code editors, like Sublime Text (cf. http://www.sublimetext.com). Therefore, it is not unusual to combine IPython with one's text/code editor of choice to form the basic tool set for a Python development process.
IPython is also sometimes called the killer application of the Python ecosystem. It enhances the standard interactive shell in many ways. For example, it provides improved command-line history functions and allows for easy object inspection. For instance, the help text for a function is printed by just adding a ? behind the function name (adding ?? will provide even more information):
In [8]: math.log?
Type:        builtin_function_or_method
String Form: <built-in function log>
Docstring:
log(x[, base])

Return the logarithm of x to the given base.
If the base not specified, returns the natural logarithm (base e) of x.

In [9]:
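The ? syntax is specific to IPython. As a rough plain-Python counterpart, the sketch below reads the same docstring with the standard library's inspect module; the variable name doc is chosen here for illustration, and the exact wording of the docstring depends on the Python version in use.

```python
import inspect
import math

# inspect.getdoc returns the cleaned-up docstring of an object,
# which is the text IPython displays under "Docstring:".
doc = inspect.getdoc(math.log)
print(doc)
```

In an interactive session, the built-in help(math.log) prints comparable information.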
IPython comes in three different versions: a shell version, one based on a QT graphical user interface (the QT console), and a browser-based version (the Notebook). This is just meant as a teaser; there is no need to worry about the details now since Chapter 2 introduces IPython in more detail.
Python User Spectrum
Python does not only appeal to professional software developers; it is also of use for the casual developer as well as for domain experts and scientific developers.
Professional software developers find all that they need to efficiently build large applications. Almost all programming paradigms are supported; there are powerful development tools available; and any task can, in principle, be addressed with Python. These types of users typically build their own frameworks and classes, also work on the fundamental Python and scientific stack, and strive to make the most of the ecosystem.
Scientific developers or domain experts are generally heavy users of certain libraries and frameworks, have built their own applications that they enhance and optimize over time, and tailor the ecosystem to their specific needs. These groups of users also generally engage in longer interactive sessions, rapidly prototyping new code as well as exploring and visualizing their research and/or domain data sets.
Casual programmers like to use Python generally for specific problems they know that Python has its strengths in. For example, visiting the gallery page of matplotlib, copying a certain piece of visualization code provided there, and adjusting the code to their specific needs might be a beneficial use case for members of this group.
There is also another important group of Python users: beginner programmers, i.e., those that are just starting to program. Nowadays, Python has become a very popular language at universities, colleges, and even schools to introduce students to programming.[1] A major reason for this is that its basic syntax is easy to learn and easy to understand, even for the nondeveloper. In addition, it is helpful that Python supports almost all programming styles.[2]
The Scientific Stack
There is a certain set of libraries that is collectively labeled the scientific stack. This stack comprises, among others, the following libraries:
NumPy
NumPy provides a multidimensional array object to store homogeneous or heterogeneous data; it also provides optimized functions/methods to operate on this array object.
SciPy
SciPy is a collection of sublibraries and functions implementing important standard functionality often needed in science or finance; for example, you will find functions for cubic splines interpolation as well as for numerical integration.
matplotlib
This is the most popular plotting and visualization library for Python, providing both 2D and 3D visualization capabilities.
PyTables
PyTables is a popular wrapper for the HDF5 data storage library (cf. http://www.hdfgroup.org/HDF5/); it is a library to implement optimized, disk-based I/O operations based on a hierarchical database/file format.
pandas
pandas builds on NumPy and provides richer classes for the management and analysis of time series and tabular data; it is tightly integrated with matplotlib for plotting and PyTables for data storage and retrieval.
Depending on the specific domain or problem, this stack is enlarged by additional libraries, which more often than not have in common that they build on top of one or more of these fundamental libraries. However, the least common denominator or basic building block in general is the NumPy ndarray class (cf. Chapter 4).
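As a small illustration of why the ndarray class serves as the common building block, the following sketch stores a handful of made-up index levels in an array and computes log returns with a single vectorized expression. The numbers and variable names are hypothetical and serve only to show the array object and its element-wise operations.

```python
import numpy as np

# Hypothetical index levels, stored in a homogeneous float array.
prices = np.array([100.0, 101.5, 99.8, 102.3])
print(prices.dtype, prices.shape)

# Element-wise operations work on whole arrays without explicit loops:
# the log return between consecutive observations.
log_returns = np.log(prices[1:] / prices[:-1])
print(log_returns)

# Log returns add up to the log return over the whole period.
total = log_returns.sum()
print(total)
```

Libraries higher up the stack, such as pandas, typically hold their data in arrays of this kind.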
Taking Python as a programming language alone, there are a number of other languages available that can probably keep up with its syntax and elegance. For example, Ruby is quite a popular language often compared to Python. On the language's website you find the following description:
A dynamic, open source programming language with a focus on simplicity and productivity. It has an elegant syntax that is natural to read and easy to write.
The majority of people using Python would probably also agree with the exact same statement being made about Python itself. However, what distinguishes Python for many users from equally appealing languages like Ruby is the availability of the scientific stack. This makes Python not only a good and elegant language to use, but also one that is capable of replacing domain-specific languages and tool sets like Matlab or R. In addition, it provides by default anything that you would expect, say, as a seasoned web developer or systems administrator.
Technology in Finance
Now that we have some rough ideas of what Python is all about, it makes sense to step back a bit and to briefly contemplate the role of technology in finance. This will put us in a position to better judge the role Python already plays and, even more importantly, will probably play in the financial industry of the future.
In a sense, technology per se is nothing special to financial institutions (as compared, for instance, to industrial companies) or to the finance function (as compared to other corporate functions, like logistics). However, in recent years, spurred by innovation and also regulation, banks and other financial institutions like hedge funds have evolved more and more into technology companies instead of being just financial intermediaries. Technology has become a major asset for almost any financial institution around the globe, having the potential to lead to competitive advantages as well as disadvantages. Some background information can shed light on the reasons for this development.
Technology Spending
Banks and financial institutions together form the industry that spends the most on technology on an annual basis. The following statement therefore shows not only that technology is important for the financial industry, but that the financial industry is also really important to the technology sector:
Banks will spend 4.2% more on technology in 2014 than they did in 2013, according to IDC analysts. Overall IT spend in financial services globally will exceed $430 billion in 2014 and surpass $500 billion by 2020, the analysts say.
-- Crosman 2013
Large, multinational banks today generally employ thousands of developers that maintain existing systems and build new ones. Large investment banks with heavy technological requirements show technology budgets often of several billion USD per year.
Technology as Enabler
The technological development has also contributed to innovations and efficiency improvements in the financial sector:
Technological innovations have contributed significantly to greater efficiency in the derivatives market. Through innovations in trading technology, trades at Eurex are today executed much faster than ten years ago despite the strong increase in trading volume and the number of quotes. These strong improvements have only been possible due to the constant, high IT investments by derivatives exchanges and clearing houses.
-- Deutsche Börse Group 2008
As a side effect of the increasing efficiency, competitive advantages must often be looked for in ever more complex products or transactions. This in turn inherently increases risks and makes risk management as well as oversight and regulation more and more difficult. The financial crisis of 2007 and 2008 tells the story of potential dangers resulting from such developments. In a similar vein, "algorithms and computers gone wild" also represent a potential risk to the financial markets; this materialized dramatically in the so-called flash crash of May 2010, where automated selling led to large intraday drops in certain stocks and stock indices (cf. http://en.wikipedia.org/wiki/2010_Flash_Crash).
Technology and Talent as Barriers to Entry
On the one hand, technology advances reduce cost over time, ceteris paribus. On the other hand, financial institutions continue to invest heavily in technology to both gain market share and defend their current positions. To be active in certain areas in finance today often brings with it the need for large-scale investments in both technology and skilled staff. As an example, consider the derivatives analytics space (see also the case study in Part III of the book):
Aggregated over the total software lifecycle, firms adopting in-house strategies for OTC [derivatives] pricing will require investments between $25 million and $36 million alone to build, maintain, and enhance a complete derivatives library.
-- Ding 2010
Not only is it costly and time-consuming to build a full-fledged derivatives analytics library, but you also need to have enough experts to do so. And these experts have to have the right tools and technologies available to accomplish their tasks.
Another quote about the early days of Long-Term Capital Management (LTCM), formerly one of the most respected quantitative hedge funds (which, however, went bust in the late 1990s), further supports this insight about technology and talent:
Meriwether spent $20 million on a state-of-the-art computer system and hired a crack team of financial engineers to run the show at LTCM, which set up shop in Greenwich, Connecticut. It was risk management on an industrial level.
-- Patterson 2010
The same computing power that Meriwether had to buy for millions of dollars is today probably available for thousands. On the other hand, trading, pricing, and risk management have become so complex for larger financial institutions that today they need to deploy IT infrastructures with tens of thousands of computing cores.
Ever-Increasing Speeds, Frequencies, Data Volumes
There is one dimension of the finance industry that has been influenced most by technological advances: the speed and frequency with which financial transactions are decided and executed. The recent book by Lewis (2014) describes so-called flash trading (i.e., trading at the highest speeds possible) in vivid detail.
On the one hand, increasing data availability on ever-smaller scales makes it necessary to react in real time. On the other hand, the increasing speed and frequency of trading let the data volumes further increase. This leads to processes that reinforce each other and push the average time scale for financial transactions systematically down:
Renaissance's Medallion fund gained an astonishing 80 percent in 2008, capitalizing on the market's extreme volatility with its lightning-fast computers. Jim Simons was the hedge fund world's top earner for the year, pocketing a cool $2.5 billion.
-- Patterson 2010
Thirty years' worth of daily stock price data for a single stock represents roughly 7,500 quotes. This kind of data is what most of today's finance theory is based on. For example, theories like the modern portfolio theory (MPT), the capital asset pricing model (CAPM), and value-at-risk (VaR) all have their foundations in daily stock price data.
In comparison, on a typical trading day the stock price of Apple Inc. (AAPL) is quoted around 15,000 times, two times as many quotes as seen for end-of-day quoting over a time span of 30 years. This brings with it a number of challenges:
Data processing
It does not suffice to consider and process end-of-day quotes for stocks or other financial instruments; "too much" happens during the day for some instruments during 24 hours for 7 days a week.
Analytics speed
Decisions often have to be made in milliseconds or even faster, making it necessary to build the respective analytics capabilities and to analyze large amounts of data in real time.
Theoretical foundations
Although traditional finance theories and concepts are far from being perfect, they have been well tested (and sometimes well rejected) over time; for the millisecond scales important as of today, consistent concepts and theories that have proven to be somewhat robust over time are still missing.
All these challenges can in principle only be addressed by modern technology. Something that might also be a little bit surprising is that the lack of consistent theories often is addressed by technological approaches, in that high-speed algorithms exploit market microstructure elements (e.g., order flow, bid-ask spreads) rather than relying on some kind of financial reasoning.
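The quote counts used in the comparison earlier in this section follow from simple arithmetic. The sketch below assumes roughly 250 trading days per year, a common round figure that is an assumption made here rather than a number stated in the text, and reproduces the 7,500 daily quotes and the factor of two.

```python
# Assumption: about 250 trading days per calendar year.
TRADING_DAYS_PER_YEAR = 250
years = 30

# One end-of-day quote per trading day over 30 years.
daily_quotes = years * TRADING_DAYS_PER_YEAR

# Intraday quote count for a single trading day, as cited in the text.
intraday_quotes = 15000

# How many times larger one day of intraday data is than
# 30 years of end-of-day data.
ratio = intraday_quotes / float(daily_quotes)
print(daily_quotes, ratio)
```

A different trading-day convention (for example 252) changes the first figure slightly but not the order of magnitude.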
The Rise of Real-Time Analytics
There is one discipline that has seen a strong increase in importance in the finance industry: financial and data analytics. This phenomenon has a close relationship to the insight that speeds, frequencies, and data volumes increase at a rapid pace in the industry. In fact, real-time analytics can be considered the industry's answer to this trend.
Roughly speaking, "financial and data analytics" refers to the discipline of applying software and technology in combination with (possibly advanced) algorithms and methods to gather, process, and analyze data in order to gain insights, to make decisions, or to fulfill regulatory requirements, for instance. Examples might include the estimation of sales impacts induced by a change in the pricing structure for a financial product in the retail branch of a bank. Another example might be the large-scale overnight calculation of credit value adjustments (CVA) for complex portfolios of derivatives trades of an investment bank.
There are two major challenges that financial institutions face in this context:
Big data
Banks and other financial institutions had to deal with massive amounts of data even before the term "big data" was coined; however, the amount of data that has to be processed during single analytics tasks has increased tremendously over time, demanding both increased computing power and ever-larger memory and storage capacities.
Real-time economy
In the past, decision makers could rely on structured, regular planning, decision, and (risk) management processes, whereas they today face the need to take care of these functions in real time; several tasks that have been taken care of in the past via overnight batch runs in the back office have now been moved to the front office and are executed in real time.
Again, one can observe an interplay between advances in technology and financial/business practice. On the one hand, there is the need to constantly improve analytics approaches in terms of speed and capability by applying modern technologies. On the other hand, advances on the technology side allow new analytics approaches that were considered impossible (or infeasible due to budget constraints) a couple of years or even months ago.
One major trend in the analytics space has been the utilization of parallel architectures on the CPU (central processing unit) side and massively parallel architectures on the GPGPU (general-purpose graphical processing units) side. Current GPGPUs often have more than 1,000 computing cores, making necessary a sometimes radical rethinking of what parallelism might mean to different algorithms. What is still an obstacle in this regard is that users generally have to learn new paradigms and techniques to harness the power of such hardware.[3]
Python for Finance
The previous section describes some selected aspects characterizing the role of technology in finance:
Costs for technology in the finance industry
Technology as an enabler for new business and innovation
Technology and talent as barriers to entry in the finance industry
Increasing speeds, frequencies, and data volumes
The rise of real-time analytics
In this section, we want to analyze how Python can help in addressing several of the challenges implied by these aspects. But first, on a more fundamental level, let us examine Python for finance from a language and syntax standpoint.
Finance and Python Syntax
Most people who make their first steps with Python in a finance context may attack an algorithmic problem. This is similar to a scientist who, for example, wants to solve a differential equation, wants to evaluate an integral, or simply wants to visualize some data. In general, at this stage, there is only little thought spent on topics like a formal development process, testing, documentation, or deployment. However, this especially seems to be the stage when people fall in love with Python. A major reason for this might be that the Python syntax is generally quite close to the mathematical syntax used to describe scientific problems or financial algorithms.
We can illustrate this phenomenon by a simple financial algorithm, namely the valuation of a European call option by Monte Carlo simulation. We will consider a Black-Scholes-Merton (BSM) setup (see also Chapter 3) in which the option's underlying risk factor follows a geometric Brownian motion.
Suppose we have the following numerical parameter values for the valuation:
Initial stock index level S0 = 100
Strike price of the European call option K = 105
Time-to-maturity T = 1 year
Constant, riskless short rate r = 5%
Constant volatility σ = 20%
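In the BSM model, the index level at maturity is a random variable, given
by Equation 1-1 with z being a standard normally distributed random
variable.
Equation 1-1. Black-Scholes-Merton (1973) index level at maturity
$$S_T = S_0 \exp\left(\left(r - \frac{1}{2}\sigma^2\right)T + \sigma\sqrt{T}\,z\right)$$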
The following is an algorithmic description of the Monte Carlo valuation
procedure:
1. Draw I (pseudo)random numbers z(i), i ∈ {1, 2, ..., I}, from the
standard normal distribution.
2. Calculate all resulting index levels at maturity S_T(i) for given z(i)
and Equation 1-1.
3. Calculate all inner values of the option at maturity as
h_T(i) = max(S_T(i) - K, 0).
4. Estimate the option present value via the Monte Carlo estimator
given in Equation 1-2.
Equation 1-2. Monte Carlo estimator for European option
$$C_0 \approx e^{-rT}\,\frac{1}{I}\sum_{i=1}^{I} h_T(i)$$
We are now going to translate this problem and algorithm into Python
code. The reader might follow the single steps by using, for example,
IPython; this is, however, not really necessary at this stage.
First, let us start with the parameter values. This is really easy:
S0 = 100.
K = 105.
T = 1.0
r = 0.05
sigma = 0.2
Next, the valuation algorithm. Here, we will for the first time use NumPy,
which makes life quite easy for our second task:
from numpy import *
I = 100000
z = random.standard_normal(I)
ST = S0 * exp((r - 0.5 * sigma ** 2) * T + sigma * sqrt(T) * z)
hT = maximum(ST - K, 0)
C0 = exp(-r * T) * sum(hT) / I
Third, we print the result:
print "Value of the European Call Option %5.3f" % C0
The output might be:[4]
Value of the European Call Option 8.019
Three aspects are worth highlighting:
Syntax
The Python syntax is indeed quite close to the mathematical syntax,
e.g., when it comes to the parameter value assignments.
Translation
Every mathematical and/or algorithmic statement can generally be
translated into a single line of Python code.
Vectorization
One of the strengths of NumPy is the compact, vectorized syntax, e.g.,
allowing for 100,000 calculations within a single line of code.
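As a plausibility check on the simulated value of about 8.02, the well-known closed-form Black-Scholes-Merton formula for a European call can be evaluated for the same parameter values. The following is a minimal sketch, not part of the original listing: the function names `norm_cdf` and `bsm_call_value` are hypothetical, and the code uses Python 3 style `print` calls, unlike the Python 2 listings in the text.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bsm_call_value(S0, K, T, r, sigma):
    """Closed-form BSM value of a European call option (hypothetical helper)."""
    d1 = (log(S0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S0 * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)


# Same parameter values as in the text
C0_exact = bsm_call_value(S0=100.0, K=105.0, T=1.0, r=0.05, sigma=0.2)
print("Analytical value of the European call option %5.3f" % C0_exact)
```

The Monte Carlo estimate should scatter around this analytical value, with the deviation shrinking as the number of simulated paths I grows.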
This code can be used in an interactive environment like IPython.
However, code that is meant to be reused regularly typically gets organized
in so-called modules (or scripts), which are single Python (i.e., text) files
with the suffix .py. Such a module could in this case look like Example 1-1
and could be saved as a file named bsm_mcs_euro.py.
Example 1-1. Monte Carlo valuation of European call option
#
# Monte Carlo valuation of European call option
# in Black-Scholes-Merton model
# bsm_mcs_euro.py
#
import numpy as np

# Parameter Values
S0 = 100.  # initial index level
K = 105.  # strike price
T = 1.0  # time-to-maturity
r = 0.05  # riskless short rate
sigma = 0.2  # volatility

I = 100000  # number of simulations

# Valuation Algorithm
z = np.random.standard_normal(I)  # pseudorandom numbers
ST = S0 * np.exp((r - 0.5 * sigma ** 2) * T + sigma * np.sqrt(T) * z)
  # index values at maturity
hT = np.maximum(ST - K, 0)  # inner values at maturity
C0 = np.exp(-r * T) * np.sum(hT) / I  # Monte Carlo estimator

# Result Output
print "Value of the European Call Option %5.3f" % C0
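For regular reuse, the same algorithm can also be wrapped in a function with a seedable random number generator, so that results are reproducible. This is only a sketch under stated assumptions: the function name `bsm_mcs_call` and its `seed` parameter are hypothetical, and it relies on `numpy.random.default_rng`, which is available in newer NumPy versions than the one used in the text. It is written in Python 3 syntax.

```python
import numpy as np


def bsm_mcs_call(S0, K, T, r, sigma, I=100000, seed=None):
    """Monte Carlo estimator for a European call in the BSM model.

    Hypothetical helper; mirrors the steps of the script version.
    """
    rng = np.random.default_rng(seed)  # seedable generator (assumption)
    z = rng.standard_normal(I)  # pseudorandom numbers
    # index values at maturity
    ST = S0 * np.exp((r - 0.5 * sigma ** 2) * T + sigma * np.sqrt(T) * z)
    hT = np.maximum(ST - K, 0)  # inner values at maturity
    return np.exp(-r * T) * np.sum(hT) / I  # Monte Carlo estimator


C0 = bsm_mcs_call(100.0, 105.0, 1.0, 0.05, 0.2, I=100000, seed=42)
print("Value of the European Call Option %5.3f" % C0)
```

Fixing the seed makes repeated runs return the same estimate, which can simplify testing; without a seed, results vary from run to run as noted in the text.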
The rather simple algorithmic example in this subsection illustrates that
Python, with its very syntax, is well suited to complement the classic duo
of scientific languages, English and Mathematics. It seems that adding
Python to the set of scientific languages makes it more well rounded. We
have
English for writing, talking about scientific and financial problems, etc.
Mathematics for concisely and exactly describing and modeling
abstract aspects, algorithms, complex quantities, etc.
Python for technically modeling and implementing abstract aspects,
algorithms, complex quantities, etc.
MATHEMATICS AND PYTHON SYNTAX
There is hardly any programming language that comes as close to
mathematical syntax as Python. Numerical algorithms are therefore
simple to translate from the mathematical representation into the
Pythonic implementation. This makes prototyping, development, and
code maintenance in such areas quite efficient with Python.
In some areas, it is common practice to use pseudocode and therewith to
introduce a fourth language family member. The role of pseudocode is to
represent, for example, financial algorithms in a more technical fashion that
is both still close to the mathematical representation and already quite close
to the technical implementation. In addition to the algorithm itself,
pseudocode takes into account how computers work in principle.
This practice generally has its cause in the fact that with most programming
languages the technical implementation is quite “far away” from its formal,
mathematical representation. The majority of programming languages make
it necessary to include so many elements that are only technically required
that it is hard to see the equivalence between the mathematics and the code.
Nowadays, Python is often used in a pseudocode way since its syntax is
almost analogous to the mathematics and since the technical “overhead” is
kept to a minimum. This is accomplished by a number of high-level
concepts embodied in the language that not only have their advantages but
also come in general with risks and/or other costs. However, it is safe to say
that with Python you can, whenever the need arises, follow the same strict
implementation and coding practices that other languages might require
from the outset. In that sense, Python can provide the best of both worlds:
high-level abstraction and rigorous implementation.
Efficiency and Productivity Through Python
At a high level, benefits from using Python can be measured in three
dimensions:
Efficiency
How can Python help in getting results faster, in saving costs, and in
saving time?
Productivity
How can Python help in getting more done with the same resources
(people, assets, etc.)?
Quality
What does Python allow us to do that we could not do with alternative
technologies?
A discussion of these aspects can by nature not be exhaustive. However, it
can highlight some arguments as a starting point.
Shorter time-to-results
A field where the efficiency of Python becomes quite obvious is interactive
data analytics. This is a field that benefits strongly from such powerful
tools as IPython and libraries like pandas.
Consider a finance student, writing her master's thesis and interested in
Google stock prices. She wants to analyze historical stock price information
for, say, five years to see how the volatility of the stock price has fluctuated
over time. She wants to find evidence that volatility, in contrast to some
typical model assumptions, fluctuates over time and is far from being
constant. The results should also be visualized. She mainly has to do the
following:
Download Google stock price data from the Web.
Calculate the rolling standard deviation of the log returns (volatility).
Plot the stock price data and the results.
These tasks are complex enough that not too long ago one would have
considered them to be something for professional financial analysts. Today,
even the finance student can easily cope with such problems. Let us see
how exactly this works, without worrying about syntax details at this stage
(everything is explained in detail in subsequent chapters).
First, make sure to have available all necessary libraries:
In [1]: import numpy as np
        import pandas as pd
        import pandas.io.data as web
Second, retrieve the data from, say, Google itself:
In [2]: goog = web.DataReader('GOOG', data_source='google',
                              start='3/14/2009', end='4/14/2014')
        goog.tail()
Out[2]:               Open    High     Low   Close   Volume
        Date
        2014-04-08  542.60  555.00  541.61  554.90  3152406
        2014-04-09  559.62  565.37  552.95  564.14  3324742
        2014-04-10  565.00  565.00  539.90  540.95  4027743
        2014-04-11  532.55  540.00  526.53  530.60  3916171
        2014-04-14  538.25  544.10  529.56  532.52  2568020

        5 rows × 5 columns
Third, implement the necessary analytics for the volatilities:
In [3]: goog['Log_Ret'] = np.log(goog['Close'] / goog['Close'].shift(1))
        goog['Volatility'] = pd.rolling_std(goog['Log_Ret'],
                                            window=252) * np.sqrt(252)
Fourth, plot the results. To generate an inline plot, we use the IPython
magic command %matplotlib with the option inline:
In [4]: %matplotlib inline
        goog[['Close', 'Volatility']].plot(subplots=True, color='blue',
                                           figsize=(8, 6))
Figure 1-1 shows the graphical result of this brief interactive session with
IPython. It can be considered almost amazing that four lines of code
suffice to implement three rather complex tasks typically encountered in
financial analytics: data gathering, complex and repeated mathematical
calculations, and visualization of results. This example illustrates that
pandas makes working with whole time series almost as simple as doing
mathematical operations on floating-point numbers.
Figure 1-1. Google closing prices and yearly volatility
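The rolling volatility calculation can also be tried without any data download. The sketch below is an illustration under stated assumptions: it uses synthetic prices generated from normally distributed log returns with a known annualized volatility of 20% (not Google data), and it uses the `Series.rolling(...).std()` method of more recent pandas versions in place of the `pd.rolling_std` function shown in the session. All variable names are hypothetical, and the syntax is Python 3.

```python
import numpy as np
import pandas as pd

# Synthetic daily closing prices (assumption: stand-in for downloaded data)
rng = np.random.default_rng(0)
n_days = 1000
true_vol = 0.2  # annualized volatility used to generate the data
log_returns = rng.normal(0.0, true_vol / np.sqrt(252), n_days)
dates = pd.bdate_range("2010-01-04", periods=n_days)
data = pd.DataFrame({"Close": 100.0 * np.exp(np.cumsum(log_returns))},
                    index=dates)

# Daily log returns, as in the interactive session
data["Log_Ret"] = np.log(data["Close"] / data["Close"].shift(1))

# Rolling 252-day standard deviation, annualized with sqrt(252)
data["Volatility"] = data["Log_Ret"].rolling(window=252).std() * np.sqrt(252)

print(data.tail())
```

Because the synthetic returns are generated with constant volatility, the rolling estimate should fluctuate around 0.2; with real price data, larger swings over time would be expected.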
Translated to a professional finance context, the example implies that
financial analysts who apply the right Python tools and libraries, which
provide high-level abstraction, can focus on their own domain and not on
the technical intricacies. Analysts can react faster, providing valuable
insights almost in real time and making sure they are one step ahead of the
competition. This example of increased efficiency can easily translate into
measurable bottom-line effects.
Ensuring high performance
In general, it is accepted that Python has a rather concise syntax and that it
is relatively efficient to code with. However, due to the very nature of
Python being an interpreted language, the prejudice persists that Python
generally is too slow for compute-intensive tasks in finance. Indeed,
depending on the specific implementation approach, Python can be really
slow. But it does not have to be slow; it can be highly performing in
almost any application area. In principle, one can distinguish at least three
different strategies for better performance:
Paradigm
In general, many different ways can lead to the same result in Python,
but with rather different performance characteristics; “simply” choosing
the right way (e.g., a specific library) can improve results significantly.
Compiling
Nowadays, there are several performance libraries available that
provide compiled versions of important functions or that compile
Python code statically or dynamically (at runtime or call time) to
machine code, which can be orders of magnitude faster; popular ones
are Cython and Numba.
Parallelization
Many computational tasks, in particular in finance, can strongly benefit
from parallel execution; this is nothing special to Python but something
that can easily be accomplished with it.
PERFORMANCE COMPUTING WITH PYTHON
Python per se is not a high-performance computing technology.
However, Python has developed into an ideal platform to access
current performance technologies. In that sense, Python has become
something like a glue language for performance computing.
Later chapters illustrate all three techniques in detail. For the moment, we
want to stick to a simple, but still realistic, example that touches upon all
three techniques.
A quite common task in financial analytics is to evaluate complex
mathematical expressions on large arrays of numbers. To this end, Python
itself provides everything needed:
In [1]: loops = 25000000
        from math import *
        a = range(1, loops)
        def f(x):
            return 3 * log(x) + cos(x) ** 2
        %timeit r = [f(x) for x in a]
Out[1]: 1 loops, best of 3: 15 s per loop
The Python interpreter needs 15 seconds in this case to evaluate the
function f 25,000,000 times.
The same task can be implemented using NumPy, which provides optimized
(i.e., pre-compiled) functions to handle such array-based operations:
In [2]: import numpy as np
        a = np.arange(1, loops)
        %timeit r = 3 * np.log(a) + np.cos(a) ** 2
Out[2]: 1 loops, best of 3: 1.69 s per loop
Using NumPy considerably reduces the execution time to 1.7 seconds.
However, there is even a library specifically dedicated to this kind of task.
It is called numexpr, for “numerical expressions.” It compiles the
expression to improve upon the performance of NumPy's general
functionality by, for example, avoiding in-memory copies of arrays along
the way:
In [3]: import numexpr as ne
        ne.set_num_threads(1)
        f = '3 * log(a) + cos(a) ** 2'
        %timeit r = ne.evaluate(f)
Out[3]: 1 loops, best of 3: 1.18 s per loop
Using this more specialized approach further reduces execution time to 1.2
seconds. However, numexpr also has built-in capabilities to parallelize the
execution of the respective operation. This allows us to use all available
threads of a CPU:
In [4]: ne.set_num_threads(4)
        %timeit r = ne.evaluate(f)
Out[4]: 1 loops, best of 3: 523 ms per loop
This brings execution time further down to 0.5 seconds in this case, with
two cores and four threads utilized. Overall, this is a performance
improvement of 30 times. Note, in particular, that this kind of improvement
is possible without altering the basic problem/algorithm and without
knowing anything about compiling and parallelization issues. The
capabilities are accessible from a high level even by nonexperts. However,
one has to be aware, of course, of which capabilities exist.
The example shows that Python provides a number of options to make
more out of existing resources, i.e., to increase productivity. With the
sequential approach, about 21 mn evaluations per second are accomplished,
while the parallel approach allows for almost 48 mn evaluations per second,
in this case simply by telling Python to use all available CPU threads
instead of just one.
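The comparison between the pure Python loop and the NumPy version can be reproduced outside of IPython with the standard `timeit` module. The following sketch makes several assumptions: it uses a much smaller problem size (`n = 200000`) so that it finishes quickly, the function names are hypothetical, and the measured times depend entirely on the machine, so no particular speedup is implied. It is written in Python 3 syntax and leaves out `numexpr`.

```python
import math
import timeit

import numpy as np

n = 200000  # assumption: far smaller than the 25,000,000 used in the text


def f(x):
    return 3 * math.log(x) + math.cos(x) ** 2


def loop_version():
    # pure Python: one function call per element
    return [f(x) for x in range(1, n)]


def numpy_version():
    # vectorized: the whole array is processed by compiled NumPy routines
    a = np.arange(1, n)
    return 3 * np.log(a) + np.cos(a) ** 2


# best of three single runs, similar in spirit to %timeit
t_loop = min(timeit.repeat(loop_version, number=1, repeat=3))
t_numpy = min(timeit.repeat(numpy_version, number=1, repeat=3))

# throughput = evaluations / seconds, the measure used in the text
for label, t in (("pure Python", t_loop), ("NumPy", t_numpy)):
    print("%-12s %8.4f s  %8.2f mn evaluations per second"
          % (label, t, (n - 1) / t / 1e6))
```

Both versions compute the same values up to floating-point rounding; only the execution paradigm differs, which is the first of the three strategies listed above.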
From Prototyping to Production
Efficiency in interactive analytics and performance when it comes to
execution speed are certainly two benefits of Python to consider. Yet
another major benefit of using Python for finance might at first sight seem
a bit subtler; at second sight it might present itself as an important strategic
factor. It is the possibility to use Python end to end, from prototyping to
production.
Today's practice in financial institutions around the globe, when it comes to
financial development processes, is often characterized by a separated, two-
step process. On the one hand, there are the quantitative analysts (“quants”)
responsible for model development and technical prototyping. They like to
use tools and environments like Matlab and R that allow for rapid,
interactive application development. At this stage of the development
efforts, issues like performance, stability, exception management,
separation of data access, and analytics, among others, are not that
important. One is mainly looking for a proof of concept and/or a prototype
that exhibits the main desired features of an algorithm or a whole
application.
Once the prototype is finished, IT departments with their developers take
over and are responsible for translating the existing prototype code into
reliable, maintainable, and performant production code. Typically, at this
stage there is a paradigm shift in that languages like C++ or Java are now
used to fulfill the requirements for production. Also, a formal development
process with professional tools, version control, etc. is applied.
This two-step approach has a number of generally unintended
consequences:
Inefficiencies
Prototype code is not reusable; algorithms have to be implemented
twice; redundant efforts take time and resources.
Diverse skill sets
Different departments show different skill sets and use different
languages to implement “the same things.”
Legacy code
Code is available and has to be maintained in different languages, often
using different styles of implementation (e.g., from an architectural
point of view).
Using Python, on the other hand, enables a streamlined end-to-end process
from the first interactive prototyping steps to highly reliable and efficiently
maintainable production code. The communication between different
departments becomes easier. The training of the workforce is also more
streamlined in that there is only one major language covering all areas of
financial application building. It also avoids the inherent inefficiencies and
redundancies when using different technologies in different steps of the
development process. All in all, Python can provide a consistent
technological framework for almost all tasks in financial application
development and algorithm implementation.
Conclusions
Python as a language, but much more so as an ecosystem, is an ideal
technological framework for the financial industry. It is characterized by a
number of benefits, like an elegant syntax, efficient development
approaches, and usability for prototyping and production, among others.
With its huge amount of available libraries and tools, Python seems to have
answers to most questions raised by recent developments in the financial
industry in terms of analytics, data volumes and frequency, compliance,
and regulation, as well as technology itself. It has the potential to provide a
single, powerful, consistent framework with which to streamline end-to-end
development and production efforts even across larger financial
institutions.
Further Reading
There are two books available that cover the use of Python in finance:
Fletcher, Shayne and Christopher Gardner (2009): Financial Modelling
in Python. John Wiley & Sons, Chichester, England.
Hilpisch, Yves (2015): Derivatives Analytics with Python. Wiley
Finance, Chichester, England. http://derivatives-analytics-with-python.com.
The quotes in this chapter are taken from the following resources:
Crosman, Penny (2013): “Top 8 Ways Banks Will Spend Their 2014 IT
Budgets.” Bank Technology News.
Deutsche Börse Group (2008): “The Global Derivatives Market - An
Introduction.” White paper.
Ding, Cubillas (2010): “Optimizing the OTC Pricing and Valuation
Infrastructure.” Celent study.
Lewis, Michael (2014): Flash Boys. W. W. Norton & Company, New
York.
Patterson, Scott (2010): The Quants. Crown Business, New York.
[1] Python, for example, is a major language used in the Master of Financial
Engineering program at Baruch College of the City University of New York (cf.
http://mfe.baruch.cuny.edu).
[2] Cf. http://wiki.python.org/moin/BeginnersGuide, where you will find links to
many valuable resources for both developers and nondevelopers getting started with
Python.
[3] Chapter 8 provides an example for the benefits of using modern GPGPUs in the
context of the generation of random numbers.
[4] The output of such a numerical simulation depends on the pseudorandom
numbers used. Therefore, results might vary.
Chapter 2. Infrastructure and
Tools
Infrastructure is much more important than architecture.
-- Rem Koolhaas
You could say infrastructure is not everything, but without infrastructure
everything can be nothing, be it in the real world or in technology. What
do we mean then by infrastructure? In principle, it is those hardware and
software components that allow the development and execution of a simple
Python script or more complex Python applications.
However, this chapter does not go into detail with regard to hardware
infrastructure, since all Python code and examples should be executable on
almost any hardware.[5] Nor does it discuss different operating systems,
since the code should be executable on any operating system on which
Python, in principle, is available. This chapter rather focuses on the
following topics:
Deployment
How can I make sure to have everything needed available in a
consistent fashion to deploy Python code and applications? This
chapter introduces Anaconda, a Python distribution that makes
deployment quite efficient, as well as the Python Quant Platform,
which allows for a web- and browser-based deployment.
Tools
Which tools shall I use for (interactive) Python development and data
analytics? The chapter introduces two of the most popular development
environments for Python, namely IPython and Spyder.
There is also Appendix A, on:
Best practices
Which best practices should I follow when developing Python code?
The appendix briefly reviews fundamentals of, for example, Python
code syntax and documentation.
Python Deployment
This section shows how to deploy Python locally (or on a server) as well as
via the web browser.
Anaconda
A number of operating systems come with a version of Python and a
number of additional libraries already installed. This is true, for example, of
Linux operating systems, which often rely on Python as their main
language (for packaging, administration, etc.). However, in what follows
we assume that Python is not installed or that we are installing an
additional version of Python (in parallel to an existing one) using the
Anaconda distribution.
You can download Anaconda for your operating system from the website
http://continuum.io/downloads. There are a couple of reasons to consider
using Anaconda for Python deployment. Among them are:
Libraries/packages
You get more than 100 of the most important Python libraries and
packages in a single installation step; in particular, you get all these
installed in a version-consistent manner (i.e., all libraries and packages
work with each other).[6]
Open source
The Anaconda distribution is free of charge in general,[7] as are all
libraries and packages included in the distribution.
Cross platform
It is available for Windows, Mac OS, and Linux platforms.
Separate installation
It installs into a separate directory without interfering with any existing
installation; no root/admin rights are needed.
Automatic updates
Libraries and packages included in Anaconda can be
(semi)automatically updated via free online repositories.
Conda package manager
The package manager allows the use of multiple Python versions and
multiple versions of libraries in parallel (for experimentation or
development/testing purposes); it also has great support for virtual
environments.
After having downloaded the installer for Anaconda, the installation in
general is quite easy. On Windows platforms, just double-click the installer
file and follow the instructions. Under Linux, open a shell, change to the
directory where the installer file is located, and type:
$ bash Anaconda-1.x.x-Linux-x86[_64].sh
replacing the file name with the respective name of your installer file.
Then again follow the instructions. It is the same on an Apple computer;
just type:
$ bash Anaconda-1.x.x-MacOSX-x86_64.sh
making sure you replace the name given here with the correct one.
Alternatively, you can use the graphical installer that is available.
After the installation you have more than 100 libraries and packages
available that you can use immediately. Among the scientific and data
analytics packages are those listed in Table 2-1.
Table 2-1. Selected libraries and packages included in Anaconda
Name          Description
BitArray      Object types for arrays of Booleans
Cubes OLAP    Framework for Online Analytical Processing (OLAP) applications
Disco         mapreduce implementation for distributed computing
Gdata         Implementation of Google Data Protocol
h5py          Python wrapper around HDF5 file format
HDF5          File format for fast I/O operations
IPython       Interactive development environment (IDE)
lxml          Processing XML and HTML with Python
matplotlib    Standard 2D and 3D plotting library
MPI4Py        Message Passing Interface (MPI) implementation for parallel
              computation
MPICH2        Another MPI implementation
NetworkX      Building and analyzing network models and algorithms
numexpr       Optimized execution of numerical expressions
NumPy         Powerful array class and optimized functions on it
pandas        Efficient handling of time series data
PyTables      Hierarchical database using HDF5
SciPy         Collection of scientific functions
Scikit-Learn  Machine learning algorithms
Spyder        Python IDE with syntax checking, debugging, and inspection
              capabilities
statsmodels   Statistical models
SymPy         Symbolic computation and mathematics
Theano        Mathematical expression compiler
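One way to see which of the packages from Table 2-1 are actually importable in a given Python installation is to query the import system without importing anything. The following is a minimal sketch in Python 3 syntax: the function name `check_packages` and the selection of packages are hypothetical, and the mapping pairs each display name with the module name under which the package is commonly imported (for example, PyTables as `tables` and scikit-learn as `sklearn`).

```python
import importlib.util

# Display name -> import name (a selection from Table 2-1)
PACKAGES = {
    "NumPy": "numpy",
    "pandas": "pandas",
    "SciPy": "scipy",
    "matplotlib": "matplotlib",
    "PyTables": "tables",
    "Scikit-Learn": "sklearn",
    "h5py": "h5py",
    "numexpr": "numexpr",
    "SymPy": "sympy",
    "statsmodels": "statsmodels",
    "NetworkX": "networkx",
    "lxml": "lxml",
}


def check_packages(packages):
    """Return a dict mapping display names to True if importable."""
    return {name: importlib.util.find_spec(module) is not None
            for name, module in packages.items()}


for name, found in sorted(check_packages(PACKAGES).items()):
    print("%-14s %s" % (name, "available" if found else "not found"))
```

In a complete Anaconda installation all entries would be expected to show up as available; in a minimal installation several of them may be reported as not found.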
If the installation procedure was successful, you should open a new
terminal window and should then be able, for example, to start the Spyder
IDE by simply typing in the shell:
$ spyder
Alternatively, you can start a Python session from the shell as follows:
$ python
Python 2.7.6 |Anaconda 1.9.2 (x86_64)| (default, Feb 10 2014, 17:56:29)
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> exit()
$
Anaconda by default installs, at the time of this writing, with Python
2.7.x. It always comes with conda, the open source package manager.
Useful information about this tool can be obtained by the command:
$ conda info
Current conda install:

             platform : osx-64
        conda version : 3.4.1
       python version : 2.7.6.final.0
     root environment : /Library/anaconda  (writable)
  default environment : /Library/anaconda
     envs directories : /Library/anaconda/envs
        package cache : /Library/anaconda/pkgs
         channel URLs : http://repo.continuum.io/pkgs/free/osx-64/
                        http://repo.continuum.io/pkgs/pro/osx-64/
          config file : None
    is foreign system : False
$
conda allows one to search for libraries and packages, both locally and in
available online repositories:
$ conda search pytables
Fetching package metadata: ...
pytables                     2.4.0                np17py27_0  defaults
                             2.4.0                np17py26_0  defaults
                             2.4.0                np16py27_0  defaults
                             2.4.0                np16py26_0  defaults
                          .  3.0.0                np17py27_0  defaults
                             3.0.0                np17py26_0  defaults
                             3.0.0                np16py27_0  defaults
                             3.0.0                np16py26_0  defaults
                          .  3.0.0                np17py33_1  defaults
                          .  3.0.0                np17py27_1  defaults
                             3.0.0                np17py26_1  defaults
                          .  3.0.0                np16py27_1  defaults
                             3.0.0                np16py26_1  defaults
                             3.1.0                np18py33_0  defaults
                          *  3.1.0                np18py27_0  defaults
                             3.1.0                np18py26_0  defaults
                             3.1.1                np18py34_0  defaults
                             3.1.1                np18py33_0  defaults
                             3.1.1                np18py27_0  defaults
                             3.1.1                np18py26_0  defaults
The results contain those versions of PyTables that are available for
download and installation in this case and that are installed (indicated by
the asterisk). Similarly, the list command gives all locally installed
packages that match a certain pattern. The following lists all packages that
start with “pyt”:
$ conda list ^pyt
# packages in environment at /Library/anaconda:
#
pytables                  3.1.0                np18py27_0
pytest                    2.5.2                    py27_0
python                    2.7.6                         1
python-dateutil           1.5                       <pip>
python.app                1.2                      py27_1
pytz                      2014.2                   py27_0
More complex patterns, based on regular expressions, are also possible. For
example:
$ conda list ^p.*les$
# packages in environment at /Library/anaconda:
#
pytables                  3.1.0                np18py27_0
$
Suppose we want to have Python 3.x available in addition to the 2.7.x
version. The package manager conda allows the creation of an environment
in which to accomplish this goal. The following output shows how this
works in principle:
$ conda create -n py33test anaconda=1.9 python=3.3 numpy=1.8
Fetching package metadata: ..
Solving package specifications: .
Package plan for installation in environment
/Library/anaconda/envs/py33test:

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    anaconda-1.9.2             |       np18py33_0           2 KB
    ...
    xlsxwriter-0.5.2           |           py33_0         168 KB

The following packages will be linked:

    package                    |            build
    ---------------------------|-----------------
    anaconda-1.9.2             |       np18py33_0   hard-link
    ...
    zlib-1.2.7                 |                1   hard-link

Proceed ([y]/n)?
When you type y to confirm the creation, conda will do as proposed (i.e.,
downloading, extracting, and linking the packages):
*******UPDATE**********
Fetching packages ...
anaconda-1.9.2-np18py33_0.tar.bz2 100% |##########| Time: 0:00:00
173.62 kB/s
...
xlsxwriter-0.5.2-py33_0.tar.bz2 100% |############| Time: 0:00:01
131.32 kB/s
Extracting packages ...
[      COMPLETE      ] |##########################| 100%
Linking packages ...
[      COMPLETE      ] |##########################| 100%
#
# To activate this environment, use:
# $ source activate py33test
#
# To deactivate this environment, use:
# $ source deactivate
#
Now activate the new environment as advised by conda:
$ source activate py33test
discarding /Library/anaconda/bin from PATH
prepending /Library/anaconda/envs/py33test/bin to PATH
(py33test)$ python
Python 3.3.4 |Anaconda 1.9.2 (x86_64)| (default, Feb 10 2014, 17:56:29)
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> print "Hello Python 3.3" # this shouldn't work with Python 3.3
  File "<stdin>", line 1
    print "Hello Python 3.3" # this shouldn't work with Python 3.3
                           ^
SyntaxError: invalid syntax
>>> print ("Hello Python 3.3") # this syntax should work
Hello Python 3.3
>>> exit()
$
Obviously, we indeed are now in the Python 3.3 world, which you can
judge from the Python version number displayed and the fact that you need
parentheses for the print statement to work correctly.[8]
MULTIPLE PYTHON ENVIRONMENTS
With the conda package manager you can install and use multiple
separated Python environments on a single machine. This, among
other features, simplifies testing of Python code for compatibility with
different Python versions.
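When the same code is run in several environments, it can help to report which interpreter is active. The sketch below is only an illustration with hypothetical function names; it is written so that it runs on both Python 2.7 and Python 3.x, because `print(...)` with a single parenthesized string argument is accepted by both.

```python
import sys


def describe_interpreter():
    """Return a short label such as 'Python 3.3' for the running interpreter."""
    return "Python %d.%d" % (sys.version_info[0], sys.version_info[1])


def is_python3():
    """True if the active interpreter is Python 3 or newer."""
    return sys.version_info[0] >= 3


# A single string argument in parentheses works as a print statement
# in Python 2 and as a function call in Python 3.
print("Running on %s" % describe_interpreter())
print("print is a function here: %s" % is_python3())
```

Running this script once in the main installation and once in an activated environment such as py33test should show the respective version numbers.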
Single libraries and packages can be installed using the conda install
command, either in the general Anaconda installation:
$ conda install scipy
or for a specific environment, as in:
$ conda install -n py33test scipy
Here, py33test is the environment we created before. Similarly, you can
update single packages easily:
$ conda update pandas
The packages to download and link depend on the respective version of the
package that is installed. These can be very few to numerous, e.g., when a
package has a number of dependencies for which no current version is
installed. For our newly created environment, the updating would take the
form:
$ conda update -n py33test pandas
Finally, conda makes it easy to remove packages with the remove
command from the main installation or a specific environment. The basic
usage is:
$ conda remove scipy
For an environment it is:
$ conda remove -n py33test scipy
Since the removal is a somewhat “final” operation, you might want to dry
run the command:
$ conda remove --dry-run -n py33test scipy
If you are sure, you can go ahead with the actual removal. To get back to
the original Python and Anaconda version, deactivate the environment:
$ source deactivate
Finally, we can clean up the whole environment by use of remove with the
option --all:
$ conda remove --all -n py33test
The package manager conda makes Python deployment quite convenient.
Apart from the basic functionalities illustrated in this section, there are also
a number of more advanced features available. Detailed documentation is
found at http://conda.pydata.org/docs/.
Python Quant Platform
There are a number of reasons why one might like to deploy Python via a
web browser. Among them are:
No need for installation
Local installations of a complete Python environment might be both
complex (e.g., in a large organization with many computers) and costly
to support and maintain; making Python available via a web browser
makes deployment much more efficient in certain scenarios.
Use of (better) remote hardware
When it comes to complex, compute- and memory-intensive analytics
tasks, a local computer might not be able to perform such tasks; the use
of (multiple) shared servers with multiple cores, larger memories, and
maybe GPGPUs makes such tasks possible and more efficient.
Collaboration
Working, for example, with a team on a single or multiple servers
makes collaboration simpler and also increases efficiency: data is not
moved to every local machine, nor, after the analytics tasks are
finished, are the results moved back to some central storage unit and/or
distributed among the team members.
The Python Quant Platform is a web- and browser-based financial
analytics and collaboration platform developed and maintained by The
Python Quants GmbH. You can register for the platform at http://quant-
platform.com. It features, among others, the following basic components:
File manager
A tool to manage file up/downloads and more via a web GUI.
Linux terminal
A Linux terminal to work with the server (for example, a virtual server
instance in the cloud or a dedicated server run on-premise by a
company); you can use Vim, Nano, etc. for code editing and work with
Git repositories for version control.
Anaconda
An Anaconda installation that provides all the functionality discussed
previously; by default you can choose between Python 2.7 and
Python 3.4.
Python shell
The standard Python shell.
IPython Shell
An enhanced IPython shell.
IPython Notebook
The browser version of IPython. You will generally use this as the
central tool.
Chat room/forum
To collaborate, exchange ideas, and to up/download, for example,
research documents.
Advanced analytics
In addition to the Linux server and Python environments, the platform
provides analytical capabilities for, e.g., portfolio, risk, and derivatives
analytics as well as for backtesting trading strategies (in particular, DX
analytics; see Part III for a simplified but fully functional version of the
library); there is also an R stack available to call, for example, R
functions from within IPython Notebook.
Standard APIs
Standard Python-based APIs for data delivery services of leading
financial data providers.
When it comes to collaboration, the Python Quant Platform also allows
one to define, under a “company,” certain “user groups” with certain
rights for different Python projects (i.e., directories and files). The platform
is easily scalable and is deployed via Docker containers. Figure 2-1 shows
a screenshot of the main screen of the Python Quant Platform.
Figure 2-1. Screenshot of Python Quant Platform
Tools
The success and popularity of a programming language result to some
extent from the tools that are available to work with the language. It has
long been the case that Python was considered a nice, easy-to-learn and
easy-to-use language, but without a compelling set of tools for interactive
analytics or development. This has changed. There are now a large number
of tools available that help analysts and developers to be as productive as
possible with Python. It is not possible to give even a somewhat exhaustive
overview. However, it is possible to highlight two of the most popular tools
in use today: IPython and Spyder.[9]
Python
For completeness, let us first consider using the standard Python interpreter
itself. From the system shell/command-line interface, Python is invoked by
simply typing python:
$ python
Python 2.7.6 |Anaconda 1.9.2 (x86_64)| (default, Feb 10 2014, 17:56:29)
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> print "Hello Python for Finance World."
Hello Python for Finance World.
>>> exit()
$
Although you can do quite a bit of Python with the standard prompt, most people prefer to use IPython by default since this environment provides everything that the standard interpreter prompt offers, and much more on top of that.
IPython
IPython was used in Chapter 1 to present the first examples of Python code. This section gives an overview of the capabilities of IPython through specific examples. A complete ecosystem has evolved around IPython that is so successful and appealing that users of other languages make use of the basic approach and architecture it provides. For example, there is a version of IPython for the Julia language.
From shell to browser
IPython comes in three flavors:
Shell
The shell version is based on the system and Python shell, as the name suggests; there are no graphical capabilities included (apart from displaying plots in a separate window).
QT console
This version is based on the QT graphical user interface framework (cf. http://qt-project.org), is more feature-rich, and allows, for example, for inline graphics.
Notebook
This is a JavaScript-based web browser version that has become the community favorite for interactive analytics and also for teaching, presenting, etc.
The shell version is invoked by simply typing ipython in the shell:
$ ipython
Python 2.7.6 |Anaconda 1.9.2 (x86_64)| (default, Feb 10 2014, 17:56:29)
Type "copyright", "credits" or "license" for more information.

IPython 2.0.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: 3 + 4 * 2
Out[1]: 11

In [2]:
Using the option --pylab imports a large set of scientific and data analysis libraries, like NumPy, in the namespace:
$ ipython --pylab
Python 2.7.6 |Anaconda 1.9.2 (x86_64)| (default, Feb 10 2014, 17:56:29)
Type "copyright", "credits" or "license" for more information.

IPython 2.0.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.
Using matplotlib backend: MacOSX

In [1]: a = linspace(0, 20, 5)  # linspace from NumPy

In [2]: a
Out[2]: array([  0.,   5.,  10.,  15.,  20.])

In [3]:
Similarly, the QT console of IPython is invoked by the following command:
$ ipython qtconsole --pylab inline
Using the inline parameter in addition to the --pylab option lets IPython plot all graphics inline. Figure 2-2 shows a screenshot of the QT console with an inline plot.
Finally, the Notebook version is invoked as follows:
$ ipython notebook --pylab inline
Figure 2-3 shows a screenshot of an IPython Notebook session. The inline option again has the effect that plots will be displayed in IPython Notebook and not in a separate window.
All in all, there are a large number of options for how to invoke an IPython kernel. You can get a listing of all the options by typing:
$ ipython --h
Refer to the IPython documentation for detailed explanations.
Figure 2-2. IPython’s QT console
Basic usage
In what follows, we describe the basic usage of the IPython Notebook. A fundamental concept of the Notebook is that you work with different kinds of cells. These include the following types:
Code
Contains executable Python code
Markdown
Contains text written in Markdown language and/or HTML
Raw text
Contains text without formatting[10]
Heading (1-6)
Headings for text structuring, e.g., section heads

Figure 2-3. IPython’s browser-based Notebook
The different cell types already indicate that the Notebook is more than an enhanced Python shell only. It is intended to fulfill the requirements of a multitude of documentation and presentation scenarios. For example, an IPython Notebook file, having a suffix of .ipynb, can be converted to the following formats:
Python file
Generates a Python code file (.py) from an IPython Notebook file with noncode cells commented out.
HTML page
Generates a single HTML page from a single IPython Notebook file.
HTML5 slides
Making use of different cell markings for slide shows, a Notebook file is converted into a presentation with multiple HTML5 slides (using the reveal.js framework).
LaTeX/PDF
Such a file can also be converted to a LaTeX file, which then can be converted into a PDF document.
RestructuredText
RestructuredText (.rst) is used, for example, by the SPHINX documentation package for Python projects.
ANALYTICS AND PUBLISHING PLATFORM
A major advantage of IPython Notebook is that you can easily publish and share your complete Notebook with others. Once your analytics project with IPython is finished, you can publish it as an HTML page or a PDF, or use the content for a slide presentation.
The format of an IPython Notebook file is based on the JavaScript Object Notation (JSON) standard. The following is the text version of the Notebook displayed in Figure 2-3. You will notice some metadata, the different types of cells and their content, and that even graphics are translated into ASCII characters:
{
 "metadata": {
  "name": ""
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy as np\n",
      "import matplotlib.pyplot as plt"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 1
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "a = np.linspace(0, 10, 25)\n",
      "b = np.sin(a)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 2
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Inline comments can be easily placed between code cells."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "plt.plot(a, b, 'b^')\n",
      "plt.grid(True)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "display_data",
       "png": "iVBORw0KGgoAAAAN...SuQmCC\n",
       "text": [
        "<matplotlib.figure.Figure at 0x105812a10>"
       ]
      }
     ],
     "prompt_number": 3
    }
   ],
   "metadata": {}
  }
 ]
}
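Because a Notebook file is plain JSON, it can be inspected with Python's standard json module. The following is a minimal sketch, assuming the version 3 layout of the listing above (cells nested under "worksheets"); the abbreviated notebook string and the helper name cell_types are illustrative only, and later versions of the Notebook format arrange the cells differently.

```python
import json

# Abbreviated stand-in for a Notebook file in the nbformat 3 layout.
nb_text = '''
{"metadata": {"name": ""},
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {"cells": [
    {"cell_type": "code", "input": ["import numpy as np"], "outputs": []},
    {"cell_type": "markdown", "source": ["Inline comments."]},
    {"cell_type": "code", "input": ["b = np.sin(a)"], "outputs": []}
   ],
   "metadata": {}}
 ]}
'''


def cell_types(notebook):
    """Return the cell types of all cells, in document order."""
    return [cell["cell_type"]
            for sheet in notebook["worksheets"]
            for cell in sheet["cells"]]


nb = json.loads(nb_text)
types = cell_types(nb)
print(types)
```

The same approach works for a real file if the string is replaced by the content read from an .ipynb file of that format version.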
For example, when converting such a file to LaTeX, raw text cells can contain LaTeX code since the content of such cells is simply passed on by the converter. All this is one of the reasons why the IPython Notebook is nowadays often used for the composition of larger, more complex documents, like scientific research papers. You have executable code and documenting text in a single file that can be translated into a number of different output formats.
In a finance context this also makes IPython a valuable tool, since, for example, the mathematical description of an algorithm and the executable Python version can live in the same document. Depending on the usage scenario, a web page (e.g., intranet), a PDF document (e.g., client mailings), or a presentation (e.g., board meeting) can be generated. With regard to the presentation option, you can, for example, skip those cells that may contain text passages that might be too long for a presentation.
The basic usage of the Notebook is quite intuitive. You mainly navigate it with the arrow keys and “execute” cells by using either Shift-Return or Ctrl-Return. The difference is that the first option moves you automatically to the next cell after execution while the second option lets you remain at the same cell. The effect of “executing” cells depends on the type of the cell. If it is a code cell, then the code is executed and the output (if any) is shown. If it is a Markdown cell, the content is rendered to show the result.
Markdown and LaTeX
The following shows a few selected examples for Markdown commands:
**bold** prints the text in bold
*italic* prints the text in italic
_italic_ also prints it in italic
**_italic_** bold and italic
bullet point lists:
* first_bullet
* second_bullet
&ndash; renders to a dash
<br> inserts a line break
Figure 2-4 shows the same code both in a raw text cell (which looks the same as the preceding text) and rendered in a Markdown cell. In this way, you can easily combine Python code and formatted, nicely rendered text in a single document.
A detailed description of the Markdown language used for IPython Notebook is found at http://daringfireball.net/projects/markdown/.
As mentioned before, the rendering capabilities of IPython are not restricted to the Markdown language. IPython also renders by default mathematical formulae described on the basis of the LaTeX typesetting system, the de facto standard for scientific publishing. Consider, for example, from Chapter 1 the formula for the index level in the Black-Scholes-Merton (1973) model, as provided in Equation 1-1. For convenience, we repeat it here as Equation 2-1.
Equation 2-1. Black-Scholes-Merton (1973) index level at maturity
S_T = S_0 \exp((r - 0.5\sigma^2) T + \sigma \sqrt{T} z)
Figure 2-4. Screenshot of IPython Notebook with Markdown rendering
The LaTeX code that describes Equation 2-1 looks roughly like the following:
S_T = S_0 \exp((r - 0.5\sigma^2) T + \sigma \sqrt{T} z)
Figure 2-5 shows a raw text cell with Markdown text and the LaTeX code, as well as the result as rendered in a Markdown cell. The figure also shows a more complex formula: the Black-Scholes-Merton option pricing formula for European call options, as found in Equation 3-1 in Chapter 3.
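The formula of Equation 2-1 translates almost literally into Python, which illustrates why formula and code can sit side by side in a Notebook. The following is a minimal sketch; the function name index_level_at_maturity and all parameter values are assumptions chosen for illustration, and z is taken to be a standard normally distributed random variable, as is usual for this model.

```python
import math
import random


def index_level_at_maturity(S0, r, sigma, T, z):
    """S_T = S_0 * exp((r - 0.5 * sigma^2) * T + sigma * sqrt(T) * z)."""
    return S0 * math.exp((r - 0.5 * sigma ** 2) * T
                         + sigma * math.sqrt(T) * z)


# Illustrative parameter values (assumptions, not market data).
S0, r, sigma, T = 100.0, 0.05, 0.2, 1.0

# With z = 0 the random term vanishes and only the drift term remains.
S_T_median = index_level_at_maturity(S0, r, sigma, T, z=0.0)

# One random draw of the standard normally distributed variable z.
random.seed(0)
S_T_sample = index_level_at_maturity(S0, r, sigma, T,
                                     z=random.gauss(0.0, 1.0))
print(S_T_median, S_T_sample)
```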

Figure 2-5. Markdown and LaTeX for financial formulae
Magic commands
One of IPython’s strengths lies in its magic commands. They are “magic” in the sense that they add some really helpful and powerful functions to the standard Python shell functionality. Basic information and help about these functions can be accessed via:
In [1]: %magic

IPython's 'magic' functions
===========================

The magic function system provides a series of functions which allow you to
control the behavior of IPython itself, plus a lot of system-type
features. There are two kinds of magics, line-oriented and cell-oriented.
...
A list of all available magic commands can be generated in an IPython session as follows:
In [2]: %lsmagic
In interactive computing, magic commands can, for example, be used for simple profiling tasks. For such a use case, you might use %time or %prun:
In [3]: import numpy as np

In [4]: %time np.sin(np.arange(1000000))
CPU times: user 31.8 ms, sys: 7.87 ms, total: 39.7 ms
Wall time: 39 ms
Out[5]:
array([ 0.        ,  0.84147098,  0.90929743, ...,  0.21429647,
       -0.70613761, -0.97735203])

In [6]: %prun np.sin(np.arange(1000000))
         3 function calls in 0.043 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.041    0.041    0.043    0.043 <string>:1(<module>)
        1    0.002    0.002    0.002    0.002 {numpy.core.multiarray.arange}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
There is yet another command, %timeit or %%timeit, for timing codes in a single line or a whole cell in the IPython Notebook:
In [6]: %timeit np.sin(np.arange(1000000))
10 loops, best of 3: 27.5 ms per loop
This function executes a number of loops to get more reliable estimates for the duration of a function call or a snippet of code.
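Outside of IPython, the standard library's timeit module offers comparable functionality. The following is a minimal sketch, assuming a pure-Python statement in place of the NumPy call so that the example has no dependencies; the numbers of repetitions and loops are arbitrary choices.

```python
import timeit

# Statement to be timed: sine of the first 1,000 integers in pure Python.
stmt = "[math.sin(x) for x in range(1000)]"

# 3 repetitions of 100 loops each, in the spirit of "100 loops, best of 3".
timings = timeit.repeat(stmt, setup="import math", repeat=3, number=100)

# Each entry is the total time for 100 loops; report the best per-loop time.
best_per_loop = min(timings) / 100
print("100 loops, best of 3: %.1f us per loop" % (best_per_loop * 1e6))
```

Taking the minimum rather than the mean is the usual choice here, since slower runs mostly reflect interference from other processes.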
It is not possible to explain in detail all the magic functions that IPython provides. However, IPython itself strives to make it as easy as possible to interactively look up information about IPython and its commands. Among the most helpful are those listed in Table 2-2 (cf. http://bit.ly/ipython_tutorial).
Table 2-2. Selected help functions included in IPython
Name        Description
?           Introduction and overview of IPython features
%quickref   Quick reference
help        Python’s own help system
object?     Details about the “object”; use object?? for extra details
Another feature of IPython is that it is highly configurable. Information about the configuration capabilities is also found in the documentation. A magic command that also helps with customizing IPython is %bookmark. This allows the bookmarking of arbitrary directories by the use of your custom names such that you can later (no matter where the IPython kernel is invoked from and no matter what the current directory is) navigate to any of your bookmarked directories immediately (i.e., you do not need to use cd). The following shows how to set a bookmark and how to get a list of all bookmarks:
In [6]: %bookmark py4fi

In [7]: %bookmark -l
Current bookmarks:
py4fi -> /Users/yhilpisch/Documents/Work/Python4Finance/
System shell commands
Yet another really helpful feature is that you can execute command-line/system shell functions directly from an IPython prompt or a Notebook cell. To this end you need to use the ! to indicate that the following command should be escaped to the system shell (or %%! when a complete cell should be handled that way). As a simple illustration, the following creates a directory, moves to that directory, moves back, and deletes the directory:
In [7]: !mkdir python4finance

In [8]: cd python4finance/
/Users/yhilpisch/python4finance

In [9]: cd ..
/Users/yhilpisch

In [10]: !rm -rf python4finance/
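The same sequence can be reproduced with the standard library alone, which is useful in plain scripts where the ! escape is not available. The following is a minimal sketch; working inside a temporary directory is an assumption made here so that the example leaves no traces on disk.

```python
import os
import shutil
import tempfile

start_dir = os.getcwd()
base = tempfile.mkdtemp()             # scratch area for the example

target = os.path.join(base, "python4finance")
os.mkdir(target)                      # corresponds to: !mkdir python4finance
os.chdir(target)                      # corresponds to: cd python4finance/
inside = os.getcwd()
os.chdir(start_dir)                   # move back (cd .. in the session above)
shutil.rmtree(target)                 # corresponds to: !rm -rf python4finance/

removed = not os.path.exists(target)
shutil.rmtree(base)                   # clean up the scratch area
print(inside, removed)
```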
IPython provides you with all the functions you would expect from a powerful interactive development environment. It is often the case that people, beginners and experts alike, even find their way to Python via IPython. Throughout the book, there are a plentitude of examples illustrating the use of IPython for interactive data and financial analytics. You should also consult the book by McKinney (2012), and in particular Chapter 3, for further information on how to use IPython effectively.
Spyder
While IPython satisfies all of most users’ requirements for interactive analytics and prototyping, larger projects generally demand “something more.” In particular, IPython itself has no editor directly built into the application.[11] For all those looking for a more traditional development environment, Spyder might therefore be a good choice.
Similar to IPython, Spyder has been designed to support rapid, interactive development with Python. However, it also has, for example, a full-fledged editor, more powerful project management and debugging capabilities, and an object and variable inspector as well as a full integration of the IPython shell version. Within Spyder you can also start a standard Python prompt session.
The built-in editor of Spyder provides all you need to do Python development. Among other features (cf. http://code.google.com/p/spyderlib/wiki/Features), it offers the following:
Highlighting
Syntax coloring for Python, C/C++, and Fortran code; occurrence highlighting
Introspection
Powerful dynamic code introspection features (e.g., code completion, call tips, object definition with a mouse click)
Code browser
Browsing of classes and functions
Project management
Defining and managing projects; generating to-do lists
Instant code checking
Getting errors and warnings on the fly (by using pyflakes, cf. https://pypi.python.org/pypi/pyflakes)
Debugging
Setting breakpoints and conditional breakpoints to use with the Python debugger pdb (cf. http://docs.python.org/2/library/pdb.html)
In addition, Spyder provides further helpful functionality:
Consoles
Open multiple Python and IPython consoles with separate processes each; run the code from the active editor tab (or parts of it) in a console
Variable explorer
Edit and compare variables and arrays; generate 2D plots of arrays on the fly; inspect variables while debugging
Object inspector
Display documentation strings interactively; automatically render, for example, rich text formatting
Other features
History log; array editor similar to a spreadsheet; direct access to online help; management and exploration of whole projects; syntax and code checking via Pylint.
Figure 2-6 provides a screenshot of Spyder showing the text editor (on the left), the variable inspector (upper right), and an active Python console (lower right). Spyder is a good choice to start with Python programming, especially for those who are used, for example, to such environments as those provided by Matlab or R. However, advanced programmers will also find a lot of helpful development functionality under a single roof.

Figure 2-6. Screenshot of Spyder
Conclusions
If you are a beginner or casual Python developer or an expert coming from a different programming background, getting started with Python is generally pretty easy in that only a couple of simple steps are required. To begin, you should install an appropriate Python distribution, like Anaconda, to have a consistent Python environment available and also to simplify the regular updating procedures.
With a distribution like Anaconda you have available the most important tools to interactively practice data and financial analytics, like with IPython, or to develop larger applications in a more traditional implement-test-debug fashion, like with Spyder. Of course, you can add to the mix your favorite editor, which probably already has Python syntax highlighting included. If you additionally are looking for syntax and code checking capabilities, you might consider the built-in Spyder editor or any other Python-focused editor available.
Appendix A introduces a number of best practices in the areas of syntax, documentation, and unit testing. In terms of syntax, spaces and blank lines play an important role, as well as the indentation of code blocks. When it comes to documentation, you should consider including documentation strings in any function or class, providing background and help for such things as input parameters, output, and possible errors, as well as usage examples. Finally, you should include unit tests in your development process from the beginning (at least for larger projects or those shared with a broader user base) and use dedicated tools to simplify the test procedures.
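The following sketch puts these recommendations side by side: a function with a documentation string that covers parameters, output, and possible errors, plus two unit tests based on the standard unittest module. The function discount_factor is a hypothetical example chosen for brevity and is not taken from Appendix A.

```python
import unittest


def discount_factor(r, T):
    """Return the discount factor for a constant short rate.

    Parameters
    ==========
    r : float
        constant short rate (continuous compounding)
    T : float
        time horizon in year fractions, must be nonnegative

    Returns
    =======
    df : float
        discount factor exp(-r * T)

    Raises
    ======
    ValueError
        if T is negative
    """
    from math import exp
    if T < 0:
        raise ValueError("T must be nonnegative")
    return exp(-r * T)


class TestDiscountFactor(unittest.TestCase):

    def test_zero_horizon(self):
        # no discounting over a horizon of length zero
        self.assertEqual(discount_factor(0.05, 0.0), 1.0)

    def test_negative_horizon(self):
        # invalid input must be rejected
        with self.assertRaises(ValueError):
            discount_factor(0.05, -1.0)


# Run the tests programmatically (works in scripts and interactive sessions).
suite = unittest.TestLoader().loadTestsFromTestCase(TestDiscountFactor)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```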
Further Reading
The following web resources are helpful with regard to the topics covered in this chapter:
http://docs.continuum.io/anaconda/ for the Anaconda documentation
http://conda.pydata.org/docs/ for the conda documentation
http://ipython.org/ipython-doc/stable/ for the IPython documentation
http://daringfireball.net/projects/markdown/ for the Markdown language used by IPython Notebook
http://code.google.com/p/spyderlib for information about Spyder
A good introduction to Python deployment and the use of IPython as a development environment is provided in:
Wes McKinney (2012): Python for Data Analysis. O’Reilly, Sebastopol, CA.
[5] They can, for example, in general be executed even on a Raspberry Pi for about 30 USD (cf. http://www.raspberrypi.org), although memory issues quickly arise for some applications. Nevertheless, this can be considered a rather low requirement when it comes to hardware.
[6] For those who want to control which libraries and packages get installed, there is Miniconda, which comes with a minimal Python installation only. Cf. http://conda.pydata.org/miniconda.html.
[7] There is also an Anaconda version available that contains proprietary packages from Continuum Analytics called Accelerate. This commercial version, whose main goal is to improve the performance of typical operations with Python, has to be licensed.
[8] This is only one subtle, but harmless, change in the Python syntax from 2.7.x to 3.x that might be a bit confusing to someone new to Python.
[9] For Windows users and developers, the full integration of Python in Visual Studio is a compelling alternative. There is even a whole suite of Python tools for Visual Studio available (cf. http://pytools.codeplex.com).
[10] From IPython 2.0 on, these cells are called Raw NBConvert.
[11] However, you can configure your favorite editor for IPython and invoke it by the magic command %edit FILENAME.
Chapter 3. Introductory Examples
Quantitative analysis, as we define it, is the application of mathematical and/or statistical methods to market data.
John Forman
This chapter dives into some concrete examples from quantitative finance to illustrate how convenient and powerful it is to use Python and its libraries for financial analytics. The focus lies on the flow of the exposition, and a number of details that might be important in real-world applications are not touched upon. Also, details of Python usage are mainly skipped because later chapters explain them further.
Specifically, this chapter presents the following examples:
Implied volatilities
Option quotes for certain maturity dates are taken to back out the implied volatilities of these options and to plot them, a task that option traders and risk managers, among others, are faced with on a daily basis.
Monte Carlo simulation
The evolution of a stock index over time is simulated via Monte Carlo techniques, selected results are visualized, and European option values are calculated. Monte Carlo simulation is a cornerstone for numerical option pricing as well as for risk management efforts involving value-at-risk calculations or credit value adjustments.
Technical analysis
An analysis of historical time series data is implemented to backtest an investment strategy based on trend signals; both professional investors and ambitious amateurs regularly engage in this kind of investment analysis.
All examples have to deal in some ways with date-time information. Appendix C introduces handling such information with Python, NumPy, and pandas.
Implied Volatilities
Given an option pricing formula like the seminal one of Black-Scholes-Merton (1973), implied volatilities are those volatility values that, ceteris paribus, when put into the formula, give observed market quotes for different option strikes and maturities. In this case, the volatility is not an input parameter for the model/formula, but the result of a (numerical) optimization procedure given that formula.
The example we consider in the following discussion is about a new generation of options, namely volatility options on the VSTOXX volatility index. Eurex, the derivatives exchange that provides these options on the VSTOXX and respective futures contracts, established a comprehensive Python-based tutorial called “VSTOXX Advanced Services” in June 2013 about the index and its derivatives contracts.[12]
However, before proceeding with the VSTOXX options themselves, let us first reproduce in Equation 3-1 the famous Black-Scholes-Merton formula for the pricing of European call options on an underlying without dividends.
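Equation 3-1. Black-Scholes-Merton (1973) option pricing formula
C(S_t, K, t, T, r, \sigma) = S_t \cdot N(d_1) - e^{-r(T-t)} \cdot K \cdot N(d_2)
N(d) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{d} e^{-\frac{1}{2}x^2} dx
d_1 = \frac{\log\frac{S_t}{K} + (r + \frac{\sigma^2}{2})(T-t)}{\sigma\sqrt{T-t}}
d_2 = \frac{\log\frac{S_t}{K} + (r - \frac{\sigma^2}{2})(T-t)}{\sigma\sqrt{T-t}}
The different parameters have the following meaning:
S_t  Price/level of the underlying at time t
σ    Constant volatility (i.e., standard deviation of returns) of the underlying
K    Strike price of the option
T    Maturity date of the option
r    Constant riskless short rate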
Consider now that an option quote for a European call option C* is given. The implied volatility σ^imp is the quantity that solves the implicit Equation 3-2.
Equation 3-2. Implied volatility given market quote for option
C(S_t, K, t, T, r, \sigma^{imp}) = C^*
There is no closed-form solution to this equation, such that one has to use a numerical solution procedure like the Newton scheme to estimate the correct solution. This scheme iterates, using the first derivative of the relevant function, until a certain number of iterations or a certain degree of precision is reached. Formally, we have Equation 3-3 for some starting value σ_0^imp and for 0 < n < ∞.
Equation 3-3. Newton scheme for numerically solving equations
\sigma_{n+1}^{imp} = \sigma_n^{imp} - \frac{C(\sigma_n^{imp}) - C^*}{\partial C(\sigma_n^{imp}) / \partial \sigma_n^{imp}}
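To see the Newton scheme at work independently of option pricing, consider the following minimal sketch. It solves the simple equation x^2 - 2 = 0; the function name newton, the starting value, and the tolerance are illustrative assumptions. Both stopping criteria mentioned in the text, a maximum number of iterations and a degree of precision, are included.

```python
def newton(f, f_prime, x0, it=100, tol=1e-12):
    """Newton scheme: x_{n+1} = x_n - f(x_n) / f'(x_n)."""
    x = x0
    for _ in range(it):           # at most `it` iterations
        step = f(x) / f_prime(x)
        x -= step
        if abs(step) < tol:       # stop once the desired precision is reached
            break
    return x


# Solve x^2 - 2 = 0, whose positive root is the square root of 2.
root = newton(lambda x: x ** 2 - 2.0, lambda x: 2.0 * x, x0=1.0)
print(root)
```

For implied volatilities, f corresponds to the difference between model value and market quote, and f_prime to the derivative of the model value with respect to the volatility.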

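The partial derivative of the option pricing formula with respect to the volatility is called Vega and is given in closed form by Equation 3-4, where N' denotes the first derivative of N, i.e., the standard normal probability density function.
Equation 3-4. Vega of a European option in BSM model
\frac{\partial C}{\partial \sigma} = S_t N'(d_1) \sqrt{T-t}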
The financial and numerical tools needed are now complete, even if only roughly described, and we can have a look into the respective Python code that assumes the special case t = 0 (Example 3-1).
Example 3-1. Black-Scholes-Merton (1973) functions
#
# Valuation of European call options in Black-Scholes-Merton model
# incl. Vega function and implied volatility estimation
# bsm_functions.py
#

# Analytical Black-Scholes-Merton (BSM) Formula


def bsm_call_value(S0, K, T, r, sigma):
    ''' Valuation of European call option in BSM model.
    Analytical formula.

    Parameters
    ==========
    S0 : float
        initial stock/index level
    K : float
        strike price
    T : float
        maturity date (in year fractions)
    r : float
        constant risk-free short rate
    sigma : float
        volatility factor in diffusion term

    Returns
    =======
    value : float
        present value of the European call option
    '''
    from math import log, sqrt, exp
    from scipy import stats

    S0 = float(S0)
    d1 = (log(S0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = (log(S0 / K) + (r - 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    value = (S0 * stats.norm.cdf(d1, 0.0, 1.0)
             - K * exp(-r * T) * stats.norm.cdf(d2, 0.0, 1.0))
      # stats.norm.cdf --> cumulative distribution function
      #                    for normal distribution
    return value

# Vega function


def bsm_vega(S0, K, T, r, sigma):
    ''' Vega of European option in BSM model.

    Parameters
    ==========
    S0 : float
        initial stock/index level
    K : float
        strike price
    T : float
        maturity date (in year fractions)
    r : float
        constant risk-free short rate
    sigma : float
        volatility factor in diffusion term

    Returns
    =======
    vega : float
        partial derivative of BSM formula with respect
        to sigma, i.e. Vega
    '''
    from math import log, sqrt
    from scipy import stats

    S0 = float(S0)
    d1 = (log(S0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    vega = S0 * stats.norm.pdf(d1, 0.0, 1.0) * sqrt(T)
      # stats.norm.pdf --> probability density function
      #                    for normal distribution
    return vega

# Implied volatility function


def bsm_call_imp_vol(S0, K, T, r, C0, sigma_est, it=100):
    ''' Implied volatility of European call option in BSM model.

    Parameters
    ==========
    S0 : float
        initial stock/index level
    K : float
        strike price
    T : float
        maturity date (in year fractions)
    r : float
        constant risk-free short rate
    C0 : float
        market quote of the European call option
    sigma_est : float
        estimate of impl. volatility
    it : integer
        number of iterations

    Returns
    =======
    sigma_est : float
        numerically estimated implied volatility
    '''
    for i in range(it):
        sigma_est -= ((bsm_call_value(S0, K, T, r, sigma_est) - C0)
                      / bsm_vega(S0, K, T, r, sigma_est))
    return sigma_est
These are only the basic functions needed to calculate implied volatilities. What we need as well, of course, are the respective option quotes, in our case for European call options on the VSTOXX index, and the code that generates the single implied volatilities. We will see how to do this based on an interactive IPython session.
Let us start with the day from which the quotes are taken; i.e., our t = 0 reference day. This is March 31, 2014. On this day, the closing value of the index was V0 = 17.6639 (we change from S to V to indicate that we are now working with the volatility index):
In [1]: V0 = 17.6639
For the risk-free short rate, we assume a value of r = 0.01 p.a.:
In [2]: r = 0.01
All other input parameters are given by the options data (i.e., T and K) or have to be calculated (i.e., σ^imp). The data is stored in a pandas DataFrame object (see Chapter 6) and saved in a PyTables database file (see Chapter 7). We have to read it from disk into memory:
In [3]: import pandas as pd
        h5 = pd.HDFStore('./source/vstoxx_data_31032014.h5', 'r')
        futures_data = h5['futures_data']  # VSTOXX futures data
        options_data = h5['options_data']  # VSTOXX call option data
        h5.close()
We need the futures data to select a subset of the VSTOXX options given their (forward) moneyness. Eight futures on the VSTOXX are traded at any time. Their maturities are the next eight third Fridays of the month. At the end of March, there are futures with maturities ranging from the third Friday of April to the third Friday of November. TTM in the following pandas table represents time-to-maturity in year fractions:
In [4]: futures_data
Out[4]:           DATE  EXP_YEAR  EXP_MONTH  PRICE   MATURITY    TTM
        496 2014-03-31      2014          4  17.85 2014-04-18  0.049
        497 2014-03-31      2014          5  19.55 2014-05-16  0.126
        498 2014-03-31      2014          6  19.95 2014-06-20  0.222
        499 2014-03-31      2014          7  20.40 2014-07-18  0.299
        500 2014-03-31      2014          8  20.70 2014-08-15  0.375
        501 2014-03-31      2014          9  20.95 2014-09-19  0.471
        502 2014-03-31      2014         10  21.05 2014-10-17  0.548
        503 2014-03-31      2014         11  21.25 2014-11-21  0.644
The options data set is larger since at any given trading day multiple call and put options are traded per maturity date. The maturity dates, however, are the same as for the futures. There are a total of 395 call options quoted on March 31, 2014:
In [5]: options_data.info()
Out[5]: <class 'pandas.core.frame.DataFrame'>
        Int64Index: 395 entries, 46170 to 46564
        Data columns (total 8 columns):
        DATE         395 non-null datetime64[ns]
        EXP_YEAR     395 non-null int64
        EXP_MONTH    395 non-null int64
        TYPE         395 non-null object
        STRIKE       395 non-null float64
        PRICE        395 non-null float64
        MATURITY     395 non-null datetime64[ns]
        TTM          395 non-null float64
        dtypes: datetime64[ns](2), float64(3), int64(2), object(1)
In [6]: options_data[['DATE', 'MATURITY', 'TTM', 'STRIKE', 'PRICE']].head()
Out[6]:             DATE   MATURITY    TTM  STRIKE  PRICE
        46170 2014-03-31 2014-04-18  0.049       1  16.85
        46171 2014-03-31 2014-04-18  0.049       2  15.85
        46172 2014-03-31 2014-04-18  0.049       3  14.85
        46173 2014-03-31 2014-04-18  0.049       4  13.85
        46174 2014-03-31 2014-04-18  0.049       5  12.85
As is obvious in the pandas table, there are call options traded and quoted that are far in-the-money (index level much higher than option strike). There are also options traded that are far out-of-the-money (index level much lower than option strike). We therefore want to restrict the analysis to those call options with a certain (forward) moneyness, given the value of the future for the respective maturity. We allow a maximum deviation of 50% from the futures level.
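In terms of numbers, the 50% rule translates into a simple interval around the futures level. The following minimal sketch uses the April futures price of 17.85 from the futures table; the list of strikes is a hypothetical sample chosen for illustration.

```python
tol = 0.5          # maximum relative deviation from the futures level
forward = 17.85    # April futures price from the futures table

# Hypothetical sample of strikes for the April maturity.
strikes = [1.0, 5.0, 9.0, 17.0, 26.0, 27.0, 30.0]

# Keep only strikes K with forward * (1 - tol) < K < forward * (1 + tol).
lower, upper = forward * (1 - tol), forward * (1 + tol)
selected = [K for K in strikes if lower < K < upper]
print(lower, upper, selected)
```

For the April maturity, only strikes strictly between 8.925 and 26.775 pass the filter.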
Before we can start, we need to define a new column in the options_data DataFrame object to store the results. We also need to import the functions from the script in Example 3-1:
In [7]: options_data['IMP_VOL'] = 0.0
          # new column for implied volatilities
In [8]: from bsm_functions import *
The following code now calculates the implied volatilities for all those call options:
In [9]: tol = 0.5  # tolerance level for moneyness
        for option in options_data.index:
            # iterating over all option quotes
            forward = futures_data[futures_data['MATURITY'] == \
                        options_data.loc[option]['MATURITY']]['PRICE'].values[0]
              # picking the right futures value
            if (forward * (1 - tol) < options_data.loc[option]['STRIKE']
                                     < forward * (1 + tol)):
                # only for options with moneyness within tolerance
                imp_vol = bsm_call_imp_vol(
                        V0,  # VSTOXX value
                        options_data.loc[option]['STRIKE'],
                        options_data.loc[option]['TTM'],
                        r,   # short rate
                        options_data.loc[option]['PRICE'],
                        sigma_est=2.,  # estimate for implied volatility
                        it=100)
                options_data.loc[option, 'IMP_VOL'] = imp_vol
                  # single .loc call avoids chained assignment
In this code, there is some pandas syntax that might not be obvious at first sight. Chapter 6 explains pandas and its use for such operations in detail. At this stage, it suffices to understand the following features:
In [10]: futures_data['MATURITY']
           # select the column with name MATURITY
Out[10]: 496   2014-04-18
         497   2014-05-16
         498   2014-06-20
         499   2014-07-18
         500   2014-08-15
         501   2014-09-19
         502   2014-10-17
         503   2014-11-21
         Name: MATURITY, dtype: datetime64[ns]
In [11]: options_data.loc[46170]
           # select data row for index 46170
Out[11]: DATE         2014-03-31 00:00:00
         EXP_YEAR                    2014
         EXP_MONTH                      4
         TYPE                           C
         STRIKE                         1
         PRICE                      16.85
         MATURITY     2014-04-18 00:00:00
         TTM                        0.049
         IMP_VOL                        0
         Name: 46170, dtype: object
In [12]: options_data.loc[46170]['STRIKE']
           # select only the value in column STRIKE
           # for index 46170
Out[12]: 1.0
The implied volatilities for the selected options shall now be visualized. To this end, we use only the subset of the options_data object for which we have calculated the implied volatilities:
In [13]: plot_data = options_data[options_data['IMP_VOL'] > 0]
To visualize the data, we iterate over all maturities of the data set and plot the implied volatilities both as lines and as single points. Since all maturities appear multiple times, we need to use a little trick to get to a nonredundant, sorted list with the maturities. The set operation gets rid of all duplicates, but might deliver an unsorted set of the maturities. Therefore, we sort the set object (cf. also Chapter 4):[13]
In [14]: maturities = sorted(set(options_data['MATURITY']))
         maturities
Out[14]: [Timestamp('2014-04-18 00:00:00'),
          Timestamp('2014-05-16 00:00:00'),
          Timestamp('2014-06-20 00:00:00'),
          Timestamp('2014-07-18 00:00:00'),
          Timestamp('2014-08-15 00:00:00'),
          Timestamp('2014-09-19 00:00:00'),
          Timestamp('2014-10-17 00:00:00'),
          Timestamp('2014-11-21 00:00:00')]
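The combination of set and sorted works for any collection of comparable objects. A minimal sketch with plain date objects from the standard library, where the sample column is a hypothetical stand-in for the MATURITY column:

```python
from datetime import date

# Maturities as they might appear in a column: repeated and unordered.
column = [date(2014, 6, 20), date(2014, 4, 18), date(2014, 6, 20),
          date(2014, 5, 16), date(2014, 4, 18)]

unique = set(column)          # duplicates removed, order not guaranteed
maturities = sorted(unique)   # sorted() returns a list in ascending order
print(maturities)
```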
The following code iterates over all maturities and does the plotting. The result is shown as Figure 3-1. As in stock or foreign exchange markets, you will notice the so-called volatility smile, which is most pronounced for the shortest maturity and which becomes a bit less pronounced for the longer maturities:
In [15]: import matplotlib.pyplot as plt
         %matplotlib inline
         plt.figure(figsize=(8, 6))
         for maturity in maturities:
             data = plot_data[plot_data.MATURITY == maturity]
               # select data for this maturity
             plt.plot(data['STRIKE'], data['IMP_VOL'],
                      label=maturity.date(), lw=1.5)
             plt.plot(data['STRIKE'], data['IMP_VOL'], 'r.')
         plt.grid(True)
         plt.xlabel('strike')
         plt.ylabel('implied volatility of volatility')
         plt.legend()
         plt.show()
Figure 3-1. Implied volatilities (of volatility) for European call options on the VSTOXX on March 31, 2014
To conclude this example, we want to show another strength of pandas: namely, for working with hierarchically indexed data sets. The DataFrame object options_data has an integer index, which we have used in several places. However, this index is not really meaningful; it is “just” a number. The option quotes for the day March 31, 2014 are uniquely described (“identified”) by a combination of the maturity and the strike, i.e., there is only one call option per maturity and strike.
The groupby method can be used to capitalize on this insight and to get a more meaningful index. To this end, we group by MATURITY first and then by the STRIKE. We only want to keep the PRICE and IMP_VOL columns:
In [16]: keep = ['PRICE', 'IMP_VOL']
         group_data = plot_data.groupby(['MATURITY', 'STRIKE'])[keep]
         group_data
Out[16]: <pandas.core.groupby.DataFrameGroupBy object at 0x7faf483d5710>
The operation returns a DataFrameGroupBy object.[14] To get to the data, we need to apply an aggregation operation on the object, like taking the sum. Taking the sum yields the single data point since there is only one data element in every group:
In [17]: group_data = group_data.sum()
         group_data.head()
Out[17]:                    PRICE   IMP_VOL
         MATURITY   STRIKE
         2014-04-18 9        8.85  2.083386
                    10       7.85  1.804194
                    11       6.85  1.550283
                    12       5.85  1.316103
                    13       4.85  1.097184
Theresulting DataFrame objecthastwoindexlevelsandtwocolumns.The
followingshowsallvaluesthatthetwoindicescantake:
In [18]: group_data.index.levels
Out[18]: FrozenList([[2014-04-18 00:00:00, 2014-05-16 00:00:00,
         2014-06-20 00:00:00, 2014-07-18 00:00:00, 2014-08-15 00:00:00,
         2014-09-19 00:00:00, 2014-10-17 00:00:00, 2014-11-21 00:00:00],
         [9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0,
         20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0]])
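The hierarchical index described above can be tried out on a small, self-contained table. The following is a minimal sketch in Python 3 syntax, not the chapter's actual data set: the two April rows reuse values shown in the output above, while the May rows and the names `quotes`, `indexed`, and `april` are hypothetical and only serve the illustration.

```python
import pandas as pd

# Small illustrative table: one call option per (maturity, strike) pair.
# The May 2014 rows are made-up values, not VSTOXX quotes.
quotes = pd.DataFrame({
    'MATURITY': pd.to_datetime(['2014-04-18', '2014-04-18',
                                '2014-05-16', '2014-05-16']),
    'STRIKE': [9.0, 10.0, 9.0, 10.0],
    'PRICE': [8.85, 7.85, 9.10, 8.20],
    'IMP_VOL': [2.083386, 1.804194, 1.50, 1.35],
})

# Grouping by both keys and summing yields a two-level index
grouped = quotes.groupby(['MATURITY', 'STRIKE'])[['PRICE', 'IMP_VOL']].sum()

# The same index can be built directly, since each pair is unique
indexed = quotes.set_index(['MATURITY', 'STRIKE']).sort_index()

# Select one option by (maturity, strike), or all strikes of one maturity
ts = pd.Timestamp('2014-04-18')
single_price = indexed.loc[(ts, 10.0), 'PRICE']
april = indexed.loc[ts]
print(single_price)
print(april)
```

Because every (maturity, strike) pair occurs only once, setting the index directly and grouping with a sum lead to the same values; the grouping route generalizes to data with duplicates.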
Monte Carlo Simulation
Monte Carlo simulation is one of the most important algorithms in finance
and numerical science in general. Its importance stems from the fact that it
is quite powerful when it comes to option pricing or risk management
problems. In comparison to other numerical methods, the Monte Carlo
method can easily cope with high-dimensional problems where the
complexity and computational demand, respectively, generally increase in
linear fashion.
The downside of the Monte Carlo method is that it is per se
computationally demanding and often needs huge amounts of memory even
for quite simple problems. Therefore, it is necessary to implement Monte
Carlo algorithms efficiently. The example that follows illustrates different
implementation strategies in Python and offers three different
implementation approaches for a Monte Carlo-based valuation of a
European option.[15] The three approaches are:[16]
Pure Python
This example sticks with the standard library (i.e., those libraries and
packages that come with a standard Python installation) and uses only
built-in Python capabilities to implement the Monte Carlo valuation.
Vectorized NumPy
This implementation uses the capabilities of NumPy to make the
implementation more compact and much faster.
Fully vectorized NumPy
The final example combines a different mathematical formulation with
the vectorization capabilities of NumPy to get an even more compact
version of the same algorithm.
The examples are again based on the model economy of Black-Scholes-
Merton (1973), where the risky underlying (e.g., a stock price or index
level) follows, under risk neutrality, a geometric Brownian motion with a
stochastic differential equation (SDE), as in Equation 3-5.
Equation 3-5. Black-Scholes-Merton (1973) stochastic differential equation
$$dS_t = r S_t \, dt + \sigma S_t \, dZ_t$$
The parameters are defined as in Equation 3-1 and Z is a Brownian motion.
A discretization scheme for the SDE in Equation 3-5 is given by the
difference equation in Equation 3-6.
Equation 3-6. Euler discretization of SDE
$$S_t = S_{t-\Delta t} \exp\left(\left(r - \tfrac{1}{2}\sigma^2\right)\Delta t + \sigma \sqrt{\Delta t} \, z_t\right)$$
The variable z is a standard normally distributed random variable, 0 < Δt <
T, a (small enough) time interval. It also holds 0 < t ≤ T with T the final
time horizon.[17]
We parameterize the model with the values S0 = 100, K = 105, T = 1.0, r =
0.05, σ = 0.2. Using the Black-Scholes-Merton formula as in Equation 3-1
and Example 3-1 from the previous example, we can calculate the exact
option value as follows:
In [19]: from bsm_functions import bsm_call_value
         S0 = 100.
         K = 105.
         T = 1.0
         r = 0.05
         sigma = 0.2
         bsm_call_value(S0, K, T, r, sigma)
Out[19]: 8.0213522351431763
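For readers who do not have the bsm_functions module at hand, the benchmark can be reproduced with the standard library alone. This is a minimal sketch in Python 3 syntax, assuming the usual Black-Scholes-Merton call formula without dividends; the helper names `norm_cdf` and `bsm_call_sketch` are hypothetical and not part of the module used above.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function via math.erf."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bsm_call_sketch(S0, K, T, r, sigma):
    """Black-Scholes-Merton value of a European call (no dividends)."""
    d1 = (log(S0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S0 * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)


# Same parameters as in the text
benchmark = bsm_call_sketch(100., 105., 1.0, 0.05, 0.2)
print(round(benchmark, 4))
```

With the parameters from the text, the printed value should agree with the benchmark shown above to the displayed precision.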
This is our benchmark value for the Monte Carlo estimators to follow. To
implement a Monte Carlo valuation of the European call option, the
following recipe can be applied:
1. Divide the time interval [0, T] in equidistant subintervals of length
   Δt.
2. Start iterating i = 1, 2, ..., I.
   a. For every time step t ∈ {Δt, 2Δt, ..., T}, draw pseudorandom
      numbers z_t(i).
   b. Determine the time T value of the index level S_T(i) by
      applying the pseudorandom numbers time step by time step
      to the discretization scheme in Equation 3-6.
   c. Determine the inner value h_T of the European call option at T
      as h_T(S_T(i)) = max(S_T(i) – K, 0).
   d. Iterate until i = I.
3. Sum up the inner values, average, and discount them back with the
   riskless short rate according to Equation 3-7.
Equation 3-7 provides the numerical Monte Carlo estimator for the value of
the European call option.
Equation 3-7. Monte Carlo estimator for European call option
$$C_0 \approx e^{-rT} \frac{1}{I} \sum_{i=1}^{I} h_T(S_T(i))$$
Pure Python
Example 3-2 translates the parametrization and the Monte Carlo recipe into
pure Python. The code simulates 250,000 paths over 50 time steps.
Example 3-2. Monte Carlo valuation of European call option with pure
Python
#
# Monte Carlo valuation of European call options with pure Python
# mcs_pure_python.py
#
from time import time
from math import exp, sqrt, log
from random import gauss, seed

seed(20000)
t0 = time()

# Parameters
S0 = 100.  # initial value
K = 105.  # strike price
T = 1.0  # maturity
r = 0.05  # riskless short rate
sigma = 0.2  # volatility
M = 50  # number of time steps
dt = T / M  # length of time interval
I = 250000  # number of paths

# Simulating I paths with M time steps
S = []
for i in range(I):
    path = []
    for t in range(M + 1):
        if t == 0:
            path.append(S0)
        else:
            z = gauss(0.0, 1.0)
            St = path[t - 1] * exp((r - 0.5 * sigma ** 2) * dt
                                   + sigma * sqrt(dt) * z)
            path.append(St)
    S.append(path)

# Calculating the Monte Carlo estimator
C0 = exp(-r * T) * sum([max(path[-1] - K, 0) for path in S]) / I

# Results output
tpy = time() - t0
print "European Option Value %7.3f" % C0
print "Duration in Seconds   %7.3f" % tpy
Running the script yields the following output:
In [20]: %run mcs_pure_python.py
Out[20]: European Option Value   7.999
         Duration in Seconds    34.258
Note that the estimated option value itself depends on the pseudorandom
numbers generated while the time needed is influenced by the hardware the
script is executed on.
The major part of the code in Example 3-2 consists of a nested loop that
generates step-by-step single values of an index level path in the inner loop
and adds completed paths to a list object with the outer loop. The Monte
Carlo estimator is calculated using Python's list comprehension syntax.
The estimator could also be calculated by a for loop:
In [21]: sum_val = 0.0
         for path in S:
             # C-like iteration for comparison
             sum_val += max(path[-1] - K, 0)
         C0 = exp(-r * T) * sum_val / I
         round(C0, 3)
Out[21]: 7.999
Although this loop yields the same result, the list comprehension syntax
is more compact and closer to the mathematical notation of the Monte
Carlo estimator.
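As a complement to the pure Python approach discussed above, the following sketch wraps the same recipe in a function and keeps only the running end-of-period value per path instead of whole paths. It is written in Python 3 syntax and is an illustration under stated assumptions: the function name `mc_call_value` and the smaller default values for `M` and `I` are choices made here to keep the run time short, not the settings of Example 3-2.

```python
from math import exp, sqrt
from random import gauss, seed


def mc_call_value(S0, K, T, r, sigma, M=10, I=20000, rnd_seed=20000):
    """Monte Carlo estimator for a European call, pure Python.

    Only the current index level of each path is kept in memory,
    which suffices for a European payoff.
    """
    seed(rnd_seed)
    dt = T / M
    drift = (r - 0.5 * sigma ** 2) * dt
    vol = sigma * sqrt(dt)
    payoff_sum = 0.0
    for _ in range(I):
        St = S0
        for _ in range(M):
            # one step of the discretization scheme in Equation 3-6
            St *= exp(drift + vol * gauss(0.0, 1.0))
        payoff_sum += max(St - K, 0.0)
    # average the inner values and discount with the short rate
    return exp(-r * T) * payoff_sum / I


C0 = mc_call_value(100., 105., 1.0, 0.05, 0.2)
print("European Option Value %7.3f" % C0)
```

With far fewer paths than in Example 3-2, the estimate scatters more widely around the benchmark value; increasing `I` narrows the scatter at the cost of run time.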
Vectorization with NumPy
NumPy provides a powerful multidimensional array class, called ndarray,
as well as a comprehensive set of functions and methods to manipulate
arrays and implement (complex) operations on such objects. From a more
general point of view, there are two major benefits of using NumPy:
Syntax
NumPy generally allows implementations that are more compact than
pure Python and that are often easier to read and maintain.
Speed
The majority of NumPy code is implemented in C or Fortran, which
makes NumPy, when used in the right way, faster than pure Python.
The generally more compact syntax stems from the fact that NumPy brings
powerful vectorization and broadcasting capabilities to Python. This is
similar to having vector notation in mathematics for large vectors or
matrices. For example, assume that we have a vector with the first 100
natural numbers, 1, ..., 100:
$$v = (1, 2, \ldots, 100)^T$$
Scalar multiplication of this vector is written compactly as:
$$u = 2 \cdot v = (2, 4, \ldots, 200)^T$$
Let's see if we can do this with Python list objects, for example:
In [22]: v = range(1, 6)
print v
Out[22]: [1, 2, 3, 4, 5]
In [23]: 2 * v
Out[23]: [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
Naive scalar multiplication does not return the scalar product. It rather
returns, in this case, two times the object (vector). With NumPy the result is,
however, as desired:
In [24]: import numpy as np
         v = np.arange(1, 6)
         v
Out[24]: array([1, 2, 3, 4, 5])
In [25]: 2 * v
Out[25]: array([ 2, 4, 6, 8, 10])
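The difference between list repetition and element-wise arithmetic shown above can be summarized in a few lines. This is a small sketch in Python 3 syntax (where `range` has to be wrapped in `list` to obtain a list object); the variable names are chosen here for illustration only.

```python
import numpy as np

v_list = list(range(1, 6))
repeated_list = 2 * v_list               # list repetition, not arithmetic
doubled_list = [2 * x for x in v_list]   # explicit element-wise loop

v_arr = np.arange(1, 6)
doubled_arr = 2 * v_arr                  # vectorized scalar multiplication
shifted_arr = v_arr + 0.5                # scalar broadcast over the array

print(repeated_list)
print(doubled_list)
print(doubled_arr)
print(shifted_arr)
```

The list comprehension gives the desired numbers in pure Python, but the loop then runs on the Python level; the ndarray expressions delegate the same work to compiled NumPy code.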
This approach can be beneficially applied to the Monte Carlo algorithm.
Example 3-3 provides the respective code, this time making use of NumPy's
vectorization capabilities.
Example 3-3. Monte Carlo valuation of European call option with NumPy
(first version)
#
# Monte Carlo valuation of European call options with NumPy
# mcs_vector_numpy.py
#
import math
import numpy as np
from time import time

np.random.seed(20000)
t0 = time()

# Parameters
S0 = 100.; K = 105.; T = 1.0; r = 0.05; sigma = 0.2
M = 50; dt = T / M; I = 250000

# Simulating I paths with M time steps
S = np.zeros((M + 1, I))
S[0] = S0
for t in range(1, M + 1):
    z = np.random.standard_normal(I)  # pseudorandom numbers
    S[t] = S[t - 1] * np.exp((r - 0.5 * sigma ** 2) * dt
                             + sigma * math.sqrt(dt) * z)
      # vectorized operation per time step over all paths

# Calculating the Monte Carlo estimator
C0 = math.exp(-r * T) * np.sum(np.maximum(S[-1] - K, 0)) / I

# Results output
tnp1 = time() - t0
print "European Option Value %7.3f" % C0
print "Duration in Seconds   %7.3f" % tnp1
Let us run this script:
In [26]: %run mcs_vector_numpy.py
Out[26]: European Option Value   8.037
         Duration in Seconds     1.215
In [27]: round(tpy / tnp1, 2)
Out[27]: 28.2
Vectorization brings a speedup of roughly 28 times in comparison to pure
Python. The estimated Monte Carlo value is again quite close to the
benchmark value.
The vectorization becomes obvious when the pseudorandom numbers are
generated. In the line in question, 250,000 numbers are generated in a
single step, i.e., a single line of code:
z = np.random.standard_normal(I)
Similarly, this vector of pseudorandom numbers is applied to the
discretization scheme at once per time step in a vectorized fashion. In that
sense, the tasks that are accomplished by the outer loop in Example 3-2 are
now delegated to NumPy, avoiding the outer loop completely on the Python
level.
VECTORIZATION
Using vectorization with NumPy generally results in code that is more
compact, easier to read (and maintain), and faster to execute. All these
aspects are in general important for financial applications.
Full Vectorization with Log Euler Scheme
Using a different discretization scheme for the SDE in Equation 3-5 can
yield an even more compact implementation of the Monte Carlo algorithm.
To this end, consider the log version of the discretization in Equation 3-6,
which takes on the form in Equation 3-8.
Equation 3-8. Euler discretization of SDE (log version)
$$\log S_t = \log S_{t-\Delta t} + \left(r - \tfrac{1}{2}\sigma^2\right)\Delta t + \sigma \sqrt{\Delta t} \, z_t$$
This version is completely additive, allowing for an implementation of the
Monte Carlo algorithm without any loop on the Python level. Example 3-4
shows the resulting code.
Example 3-4. Monte Carlo valuation of European call option with NumPy
(second version)
#
# Monte Carlo valuation of European call options with NumPy (log version)
# mcs_full_vector_numpy.py
#
import math
from numpy import *
from time import time
  # star import for shorter code

random.seed(20000)
t0 = time()

# Parameters
S0 = 100.; K = 105.; T = 1.0; r = 0.05; sigma = 0.2
M = 50; dt = T / M; I = 250000

# Simulating I paths with M time steps
S = S0 * exp(cumsum((r - 0.5 * sigma ** 2) * dt
            + sigma * math.sqrt(dt)
                    * random.standard_normal((M + 1, I)), axis=0))
  # sum instead of cumsum would also do
  # if only the final values are of interest
S[0] = S0

# Calculating the Monte Carlo estimator
C0 = math.exp(-r * T) * sum(maximum(S[-1] - K, 0)) / I

# Results output
tnp2 = time() - t0
print "European Option Value %7.3f" % C0
print "Duration in Seconds   %7.3f" % tnp2
Let us run this third simulation script.
In [28]: %run mcs_full_vector_numpy.py
Out[28]: European Option Value   8.166
         Duration in Seconds     1.439
The execution speed is somewhat slower compared to the first NumPy
implementation. There might also be a trade-off between compactness and
readability in that this implementation approach makes it quite difficult to
grasp what exactly is going on on the NumPy level. However, it shows how
far one can go sometimes with NumPy vectorization.
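To make the additive structure of the log scheme easier to follow, here is a variant of the fully vectorized approach, written as a sketch in Python 3 syntax. It differs from Example 3-4 by design: it draws exactly M rows of increments and prepends the initial value, whereas the script above draws M + 1 rows and overwrites the first one. The function name `mc_call_value_log`, the smaller default for `I`, and the use of `np.random.default_rng` are assumptions of this sketch.

```python
import numpy as np


def mc_call_value_log(S0, K, T, r, sigma, M=50, I=50000, rnd_seed=20000):
    """Monte Carlo call value via cumulative sums of log increments."""
    rng = np.random.default_rng(rnd_seed)
    dt = T / M
    # one row of log increments per time step, one column per path
    increments = ((r - 0.5 * sigma ** 2) * dt
                  + sigma * np.sqrt(dt) * rng.standard_normal((M, I)))
    # additive scheme: log S_t = log S_0 + sum of increments up to t
    log_paths = np.log(S0) + np.cumsum(increments, axis=0)
    # prepend the initial level so that S has M + 1 rows
    S = np.vstack([np.full(I, S0), np.exp(log_paths)])
    C0 = np.exp(-r * T) * np.maximum(S[-1] - K, 0).mean()
    return C0, S


C0, S = mc_call_value_log(100., 105., 1.0, 0.05, 0.2)
print("European Option Value %7.3f" % C0)
print(S.shape)
```

Separating the increments, the cumulative sum, and the exponentiation into named steps costs a few lines but arguably eases the readability concern raised in the text.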
Graphical Analysis
Finally, let us have a graphical look at the underlying mechanics (refer to
Chapter 5 for an explanation of the matplotlib plotting library). First, we
plot the first 10 simulated paths over all time steps. Figure 3-2 shows the
output:
In [29]: import matplotlib.pyplot as plt
         plt.plot(S[:, :10])
         plt.grid(True)
         plt.xlabel('time step')
         plt.ylabel('index level')
Figure 3-2. The first 10 simulated index level paths
Second, we want to see the frequency of the simulated index levels at the
end of the simulation period. Figure 3-3 shows the output, this time
illustrating the (approximately) log-normal distribution of the end-of-period
index level values:
In [30]: plt.hist(S[-1], bins=50)
         plt.grid(True)
         plt.xlabel('index level')
         plt.ylabel('frequency')
Figure 3-3. Histogram of all simulated end-of-period index level values
The same type of figure looks completely different for the option's end-of-
period (maturity) inner values, as Figure 3-4 illustrates:
In [31]: plt.hist(np.maximum(S[-1] - K, 0), bins=50)
         plt.grid(True)
         plt.xlabel('option inner value')
         plt.ylabel('frequency')
         plt.ylim(0, 50000)
Figure 3-4. Histogram of all simulated end-of-period option inner values
In this case, the majority of the simulated values are zero, indicating that
the European call option expires worthless in a significant number of cases.
The exact number is generated through the following calculation:
In [32]: sum(S[-1] < K)
Out[32]: 133533
This number might vary somewhat, of course, from simulation to
simulation.
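The simulated count can be put next to a closed-form value as a rough plausibility check. Under the geometric Brownian motion assumed in this section, the risk-neutral probability that the index ends below the strike is N(-d2). The sketch below (Python 3 syntax, standard library only) computes it; the function name `prob_expire_worthless` is hypothetical.

```python
from math import erf, log, sqrt


def prob_expire_worthless(S0, K, T, r, sigma):
    """Risk-neutral probability that S_T < K under geometric Brownian motion."""
    d2 = (log(S0 / K) + (r - 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    # N(-d2), with the normal CDF expressed via math.erf
    return 0.5 * (1.0 + erf(-d2 / sqrt(2.0)))


p = prob_expire_worthless(100., 105., 1.0, 0.05, 0.2)
print(round(p, 4))
# expected number of worthless paths among 250,000 simulated ones
print(round(p * 250000))
```

For the parameters used here the probability is a little above one half, which is in the same range as the simulated count shown above; an exact match is not to be expected because of sampling noise and the details of the simulation script.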
Technical Analysis
Technical analysis based on historical price information is a typical task
finance professionals and interested amateurs engage in. On Wikipedia you
find the following definition:
In finance, technical analysis is a security analysis methodology for
forecasting the direction of prices through the study of past market data,
primarily price and volume.
In what follows, we focus on the study of past market data for backtesting
purposes, and not too much on using our insights to predict future price
movements. Our object of study is the benchmark index Standard & Poor's
500 (S&P 500), which is generally considered to be a good proxy for the
whole stock market in the United States. This is due to the high number of
names included in the index and the total market capitalization represented
by it. It also has highly liquid futures and options markets.
We will read historical index level information from a web source and will
implement a simple backtesting for a trading system based on trend signals.
But first we need the data to get started. To this end, we mainly rely on the
pandas library, which simplifies a number of related technical issues. Since
it is almost always used, we should also import NumPy by default:
In [33]: import numpy as np
         import pandas as pd
         import pandas.io.data as web
SCIENTIFIC AND FINANCIAL PYTHON STACK
In addition to NumPy and SciPy, there are only a couple of important
libraries that form the fundamental scientific and financial Python
stack. Among them is pandas. Make sure to always have current
(stable) versions of these libraries installed (but be aware of potential
syntax and/or API changes).
The sublibrary pandas.io.data contains the function DataReader, which
helps with getting financial time series data from different sources and in
particular from the popular Yahoo! Finance site. Let's retrieve the data we
are looking for, starting on January 1, 2000:
In [34]: sp500 = web.DataReader('^GSPC', data_source='yahoo',
                                start='1/1/2000', end='4/14/2014')
         sp500.info()
Out[34]: <class 'pandas.core.frame.DataFrame'>
         DatetimeIndex: 3592 entries, 2000-01-03 00:00:00 to 2014-04-14 00:00:00
         Data columns (total 6 columns):
         Open         3592 non-null float64
         High         3592 non-null float64
         Low          3592 non-null float64
         Close        3592 non-null float64
         Volume       3592 non-null int64
         Adj Close    3592 non-null float64
         dtypes: float64(5), int64(1)
DataReader has connected to the data source via an Internet connection
and has given back the time series data for the S&P 500 index, from the
first trading day in 2000 until the end date. It has also generated
automatically a time index with Timestamp objects.
To get a first impression, we can plot the closing quotes over time. This
gives an output like that in Figure 3-5:
In [35]: sp500['Close'].plot(grid=True, figsize=(8, 5))
Figure 3-5. Historical levels of the S&P 500 index
The trend strategy we want to implement is based on both a two-month
(i.e., 42 trading days) and a one-year (i.e., 252 trading days) trend (i.e., the
moving average of the index level for the respective period). Again, pandas
makes it efficient to generate the respective time series and to plot the three
relevant time series in a single figure. First, the generation of the trend data:
In [36]: sp500['42d'] = np.round(pd.rolling_mean(sp500['Close'],
                                                 window=42), 2)
         sp500['252d'] = np.round(pd.rolling_mean(sp500['Close'],
                                                  window=252), 2)
In this example, the first line simultaneously adds a new column to the
pandas DataFrame object and puts in the values for the 42-day trend. The
second line does the same with respect to the 252-day trend. Consequently,
we now have two new columns. These have fewer entries due to the very
nature of the data we have generated for these columns, i.e., they start
only at those dates when 42 and 252 observation points, respectively, are
available for the first time to calculate the desired statistics:
In [37]: sp500[['Close', '42d', '252d']].tail()
Out[37]:               Close      42d     252d
         Date
         2014-04-08  1851.96  1853.88  1728.66
         2014-04-09  1872.18  1855.66  1729.79
         2014-04-10  1833.08  1856.46  1730.74
         2014-04-11  1815.69  1856.36  1731.64
         2014-04-14  1830.61  1856.63  1732.74
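The moving-average logic can be tried without downloading any data. The sketch below uses Python 3 syntax and the `Series.rolling(...).mean()` method, which is how more recent pandas versions express what `pd.rolling_mean` does in the listing above; the short price series is made up for illustration.

```python
import pandas as pd

# Made-up closing prices, only to show the mechanics
close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])

# 3-observation trend; the first window - 1 entries are NaN because
# not enough observation points are available yet
trend = close.rolling(window=3).mean().round(2)
print(trend)
```

This mirrors the behavior described in the text: the trend column only starts at the first date for which a full window of observations exists.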
Second, the plotting of the new data. The resulting plot in Figure 3-6
already provides some insights into what was going on in the past with
respect to upward and downward trends:
In [38]: sp500[['Close', '42d', '252d']].plot(grid=True, figsize=(8, 5))
Figure 3-6. The S&P 500 index with 42d and 252d trend lines
Our basic data set is mainly complete, such that we now can devise a rule to
generate trading signals. The rule says the following:
Buy signal (go long)
the 42d trend is for the first time SD points above the 252d trend.
Wait (park in cash)
the 42d trend is within a range of +/– SD points around the 252d trend.
Sell signal (go short)
the 42d trend is for the first time SD points below the 252d trend.
To this end, we add a new column to the pandas DataFrame object for the
differences between the two trends. As you can see, numerical operations
with pandas can in general be implemented in a vectorized fashion, in that
one can take the difference between two whole columns:
In [39]: sp500['42-252'] = sp500['42d'] - sp500['252d']
         sp500['42-252'].tail()
Out[39]: Date
         2014-04-08    125.22
         2014-04-09    125.87
         2014-04-10    125.72
         2014-04-11    124.72
         2014-04-14    123.89
         Name: 42-252, dtype: float64
On the last available trading date the 42d trend lies well above the 252d
trend. Although the number of entries in the two trend columns is not equal,
pandas takes care of this by putting NaN values at the respective index
positions:
In [40]: sp500['42-252'].head()
Out[40]: Date
         2000-01-03   NaN
         2000-01-04   NaN
         2000-01-05   NaN
         2000-01-06   NaN
         2000-01-07   NaN
         Name: 42-252, dtype: float64
To make it more formal, we again generate a new column for what we call
a regime. We assume a value of 50 for the signal threshold:
In [41]: SD = 50
         sp500['Regime'] = np.where(sp500['42-252'] > SD, 1, 0)
         sp500['Regime'] = np.where(sp500['42-252'] < -SD, -1,
                                    sp500['Regime'])
         sp500['Regime'].value_counts()
Out[41]:  1    1489
          0    1232
         -1     871
         dtype: int64
In words, on 1,489 trading dates, the 42d trend lies more than SD points
above the 252d trend. On 871 days, the 42d trend is more than SD points
below the 252d trend. Obviously, if the short-term trend crosses the line of
the long-term trend it tends to rest there for a (longer) while. This is what
we call regime and what is illustrated in Figure 3-7, which is generated by
the following two lines of code:
In [42]: sp500['Regime'].plot(lw=1.5)
plt.ylim([-1.1, 1.1])
Figure 3-7. Signal regimes over time
Everything is now available to test the investment strategy based on the
signals. We assume for simplicity that an investor can directly invest in the
index or can directly short the index, which in the real world must be
accomplished by using index funds, exchange-traded funds, or futures on
the index, for example. Such trades inevitably lead to transaction costs,
which we neglect here. This seems justifiable since we do not plan to trade
"too often."
Based on the respective regime, the investor either is long or short in the
market (index) or parks his wealth in cash, which does not bear any
interest. This simplified strategy allows us to work with market returns
only. The investor makes the market return when he is long (1), makes the
negative market returns when he is short (–1), and makes no returns (0)
when he parks his wealth in cash. We therefore need the returns first. In
Python, we have the following vectorized pandas operation to calculate the
log returns. Note that the shift method shifts a time series by as many
index entries as desired, in our case by one trading day, such that we get
daily log returns:
In [43]: sp500['Market'] = np.log(sp500['Close'] /
                                  sp500['Close'].shift(1))
Recalling how we constructed our regimes, it is now simple to get the
returns of the trend-based trading strategy: we just have to multiply our
Regime column, shifted by one day, by the Market column (the position
is built "yesterday" and yields "today's" returns):
In [44]: sp500['Strategy'] = sp500['Regime'].shift(1) *
sp500['Market']
The strategy pays off well; the investor is able to lock in a much higher
return over the relevant period than a plain long investment would provide.
Figure 3-8 compares the cumulative, continuous returns of the index with
the cumulative, continuous returns of our strategy:
In [45]: sp500[['Market', 'Strategy']].cumsum().apply(np.exp).plot(grid=True,
                                                              figsize=(8, 5))
Figure 3-8. The S&P 500 index vs. investor's wealth
Figure 3-8 shows that especially during market downturns (2003 and
2008/2009) the shorting of the market yields quite high returns. Although
the strategy does not capture the whole upside during bullish periods, the
strategy as a whole outperforms the market quite significantly.
However, we have to keep in mind that we completely neglect operational
issues (like trade execution) and relevant market microstructure elements
(e.g., transaction costs). For example, we are working with daily closing
values. A question would be when to execute an exit from the market (from
being long to being neutral/in cash): on the same day at the closing value or
the next day at the opening value. Such considerations for sure have an
impact on the performance, but the overall result would probably persist.
Also, transaction costs generally diminish returns, but the trading rule does
not generate too many signals.
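The whole backtesting chain discussed above (trends, regime, market returns, strategy returns) can be replayed on synthetic data. The following sketch uses Python 3 syntax and a recent pandas API; the simulated price series, the column names `short` and `long`, and the threshold value of 20 are assumptions made for this illustration and say nothing about the S&P 500 results reported in the text.

```python
import numpy as np
import pandas as pd

# Synthetic daily closing prices (random walk in logs), NOT market data
rng = np.random.default_rng(42)
dates = pd.bdate_range('2010-01-01', periods=600)
log_ret = rng.normal(0.0003, 0.01, size=len(dates))
data = pd.DataFrame({'Close': 1000 * np.exp(np.cumsum(log_ret))},
                    index=dates)

SHORT, LONG, SD = 42, 252, 20  # window lengths and signal threshold

# Trends and their difference
data['short'] = data['Close'].rolling(window=SHORT).mean()
data['long'] = data['Close'].rolling(window=LONG).mean()
diff = data['short'] - data['long']

# Regime: 1 = long, -1 = short, 0 = cash (NaN differences end up as 0)
data['Regime'] = np.where(diff > SD, 1, 0)
data['Regime'] = np.where(diff < -SD, -1, data['Regime'])

# Daily log returns and strategy returns (position taken the day before)
data['Market'] = np.log(data['Close'] / data['Close'].shift(1))
data['Strategy'] = data['Regime'].shift(1) * data['Market']

# Cumulative gross performance of index and strategy
wealth = data[['Market', 'Strategy']].cumsum().apply(np.exp)
print(data['Regime'].value_counts())
print(wealth.tail())
```

Shifting the regime by one day before multiplying avoids using today's signal to earn today's return, which is the point made in the text about positions being built "yesterday".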
FINANCIAL TIME SERIES
Whenever it comes to the analysis of financial time series, consider
using pandas. Almost any time series-related problem can be tackled
with this powerful library.
Conclusions
Without going into too much detail, this chapter illustrates the use of
Python by the means of concrete and typical financial examples:
Calculation of implied volatilities
Using real-world data, in the form of a cross section of option data for a
given day, we calculate numerically the implied volatilities of European
call options on the VSTOXX volatility index. This example introduces
some custom Python functions (e.g., for analytical option valuation)
and uses functionality from NumPy, SciPy, and pandas.
Monte Carlo simulation
Using different implementation approaches, we simulate the evolution
of an index level over time and use our simulated end-of-period values
to derive Monte Carlo estimators for European call options. Using
NumPy, the major benefits of vectorization of Python code are
illustrated: namely, compactness of code and speed of execution.
Backtesting of trend signal strategy
Using real historical time series data for the S&P 500, we backtest the
performance of a trading strategy based on signals generated by 42-day
and 252-day trends (moving averages). This example illustrates the
capabilities and convenience of pandas when it comes to time series
analytics.
In terms of working with Python, this chapter introduces interactive
financial analytics (using the IPython interactive shell), working with more
complex functions stored in modules, as well as the performance-oriented
implementation of algorithms using vectorization. One important topic is
not covered: namely, object orientation and classes in Python. For the
curious reader, Appendix B contains a class definition for a European call
option with methods based on the functions found in the code of
Example 3-1 in this chapter.
Further Reading
The major references used in this chapter are:
Black, Fischer and Myron Scholes (1973): "The Pricing of Options and
Corporate Liabilities." Journal of Political Economy, Vol. 81, No. 3,
pp. 638-659.
Hilpisch, Yves (2015): Derivatives Analytics with Python. Wiley
Finance, Chichester, England. http://www.derivatives-analytics-with-
python.com.
Hilpisch, Yves (2013): "Efficient Data and Financial Analytics with
Python." Software Developer's Journal, No. 13, pp. 56-65.
http://hilpisch.com/YH_Efficient_Analytics_Article.pdf.
Merton, Robert (1973): "Theory of Rational Option Pricing." Bell
Journal of Economics and Management Science, Vol. 4, pp. 141-183.
[12] Chapter 19 also deals with options based on the VSTOXX volatility index; it
calibrates an option pricing model to market quotes and values American,
nontraded options given the calibrated model.
[13] As we are only considering a single day's worth of futures and options quotes,
the MATURITY column of the futures_data object would have delivered the
information a bit more easily since there are no duplicates.
[14] Note that you can always look up attributes and methods of unknown objects by
using the Python built-in function dir, like with dir(group_data).
[15] Although not needed here, all approaches store complete simulation paths in-
memory. For the valuation of standard European options this is not necessary, as
the corresponding example in Chapter 1 shows. However, for the valuation of
American options or for certain risk management purposes, whole paths are needed.
[16] These Monte Carlo examples and implementation approaches also appear in the
article Hilpisch (2013).
[17] For details, refer to the book by Hilpisch (2015).
Part II. Financial Analytics and
Development
This part of the book represents its core. It introduces the most important
Python libraries, techniques, and approaches for financial analytics and
application development. The sheer number of topics covered in this part
makes it necessary to focus mainly on selected, and partly rather specific,
examples and use cases.
The chapters are organized according to certain topics such that this core
part can be used as a reference to which the reader can come to look up
examples and details related to a topic of interest. This core part of the book
consists of the following chapters:
Chapter 4 on Python data types and structures
Chapter 5 on 2D and 3D visualization with matplotlib
Chapter 6 on the handling of financial time series data
Chapter 7 on (performant) input/output operations
Chapter 8 on performance techniques and libraries
Chapter 9 on several mathematical tools needed in finance
Chapter 10 on random number generation and simulation of stochastic
processes
Chapter 11 on statistical applications with Python
Chapter 12 on the integration of Python and Excel
Chapter 13 on object-oriented programming with Python and the
development of (simple) graphical user interfaces (GUIs)
Chapter 14 on the integration of Python with web technologies as well
as the development of web-based applications and web services
Chapter 4. Data Types and
Structures
Bad programmers worry about the code. Good programmers worry about data
structures and their relationships.
- Linus Torvalds
This chapter introduces basic data types and data structures of Python.
Although the Python interpreter itself already brings a rich variety of data
structures with it, NumPy and other libraries add to these in a valuable fashion.
The chapter is organized as follows:
Basic data types
The first section introduces basic data types such as int, float, and
string.
Basic data structures
The next section introduces the fundamental data structures of Python
(e.g., list objects) and illustrates control structures, functional
programming paradigms, and anonymous functions.
NumPy data structures
The following section is devoted to the characteristics and capabilities of
the NumPy ndarray class and illustrates some of the benefits of this class
for scientific and financial applications.
Vectorization of code
As the final section illustrates, thanks to NumPy's array class vectorized
code is easily implemented, leading to more compact and also better-
performing code.
The spirit of this chapter is to provide a general introduction to Python
specifics when it comes to data types and structures. If you are equipped with
a background from another programming language, say C or Matlab, you should
be able to easily grasp the differences that Python usage might bring along.
The topics introduced here are all important and fundamental for the chapters
to come.
Basic Data Types
Python is a dynamically typed language, which means that the Python
interpreter infers the type of an object at runtime. In comparison, compiled
languages like C are generally statically typed. In these cases, the type of an
object has to be attached to the object before compile time.[18]
Integers
One of the most fundamental data types is the integer, or int:
In [1]: a = 10
        type(a)
Out[1]: int
The built-in function type provides type information for all objects with
standard and built-in types as well as for newly created classes and objects. In
the latter case, the information provided depends on the description the
programmer has stored with the class. There is a saying that "everything in
Python is an object." This means, for example, that even simple objects like
the int object we just defined have built-in methods. For example, you can
get the number of bits needed to represent the int object in-memory by
calling the method bit_length:
In [2]: a.bit_length()
Out[2]: 4
You will see that the number of bits needed increases the higher the integer
value is that we assign to the object:
In [3]: a = 100000
        a.bit_length()
Out[3]: 17
In general, there are so many different methods that it is hard to
memorize all methods of all classes and objects. Advanced Python
environments, like IPython, provide tab completion capabilities that show all
methods attached to an object. You simply type the object name followed by a
dot (e.g., a.) and then press the Tab key, e.g., a.tab. This then provides a
collection of methods you can call on the object. Alternatively, the Python
built-in function dir gives a complete list of attributes and methods of any
object.
A specialty of Python is that integers can be arbitrarily large. Consider, for
example, the googol number 10^100. Python has no problem with such large
numbers, which are technically long objects:
In [4]: googol = 10 ** 100
        googol
Out[4]:
100000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000L
In [5]: googol.bit_length()
Out[5]: 333
LARGE INTEGERS
Python integers can be arbitrarily large. The interpreter simply uses as
many bits/bytes as needed to represent the numbers.
It is important to note that mathematical operations on int objects return int
objects. This can sometimes lead to confusion and/or hard-to-detect errors in
mathematical routines. The following expression yields the expected result:
In [6]: 1 + 4
Out[6]: 5
However, the next case may return a somewhat surprising result:
In [7]: 1 / 4
Out[7]: 0
In [8]: type(1 / 4)
Out[8]: int
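The listings in this section reflect Python 2 behavior. As a point of comparison, the sketch below shows how the same expressions behave in Python 3, where the `/` operator on two integers returns a float and `//` performs floor division; the variable names are chosen here for illustration.

```python
a = 10
googol = 10 ** 100

true_div = 1 / 4      # Python 3: true division, yields a float
floor_div = 1 // 4    # floor division, yields an int in Python 2 and 3

# Public attributes and methods of an int object, via the built-in dir
int_methods = [m for m in dir(a) if not m.startswith('_')]

print(true_div, type(true_div))
print(floor_div, type(floor_div))
print(a.bit_length(), googol.bit_length())
print(int_methods)
```

Code that relies on integer results, for example when computing array indices, is therefore usually written with `//` so that it behaves the same way under both major versions.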
Floats
For the last expression to return the generally desired result of 0.25, we must
operate on float objects, which brings us naturally to the next basic data
type. Adding a dot to an integer value, like in 1. or 1.0, causes Python to
interpret the object as a float. Expressions involving a float also return a
float object in general:[19]
In [9]: 1. / 4
Out[9]: 0.25
In [10]: type (1. / 4)
Out[10]: float
A float is a bit more involved in that the computerized representation of
rational or real numbers is in general not exact and depends on the specific
technical approach taken. To illustrate what this implies, let us define another
float object:
In [11]: b = 0.35
         type(b)
Out[11]: float
float objects like this one are always represented internally up to a certain
degree of accuracy only. This becomes evident when adding 0.1 to b:
In [12]: b + 0.1
Out[12]: 0.44999999999999996
The reason for this is that floats are internally represented in binary format;
that is, a decimal number 0 < n < 1 is represented by a series of the form
$$n = \frac{x}{2} + \frac{y}{4} + \frac{z}{8} + \ldots$$
For certain floating-point numbers the binary representation
might involve a large number of elements or might even be an infinite series.
However, given a fixed number of bits used to represent such a number (i.e.,
a fixed number of terms in the representation series), inaccuracies are the
consequence. Other numbers can be represented perfectly and are therefore
stored exactly even with a finite number of bits available. Consider the
following example:
In [13]: c = 0.5
         c.as_integer_ratio()
Out[13]: (1, 2)
One half, i.e., 0.5, is stored exactly because it has an exact (finite) binary
representation as 0.5 = 1/2. However, for b = 0.35 we get something different
than the expected rational number 0.35 = 7/20:
In [14]: b.as_integer_ratio()
Out[14]: (3152519739159347, 9007199254740992)
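The representation issue shown above can be explored further with standard library tools. The sketch below (Python 3 syntax) uses `fractions.Fraction` to expose the exact value stored for a float, and `math.fsum` and `math.isclose` as two common ways of dealing with accumulated representation errors; it is an illustration, not a recommendation for any specific financial calculation.

```python
import math
from fractions import Fraction

b = 0.35
c = 0.5

# Exact rational value of the stored binary floats
exact_b = Fraction(b)
exact_c = Fraction(c)
print(exact_b == Fraction(7, 20))   # 0.35 is not stored exactly
print(exact_c == Fraction(1, 2))    # 0.5 is stored exactly

# Representation errors can accumulate when summing many numbers
values = [0.1] * 10
naive_sum = sum(values)
accurate_sum = math.fsum(values)    # tracks intermediate rounding errors
print(naive_sum == 1.0, accurate_sum == 1.0)

# Comparing floats with a tolerance instead of exact equality
print(math.isclose(naive_sum, 1.0))
```

Comparisons with a tolerance are often sufficient in practice; where exact decimal arithmetic matters, the decimal module discussed next is the more appropriate tool.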
The precision is dependent on the number of bits used to represent the
number. In general, all platforms that Python runs on use the IEEE 754
double-precision standard (i.e., 64 bits), for internal representation.[20] This
translates into a 15-digit relative accuracy.
Since this topic is of high importance for several application areas in finance,
it is sometimes necessary to ensure the exact, or at least best possible,
representation of numbers. For example, the issue can be of importance when
summing over a large set of numbers. In such a situation, a certain kind and/or
magnitude of representation error might, in aggregate, lead to significant
deviations from a benchmark value.
The module decimal provides an arbitrary-precision object for floating-point
numbers and several options to address precision issues when working with
such numbers:
In [15]: import decimal
         from decimal import Decimal
In [16]: decimal.getcontext()
Out[16]: Context(prec=28, rounding=ROUND_HALF_EVEN, Emin=-999999999,
         Emax=999999999, capitals=1, flags=[], traps=[Overflow,
         InvalidOperation, DivisionByZero])
In [17]: d = Decimal(1) / Decimal(11)
         d
Out[17]: Decimal('0.09090909090909090909090909091')
You can change the precision of the representation by changing the respective
attribute value of the Context object:
In [18]: decimal.getcontext().prec = 4  # lower precision than default
In [19]: e = Decimal(1) / Decimal(11)
         e
Out[19]: Decimal('0.09091')
In [20]: decimal.getcontext().prec = 50  # higher precision than default
In [21]: f = Decimal(1) / Decimal(11)
         f
Out[21]:
Decimal('0.090909090909090909090909090909090909090909090909091')
If needed, the precision can in this way be adjusted to the exact problem at
hand and one can operate with floating-point objects that exhibit different
degrees of accuracy:
In [22]: g = d + e + f
         g
Out[22]: Decimal('0.27272818181818181818181818181909090909090909090909')
ARBITRARY-PRECISION FLOATS
The module decimal provides an arbitrary-precision floating-point
number object. In finance, it might sometimes be necessary to ensure
high precision and to go beyond the 64-bit double-precision standard.
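Changing the global context, as in the listings above, affects all subsequent Decimal operations. A more local alternative is `decimal.localcontext`, sketched below in Python 3 syntax; the amounts in `cash_flows` are made-up values for illustration.

```python
from decimal import Decimal, localcontext

# Precision changes stay confined to the respective with block
with localcontext() as ctx:
    ctx.prec = 4
    low = Decimal(1) / Decimal(11)

with localcontext() as ctx:
    ctx.prec = 50
    high = Decimal(1) / Decimal(11)

print(low)
print(high)

# Decimal objects built from strings represent decimal fractions exactly
cash_flows = [Decimal('0.10')] * 10
total = sum(cash_flows)
print(total)
```

Constructing Decimal objects from strings rather than from floats matters here: `Decimal(0.1)` would inherit the binary representation error of the float, while `Decimal('0.10')` does not.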
Strings
Now that we can represent natural and floating-point numbers, we turn to text.
The basic data type to represent text in Python is the string. The string
object has a number of really helpful built-in methods. In fact, Python is
generally considered to be a good choice when it comes to working with text
files of any kind and any size. A string object is generally defined by single
or double quotation marks or by converting another object using the str
function (i.e., using the object's standard or user-defined string
representation):
In [23]: t = 'this is a string object'
Withregardtothebuilt­inmethods,youcan,forexample,capitalizethef
wordinthisobject: irst
In [24]: t.capitalize()
Out[24]: 'This is a string object'
Oryoucans p l i t i t intoi t s single­wordcomponentstogeta
thewords(moreonlistobjectslater): list objectofa l
In [25]: t.split()
Out[25]: ['this', 'is', 'a', 'string', 'object']
Youfirst lcanet eralsosearchforaword andget theposition(i.e., indexvalue)ofthe
oftheword back inasuccessfulcase:
In [26]: t.find('string')
Out[26]: 10
If thewordis notinthe string object,themethodreturns­1:
In [27]: t.find('Python')
Out[27]: -1
Replacingcharactersinastringi
withthe replace method: s atypicaltaskthatis easilyaccomplished
In [28]: t.replace(' ', '|')
Out[28]: 'this|is|a|string|object'
Thestrippingofstrings—i.e.,deletionofcertainleading/laggingcharacters—
is alsooftennecessary:
In [29]: 'http://www.python.org'.strip('htp:/')
Out[29]: 'www.python.org'
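One detail of the strip method is easy to overlook: its argument is treated as a set of characters to remove from both ends, not as a prefix. The following sketch (Python 3 assumed; remove_prefix is a hypothetical helper written for this illustration, not a method discussed in the text) contrasts the two behaviors:

```python
# Sketch under the assumption of Python 3.
url = 'http://www.python.org'

# strip() removes any leading/lagging characters contained in the given set
stripped = url.strip('htp:/')

# the same set also eats into other strings, from both ends
other = 'http://path.org/top'.strip('htp:/')

# hypothetical helper: remove a fixed prefix only if it is present
def remove_prefix(text, prefix):
    return text[len(prefix):] if text.startswith(prefix) else text

print(stripped)                       # www.python.org
print(other)                          # ath.org/to
print(remove_prefix(url, 'http://'))  # www.python.org
```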
Table 4-1 lists a number of helpful methods of the string object.
Table 4-1. Selected string methods
Method      Arguments               Returns/result
capitalize  ()                      Copy of the string with first letter capitalized
count       (sub[, start[, end]])   Count of the number of occurrences of substring
decode      ([encoding[, errors]])  Decoded version of the string, using encoding (e.g., UTF-8)
encode      ([encoding[, errors]])  Encoded version of the string
find        (sub[, start[, end]])   (Lowest) index where substring is found
join        (seq)                   Concatenation of strings in sequence seq
replace     (old, new[, count])     Replaces old by new the first count times
split       ([sep[, maxsplit]])     List of words in string with sep as separator
splitlines  ([keepends])            Separated lines with line ends/breaks if keepends is True
strip       (chars)                 Copy of string with leading/lagging characters in chars removed
upper       ()                      Copy with all letters capitalized
A powerful tool when working with string objects is regular expressions. Python provides such functionality in the module re:
In [30]: import re
Suppose you are faced with a large text file, such as a comma-separated value (CSV) file, which contains certain time series and respective date-time information. More often than not, the date-time information is delivered in a format that Python cannot interpret directly. However, the date-time information can generally be described by a regular expression. Consider the following string object, containing three date-time elements, three integers, and three strings. Note that triple quotation marks allow the definition of strings over multiple rows:
In [31]: series = """
         '01/18/2014 13:00:00', 100, '1st';
         '01/18/2014 13:30:00', 110, '2nd';
         '01/18/2014 14:00:00', 120, '3rd'
         """
The following regular expression describes the format of the date-time information provided in the string object:[21]
In [32]: dt = re.compile("'[0-9/:\s]+'")  # datetime
Equipped with this regular expression, we can go on and find all the date-time elements. In general, applying regular expressions to string objects also leads to performance improvements for typical parsing tasks:
In [33]: result = dt.findall(series)
         result
Out[33]: ["'01/18/2014 13:00:00'", "'01/18/2014 13:30:00'",
          "'01/18/2014 14:00:00'"]
REGULAR EXPRESSIONS
When parsing string objects, consider using regular expressions, which can bring both convenience and performance to such operations.
The resulting string objects can then be parsed to generate Python datetime objects (cf. Appendix C for an overview of handling date and time data with Python). To parse the string objects containing the date-time information, we need to provide information of how to parse, again as a string object:
In [34]: from datetime import datetime
         pydt = datetime.strptime(result[0].replace("'", ""),
                                  '%m/%d/%Y %H:%M:%S')
         pydt
Out[34]: datetime.datetime(2014, 1, 18, 13, 0)
In [35]: print pydt
Out[35]: 2014-01-18 13:00:00
In [36]: print type(pydt)
Out[36]: <type 'datetime.datetime'>
Later chapters provide more information on date-time data, the handling of such data, and datetime objects and their methods. This is just meant to be a teaser for this important topic in finance.
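The two parsing steps can also be combined to convert all matches at once. The following is a minimal sketch, assuming Python 3 (hence the raw string for the pattern); the names parsed and stamp are illustrative and not part of the original session:

```python
# Sketch under the assumption of Python 3.
import re
from datetime import datetime

series = """
'01/18/2014 13:00:00', 100, '1st';
'01/18/2014 13:30:00', 110, '2nd';
'01/18/2014 14:00:00', 120, '3rd'
"""

# raw string so that \s reaches the re module unchanged
dt = re.compile(r"'[0-9/:\s]+'")
result = dt.findall(series)

# remove the quotation marks and parse every match into a datetime object
parsed = [datetime.strptime(item.strip("'"), '%m/%d/%Y %H:%M:%S')
          for item in result]

for stamp in parsed:
    print(stamp)
```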
Basic Data Structures
As a general rule, data structures are objects that contain a possibly large number of other objects. Among those that Python provides as built-in structures are:
tuple: A collection of arbitrary objects; only a few methods available
list: A collection of arbitrary objects; many methods available
dict: A key-value store object
set: An unordered collection object for other unique objects
Tuples
A tuple is an advanced data structure, yet it's still quite simple and limited in its applications. It is defined by providing objects in parentheses:
In [37]: t = (1, 2.5, 'data')
         type(t)
Out[37]: tuple
You can even drop the parentheses and provide multiple objects separated by commas:
In [38]: t = 1, 2.5, 'data'
         type(t)
Out[38]: tuple
Like almost all data structures in Python the tuple has a built-in index, with the help of which you can retrieve single or multiple elements of the tuple. It is important to remember that Python uses zero-based numbering, such that the third element of a tuple is at index position 2:
In [39]: t[2]
Out[39]: 'data'
In [40]: type(t[2])
Out[40]: str
ZERO-BASED NUMBERING
In contrast to some other programming languages like Matlab, Python uses zero-based numbering schemes. For example, the first element of a tuple object has index value 0.
There are only two special methods that this object type provides: count and index. The first counts the number of occurrences of a certain object and the second gives the index value of the first appearance of it:
In [41]: t.count('data')
Out[41]: 1
In [42]: t.index(1)
Out[42]: 0
tuple objects are not very flexible since, once defined, they cannot be changed easily.
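What "cannot be changed easily" means in practice can be sketched as follows (Python 3 assumed; the names changed and t_new are illustrative). Item assignment raises an error, and a "modified" tuple is in fact a new object built from the old one:

```python
# Sketch under the assumption of Python 3.
t = (1, 2.5, 'data')

# item assignment is not defined for tuple objects
try:
    t[0] = 'new'
    changed = True
except TypeError:
    changed = False

# building a new tuple object from parts of the old one
t_new = ('new',) + t[1:]

print(changed)
print(t, t_new)
```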
Lists
Objects of type list are much more flexible and powerful in comparison to tuple objects. From a finance point of view, you can achieve a lot working only with list objects, such as storing stock price quotes and appending new data. A list object is defined through brackets and the basic capabilities and behavior are similar to those of tuple objects:
In [43]: l = [1, 2.5, 'data']
         l[2]
Out[43]: 'data'
list objects can also be defined or converted by using the function list. The following code generates a new list object by converting the tuple object from the previous example:
In [44]: l = list(t)
         l
Out[44]: [1, 2.5, 'data']
In [45]: type(l)
Out[45]: list
In addition to the characteristics of tuple objects, list objects are also expandable and reducible via different methods. In other words, whereas string and tuple objects are immutable sequence objects (with indexes) that cannot be changed once created, list objects are mutable and can be changed via different operations. You can append list objects to an existing list object, and more:
In [46]: l.append([4, 3])  # append list at the end
         l
Out[46]: [1, 2.5, 'data', [4, 3]]
In [47]: l.extend([1.0, 1.5, 2.0])  # append elements of list
         l
Out[47]: [1, 2.5, 'data', [4, 3], 1.0, 1.5, 2.0]
In [48]: l.insert(1, 'insert')  # insert object before index position
         l
Out[48]: [1, 'insert', 2.5, 'data', [4, 3], 1.0, 1.5, 2.0]
In [49]: l.remove('data')  # remove first occurrence of object
         l
Out[49]: [1, 'insert', 2.5, [4, 3], 1.0, 1.5, 2.0]
In [50]: p = l.pop(3)  # removes and returns object at index
         print l, p
Out[50]: [1, 'insert', 2.5, 1.0, 1.5, 2.0] [4, 3]
Slicing is also easily accomplished. Here, slicing refers to an operation that breaks down a data set into smaller parts (of interest):
In [51]: l[2:5]  # 3rd to 5th elements
Out[51]: [2.5, 1.0, 1.5]
Table 4-2 provides a summary of selected operations and methods of the list object.
Table 4-2. Selected operations and methods of list objects
Method         Arguments                    Returns/result
l[i] = x       [i]                          Replaces ith element by x
l[i:j:k] = s   [i:j:k]                      Replaces every kth element from i to j - 1 by s
append         (x)                          Appends x to object
count          (x)                          Number of occurrences of object x
del l[i:j:k]   [i:j:k]                      Deletes elements with index values i to j - 1
extend         (s)                          Appends all elements of s to object
index          (x[, i[, j]])                First index of x between elements i and j - 1
insert         (i, x)                       Inserts x at/before index i
remove         (x)                          Removes first occurrence of element x
pop            (i)                          Removes element with index i and returns it
sort           ([cmp[, key[, reverse]]])    Sorts all items in place
reverse        ()                           Reverses all items in place
Excursion: Control Structures
Although a topic in itself, control structures like for loops are maybe best introduced in Python based on list objects. This is due to the fact that looping in general takes place over list objects, which is quite different to what is often the standard in other languages. Take the following example. The for loop loops over the elements of the list object l with index values 2 to 4 and prints the square of the respective elements. Note the importance of the indentation (whitespace) in the second line:
In [52]: for element in l[2:5]:
             print element ** 2
Out[52]: 6.25
         1.0
         2.25
This provides a really high degree of flexibility in comparison to the typical counter-based looping. Counter-based looping is also an option with Python, but is accomplished based on the (standard) list object range:
In [53]: r = range(0, 8, 1)  # start, end, step width
         r
Out[53]: [0, 1, 2, 3, 4, 5, 6, 7]
In [54]: type(r)
Out[54]: list
For comparison, the same loop is implemented using range as follows:
In [55]: for i in range(2, 5):
             print l[i] ** 2
Out[55]: 6.25
         1.0
         2.25
LOOPING OVER LISTS
In Python you can loop over arbitrary list objects, no matter what the content of the object is. This often avoids the introduction of a counter.
Python also provides the typical (conditional) control elements if, elif, and else. Their use is comparable to that in other languages:
In [56]: for i in range(1, 10):
             if i % 2 == 0:  # % is for modulo
                 print "%d is even" % i
             elif i % 3 == 0:
                 print "%d is multiple of 3" % i
             else:
                 print "%d is odd" % i
Out[56]: 1 is odd
         2 is even
         3 is multiple of 3
         4 is even
         5 is odd
         6 is even
         7 is odd
         8 is even
         9 is multiple of 3
Similarly, while provides another means to control the flow:
In [57]: total = 0
         while total < 100:
             total += 1
         print total
Out[57]: 100
A specialty of Python is so-called list comprehensions. Instead of looping over existing list objects, this approach generates list objects via loops in a rather compact fashion:
In [58]: m = [i ** 2 for i in range(5)]
         m
Out[58]: [0, 1, 4, 9, 16]
In a certain sense, this already provides a first means to generate "something like" vectorized code in that loops are rather more implicit than explicit (vectorization of code is discussed in more detail later in this chapter).
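The sessions above use Python 2, where range returns a list object and print is a statement. As a hedged sketch of how the same loops look under Python 3 (an assumption about the reader's interpreter, not something the text covers), note that range is a lazy object there and print is a function:

```python
# Sketch under the assumption of Python 3; the list l mirrors the one
# left over from the list examples in the text.
l = [1, 'insert', 2.5, 1.0, 1.5, 2.0]

# looping over the elements directly
squares = []
for element in l[2:5]:
    squares.append(element ** 2)

# counter-based variant; range is lazy, list() materializes it
r = range(0, 8, 1)
squares_by_index = [l[i] ** 2 for i in range(2, 5)]

print(squares)           # [6.25, 1.0, 2.25]
print(list(r))           # [0, 1, 2, 3, 4, 5, 6, 7]
print(type(r).__name__)  # range
```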
Excursion: Functional Programming
Python provides a number of tools for functional programming support as well (i.e., the application of a function to a whole set of inputs, in our case list objects). Among these tools are filter, map, and reduce. However, we need a function definition first. To start with something really simple, consider a function f that returns the square of the input x:
In [59]: def f(x):
             return x ** 2
         f(2)
Out[59]: 4
Of course, functions can be arbitrarily complex, with multiple input/parameter objects and even multiple outputs (return objects). However, consider the following function:
In [60]: def even(x):
             return x % 2 == 0
         even(3)
Out[60]: False
The return object is a Boolean. Such a function can be applied to a whole list object by using map:
In [61]: map(even, range(10))
Out[61]: [True, False, True, False, True, False, True, False, True, False]
To this end, we can also provide a function definition directly as an argument to map, by using lambda or anonymous functions:
In [62]: map(lambda x: x ** 2, range(10))
Out[62]: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
Functions can also be used to filter a list object. In the following example, the filter returns elements of a list object that match the Boolean condition as defined by the even function:
In [63]: filter(even, range(15))
Out[63]: [0, 2, 4, 6, 8, 10, 12, 14]
Finally, reduce helps when we want to apply a function to all elements of a list object that returns a single value only. An example is the cumulative sum of all elements in a list object (assuming that summation is defined for the objects contained in the list):
In [64]: reduce(lambda x, y: x + y, range(10))
Out[64]: 45
An alternative, nonfunctional implementation could look like the following:
In [65]: def cumsum(l):
             total = 0
             for elem in l:
                 total += elem
             return total
         cumsum(range(10))
Out[65]: 45
LIST COMPREHENSIONS, FUNCTIONAL PROGRAMMING, ANONYMOUS FUNCTIONS
It can be considered good practice to avoid loops on the Python level as far as possible. list comprehensions and functional programming tools like map, filter, and reduce provide means to write code without loops that is both compact and in general more readable. lambda or anonymous functions are also powerful tools in this context.
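For readers on Python 3 (an assumption; the sessions above are Python 2), the same tools behave slightly differently: map and filter return lazy iterators, and reduce lives in the functools module. The following sketch repeats the examples in that setting and adds equivalent list comprehensions; the names ending in _lc are illustrative:

```python
# Sketch under the assumption of Python 3.
from functools import reduce  # reduce is not a built-in in Python 3

def even(x):
    return x % 2 == 0

# list() materializes the lazy iterators returned by map and filter
flags = list(map(even, range(10)))
squares = list(map(lambda x: x ** 2, range(10)))
evens = list(filter(even, range(15)))
total = reduce(lambda x, y: x + y, range(10))

# equivalent list comprehensions and the built-in sum function
squares_lc = [x ** 2 for x in range(10)]
evens_lc = [x for x in range(15) if even(x)]
total_sum = sum(range(10))

print(squares, evens, total)
```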
Dicts
dict objects are dictionaries, i.e., mutable mappings that allow data retrieval by keys that can, for example, be string objects. They are so-called key-value stores. While list objects are ordered and sortable, dict objects are unordered and unsortable. An example best illustrates further differences to list objects. Curly brackets are what define dict objects:
In [66]: d = {
              'Name' : 'Angela Merkel',
              'Country' : 'Germany',
              'Profession' : 'Chancelor',
              'Age' : 60
              }
         type(d)
Out[66]: dict
In [67]: print d['Name'], d['Age']
Out[67]: Angela Merkel 60
Again, this class of objects has a number of built-in methods:
In [68]: d.keys()
Out[68]: ['Country', 'Age', 'Profession', 'Name']
In [69]: d.values()
Out[69]: ['Germany', 60, 'Chancelor', 'Angela Merkel']
In [70]: d.items()
Out[70]: [('Country', 'Germany'),
          ('Age', 60),
          ('Profession', 'Chancelor'),
          ('Name', 'Angela Merkel')]
In [71]: birthday = True
         if birthday is True:
             d['Age'] += 1
         print d['Age']
Out[71]: 61
There are several methods to get iterator objects from the dict object. The objects behave like list objects when iterated over:
In [72]: for item in d.iteritems():
             print item
Out[72]: ('Country', 'Germany')
         ('Age', 61)
         ('Profession', 'Chancelor')
         ('Name', 'Angela Merkel')
In [73]: for value in d.itervalues():
             print type(value)
Out[73]: <type 'str'>
         <type 'int'>
         <type 'str'>
         <type 'str'>
Table 4-3 provides a summary of selected operations and methods of the dict object.
Table 4-3. Selected operations and methods of dict objects
Method      Arguments  Returns/result
d[k]        [k]        Item of d with key k
d[k] = x    [k]        Sets item key k to x
del d[k]    [k]        Deletes item with key k
clear       ()         Removes all items
copy        ()         Makes a copy
has_key     (k)        True if k is a key
items       ()         Copy of all key-value pairs
iteritems   ()         Iterator over all items
iterkeys    ()         Iterator over all keys
itervalues  ()         Iterator over all values
keys        ()         Copy of all keys
pop         (k)        Returns and removes item with key k
update      ([e])      Updates items with items from e
values      ()         Copy of all values
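Several of the methods in Table 4-3 are specific to Python 2. As a sketch for readers on Python 3 (an assumption, not covered by the text): has_key is replaced by the in operator, the iter* methods are gone, and items, keys, and values return view objects. The names has_age, pairs, types, and age below are illustrative:

```python
# Sketch under the assumption of Python 3.
d = {'Name': 'Angela Merkel', 'Country': 'Germany',
     'Profession': 'Chancelor', 'Age': 60}

# membership test instead of has_key
has_age = 'Age' in d

# items() returns a view that can be iterated over or sorted
pairs = sorted(d.items(), key=lambda item: item[0])
types = {key: type(value).__name__ for key, value in d.items()}

# pop removes the item with the given key and returns its value
age = d.pop('Age')

print(has_age, pairs[0], types['Age'], age)
```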
Sets
The last data structure we will consider is the set object. Although set theory is a cornerstone of mathematics and also finance theory, there are not too many practical applications for set objects. The objects are unordered collections of other objects, containing every element only once:
In [74]: s = set(['u', 'd', 'ud', 'du', 'd', 'du'])
         s
Out[74]: {'d', 'du', 'u', 'ud'}
In [75]: t = set(['d', 'dd', 'uu', 'u'])
With set objects, you can implement operations as you are used to in mathematical set theory. For example, you can generate unions, intersections, and differences:
In [76]: s.union(t)  # all of s and t
Out[76]: {'d', 'dd', 'du', 'u', 'ud', 'uu'}
In [77]: s.intersection(t)  # both in s and t
Out[77]: {'d', 'u'}
In [78]: s.difference(t)  # in s but not t
Out[78]: {'du', 'ud'}
In [79]: t.difference(s)  # in t but not s
Out[79]: {'dd', 'uu'}
In [80]: s.symmetric_difference(t)  # in either one but not both
Out[80]: {'dd', 'du', 'ud', 'uu'}
One application of set objects is to get rid of duplicates in a list object. For example:
In [81]: from random import randint
         l = [randint(0, 10) for i in range(1000)]
             # 1,000 random integers between 0 and 10
         len(l)  # number of elements in l
Out[81]: 1000
In [82]: l[:20]
Out[82]: [8, 3, 4, 9, 1, 7, 5, 5, 6, 7, 4, 4, 7, 1, 8, 5, 0, 7, 1, 9]
In [83]: s = set(l)
         s
Out[83]: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
NumPy Data Structures
The previous section shows that Python provides some quite useful and flexible general data structures. In particular, list objects can be considered a real workhorse with many convenient characteristics and application areas. However, scientific and financial applications generally have a need for high-performing operations on special data structures. One of the most important data structures in this regard is the array. Arrays generally structure other (fundamental) objects in rows and columns.
Assume for the moment that we work with numbers only, although the concept generalizes to other types of data as well. In the simplest case, a one-dimensional array then represents, mathematically speaking, a vector of, in general, real numbers, internally represented by float objects. It then consists of a single row or column of elements only. In a more common case, an array represents an i × j matrix of elements. This concept generalizes to i × j × k cubes of elements in three dimensions as well as to general n-dimensional arrays of shape i × j × k × l × ... .
Mathematical disciplines like linear algebra and vector space theory illustrate that such mathematical structures are of high importance in a number of disciplines and fields. It can therefore prove fruitful to have available a specialized class of data structures explicitly designed to handle arrays conveniently and efficiently. This is where the Python library NumPy comes into play, with its ndarray class.
Arrays with Python Lists
Before we turn to NumPy, let us first construct arrays with the built-in data structures presented in the previous section. list objects are particularly suited to accomplishing this task. A simple list can already be considered a one-dimensional array:
In [84]: v = [0.5, 0.75, 1.0, 1.5, 2.0]  # vector of numbers
Since list objects can contain arbitrary other objects, they can also contain other list objects. In that way, two- and higher-dimensional arrays are easily constructed by nested list objects:
In [85]: m = [v, v, v]  # matrix of numbers
         m
Out[85]: [[0.5, 0.75, 1.0, 1.5, 2.0],
          [0.5, 0.75, 1.0, 1.5, 2.0],
          [0.5, 0.75, 1.0, 1.5, 2.0]]
We can also easily select rows via simple indexing or single elements via double indexing (whole columns, however, are not so easy to select):
In [86]: m[1]
Out[86]: [0.5, 0.75, 1.0, 1.5, 2.0]
In [87]: m[1][0]
Out[87]: 0.5
Nesting can be pushed further for even more general structures:
In [88]: v1 = [0.5, 1.5]
         v2 = [1, 2]
         m = [v1, v2]
         c = [m, m]  # cube of numbers
         c
Out[88]: [[[0.5, 1.5], [1, 2]], [[0.5, 1.5], [1, 2]]]
In [89]: c[1][1][0]
Out[89]: 1
Note that combining objects in the way just presented generally works with reference pointers to the original objects. What does that mean in practice? Let us have a look at the following operations:
In [90]: v = [0.5, 0.75, 1.0, 1.5, 2.0]
         m = [v, v, v]
         m
Out[90]: [[0.5, 0.75, 1.0, 1.5, 2.0],
          [0.5, 0.75, 1.0, 1.5, 2.0],
          [0.5, 0.75, 1.0, 1.5, 2.0]]
Now change the value of the first element of the v object and see what happens to the m object:
In [91]: v[0] = 'Python'
         m
Out[91]: [['Python', 0.75, 1.0, 1.5, 2.0],
          ['Python', 0.75, 1.0, 1.5, 2.0],
          ['Python', 0.75, 1.0, 1.5, 2.0]]
This can be avoided by using the deepcopy function of the copy module:
In [92]: from copy import deepcopy
         v = [0.5, 0.75, 1.0, 1.5, 2.0]
         m = 3 * [deepcopy(v), ]
         m
Out[92]: [[0.5, 0.75, 1.0, 1.5, 2.0],
          [0.5, 0.75, 1.0, 1.5, 2.0],
          [0.5, 0.75, 1.0, 1.5, 2.0]]
In [93]: v[0] = 'Python'
         m
Out[93]: [[0.5, 0.75, 1.0, 1.5, 2.0],
          [0.5, 0.75, 1.0, 1.5, 2.0],
          [0.5, 0.75, 1.0, 1.5, 2.0]]
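One caveat about the construction 3 * [deepcopy(v), ] is worth illustrating: it decouples the matrix from v, but the three rows are still references to one and the same copied list. The following sketch (Python 3 assumed; m_alias and m_indep are illustrative names) contrasts this with one possible way to get independent rows:

```python
# Sketch under the assumption of Python 3.
from copy import deepcopy

v = [0.5, 0.75, 1.0, 1.5, 2.0]

# three references to ONE copied row: independent of v, not of each other
m_alias = 3 * [deepcopy(v), ]

# three independent copies, one per row
m_indep = [deepcopy(v) for _ in range(3)]

v[0] = 'Python'        # affects neither matrix
m_alias[0][1] = -1.0   # shows up in every row of m_alias
m_indep[0][1] = -1.0   # changes the first row only

print(m_alias)
print(m_indep)
```

Whether shared rows are a problem depends on whether the rows are modified later; for read-only use, both constructions behave the same.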
Regular NumPy Arrays
Obviously, composing array structures with list objects works, somewhat. But it is not really convenient, and the list class has not been built with this specific goal in mind. It has rather been built with a much broader and more general scope. From this point of view, some kind of specialized class could therefore be really beneficial to handle array-type structures.
Such a specialized class is numpy.ndarray, which has been built with the specific goal of handling n-dimensional arrays both conveniently and efficiently (i.e., in a highly performing manner). The basic handling of instances of this class is again best illustrated by examples:
In [94]: import numpy as np
In [95]: a = np.array([0, 0.5, 1.0, 1.5, 2.0])
         type(a)
Out[95]: numpy.ndarray
In [96]: a[:2]  # indexing as with list objects in 1 dimension
Out[96]: array([ 0. ,  0.5])
A major feature of the numpy.ndarray class is the multitude of built-in methods. For instance:
In [97]: a.sum()  # sum of all elements
Out[97]: 5.0
In [98]: a.std()  # standard deviation
Out[98]: 0.70710678118654757
In [99]: a.cumsum()  # running cumulative sum
Out[99]: array([ 0. ,  0.5,  1.5,  3. ,  5. ])
Another major feature is the (vectorized) mathematical operations defined on ndarray objects:
In [100]: a * 2
Out[100]: array([ 0.,  1.,  2.,  3.,  4.])
In [101]: a ** 2
Out[101]: array([ 0.  ,  0.25,  1.  ,  2.25,  4.  ])
In [102]: np.sqrt(a)
Out[102]: array([ 0.        ,  0.70710678,  1.        ,  1.22474487,
                  1.41421356])
The transition to more than one dimension is seamless, and all features presented so far carry over to the more general cases. In particular, the indexing system is made consistent across all dimensions:
In [103]: b = np.array([a, a * 2])
          b
Out[103]: array([[ 0. ,  0.5,  1. ,  1.5,  2. ],
                 [ 0. ,  1. ,  2. ,  3. ,  4. ]])
In [104]: b[0]  # first row
Out[104]: array([ 0. ,  0.5,  1. ,  1.5,  2. ])
In [105]: b[0, 2]  # third element of first row
Out[105]: 1.0
In [106]: b.sum()
Out[106]: 15.0
In contrast to our list object-based approach to constructing arrays, the numpy.ndarray class knows axes explicitly. Selecting either rows or columns from a matrix is essentially the same:
In [107]: b.sum(axis=0)
            # sum along axis 0, i.e. column-wise sum
Out[107]: array([ 0. ,  1.5,  3. ,  4.5,  6. ])
In [108]: b.sum(axis=1)
            # sum along axis 1, i.e. row-wise sum
Out[108]: array([  5.,  10.])
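The remark that rows and columns are selected in essentially the same way can be sketched as follows, in contrast to the nested list approach where a column requires an explicit loop. The sketch assumes Python 3 with NumPy installed; col_list, col_array, and row_array are illustrative names:

```python
# Sketch under the assumption of Python 3 with NumPy available.
import numpy as np

a = np.array([0, 0.5, 1.0, 1.5, 2.0])
b = np.array([a, a * 2])

m = b.tolist()                     # nested list version of the same data
col_list = [row[2] for row in m]   # column selection needs an explicit loop

col_array = b[:, 2]                # third column via the axis-aware index
row_array = b[1, :]                # second row, same syntax

print(col_list, col_array, row_array)
print(b.shape, b.sum(axis=0).shape, b.sum(axis=1).shape)
```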
There are a number of ways to initialize (instantiate) a numpy.ndarray object. One is as presented before, via np.array. However, this assumes that all elements of the array are already available. In contrast, one would maybe like to have the numpy.ndarray objects instantiated first to populate them later with results generated during the execution of code. To this end, we can use the following functions:
In [109]: c = np.zeros((2, 3, 4), dtype='i', order='C')  # also: np.ones()
          c
Out[109]: array([[[0, 0, 0, 0],
                  [0, 0, 0, 0],
                  [0, 0, 0, 0]],
                 [[0, 0, 0, 0],
                  [0, 0, 0, 0],
                  [0, 0, 0, 0]]], dtype=int32)
In [110]: d = np.ones_like(c, dtype='f16', order='C')  # also: np.zeros_like()
          d
Out[110]: array([[[ 1.0,  1.0,  1.0,  1.0],
                  [ 1.0,  1.0,  1.0,  1.0],
                  [ 1.0,  1.0,  1.0,  1.0]],
                 [[ 1.0,  1.0,  1.0,  1.0],
                  [ 1.0,  1.0,  1.0,  1.0],
                  [ 1.0,  1.0,  1.0,  1.0]]], dtype=float128)
With all these functions we provide the following information:
shape: Either an int, a sequence of ints, or a reference to another numpy.ndarray
dtype (optional): A numpy.dtype; these are NumPy-specific data types for numpy.ndarray objects
order (optional): The order in which to store elements in memory: C for C-like (i.e., row-wise) or F for Fortran-like (i.e., column-wise)
Here, it becomes obvious how NumPy specializes the construction of arrays with the numpy.ndarray class, in comparison to the list-based approach:
The shape/length/size of the array is homogenous across any given dimension.
It only allows for a single data type (numpy.dtype) for the whole array.
The role of the order parameter is discussed later in the chapter. Table 4-4 provides an overview of numpy.dtype objects (i.e., the basic data types NumPy allows).
Table 4-4. NumPy dtype objects
dtype  Description             Example
t      Bit field               t4 (4 bits)
b      Boolean                 b (true or false)
i      Integer                 i8 (64 bit)
u      Unsigned integer        u8 (64 bit)
f      Floating point          f8 (64 bit)
c      Complex floating point  c16 (128 bit)
O      Object                  O (pointer to object)
S, a   String                  S24 (24 characters)
U      Unicode                 U24 (24 Unicode characters)
V      Other                   V12 (12-byte data block)
NumPy provides a generalization of regular arrays that loosens at least the dtype restriction, but let us stick with regular arrays for a moment and see what the specialization brings in terms of performance.
As a simple exercise, suppose we want to generate a matrix/array of shape 5,000 × 5,000 elements, populated with (pseudo)random, standard normally distributed numbers. We then want to calculate the sum of all elements. First, the pure Python approach, where we make heavy use of list comprehensions and functional programming methods as well as lambda functions:
In [111]: import random
          I = 5000
In [112]: %time mat = [[random.gauss(0, 1) for j in range(I)] for i in range(I)]
            # a nested list comprehension
Out[112]: CPU times: user 36.5 s, sys: 408 ms, total: 36.9 s
          Wall time: 36.4 s
In [113]: %time reduce(lambda x, y: x + y, \
                       [reduce(lambda x, y: x + y, row) \
                           for row in mat])
Out[113]: CPU times: user 4.3 s, sys: 52 ms, total: 4.35 s
          Wall time: 4.07 s
          678.5908519876674
Let us now turn to NumPy and see how the same problem is solved there. For convenience, the NumPy sublibrary random offers a multitude of functions to initialize a numpy.ndarray object and populate it at the same time with (pseudo)random numbers:
In [114]: %time mat = np.random.standard_normal((I, I))
Out[114]: CPU times: user 1.83 s, sys: 40 ms, total: 1.87 s
          Wall time: 1.87 s
In [115]: %time mat.sum()
Out[115]: CPU times: user 36 ms, sys: 0 ns, total: 36 ms
          Wall time: 34.6 ms
          349.49777911439384
We observe the following:
Syntax: Although we use several approaches to compact the pure Python code, the NumPy version is even more compact and readable.
Performance: The generation of the numpy.ndarray object is roughly 20 times faster and the calculation of the sum is roughly 100 times faster than the respective operations in pure Python.
USING NUMPY ARRAYS
The use of NumPy for array-based operations and algorithms generally results in compact, easily readable code and significant performance improvements over pure Python code.
Structured Arrays
The specialization of the numpy.ndarray class obviously brings a number of really valuable benefits with it. However, a too-narrow specialization might turn out to be too large a burden to carry for the majority of array-based algorithms and applications. Therefore, NumPy provides structured arrays that allow us to have different NumPy data types per column, at least. What does "per column" mean? Consider the following initialization of a structured array object:
In [116]: dt = np.dtype([('Name', 'S10'), ('Age', 'i4'),
                         ('Height', 'f'), ('Children/Pets', 'i4', 2)])
          s = np.array([('Smith', 45, 1.83, (0, 1)),
                        ('Jones', 53, 1.72, (2, 2))], dtype=dt)
          s
Out[116]: array([('Smith', 45, 1.8300000429153442, [0, 1]),
                 ('Jones', 53, 1.7200000286102295, [2, 2])],
                dtype=[('Name', 'S10'), ('Age', '<i4'), ('Height',
          '<f4'), ('Children/Pets', '<i4', (2,))])
In a sense, this construction comes quite close to the operation for initializing tables in a SQL database. We have column names and column data types, with maybe some additional information (e.g., maximum number of characters per string object). The single columns can now be easily accessed by their names:
In [117]: s['Name']
Out[117]: array(['Smith', 'Jones'],
                dtype='|S10')
In [118]: s['Height'].mean()
Out[118]: 1.7750001
Having selected a specific row and record, respectively, the resulting objects mainly behave like dict objects, where one can retrieve values via keys:
In [119]: s[1]['Age']
Out[119]: 53
In summary, structured arrays are a generalization of the regular numpy.ndarray object types in that the data type only has to be the same per column, as one is used to in the context of tables in SQL databases. One advantage of structured arrays is that a single element of a column can be another multidimensional object and does not have to conform to the basic NumPy data types.
NumPy STRUCTURED ARRAYS
NumPy provides, in addition to regular arrays, structured arrays that allow the description and handling of rather complex array-oriented data structures with a variety of different data types and even structures per (named) column. They bring SQL table-like data structures to Python, with all the benefits of regular numpy.ndarray objects (syntax, methods, performance).
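To illustrate how a structured array combines column access with the usual ndarray operations, here is a minimal sketch. It assumes Python 3 with NumPy installed and therefore uses the Unicode type 'U10' instead of 'S10' for the names (a deliberate deviation from the session above, so that the names are str rather than bytes objects); names, older, and pets are illustrative names:

```python
# Sketch under the assumption of Python 3 with NumPy available.
import numpy as np

# 'U10' replaces 'S10' here so that names are str objects in Python 3
dt = np.dtype([('Name', 'U10'), ('Age', 'i4'),
               ('Height', 'f4'), ('Children/Pets', 'i4', (2,))])
s = np.array([('Smith', 45, 1.83, (0, 1)),
              ('Jones', 53, 1.72, (2, 2))], dtype=dt)

names = s['Name']                 # one column, selected by name
older = s[s['Age'] > 50]          # Boolean selection of whole records
pets = s['Children/Pets'][:, 1]   # second entry of the sub-array column

print(names, older['Name'], pets)
print(s['Height'].mean())
```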
Vectorization of Code
Vectorization of code is a strategy to get more compact code that is possibly executed faster. The fundamental idea is to conduct an operation on or to apply a function to a complex object "at once" and not by iterating over the single elements of the object. In Python, the functional programming tools map, filter, and reduce provide means for vectorization. In a sense, NumPy has vectorization built in deep down in its core.
Basic Vectorization
As we learned in the previous section, simple mathematical operations can be implemented on numpy.ndarray objects directly. For example, we can add two NumPy arrays element-wise as follows:
In [120]: r = np.random.standard_normal((4, 3))
          s = np.random.standard_normal((4, 3))
In [121]: r + s
Out[121]: array([[-1.94801686, -0.6855251 ,  2.28954806],
                 [ 0.33847593, -1.97109602,  1.30071653],
                 [-1.12066585,  0.22234207, -2.73940339],
                 [ 0.43787363,  0.52938941, -1.38467623]])
NumPy also supports what is called broadcasting. This allows us to combine objects of different shape within a single operation. We have already made use of this before. Consider the following example:
In [122]: 2 * r + 3
Out[122]: array([[ 2.54691692,  1.65823523,  8.14636725],
                 [ 4.94758114,  0.25648128,  1.89566919],
                 [ 0.67600205,  3.41004636,  2.06567484],
                 [ 0.41775907,  0.58038395,  1.07282384]])
In this case, the r object is multiplied by 2 element-wise and then 3 is added element-wise; the 3 is broadcasted or stretched to the shape of the r object. It works with differently shaped arrays as well, up to a certain point:
In [123]: s = np.random.standard_normal(3)
          r + s
Out[123]: array([[ 0.23324118, -1.09764268,  1.90412565],
                 [ 1.43357329, -1.79851966, -1.22122338],
                 [-0.70221625, -0.22173711, -1.13622055],
                 [-0.83133775, -1.63656832, -1.63264605]])
This broadcasts the one-dimensional array of size 3 to a shape of (4, 3). The same does not work, for example, with a one-dimensional array of size 4:
In [124]: s = np.random.standard_normal(4)
          r + s
Out[124]: ValueError
          operands could not be broadcast together with shapes (4,3) (4,)
However, transposing the r object makes the operation work again. In the following code, the transpose method transforms the ndarray object with shape (4, 3) into an object of the same type with shape (3, 4):
In [125]: r.transpose() + s
Out[125]: array([[-0.63380522,  0.5964174 ,  0.9677324 ,  0.49770367],
                 [-1.07814606, -1.74913253,  0.88641996, -0.86931849],
                 [ 2.16591995, -0.92953858,  1.71037785, -0.67090759]])
In [126]: np.shape(r.T)
Out[126]: (3, 4)
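The broadcasting cases above can be reproduced with small deterministic arrays. The sketch below assumes Python 3 with NumPy installed; it uses np.arange instead of random numbers so that the results can be checked by hand, and it shows np.newaxis as an alternative to transposing (an addition that the text itself does not discuss). All variable names are illustrative:

```python
# Sketch under the assumption of Python 3 with NumPy available.
import numpy as np

r = np.arange(12.0).reshape(4, 3)     # deterministic stand-in for the random array
s3 = np.array([10.0, 20.0, 30.0])     # shape (3,): matches the last axis of r
s4 = np.array([1.0, 2.0, 3.0, 4.0])   # shape (4,): does not match the last axis

by_column = r + s3                    # s3 is added to every row

try:
    r + s4
    failed = False
except ValueError:
    failed = True

by_row = r + s4[:, np.newaxis]        # shape (4, 1) is stretched along the columns
via_transpose = (r.T + s4).T          # the transpose route, turned back to (4, 3)

print(failed)
print(by_column)
print(by_row)
```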
As a general rule, custom-defined Python functions work with numpy.ndarrays as well. If the implementation allows, arrays can be used with functions just as int or float objects can. Consider the following function:
In [127]: def f(x):
              return 3 * x + 5
We can pass standard Python objects as well as numpy.ndarray objects (for which the operations in the function have to be defined, of course):
In [128]: f(0.5)  # float object
Out[128]: 6.5
In [129]: f(r)  # NumPy array
Out[129]: array([[  4.32037538,   2.98735285,  12.71955087],
                 [  7.9213717 ,   0.88472192,   3.34350378],
                 [  1.51400308,   5.61506954,   3.59851226],
                 [  1.1266386 ,   1.37057593,   2.10923576]])
What NumPy does is to simply apply the function f to the object element-wise. In that sense, by using this kind of operation we do not avoid loops; we only avoid them on the Python level and delegate the looping to NumPy. On the NumPy level, looping over the numpy.ndarray object is taken care of by highly optimized code, most of it written in C and therefore generally much faster than pure Python. This explains the "secret" behind the performance benefits of using NumPy for array-based use cases.
When working with arrays, one has to take care to call the right functions on the respective objects. For example, the sin function from the standard math module of Python does not work with NumPy arrays:
In [130]: import math
          math.sin(r)
Out[130]: TypeError
          only length-1 arrays can be converted to Python scalars
The function is designed to handle, for example, float objects (i.e., single numbers, not arrays). NumPy provides the respective counterparts as so-called ufuncs, or universal functions:
In [131]: np.sin(r)  # array as input
Out[131]: array([[-0.22460878, -0.62167738,  0.53829193],
                 [ 0.82702259, -0.98025745, -0.52453206],
                 [-0.91759955,  0.20358986, -0.45035471],
                 [-0.96114497, -0.93554821, -0.82124413]])
In [132]: np.sin(np.pi)  # float as input
Out[132]: 1.2246467991473532e-16
NumPy provides a large number of such ufuncs that generalize typical mathematical functions to numpy.ndarray objects.[22]
UNIVERSAL FUNCTIONS
Be careful when using the from library import * approach to importing. Such an approach can cause the NumPy reference to the ufunc numpy.sin to be replaced by the reference to the math function math.sin. You should, as a rule, import both libraries by name to avoid confusion: import numpy as np; import math. Then you can use math.sin alongside np.sin.
MemoryLayout
Whenwef irst initialized numpy.ndarray objectsbyusing numpy.zero, we
providedanoptionalargumentforthememorylayout.Thisargument
specifies,roughlyspeaking,whichelementsofanarraygetstoredinmemory
measurableimpactontheperformanceofarrayoperations.However,when
nexttoeachother.Whenworkingwithsmallarrays,thishashardlyany
arraysgetlargethestoryissomewhatdifferent,dependingontheoperations
tobeimplementedonthearrays.
l ustrate thisimportantpoint for memory­wisehandlingof arraysin
Toiscienceandfinance,considerthefollowingconstructionofmultidimensional
numpy.ndarray objects:
In[133]: xy == np.random.standard_normal((5,
# linear 10000000))
2 * x +
CF == np.array((x, 3 order='C') = a * x + b
equation
np.array((x, y),y), order='F') y
x =0.0; y = 0.0 # memory cleanup
In [134]: C[:2].round(2)
Out[134]: array([[[-0.51, ...],
                  ...],
                 [[ 1.98, ...],
                  ...]])  # shape (2, 5, 10000000); remaining values omitted
Let’s look at some really fundamental examples and use cases for both types of ndarray objects:
In [135]: %timeit C.sum()
Out[135]: 10 loops, best of 3: 123 ms per loop
In [136]: %timeit F.sum()
Out[136]: 10 loops, best of 3: 123 ms per loop
When summing up all elements of the arrays, there is no performance difference between the two memory layouts. However, consider the following example with the C-like memory layout:
In [137]: %timeit C[0].sum(axis=0)
Out[137]: 10 loops, best of 3: 102 ms per loop
In [138]: %timeit C[0].sum(axis=1)
Out[138]: 10 loops, best of 3: 61.9 ms per loop
Summing along axis 0, i.e., adding up the five large vectors element by element and getting back a single large results vector, is slower in this case than summing along axis 1, i.e., summing the elements of each of the five rows and getting back five results. This is due to the fact that the single elements of the rows are stored next to each other. With the Fortran-like memory layout, the relative performance changes considerably:
In [139]: %timeit F.sum(axis=0)
Out[139]: 1 loops, best of 3: 801 ms per loop
In [140]: %timeit F.sum(axis=1)
Out[140]: 1 loops, best of 3: 2.23 s per loop
In [141]: F = 0.0; C = 0.0 # memory cleanup
In this case, summing along axis 0 performs better than summing along axis 1. With the Fortran-like layout, the elements along the first axis are stored in memory next to each other, which explains the relative performance advantage. However, overall the operations are absolutely much slower when compared to the C-like variant.
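The two layouts can be inspected directly through the flags and strides attributes of an ndarray. The following is a minimal sketch, assuming much smaller arrays than in the text so that it runs quickly. The strides give the number of bytes to step along each axis to reach the next element:

```python
import numpy as np

x = np.random.standard_normal((5, 1000))  # far smaller than in the text
y = 2 * x + 3

C = np.array((x, y), order='C')  # shape (2, 5, 1000), last axis contiguous
F = np.array((x, y), order='F')  # shape (2, 5, 1000), first axis contiguous

c_is_c = C.flags['C_CONTIGUOUS']
f_is_f = F.flags['F_CONTIGUOUS']

# bytes between neighboring elements along each axis (float64 = 8 bytes)
c_strides = C.strides
f_strides = F.strides

# the layout changes only the storage order, not the numerical results
same_result = np.allclose(C.sum(axis=1), F.sum(axis=1))
```

In the C-like array the smallest stride belongs to the last axis, in the Fortran-like array to the first axis. Operations that walk along the axis with the smallest stride touch neighboring memory locations.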
Conclusions
Python provides, in combination with NumPy, a rich set of flexible data structures. From a finance point of view, the following can be considered the most important ones:
Basic data types
In finance, the classes int, float, and string provide the atomic data types.
Standard data structures
The classes tuple, list, dict, and set have many application areas in finance, with list being the most flexible workhorse in general.
Arrays
A large class of finance-related problems and algorithms can be cast to an array setting; NumPy provides the specialized class numpy.ndarray, which provides both convenience and compactness of code as well as high performance.
This chapter shows that both the basic data structures and the NumPy ones allow for highly vectorized implementation of algorithms. Depending on the specific shape of the data structures, care should be taken with regard to the memory layout of arrays. Choosing the right approach here can speed up code execution by a factor of two or more.
Further Reading
This chapter focuses on those issues that might be of particular importance for finance algorithms and applications. However, it can only represent a starting point for the exploration of data structures and data modeling in Python. There are a number of valuable resources available to go deeper from here.
Here are some Internet resources to consult:
The Python documentation is always a good starting point: http://www.python.org/doc/.
For details on NumPy arrays as well as related methods and functions, see http://docs.scipy.org/doc/.
The SciPy lecture notes are also a good source to get started: http://scipy-lectures.github.io/.
Good references in book form are:
Goodrich, Michael et al. (2013): Data Structures and Algorithms in Python. John Wiley & Sons, Hoboken, NJ.
Langtangen, Hans Petter (2009): A Primer on Scientific Programming with Python. Springer Verlag, Berlin, Heidelberg.
[18] The Cython library brings static typing and compiling features to Python that are comparable to those in C. In fact, Cython is a hybrid language of Python and C.
[19] Here and in the following discussion, terms like float, float object, etc. are used interchangeably, acknowledging that every float is also an object. The same holds true for other object types.
[20] Cf. http://en.wikipedia.org/wiki/Double-precision_floating-point_format.
[21] It is not possible to go into details here, but there is a wealth of information available on the Internet about regular expressions in general and for Python in particular. For an introduction to this topic, refer to Fitzgerald, Michael (2012): Introducing Regular Expressions. O’Reilly, Sebastopol, CA.
[22] Cf. http://docs.scipy.org/doc/numpy/reference/ufuncs.html for an overview.
Chapter 5. Data Visualization
Use a picture. It’s worth a thousand words.
- Arthur Brisbane (1911)
This chapter is about basic visualization capabilities of the matplotlib library. Although there are many other visualization libraries available, matplotlib has established itself as the benchmark and, in many situations, a robust and reliable visualization tool. It is both easy to use for standard plots and flexible when it comes to more complex plots and customizations. In addition, it is tightly integrated with NumPy and the data structures that it provides.
This chapter mainly covers the following topics:
2D plotting
From the most simple to some more advanced plots with two scales or different subplots; typical financial plots, like candlestick charts, are also covered.
3D plotting
A selection of 3D plots useful for financial applications are presented.
This chapter cannot be comprehensive with regard to data visualization with Python and matplotlib, but it provides a number of examples for the most basic and most important capabilities for finance. Other examples are also found in later chapters. For instance, Chapter 6 shows how to visualize time series data with the pandas library.
Two-Dimensional Plotting
To begin with, we have to import the respective libraries. The main plotting functions are found in the sublibrary matplotlib.pyplot:
In [1]: import numpy as np
        import matplotlib as mpl
        import matplotlib.pyplot as plt
        %matplotlib inline
One-Dimensional Data Set
In all that follows, we will plot data stored in NumPy ndarray objects. However, matplotlib is of course able to plot data stored in different Python formats, like list objects, as well. First, we need data that we can plot. To this end, we generate 20 standard normally distributed (pseudo)random numbers as a NumPy ndarray:
In [2]: np.random.seed(1000)
y = np.random.standard_normal(20)
The most fundamental, but nevertheless quite powerful, plotting function is plot from the pyplot sublibrary. In principle, it needs two sets of numbers:
x values: a list or an array containing the x coordinates (values of the abscissa)
y values: a list or an array containing the y coordinates (values of the ordinate)
The number of x and y values provided must match, of course. Consider the following two lines of code, whose output is presented in Figure 5-1:
In [3]: x = range(len(y))
        plt.plot(x, y)
Figure 5-1. Plot given x and y values
plot notices when you pass an ndarray object. In this case, there is no need to provide the “extra” information of the x values. If you only provide the y values, plot takes the index values as the respective x values. Therefore, the following single line of code generates exactly the same output (cf. Figure 5-2):
In [4]: plt.plot(y)
Figure 5-2. Plot given data as 1D array
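That both calls lead to the same plotted data can be verified on the line objects that plot returns. The following is a minimal sketch, assuming the non-interactive Agg backend (an assumption made here so that the code runs without a display) and the object-oriented ax.plot call instead of plt.plot:

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(1000)
y = np.random.standard_normal(20)

fig, ax = plt.subplots()
line_xy, = ax.plot(range(len(y)), y)  # explicit x values
line_y, = ax.plot(y)                  # x values taken from the index

same_x = np.array_equal(line_xy.get_xdata(), line_y.get_xdata())
same_y = np.array_equal(line_xy.get_ydata(), line_y.get_ydata())
plt.close(fig)
```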
NUMPY ARRAYS AND MATPLOTLIB
You can simply pass NumPy ndarray objects to matplotlib functions. It is able to interpret the data structure for simplified plotting. However, be careful to not pass a too large and/or complex array.
Since the majority of the ndarray methods return again an ndarray object, you can also pass your object with a method (or even multiple methods, in some cases) attached. By calling the cumsum method on the ndarray object with the sample data, we get the cumulative sum of this data and, as to be expected, a different output (cf. Figure 5-3):
In [5]: plt.plot(y.cumsum())
Figure 5-3. Plot given a 1D array with method attached
In general, the default plotting style does not satisfy typical requirements for reports, publications, etc. For example, you might want to customize the font used (e.g., for compatibility with LaTeX fonts), to have labels at the axes, or to plot a grid for better readability. Therefore, matplotlib offers a large number of functions to customize the plotting style. Some are easily accessible; for others one has to go a bit deeper. Easily accessible, for example, are those functions that manipulate the axes and those that add grids and labels (cf. Figure 5-4):
In [6]: plt.plot(y.cumsum())
        plt.grid(True)  # adds a grid
        plt.axis('tight')  # adjusts the axis ranges
Figure 5-4. Plot with grid and tight axes
Other options for plt.axis are given in Table 5-1, the majority of which have to be passed as a string object.
Table 5-1. Options for plt.axis
Parameter                    Description
Empty                        Returns current axis limits
off                          Turns axis lines and labels off
equal                        Leads to equal scaling
scaled                       Equal scaling via dimension changes
tight                        Makes all data visible (tightens limits)
image                        Makes all data visible (with data limits)
[xmin, xmax, ymin, ymax]     Sets limits to given (list of) values
In addition, you can directly set the minimum and maximum values of each axis by using plt.xlim and plt.ylim. The following code provides an example whose output is shown in Figure 5-5:
In [7]: plt.plot(y.cumsum())
        plt.grid(True)
        plt.xlim(-1, 20)
        plt.ylim(np.min(y.cumsum()) - 1,
                 np.max(y.cumsum()) + 1)
Figure 5-5. Plot with custom axis limits
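The limits set this way can be read back, which is what the “Empty” option of Table 5-1 refers to. The following is a minimal sketch, assuming the Agg backend and the Axes methods set_xlim, set_ylim, and axis, which correspond to plt.xlim, plt.ylim, and plt.axis:

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(1000)
cs = np.random.standard_normal(20).cumsum()

fig, ax = plt.subplots()
ax.plot(cs)
ax.set_xlim(-1, 20)
ax.set_ylim(cs.min() - 1, cs.max() + 1)

# called without arguments, axis() returns the current limits
xmin, xmax, ymin, ymax = ax.axis()
plt.close(fig)
```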
For the sake of better readability, a plot usually contains a number of labels, e.g., a title and labels describing the nature of x and y values. These are added by the functions plt.title, plt.xlabel, and plt.ylabel, respectively. By default, plot plots continuous lines, even if discrete data points are provided. The plotting of discrete points is accomplished by choosing a different style option. Figure 5-6 overlays (red) points and a (blue) line with line width of 1.5 points:
In [8]: plt.figure(figsize=(7, 4))
          # the figsize parameter defines the
          # size of the figure in (width, height)
        plt.plot(y.cumsum(), 'b', lw=1.5)
        plt.plot(y.cumsum(), 'ro')
        plt.grid(True)
        plt.axis('tight')
        plt.xlabel('index')
        plt.ylabel('value')
        plt.title('A Simple Plot')
Figure 5-6. Plot with typical labels
By default, plt.plot supports the color abbreviations in Table 5-2.
Table 5-2. Standard color abbreviations
Character    Color
b            Blue
g            Green
r            Red
c            Cyan
m            Magenta
y            Yellow
k            Black
w            White
In terms of line and/or point styles, plt.plot supports the characters listed in Table 5-3.
Table 5-3. Standard style characters
Character    Symbol
-            Solid line style
--           Dashed line style
-.           Dash-dot line style
:            Dotted line style
.            Point marker
,            Pixel marker
o            Circle marker
v            Triangle_down marker
^            Triangle_up marker
<            Triangle_left marker
>            Triangle_right marker
1            Tri_down marker
2            Tri_up marker
3            Tri_left marker
4            Tri_right marker
s            Square marker
p            Pentagon marker
*            Star marker
h            Hexagon1 marker
H            Hexagon2 marker
+            Plus marker
x            X marker
D            Diamond marker
d            Thin diamond marker
|            Vline marker
Any color abbreviation can be combined with any style character. In this way, you can make sure that different data sets are easily distinguished. As we will see, the plotting style will also be reflected in the legend.
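How such a combined format string is interpreted can be read off the resulting line objects. The following is a minimal sketch, assuming the Agg backend and three arbitrarily chosen combinations of a color abbreviation from Table 5-2 with a style character from Table 5-3:

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(1000)
y = np.random.standard_normal(20).cumsum()

fig, ax = plt.subplots()
line_a, = ax.plot(y, 'g--')      # green, dashed line
line_b, = ax.plot(y + 1, 'ro')   # red circle markers
line_c, = ax.plot(y + 2, 'k-.')  # black, dash-dot line

styles = [(line.get_color(), line.get_linestyle(), line.get_marker())
          for line in (line_a, line_b, line_c)]
plt.close(fig)
```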
Two-Dimensional Data Set
Plotting one-dimensional data can be considered a special case. In general, data sets will consist of multiple separate subsets of data. The handling of such data sets follows the same rules with matplotlib as with one-dimensional data. However, a number of additional issues might arise in such a context. For example, two data sets might have such a different scaling that they cannot be plotted using the same y- and/or x-axis scaling. Another issue might be that you may want to visualize two different data sets in different ways, e.g., one by a line plot and the other by a bar plot.
To begin with, let us first generate a two-dimensional sample data set. The code that follows generates first a NumPy ndarray of shape 20 × 2 with standard normally distributed (pseudo)random numbers. On this array, the method cumsum is called to calculate the cumulative sum of the sample data along axis 0 (i.e., the first dimension):
In [9]: np.random.seed(2000)
y = np.random.standard_normal((20, 2)).cumsum(axis=0)
In general, you can also pass such two-dimensional arrays to plt.plot. It will then automatically interpret the contained data as separate data sets (along axis 1, i.e., the second dimension). A respective plot is shown in Figure 5-7:
In [10]: plt.figure(figsize=(7, 4))
         plt.plot(y, lw=1.5)
           # plots two lines
         plt.plot(y, 'ro')
           # plots two dotted lines
         plt.grid(True)
         plt.axis('tight')
         plt.xlabel('index')
         plt.ylabel('value')
         plt.title('A Simple Plot')
Figure 5-7. Plot with two data sets
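The interpretation of the columns as separate data sets shows up in the return value of plot, which is a list with one line object per column. The following is a minimal sketch, assuming the Agg backend and the same seed and shape as in the sample data above:

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(2000)
y = np.random.standard_normal((20, 2)).cumsum(axis=0)

fig, ax = plt.subplots(figsize=(7, 4))
lines = ax.plot(y, lw=1.5)  # one Line2D object per column of y

n_lines = len(lines)
first_col_ok = np.array_equal(lines[0].get_ydata(), y[:, 0])
second_col_ok = np.array_equal(lines[1].get_ydata(), y[:, 1])
plt.close(fig)
```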
In such a case, further annotations might be helpful to better read the plot. You can add individual labels to each data set and have them listed in the legend. plt.legend accepts different locality parameters. 0 stands for best location, in the sense that as little data as possible is hidden by the legend. Figure 5-8 shows the plot of the two data sets, this time with a legend. In the generating code, we now do not pass the ndarray object as a whole but rather access the two data subsets separately (y[:, 0] and y[:, 1]), which allows us to attach individual labels to them:
In [11]: plt.figure(figsize=(7, 4))
         plt.plot(y[:, 0], lw=1.5, label='1st')
         plt.plot(y[:, 1], lw=1.5, label='2nd')
         plt.plot(y, 'ro')
         plt.grid(True)
         plt.legend(loc=0)
         plt.axis('tight')
         plt.xlabel('index')
         plt.ylabel('value')
         plt.title('A Simple Plot')
Further location options for plt.legend include those presented in Table 5-4.
Table 5-4. Options for plt.legend
Loc      Description
Empty    Automatic
0        Best possible
1        Upper right
2        Upper left
3        Lower left
4        Lower right
5        Right
6        Center left
7        Center right
8        Lower center
9        Upper center
10       Center
Figure 5-8. Plot with labeled data sets
Multiple data sets with a similar scaling, like simulated paths for the same financial risk factor, can be plotted using a single y-axis. However, often data sets show rather different scalings and the plotting of such data with a single y scale generally leads to a significant loss of visual information. To illustrate the effect, we scale the first of the two data subsets by a factor of 100 and plot the data again (cf. Figure 5-9):
In [12]: y[:, 0] = y[:, 0] * 100
         plt.figure(figsize=(7, 4))
         plt.plot(y[:, 0], lw=1.5, label='1st')
         plt.plot(y[:, 1], lw=1.5, label='2nd')
         plt.plot(y, 'ro')
         plt.grid(True)
         plt.legend(loc=0)
         plt.axis('tight')
         plt.xlabel('index')
         plt.ylabel('value')
         plt.title('A Simple Plot')
Figure 5-9. Plot with two differently scaled data sets
Inspection of Figure 5-9 reveals that the first data set is still “visually readable,” while the second data set now looks like a straight line with the new scaling of the y-axis. In a sense, information about the second data set now gets “visually lost.” There are two basic approaches to resolve this problem:
Use of two y-axes (left/right)
Use of two subplots (upper/lower, left/right)
Let us first introduce a second y-axis into the plot. Figure 5-10 now has two different y-axes. The left y-axis is for the first data set while the right y-axis is for the second. Consequently, there are also two legends:
In [13]: fig, ax1 = plt.subplots()
         plt.plot(y[:, 0], 'b', lw=1.5, label='1st')
         plt.plot(y[:, 0], 'ro')
         plt.grid(True)
         plt.legend(loc=8)
         plt.axis('tight')
         plt.xlabel('index')
         plt.ylabel('value 1st')
         plt.title('A Simple Plot')
         ax2 = ax1.twinx()
         plt.plot(y[:, 1], 'g', lw=1.5, label='2nd')
         plt.plot(y[:, 1], 'ro')
         plt.legend(loc=0)
         plt.ylabel('value 2nd')
Figure 5-10. Plot with two data sets and two y-axes
The key lines of code are those that help manage the axes. These are the ones that follow:
fig, ax1 = plt.subplots()
  # plot first data set using first (left) axis
ax2 = ax1.twinx()
  # plot second data set using second (right) axis
By using the plt.subplots function, we get direct access to the underlying plotting objects (the figure, subplots, etc.). It allows us, for example, to generate a second subplot that shares the x-axis with the first subplot. In Figure 5-10 we have, then, actually two subplots that overlay each other.
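The same construction can be written with explicit calls on the two axes objects, which makes visible which subplot each command addresses. The following is a minimal sketch, assuming the Agg backend and the object-oriented method names (ax.plot, ax.set_ylabel) in place of the plt functions used in the text:

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(2000)
y = np.random.standard_normal((20, 2)).cumsum(axis=0)
y[:, 0] = y[:, 0] * 100  # differently scaled first data set

fig, ax1 = plt.subplots()
ax1.plot(y[:, 0], 'b', lw=1.5, label='1st')  # left y-axis
ax1.set_xlabel('index')
ax1.set_ylabel('value 1st')
ax1.legend(loc=8)

ax2 = ax1.twinx()  # second subplot sharing the x-axis
ax2.plot(y[:, 1], 'g', lw=1.5, label='2nd')  # right y-axis
ax2.set_ylabel('value 2nd')
ax2.legend(loc=0)

n_axes = len(fig.axes)
shares_x = ax1.get_shared_x_axes().joined(ax1, ax2)
plt.close(fig)
```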
Next, consider the case of two separate subplots. This option gives even more freedom to handle the two data sets, as Figure 5-11 illustrates:
In [14]: plt.figure(figsize=(7, 5))
         plt.subplot(211)
         plt.plot(y[:, 0], lw=1.5, label='1st')
         plt.plot(y[:, 0], 'ro')
         plt.grid(True)
         plt.legend(loc=0)
         plt.axis('tight')
         plt.ylabel('value')
         plt.title('A Simple Plot')
         plt.subplot(212)
         plt.plot(y[:, 1], 'g', lw=1.5, label='2nd')
         plt.plot(y[:, 1], 'ro')
         plt.grid(True)
         plt.legend(loc=0)
         plt.axis('tight')
         plt.xlabel('index')
         plt.ylabel('value')
Figure 5-11. Plot with two subplots
The placing of subplots in a matplotlib figure object is accomplished here by the use of a special coordinate system. plt.subplot takes as arguments three integers for numrows, numcols, and fignum (either separated by commas or not). numrows specifies the number of rows, numcols the number of columns, and fignum the number of the subplot, starting with 1 and ending with numrows * numcols. For example, a figure with nine equally sized subplots would have numrows=3, numcols=3, and fignum=1,2,...,9. The lower-right subplot would have the following “coordinates”: plt.subplot(3, 3, 9).
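The numbering scheme can be checked by creating all nine subplots and inspecting where the last one ends up. The following is a minimal sketch, assuming the Agg backend. The grid position is read via get_subplotspec().get_geometry(), which reports the grid size and a zero-based position index:

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(7, 5))
axes = [plt.subplot(3, 3, i) for i in range(1, 10)]  # fignum = 1, ..., 9
last = axes[-1]

# the two notations address the same grid cell
geom_commas = last.get_subplotspec().get_geometry()
geom_compact = plt.subplot(339).get_subplotspec().get_geometry()

# subplot 9 sits furthest to the right and lowest in the figure
x0s = [ax.get_position().x0 for ax in axes]
y0s = [ax.get_position().y0 for ax in axes]
is_lower_right = (last.get_position().x0 == max(x0s)
                  and last.get_position().y0 == min(y0s))
plt.close(fig)
```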
Sometimes, it might be necessary or desired to choose two different plot types to visualize such data. With the subplot approach you have the freedom to combine arbitrary kinds of plots that matplotlib offers.[23] Figure 5-12 combines a line/point plot with a bar chart:
In [15]: plt.figure(figsize=(9, 4))
         plt.subplot(121)
         plt.plot(y[:, 0], lw=1.5, label='1st')
         plt.plot(y[:, 0], 'ro')
         plt.grid(True)
         plt.legend(loc=0)
         plt.axis('tight')
         plt.xlabel('index')
         plt.ylabel('value')
         plt.title('1st Data Set')
         plt.subplot(122)
         plt.bar(np.arange(len(y)), y[:, 1], width=0.5,
                 color='g', label='2nd')
         plt.grid(True)
         plt.legend(loc=0)
         plt.axis('tight')
         plt.xlabel('index')
         plt.title('2nd Data Set')
Figure 5-12. Plot combining line/point subplot with bar subplot
Other Plot Styles
When it comes to two-dimensional plotting, line and point plots are probably the most important ones in finance; this is because many data sets embody time series data, which generally is visualized by such plots. Chapter 6 addresses financial times series data in detail. However, for the moment we want to stick with the two-dimensional data set and illustrate some alternative, and for financial applications useful, visualization approaches.
The first is the scatter plot, where the values of one data set serve as the x values for the other data set. Figure 5-13 shows such a plot. Such a plot type is used, for example, when you want to plot the returns of one financial time series against those of another one. For this example we will use a new two-dimensional data set with some more data:
In [16]: y = np.random.standard_normal((1000, 2))
In [17]: plt.figure(figsize=(7, 5))
         plt.plot(y[:, 0], y[:, 1], 'ro')
         plt.grid(True)
         plt.xlabel('1st')
         plt.ylabel('2nd')
         plt.title('Scatter Plot')
Figure 5-13. Scatter plot via plot function
matplotlib also provides a specific function to generate scatter plots. It basically works in the same way, but provides some additional features. Figure 5-14 shows the corresponding scatter plot to Figure 5-13, this time generated using the scatter function:
In [18]: plt.figure(figsize=(7, 5))
         plt.scatter(y[:, 0], y[:, 1], marker='o')
         plt.grid(True)
         plt.xlabel('1st')
         plt.ylabel('2nd')
         plt.title('Scatter Plot')
Figure 5-14. Scatter plot via scatter function
The scatter plotting function, for example, allows the addition of a third dimension, which can be visualized through different colors and be described by the use of a color bar. To this end, we generate a third data set with random data, this time with integers between 0 and 10:
In [19]: c = np.random.randint(0, 10, len(y))
Figure 5-15 shows a scatter plot where there is a third dimension illustrated by different colors of the single dots and with a color bar as a legend for the colors:
In [20]: plt.figure(figsize=(7, 5))
         plt.scatter(y[:, 0], y[:, 1], c=c, marker='o')
         plt.colorbar()
         plt.grid(True)
         plt.xlabel('1st')
         plt.ylabel('2nd')
         plt.title('Scatter Plot')
Figure 5-15. Scatter plot with third dimension
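The link between the third data set and the colors can be inspected on the object that scatter returns. The following is a minimal sketch, assuming the Agg backend and freshly drawn random data. Note that np.random.randint(0, 10, n) draws integers from 0 to 9, since the upper bound is exclusive:

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(1000)  # seed chosen here for reproducibility only
y = np.random.standard_normal((1000, 2))
c = np.random.randint(0, 10, len(y))  # third dimension: integers 0, ..., 9

fig, ax = plt.subplots(figsize=(7, 5))
sc = ax.scatter(y[:, 0], y[:, 1], c=c, marker='o')
cbar = fig.colorbar(sc, ax=ax)  # color bar as legend for the values in c

n_points = sc.get_offsets().shape[0]  # one dot per row of y
n_colors = len(sc.get_array())        # one color value per dot
plt.close(fig)
```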
Another type of plot, the histogram, is also often used in the context of financial returns. Figure 5-16 puts the frequency values of the two data sets next to each other in the same plot:
In [21]: plt.figure(figsize=(7, 4))
         plt.hist(y, label=['1st', '2nd'], bins=25)
         plt.grid(True)
         plt.legend(loc=0)
         plt.xlabel('value')
         plt.ylabel('frequency')
         plt.title('Histogram')
Figure 5-16. Histogram for two data sets
Since the histogram is such an important plot type for financial applications, let us take a closer look at the use of plt.hist. The following example illustrates the parameters that are supported:
plt.hist(x, bins=10, range=None, normed=False, weights=None,
         cumulative=False, bottom=None, histtype='bar', align='mid',
         orientation='vertical', rwidth=None, log=False, color=None,
         label=None, stacked=False, hold=None, **kwargs)
Table 5-5 provides a description of the main parameters of the plt.hist function.
Table 5-5. Parameters for plt.hist
Parameter      Description
x              list object(s), ndarray object
bins           Number of bins
range          Lower and upper range of bins
normed         Norming such that integral value is 1
weights        Weights for every value in x
cumulative     Every bin contains the counts of the lower bins
histtype       Options (strings): bar, barstacked, step, stepfilled
align          Options (strings): left, mid, right
orientation    Options (strings): horizontal, vertical
rwidth         Relative width of the bars
log            Log scale
color          Color per data set (array-like)
label          String or sequence of strings for labels
stacked        Stacks multiple data sets
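The effect of the norming and cumulative options can be verified on the values that the histogram function returns. The following is a minimal sketch, assuming the Agg backend and a recent matplotlib version, where the norming parameter is named density instead of normed:

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(1000)  # seed chosen here for reproducibility only
data = np.random.standard_normal(1000)

fig, ax = plt.subplots(figsize=(7, 4))

# normed histogram: bar heights times bin widths sum to 1
counts, edges, patches = ax.hist(data, bins=25, density=True)
area = float(np.sum(counts * np.diff(edges)))

# cumulative histogram: the last bin contains all observations
cum_counts, cum_edges, cum_patches = ax.hist(data, bins=25, cumulative=True)
plt.close(fig)
```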
Figure 5-17 shows a similar plot; this time, the data of the two data sets is stacked in the histogram:
In [22]: plt.figure(figsize=(7, 4))
         plt.hist(y, label=['1st', '2nd'], color=['b', 'g'],
                  stacked=True, bins=20)
         plt.grid(True)
         plt.legend(loc=0)
         plt.xlabel('value')
         plt.ylabel('frequency')
         plt.title('Histogram')
Figure 5-17. Stacked histogram for two data sets
Another useful plot type is the boxplot. Similar to the histogram, the boxplot allows both a concise overview of the characteristics of a data set and easy comparison of multiple data sets. Figure 5-18 shows such a plot for our data set:
In [23]: fig, ax = plt.subplots(figsize=(7, 4))
         plt.boxplot(y)
         plt.grid(True)
         plt.setp(ax, xticklabels=['1st', '2nd'])
         plt.xlabel('data set')
         plt.ylabel('value')
         plt.title('Boxplot')
This last example uses the function plt.setp, which sets properties for a (set of) plotting instance(s). For example, considering a line plot generated by:
line = plt.plot(data, 'r')
the following code:
plt.setp(line, linestyle='--')
changes the style of the line to “dashed.” This way, you can easily change parameters after the plotting instance (“artist object”) has been generated.
Figure 5-18. Boxplot for two data sets
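The change made by plt.setp can be observed by querying the line object before and after the call. The following is a minimal sketch, assuming the Agg backend and a hypothetical data array. The linewidth argument is added here only to show that several properties can be set at once:

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

data = np.arange(10.0)  # hypothetical sample data

fig, ax = plt.subplots()
line = ax.plot(data, 'r')  # list containing a single Line2D object

before = line[0].get_linestyle()
plt.setp(line, linestyle='--', linewidth=2.0)  # change the existing artist
after = line[0].get_linestyle()
plt.close(fig)
```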
As a final illustration in this section, we consider a mathematically inspired plot that can also be found as an example in the gallery for matplotlib. It plots a function and illustrates graphically the area below the function between a lower and an upper limit, in other words, the integral value of the function between the lower and upper limits. Figure 5-19 shows the resulting plot and illustrates that matplotlib seamlessly handles LaTeX typesetting for the inclusion of mathematical formulae into plots:
In [24]: from matplotlib.patches import Polygon
         def func(x):
             return 0.5 * np.exp(x) + 1

         a, b = 0.5, 1.5  # integral limits
         x = np.linspace(0, 2)
         y = func(x)

         fig, ax = plt.subplots(figsize=(7, 5))
         plt.plot(x, y, 'b', linewidth=2)
         plt.ylim(ymin=0)

         # Illustrate the integral value, i.e. the area under the function
         # between the lower and upper limits
         Ix = np.linspace(a, b)
         Iy = func(Ix)
         verts = [(a, 0)] + list(zip(Ix, Iy)) + [(b, 0)]
         poly = Polygon(verts, facecolor='0.7', edgecolor='0.5')
         ax.add_patch(poly)

         plt.text(0.5 * (a + b), 1, r"$\int_a^b f(x)\mathrm{d}x$",
                  horizontalalignment='center', fontsize=20)

         plt.figtext(0.9, 0.075, '$x$')
         plt.figtext(0.075, 0.9, '$f(x)$')

         ax.set_xticks((a, b))
         ax.set_xticklabels(('$a$', '$b$'))
         ax.set_yticks([func(a), func(b)])
         ax.set_yticklabels(('$f(a)$', '$f(b)$'))
         plt.grid(True)
Figure 5-19. Exponential function, integral area, and LaTeX labels
Let us go through the generation of this plot step by step. The first step is the definition of the function to be integrated:
def func(x):
    return 0.5 * np.exp(x) + 1
The second step is the definition of the integral limits and the generation of needed numerical values:
a, b = 0.5, 1.5  # integral limits
x = np.linspace(0, 2)
y = func(x)
Third, we plot the function itself:
fig, ax = plt.subplots(figsize=(7, 5))
plt.plot(x, y, 'b', linewidth=2)
plt.ylim(ymin=0)
Fourth and central, we generate the shaded area (“patch”) by the use of the Polygon function illustrating the integral area:
Ix = np.linspace(a, b)
Iy = func(Ix)
verts = [(a, 0)] + list(zip(Ix, Iy)) + [(b, 0)]
poly = Polygon(verts, facecolor='0.7', edgecolor='0.5')
ax.add_patch(poly)
The fifth step is the addition of the mathematical formula and some axis labels to the plot, using the plt.text and plt.figtext functions. LaTeX code is passed between two dollar signs ($ ... $). The first two parameters of both functions are coordinate values to place the respective text:
plt.text(0.5 * (a + b), 1, r"$\int_a^b f(x)\mathrm{d}x$",
         horizontalalignment='center', fontsize=20)
plt.figtext(0.9, 0.075, '$x$')
plt.figtext(0.075, 0.9, '$f(x)$')
Finally, we set the individual x and y tick labels at their respective positions. Note that although we place variable names rendered in LaTeX, the correct numerical values are used for the placing. We also add a grid, which in this particular case is only drawn for the selected ticks highlighted before:
ax.set_xticks((a, b))
ax.set_xticklabels(('$a$', '$b$'))
ax.set_yticks([func(a), func(b)])
ax.set_yticklabels(('$f(a)$', '$f(b)$'))
plt.grid(True)
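The shaded area corresponds to a concrete number. As a minimal sketch, the value of the integral can be approximated from the same Ix and Iy values that define the polygon, here with the trapezoidal rule (a technique chosen for this illustration, not taken from the text), and compared with the closed-form value obtained from the antiderivative 0.5 * exp(x) + x:

```python
import numpy as np


def func(x):
    return 0.5 * np.exp(x) + 1


a, b = 0.5, 1.5  # integral limits
Ix = np.linspace(a, b)  # 50 points by default
Iy = func(Ix)

# trapezoidal rule on the polygon's upper boundary
approx = float(np.sum(0.5 * (Iy[1:] + Iy[:-1]) * np.diff(Ix)))

# closed form: [0.5 * exp(x) + x] evaluated between a and b
exact = 0.5 * (np.exp(b) - np.exp(a)) + (b - a)

abs_error = abs(approx - exact)
```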
Financial Plots
matplotlib also provides a small selection of special finance plots. These, like the candlestick plot, are mainly used to visualize historical stock price data or similar financial time series data. Those plotting capabilities are found in the matplotlib.finance sublibrary:
In [25]: import matplotlib.finance as mpf
As a convenience function, this sublibrary allows for easy retrieval of historical stock price data from the Yahoo! Finance website (cf. http://finance.yahoo.com). All you need are start and end dates and the respective ticker symbol. The following retrieves data for the German DAX index whose ticker symbol is ^GDAXI:
In [26]: start = (2014, 5, 1)
         end = (2014, 6, 30)
         quotes = mpf.quotes_historical_yahoo('^GDAXI', start, end)
DATA QUALITY OF WEB SOURCES
Nowadays, a couple of Python libraries provide convenience functions to retrieve data from Yahoo! Finance. Be aware that, although this is a convenient way to visualize financial data sets, the data quality is not sufficient to base any important investment decision on it. For example, stock splits, leading to “price drops,” are often not correctly accounted for in the data provided by Yahoo! Finance. This holds true for a number of other freely available data sources as well.
quotes now contains time series data for the DAX index starting with Date (in epoch time format), then Open, Close, High, Low, and Volume:
In [27]: quotes[:2]
Out[27]: [(735355.0,
9611.7900000000009,
9556.0200000000004,
9627.3799999999992,
9533.2999999999993,
88062300.0),
(735358.0,
9536.3799999999992,
9529.5,
9548.1700000000001,
9407.0900000000001,
61911600.0)]
The plotting functions of matplotlib.finance understand exactly this format and the data set can be passed, for example, to the candlestick function as it is. Figure 5-20 shows the result. Daily positive returns are indicated by blue rectangles, and negative returns by red ones. As you notice, matplotlib takes care of the right labeling of the x-axis given the date information in the data set:
In [28]: fig, ax = plt.subplots(figsize=(8, 5))
         fig.subplots_adjust(bottom=0.2)
         mpf.candlestick(ax, quotes, width=0.6, colorup='b',
                         colordown='r')
         plt.grid(True)
         ax.xaxis_date()
           # dates on the x-axis
         ax.autoscale_view()
         plt.setp(plt.gca().get_xticklabels(), rotation=30)
Figure 5-20. Candlestick chart for financial data
In the preceding code, plt.setp(plt.gca().get_xticklabels(), rotation=30) grabs the x-axis labels and rotates them by 30 degrees. To this end, the function plt.gca is used, which returns the current axes object. The method call of get_xticklabels then provides the tick labels for the x-axis of the figure.
Table 5-6 provides a description of the different parameters the mpf.candlestick function takes.
Table 5-6. Parameters for mpf.candlestick
Parameter    Description
ax           An Axes instance to plot to
quotes       Financial data to plot (sequence of time, open, close, high, low sequences)
width        Fraction of a day for the rectangle width
colorup      The color of the rectangle where close >= open
colordown    The color of the rectangle where close < open
alpha        The rectangle alpha level
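The logic behind these parameters can be reproduced with basic matplotlib functions, which is also useful because the matplotlib.finance sublibrary is no longer part of recent matplotlib releases. The following is a rough sketch, not the implementation of mpf.candlestick: simple_candlestick is a hypothetical helper, the quotes are made-up numbers in the (time, open, close, high, low) order of Table 5-6, and the Agg backend is assumed:

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import numpy as np


def simple_candlestick(ax, quotes, width=0.6, colorup='b', colordown='r'):
    """Hypothetical helper: one candle per (time, open, close, high, low) row."""
    quotes = np.asarray(quotes, dtype=float)
    t, o, c, h, l = (quotes[:, i] for i in range(5))
    up = c >= o  # same rule as for colorup/colordown in Table 5-6
    colors = [colorup if u else colordown for u in up]
    ax.vlines(t, l, h, colors=colors, linewidth=1.0)  # low-high range
    bars = ax.bar(t, np.abs(c - o), width=width,
                  bottom=np.minimum(o, c), color=colors)  # open-close body
    return bars, up


# made-up sample quotes: (time, open, close, high, low)
sample = [(1, 10.0, 11.0, 11.5, 9.5),
          (2, 11.0, 10.5, 11.2, 10.1),
          (3, 10.5, 10.5, 10.9, 10.2),
          (4, 10.5, 12.0, 12.3, 10.4)]

fig, ax = plt.subplots(figsize=(8, 5))
bars, up = simple_candlestick(ax, sample)
ax.grid(True)
plt.close(fig)
```

Each rectangle spans the range between opening and closing value, and the vertical line spans the range between low and high.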
A rather similar plot type is provided by the plot_day_summary function, which is used in the same fashion as the candlestick function and with similar parameters. Here, opening and closing values are not illustrated by a colored rectangle but rather by two small horizontal lines, as Figure 5-21 shows:
In [29]: fig, ax = plt.subplots(figsize=(8, 5))
         mpf.plot_day_summary(ax, quotes, colorup='b', colordown='r')
         plt.grid(True)
         ax.xaxis_date()
         plt.title('DAX Index')
         plt.ylabel('index level')
         plt.setp(plt.gca().get_xticklabels(), rotation=30)
Figure 5-21. Daily summary chart for financial data
Often, stock price data is combined with volume data in a single plot to also provide information with regard to market activity. The following code, with the result shown in Figure 5-22, illustrates such a use case based on historical data for the stock of Yahoo! Inc.:
In [30]: quotes = np.array(mpf.quotes_historical_yahoo('YHOO', start, end))
In [31]: fig, (ax1, ax2) = plt.subplots(2, sharex=True, figsize=(8, 6))
         mpf.candlestick(ax1, quotes, width=0.6, colorup='b',
                         colordown='r')
         ax1.set_title('Yahoo Inc.')
         ax1.set_ylabel('index level')
         ax1.grid(True)
         ax1.xaxis_date()
         plt.bar(quotes[:, 0] - 0.25, quotes[:, 5], width=0.5)
         ax2.set_ylabel('volume')
         ax2.grid(True)
         ax2.autoscale_view()
         plt.setp(plt.gca().get_xticklabels(), rotation=30)
Figure 5-22. Plot combining candlestick and volume bar chart
3D Plotting
There are not too many fields in finance that really benefit from visualization in three dimensions. However, one application area is volatility surfaces showing implied volatilities simultaneously for a number of times-to-maturity and strikes. In what follows, we artificially generate a plot that resembles a volatility surface. To this end, we consider:
Strike values between 50 and 150
Times-to-maturity between 0.5 and 2.5 years
This provides our two-dimensional coordinate system. We can use NumPy’s meshgrid function to generate such a system out of two one-dimensional ndarray objects:
In [32]: strike = np.linspace(50, 150, 24)
         ttm = np.linspace(0.5, 2.5, 24)
         strike, ttm = np.meshgrid(strike, ttm)
This transforms both 1D arrays into 2D arrays, repeating the original axis values as often as needed:
In [33]: strike[:2]
Out[33]: array([[  50.        ,   54.34782609,   58.69565217,   63.04347826,
                   67.39130435,   71.73913043,   76.08695652,   80.43478261,
                   84.7826087 ,   89.13043478,   93.47826087,   97.82608696,
                  102.17391304,  106.52173913,  110.86956522,  115.2173913 ,
                  119.56521739,  123.91304348,  128.26086957,  132.60869565,
                  136.95652174,  141.30434783,  145.65217391,  150.        ],
                [  50.        ,   54.34782609,   58.69565217,   63.04347826,
                   67.39130435,   71.73913043,   76.08695652,   80.43478261,
                   84.7826087 ,   89.13043478,   93.47826087,   97.82608696,
                  102.17391304,  106.52173913,  110.86956522,  115.2173913 ,
                  119.56521739,  123.91304348,  128.26086957,  132.60869565,
                  136.95652174,  141.30434783,  145.65217391,  150.        ]])
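The repetition pattern is easier to see with very few grid points. The following is a minimal sketch, assuming only three strike values and two times-to-maturity (the names strike_1d and ttm_1d are introduced here for illustration):

```python
import numpy as np

strike_1d = np.linspace(50, 150, 3)  # 50., 100., 150.
ttm_1d = np.linspace(0.5, 2.5, 2)    # 0.5, 2.5

strike, ttm = np.meshgrid(strike_1d, ttm_1d)

# both results have one row per ttm value and one column per strike value
shape_strike = strike.shape
shape_ttm = ttm.shape

# every row of strike repeats the strike values,
# every column of ttm repeats the times-to-maturity
rows_repeat = all(np.array_equal(row, strike_1d) for row in strike)
cols_repeat = all(np.array_equal(col, ttm_1d) for col in ttm.T)
```

Each pair (strike[i, j], ttm[i, j]) is then one point of the two-dimensional coordinate system.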
Now, given the new ndarray objects, we generate the fake implied volatilities by a simple, scaled quadratic function:
In [34]: iv = (strike - 100) ** 2 / (100 * strike) / ttm
           # generate fake implied volatilities
The plot resulting from the following code is shown in Figure 5-23:
In [35]: from mpl_toolkits.mplot3d import Axes3D
         fig = plt.figure(figsize=(9, 6))
         ax = fig.gca(projection='3d')
         surf = ax.plot_surface(strike, ttm, iv, rstride=2, cstride=2,
                                cmap=plt.cm.coolwarm, linewidth=0.5,
                                antialiased=True)
         ax.set_xlabel('strike')
         ax.set_ylabel('time-to-maturity')
         ax.set_zlabel('implied volatility')
         fig.colorbar(surf, shrink=0.5, aspect=5)
Figure 5-23. 3D surface plot for (fake) implied volatilities
Table 5-7 provides a description of the different parameters the plot_surface function can take.
Table 5-7. Parameters for plot_surface
Parameter     Description
X, Y, Z       Data values as 2D arrays
rstride       Array row stride (step size)
cstride       Array column stride (step size)
color         Color of the surface patches
cmap          A colormap for the surface patches
facecolors    Face colors for the individual patches
norm          An instance of Normalize to map values to colors
vmin          Minimum value to map
vmax          Maximum value to map
shade         Whether to shade the face colors
As with two-dimensional plots, the line style can be replaced by single points or, as in what follows, single triangles. Figure 5-24 plots the same data as a 3D scatter plot, but now also with a different viewing angle, using the view_init function to set it:
In [36]: fig = plt.figure(figsize=(8, 5))
         ax = fig.add_subplot(111, projection='3d')
         ax.view_init(30, 60)
         ax.scatter(strike, ttm, iv, zdir='z', s=25,
                    c='b', marker='^')
         ax.set_xlabel('strike')
         ax.set_ylabel('time-to-maturity')
         ax.set_zlabel('implied volatility')
Figure 5-24. 3D scatter plot for (fake) implied volatilities
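The surface and the viewing angle can be combined in a single, self-contained script. The following is a minimal sketch, assuming the Agg backend and using fig.add_subplot(111, projection='3d') for both plot types, since some newer matplotlib releases no longer accept keyword arguments in fig.gca:

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D  # registers the 3d projection on older versions

strike, ttm = np.meshgrid(np.linspace(50, 150, 24),
                          np.linspace(0.5, 2.5, 24))
iv = (strike - 100) ** 2 / (100 * strike) / ttm  # fake implied volatilities

fig = plt.figure(figsize=(9, 6))
ax = fig.add_subplot(111, projection='3d')
surf = ax.plot_surface(strike, ttm, iv, rstride=2, cstride=2,
                       cmap=plt.cm.coolwarm, linewidth=0.5,
                       antialiased=True)
ax.view_init(30, 60)  # elevation and azimuth in degrees
ax.set_xlabel('strike')
ax.set_ylabel('time-to-maturity')
ax.set_zlabel('implied volatility')
fig.colorbar(surf, shrink=0.5, aspect=5)

projection_name = ax.name
elev, azim = ax.elev, ax.azim
plt.close(fig)
```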
Conclusions
matplotlib can be considered both the benchmark and the workhorse when it comes to data visualization in Python. It is tightly integrated with NumPy and the basic functionality is easily and conveniently accessed. However, on the other hand, matplotlib is a rather mighty library with a somewhat complex API. This makes it impossible to give a broader overview of all the capabilities of matplotlib in this chapter.
This chapter introduces the basic functions of matplotlib for 2D and 3D plotting useful in most financial contexts. Other chapters provide further examples of how to use this fundamental library for visualization.
Further Reading
The major resources for matplotlib can be found on the Web:
The home page of matplotlib is, of course, the best starting point: http://matplotlib.org.
There’s a gallery with many useful examples: http://matplotlib.org/gallery.html.
A tutorial for 2D plotting is found here: http://matplotlib.org/users/pyplot_tutorial.html.
Another one for 3D plotting is here: http://matplotlib.org/mpl_toolkits/mplot3d/tutorial.html.
It has become kind of a standard routine to consult the gallery, to look there for an appropriate visualization example, and to start with the corresponding example code. Using, for example, IPython Notebook, only a single command is required to get started once you have found the right example.
[23] For an overview of which plot types are available, visit the matplotlib gallery.
Chapter 6. Financial Time Series
The only reason for time is so that everything doesn’t happen at once.
- Albert Einstein
One of the most important types of data one encounters in finance is financial time series. This is data indexed by date and/or time. For example, prices of stocks represent financial time series data. Similarly, the USD-EUR exchange rate represents a financial time series; the exchange rate is quoted in brief intervals of time, and a collection of such quotes then is a time series of exchange rates.
There is no financial discipline that gets by without considering time an important factor. This mainly is the same as with physics and other sciences. The major tool to cope with time series data in Python is the library pandas. Wes McKinney, the main author of pandas, started developing the library when working as an analyst at AQR Capital Management, a large hedge fund. It is safe to say that pandas has been designed from the ground up to work with financial time series. As this chapter demonstrates, the main inspiration for the fundamental classes, such as the DataFrame and Series classes, is drawn from the R statistical analysis language, which without doubt has a strength in that kind of modeling and analysis.
The chapter is mainly based on a couple of examples drawn from a financial context. It proceeds along the following lines:
First and second steps
We start exploring the capabilities of pandas by using very simple and small data sets; we then proceed by using a NumPy ndarray object and transforming this to a DataFrame object. As we go, basic analytics and visualization capabilities are illustrated.
Data from the Web
pandas allows us to conveniently retrieve data from the Web (e.g., from Yahoo! Finance) and to analyze such data in many ways.
Using data from CSV files
Comma-separated value (CSV) files represent a global standard for the exchange of financial time series data; pandas makes reading data from such files an efficient task. Using data for two indices, we implement a regression analysis with pandas.
High-frequency data
In recent years, available financial data has increasingly shifted from daily quotes to tick data. Daily tick data volumes for a stock price regularly surpass those volumes of daily data collected over 30 years.[24]
All financial time series data contains date and/or time information, by definition. Appendix C provides an overview of how to handle such data with Python, NumPy, and pandas as well as of how to convert typical date-time object types into each other.
pandas Basics
In a sense, pandas is built “on top” of NumPy. So, for example, NumPy universal functions will generally work on pandas objects as well. We therefore import both to begin with:
In [1]: import numpy as np
        import pandas as pd
First Steps with DataFrame Class
On a rather fundamental level, the DataFrame class is designed to manage indexed and labeled data, not too different from a SQL database table or a worksheet in a spreadsheet application. Consider the following creation of a DataFrame object:
In [2]: df = pd.DataFrame([10, 20, 30, 40], columns=['numbers'],
                          index=['a', 'b', 'c', 'd'])
        df
Out[2]:    numbers
        a       10
        b       20
        c       30
        d       40
This simple example already shows some major features of the DataFrame class when it comes to storing data:
Data
Data itself can be provided in different shapes and types (list, tuple, ndarray, and dict objects are candidates).
Labels
Data is organized in columns, which can have custom names.
Index
There is an index that can take on different formats (e.g., numbers, strings, time information).
Working with such a DataFrame object is in general pretty convenient and efficient, e.g., compared to regular ndarray objects, which are more specialized and more restricted when you want to do something like enlarge an existing object. The following are simple examples showing how typical operations on a DataFrame object work:
In [3]: df.index # the index values
Out[3]: Index([u'a', u'b', u'c', u'd'], dtype='object')
In [4]: df.columns # the column names
Out[4]: Index([u'numbers'], dtype='object')
In [5]: df.ix['c']  # selection via index
Out[5]: numbers    30
        Name: c, dtype: int64
In [6]: df.ix[['a', 'd']]  # selection of multiple indices
Out[6]:    numbers
        a       10
        d       40
In [7]: df.ix[df.index[1:3]]  # selection via Index object
Out[7]:    numbers
        b       20
        c       30
In [8]: df.sum()  # sum per column
Out[8]: numbers    100
        dtype: int64
In [9]: df.apply(lambda x: x ** 2)  # square of every element
Out[9]:    numbers
        a      100
        b      400
        c      900
        d     1600
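The .ix indexer used in these listings reflects the pandas version current when this text was written; later pandas releases removed it. As a minimal sketch, assuming a recent pandas installation, the same selections can be written with the label-based .loc indexer (the variable names are illustrative only):

```python
import pandas as pd

# Same toy data as in the listing above.
df = pd.DataFrame([10, 20, 30, 40], columns=['numbers'],
                  index=['a', 'b', 'c', 'd'])

row_c = df.loc['c']               # single label -> Series for that row
rows_ad = df.loc[['a', 'd']]      # list of labels -> DataFrame with two rows
rows_bc = df.loc[df.index[1:3]]   # selection via an Index object
col_sum = df.sum()                # sum per column
squared = df.apply(lambda x: x ** 2)  # square of every element
```

Label-based selection with .loc makes explicit that the index labels, not integer positions, drive the lookup.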
In general, you can implement the same vectorized operations on a DataFrame object as on a NumPy ndarray object:
In [10]: df ** 2  # again square, this time NumPy-like
Out[10]:    numbers
         a      100
         b      400
         c      900
         d     1600
Enlarging the DataFrame object in both dimensions is possible:
In [11]: df['floats'] = (1.5, 2.5, 3.5, 4.5)
           # new column is generated
         df
Out[11]:    numbers  floats
         a       10     1.5
         b       20     2.5
         c       30     3.5
         d       40     4.5
In [12]: df['floats']  # selection of column
Out[12]: a    1.5
         b    2.5
         c    3.5
         d    4.5
         Name: floats, dtype: float64
A whole DataFrame object can also be taken to define a new column. In such a case, indices are aligned automatically:
In [13]: df['names'] = pd.DataFrame(['Yves', 'Guido', 'Felix', 'Francesc'],
                                    index=['d', 'a', 'b', 'c'])
         df
Out[13]:    numbers  floats     names
         a       10     1.5     Guido
         b       20     2.5     Felix
         c       30     3.5  Francesc
         d       40     4.5      Yves
Appending data works similarly. However, in the following example we see a side effect that is usually to be avoided: the index gets replaced by a simple numbered index:
In [14]: df.append({'numbers': 100, 'floats': 5.75, 'names': 'Henry'},
                   ignore_index=True)
           # temporary object; df not changed
Out[14]:    numbers  floats     names
         0       10    1.50     Guido
         1       20    2.50     Felix
         2       30    3.50  Francesc
         3       40    4.50      Yves
         4      100    5.75     Henry
It is often better to append a DataFrame object, providing the appropriate index information. This preserves the index:
In [15]: df = df.append(pd.DataFrame({'numbers': 100, 'floats': 5.75,
                                      'names': 'Henry'}, index=['z',]))
         df
Out[15]:    floats     names  numbers
         a    1.50     Guido       10
         b    2.50     Felix       20
         c    3.50  Francesc       30
         d    4.50      Yves       40
         z    5.75     Henry      100
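The append method shown above belongs to the pandas version of this text; it was removed in pandas 2.0. As a hedged sketch, assuming a recent pandas installation, the index-preserving variant can be expressed with pd.concat (the small two-column frame here is an illustrative stand-in for df):

```python
import pandas as pd

# Illustrative stand-in for the DataFrame built in the listings above.
df = pd.DataFrame({'numbers': [10, 20], 'floats': [1.5, 2.5]},
                  index=['a', 'b'])

# New row as its own DataFrame, carrying the desired index label 'z'.
new_row = pd.DataFrame({'numbers': 100, 'floats': 5.75}, index=['z'])

# pd.concat stacks the frames row-wise and keeps both indices.
enlarged = pd.concat([df, new_row])
```

As with the original example, providing the index on the new row is what keeps the labels intact.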
One of the strengths of pandas is working with missing data. To this end, consider the following code that adds a new column, but with a slightly different index. We use the rather flexible join method here:
In [16]: df.join(pd.DataFrame([1, 4, 9, 16, 25],
                              index=['a', 'b', 'c', 'd', 'y'],
                              columns=['squares',]))
           # temporary object
Out[16]:    floats     names  numbers  squares
         a    1.50     Guido       10        1
         b    2.50     Felix       20        4
         c    3.50  Francesc       30        9
         d    4.50      Yves       40       16
         z    5.75     Henry      100      NaN
What you can see here is that pandas by default accepts only values for those indices that already exist. We lose the value for the index y and have a NaN value (i.e., “Not a Number”) at index position z. To preserve both indices, we can provide an additional parameter to tell pandas how to join. In our case, we use how="outer" to use the union of all values from both indices:
In [17]: df = df.join(pd.DataFrame([1, 4, 9, 16, 25],
                                   index=['a', 'b', 'c', 'd', 'y'],
                                   columns=['squares',]),
                      how='outer')
         df
Out[17]:    floats     names  numbers  squares
         a    1.50     Guido       10        1
         b    2.50     Felix       20        4
         c    3.50  Francesc       30        9
         d    4.50      Yves       40       16
         y     NaN       NaN      NaN       25
         z    5.75     Henry      100      NaN
Indeed, the index is now the union of the two original indices. All missing data points, given the new enlarged index, are replaced by NaN values. Other options for the join operation include inner for the intersection of the index values, left (default) for the index values of the object on which the method is called, and right for the index values of the object to be joined.
Although there are missing values, the majority of method calls will still work. For example:
In [18]: df[['numbers', 'squares']].mean()
           # column-wise mean
Out[18]: numbers    40
         squares    11
         dtype: float64
In [19]: df[['numbers', 'squares']].std()
           # column-wise standard deviation
Out[19]: numbers    35.355339
         squares     9.669540
         dtype: float64
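To make the four join options concrete, here is a small sketch on two hypothetical frames (left and right are illustrative names, not objects from the text). It collects the resulting index for each value of how and shows that mean skips the NaN introduced by the outer join:

```python
import pandas as pd

# Two toy frames whose indices overlap only partly.
left = pd.DataFrame({'numbers': [10, 20, 30]}, index=['a', 'b', 'z'])
right = pd.DataFrame({'squares': [1, 4, 25]}, index=['a', 'b', 'y'])

# Resulting index labels for each join type.
index_by_how = {how: list(left.join(right, how=how).index)
                for how in ('left', 'right', 'inner', 'outer')}

# With the outer join, 'z' has no squares value and 'y' no numbers value.
outer = left.join(right, how='outer')
mean_squares = outer['squares'].mean()  # NaN entries are left out
```

Here left keeps a, b, z; right keeps a, b, y; inner keeps only a and b; outer keeps all four labels.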
Second Steps with DataFrame Class
From now on, we will work with numerical data. We will add further features as we go, like a DatetimeIndex to manage time series data. To have a dummy data set to work with, generate a numpy.ndarray with, for example, nine rows and four columns of pseudorandom, standard normally distributed numbers:
In [20]: a = np.random.standard_normal((9, 4))
         a.round(6)
Out[20]: array([[-0.737304,  1.065173,  0.073406,  1.301174],
                [-0.788818, -0.985819,  0.403796, -1.753784],
                [-0.155881, -1.752672,  1.037444, -0.400793],
                [-0.777546,  1.730278,  0.417114,  0.184079],
                [-1.76366 , -0.375469,  0.098678, -1.553824],
                [-1.134258,  1.401821,  1.227124,  0.979389],
                [ 0.458838, -0.143187,  1.565701, -2.085863],
                [-0.103058, -0.36617 , -0.478036, -0.03281 ],
                [ 1.040318, -0.128799,  0.786187,  0.414084]])
Although you can construct DataFrame objects more directly (as we have seen before), using an ndarray object is generally a good choice since pandas will retain the basic structure and will “only” add meta-information (e.g., index values). It also represents a typical use case for financial applications and scientific research in general. For example:
In [21]: df = pd.DataFrame(a)
         df
Out[21]:           0         1         2         3
         0 -0.737304  1.065173  0.073406  1.301174
         1 -0.788818 -0.985819  0.403796 -1.753784
         2 -0.155881 -1.752672  1.037444 -0.400793
         3 -0.777546  1.730278  0.417114  0.184079
         4 -1.763660 -0.375469  0.098678 -1.553824
         5 -1.134258  1.401821  1.227124  0.979389
         6  0.458838 -0.143187  1.565701 -2.085863
         7 -0.103058 -0.366170 -0.478036 -0.032810
         8  1.040318 -0.128799  0.786187  0.414084
Table 6-1 lists the parameters that the DataFrame function takes. In the table, “array-like” means a data structure similar to an ndarray object (a list, for example). Index is an instance of the pandas Index class.
Table 6-1. Parameters of DataFrame function
Parameter   Format                   Description
data        ndarray/dict/DataFrame   Data for DataFrame; dict can contain Series, ndarrays, lists
index       Index/array-like         Index to use; defaults to range(n)
columns     Index/array-like         Column headers to use; defaults to range(n)
dtype       dtype, default None      Data type to use/force; otherwise, it is inferred
copy        bool, default None       Copy data from inputs
As with structured arrays, and as we have already seen, DataFrame objects have column names that can be defined directly by assigning a list with the right number of elements. This illustrates that you can define/change the attributes of the DataFrame object as you go:
In [22]: df.columns = ['No1', 'No2', 'No3', 'No4']
         df
Out[22]:         No1       No2       No3       No4
         0 -0.737304  1.065173  0.073406  1.301174
         1 -0.788818 -0.985819  0.403796 -1.753784
         2 -0.155881 -1.752672  1.037444 -0.400793
         3 -0.777546  1.730278  0.417114  0.184079
         4 -1.763660 -0.375469  0.098678 -1.553824
         5 -1.134258  1.401821  1.227124  0.979389
         6  0.458838 -0.143187  1.565701 -2.085863
         7 -0.103058 -0.366170 -0.478036 -0.032810
         8  1.040318 -0.128799  0.786187  0.414084
The column names provide an efficient mechanism to access data in the DataFrame object, again similar to structured arrays:
In [23]: df['No2'][3]  # value in column No2 at index position 3
Out[23]: 1.7302783624820191
To work with financial time series data efficiently, you must be able to handle time indices well. This can also be considered a major strength of pandas. For example, assume that our nine data entries in the four columns correspond to month-end data, beginning in January 2015. A DatetimeIndex object is then generated with date_range as follows:
In [24]: dates = pd.date_range('2015-1-1', periods=9, freq='M')
         dates
Out[24]: <class 'pandas.tseries.index.DatetimeIndex'>
         [2015-01-31, ..., 2015-09-30]
         Length: 9, Freq: M, Timezone: None
Table 6-2 lists the parameters that the date_range function takes.
Table 6-2. Parameters of date_range function
Parameter   Format               Description
start       string/datetime      Left bound for generating dates
end         string/datetime      Right bound for generating dates
periods     integer/None         Number of periods (if start or end is None)
freq        string/DateOffset    Frequency string, e.g., 5D for 5 days
tz          string/None          Time zone name for localized index
normalize   bool, default None   Normalize start and end to midnight
name        string, default None Name of resulting index
So far, we have only encountered indices composed of string and int objects. For time series data, however, a DatetimeIndex object generated with the date_range function is of course what is needed.
As with the columns, we assign the newly generated DatetimeIndex as the new Index object to the DataFrame object:
In [25]: df.index = dates
         df
Out[25]:                  No1       No2       No3       No4
         2015-01-31 -0.737304  1.065173  0.073406  1.301174
         2015-02-28 -0.788818 -0.985819  0.403796 -1.753784
         2015-03-31 -0.155881 -1.752672  1.037444 -0.400793
         2015-04-30 -0.777546  1.730278  0.417114  0.184079
         2015-05-31 -1.763660 -0.375469  0.098678 -1.553824
         2015-06-30 -1.134258  1.401821  1.227124  0.979389
         2015-07-31  0.458838 -0.143187  1.565701 -2.085863
         2015-08-31 -0.103058 -0.366170 -0.478036 -0.032810
         2015-09-30  1.040318 -0.128799  0.786187  0.414084
When it comes to the generation of DatetimeIndex objects with the help of the date_range function, there are a number of choices for the frequency parameter freq. Table 6-3 lists all the options.
Table 6-3. Frequency parameter values for date_range function
Alias   Description
B       Business day frequency
C       Custom business day frequency (experimental)
D       Calendar day frequency
W       Weekly frequency
M       Month end frequency
BM      Business month end frequency
MS      Month start frequency
BMS     Business month start frequency
Q       Quarter end frequency
BQ      Business quarter end frequency
QS      Quarter start frequency
BQS     Business quarter start frequency
A       Year end frequency
BA      Business year end frequency
AS      Year start frequency
BAS     Business year start frequency
H       Hourly frequency
T       Minutely frequency
S       Secondly frequency
L       Milliseconds
U       Microseconds
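As a small illustration of the freq parameter, the following sketch generates a month-end and a business-day index. It passes a DateOffset object (pd.offsets.MonthEnd) instead of the alias string for the month-end case; this is an assumption made here because the spelling of some alias strings has changed across pandas versions, while the 'B' alias and the offset classes are long-standing:

```python
import pandas as pd

# Nine month-end dates starting from January 2015, via a DateOffset object.
month_ends = pd.date_range('2015-1-1', periods=9,
                           freq=pd.offsets.MonthEnd())

# Five business days; 2015-01-01 is a Thursday, so the weekend is skipped.
business_days = pd.date_range('2015-1-1', periods=5, freq='B')
```

The first index runs from 2015-01-31 to 2015-09-30, matching the dates object used above; the second jumps from Friday, January 2 to Monday, January 5.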
In this subsection, we start with a NumPy ndarray object and end with an enriched version in the form of a pandas DataFrame object. But does this procedure work the other way around as well? Yes, it does:
In [26]: np.array(df).round(6)
Out[26]: array([[-0.737304,  1.065173,  0.073406,  1.301174],
                [-0.788818, -0.985819,  0.403796, -1.753784],
                [-0.155881, -1.752672,  1.037444, -0.400793],
                [-0.777546,  1.730278,  0.417114,  0.184079],
                [-1.76366 , -0.375469,  0.098678, -1.553824],
                [-1.134258,  1.401821,  1.227124,  0.979389],
                [ 0.458838, -0.143187,  1.565701, -2.085863],
                [-0.103058, -0.36617 , -0.478036, -0.03281 ],
                [ 1.040318, -0.128799,  0.786187,  0.414084]])
ARRAYS AND DATAFRAMES
You can generate a DataFrame object in general from an ndarray object. But you can also easily generate an ndarray object out of a DataFrame by using the function array of NumPy.
Basic Analytics
Like NumPy arrays, the pandas DataFrame class has built in a multitude of convenience methods. For example, you can easily get the column-wise sums, means, and cumulative sums as follows:
In [27]: df.sum()
Out[27]: No1   -3.961370
         No2    0.445156
         No3    5.131414
         No4   -2.948346
         dtype: float64
In [28]: df.mean()
Out[28]: No1   -0.440152
         No2    0.049462
         No3    0.570157
         No4   -0.327594
         dtype: float64
In [29]: df.cumsum()
Out[29]:                  No1       No2       No3       No4
         2015-01-31 -0.737304  1.065173  0.073406  1.301174
         2015-02-28 -1.526122  0.079354  0.477201 -0.452609
         2015-03-31 -1.682003 -1.673318  1.514645 -0.853403
         2015-04-30 -2.459549  0.056960  1.931759 -0.669323
         2015-05-31 -4.223209 -0.318508  2.030438 -2.223147
         2015-06-30 -5.357467  1.083313  3.257562 -1.243758
         2015-07-31 -4.898629  0.940126  4.823263 -3.329621
         2015-08-31 -5.001687  0.573956  4.345227 -3.362430
         2015-09-30 -3.961370  0.445156  5.131414 -2.948346
There is also a shortcut to a number of often-used statistics for numerical data sets, the describe method:
In [30]: df.describe()
Out[30]:             No1       No2       No3       No4
         count  9.000000  9.000000  9.000000  9.000000
         mean  -0.440152  0.049462  0.570157 -0.327594
         std    0.847907  1.141676  0.642904  1.219345
         min   -1.763660 -1.752672 -0.478036 -2.085863
         25%   -0.788818 -0.375469  0.098678 -1.553824
         50%   -0.737304 -0.143187  0.417114 -0.032810
         75%   -0.103058  1.065173  1.037444  0.414084
         max    1.040318  1.730278  1.565701  1.301174
You can also apply the majority of NumPy universal functions to DataFrame objects:
In [31]: np.sqrt(df)
Out[31]:                  No1       No2       No3       No4
         2015-01-31       NaN  1.032072  0.270935  1.140690
         2015-02-28       NaN       NaN  0.635449       NaN
         2015-03-31       NaN       NaN  1.018550       NaN
         2015-04-30       NaN  1.315400  0.645844  0.429045
         2015-05-31       NaN       NaN  0.314131       NaN
         2015-06-30       NaN  1.183985  1.107756  0.989641
         2015-07-31  0.677376       NaN  1.251280       NaN
         2015-08-31       NaN       NaN       NaN       NaN
         2015-09-30  1.019960       NaN  0.886672  0.643494
NUMPY UNIVERSAL FUNCTIONS
In general, you can apply NumPy universal functions to pandas DataFrame objects whenever they could be applied to an ndarray object containing the same data.
pandas is quite error tolerant, in the sense that it captures errors and just puts a NaN value where the respective mathematical operation fails. Not only this, but as briefly shown already, you can also work with such incomplete data sets as if they were complete in a number of cases:
In [32]: np.sqrt(df).sum()
Out[32]: No1    1.697335
         No2    3.531458
         No3    6.130617
         No4    3.202870
         dtype: float64
In such cases, pandas just leaves out the NaN values and only works with the other available values. Plotting of data is also only one line of code away in general (cf. Figure 6-1):
In [33]: %matplotlib inline
         df.cumsum().plot(lw=2.0)
Figure 6-1. Line plot of a DataFrame object
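Returning briefly to the NaN handling described before the plot: the following minimal sketch, on a hypothetical one-column frame, shows that sum leaves out NaN values by default and that this behavior can be switched off with the skipna parameter. The np.errstate context only silences the NumPy warning for the invalid square root:

```python
import numpy as np
import pandas as pd

# Toy data with one negative entry, for which the square root is undefined.
data = pd.DataFrame({'x': [4.0, -1.0, 9.0]})

with np.errstate(invalid='ignore'):
    roots = np.sqrt(data)          # the -1.0 entry becomes NaN

total = roots['x'].sum()                     # NaN left out: 2.0 + 3.0
total_strict = roots['x'].sum(skipna=False)  # NaN propagates
```

Whether leaving out missing values silently is appropriate depends on the analysis; skipna=False is one way to make incomplete data visible in aggregates.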
Basically, pandas provides a wrapper around matplotlib (cf. Chapter 5), specifically designed for DataFrame objects. Table 6-4 lists the parameters that the plot method takes.
Table 6-4. Parameters of plot method
Parameter      Format                                   Description
x              Label/position, default None             Only used when column values are x-ticks
y              Label/position, default None             Only used when column values are y-ticks
subplots       Boolean, default False                   Plot columns in subplots
sharex         Boolean, default True                    Sharing of the x-axis
sharey         Boolean, default False                   Sharing of the y-axis
use_index      Boolean, default True                    Use of DataFrame.index as x-ticks
stacked        Boolean, default False                   Stack (only for bar plots)
sort_columns   Boolean, default False                   Sort columns alphabetically before plotting
title          String, default None                     Title for the plot
grid           Boolean, default False                   Horizontal and vertical grid lines
legend         Boolean, default True                    Legend of labels
ax             matplotlib axis object                   matplotlib axis object to use for plotting
style          String or list/dictionary                Line plotting style (for each column)
kind           "line"/"bar"/"barh"/"kde"/"density"      Type of plot
logx           Boolean, default False                   Logarithmic scaling of x-axis
logy           Boolean, default False                   Logarithmic scaling of y-axis
xticks         Sequence, default Index                  x-ticks for the plot
yticks         Sequence, default Values                 y-ticks for the plot
xlim           2-tuple, list                            Boundaries for x-axis
ylim           2-tuple, list                            Boundaries for y-axis
rot            Integer, default None                    Rotation of x-ticks
secondary_y    Boolean/sequence, default False          Secondary y-axis
mark_right     Boolean, default True                    Automatic labeling of secondary axis
colormap       String/colormap object, default None     Colormap to use for plotting
kwds           Keywords                                 Options to pass to matplotlib
Series Class
So far, we have worked mainly with the pandas DataFrame class:
In [34]: type(df)
Out[34]: pandas.core.frame.DataFrame
But there is also a dedicated Series class. We get a Series object, for example, when selecting a single column from our DataFrame object:
In [35]: df['No1']
Out[35]: 2015-01-31   -0.737304
         2015-02-28   -0.788818
         2015-03-31   -0.155881
         2015-04-30   -0.777546
         2015-05-31   -1.763660
         2015-06-30   -1.134258
         2015-07-31    0.458838
         2015-08-31   -0.103058
         2015-09-30    1.040318
         Freq: M, Name: No1, dtype: float64
In [36]: type(df['No1'])
Out[36]: pandas.core.series.Series
The main DataFrame methods are available for Series objects as well, and we can, for instance, plot the results as before (cf. Figure 6-2):
In [37]: import matplotlib.pyplot as plt
         df['No1'].cumsum().plot(style='r', lw=2.)
         plt.xlabel('date')
         plt.ylabel('value')

Figure 6-2. Line plot of a Series object
GroupBy Operations
pandas has powerful and flexible grouping capabilities. They work similarly to grouping in SQL as well as pivot tables in Microsoft Excel. To have something to group by, we add a column indicating the quarter the respective data of the index belongs to:
In [38]: df['Quarter'] = ['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2',
                          'Q3', 'Q3', 'Q3']
         df
Out[38]:                  No1       No2       No3       No4 Quarter
         2015-01-31 -0.737304  1.065173  0.073406  1.301174      Q1
         2015-02-28 -0.788818 -0.985819  0.403796 -1.753784      Q1
         2015-03-31 -0.155881 -1.752672  1.037444 -0.400793      Q1
         2015-04-30 -0.777546  1.730278  0.417114  0.184079      Q2
         2015-05-31 -1.763660 -0.375469  0.098678 -1.553824      Q2
         2015-06-30 -1.134258  1.401821  1.227124  0.979389      Q2
         2015-07-31  0.458838 -0.143187  1.565701 -2.085863      Q3
         2015-08-31 -0.103058 -0.366170 -0.478036 -0.032810      Q3
         2015-09-30  1.040318 -0.128799  0.786187  0.414084      Q3
Now, we can group by the “Quarter” column and can output statistics for the single groups:
In [39]: groups = df.groupby('Quarter')
For example, we can easily get the mean, max, and size of every group bucket as follows:
In [40]: groups.mean()
Out[40]:               No1       No2       No3       No4
         Quarter
         Q1      -0.560668 -0.557773  0.504882 -0.284468
         Q2      -1.225155  0.918877  0.580972 -0.130118
         Q3       0.465366 -0.212719  0.624617 -0.568196
In [41]: groups.max()
Out[41]:               No1       No2       No3       No4
         Quarter
         Q1      -0.155881  1.065173  1.037444  1.301174
         Q2      -0.777546  1.730278  1.227124  0.979389
         Q3       1.040318 -0.128799  1.565701  0.414084
In [42]: groups.size()
Out[42]: Quarter
         Q1         3
         Q2         3
         Q3         3
         dtype: int64
Grouping can also be done with multiple columns. To this end, we add another column, indicating whether the month of the index date is odd or even:
In [43]: df['Odd_Even'] = ['Odd', 'Even', 'Odd', 'Even', 'Odd',
                           'Even', 'Odd', 'Even', 'Odd']
This additional information can now be used for a grouping based on two columns simultaneously:
In [44]: groups = df.groupby(['Quarter', 'Odd_Even'])
In [45]: groups.size()
Out[45]: Quarter  Odd_Even
         Q1       Even        1
                  Odd         2
         Q2       Even        2
                  Odd         1
         Q3       Even        1
                  Odd         2
         dtype: int64
In [46]: groups.mean()
Out[46]:                        No1       No2       No3       No4
         Quarter Odd_Even
         Q1      Even     -0.788818 -0.985819  0.403796 -1.753784
                 Odd      -0.446592 -0.343749  0.555425  0.450190
         Q2      Even     -0.955902  1.566050  0.822119  0.581734
                 Odd      -1.763660 -0.375469  0.098678 -1.553824
         Q3      Even     -0.103058 -0.366170 -0.478036 -0.032810
                 Odd       0.749578 -0.135993  1.175944 -0.835890
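The grouping steps above can be reproduced on a small deterministic data set, which makes the results easy to check by hand. This is only a sketch; the frame and the column name value are illustrative and not part of the chapter's data:

```python
import pandas as pd

# Six observations with two grouping columns.
data = pd.DataFrame({
    'value': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    'Quarter': ['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
    'Odd_Even': ['Odd', 'Even', 'Odd', 'Even', 'Odd', 'Even'],
})

# Grouping by a single column: one mean per quarter.
by_quarter = data.groupby('Quarter')['value'].mean()

# Grouping by two columns: mean and group size per (quarter, parity) pair.
by_both = data.groupby(['Quarter', 'Odd_Even'])['value'].agg(['mean', 'size'])
```

The result of the two-column grouping carries a hierarchical index, so a single group is addressed with a tuple such as ('Q1', 'Odd').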
This concludes the introduction into pandas and the use of DataFrame objects. Subsequent sections apply this tool set to real-world financial data.
Financial Data
The Web today provides a wealth of financial information for free. Web giants such as Google or Yahoo! have comprehensive financial data offerings. Although the quality of the data sometimes does not fulfill professional requirements, for example with regard to the handling of stock splits, such data is well suited to illustrate the “financial power” of pandas.
To this end, we will use the pandas built-in function DataReader to retrieve stock price data from Yahoo! Finance, analyze the data, and generate different plots of it.[25] The required function is stored in a submodule of pandas:
In [47]: import pandas.io.data as web
At the time of this writing, pandas supports the following data sources:
Yahoo! Finance (yahoo)
Google Finance (google)
St. Louis FED (fred)
Kenneth French’s data library (famafrench)
World Bank (via pandas.io.wb)
We can retrieve stock price information for the German DAX index, for example, from Yahoo! Finance with a single line of code:
In [48]: DAX = web.DataReader(name='^GDAXI', data_source='yahoo',
                              start='2000-1-1')
         DAX.info()
Out[48]: <class 'pandas.core.frame.DataFrame'>
         DatetimeIndex: 3760 entries, 2000-01-03 00:00:00 to 2014-09-26 00:00:00
         Data columns (total 6 columns):
         Open         3760 non-null float64
         High         3760 non-null float64
         Low          3760 non-null float64
         Close        3760 non-null float64
         Volume       3760 non-null int64
         Adj Close    3760 non-null float64
         dtypes: float64(5), int64(1)
Table 6-5 presents the parameters that the DataReader function takes.
Table 6-5. Parameters of DataReader function
Parameter     Format                 Description
name          String                 Name of data set; generally, the ticker symbol
data_source   E.g., “yahoo”          Data source
start         String/datetime/None   Left boundary of range (default "2010/1/1")
end           String/datetime/None   Right boundary of range (default today)
The tail method provides us with the five last rows of the data set:
In [49]: DAX.tail()
Out[49]:                Open     High      Low    Close    Volume  Adj Close
         Date
         2014-09-22  9748.53  9812.77  9735.69  9749.54  73981000    9749.54
         2014-09-23  9713.40  9719.66  9589.03  9595.03  88196000    9595.03
         2014-09-24  9598.77  9669.45  9534.77  9661.97  85850600    9661.97
         2014-09-25  9644.36  9718.11  9482.54  9510.01  97697000    9510.01
         2014-09-26  9500.55  9545.34  9454.88  9490.55  83499600    9490.55
To get a better overview of the index’s history, a plot is again generated easily with the plot method (cf. Figure 6-3):
In [50]: DAX['Close'].plot(figsize=(8, 5))

Figure 6-3. Historical DAX index levels
Retrieving data and visualizing it is one thing. Implementing more complex analytics tasks is another. Like NumPy ndarrays, pandas allows for vectorized mathematical operations on whole, and even complex, DataFrame objects. Take the log returns based on the daily closing prices as an example. Adding a column with the respective information could be achieved with the following code, which first generates a new, empty column and then iterates over all indexes to calculate the single log return values step by step:
In [51]: %%time
         DAX['Ret_Loop'] = 0.0
         for i in range(1, len(DAX)):
             DAX['Ret_Loop'][i] = np.log(DAX['Close'][i] /
                                         DAX['Close'][i - 1])
Out[51]: CPU times: user 452 ms, sys: 12 ms, total: 464 ms
         Wall time: 449 ms
In [52]: DAX[['Close', 'Ret_Loop']].tail()
Out[52]:               Close  Ret_Loop
         Date
         2014-09-22  9749.54 -0.005087
         2014-09-23  9595.03 -0.015975
         2014-09-24  9661.97  0.006952
         2014-09-25  9510.01 -0.015853
         2014-09-26  9490.55 -0.002048
Alternatively, you can use vectorized code to reach the same result without looping. To this end, the shift method is useful; it shifts Series or whole DataFrame objects relative to their index, forward as well as backward. To accomplish our goal, we need to shift the Close column by one day, or more generally, one index position:
In [53]: %time DAX['Return'] = np.log(DAX['Close'] /
                                      DAX['Close'].shift(1))
Out[53]: CPU times: user 4 ms, sys: 0 ns, total: 4 ms
         Wall time: 1.52 ms
In [54]: DAX[['Close', 'Ret_Loop', 'Return']].tail()
Out[54]:               Close  Ret_Loop    Return
         Date
         2014-09-22  9749.54 -0.005087 -0.005087
         2014-09-23  9595.03 -0.015975 -0.015975
         2014-09-24  9661.97  0.006952  0.006952
         2014-09-25  9510.01 -0.015853 -0.015853
         2014-09-26  9490.55 -0.002048 -0.002048
This not only provides the same results with more compact and readable code, but also is the much faster alternative.
VECTORIZATION WITH DATAFRAMES
In general, you can use the same vectorization approaches with pandas DataFrame objects as you would whenever you could do such an operation with two NumPy ndarray objects containing the same data.
One column with the log return data is enough for our purposes, so we can delete the other one:
In [55]: del DAX['Ret_Loop']
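The equivalence of the loop and the shift-based calculation can be checked on a tiny synthetic price series, without any download. The prices below are made up for illustration; only the technique mirrors the listings above:

```python
import numpy as np
import pandas as pd

# Three synthetic closing prices on consecutive business days.
close = pd.Series([100.0, 110.0, 99.0],
                  index=pd.date_range('2014-09-22', periods=3, freq='B'))

# Vectorized: divide each price by the previous one, then take logs.
log_ret = np.log(close / close.shift(1))

# Loop-based reference calculation (first entry has no predecessor).
loop_ret = [np.nan] + [np.log(close.iloc[i] / close.iloc[i - 1])
                       for i in range(1, len(close))]
```

The first entry of log_ret is NaN because shift(1) leaves no previous price for the first date; the loop version in the text writes 0.0 there instead, which is the only place where the two columns can differ.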
Now let us have a look at the newly generated return data. Figure 6-4 illustrates two stylized facts of equity returns:
Volatility clustering
Volatility is not constant over time; there are periods of high volatility (both highly positive and negative returns) as well as periods of low volatility.
Leverage effect
Generally, volatility and stock market returns are negatively correlated; when markets come down volatility rises, and vice versa.
Here is the code that generates this plot:
In [56]: DAX[['Close', 'Return']].plot(subplots=True, style='b',
                                       figsize=(8, 5))

Figure 6-4. The DAX index and daily log returns
While volatility is something of particular importance for options traders, (technical) stock traders might be more interested in moving averages, or so-called trends. A moving average is easily calculated with the rolling_mean function of pandas (there are other “rolling” functions as well, like rolling_max, rolling_min, and rolling_corr):
In [57]: DAX['42d'] = pd.rolling_mean(DAX['Close'], window=42)
         DAX['252d'] = pd.rolling_mean(DAX['Close'], window=252)
In [58]: DAX[['Close', '42d', '252d']].tail()
Out[58]:               Close          42d         252d
         Date
         2014-09-22  9749.54  9464.947143  9429.476468
         2014-09-23  9595.03  9463.780952  9433.168651
         2014-09-24  9661.97  9465.300000  9437.122381
         2014-09-25  9510.01  9461.880476  9440.479167
         2014-09-26  9490.55  9459.425000  9443.769008
A typical stock price chart with the two trends included then looks like Figure 6-5:
In [59]: DAX[['Close', '42d', '252d']].plot(figsize=(8, 5))
Figure 6-5. The DAX index and moving averages
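The module-level rolling_mean and rolling_std functions belong to the pandas version of this text and are no longer available in later releases. As a hedged sketch, assuming a recent pandas installation, the same moving statistics can be obtained with the rolling method; the short series and the window of 3 are illustrative only:

```python
import numpy as np
import pandas as pd

# Short synthetic series standing in for a column of closing prices.
close = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Moving average over a window of 3 observations.
ma3 = close.rolling(window=3).mean()

# Moving (sample) standard deviation over the same window.
sd3 = close.rolling(window=3).std()
```

The first window - 1 entries are NaN because a full window is not yet available; with window=42 or window=252, as in the text, the same holds for the first 41 or 251 rows.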
Returning to the more options trader-like perspective, the moving historical standard deviation of the log returns (i.e., the moving historical volatility) might be more of interest:
In [60]: import math
         DAX['Mov_Vol'] = pd.rolling_std(DAX['Return'],
                                         window=252) * math.sqrt(252)
           # moving annual volatility
Figure 6-6 further supports the hypothesis of the leverage effect by clearly showing that the historical moving volatility tends to increase when markets come down, and to decrease when they rise:
In [61]: DAX[['Close', 'Mov_Vol', 'Return']].plot(subplots=True, style='b',
                                                  figsize=(8, 7))

Figure 6-6. The DAX index and moving, annualized volatility
Regression Analysis
The previous section introduces the leverage effect as a stylized fact of equity market returns. So far, the support that we provided is based on the inspection of financial data plots only. Using pandas, we can also base such analysis on a more formal, statistical ground. The simplest approach is to use (linear) ordinary least-squares regression (OLS).
In what follows, the analysis uses two different data sets available on the Web:
EURO STOXX 50
Historical daily closing values of the EURO STOXX 50 index, composed of European blue-chip stocks
VSTOXX
Historical daily closing data for the VSTOXX volatility index, calculated on the basis of volatilities implied by options on the EURO STOXX 50 index
It is noteworthy that we now (indirectly) use implied volatilities, which relate to expectations with regard to the future volatility development, while the previous DAX analysis used historical volatility measures. For details, see the “VSTOXX Advanced Services” tutorial pages provided by Eurex.
We begin with a few imports:
In [62]: import pandas as pd
         from urllib import urlretrieve
For the analysis, we retrieve files from the Web and save them in a folder called data. If there is no such folder already, you might want to create one first via mkdir data. We proceed by retrieving the most current available information with regard to both indices:
In [63]: es_url = 'http://www.stoxx.com/download/historical_values/hbrbcpe.txt'
         vs_url = 'http://www.stoxx.com/download/historical_values/h_vstoxx.txt'
         urlretrieve(es_url, './data/es.txt')
         urlretrieve(vs_url, './data/vs.txt')
         !ls -o ./data/*.txt
         # Windows: use dir
Out[63]: -rw------- 1 yhilpisch      0 Sep 28 11:14 ./data/es50.txt
         -rw------- 1 yhilpisch 641180 Sep 28 11:14 ./data/es.txt
         -rw------- 1 yhilpisch 330564 Sep 28 11:14 ./data/vs.txt
Reading the EURO STOXX 50 data directly with pandas is not the best route in this case. A little data cleaning beforehand will give a better data structure for the import. Two issues have to be addressed, relating to the header and the structure:
There are a couple of additional header lines that we do not need for the import.
From December 27, 2001 onwards, the data set “suddenly” has an additional semicolon at the end of each data row.
The following code reads the whole data set and removes all blanks:[26]
In [64]: lines = open('./data/es.txt', 'r').readlines()
         lines = [line.replace(' ', '') for line in lines]
With regard to the header, we can inspect it easily by printing the first couple of lines of the downloaded data set:
In [65]: lines[:6]
Out[65]: ['PriceIndices-EUROCurrency\n',
          'Date;Blue-Chip;Blue-Chip;Broad;Broad;ExUK;ExEuroZone;Blue-Chip;Broad\n',
          ';Europe;Euro-Zone;Europe;Euro-Zone;;;Nordic;Nordic\n',
          ';SX5P;SX5E;SXXP;SXXE;SXXF;SXXA;DK5F;DKXF\n',
          '31.12.1986;775.00;900.82;82.76;98.58;98.06;69.06;645.26;65.56\n',
          '01.01.1987;775.00;900.82;82.76;98.58;98.06;69.06;645.26;65.56\n']
The above-mentioned format change can be seen between lines 3,883 and 3,890 of the file. From December 27, 2001, there suddenly appears an additional semicolon at the end of each data row:
In [66]: for line in lines[3883:3890]:
             print line[41:],
Out[66]: 317.10;267.23;5268.36;363.19
         322.55;272.18;5360.52;370.94
         322.69;272.95;5360.52;370.94
         327.57;277.68;5479.59;378.69;
         329.94;278.87;5585.35;386.99;
         326.77;272.38;5522.25;380.09;
         332.62;277.08;5722.57;396.12;
To make the data set easier to import, we do the following:
1. Generate a new text file.
2. Delete unneeded header lines.
3. Write an appropriate new header line to the new file.
4. Add a helper column, DEL (to catch the trailing semicolons).
5. Write all data rows to the new file.
With these adjustments, the data set can be imported and the helper column deleted after the import. But first, the cleaning code:
In [67]: new_file = open('./data/es50.txt', 'w')
             # opens a new file
         new_file.writelines('date' + lines[3][:-1]
                             + ';DEL' + lines[3][-1])
             # writes the corrected fourth line (index 3) of the
             # original file as first line of new file
         new_file.writelines(lines[4:])
             # writes the remaining lines of the original file
         new_file.close()
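The five cleaning steps can also be sketched end to end on a few in-memory lines, which avoids the download and the intermediate file. The sample rows below are a shortened, hypothetical excerpt with only two index columns; the cleaning logic follows the listing above, and io.StringIO stands in for the new text file:

```python
import io
import pandas as pd

# Hypothetical, shortened excerpt mimicking the structure of the raw file.
raw = [
    'Price Indices - EURO Currency\n',
    'Date ;Blue-Chip;Broad\n',
    ' ;Europe ;Europe\n',
    ' ;SX5P ;SXXP\n',
    '31.12.1986;775.00;82.76\n',
    '02.01.1987;770.89;82.57;\n',   # trailing semicolon, as in later rows
]

# Remove all blanks, as in the text.
lines = [line.replace(' ', '') for line in raw]

# New header: 'date' plus the symbol line plus the helper column DEL.
header = 'date' + lines[3][:-1] + ';DEL' + lines[3][-1]
cleaned = header + ''.join(lines[4:])

# Import and drop the helper column afterwards.
es = pd.read_csv(io.StringIO(cleaned), index_col=0, parse_dates=True,
                 sep=';', dayfirst=True)
del es['DEL']
```

The helper column absorbs the empty field created by the trailing semicolon, so rows with and without it parse into the same columns.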
Let us see how the new header looks:
In [68]: new_lines = open('./data/es50.txt', 'r').readlines()
         new_lines[:5]
Out[68]: ['date;SX5P;SX5E;SXXP;SXXE;SXXF;SXXA;DK5F;DKXF;DEL\n',
          '31.12.1986;775.00;900.82;82.76;98.58;98.06;69.06;645.26;65.56\n',
          '01.01.1987;775.00;900.82;82.76;98.58;98.06;69.06;645.26;65.56\n',
          '02.01.1987;770.89;891.78;82.57;97.80;97.43;69.37;647.62;65.81\n',
          '05.01.1987;771.89;898.33;82.82;98.60;98.19;69.16;649.94;65.82\n']
It looks appropriate for the import with the read_csv function of pandas,
so we continue:
In [69]: es = pd.read_csv('./data/es50.txt', index_col=0,
parse_dates=True, sep=';', dayfirst=True)
In [70]: np.round(es.tail())
Out[70]:
            SX5P  SX5E  SXXP  SXXE  SXXF  SXXA  DK5F  DKXF  DEL
date
2014-09-22  3096  3257   347   326   403   357  9703   565  NaN
2014-09-23  3058  3206   342   321   398   353  9602   558  NaN
2014-09-24  3086  3244   344   323   401   355  9629   560  NaN
2014-09-25  3059  3202   341   320   397   353  9538   556  NaN
2014-09-26  3064  3220   342   321   398   353  9559   557  NaN
The helper column has fulfilled its purpose and can now be deleted:
In [71]: del es['DEL']
         es.info()
Out[71]: <class 'pandas.core.frame.DataFrame'>
         DatetimeIndex: 7153 entries, 1986-12-31 00:00:00 to 2014-09-26 00:00:00
         Data columns (total 8 columns):
         SX5P    7153 non-null float64
         SX5E    7153 non-null float64
         SXXP    7153 non-null float64
         SXXE    7153 non-null float64
         SXXF    7153 non-null float64
         SXXA    7153 non-null float64
         DK5F    7153 non-null float64
         DKXF    7153 non-null float64
         dtypes: float64(8)
Equipped with the knowledge about the structure of the EURO STOXX 50
data set, we can also use the advanced capabilities of the read_csv
function to make the import more compact and efficient:
In [72]: cols = ['SX5P', 'SX5E', 'SXXP', 'SXXE', 'SXXF',
                 'SXXA', 'DK5F', 'DKXF']
         es = pd.read_csv(es_url, index_col=0, parse_dates=True,
                          sep=';', dayfirst=True, header=None,
                          skiprows=4, names=cols)
In [73]: es.tail()
Out[73]:
               SX5P     SX5E    SXXP    SXXE    SXXF    SXXA     DK5F    DKXF
2014-09-22  3096.02  3257.48  346.69  325.68  403.16  357.08  9703.33  564.81
2014-09-23  3057.89  3205.93  341.89  320.72  397.96  352.56  9602.32  558.35
2014-09-24  3086.12  3244.01  344.35  323.42  400.58  354.72  9628.84  559.83
2014-09-25  3059.01  3202.31  341.44  319.77  396.90  352.58  9537.95  555.51
2014-09-26  3063.71  3219.58  342.30  321.39  398.33  352.71  9558.51  556.57
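To see how read_csv copes with rows of different length, the following sketch parses a small, invented semicolon-separated text from memory. It assumes a reasonably recent pandas version under Python 3 and combines the helper-column idea with the `names`, `skiprows`, and `dayfirst` parameters; the sample values and the name `es_demo` are illustrative only.

```python
import io

import pandas as pd

# Invented sample in the layout described above: four header lines,
# then data rows, the last one with a trailing semicolon.
raw = (
    "Price Indices\n"
    "Date;Blue-Chip;Broad\n"
    ";Europe;Euro-Zone\n"
    ";SX5P;SX5E\n"
    "31.12.1986;775.00;900.82\n"
    "02.01.1987;770.89;891.78\n"
    "27.12.2001;3500.10;3700.20;\n"
)

# One name per field, including the helper column for the trailing semicolon.
cols = ['date', 'SX5P', 'SX5E', 'DEL']

es_demo = pd.read_csv(io.StringIO(raw), sep=';', header=None, skiprows=4,
                      names=cols, index_col=0, parse_dates=True,
                      dayfirst=True)

# The helper column only contains NaN values and can be dropped.
es_demo = es_demo.drop(columns='DEL')
print(es_demo)
```

Shorter rows are padded with NaN in the helper column, so both row formats end up in the same DataFrame object.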
Fortunately, the VSTOXX data set is already in a form such that it can be
imported a bit more easily into a DataFrame object:
In [74]: vs = pd.read_csv('./data/vs.txt', index_col=0, header=2,
                          parse_dates=True, sep=',', dayfirst=True)
         vs.info()
Out[74]: <class 'pandas.core.frame.DataFrame'>
         DatetimeIndex: 4010 entries, 1999-01-04 00:00:00 to 2014-09-26 00:00:00
         Data columns (total 9 columns):
         V2TX    4010 non-null float64
         V6I1    3591 non-null float64
         V6I2    4010 non-null float64
         V6I3    3960 non-null float64
         V6I4    4010 non-null float64
         V6I5    4010 non-null float64
         V6I6    3995 non-null float64
         V6I7    4010 non-null float64
         V6I8    3999 non-null float64
         dtypes: float64(9)
Table 6-6 contains the parameters of this important import function. There
are a multitude of parameters, the majority of which default to None;
object, of course, is nondefault and has to be specified in any case.
Table 6-6. Parameters of read_csv function
Parameter         Format                        Description
object            String                        File path, URL, or other source
sep               String, default ","           Delimiter to use
lineterminator    String (one character)        String for line breaks
quotechar         String                        Character for quotes
quoting           Integer                       Controls recognition of quotes
escapechar        String                        String for escaping
dtype             dtype/dict                    dict of dtype(s) for column(s)
compression       "gzip"/"bz2"                  For decompression of data
dialect           String/csv.Dialect            CSV dialect, default Excel
header            Integer                       Number of header rows
skiprows          Integer                       Number of rows to skip
index_col         Integer                       Number of index columns (sequence for multi-index)
names             Array-like                    Column names if no header rows
prefix            String                        String to add to column numbers if no header names
na_values         List/dict                     Additional strings to recognize as NA, NaN
true_values       List                          Values to consider as True
false_values      List                          Values to consider as False
keep_default_na   Boolean, default True         If True, NaN is added to na_values
parse_dates       Boolean/list, default False   Whether to parse dates in index columns or multiple columns
keep_date_col     Boolean, default False        Keeps original date columns
dayfirst          Boolean, default False        For European date convention DD/MM
thousands         String                        Thousands separator
comment           String                        Rest of line as comment (not to be parsed)
decimal           String                        String to indicate decimal, e.g., "." or ","
nrows             Integer                       Number of rows of file to read
iterator          Boolean, default False        Return TextFileReader object
chunksize         Integer                       Return TextFileReader object for iteration
skipfooter        Integer                       Number of lines to skip at bottom
converters        Dictionary                    Function to convert/translate column data
verbose           Boolean, default False        Report number of NA values in nonnumeric columns
delimiter         String                        Alternative to sep, can contain regular expressions
encoding          String                        Encoding to use, e.g., "UTF-8"
squeeze           Boolean, default False        Return one-column data sets as Series
na_filter         Boolean, default True         Detect missing value markers automatically
usecols           Array-like                    Selection of columns to use
mangle_dupe_cols  Boolean, default True         Name duplicate columns differently
tupleize_cols     Boolean, default False        Leave a list of tuples on columns as is
To implement the regression analysis, we only need one column from each
data set. We therefore generate a new DataFrame object within which we
combine the two columns of interest, namely those for the major indexes.
Since VSTOXX data is only available from the beginning of January 1999,
we only take data from that date on:
In [75]: import datetime as dt
         data = pd.DataFrame({'EUROSTOXX' :
                     es['SX5E'][es.index > dt.datetime(1999, 1, 1)]})
         data = data.join(pd.DataFrame({'VSTOXX' :
                     vs['V2TX'][vs.index > dt.datetime(1999, 1, 1)]}))
We also fill missing values with the last available values from the time
series. We call the fillna method, providing ffill (for forward fill) as the
method parameter. Another option would be bfill (for backward fill),
which would however lead to a “foresight” issue:
In [76]: data = data.fillna(method='ffill')
         data.info()
Out[76]: <class 'pandas.core.frame.DataFrame'>
         DatetimeIndex: 4034 entries, 1999-01-04 00:00:00 to 2014-09-26 00:00:00
         Data columns (total 2 columns):
         EUROSTOXX    4034 non-null float64
         VSTOXX       4034 non-null float64
         dtypes: float64(2)
In [77]: data.tail()
Out[77]:             EUROSTOXX   VSTOXX
         2014-09-22    3257.48  15.8303
         2014-09-23    3205.93  17.7684
         2014-09-24    3244.01  15.9504
         2014-09-25    3202.31  17.6012
         2014-09-26    3219.58  17.5658
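The difference between forward and backward filling, and why the latter introduces foresight, can be illustrated with a tiny, invented series. The sketch assumes Python 3 and a pandas version that offers the `ffill` and `bfill` methods directly on Series objects; the numbers are made up.

```python
import numpy as np
import pandas as pd

# Invented daily series with two missing observations.
idx = pd.date_range('2014-09-22', periods=5, freq='D')
s = pd.Series([15.0, np.nan, np.nan, 18.0, 17.0], index=idx)

forward = s.ffill()   # gaps take the last value known at that time
backward = s.bfill()  # gaps take the next value, i.e., a future observation

print(pd.DataFrame({'raw': s, 'ffill': forward, 'bfill': backward}))
```

With forward filling, the gap on the second day is filled with 15.0, a value that was already observable; with backward filling it becomes 18.0, which only became known two days later.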
Again, a graphical representation of the new data set might provide some
insights. Indeed, as Figure 6-7 shows, there seems to be a negative
correlation between the two indexes:
In [78]: data.plot(subplots=True, grid=True, style='b', figsize=(8, 6))
Figure 6-7. The EURO STOXX 50 index and the VSTOXX volatility index
However, to put this on more formal ground, we want to work again with
the log returns of the two financial time series. Figure 6-8 shows these
graphically:
In [79]: rets = np.log(data / data.shift(1))
         rets.head()
Out[79]:             EUROSTOXX    VSTOXX
         1999-01-04        NaN       NaN
         1999-01-05   0.017228  0.489248
         1999-01-06   0.022138 -0.165317
         1999-01-07  -0.015723  0.256337
         1999-01-08  -0.003120  0.021570
In [80]: rets.plot(subplots=True, grid=True, style='b', figsize=(8, 6))
Figure 6-8. Log returns of EURO STOXX 50 and VSTOXX
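One reason to work with log returns is that they add up over time. The following NumPy-only sketch, with invented price levels, mirrors the `np.log(data / data.shift(1))` calculation and checks this additivity.

```python
import numpy as np

# Invented index levels for four consecutive days.
prices = np.array([100.0, 102.0, 99.0, 101.0])

# Log return of day t: log(P_t / P_{t-1}); the first day has no predecessor.
log_rets = np.log(prices[1:] / prices[:-1])

# The sum of the daily log returns equals the log return over the whole period.
total = np.log(prices[-1] / prices[0])
print(log_rets, log_rets.sum(), total)
```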
We have everything together to implement the regression analysis. In what
follows, the EURO STOXX 50 returns are taken as the independent
variable while the VSTOXX returns are taken as the dependent variable:
In [81]: xdat = rets['EUROSTOXX']
         ydat = rets['VSTOXX']
         model = pd.ols(y=ydat, x=xdat)
         model
Out[81]:
-------------------------Summary of Regression Analysis-------------------------
Formula: Y ~ <x> + <intercept>
Number of Observations:         4033
Number of Degrees of Freedom:   2
R-squared:         0.5322
Adj R-squared:     0.5321
Rmse:              0.0389
F-stat (1, 4031):  4586.3942, p-value:     0.0000
Degrees of Freedom: model 1, resid 4031
-----------------------Summary of Estimated Coefficients------------------------
      Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
             x    -2.7529     0.0406     -67.72     0.0000    -2.8326    -2.6732
     intercept    -0.0001     0.0006      -0.12     0.9043    -0.0013     0.0011
---------------------------------End of Summary---------------------------------
Obviously, there is indeed a highly negative correlation. We can access the
results as follows:
In [82]: model.beta
Out[82]: x           -2.752894
         intercept   -0.000074
         dtype: float64
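The `pd.ols` function used here is not part of more recent pandas releases. As a hedged alternative, the following sketch estimates the same kind of univariate regression with NumPy's `polyfit` on synthetic data; the data-generating parameters (slope of -2.75, noise levels) are assumptions chosen only to resemble the setting above, not the actual index data.

```python
import numpy as np

# Synthetic "returns": y depends negatively on x plus noise (assumed values).
rng = np.random.default_rng(0)
x = rng.normal(0.0, 0.015, 1000)
y = -2.75 * x + rng.normal(0.0, 0.03, 1000)

# Least-squares fit of a first-degree polynomial: y = slope * x + intercept.
slope, intercept = np.polyfit(x, y, deg=1)

# For a univariate regression, R-squared equals the squared correlation.
r_squared = np.corrcoef(x, y)[0, 1] ** 2

# Cross-check: the OLS slope is cov(x, y) / var(x).
slope_check = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
print(slope, intercept, r_squared)
```

For full regression diagnostics (standard errors, t-statistics, confidence intervals), a dedicated statistics library is the more natural choice; the sketch only reproduces the coefficients.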
This input, in combination with the raw log return data, is used to generate
the plot in Figure 6-9, which provides strong support for the leverage
effect:
In [83]: plt.plot(xdat, ydat, 'r.')
         ax = plt.axis()  # grab axis values
         x = np.linspace(ax[0], ax[1] + 0.01)
         plt.plot(x, model.beta[1] + model.beta[0] * x, 'b', lw=2)
         plt.axis('tight')
         plt.grid(True)
         plt.xlabel('EURO STOXX 50 returns')
         plt.ylabel('VSTOXX returns')
Figure 6-9. Scatter plot of log returns and regression line
As a final cross-check, we can calculate the correlation between the two
financial time series directly:
In [84]: rets.corr()
Out[84]:            EUROSTOXX    VSTOXX
         EUROSTOXX   1.000000 -0.729538
         VSTOXX     -0.729538  1.000000
Although the correlation is strongly negative on the whole data set, it varies
considerably over time, as shown in Figure 6-10. The figure uses
correlation on a yearly basis, i.e., for 252 trading days:
In [85]: pd.rolling_corr(rets['EUROSTOXX'], rets['VSTOXX'],
                         window=252).plot(grid=True, style='b')
Figure 6-10. Rolling correlation between EURO STOXX 50 and VSTOXX
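In more recent pandas versions, rolling statistics are accessed through the `rolling` method rather than through top-level functions such as `pd.rolling_corr`. The sketch below, on synthetic data with assumed parameters, computes a 252-observation rolling correlation this way and compares the last value with a direct calculation.

```python
import numpy as np
import pandas as pd

# Two synthetic, negatively related series (parameters are assumptions).
rng = np.random.default_rng(1)
x = pd.Series(rng.normal(size=600))
y = -0.7 * x + 0.5 * pd.Series(rng.normal(size=600))

# Rolling correlation over a window of 252 observations.
roll = x.rolling(window=252).corr(y)

# Direct calculation for the last window as a cross-check.
direct = np.corrcoef(x.iloc[-252:], y.iloc[-252:])[0, 1]
print(roll.iloc[-1], direct)
```

The first 251 values are NaN, since a full window of 252 observations is needed before the first correlation can be computed.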
High-Frequency Data
By now, you should have a feeling for the strengths of pandas when it
comes to financial time series data. One aspect in this regard has become
prevalent in the financial analytics sphere and represents quite a high
burden for some market players: high-frequency data. This brief section
illustrates how to cope with tick data instead of daily financial data. To
begin with, a couple of imports:
In [86]: import numpy as np
         import pandas as pd
         import datetime as dt
         from urllib import urlretrieve
         %matplotlib inline
The Norwegian online broker Netfonds provides tick data for a multitude of
stocks, in particular for American names. The web-based API has basically
the following format:
In [87]: url1 = 'http://hopey.netfonds.no/posdump.php?'
         url2 = 'date=%s%s%s&paper=AAPL.O&csv_format=csv'
         url = url1 + url2
We want to download, combine, and analyze a week’s worth of tick data
for the Apple Inc. stock, a quite actively traded name. Let us start with the
dates of interest:[27]
In [88]: year = '2014'
         month = '09'
         days = ['22', '23', '24', '25']
           # dates might need to be updated
In [89]: AAPL = pd.DataFrame()
         for day in days:
             AAPL = AAPL.append(pd.read_csv(url % (year, month, day),
                                index_col=0, header=0, parse_dates=True))
         AAPL.columns = ['bid', 'bdepth', 'bdeptht',
                         'offer', 'odepth', 'odeptht']
           # shorter column names
The data set now consists of almost 100,000 rows:
In [90]: AAPL.info()
Out[90]: <class 'pandas.core.frame.DataFrame'>
         DatetimeIndex: 95871 entries, 2014-09-22 10:00:01 to 2014-09-25 22:19:25
         Data columns (total 6 columns):
         bid        95871 non-null float64
         bdepth     95871 non-null float64
         bdeptht    95871 non-null float64
         offer      95871 non-null float64
         odepth     95871 non-null float64
         odeptht    95871 non-null float64
         dtypes: float64(6)
Figure 6-11 shows the bid column graphically. One can identify a number
of periods without any trading activity, i.e., times when the markets are
closed:
In [91]: AAPL['bid'].plot()
Figure 6-11. Apple stock tick data for a week
Over the course of a single trading day when markets are open, there is of
course usually a high activity level. Figure 6-12 shows the trading activity
for the first day in the sample and three hours of the third. Times are for the
Norwegian time zone and you can see easily when pre-trading starts, when
US stock markets are open, and when they close:
In [92]: to_plot = AAPL[['bid', 'bdeptht']][
             (AAPL.index > dt.datetime(2014, 9, 22, 0, 0))
           & (AAPL.index < dt.datetime(2014, 9, 23, 2, 59))]
           # adjust dates to given data set
         to_plot.plot(subplots=True, style='b', figsize=(8, 5))
Figure 6-12. Apple stock tick data and volume for a trading day
Usually, financial tick data series lead to a DatetimeIndex that is highly
irregular. In other words, time intervals between two observation points are
highly heterogeneous. Against this background, a resampling of such data
sets might sometimes be useful or even in order depending on the task at
hand. pandas provides a method for this purpose for the DataFrame object.
In what follows, we simply take the mean for the resampling procedure;
this might be consistent for some columns (e.g., “bid”) but not for others
(e.g., “bdepth”):
In [93]: AAPL_resam = AAPL.resample(rule='5min', how='mean')
         np.round(AAPL_resam.head(), 2)
Out[93]:                         bid  bdepth  bdeptht   offer  odepth  odeptht
         2014-09-22 10:00:00  100.49  366.67   366.67  100.95     200      200
         2014-09-22 10:05:00  100.49  100.00   100.00  100.84     200      200
         2014-09-22 10:10:00  100.54  150.00   150.00  100.74     100      100
         2014-09-22 10:15:00  100.59  200.00   200.00  100.75    1500     1500
         2014-09-22 10:20:00  100.50  100.00   100.00  100.75    1500     1500
The resulting plot in Figure 6-13 looks a bit smoother. Here, we have also
filled empty time intervals with the most recent available values (before the
empty time interval):
In [94]: AAPL_resam['bid'].fillna(method='ffill').plot()
Figure 6-13. Resampled Apple stock tick data
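More recent pandas versions expect the aggregation to be chained after `resample` instead of being passed via `how`. The following sketch, on a handful of invented ticks, also shows one possible way to treat columns differently; using the last observed depth per interval is an assumption made for illustration, not a recommendation from the text.

```python
import numpy as np
import pandas as pd

# Invented, irregularly spaced ticks.
idx = pd.to_datetime(['2014-09-22 10:00:01', '2014-09-22 10:01:30',
                      '2014-09-22 10:04:59', '2014-09-22 10:12:10'])
ticks = pd.DataFrame({'bid': [100.0, 101.0, 102.0, 103.0],
                      'bdepth': [100, 200, 300, 400]}, index=idx)

# Five-minute bars: mean of the bid, last observed depth (assumed choice).
resam = ticks.resample('5min').agg({'bid': 'mean', 'bdepth': 'last'})

# The 10:05 interval contains no ticks; fill it with the previous values.
filled = resam.ffill()
print(filled)
```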
To conclude this section, we apply a custom-defined Python function to
our new data set. The function we choose is arbitrary and does not make
any economic sense here; it just mirrors the stock performance at a certain
stock price level (compare Figure 6-14 to Figure 6-13):
In [95]: def reversal(x):
             return 2 * 95 - x
In [96]: AAPL_resam['bid'].fillna(method='ffill').apply(reversal).plot()
Figure 6-14. Resampled Apple stock tick data with function applied to it
Finally, let’s clean up disk space by erasing all data sets saved to disk:
In [97]: !rm ./data/*
           # Windows: del /data/*
Conclusions
Financial time series data is one of the most common and important forms
of data in finance. The library pandas is generally the tool of choice when
it comes to working with such data sets. Modeled after the data.frame
class of R, the pandas DataFrame class provides a wealth of attributes and
methods to attack almost any kind of (financial) analytics problem you
might face. Convenience is another benefit of using pandas: even if you
might be able to generate the same result by using NumPy and/or
matplotlib only, pandas generally has some neat shortcuts based on a
powerful and flexible API.
In addition, pandas makes it really easy to retrieve data from a variety of
web sources, like Yahoo! Finance or Google. Compared to “pure” NumPy or
matplotlib, it automates the management of financial time series data in
many respects and also provides higher flexibility when it comes to
combining data sets and enlarging existing ones.
Further Reading
At the time of this writing, the definitive resource in printed form for
pandas is the book by the main author of the library:
McKinney, Wes (2012): Python for Data Analysis. O’Reilly,
Sebastopol, CA.
Of course, the Web, especially the website of pandas itself, also provides
a wealth of information:
Again, it is good to start on the home page of the library:
http://pandas.pydata.org.
There is rather comprehensive online documentation available at
http://pandas.pydata.org/pandas-docs/stable/.
The documentation in PDF format with 1,500+ pages illustrates how
much functionality pandas has to offer:
http://pandas.pydata.org/pandas-docs/stable/pandas.pdf.
[24] Considering only daily closing prices, you have approximately 30 × 252 = 7,560
closing prices for a single stock over a period of 30 years. It is not uncommon to
have more than 10,000 daily (bid/ask) ticks for a single stock.
[25] For a similar example using matplotlib only, see Chapter 5.
[26] See Chapter 7 for more information on input-output operations with Python.
[27] Note that the data provider only provides this type of data for a couple of days
back from the current date. Therefore, you might need to use different (i.e., more
current) dates to implement the same example.
Chapter 7. Input/Output Operations
It is a capital mistake to theorize before one has data.
- Sherlock Holmes
As a general rule, the majority of data, be it in a finance context or any
other application area, is stored on hard disk drives (HDDs) or some other
form of permanent storage device, like solid state disks (SSDs) or hybrid
disk drives. Storage capacities have been steadily increasing over the years,
while costs per storage unit (e.g., megabytes) have been steadily falling.
At the same time, stored data volumes have been increasing at a much
faster pace than the typical random access memory (RAM) available even
in the largest machines. This makes it necessary not only to store data to
disk for permanent storage, but also to compensate for lack of sufficient
RAM by swapping data from RAM to disk and back.
Input/output (I/O) operations are therefore generally very important tasks
when it comes to finance applications and data-intensive applications in
general. Often they represent the bottleneck for performance-critical
computations, since I/O operations cannot in general shuffle data fast
enough to the RAM[28] and from the RAM to the disk. In a sense, CPUs are
often “starving” due to slow I/O operations.
Although the majority of today’s financial and corporate analytics efforts
are confronted with “big” data (e.g., of petascale size), single analytics
tasks generally use data (sub)sets that fall in the “mid” data category. A
recent study concluded:
Our measurements as well as other recent work shows that the majority of
real-world analytic jobs process less than 100 GB of input, but popular
infrastructures such as Hadoop/MapReduce were originally designed for
petascale processing.
- Appuswamy et al. (2013)
In terms of frequency, single financial analytics tasks generally process
data of not more than a couple of gigabytes (GB) in size, and this is a
sweet spot for Python and the libraries of its scientific stack, like NumPy,
pandas, and PyTables. Data sets of such a size can also be analyzed in-
memory, leading to generally high speeds with today’s CPUs and GPUs.
However, the data has to be read into RAM and the results have to be
written to disk, meanwhile ensuring today’s performance requirements are
met.
This chapter addresses the following areas:
Basic I/O
Python has built-in functions to serialize and store any object on disk
and to read it from disk into RAM; apart from that, Python is strong
when it comes to working with text files and SQL databases. NumPy also
provides dedicated functions for fast storage and retrieval of ndarray
objects.
I/O with pandas
The pandas library provides a plentitude of convenience functions and
methods to read data stored in different formats (e.g., CSV, JSON) and to
write data to files in diverse formats.
I/O with PyTables
PyTables uses the HDF5 standard to accomplish fast I/O operations for
large data sets; speed often is only bound by the hardware used.
Basic I/O with Python
Python itself comes with a multitude of I/O capabilities, some optimized for
performance, others more for flexibility. In general, however, they are
easily used in interactive as well as in large-scale deployment settings.
Writing Objects to Disk
For later use, for documentation, or for sharing with others, one might want
to store Python objects on disk. One option is to use the pickle module.
This module can serialize the majority of Python objects. Serialization
refers to the conversion of an object (hierarchy) to a byte stream;
deserialization is the opposite operation. In the example that follows, we
work again with (pseudo)random data, this time stored in a list object:
In [1]: path = '/flash/data/'
In [2]: import numpy as np
        from random import gauss
In [3]: a = [gauss(1.5, 2) for i in range(1000000)]
          # generation of normally distributed randoms
The task now is to write this list object to disk for later retrieval. pickle
accomplishes this task:
In [4]: import pickle
In [5]: pkl_file = open(path + 'data.pkl', 'w')
          # open file for writing
          # Note: existing file might be overwritten
The two major functions we need are dump, for writing objects, and load,
for loading them into the memory:
In [6]: %time pickle.dump(a, pkl_file)
Out[6]: CPU times: user 4.3 s, sys: 43 ms, total: 4.35 s
        Wall time: 4.36 s
In [7]: pkl_file
Out[7]: <open file '/flash/data/data.pkl', mode 'w' at 0x3df0540>
In [8]: pkl_file.close()
We can now inspect the size of the file on disk. The list object with
1,000,000 floats takes about 20 megabytes (MB) of disk space:
In [9]: ll $path*
Out[9]: -rw-r--r-- 1 root 20970325 28. Sep 15:16 /flash/data/data.pkl
Now that we have data on disk, we can read it into memory via
pickle.load:
In [10]: pkl_file = open(path + 'data.pkl', 'r')  # open file for reading
In [11]: %time b = pickle.load(pkl_file)
Out[11]: CPU times: user 3.37 s, sys: 18 ms, total: 3.38 s
         Wall time: 3.39 s
In [12]: b[:5]
Out[12]: [-3.6459230447943165,
1.4637510875573307,
2.5483218463404067,
0.9822259685028746,
3.594915396586916]
Let us compare this with the first five floats of the original object:
In [13]: a[:5]
Out[13]: [-3.6459230447943165,
1.4637510875573307,
2.5483218463404067,
0.9822259685028746,
3.594915396586916]
To ensure that objects a and b are indeed the same, NumPy provides the
function allclose:
In [14]: np.allclose(np.array(a), np.array(b))
Out[14]: True
In principle, this is the same as calculating the difference of two ndarray
objects and checking whether it is 0:
In [15]: np.sum(np.array(a) - np.array(b))
Out[15]: 0.0
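A short sketch with invented numbers shows where the two checks can differ: summed differences may cancel out, while `np.allclose` compares element by element within tolerances (by default `rtol=1e-5` and `atol=1e-8`).

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([1.5, 2.0, 2.5])  # deviations of -0.5 and +0.5 cancel in the sum

diff_sum = np.sum(a - b)            # 0.0 although the arrays differ
close_ab = np.allclose(a, b)        # element-wise check detects the difference

c = a + 1e-7                        # tiny deviation, e.g., from rounding
close_ac = np.allclose(a, c)        # within the default tolerances
strict_ac = np.array_equal(a, c)    # exact comparison is stricter
print(diff_sum, close_ab, close_ac, strict_ac)
```

For the pickled list above both checks agree, since serialization reproduces the floats exactly; the element-wise check is simply the more robust habit.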
However, allclose takes tolerance levels as parameters; the relative
tolerance (rtol) is by default set to 1e-5.
Storing and retrieving a single object with pickle obviously is quite
simple. What about two objects?
In [16]: pkl_file = open(path + 'data.pkl', 'w')  # open file for writing
In [17]: %time pickle.dump(np.array(a), pkl_file)
Out[17]: CPU times: user 799 ms, sys: 47 ms, total: 846 ms
         Wall time: 846 ms
In [18]: %time pickle.dump(np.array(a) ** 2, pkl_file)
Out[18]: CPU times: user 742 ms, sys: 41 ms, total: 783 ms
         Wall time: 784 ms
In [19]: pkl_file.close()
In [20]: ll $path*
Out[20]: -rw-r--r-- 1 root 44098737 28. Sep 15:16 /flash/data/data.pkl
What has happened? Mainly the following:
We have written an ndarray version of the original object to disk.
We have also written a squared ndarray version to disk, into the same
file.
Both operations were faster than the original operation (due to the use
of ndarray objects).
The file is approximately double the size as before, since we have
stored double the amount of data.
Let us read the two ndarray objects back into memory:
In [21]: pkl_file = open(path + 'data.pkl', 'r')  # open file for reading
pickle.load does the job. However, notice that it only returns a single
ndarray object:
In [22]: x = pickle.load(pkl_file)
         x
Out[22]: array([-3.64592304,  1.46375109,  2.54832185, ...,  2.87048515,
                 0.66186994, -1.38532837])
Calling pickle.load for the second time returns the second object:
In [23]: y = pickle.load(pkl_file)
         y
Out[23]: array([ 13.29275485,   2.14256725,   6.49394423, ...,   8.23968501,
                  0.43807181,   1.9191347 ])
In [24]: pkl_file.close()
Obviously, pickle stores objects according to the first in, first out (FIFO)
principle. There is one major problem with this: there is no meta-
information available to the user to know beforehand what is stored in a
pickle file. A sometimes helpful workaround is to not store single objects,
but a dict object containing all the other objects:
In [25]: pkl_file = open(path + 'data.pkl', 'w')  # open file for writing
         pickle.dump({'x' : x, 'y' : y}, pkl_file)
         pkl_file.close()
Using this approach allows us to read the whole set of objects at once and,
for example, to iterate over the dict object’s key values:
In [26]: pkl_file = open(path + 'data.pkl', 'r')  # open file for reading
         data = pickle.load(pkl_file)
         pkl_file.close()
         for key in data.keys():
             print key, data[key][:4]
Out[26]: y [ 13.29275485   2.14256725   6.49394423   0.96476785]
         x [-3.64592304  1.46375109  2.54832185  0.98222597]
In [27]: !rm -f $path*
This approach, however, requires us to write and read all objects at once.
This is a compromise one can probably live with in many circumstances
given the much higher convenience it brings along.
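Under Python 3, pickle files have to be opened in binary mode. The sketch below repeats the dict-based approach with `with` blocks, which close the files automatically; the temporary directory and the small arrays are assumptions made to keep the example self-contained.

```python
import os
import pickle
import tempfile

import numpy as np

x = np.arange(5, dtype=float)
payload = {'x': x, 'y': x ** 2}   # all objects bundled in one dict

# Temporary location chosen for illustration only.
path = os.path.join(tempfile.mkdtemp(), 'data.pkl')

with open(path, 'wb') as f:       # binary mode is required in Python 3
    pickle.dump(payload, f, protocol=pickle.HIGHEST_PROTOCOL)

with open(path, 'rb') as f:
    restored = pickle.load(f)

for key in sorted(restored):
    print(key, restored[key][:4])
```

Note that unpickling executes code embedded in the file, so only files from trusted sources should be loaded this way.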
Reading and Writing Text Files
Text processing can be considered a strength of Python. In fact, many
corporate and scientific users use Python for exactly this task. With
Python you have a multitude of options to work with string objects, as
well as with text files in general.
Suppose we have generated quite a large set of data that we want to save
and share as a comma-separated value (CSV) file. Although they have a
special structure, such files are basically plain text files:
In [28]: rows = 5000
         a = np.random.standard_normal((rows, 5))  # dummy data
In [29]: a.round(4)
Out[29]: array([[ 1.381 , -1.1236,  1.0622, -1.3997, -0.7374],
                [ 0.15  ,  0.967 ,  1.8391,  0.5633,  0.0569],
                [-0.9504,  0.4779,  1.8636, -1.9152, -0.3005],
                ...,
                [ 0.8843, -1.3932, -0.0506,  0.2717, -1.4921],
                [-1.0352,  1.0368,  0.4562, -0.0667, -1.3391],
                [ 0.9952, -0.6398,  0.8467, -1.6951,  1.122 ]])
To make the case a bit more realistic, we add date-time information to the
mix and use the pandas date_range function to generate a series of hourly
date-time points (for details, see Chapter 6 and Appendix C):
In [30]: import pandas as pd
         t = pd.date_range(start='2014/1/1', periods=rows, freq='H')
           # set of hourly datetime objects
In [31]: t
Out[31]: <class 'pandas.tseries.index.DatetimeIndex'>
         [2014-01-01 00:00:00, ..., 2014-07-28 07:00:00]
         Length: 5000, Freq: H, Timezone: None
To write the data, we need to open a new file object on disk:
In [32]: csv_file = open(path + 'data.csv', 'w')  # open file for writing
The first line of a CSV file generally contains the names for each data
column stored in the file, so we write this first:
In [33]: header = 'date,no1,no2,no3,no4,no5\n'
         csv_file.write(header)
The actual data is then written row by row, merging the date-time
information with the (pseudo)random numbers:
In [34]: for t_, (no1, no2, no3, no4, no5) in zip(t, a):
             s = '%s,%f,%f,%f,%f,%f\n' % (t_, no1, no2, no3, no4, no5)
             csv_file.write(s)
         csv_file.close()
In [35]: ll $path*
Out[35]: -rw-r--r-- 1 root 337664 28. Sep 15:16 /flash/data/data.csv
The other way around works quite similarly. First, open the now-existing
CSV file. Second, read its content line by line using the readline method of
the file object:
In [36]: csv_file = open(path + 'data.csv', 'r')  # open file for reading
In [37]: for i in range(5):
             print csv_file.readline(),
Out[37]: date,no1,no2,no3,no4,no5
         2014-01-01 00:00:00,1.381035,-1.123613,1.062245,-1.399746,-0.737369
         2014-01-01 01:00:00,0.149965,0.966987,1.839130,0.563322,0.056906
         2014-01-01 02:00:00,-0.950360,0.477881,1.863646,-1.915203,-0.300522
         2014-01-01 03:00:00,-0.503429,-0.895489,-0.240227,-0.327176,0.123498
You can also read all the content at once by using the readlines method:
In [38]: csv_file = open(path + 'data.csv', 'r')
         content = csv_file.readlines()
         for line in content[:5]:
             print line,
Out[38]: date,no1,no2,no3,no4,no5
         2014-01-01 00:00:00,1.381035,-1.123613,1.062245,-1.399746,-0.737369
         2014-01-01 01:00:00,0.149965,0.966987,1.839130,0.563322,0.056906
         2014-01-01 02:00:00,-0.950360,0.477881,1.863646,-1.915203,-0.300522
         2014-01-01 03:00:00,-0.503429,-0.895489,-0.240227,-0.327176,0.123498
We finish with some closing operations in this example:
In [39]: csv_file.close()
         !rm -f $path*
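As an alternative to assembling each line by hand, the standard library's `csv` module can take care of delimiters and quoting. The following Python 3 sketch writes and re-reads a few invented rows; the temporary file location and the sample values are assumptions for illustration.

```python
import csv
import datetime as dt
import os
import tempfile

# Three invented rows: a timestamp and two numbers each.
rows = [(dt.datetime(2014, 1, 1, h), 0.1 * h, -0.2 * h) for h in range(3)]

path = os.path.join(tempfile.mkdtemp(), 'data.csv')

# newline='' lets the csv module control line endings itself.
with open(path, 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['date', 'no1', 'no2'])          # header line
    for stamp, no1, no2 in rows:
        writer.writerow([stamp.isoformat(sep=' '), '%f' % no1, '%f' % no2])

# DictReader uses the header line as keys for each row.
with open(path, newline='') as f:
    records = list(csv.DictReader(f))

print(records[0])
```

All values come back as strings, so numeric columns have to be converted explicitly, e.g., with `float(records[0]['no1'])`.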
SQL Databases
Python can work with any kind of SQL database and in general also with
any kind of NoSQL database. One database that is delivered with Python by
default is SQLite3. With it, the basic Python approach to SQL databases can
be easily illustrated:[29]
In [40]: import sqlite3 as sq3
SQL queries are formulated as string objects. The syntax, data types, etc.
of course depend on the database in use:
In [41]: query = 'CREATE TABLE numbs (Date date, No1 real, No2 real)'
Open a database connection. In this case, we generate a new database file
on disk:
In [42]: con = sq3.connect(path + 'numbs.db')
Then execute the query statement to create the table by using the method
execute:
In [43]: con.execute(query)
Out[43]: <sqlite3.Cursor at 0xb8a4490>
To make the query effective, call the method commit:
In [44]: con.commit()
Now that we have a database file with a table, we can populate that table
with data. Each row consists of date-time information and two floats:
In [45]: import datetime as dt
A single data row can be written with the respective SQL statement, as
follows:
In [46]: con.execute('INSERT INTO numbs VALUES(?, ?, ?)',
                     (dt.datetime.now(), 0.12, 7.3))
Out[46]: <sqlite3.Cursor at 0xb8a4570>
However, you usually have to (or want to) write a larger data set in bulk:
In [47]: data = np.random.standard_normal((10000, 2)).round(5)
In [48]: for row in data:
             con.execute('INSERT INTO numbs VALUES(?, ?, ?)',
                         (dt.datetime.now(), row[0], row[1]))
         con.commit()
There is also a method called executemany. Since we have combined
current date-time information with our pseudorandom number data set, we
cannot apply it directly to the ndarray object here. What we can use,
however, is fetchmany to retrieve a certain number of rows at once from
the database:
In [49]: con.execute('SELECT * FROM numbs').fetchmany(10)
Out[49]: [(u'2014-09-28 15:16:19.486021', 0.12, 7.3),
          (u'2014-09-28 15:16:19.762476', 0.30736, -0.21114),
          (u'2014-09-28 15:16:19.762640', 0.95078, 0.50106),
          (u'2014-09-28 15:16:19.762702', 0.95896, 0.15812),
          (u'2014-09-28 15:16:19.762774', -0.42919, -1.45132),
          (u'2014-09-28 15:16:19.762825', -0.99502, -0.91755),
          (u'2014-09-28 15:16:19.762862', -0.55879, -0.85317),
          (u'2014-09-28 15:16:19.762890', 0.25416, -0.36144),
          (u'2014-09-28 15:16:19.762918', -1.61041, 0.43446),
          (u'2014-09-28 15:16:19.762945', -2.04225, -1.29589)]
Or we can just read a single data row at a time:
In [50]: pointer = con.execute('SELECT * FROM numbs')
In [51]: for i in range(3):
             print pointer.fetchone()
Out[51]: (u'2014-09-28 15:16:19.486021', 0.12, 7.3)
         (u'2014-09-28 15:16:19.762476', 0.30736, -0.21114)
         (u'2014-09-28 15:16:19.762640', 0.95078, 0.50106)
In [52]: con.close()
         !rm -f $path*
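One possible way to still benefit from executemany when each row needs a freshly generated timestamp is to feed it a generator that assembles the rows on the fly. The sketch below assumes Python 3, uses an in-memory database instead of a file, and stores the timestamp as text; these choices are made for illustration only.

```python
import datetime as dt
import sqlite3

import numpy as np

con = sqlite3.connect(':memory:')  # in-memory database, nothing on disk
con.execute('CREATE TABLE numbs (Date text, No1 real, No2 real)')

data = np.random.standard_normal((1000, 2)).round(5)

# Generator expression: each row gets its own timestamp when it is consumed.
rows = ((dt.datetime.now().isoformat(' '), float(r[0]), float(r[1]))
        for r in data)

con.executemany('INSERT INTO numbs VALUES (?, ?, ?)', rows)
con.commit()

count = con.execute('SELECT COUNT(*) FROM numbs').fetchone()[0]
first = con.execute('SELECT * FROM numbs').fetchmany(3)
con.close()
print(count, first[0])
```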
SQL databases are a rather broad topic; indeed, too broad and complex to be
covered in any significant way in this chapter. The basic messages only are:
Python integrates pretty well with almost any database technology.
The basic SQL syntax is mainly determined by the database in use; the
rest is, as we say, real Pythonic.
Writing and Reading NumPy Arrays
NumPy itself has functions to write and read ndarray objects in a
convenient and performant fashion. This saves a lot of effort in some
circumstances, such as when you have to convert NumPy dtypes into
specific database types (e.g., for SQLite3). To illustrate that NumPy can
sometimes be an efficient replacement for a SQL-based approach, we
replicate the example from before, this time only using NumPy:
In [53]: import numpy as np
Instead of pandas, we use the arange function of NumPy to generate an
array object with datetime objects stored:[30]
In [54]: dtimes = np.arange('2015-01-01 10:00:00', '2021-12-31 22:00:00',
                            dtype='datetime64[m]')  # minute intervals
         len(dtimes)
Out[54]: 3681360
What is a table in a SQL database is a structured array with NumPy. We use a
special dtype object mirroring the SQL table from before:
In [55]: dty = np.dtype([('Date', 'datetime64[m]'), ('No1', 'f'),
                         ('No2', 'f')])
         data = np.zeros(len(dtimes), dtype=dty)
With the dtimes object, we populate the Date column:
In [56]: data['Date'] = dtimes
The other two columns are populated as before with pseudorandom
numbers:
In [57]: a = np.random.standard_normal((len(dtimes), 2)).round(5)
         data['No1'] = a[:, 0]
         data['No2'] = a[:, 1]
Saving of ndarray objects is highly optimized and therefore quite fast.
Almost 60 MB of data takes less than 0.1 seconds to save on disk (here
using an SSD):
In [58]: %time np.save(path + 'array', data)  # suffix .npy is added
Out[58]: CPU times: user 0 ns, sys: 77 ms, total: 77 ms
         Wall time: 77.1 ms
In [59]: ll $path*
Out[59]: -rw-r--r-- 1 root 58901888 28. Sep 15:16 /flash/data/array.npy
Reading is even faster:
In [60]: %time np.load(path + 'array.npy')
Out[60]: CPU times: user 10 ms, sys: 29 ms, total: 39 ms
         Wall time: 37.8 ms
         array([ (datetime.datetime(2015, 1, 1, 9, 0), -1.4985100030899048,
                  0.9664400219917297),
                 (datetime.datetime(2015, 1, 1, 9, 1), -0.2501699924468994,
                  -0.9184499979019165),
                 (datetime.datetime(2015, 1, 1, 9, 2), 1.2026900053024292,
                  0.49570000171661377),
                 ...,
                 (datetime.datetime(2021, 12, 31, 20, 57), 0.8927800059318542,
                  -1.0334899425506592),
                 (datetime.datetime(2021, 12, 31, 20, 58), 1.0062999725341797,
                  -1.3476499915122986),
                 (datetime.datetime(2021, 12, 31, 20, 59), -0.08011999726295471,
                  0.4992400109767914)],
                dtype=[('Date', '<M8[m]'), ('No1', '<f4'), ('No2', '<f4')])
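The structured-array workflow can be tried on a much smaller scale with the following sketch. It assumes Python 3 with a reasonably recent NumPy; the two-hour time range and the temporary file location are chosen only to keep the example small, and the final selection hints at how a simple query could look without SQL.

```python
import os
import tempfile

import numpy as np

# Minute-by-minute timestamps for two hours (kept small on purpose).
dtimes = np.arange('2015-01-01T10:00', '2015-01-01T12:00',
                   dtype='datetime64[m]')

# Structured dtype mirroring a table with one date and two float columns.
dty = np.dtype([('Date', 'datetime64[m]'), ('No1', 'f4'), ('No2', 'f4')])
data = np.zeros(len(dtimes), dtype=dty)

rng = np.random.default_rng(0)
a = rng.standard_normal((len(dtimes), 2)).round(5)
data['Date'] = dtimes
data['No1'] = a[:, 0]
data['No2'] = a[:, 1]

path = os.path.join(tempfile.mkdtemp(), 'array.npy')
np.save(path, data)        # binary .npy file
loaded = np.load(path)     # dtype and field names are restored

# Simple "query": all rows with a positive value in column No1.
subset = loaded[loaded['No1'] > 0]
print(len(loaded), len(subset), loaded.dtype.names)
```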
A data set of 60 MB is not that large. Therefore, let us try a somewhat
larger ndarray object:
In [61]: data = np.random.standard_normal((10000, 6000))
In [62]: %time np.save(path + 'array', data)
Out[62]: CPU times: user 0 ns, sys: 631 ms, total: 631 ms
         Wall time: 633 ms
In [63]: ll $path*
Out[63]: -rw-r--r-- 1 root 480000080 28. Sep 15:16 /flash/data/array.npy
In this case, the file on disk is about 480 MB large and it is written in less
than a second. This illustrates that writing to disk in this case is mainly
hardware-bound, since 480 MB/s represents roughly the advertised writing
speed of better SSDs at the time of this writing (512 MB/s). Reading the
file/object from disk is even faster (note that caching techniques might also
play a role here):
In [64]: %time np.load(path + 'array.npy')
Out[64]: CPU times: user 2 ms, sys: 216 ms, total: 218 ms
         Wall time: 216 ms
         array([[ 0.10989742, -0.48626177, -0.60849881, ..., -0.99051776,
                  0.88124291, -1.34261656],
                [-0.42301145,  0.29831708,  1.29729826, ..., -0.73426192,
                 -0.13484905,  0.91787421],
                [ 0.12322789, -0.28728811,  0.85956891, ...,  1.47888978,
                 -1.12452641, -0.528133  ],
                ...,
                [ 0.06507559, -0.37130379,  1.35427048, ..., -1.4457718 ,
                  0.49509821,  0.0738847 ],
                [ 1.76525714, -0.07876135, -2.94133788, ..., -0.62581084,
                  0.0933164 ,  1.55788205],
                [-1.18439949, -0.73210571, -0.45845113, ...,  0.0528656 ,
                 -0.39526633, -0.5964333 ]])
In [65]: data = 0.0
         !rm -f $path*
In any case, you can expect that this form of data storage and retrieval is
much, much faster as compared to SQL databases or using the standard
pickle library for serialization. Of course, you do not have the
functionality of a SQL database available with this approach, but PyTables
will help in this regard, as subsequent sections show.
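The save/load round trip above can be replayed on a small scale. The following is a minimal sketch, assuming a temporary directory instead of the /flash/data path and a much smaller array than in the text; the names target and header_bytes are illustrative only. It also makes visible why the file in the listing is slightly larger than the raw data: 10,000 x 6,000 values of 8 bytes each are 480,000,000 bytes, and the remaining 80 bytes are the header of the .npy format.

```python
import os
import tempfile

import numpy as np

# Hypothetical small stand-in for the large array used in the text.
data = np.random.standard_normal((1000, 60))

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'array.npy')
    np.save(target, data)                   # raw binary dump plus a short header
    size_on_disk = os.path.getsize(target)
    restored = np.load(target)              # reads the array back into memory

# The difference between file size and raw data size is the .npy header.
header_bytes = size_on_disk - data.nbytes
print(data.nbytes, size_on_disk, header_bytes)
```

The exact header size depends on the NumPy version, which is why the sketch prints it instead of assuming a fixed value.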
I/O with pandas
One of the major strengths of the pandas library is that it can read and
write different data formats natively, including among others:
CSV (comma-separated value)
SQL (Structured Query Language)
XLS/XLSX (Microsoft Excel files)
JSON (JavaScript Object Notation)
HTML (HyperText Markup Language)
Table 7-1 lists all the supported formats and the corresponding import and
export functions/methods of pandas. The parameters that the import
functions take are listed and described in Table 6-6 (depending on the
functions, some other conventions might apply).
Table 7-1. Supported formats and the corresponding import and export functions of pandas
Format    Input           Output        Remark
CSV       read_csv        to_csv        Text file
XLS/XLSX  read_excel      to_excel      Spreadsheet
HDF       read_hdf        to_hdf        HDF5 database
SQL       read_sql        to_sql        SQL table
JSON      read_json       to_json       JavaScript Object Notation
MSGPACK   read_msgpack    to_msgpack    Portable binary format
HTML      read_html       to_html       HTML code
GBQ       read_gbq        to_gbq        Google Big Query format
DTA       read_stata      to_stata      Formats 104, 105, 108, 113-115, 117
Any       read_clipboard  to_clipboard  E.g., from HTML page
Any       read_pickle     to_pickle     (Structured) Python object
Our test case will again be a large set of floating-point numbers:
In [66]: import numpy as np
         import pandas as pd
         data = np.random.standard_normal((1000000, 5)).round(5)
                 # sample data set
In [67]: filename = path + 'numbs'
To this end, we will also revisit SQLite3 and will compare the performance
with alternative approaches using pandas.
SQL Database
All that follows with regard to SQLite3 should be known by now:
In [68]: import sqlite3 as sq3
In [69]: query = 'CREATE TABLE numbers (No1 real, No2 real,\
                 No3 real, No4 real, No5 real)'
In [70]: con = sq3.Connection(filename + '.db')
In [71]: con.execute(query)
Out[71]: <sqlite3.Cursor at 0x9d59c00>
This time, executemany can be applied since we write from a single
ndarray object:
In [72]: %%time
         con.executemany('INSERT INTO numbers VALUES (?, ?, ?, ?, ?)', data)
         con.commit()
Out[72]: CPU times: user 13.9 s, sys: 229 ms, total: 14.2 s
         Wall time: 14.9 s
In [73]: ll $path*
Out[73]: -rw-r--r-- 1 root 54446080 28. Sep 15:16
         /flash/data/numbs.db
Writing the whole data set of 1,000,000 rows takes quite a while. The
reading of the whole table into a list object is much faster:
In [74]: %%time
         temp = con.execute('SELECT * FROM numbers').fetchall()
         print temp[:2]
         temp = 0.0
Out[74]: [(-1.67378, -0.58292, -1.10616, 1.14929, -0.0393), (1.38006,
         0.82665, 0.34168, -1.1676, -0.53274)]
         CPU times: user 1.54 s, sys: 138 ms, total: 1.68 s
         Wall time: 1.68 s
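The write-then-read pattern with executemany and fetchall can be tried without touching the disk. This is a minimal sketch in Python 3, assuming an in-memory SQLite3 database and only 1,000 rows; the name rows is illustrative, and the conversion with tolist() is an extra step chosen here so that plain Python floats are handed to the database driver.

```python
import sqlite3

import numpy as np

# Hypothetical, much smaller sample than the 1,000,000 rows in the text.
rows = np.random.standard_normal((1000, 5)).round(5)

con = sqlite3.connect(':memory:')          # in-memory database, no file on disk
con.execute('CREATE TABLE numbers (No1 real, No2 real, No3 real, '
            'No4 real, No5 real)')

# One call inserts all rows; tolist() yields lists of plain Python floats.
con.executemany('INSERT INTO numbers VALUES (?, ?, ?, ?, ?)', rows.tolist())
con.commit()

# Read the whole table back as a list of tuples.
fetched = con.execute('SELECT * FROM numbers').fetchall()
con.close()

print(fetched[:2])
```

Since executemany processes the rows one by one on the Python side, the write step dominates the run time, which is in line with the timings reported above.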
Reading SQL query results directly into a NumPy ndarray object is easily
accomplished. Accordingly, you can also easily plot the results of such a
query, as shown by the following code and the output in Figure 7-1:
In [75]: %%time
         query = 'SELECT * FROM numbers WHERE No1 > 0 AND No2 < 0'
         res = np.array(con.execute(query).fetchall()).round(3)
Out[75]: CPU times: user 766 ms, sys: 34 ms, total: 800 ms
         Wall time: 799 ms
In [76]: res = res[::100]  # every 100th result
         import matplotlib.pyplot as plt
         %matplotlib inline
         plt.plot(res[:, 0], res[:, 1], 'ro')
         plt.grid(True); plt.xlim(-0.5, 4.5); plt.ylim(-4.5, 0.5)
Figure 7-1. Plot of the query result
From SQL to pandas
A generally more efficient approach, however, is the reading of either
whole tables or query results with pandas. When you are able to read a
whole table into memory, analytical queries can generally be executed
much faster than when using the SQL disk-based approach. The sublibrary
pandas.io.sql contains functions to handle data stored in SQL databases:
In [77]: import pandas.io.sql as pds
Reading the whole table with pandas takes roughly the same amount of
time as reading it into a NumPy ndarray object. There as here, the
bottleneck is the SQL database:
In [78]: %time data = pds.read_sql('SELECT * FROM numbers', con)
Out[78]: CPU times: user 2.16 s, sys: 60 ms, total: 2.22 s
         Wall time: 2.23 s
In [79]: data.head()
Out[79]:        No1      No2      No3      No4      No5
         0 -1.67378 -0.58292 -1.10616  1.14929 -0.03930
         1  1.38006  0.82665  0.34168 -1.16760 -0.53274
         2  0.79329  0.11947  2.06403 -0.36208  1.77442
         3 -0.35292 -0.33507 -0.00715 -1.01193  0.23157
         4  0.67483  1.59507 -1.21263  0.14745  1.30225
         [5 rows x 5 columns]
The data is now in-memory. This allows for much faster analytics. The SQL
query that takes a few seconds with SQLite3 finishes in less than 0.1
seconds with pandas in-memory:
In [80]: %time data[(data['No1'] > 0) & (data['No2'] < 0)].head()
Out[80]: CPU times: user 50 ms, sys: 0 ns, total: 50 ms
         Wall time: 49.9 ms
1.17749No1 -0.99949
68 0.18625 -1.13017No2 -0.24176
No3 -0.64047No4 1.58002No5
2.29854 0.91816 -0.92661
918 1.09481 -0.26301 1.11341 0.68716 -0.71524
20 0.40261 -0.45917 0.37339 -1.09515 0.96595
0.31836 -0.33039 -1.50109 0.52961 0.23972
[5 rowsx 5 columns]
pandas can master even more complex queries, although it is neither meant
nor able to replace SQL databases when it comes to complex, relational data
structures. The result of the next query is shown in Figure 7-2:
In [81]: %%time
         res = data[['No1', 'No2']][((data['No1'] > 0.5) |
                                     (data['No1'] < -0.5))
                                  & ((data['No2'] < -1) | (data['No2'] > 1))]
Out[81]: CPU times: user 49 ms, sys: 0 ns, total: 49 ms
         Wall time: 48.7 ms
In [82]: plt.plot(res.No1, res.No2, 'ro')
         plt.grid(True); plt.axis('tight')
Figure 7-2. Scatter plot of complex query result
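To see that the boolean-mask selection really replicates the corresponding SQL statement, both can be run against the same data. The following is a minimal sketch in Python 3, assuming an in-memory SQLite3 database and a small two-column sample; the names frame, sql_res and mem_res are illustrative and do not appear in the text.

```python
import sqlite3

import numpy as np
import pandas as pd

# Hypothetical small sample with the same column names as in the text.
rng = np.random.default_rng(0)
frame = pd.DataFrame(rng.standard_normal((10000, 2)).round(5),
                     columns=['No1', 'No2'])

con = sqlite3.connect(':memory:')
frame.to_sql('numbers', con, index=False)   # write the DataFrame as a SQL table

# Disk/database-side selection via SQL.
sql_res = pd.read_sql('SELECT * FROM numbers '
                      'WHERE (No1 > 0.5 OR No1 < -0.5) '
                      'AND (No2 < -1 OR No2 > 1)', con)
con.close()

# In-memory selection via a boolean mask on the DataFrame.
mask = (((frame['No1'] > 0.5) | (frame['No1'] < -0.5))
        & ((frame['No2'] < -1) | (frame['No2'] > 1)))
mem_res = frame[mask].reset_index(drop=True)

print(len(sql_res), len(mem_res))
```

Note that in pandas the operators | and & need parentheses around each comparison, whereas SQL uses the keywords OR and AND.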
As expected, using the in-memory analytics capabilities of pandas leads to
a significant speedup, provided pandas is able to replicate the respective
SQL statement. This is not the only advantage of using pandas, though.
pandas is tightly integrated with PyTables, which is the topic of the next
section. Here, it suffices to know that the combination of both can speed up
I/O operations considerably. This is shown in the following:
In [83]: h5s = pd.HDFStore(filename + '.h5s', 'w')
In [84]: %time h5s['data'] = data
Out[84]: CPU times: user 43 ms, sys: 60 ms, total: 103 ms
         Wall time: 161 ms
In [85]: h5s
Out[85]: <class 'pandas.io.pytables.HDFStore'>
         File path: /flash/data/numbs.h5s
         /data            frame        (shape->[1000000,5])
In [86]: h5s.close()
The whole DataFrame with all the data from the original SQL table is
written in well below 1 second. Reading is even faster, as is to be expected:
In [87]: %%time
         h5s = pd.HDFStore(filename + '.h5s', 'r')
         temp = h5s['data']
         h5s.close()
Out[87]: CPU times: user 13 ms, sys: 22 ms, total: 35 ms
         Wall time: 32.7 ms
A brief check of whether the data sets are indeed the same:
In [88]: np.allclose(np.array(temp), np.array(data))
Out[88]: True
In [89]: temp = 0.0
Also, a look at the two files now on disk, showing that the HDF5 format
consumes somewhat less disk space:
In [90]: ll $path*
Out[90]: -rw-r--r-- 1 root 54446080 28. Sep 15:16
/flash/data/numbs.db
-rw-r--r-- 1 root 48007368 28. Sep 15:16
/flash/data/numbs.h5s
As a summary, we can state the following with regard to our dummy data
set, which is roughly 50 MB in size:
Writing the data with SQLite3 takes multiple seconds, with pandas
taking much less than a second.
Reading the data from the SQL database takes a bit more than a few
seconds, with pandas taking less than 0.1 second.
Data as CSV File
One of the most widely used formats to exchange data is the CSV format.
Although it is not really standardized, it can be processed by any platform
and the vast majority of applications concerned with data and financial
analytics. The previous section shows how to write and read data to and
from CSV files step by step with standard Python functionality (cf. Reading
and Writing Text Files). pandas makes this whole procedure a bit more
convenient, the code more concise, and the execution in general faster:
In [91]: %time data.to_csv(filename + '.csv')
Out[91]: CPU times: user 5.55 s, sys: 137 ms, total: 5.69 s
         Wall time: 5.87 s
Reading the data now stored in the CSV file and plotting it is accomplished
with the read_csv function (cf. Figure 7-3 for the result):
In [92]: %%time
         pd.read_csv(filename + '.csv')[['No1', 'No2',
                                         'No3', 'No4']].hist(bins=20)
Out[92]: CPU times: user 1.72 s, sys: 54 ms, total: 1.77 s
         Wall time: 1.78 s
Figure 7-3. Histogram of four data sets
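One detail of the CSV round trip is easy to miss: to_csv writes the index as an unnamed first column, so reading the file back without further arguments yields one column more than was written. The following is a minimal sketch in Python 3, assuming an in-memory text buffer instead of a file on disk and a small sample; the names frame, buffer and restored are illustrative.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical small sample with the column names used in the text.
frame = pd.DataFrame(np.random.standard_normal((1000, 5)).round(5),
                     columns=['No1', 'No2', 'No3', 'No4', 'No5'])

buffer = io.StringIO()          # stands in for the CSV file on disk
frame.to_csv(buffer)            # the index becomes an unnamed first column
buffer.seek(0)

# index_col=0 turns that first column back into the index.
restored = pd.read_csv(buffer, index_col=0)

print(restored.shape)
```

Selecting the columns explicitly by name, as in the example above, is another way to leave the extra index column aside.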
Data as Excel File
Although working with Excel spreadsheets is the topic of a later chapter,
we want to briefly demonstrate how pandas can write data in Excel format
and read data from Excel spreadsheets. We restrict the data set to 100,000
rows in this case:
In [93]: %time data[:100000].to_excel(filename + '.xlsx')
Out[93]: CPU times: user 27.5 s, sys: 131 ms, total: 27.6 s
         Wall time: 27.7 s
Generating the Excel spreadsheet with this small subset of the data takes
quite a while. This illustrates what kind of overhead the spreadsheet
structure brings along with it. Reading (and plotting) the data is a faster
procedure (cf. Figure 7-4):
In [94]: %time pd.read_excel(filename + '.xlsx',
                             'Sheet1').cumsum().plot()
Out[94]: CPU times: user 12.9 s, sys: 6 ms, total: 12.9 s
         Wall time: 12.9 s
Figure 7-4. Paths of random data from Excel file
Inspection of the generated files reveals that the DataFrame with HDFStore
combination is the most compact alternative (using compression, as
described later in this chapter, further increases the benefits). The same
amount of data as a CSV file, i.e., as a text file, is somewhat larger in size.
This is one reason for the slower performance when working with CSV files,
the other being the very fact that they are “only” general text files:
In [95]: ll $path*
Out[95]: -rw-r--r-- 1 root 48831681 28. Sep 15:17
         /flash/data/numbs.csv
         -rw-r--r-- 1 root 54446080 28. Sep 15:16
         /flash/data/numbs.db
         -rw-r--r-- 1 root 48007368 28. Sep 15:16
         /flash/data/numbs.h5s
         -rw-r--r-- 1 root 4311424 28. Sep 15:17
         /flash/data/numbs.xlsx
In [96]: rm -f $path*
Fast I/O with PyTables
PyTables is a Python binding for the HDF5 database/file standard (cf.
http://www.hdfgroup.org). It is specifically designed to optimize the
performance of I/O operations and make best use of the available hardware.
The library’s import name is tables. Similar to pandas when it comes to
in-memory analytics, PyTables is neither able nor meant to be a full
replacement for SQL databases. However, it brings along some features that
further close the gap. For example, a PyTables database can have many
tables, and it supports compression and indexing and also nontrivial queries
on tables. In addition, it can store NumPy arrays efficiently and has its own
flavor of array-like data structures.
We begin with a few imports:
In [97]: import numpy as np
         import tables as tb
         import datetime as dt
         import matplotlib.pyplot as plt
         %matplotlib inline
Working with Tables
PyTables provides a file-based database format:
In [98]: filename = path + 'tab.h5'
         h5 = tb.open_file(filename, 'w')
For our example case, we generate a table with 2,000,000 rows of data:
In [99]: rows = 2000000
The table itself has a datetime column, two int columns, and two float
columns:
In [100]: row_des = {
              'Date': tb.StringCol(26, pos=1),
              'No1': tb.IntCol(pos=2),
              'No2': tb.IntCol(pos=3),
              'No3': tb.Float64Col(pos=4),
              'No4': tb.Float64Col(pos=5)
              }
When creating the table, we choose no compression. A later example will
add compression as well:
In [101]: filters = tb.Filters(complevel=0)  # no compression
          tab = h5.create_table('/', 'ints_floats', row_des,
                                title='Integers and Floats',
                                expectedrows=rows, filters=filters)
In [102]: tab
Out[102]: /ints_floats (Table(0,)) 'Integers and Floats'
            description := {
            "Date": StringCol(itemsize=26, shape=(), dflt='', pos=0),
            "No1": Int32Col(shape=(), dflt=0, pos=1),
            "No2": Int32Col(shape=(), dflt=0, pos=2),
            "No3": Float64Col(shape=(), dflt=0.0, pos=3),
            "No4": Float64Col(shape=(), dflt=0.0, pos=4)}
            byteorder := 'little'
            chunkshape := (2621,)
In [103]: pointer = tab.row
Now we generate the sample data:
In [104]: ran_int = np.random.randint(0, 10000, size=(rows, 2))
          ran_flo = np.random.standard_normal((rows, 2)).round(5)
The sample data set is written row-by-row to the table:
In [105]: %%time
          for i in range(rows):
              pointer['Date'] = dt.datetime.now()
              pointer['No1'] = ran_int[i, 0]
              pointer['No2'] = ran_int[i, 1]
              pointer['No3'] = ran_flo[i, 0]
              pointer['No4'] = ran_flo[i, 1]
              pointer.append()
                # this appends the data and
                # moves the pointer one row forward
          tab.flush()
Out[105]: CPU times: user 15.7 s, sys: 3.53 s, total: 19.2 s
          Wall time: 19.4 s
Always remember to commit your changes. What the commit method is for
the SQLite3 database, the flush method is for PyTables. We can now
inspect the data on disk, first logically via our Table object and second
physically via the file information:
In [106]: tab
Out[106]: /ints_floats (Table(2000000,)) 'Integers and Floats'
            description := {
            "Date": StringCol(itemsize=26, shape=(), dflt='', pos=0),
            "No1": Int32Col(shape=(), dflt=0, pos=1),
            "No2": Int32Col(shape=(), dflt=0, pos=2),
            "No3": Float64Col(shape=(), dflt=0.0, pos=3),
            "No4": Float64Col(shape=(), dflt=0.0, pos=4)}
            byteorder := 'little'
            chunkshape := (2621,)
In [107]: ll $path*
Out[107]: -rw-r--r-- 1 root 100156256 28. Sep 15:18
          /flash/data/tab.h5
There is a more performant and Pythonic way to accomplish the same
result, by the use of NumPy structured arrays:
In [108]: dty = np.dtype([('Date', 'S26'), ('No1', '<i4'), ('No2', '<i4'),
                          ('No3', '<f8'), ('No4', '<f8')])
          sarray = np.zeros(len(ran_int), dtype=dty)
In [109]: sarray
Out[109]: array([('', 0, 0, 0.0, 0.0), ('', 0, 0, 0.0, 0.0), ('', 0, 0, 0.0, 0.0),
                 ..., ('', 0, 0, 0.0, 0.0), ('', 0, 0, 0.0, 0.0),
                 ('', 0, 0, 0.0, 0.0)],
                dtype=[('Date', 'S26'), ('No1', '<i4'), ('No2', '<i4'), ('No3',
          '<f8'), ('No4', '<f8')])
In [110]: %%time
          sarray['Date'] = dt.datetime.now()
          sarray['No1'] = ran_int[:, 0]
          sarray['No2'] = ran_int[:, 1]
          sarray['No3'] = ran_flo[:, 0]
          sarray['No4'] = ran_flo[:, 1]
Out[110]: CPU times: user 113 ms, sys: 18 ms, total: 131 ms
          Wall time: 131 ms
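The structured array can be explored on its own, without PyTables. The following is a minimal sketch in Python 3, assuming only 1,000 rows; the explicit conversion of the timestamp to an ASCII byte string is an adaptation made here for Python 3 and is not part of the original session. The itemsize of the dtype also explains the size of the table file seen earlier: 26 + 4 + 4 + 8 + 8 = 50 bytes per row, so 2,000,000 rows need 100,000,000 bytes, close to the 100,156,256 bytes reported for tab.h5.

```python
import datetime as dt

import numpy as np

rows = 1000                                  # hypothetical, small sample size
ran_int = np.random.randint(0, 10000, size=(rows, 2))
ran_flo = np.random.standard_normal((rows, 2)).round(5)

# One record: a 26-byte string, two 4-byte ints, two 8-byte floats.
dty = np.dtype([('Date', 'S26'), ('No1', '<i4'), ('No2', '<i4'),
                ('No3', '<f8'), ('No4', '<f8')])
sarray = np.zeros(rows, dtype=dty)

# Column-wise (vectorized) assignment instead of a row-by-row loop.
sarray['Date'] = str(dt.datetime.now()).encode('ascii')
sarray['No1'] = ran_int[:, 0]
sarray['No2'] = ran_int[:, 1]
sarray['No3'] = ran_flo[:, 0]
sarray['No4'] = ran_flo[:, 1]

bytes_per_row = sarray.dtype.itemsize
print(bytes_per_row, bytes_per_row * 2000000)
```

Each assignment fills a whole column at once, which is where the speed advantage over the row pointer approach comes from.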
Equipped with the complete data set now stored in the structured array, the
creation of the table boils down to the following line of code. Note that the
row description is not needed anymore; PyTables uses the NumPy dtype
instead:
In [111]: %%time
          h5.create_table('/', 'ints_floats_from_array', sarray,
                          title='Integers and Floats',
                          expectedrows=rows, filters=filters)
Out[111]: CPU times: user 38 ms, sys: 117 ms, total: 155 ms
          Wall time: 154 ms
          /ints_floats_from_array (Table(2000000,)) 'Integers and Floats'
            description := {
            "Date": StringCol(itemsize=26, shape=(), dflt='', pos=0),
            "No1": Int32Col(shape=(), dflt=0, pos=1),
            "No2": Int32Col(shape=(), dflt=0, pos=2),
            "No3": Float64Col(shape=(), dflt=0.0, pos=3),
            "No4": Float64Col(shape=(), dflt=0.0, pos=4)}
            byteorder := 'little'
            chunkshape := (2621,)
Being an order of magnitude faster than the previous approach, this
approach achieves the same result and also needs less code:
In [112]: h5
Out[112]: File(filename=/flash/data/tab.h5, title=u'', mode='w',
          root_uep='/', filters=Filters(complevel=0, shuffle=False,
          fletcher32=False, least_significant_digit=None))
          / (RootGroup) u''
          /ints_floats (Table(2000000,)) 'Integers and Floats'
            description := {
            "Date": StringCol(itemsize=26, shape=(), dflt='', pos=0),
            "No1": Int32Col(shape=(), dflt=0, pos=1),
            "No2": Int32Col(shape=(), dflt=0, pos=2),
            "No3": Float64Col(shape=(), dflt=0.0, pos=3),
            "No4": Float64Col(shape=(), dflt=0.0, pos=4)}
            byteorder := 'little'
            chunkshape := (2621,)
          /ints_floats_from_array (Table(2000000,)) 'Integers and Floats'
            description := {
            "Date": StringCol(itemsize=26, shape=(), dflt='', pos=0),
            "No1": Int32Col(shape=(), dflt=0, pos=1),
            "No2": Int32Col(shape=(), dflt=0, pos=2),
            "No3": Float64Col(shape=(), dflt=0.0, pos=3),
            "No4": Float64Col(shape=(), dflt=0.0, pos=4)}
            byteorder := 'little'
            chunkshape := (2621,)
We can now delete the duplicate table, since it is no longer needed:
In [113]: h5.remove_node('/', 'ints_floats_from_array')
The Table object behaves like typical Python and NumPy objects when it
comes to slicing, for example:
In [114]: tab[:3]
Out[114]: array([('2014-09-28 15:17:57.631234', 4342, 1672, -0.9293, 0.06343),
                 ('2014-09-28 15:17:57.631368', 3839, 1563, -2.02808, 0.3964),
                 ('2014-09-28 15:17:57.631383', 5100, 1326, 0.03401, 0.46742)],
                dtype=[('Date', 'S26'), ('No1', '<i4'), ('No2', '<i4'), ('No3',
          '<f8'), ('No4', '<f8')])
Similarly, we can select single columns only:
In [115]: tab[:4]['No4']
Out[115]: array([ 0.06343, 0.3964 , 0.46742, -0.56959])
Even more convenient and important: we can apply NumPy universal
functions to tables or subsets of the table:
In [116]: %time np.sum(tab[:]['No3'])
Out[116]: CPU times: user 31 ms, sys: 58 ms, total: 89 ms
          Wall time: 88.3 ms
          -115.34513999999896
In [117]: %time np.sum(np.sqrt(tab[:]['No1']))
Out[117]: CPU times: user 53 ms, sys: 48 ms, total: 101 ms
          Wall time: 101 ms
          133360523.08794475
When it comes to plotting, the Table object also behaves very similarly to
an ndarray object (cf. Figure 7-5):
In [118]: %%time
          plt.hist(tab[:]['No3'], bins=30)
          plt.grid(True)
          print len(tab[:]['No3'])
Out[118]: 2000000
          CPU times: user 396 ms, sys: 89 ms, total: 485 ms
          Wall time: 485 ms
Figure 7-5. Histogram of data
And, of course, we have rather flexible tools to query data via typical SQL-
like statements, as in the following example (the result of which is neatly
illustrated in Figure 7-6; compare it with Figure 7-2, based on a pandas
query):
In [119]: %%time
          res = np.array([(row['No3'], row['No4']) for row in
                  tab.where('((No3 < -0.5) | (No3 > 0.5)) \
                           & ((No4 < -1) | (No4 > 1))')])[::100]
Out[119]: CPU times: user 530 ms, sys: 52 ms, total: 582 ms
          Wall time: 469 ms
In [120]: plt.plot(res.T[0], res.T[1], 'ro')
          plt.grid(True)
Figure 7-6. Scatter plot of query result
FAST COMPLEX QUERIES
Both pandas and PyTables are able to process complex, SQL-like
queries and selections. They are both optimized for speed when it
comes to such operations.
As the following examples show, working with data stored in PyTables as
a Table object makes you feel like you are working with NumPy and in-
memory, both from a syntax and a performance point of view:
In [121]: %%time
          values = tab.cols.No3[:]
          print "Max %18.3f" % values.max()
          print "Ave %18.3f" % values.mean()
          print "Min %18.3f" % values.min()
          print "Std %18.3f" % values.std()
Out[121]: Max              5.152
          Ave             -0.000
          Min             -5.537
          Std              1.000
          CPU times: user 44 ms, sys: 39 ms, total: 83 ms
          Wall time: 82.6 ms
In [122]: %%time
          results = [(row['No1'], row['No2']) for row in
                     tab.where('((No1 > 9800) | (No1 < 200)) \
                              & ((No2 > 4500) & (No2 < 5500))')]
          for res in results[:4]:
              print res
Out[122]: (9987, 4965)
          (9934, 5263)
          (9960, 4729)
          (130, 5023)
          CPU times: user 167 ms, sys: 37 ms, total: 204 ms
          Wall time: 118 ms
In [123]: %%time
          results = [(row['No1'], row['No2']) for row in
                     tab.where('(No1 == 1234) & (No2 > 9776)')]
          for res in results:
              print res
Out[123]: (1234, 9805)
          (1234, 9785)
          (1234, 9821)
          CPU times: user 93 ms, sys: 40 ms, total: 133 ms
          Wall time: 90.1 ms
Working with Compressed Tables
A major advantage of working with PyTables is the approach it takes to
compression. It uses compression not only to save space on disk, but also to
improve the performance of I/O operations. How does this work? When I/O
is the bottleneck and the CPU is able to (de)compress data fast, the net
effect of compression in terms of speed might be positive. Since the
following examples are based on the I/O of a state-of-the-art (at the time of
this writing) SSD, there is no speed advantage of compression to be
observed. However, there is also almost no disadvantage of using
compression:
In [124]: filename = path + 'tab.h5c'
          h5c = tb.open_file(filename, 'w')
In [125]: filters = tb.Filters(complevel=4, complib='blosc')
In [126]: tabc = h5c.create_table('/', 'ints_floats', sarray,
                                  title='Integers and Floats',
                                  expectedrows=rows, filters=filters)
In [127]: %%time
          res = np.array([(row['No3'], row['No4']) for row in
                  tabc.where('((No3 < -0.5) | (No3 > 0.5)) \
                            & ((No4 < -1) | (No4 > 1))')])[::100]
Out[127]: CPU times: user 670 ms, sys: 41 ms, total: 711 ms
          Wall time: 602 ms
Generating the table with the original data and doing analytics on it is
slightly slower compared to the uncompressed table. What about reading
the data into an ndarray? Let’s check:
In [128]: %time arr_non = tab.read()
Out[128]: CPU times: user 13 ms, sys: 49 ms, total: 62 ms
          Wall time: 61.3 ms
In [129]: %time arr_com = tabc.read()
Out[129]: CPU times: user 161 ms, sys: 33 ms, total: 194 ms
          Wall time: 193 ms
This indeed takes much longer than before. However, the compression ratio
is about 20%, saving 80% of the space on disk. This may be of importance
for backup routines or when shuffling large data sets between servers or
even data centers:
In [130]: ll $path*
Out[130]: -rw-r--r-- 1 root 200313168 28. Sep 15:18
          /flash/data/tab.h5
          -rw-r--r-- 1 root 41335178 28. Sep 15:18
          /flash/data/tab.h5c
In [131]: h5c.close()
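The quoted compression ratio follows directly from the two file sizes in the listing. The short calculation below is a sketch of that arithmetic in Python 3; the variable names are illustrative. It also adds a second, hedged comparison: the file tab.h5 was about 100 MB when it held a single table and grew to about 200 MB after the duplicate table was created and removed again. Assuming that HDF5 does not shrink a file when a node is removed, the compressed file (which holds one table) might more fairly be compared against the earlier single-table size.

```python
size_h5_now = 200313168      # bytes, tab.h5 in the listing above
size_h5_single = 100156256   # bytes, tab.h5 when it held one table only
size_h5c = 41335178          # bytes, compressed file tab.h5c

# Ratio as used in the text: compressed file vs. current uncompressed file.
ratio_vs_file = size_h5c / size_h5_now

# Assumption-based alternative: compressed file vs. one uncompressed table.
ratio_vs_table = size_h5c / size_h5_single

print('vs. file : %.1f%% of the size, %.1f%% saved'
      % (100 * ratio_vs_file, 100 * (1 - ratio_vs_file)))
print('vs. table: %.1f%% of the size, %.1f%% saved'
      % (100 * ratio_vs_table, 100 * (1 - ratio_vs_table)))
```

Under the stated assumption, the saving per table would be somewhat lower than the headline figure, but still substantial for data that is mostly random numbers.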
Working with Arrays
We have already seen that NumPy has built-in fast writing and reading
capabilities for ndarray objects. PyTables is also quite fast and efficient
when it comes to storing and retrieving ndarray objects:
In [132]: %%time
          arr_int = h5.create_array('/', 'integers', ran_int)
          arr_flo = h5.create_array('/', 'floats', ran_flo)
Out[132]: CPU times: user 2 ms, sys: 33 ms, total: 35 ms
          Wall time: 35 ms
Writing these objects directly to an HDF5 database is of course much faster
than looping over the objects and writing the data row-by-row to a Table
object. A final inspection of the database shows now three objects in it, the
table and the two arrays:
In [133]: h5
Out[133]: File(filename=/flash/data/tab.h5, title=u'', mode='w',
          root_uep='/', filters=Filters(complevel=0, shuffle=False,
          fletcher32=False, least_significant_digit=None))
          / (RootGroup) u''
          /floats (Array(2000000, 2)) ''
            atom := Float64Atom(shape=(), dflt=0.0)
            maindim := 0
            flavor := 'numpy'
            byteorder := 'little'
            chunkshape := None
          /integers (Array(2000000, 2)) ''
            atom := Int64Atom(shape=(), dflt=0)
            maindim := 0
            flavor := 'numpy'
            byteorder := 'little'
            chunkshape := None
          /ints_floats (Table(2000000,)) 'Integers and Floats'
            description := {
            "Date": StringCol(itemsize=26, shape=(), dflt='', pos=0),
            "No1": Int32Col(shape=(), dflt=0, pos=1),
            "No2": Int32Col(shape=(), dflt=0, pos=2),
            "No3": Float64Col(shape=(), dflt=0.0, pos=3),
            "No4": Float64Col(shape=(), dflt=0.0, pos=4)}
            byteorder := 'little'
            chunkshape := (2621,)
In [134]: ll $path*
Out[134]: -rw-r--r-- 1 root 200313168 28. Sep 15:18
          /flash/data/tab.h5
          -rw-r--r-- 1 root 41335178 28. Sep 15:18
          /flash/data/tab.h5c
In [135]: h5.close()
In [136]: !rm -f $path*
HDF5-BASED DATA STORAGE
The HDF5 database (file) format is a powerful alternative to, for
example, relational databases when it comes to structured numerical
and financial data. Both on a standalone basis when using PyTables
directly and when combining it with the capabilities of pandas, you
can expect to get almost the maximum I/O performance that the
available hardware allows.
Out-of-Memory Computations
PyTables supports out-of-memory operations, which makes it possible to
implement array-based computations that do not fit into the memory:
In [137]: filename = path + 'array.h5'
          h5 = tb.open_file(filename, 'w')
We create an EArray object that is extendable in the first dimension and
has a fixed width of 1,000 in the second dimension:
In [138]: n = 1000
          ear = h5.create_earray(h5.root, 'ear',
                                 atom=tb.Float64Atom(),
                                 shape=(0, n))
Since it is extendable, such an object can be populated chunk-wise:
In [139]: %%time
          rand = np.random.standard_normal((n, n))
          for i in range(750):
              ear.append(rand)
          ear.flush()
Out[139]: CPU times: user 2.42 s, sys: 7.29 s, total: 9.71 s
          Wall time: 20.6 s
To check how much data we have generated logically and physically, we
can inspect the meta-information provided for the object as well as the disk
space consumption:
In [140]: ear
Out[140]: /ear (EArray(750000, 1000)) ''
            atom := Float64Atom(shape=(), dflt=0.0)
            maindim := 0
            flavor := 'numpy'
            byteorder := 'little'
            chunkshape := (8, 1000)
In [141]: ear.size_on_disk
Out[141]: 6000000000L
The EArray object is 6 GB large. For an out-of-memory computation, we
need a target EArray object in the database:
In [142]: out = h5.create_earray(h5.root, 'out',
                                 atom=tb.Float64Atom(),
                                 shape=(0, n))
PyTables has a special module to cope with numerical expressions
efficiently. It is called Expr and is based on the numerical expression
library numexpr. This is what we want to use to calculate the mathematical
expression in Equation 7-1 on the whole EArray object that we generated
before.
Equation 7-1. Example mathematical expression
$$y = 3 \sin(x) + \sqrt{|x|}$$
The following code shows the capabilities for out-of-memory calculations
in action:
In [143]: expr = tb.Expr('3 * sin(ear) + sqrt(abs(ear))')
            # the numerical expression as a string object
          expr.set_output(out, append_mode=True)
            # target to store results is disk-based array
In [144]: %time expr.eval()
            # evaluation of the numerical expression
            # and storage of results in disk-based array
Out[144]: CPU times: user 34.4 s, sys: 11.6 s, total: 45.9 s
          Wall time: 1min 41s
          /out (EArray(750000, 1000)) ''
            atom := Float64Atom(shape=(), dflt=0.0)
            maindim := 0
            flavor := 'numpy'
            byteorder := 'little'
            chunkshape := (8, 1000)
In [145]: out[0, :10]
Out[145]: array([-0.95979563, -1.21530335,  0.02687751,  2.88229293,
                 -0.05596624, -1.70266651, -0.58575264,  1.70317385,
                  3.54571202,  2.81602673])
Giventhatthewholeoperationtakes out­of­memory,it can be
it is executedonstandardhardware.
us brieflycomparethis tothein­memoryperformanceofthe numexpr
Letmodule(seealsoChapter8):
In [146]: %time# readwhole
imarray =array
ear.read()
into memory
Out[146]: CPUWalltimes:
time: user
5.39 1.26
s s, sys: 4.11 s, total: 5.37 s
In [147]: import
expr = numexpr as ne + sqrt(abs(imarray))'
'3 * sin(imarray)
In [148]: ne.set_num_threads(16)
%time ne.evaluate(expr)[0, :10]
Out[148]: CPUWalltime:3.81s
times: user 24.2-1.21530335,
array([-0.95979563, s, sys: 29.1 0.02687751,
s, total: 53.32.88229293,
s
-0.05596624, -1.70266651, -0.58575264, 1.70317385, 3.54571202,
2.81602673])
In [149]: h5.close()
In [150]: !rm -f $path*
Conclusions
SQL-based (i.e., relational) databases have advantages when it comes to
complex data structures that exhibit lots of relations between single
objects/tables. This might justify in some circumstances their performance
disadvantage over pure NumPy ndarray-based or pandas DataFrame-based
approaches.
However, many application areas in finance or science in general, can
succeed with a mainly array-based data modeling approach. In these cases,
huge performance improvements can be realized by making use of native
NumPy I/O capabilities, a combination of NumPy and PyTables capabilities,
or of the pandas approach via HDF5-based stores.
While a recent trend has been to use cloud-based solutions (where the
cloud is made up of a large number of computing nodes based on
commodity hardware), one should carefully consider, especially in a
financial context, which hardware architecture best serves the analytics
requirements. A recent study by Microsoft sheds some light on this topic:
We claim that a single “scale-up” server can process each of these jobs and
do as well or better than a cluster in terms of performance, cost, power, and
server density.
- Appuswamy et al. (2013)
Companies, research institutions, and others involved in data analytics
should therefore analyze first what specific tasks have to be accomplished
in general and then decide on the hardware/software architecture, in terms
of:
Scaling out
Using a cluster with many commodity nodes with standard CPUs and
relatively low memory
Scaling up
Using one or a few powerful servers with many-core CPUs, possibly a
GPU, and large amounts of memory
Our out-of-memory analytics example in this chapter underpins the
observation. The out-of-memory calculation of the numerical expression
with PyTables takes roughly 1.7 minutes (1 min 41 s wall time) on standard
hardware. The same task executed in-memory (using the numexpr library)
takes about 4 seconds, while reading the whole data set from disk takes just
over 5 seconds. This value is from an eight-core server with enough memory
(in this particular case, 64 GB of RAM) and an SSD drive. Therefore, scaling
up hardware and applying different implementation approaches might
significantly influence performance. More on this in the next chapter.
Further Reading
The paper cited at the beginning of the chapter as well as in the
“Conclusions” section is a good read, and a good starting point to think
about hardware architecture for financial analytics:
Appuswamy, Raja et al. (2013): “Nobody Ever Got Fired for Buying a
Cluster.” Microsoft Research, Cambridge, England,
http://research.microsoft.com/apps/pubs/default.aspx?id=179615.
As usual, the Web provides many valuable resources with regard to the
topics covered in this chapter:
For serialization of Python objects with pickle, refer to the
documentation: http://docs.python.org/2/library/pickle.html.
An overview of the I/O capabilities of NumPy is provided on the SciPy
website: http://docs.scipy.org/doc/numpy/reference/routines.io.html.
For I/O with pandas see the respective section in the online
documentation: http://pandas.pydata.org/pandas-docs/stable/io.html.
The PyTables home page provides both tutorials and detailed
documentation: http://www.pytables.org.
[28] Here, we do not distinguish between different levels of RAM and processor
caches. The optimal use of current memory architectures is a topic in itself.
[29] Another first-class citizen in the database world is MySQL, with which Python
also integrates very well. While many web projects are implemented on the basis of
the so-called LAMP stack, which generally stands for Linux, Apache Web server,
MySQL, PHP, there are also a large number of stacks where Python replaces PHP for
the P in the stack. For an overview of available database connectors, visit
https://wiki.python.org/moin/DatabaseInterfaces.
[30] Cf. http://docs.scipy.org/doc/numpy/reference/arrays.datetime.html.
Chapter 8. Performance Python
Don’t lower your expectations to meet your performance. Raise your level of
performance to meet your expectations.
- Ralph Marston
When it comes to performance-critical applications two things should
always be checked: are we using the right implementation paradigm and
are we using the right performance libraries? A number of performance
libraries can be used to speed up the execution of Python code. Among
others, you will find the following libraries useful, all of which are
presented in this chapter (although in a different order):
Cython, for merging Python with C paradigms for static compilation
IPython.parallel, for the parallel execution of code/functions locally
or over a cluster
numexpr, for fast numerical operations
multiprocessing, Python’s built-in module for (local) parallel
processing
Numba, for dynamically compiling Python code for the CPU
NumbaPro, for dynamically compiling Python code for multicore CPUs
and GPUs
Throughout this chapter, we compare the performance of different
implementations of the same algorithms. To make the comparison a bit
easier, we define a convenience function that allows us to systematically
compare the performance of different functions executed on the same or
different data sets:
In [1]: def perf_comp_data(func_list, data_list, rep=3, number=1):
            ''' Function to compare the performance of different functions.

            Parameters
            ==========
            func_list : list
                list with function names as strings
            data_list : list
                list with data set names as strings
            rep : int
                number of repetitions of the whole comparison
            number : int
                number of executions for every function
            '''
            from timeit import repeat
            res_list = {}
            for name in enumerate(func_list):
                stmt = name[1] + '(' + data_list[name[0]] + ')'
                setup = "from __main__ import " + name[1] + ', ' \
                                    + data_list[name[0]]
                results = repeat(stmt=stmt, setup=setup,
                                 repeat=rep, number=number)
                res_list[name[1]] = sum(results) / rep
            res_sort = sorted(res_list.iteritems(),
                              key=lambda (k, v): (v, k))
            for item in res_sort:
                rel = item[1] / res_sort[0][1]
                print 'function: ' + item[0] + \
                      ', av. time sec: %9.5f, ' % item[1] \
                    + 'relative: %6.1f' % rel
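The convenience function above relies on Python 2 features (the print statement, iteritems, and tuple unpacking in a lambda). The following is a minimal sketch of the same idea for Python 3, not a drop-in replacement: it takes function objects and data objects directly instead of their names as strings, and it returns the ranking instead of printing it. The names perf_comp, squares_loop and squares_comp are hypothetical and only serve the illustration.

```python
from timeit import repeat


def perf_comp(funcs, datasets, rep=3, number=1):
    """Return (name, average seconds, relative time) tuples, fastest first."""
    timings = {}
    for func, data in zip(funcs, datasets):
        # Time `number` calls of func(data), repeated `rep` times.
        results = repeat(lambda: func(data), repeat=rep, number=number)
        timings[func.__name__] = sum(results) / rep
    ranked = sorted(timings.items(), key=lambda kv: (kv[1], kv[0]))
    fastest = ranked[0][1]
    return [(name, secs, secs / fastest) for name, secs in ranked]


def squares_loop(a):
    # Explicit loop with append.
    res = []
    for x in a:
        res.append(x ** 2)
    return res


def squares_comp(a):
    # Same task as a list comprehension.
    return [x ** 2 for x in a]


numbers = list(range(100000))
report = perf_comp([squares_loop, squares_comp], [numbers, numbers])
for name, secs, rel in report:
    print('function: %s, av. time sec: %9.5f, relative: %6.1f'
          % (name, secs, rel))
```

Passing callables avoids the import from __main__ in the setup string, at the price of a small overhead for the wrapping lambda in every timed call.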
Python Paradigms and Performance
In finance, like in other scientific and data-intensive disciplines, numerical
computations on large data sets can be quite time-consuming. As an
example, we want to evaluate a somewhat complex mathematical
expression on an array with 500,000 numbers. We choose the expression in
Equation 8-1, which leads to some computational burden per calculation.
Apart from that, it does not have any specific meaning.
Equation 8-1. Example mathematical expression
$$y = \sqrt{|\cos(x)|} + \sin(2 + 3x)$$
Equation 8-1 is easily translated into a Python function:
In [2]: from math import *
        def f(x):
            return abs(cos(x)) ** 0.5 + sin(2 + 3 * x)
Using the range function we can generate efficiently a list object with
500,000 numbers that we can work with:
In [3]: I = 500000
        a_py = range(I)
As the first implementation, consider function f1, which loops over the
whole data set and appends the single results of the function evaluations to
a results list object:
In [4]: def f1(a):
            res = []
            for x in a:
                res.append(f(x))
            return res
This is not the only way to implement this. One can also use different
Python paradigms, like iterators or the eval function, to get functions of
the form f2 and f3:
In [5]: def f2(a):
            return [f(x) for x in a]
In [6]: def f3(a):
            ex = 'abs(cos(x)) ** 0.5 + sin(2 + 3 * x)'
            return [eval(ex) for x in a]
Of course, the same algorithm can be implemented by the use of NumPy
vectorization techniques. In this case, the array of data is an ndarray object
instead of a list object. The function implementation f4 shows no loops
whatsoever; all looping takes place on the NumPy level and not on the
Python level:
In [7]: import numpy as np
In [8]: a_np = np.arange(I)
In [9]: def f4(a):
            return (np.abs(np.cos(a)) ** 0.5 +
                    np.sin(2 + 3 * a))
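The difference between the two paradigms can be made concrete on a small input. The following is a minimal sketch in Python 3, assuming only 1,000 numbers; the names f_vec, r_loop and r_vec are illustrative. The scalar function is applied element by element in a Python-level loop, while the vectorized version hands the whole array to NumPy in one call.

```python
from math import cos, sin

import numpy as np


def f(x):
    # Scalar version: works on one number at a time.
    return abs(cos(x)) ** 0.5 + sin(2 + 3 * x)


def f_vec(a):
    # Vectorized version: works on a whole ndarray at once.
    return np.abs(np.cos(a)) ** 0.5 + np.sin(2 + 3 * a)


a_py = range(1000)        # hypothetical, small input
a_np = np.arange(1000)

r_loop = [f(x) for x in a_py]   # looping on the Python level
r_vec = f_vec(a_np)             # looping on the NumPy level

print(np.allclose(r_loop, r_vec))
```

Both versions evaluate the same expression; only the place where the loop runs differs, which is what drives the timing differences discussed in this chapter.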
Then, we can use a specialized library called numexpr to evaluate the
numerical expression. This library has built-in support for multithreaded
execution. Therefore, to compare the performance of the single with the
multithreaded approach, we define two different functions, f5 (single
thread) and f6 (multiple threads):
In [10]: import numexpr as ne
In [11]: def f5(a):
             ex = 'abs(cos(a)) ** 0.5 + sin(2 + 3 * a)'
             ne.set_num_threads(1)
             return ne.evaluate(ex)
In [12]: def f6(a):
             ex = 'abs(cos(a)) ** 0.5 + sin(2 + 3 * a)'
             ne.set_num_threads(16)
             return ne.evaluate(ex)
In total, the same task (i.e., the evaluation of the numerical expression
in Equation 8-1 on an array of size 500,000) is implemented in six
different ways:
Standard Python function with explicit looping
Iterator approach with implicit looping
Iterator approach with implicit looping and using eval
NumPy vectorized implementation
Single-threaded implementation using numexpr
Multithreaded implementation using numexpr
First, let us check whether the implementations deliver the same results.
We use the IPython cell magic command %%time to record the total
execution time:
In [13]: %%time
         r1 = f1(a_py)
         r2 = f2(a_py)
         r3 = f3(a_py)
         r4 = f4(a_np)
         r5 = f5(a_np)
         r6 = f6(a_np)
Out[13]: CPU times: user 16 s, sys: 125 ms, total: 16.1 s
         Wall time: 16 s
The NumPy function allclose allows for easy checking of whether two
ndarray(-like) objects contain the same data:
In [14]: np.allclose(r1, r2)
Out[14]: True
In [15]: np.allclose(r1, r3)
Out[15]: True
In [16]: np.allclose(r1, r4)
Out[16]: True
In [17]: np.allclose(r1, r5)
Out[17]: True
In [18]: np.allclose(r1, r6)
Out[18]: True
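Note that allclose compares within a tolerance rather than for exact equality. The following minimal sketch (written for Python 3; the arrays a, b, and c are illustrative and not part of the original session) shows the effect of NumPy's default tolerances, rtol=1e-05 and atol=1e-08:

```python
import numpy as np

# np.allclose tests |a - b| <= atol + rtol * |b| element-wise,
# with the defaults rtol=1e-05 and atol=1e-08
a = np.array([1.0, 2.0, 3.0])
b = a + 1e-9      # differences far below the default tolerance
c = a * 1.001     # relative difference of 0.1 percent

close_ab = np.allclose(a, b)      # within tolerance
close_ac = np.allclose(a, c)      # outside tolerance
exact_ab = np.array_equal(a, b)   # exact comparison is stricter
```

For the six implementations above, a tolerance-based check is the appropriate one, since different evaluation orders can lead to tiny floating-point differences.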
This obviously is the case. The more interesting question, of course, is
how the different implementations compare with respect to execution
speed. To this end, we use the perf_comp_data function and provide all the
function and data set names to it:
In [19]: func_list = ['f1', 'f2', 'f3', 'f4', 'f5', 'f6']
         data_list = ['a_py', 'a_py', 'a_py', 'a_np', 'a_np', 'a_np']
We now have everything together to initiate the competition:
In [20]: perf_comp_data(func_list, data_list)
Out[20]: function: f6, av. time sec:   0.00583, relative:    1.0
         function: f5, av. time sec:   0.02711, relative:    4.6
         function: f4, av. time sec:   0.06331, relative:   10.9
         function: f2, av. time sec:   0.46864, relative:   80.3
         function: f1, av. time sec:   0.59660, relative:  102.3
         function: f3, av. time sec:  15.15156, relative: 2597.2
There is a clear winner: the multithreaded numexpr implementation f6. Its
speed advantage, of course, depends on the number of cores available. The
vectorized NumPy version f4 is slower than f5. The pure Python
implementations f1 and f2 are more than 80 times slower than the winner.
f3 is the slowest version, since the use of the eval function for such a
large number of evaluations generates a huge overhead. In the case of
numexpr, the string-based expression is evaluated once and then compiled
for later use; with the Python eval function this evaluation takes place
500,000 times.
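The overhead of repeatedly evaluating a string can be illustrated with Python's built-in compile function. The sketch below (Python 3; the names f3_compiled, code, and ns are hypothetical and not from the original text) parses the expression once and then only executes the resulting code object per element. It is meant as an illustration of where the time goes, not as a recommended replacement for the vectorized approaches:

```python
from math import cos, sin

ex = 'abs(cos(x)) ** 0.5 + sin(2 + 3 * x)'

def f3(a):
    # the string is parsed and compiled anew for every element
    ns = {'cos': cos, 'sin': sin}
    res = []
    for x in a:
        ns['x'] = x
        res.append(eval(ex, ns))
    return res

code = compile(ex, '<expression>', 'eval')  # parse and compile once

def f3_compiled(a):
    # eval on a code object only executes the precompiled bytecode
    ns = {'cos': cos, 'sin': sin}
    res = []
    for x in a:
        ns['x'] = x
        res.append(eval(code, ns))
    return res
```

Both functions return identical results; timing them (e.g., with %timeit) on a few thousand numbers should show how much of the cost of f3 stems from re-parsing the string.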
Memory Layout and Performance
NumPy allows the specification of a so-called dtype per ndarray object:
for example, np.int32 or f8. NumPy also allows us to choose from two
different memory layouts when initializing an ndarray object. Depending
on the structure of the object, one layout can have advantages compared to
the other. This is illustrated in the following:
In [21]: import numpy as np
In [22]: np.zeros((3, 3), dtype=np.float64, order='C')
Out[22]: array([[ 0.,  0.,  0.],
                [ 0.,  0.,  0.],
                [ 0.,  0.,  0.]])
The way you initialize a NumPy ndarray object can have a significant
influence on the performance of operations on these arrays (given a certain
size of array). In summary, the initialization of an ndarray object (e.g., via
np.zeros or np.array) takes as input:
shape
Either an int, a sequence of ints, or a reference to another
numpy.ndarray
dtype (optional)
A numpy.dtype; these are NumPy-specific basic data types for
numpy.ndarray objects
order (optional)
The order in which to store elements in memory: C for C-like (i.e.,
row-wise) or F for Fortran-like (i.e., column-wise)
Consider the C-like (i.e., row-wise) storage:
In [23]: c = np.array([[ 1.,  1.,  1.],
                       [ 2.,  2.,  2.],
                       [ 3.,  3.,  3.]], order='C')
In this case, the 1s, the 2s, and the 3s are stored next to each other. By
contrast, consider the Fortran-like (i.e., column-wise) storage:
In [24]: f = np.array([[ 1.,  1.,  1.],
                       [ 2.,  2.,  2.],
                       [ 3.,  3.,  3.]], order='F')
Now, the data is stored in such a way that 1, 2, and 3 are next to each other
in each column. Let's see whether the memory layout makes a difference in
some way when the array is large:
In [25]: x = np.random.standard_normal((3, 1500000))
         C = np.array(x, order='C')
         F = np.array(x, order='F')
         x = 0.0
Now let's implement some standard operations on the C-like layout array.
First, calculating sums:
In [26]: %timeit C.sum(axis=0)
Out[26]: 100 loops, best of 3: 11.3 ms per loop
In [27]: %timeit C.sum(axis=1)
Out[27]: 100 loops, best of 3: 5.84 ms per loop
Calculating sums over the first axis is roughly two times slower than over
the second axis. One gets similar results for calculating standard deviations:
In [28]: %timeit C.std(axis=0)
Out[28]: 10 loops, best of 3: 70.6 ms per loop
In [29]: %timeit C.std(axis=1)
Out[29]: 10 loops, best of 3: 32.6 ms per loop
For comparison, consider the Fortran-like layout. Sums first:
In [30]: %timeit F.sum(axis=0)
Out[30]: 10 loops, best of 3: 29.2 ms per loop
In [31]: %timeit F.sum(axis=1)
Out[31]: 10 loops, best of 3: 37 ms per loop
Although slower in absolute terms than the other layout, there is hardly a
relative difference for the two axes. Now, standard deviations:
In [32]: %timeit F.std(axis=0)
Out[32]: 10 loops, best of 3: 107 ms per loop
In [33]: %timeit F.std(axis=1)
Out[33]: 10 loops, best of 3: 98.8 ms per loop
Again, this layout option leads to worse performance compared to the
C-like layout. There is a small difference between the two axes, but again it
is not as pronounced as with the other layout. The results indicate that in
general the C-like option will perform better, which is also the reason why
NumPy ndarray objects default to this memory layout if not otherwise
specified:
In [34]: C = 0.0; F = 0.0
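The two layouts discussed in this section can be inspected directly on the small 3 x 3 example arrays. The following sketch (Python 3; the variable names other than c and f are illustrative) uses the flags and strides attributes of ndarray objects and the order='K' option of ravel, which returns the elements in the order in which they are stored in memory:

```python
import numpy as np

data = [[1., 1., 1.],
        [2., 2., 2.],
        [3., 3., 3.]]
c = np.array(data, order='C')
f = np.array(data, order='F')

# the flags report which layout an array actually uses
c_is_c = c.flags['C_CONTIGUOUS']
f_is_f = f.flags['F_CONTIGUOUS']

# strides: bytes to step to the next row / to the next column
c_strides = c.strides   # (24, 8): neighbors within a row are adjacent
f_strides = f.strides   # (8, 24): neighbors within a column are adjacent

# order='K' returns the elements in the order they sit in memory
c_memory = c.ravel(order='K')   # 1, 1, 1, 2, 2, 2, 3, 3, 3
f_memory = f.ravel(order='K')   # 1, 2, 3, 1, 2, 3, 1, 2, 3
```

An operation that walks along the axis with the small stride touches adjacent memory locations, which is one plausible reason for the axis-dependent timings observed above.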
Parallel Computing
Nowadays, even the most compact notebooks have mainboards with
processors that have multiple cores. Moreover, modern cloud-based
computing offerings, like Amazon's EC2 or Microsoft's Azure, allow for
highly scalable, parallel architectures at rather low, variable costs. This
brings large-scale computing to the small business, the researcher, and even
the ambitious amateur. However, to harness the power of such offerings,
appropriate tools are necessary. One such tool is the IPython.parallel
library.
The Monte Carlo Algorithm
A financial algorithm that leads to a high computational burden is the
Monte Carlo valuation of options. As a specific example, we pick the
Monte Carlo estimator for a European call option value in the
Black-Scholes-Merton setup (see also Chapter 3 for the same example). In
this setup, the underlying of the option to be valued follows the stochastic
differential equation (SDE), as in Equation 8-2. S_t is the value of the
underlying at time t; r is the constant, riskless short rate; σ is the constant
instantaneous volatility; and Z_t is a Brownian motion.
Equation 8-2. Black-Scholes-Merton SDE
$$dS_t = r S_t \, dt + \sigma S_t \, dZ_t$$
The Monte Carlo estimator for a European call option is given by
Equation 8-3, where S_T(i) is the ith simulated value of the underlying at
maturity T, K is the strike price, and I is the number of simulated paths.
Equation 8-3. Monte Carlo estimator for European call option
$$C_0 = e^{-rT} \frac{1}{I} \sum_{i=1}^{I} \max(S_T(i) - K, 0)$$
A function implementing the Monte Carlo valuation for the
Black-Scholes-Merton setup could look like the following, if we only allow
the strike of the European call option to vary:
In [35]: def bsm_mcs_valuation(strike):
             ''' Dynamic Black-Scholes-Merton Monte Carlo estimator
             for European calls.

             Parameters
             ==========
             strike : float
                 strike price of the option

             Results
             =======
             value : float
                 estimate for present value of call option
             '''
             import numpy as np
             S0 = 100.; T = 1.0; r = 0.05; vola = 0.2
             M = 50; I = 20000
             dt = T / M
             rand = np.random.standard_normal((M + 1, I))
             S = np.zeros((M + 1, I)); S[0] = S0
             for t in range(1, M + 1):
                 S[t] = S[t-1] * np.exp((r - 0.5 * vola ** 2) * dt
                                        + vola * np.sqrt(dt) * rand[t])
             value = (np.exp(-r * T)
                      * np.sum(np.maximum(S[-1] - strike, 0)) / I)
             return value
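As a plausibility check for such an estimator, one can compare it with the closed-form Black-Scholes-Merton value of the European call. The sketch below (Python 3) is not part of the original text: bsm_call_value and bsm_mcs_one_step are hypothetical helper names, the seed and the number of paths are arbitrary illustrative choices, and the simulation uses a single exact step to maturity instead of the M = 50 steps above, which is sufficient here because the payoff depends only on the terminal value:

```python
import math
import numpy as np

def bsm_call_value(S0, K, T, r, sigma):
    # closed-form Black-Scholes-Merton value of a European call
    d1 = ((math.log(S0 / K) + (r + 0.5 * sigma ** 2) * T)
          / (sigma * math.sqrt(T)))
    d2 = d1 - sigma * math.sqrt(T)
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))  # normal CDF
    return S0 * N(d1) - K * math.exp(-r * T) * N(d2)

def bsm_mcs_one_step(strike, I=200000, seed=1000):
    # same model parameters as in bsm_mcs_valuation
    S0 = 100.; T = 1.0; r = 0.05; vola = 0.2
    rng = np.random.default_rng(seed)  # seed is an illustrative choice
    z = rng.standard_normal(I)
    # exact solution of the SDE at maturity T
    ST = S0 * np.exp((r - 0.5 * vola ** 2) * T + vola * math.sqrt(T) * z)
    # discounted average payoff, as in Equation 8-3
    return math.exp(-r * T) * np.maximum(ST - strike, 0).mean()

analytical = bsm_call_value(100., 100., 1.0, 0.05, 0.2)
simulated = bsm_mcs_one_step(100.)
```

The simulated value should lie close to the analytical one, with the remaining gap reflecting the Monte Carlo sampling error.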
The Sequential Calculation
As the benchmark case we take the valuation of 100 options with different
strike prices. The function seq_value calculates the Monte Carlo
estimators and returns list objects containing strikes and valuation results:
In [36]: def seq_value(n):
             ''' Sequential option valuation.

             Parameters
             ==========
             n : int
                 number of option valuations/strikes
             '''
             strikes = np.linspace(80, 120, n)
             option_values = []
             for strike in strikes:
                 option_values.append(bsm_mcs_valuation(strike))
             return strikes, option_values
In [37]: n = 100  # number of options to be valued
         %time strikes, option_values_seq = seq_value(n)
Out[37]: CPU times: user 11.7 s, sys: 1e+03 µs, total: 11.7 s
         Wall time: 11.7 s
The productivity is roughly 8.5 options per second. Figure 8-1 shows the
valuation results:
In [38]: import matplotlib.pyplot as plt
         %matplotlib inline
         plt.figure(figsize=(8, 4))
         plt.plot(strikes, option_values_seq, 'b')
         plt.plot(strikes, option_values_seq, 'r.')
         plt.grid(True)
         plt.xlabel('strikes')
         plt.ylabel('European call option values')
Figure 8-1. European call option values by Monte Carlo simulation
The Parallel Calculation
For the parallel calculation of the 100 option values, we use
IPython.parallel and a local "cluster." A local cluster is most easily
started via the Clusters tab in the IPython Notebook dashboard. The
number of threads to be used of course depends on the machine and the
processor you are running your code on. Figure 8-2 shows the IPython
page for starting a cluster.
Figure 8-2. Screenshot of IPython cluster page
IPython.parallel needs the information on which cluster to use for the
parallel execution of code. In this case, the cluster profile is stored in the
"default" profile. In addition, we need to generate a view on the cluster:
In [39]: from IPython.parallel import Client
         c = Client(profile="default")
         view = c.load_balanced_view()
The function implementing the parallel valuation of the options looks rather
similar to the sequential implementation:
In [40]: def par_value(n):
             ''' Parallel option valuation.

             Parameters
             ==========
             n : int
                 number of option valuations/strikes
             '''
             strikes = np.linspace(80, 120, n)
             option_values = []
             for strike in strikes:
                 value = view.apply_async(bsm_mcs_valuation, strike)
                 option_values.append(value)
             c.wait(option_values)
             return strikes, option_values
There are two major differences to note. The first is that the valuation
function is applied asynchronously via view.apply_async to our cluster
view, which in effect initiates the parallel valuation of all options at once.
Of course, not all options can be valued in parallel because there are
(generally) not enough cores/threads available. Therefore, we have to wait
until the queue is completely finished; this is accomplished by the wait
method of the Client object c. When all results are available, the function
returns, as before, list objects containing the strike prices and the
valuation results, respectively.
Execution of the parallel valuation function yields a productivity that
ideally scales linearly with the number of cores (threads) available. For
example, having eight cores (threads) available reduces the execution time
to, at best, one-eighth of the time needed for the sequential calculation:
In [41]: %time strikes, option_values_obj = par_value(n)
Out[41]: CPU times: user 415 ms, sys: 30 ms, total: 445 ms
         Wall time: 1.88 s
The parallel execution does not return option values directly; it rather
returns more complex result objects:
In [42]: option_values_obj[0].metadata
Out[42]: {'after': [],
          'completed': datetime.datetime(2014, 9, 28, 16, 6, 54, 93979),
          'data': {},
          'engine_id': 5,
          'engine_uuid': u'6b64aebb-39d5-49aa-9466-e6ab37d3b2c9',
          'follow': [],
          'msg_id': u'c7a44c22-b4bd-46d7-ba5e-34690f178fa9',
          'outputs': [],
          'outputs_ready': True,
          'pyerr': None,
          'pyin': None,
          'pyout': None,
          'received': datetime.datetime(2014, 9, 28, 16, 6, 54, 97195),
          'started': datetime.datetime(2014, 9, 28, 16, 6, 53, 921633),
          'status': u'ok',
          'stderr': '',
          'stdout': '',
          'submitted': datetime.datetime(2014, 9, 28, 16, 6, 53, 917290)}
The valuation result itself is stored in the result attribute of the object:
In [43]: option_values_obj[0].result
Out[43]: 24.436651486350289
To arrive at a results list as with the sequential calculation, we need to read
the single results out from the returned objects:
In [44]: option_values_par = []
         for res in option_values_obj:
             option_values_par.append(res.result)
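The pattern used here (submit asynchronously, wait for the queue, then read the results out of the returned handles) is not specific to IPython.parallel. As a rough analogy only, the following sketch reproduces the same three steps with the concurrent.futures module of the Python 3 standard library. It uses threads instead of cluster engines, and value_call is a hypothetical one-step stand-in for the valuation function with a fixed illustrative seed, so it says nothing about the performance of the original setup:

```python
from concurrent.futures import ThreadPoolExecutor, wait
import numpy as np

def value_call(strike, I=50000, seed=0):
    # one-step Monte Carlo stand-in for the valuation function
    S0, T, r, vola = 100., 1.0, 0.05, 0.2
    z = np.random.default_rng(seed).standard_normal(I)
    ST = S0 * np.exp((r - 0.5 * vola ** 2) * T + vola * np.sqrt(T) * z)
    return float(np.exp(-r * T) * np.maximum(ST - strike, 0).mean())

def submit_and_collect(n, workers=4):
    strikes = np.linspace(80, 120, n)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # submit returns immediately with a handle (a Future object)
        futures = [pool.submit(value_call, k) for k in strikes]
        wait(futures)  # block until all submitted tasks are finished
        # read the single results out from the returned objects
        values = [fut.result() for fut in futures]
    return strikes, values

strikes, values = submit_and_collect(5)
```

Here the Future objects play the role of the result objects returned by view.apply_async, and fut.result() corresponds to reading the result attribute.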
This could have been done, of course, in the parallel valuation loop
directly. Figure 8-3 compares the valuation results of the sequential
calculation with those of the parallel calculation. Differences are due to
numerical issues concerning the Monte Carlo valuation:
In [45]: plt.figure(figsize=(8, 4))
         plt.plot(strikes, option_values_seq, 'b', label='Sequential')
         plt.plot(strikes, option_values_par, 'r.', label='Parallel')
         plt.grid(True); plt.legend(loc=0)
         plt.xlabel('strikes')
         plt.ylabel('European call option values')
Figure 8-3. Comparison of European call option values
Performance Comparison
With the help of the perf_comp_data function, we can compare the
performance a bit more rigorously:
In [46]: n = 50  # number of option valuations
         func_list = ['seq_value', 'par_value']
         data_list = 2 * ['n']
In [47]: perf_comp_data(func_list, data_list)
Out[47]: function: par_value, av. time sec:   0.90832, relative:    1.0
         function: seq_value, av. time sec:   5.75137, relative:    6.3
The results clearly demonstrate that using IPython.parallel for parallel
execution of functions can lead to an almost linear scaling of the
performance with the number of cores available.
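The scaling claim can be quantified from the reported timings with a short calculation. In the sketch below (Python 3), the two times are taken from the output above, while the number of engines is an assumption borrowed from the text's eight-core example; the actual cluster size is not stated in this section:

```python
# average times in seconds as reported by perf_comp_data above
t_seq = 5.75137
t_par = 0.90832

speedup = t_seq / t_par   # roughly 6.3, the "relative" figure above

engines = 8  # assumption: the eight cores (threads) of the text's example
efficiency = speedup / engines   # fraction of ideal linear scaling
```

Under this assumption, the parallel run reaches a bit less than 80 percent of ideal linear scaling; the rest is overhead for distributing tasks and collecting results.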
multiprocessing
The advantage of IPython.parallel is that it scales over small- and
medium-sized clusters (e.g., with 256 nodes). Sometimes it is, however,
helpful to parallelize code execution locally. This is where the "standard"
multiprocessing module of Python might prove beneficial:
In [48]: import multiprocessing as mp
Consider the following function to simulate a geometric Brownian motion:
In [49]: import math
         def simulate_geometric_brownian_motion(p):
             M, I = p
               # time steps, paths
             S0 = 100; r = 0.05; sigma = 0.2; T = 1.0
               # model parameters
             dt = T / M
             paths = np.zeros((M + 1, I))
             paths[0] = S0
             for t in range(1, M + 1):
                 paths[t] = paths[t - 1] * np.exp((r - 0.5 * sigma ** 2) * dt +
                         sigma * math.sqrt(dt) * np.random.standard_normal(I))
             return paths
This function returns simulated paths given the parameterization for M and
I:
In [50]: paths = simulate_geometric_brownian_motion((5, 2))
         paths
Out[50]: array([[ 100.        ,  100.        ],
                [  93.65851581,   98.93916652],
                [  94.70157252,   93.44208625],
                [  96.73499004,   97.88294562],
                [ 110.64677908,   96.04515015],
                [ 124.09826521,  101.86087283]])
Let us implement a test series on a server with eight cores and the following
parameter values. In particular, we want to do 100 simulations:
In [51]: I = 10000  # number of paths
         M = 100  # number of time steps
         t = 100  # number of tasks/simulations
In [52]: # running on server with 8 cores/16 threads
         from time import time
         times = []
         for w in range(1, 17):
             t0 = time()
             pool = mp.Pool(processes=w)
               # the pool of workers
             result = pool.map(simulate_geometric_brownian_motion,
                               t * [(M, I), ])
               # the mapping of the function to the list of parameter tuples
             times.append(time() - t0)
We again come to the conclusion that performance scales with the number
of cores available. Hyperthreading, however, does not add much (or is even
worse) in this case, as Figure 8-4 illustrates:
In [53]: plt.plot(range(1, 17), times)
         plt.plot(range(1, 17), times, 'ro')
         plt.grid(True)
         plt.xlabel('number of processes')
         plt.ylabel('time in seconds')
         plt.title('%d Monte Carlo simulations' % t)
Figure 8-4. Execution speed depending on the number of threads used
(eight-core machine)
EASY PARALLELIZATION
Many problems in finance allow for the application of simple
parallelization techniques, for example, when no data is shared
between instances of an algorithm. The multiprocessing module of
Python allows us to efficiently harness the power of modern hardware
architectures without in general changing the basic algorithms and/or
Python functions to be parallelized.
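One practical point when parallelizing simulations this way: on platforms where worker processes are created by forking, the workers can inherit the same state of NumPy's global random number generator, so that different workers may draw identical random numbers. A common remedy is to give every task its own seed. The sketch below (Python 3) is a variant under that assumption, not the author's function; simulate_gbm_seeded and run_pool are hypothetical names, and the seeds 0, 1, 2, ... are an arbitrary illustrative choice:

```python
import math
import multiprocessing as mp
import numpy as np

def simulate_gbm_seeded(p):
    M, I, seed = p  # time steps, paths, seed for this task
    rng = np.random.default_rng(seed)  # task-specific generator
    S0, r, sigma, T = 100.0, 0.05, 0.2, 1.0
    dt = T / M
    paths = np.zeros((M + 1, I))
    paths[0] = S0
    for t in range(1, M + 1):
        z = rng.standard_normal(I)
        paths[t] = paths[t - 1] * np.exp((r - 0.5 * sigma ** 2) * dt
                                         + sigma * math.sqrt(dt) * z)
    return paths

def run_pool(workers, tasks, M, I):
    # one parameter tuple per task, each with a different seed
    params = [(M, I, seed) for seed in range(tasks)]
    with mp.Pool(processes=workers) as pool:  # pool is closed on exit
        return pool.map(simulate_gbm_seeded, params)
```

In a script, run_pool should be called from within an `if __name__ == '__main__':` block (e.g., `result = run_pool(4, 100, 100, 10000)`), which is required on platforms that start worker processes by spawning a fresh interpreter.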
Dynamic Compiling
Numba is an open source, NumPy-aware optimizing compiler for Python
code. It uses the LLVM compiler infrastructure[31] to compile Python byte
code to machine code especially for use in the NumPy runtime and SciPy
modules.
Introductory Example
Let us start with a problem that typically leads to performance issues in
Python: algorithms with nested loops. A sandbox variant can illustrate the
problem:
In [54]: from math import cos, log
         def f_py(I, J):
             res = 0
             for i in range(I):
                 for j in range (J):
                     res += int(cos(log(1)))
             return res
In a somewhat compute-intensive way, this function returns the total
number of loops given the input parameters I and J. Setting both equal to
5,000 leads to 25,000,000 loops:
In [55]: I, J = 5000, 5000
         %time f_py(I, J)
Out[55]: CPU times: user 17.4 s, sys: 2.3 s, total: 19.7 s
         Wall time: 15.2 s
         25000000
In principle, this can be vectorized with the help of NumPy ndarray objects:
In [56]: def f_np(I, J):
             a = np.ones((I, J), dtype=np.float64)
             return int(np.sum(np.cos(np.log(a)))), a
In [57]: %time res, a = f_np(I, J)
Out[57]: CPU times: user 1.41 s, sys: 285 ms, total: 1.69 s
         Wall time: 1.65 s
This is much faster, roughly by a factor of 8 to 10, but not really
memory-efficient. The ndarray object consumes 200 MB of memory:
In [58]: a.nbytes
Out[58]: 200000000
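The figure follows directly from the shape and the data type of the array. A minimal arithmetic sketch (Python 3; the small 50 x 50 array is only an illustrative check that avoids allocating the full 200 MB):

```python
import numpy as np

itemsize = np.dtype(np.float64).itemsize   # 8 bytes per element

nbytes_sandbox = 5000 * 5000 * itemsize    # shape (I, J) = (5000, 5000)
megabytes = nbytes_sandbox / 10 ** 6       # 200.0

# the same rule can be checked on a small array
small = np.ones((50, 50), dtype=np.float64)
small_nbytes = small.nbytes                # 50 * 50 * 8
```

Since the memory requirement grows with the product of I and J, doubling both parameters quadruples the size of the array.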
I and J can easily be chosen to make the NumPy approach infeasible given a
certain size of RAM. Numba provides an attractive alternative to tackle the
performance issue of such loop structures while preserving the memory
efficiency of the pure Python approach:
In [59]: import numba as nb
With Numba you only need to apply the jit function to the pure Python
function to generate a Python-callable, compiled version of the function:
In [60]: f_nb = nb.jit(f_py)
As promised, this new function can be called directly from within the
Python interpreter, realizing a significant speedup compared to the NumPy
vectorized version:
In [61]: %time f_nb(I, J)
Out[61]: CPU times: user 143 ms, sys: 12 ms, total: 155 ms
         Wall time: 139 ms
         25000000L
Again, let us compare the performance of the different alternatives a bit
more systematically:
In [62]: func_list = ['f_py', 'f_np', 'f_nb']
         data_list = 3 * ['I, J']
In [63]: perf_comp_data(func_list, data_list)
Out[63]: function: f_nb, av. time sec:   0.02022, relative:    1.0
         function: f_np, av. time sec:   1.67494, relative:   82.8
         function: f_py, av. time sec:  15.82375, relative:  782.4
The Numba version of the nested loop implementation is by far the fastest;
much faster even than the NumPy vectorized version. The pure Python
version is much slower than the other two versions.
QUICK WINS
Many approaches for performance improvements (of numerical
algorithms) involve considerable effort. With Python and Numba you
have an approach available that involves only the smallest effort
possible: in general, importing the library and a single additional line
of code. It does not work for all kinds of algorithms, but it is often
worth a (quick) try and sometimes indeed yields a quick win.
Binomial Option Pricing
The previous section uses Monte Carlo simulation to value European call
options, using a parallel computing approach. Another popular numerical
method to value options is the binomial option pricing model pioneered by
Cox, Ross, and Rubinstein (1979). In this model, as in the
Black-Scholes-Merton setup, there is a risky asset, an index or stock, and a
riskless asset, a bond. As with Monte Carlo, the relevant time interval from
today until the maturity of the option is divided into generally equidistant
subintervals, $\Delta t$. Given an index level at time s of $S_s$, the index
level at $t = s + \Delta t$ is given by $S_t = S_s \cdot m$, where m is
chosen randomly from {u, d} with
$0 < d < e^{r \Delta t} < u = e^{\sigma \sqrt{\Delta t}}$ as well as
$u = 1/d$. r is the constant, riskless short rate. The risk-neutral
probability for an up-movement is given as
$q = \frac{e^{r \Delta t} - d}{u - d}$.
Consider that a parameterization for the model is given as follows:
In [64]: # model & option parameters
         S0 = 100.  # initial index level
         T = 1.  # call option maturity
         r = 0.05  # constant short rate
         vola = 0.20  # constant volatility factor of diffusion

         # time parameters
         M = 1000  # time steps
         dt = T / M  # length of time interval
         df = exp(-r * dt)  # discount factor per time interval

         # binomial parameters
         u = exp(vola * sqrt(dt))  # up-movement
         d = 1 / u  # down-movement
         q = (exp(r * dt) - d) / (u - d)  # martingale probability
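The parameterization can be checked for consistency with a few lines of arithmetic. The sketch below (Python 3; the names no_arbitrage and expected_growth are illustrative) recomputes the binomial parameters and verifies two properties that the valuation relies on: the riskless growth factor lies between d and u, and under q the expected one-period growth of the index equals the riskless growth:

```python
from math import exp, sqrt

T = 1.0; r = 0.05; vola = 0.20; M = 1000
dt = T / M
u = exp(vola * sqrt(dt))           # up-movement
d = 1 / u                          # down-movement
q = (exp(r * dt) - d) / (u - d)    # martingale probability

# no-arbitrage requires d < exp(r * dt) < u, which keeps q inside (0, 1)
no_arbitrage = d < exp(r * dt) < u

# under q the expected one-period gross return equals the riskless one
expected_growth = q * u + (1 - q) * d
```

The second property is what makes q a martingale (risk-neutral) probability: discounted index levels have a constant expectation under it.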
An implementation of the binomial algorithm for European options consists
mainly of these parts:
Index level simulation
Simulate step by step the index levels.
Inner value calculation
Calculate the inner values at maturity and/or at every time step.
Risk-neutral discounting
Discount the (expected) inner values at maturity step by step to arrive at
the present value.
In Python this might take on the form seen in the function binomial_py.
This function uses NumPy ndarray objects as the basic data structure and
implements three different nested loops to accomplish the three steps just
sketched:
In [65]: import numpy as np
         def binomial_py(strike):
             ''' Binomial option pricing via looping.

             Parameters
             ==========
             strike : float
                 strike price of the European call option
             '''
             # LOOP 1 - Index Levels
             S = np.zeros((M + 1, M + 1), dtype=np.float64)
               # index level array
             S[0, 0] = S0
             z1 = 0
             for j in xrange(1, M + 1, 1):
                 z1 = z1 + 1
                 for i in xrange(z1 + 1):
                     S[i, j] = S[0, 0] * (u ** j) * (d ** (i * 2))

             # LOOP 2 - Inner Values
             iv = np.zeros((M + 1, M + 1), dtype=np.float64)
               # inner value array
             z2 = 0
             for j in xrange(0, M + 1, 1):
                 for i in xrange(z2 + 1):
                     iv[i, j] = max(S[i, j] - strike, 0)
                 z2 = z2 + 1

             # LOOP 3 - Valuation
             pv = np.zeros((M + 1, M + 1), dtype=np.float64)
               # present value array
             pv[:, M] = iv[:, M]  # initialize last time point
             z3 = M + 1
             for j in xrange(M - 1, -1, -1):
                 z3 = z3 - 1
                 for i in xrange(z3):
                     pv[i, j] = (q * pv[i, j + 1] +
                                 (1 - q) * pv[i + 1, j + 1]) * df
             return pv[0, 0]
This function returns the present value of a European call option with
parameters as specified before:
In [66]: %time round(binomial_py(100), 3)
Out[66]: CPU times: user 4.18 s, sys: 312 ms, total: 4.49 s
         Wall time: 3.64 s
         10.449
We can compare this result with the estimated value the Monte Carlo
function bsm_mcs_valuation returns:
In [67]: %time round(bsm_mcs_valuation(100), 3)
Out[67]: CPU times: user 133 ms, sys: 0 ns, total: 133 ms
         Wall time: 126 ms
         10.318
The values are similar. They are only "similar" and not the same since the
Monte Carlo valuation as implemented with bsm_mcs_valuation is not too
precise, in that different sets of random numbers will lead to (slightly)
different estimates. 20,000 paths per simulation can also be considered a bit
too low for robust Monte Carlo estimates (leading, however, to high
valuation speeds). By contrast, the binomial option pricing model with
1,000 time steps is rather precise but also takes much longer in this case.
Again, we can try NumPy vectorization techniques to come up with equally
precise but faster results from the binomial approach. The binomial_np
function might seem a bit cryptic at first sight; however, when you step
through the individual construction steps and inspect the results, it becomes
clear what happens behind the (NumPy) scenes:
In [68]: def binomial_np(strike):
             ''' Binomial option pricing with NumPy.

             Parameters
             ==========
             strike : float
                 strike price of the European call option
             '''
             # Index Levels with NumPy
             mu = np.arange(M + 1)
             mu = np.resize(mu, (M + 1, M + 1))
             md = np.transpose(mu)
             mu = u ** (mu - md)
             md = d ** md
             S = S0 * mu * md

             # Valuation Loop
             pv = np.maximum(S - strike, 0)

             z = 0
             for t in range(M - 1, -1, -1):  # backward iteration
                 pv[0:M - z, t] = (q * pv[0:M - z, t + 1]
                                 + (1 - q) * pv[1:M - z + 1, t + 1]) * df
                 z += 1
             return pv[0, 0]
Let us briefly take a look behind the scenes. For simplicity and readability,
consider only M = 4 time steps. The first step:
In [69]: M = 4  # four time steps only
         mu = np.arange(M + 1)
         mu
Out[69]: array([0, 1, 2, 3, 4])
The second step of the construction:
In [70]: mu = np.resize(mu, (M + 1, M + 1))
         mu
Out[70]: array([[0, 1, 2, 3, 4],
                [0, 1, 2, 3, 4],
                [0, 1, 2, 3, 4],
                [0, 1, 2, 3, 4],
                [0, 1, 2, 3, 4]])
The third one:
In [71]: md = np.transpose(mu)
         md
Out[71]: array([[0, 0, 0, 0, 0],
                [1, 1, 1, 1, 1],
                [2, 2, 2, 2, 2],
                [3, 3, 3, 3, 3],
                [4, 4, 4, 4, 4]])
The fourth and fifth steps:
In [72]: mu = u ** (mu - md)
         mu.round(3)
Out[72]: array([[ 1.   ,  1.006,  1.013,  1.019,  1.026],
                [ 0.994,  1.   ,  1.006,  1.013,  1.019],
                [ 0.987,  0.994,  1.   ,  1.006,  1.013],
                [ 0.981,  0.987,  0.994,  1.   ,  1.006],
                [ 0.975,  0.981,  0.987,  0.994,  1.   ]])
In [73]: md = d ** md
         md.round(3)
Out[73]: array([[ 1.   ,  1.   ,  1.   ,  1.   ,  1.   ],
                [ 0.994,  0.994,  0.994,  0.994,  0.994],
                [ 0.987,  0.987,  0.987,  0.987,  0.987],
                [ 0.981,  0.981,  0.981,  0.981,  0.981],
                [ 0.975,  0.975,  0.975,  0.975,  0.975]])
Finally, bringing everything together:
In [74]: S = S0 * mu * md
         S.round(3)
Out[74]: array([[ 100.   ,  100.634,  101.273,  101.915,  102.562],
                [  98.743,   99.37 ,  100.   ,  100.634,  101.273],
                [  97.502,   98.121,   98.743,   99.37 ,  100.   ],
                [  96.276,   96.887,   97.502,   98.121,   98.743],
                [  95.066,   95.669,   96.276,   96.887,   97.502]])
From the ndarray object S, only the upper triangular matrix is of
importance. Although we do more calculations with this approach than are
needed in principle, the approach is, as expected, much faster than the first
version, which relies heavily on nested loops on the Python level:
In [75]: M = 1000  # reset number of time steps
         %time round(binomial_np(100), 3)
Out[75]: CPU times: user 308 ms, sys: 6 ms, total: 314 ms
         Wall time: 304 ms
         10.449
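Since only the upper triangle of the arrays is needed, the backward induction can also be written with a one-dimensional array that shrinks by one element per time step. The following sketch (Python 3) is an alternative variant for illustration, not the author's implementation; binomial_1d is a hypothetical name, and the model parameters are passed as keyword arguments with the same values as in the parameterization above:

```python
import numpy as np

def binomial_1d(strike, S0=100., T=1., r=0.05, vola=0.20, M=1000):
    dt = T / M
    df = np.exp(-r * dt)               # discount factor per interval
    u = np.exp(vola * np.sqrt(dt))     # up-movement
    d = 1 / u                          # down-movement
    q = (np.exp(r * dt) - d) / (u - d) # martingale probability

    # index levels at maturity; i counts the down-movements
    i = np.arange(M + 1)
    S = S0 * u ** (M - i) * d ** i

    # inner values at maturity
    pv = np.maximum(S - strike, 0.0)

    # backward induction: the node with i down-moves has the successors
    # i (after an up-move) and i + 1 (after a down-move)
    for _ in range(M):
        pv = (q * pv[:-1] + (1 - q) * pv[1:]) * df
    return float(pv[0])

value = binomial_1d(100.)
```

This variant needs memory proportional to M instead of M squared; for the parameters used in this section it should reproduce the present value of about 10.449 reported above.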
Numba has proven a valuable performance enhancement tool for our
sandbox example. Here, it can prove its worth in the context of a very
important financial algorithm:
In [76]: binomial_nb = nb.jit(binomial_py)
In [77]: %time round(binomial_nb(100), 3)
Out[77]: CPU times: user 1.71 s, sys: 137 ms, total: 1.84 s
         Wall time: 1.59 s
         10.449
We do not yet see a significant speedup over the NumPy vectorized version
since the first call of the compiled function involves some overhead.
Therefore, using the perf_comp_data function shall shed a more realistic
light on how the three different implementations compare with regard to
performance. Obviously, the Numba compiled version is indeed significantly
faster than the NumPy version:
In [78]: func_list = ['binomial_py', 'binomial_np', 'binomial_nb']
         K = 100.
         data_list = 3 * ['K']
In [79]: perf_comp_data(func_list, data_list)
Out[79]: function: binomial_nb, av. time sec:   0.14800, relative:    1.0
         function: binomial_np, av. time sec:   0.31770, relative:    2.1
         function: binomial_py, av. time sec:   3.36707, relative:   22.8
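The first-call overhead mentioned above can be made visible by timing two consecutive calls of a compiled function. The sketch below (Python 3) uses the simple nested loop from the introductory example rather than the binomial function. The fallback branch is an assumption for portability only: if Numba is not installed, the function simply stays pure Python and both calls take about equally long:

```python
import time
from math import cos, log

try:
    import numba as nb
    jit = nb.jit
except ImportError:
    # assumption for portability: without Numba, keep the pure Python function
    def jit(func):
        return func

@jit
def f_loops(I, J):
    res = 0
    for i in range(I):
        for j in range(J):
            res += int(cos(log(1.0)))
    return res

t0 = time.perf_counter()
first = f_loops(300, 300)    # with Numba: includes the compilation step
t1 = time.perf_counter()
second = f_loops(300, 300)   # reuses the compiled machine code
t2 = time.perf_counter()

first_call, second_call = t1 - t0, t2 - t1
```

With Numba available, first_call is typically much larger than second_call, which is why averaging over several runs gives a more realistic picture than a single %time measurement.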
In summary, we can state the following:
Efficiency: using Numba involves only a little additional effort. The
original function is often not changed at all; all you need to do is call
the jit function.
Speed-up: Numba often leads to significant improvements in execution
speed, not only compared to pure Python but also to vectorized NumPy
implementations.
Memory: with Numba there is no need to initialize large array objects;
the compiler specializes the machine code to the problem at hand (as
compared to the "universal" functions of NumPy) and maintains memory
efficiency, as with pure Python.
Static Compiling with Cython
The strength of Numba is the effortless application of the approach to
arbitrary functions. However, Numba will only "effortlessly" generate
significant performance improvements for certain types of problems.
Another approach, which is more flexible but also more involved, is to go
the route of static compiling with Cython. In effect, Cython is a hybrid
language of Python and C. Coming from Python, the major differences to
be noticed are the static type declarations (as in C) and a separate compiling
step (as with any compiled language).
As a simple example function, consider the following nested loop that again
returns simply the number of loops. Compared to the previous nested loop
example, this time the number of inner loop iterations is scaled by the outer
loop iterations. In such a case, you will pretty quickly run into memory
troubles when you try to apply NumPy for a speedup:
In [80]: def f_py(I, J):
             res = 0.  # we work on a float object
             for i in range(I):
                 for j in range (J * I):
                     res += 1
             return res
Let us check Python performance for I = 500 and J = 500. A NumPy
ndarray object allowing us to vectorize the function f_py in such a case
would already have to have a shape of (500, 250000):
In [81]: I, J = 500, 500
         %time f_py(I, J)
Out[81]: CPU times: user 17 s, sys: 2.72 s, total: 19.7 s
         Wall time: 14.2 s
         125000000.0
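The result and the memory argument can both be checked with simple arithmetic. In the sketch below (Python 3), loop_count is a hypothetical helper that states the closed form of the loop count, and the small call f_py(20, 30) is only an inexpensive illustrative check:

```python
def f_py(I, J):
    res = 0.  # we work on a float object
    for i in range(I):
        for j in range(J * I):
            res += 1
    return res

def loop_count(I, J):
    # I outer iterations, each with J * I inner iterations
    return I * (J * I)

small = f_py(20, 30)          # 20 * (30 * 20) = 12,000 loops
full = loop_count(500, 500)   # 125,000,000, as in the output above

# memory of a float64 array of shape (500, 250000)
nbytes_needed = 500 * (500 * 500) * 8   # 1,000,000,000 bytes
```

A vectorized NumPy version would therefore need about 1 GB for I = J = 500 already, and the requirement grows with the square of I.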
Consider next the code shown in Example 8-1. It takes the very same
function and introduces static type declarations for use with Cython. Note
that the suffix of this Cython file is .pyx.
Example 8-1. Nested loop example with Cython static type declarations
#
# Nested loop example with Cython
# nested_loop.pyx
#
def f_cy(int I, int J):
    cdef double res = 0
    # double float much slower than int or long
    for i in range(I):
        for j in range (J * I):
            res += 1
    return res
In such a simple case, when no special C modules are needed, there is an
easy way to import such a module, namely via pyximport:
In [82]: import pyximport
         pyximport.install()
Out[82]: (None, <pyximport.pyximport.PyxImporter at 0x92cfc10>)
This allows us now to directly import from the Cython module:
In [83]: import sys
         sys.path.append('data/')
           # path to the Cython script
           # not needed if in same directory
In [84]: from nested_loop import f_cy
Now, we can check the performance of the Cython function:
In [85]: %time res = f_cy(I, J)
Out[85]: CPU times: user 154 ms, sys: 0 ns, total: 154 ms
         Wall time: 153 ms
In [86]: res
Out[86]: 125000000.0
When working in IPython Notebook there is a more convenient way to use
Cython: cythonmagic.
In [87]: %load_ext cythonmagic
Loading this extension from within the IPython Notebook allows us to
compile code with Cython from within the tool:
In [88]: %%cython
         #
         # Nested loop example with Cython
         #
         def f_cy(int I, int J):
             cdef double res = 0
             # double float much slower than int or long
             for i in range(I):
                 for j in range (J * I):
                     res += 1
             return res
The performance results should, of course, be (almost) the same:
In [89]: %time res = f_cy(I, J)
Out[89]: CPU times: user 156 ms, sys: 0 ns, total: 156 ms
         Wall time: 154 ms
In [90]: res
Out[90]: 125000000.0
Let us see what Numba can do in this case. The application is as
straightforward as before:
In [91]: import numba as nb
In [92]: f_nb = nb.jit(f_py)
The performance is, when invoking the function for the first time, worse
than that of the Cython version (recall that with the first call of the Numba
compiled function there is always some overhead involved):
In [93]: %time res = f_nb(I, J)
Out[93]: CPU times: user 285 ms, sys: 9 ms, total: 294 ms
         Wall time: 273 ms
In [94]: res
Out[94]: 125000000.0
Finally, the more rigorous comparison, showing that the Numba version
indeed keeps up with the Cython version(s):
In [95]: func_list = ['f_py', 'f_cy', 'f_nb']
         I, J = 500, 500
         data_list = 3 * ['I, J']
In [96]: perf_comp_data(func_list, data_list)
Out[96]: function: f_nb, av. time sec:   0.15162, relative:    1.0
         function: f_cy, av. time sec:   0.15275, relative:    1.0
         function: f_py, av. time sec:  14.08304, relative:   92.9
Generation of Random Numbers on GPUs
The last topic in this chapter is the use of devices for massively parallel
operations, i.e., General Purpose Graphical Processing Units (GPGPUs,
or simply GPUs). To use an Nvidia GPU, we need to have CUDA (Compute
Unified Device Architecture, cf. https://developer.nvidia.com)
installed. An easy way to harness the power of Nvidia GPUs is to use
NumbaPro, a performance library by Continuum Analytics that dynamically
compiles Python code for the GPU (or a multicore CPU).
This chapter does not allow us to go into the details of GPU usage for
Python programming. However, there is one financial field that can benefit
strongly from the use of a GPU: Monte Carlo simulation and
(pseudo)random number generation in particular.[32] In what follows, we
use the native CUDA library curand to generate random numbers on the
GPU:
In [97]: from numbapro.cudalib import curand
As the benchmark case, we define a function, using NumPy, that delivers a
two-dimensional array of standard normally distributed pseudorandom
numbers:
In [98]: def get_randoms(x, y):
             rand = np.random.standard_normal((x, y))
             return rand
First, let's check if it works:
In [99]: get_randoms(2, 2)
Out[99]: array([[-0.30561007,  1.33124048],
                [-0.04382143,  2.31276888]])
Now the function for the Nvidia GPU:
In [100]: def get_cuda_randoms(x, y):
              rand = np.empty((x * y), np.float64)
                  # rand serves as a container for the randoms
                  # CUDA only fills 1-dimensional arrays
              prng = curand.PRNG(rndtype=curand.PRNG.XORWOW)
                  # the argument sets the random number algorithm
              prng.normal(rand, 0, 1)  # filling the container
              rand = rand.reshape((x, y))
                  # to be "fair", we reshape rand to 2 dimensions
              return rand
Again, a brief check of the functionality:
In [101]: get_cuda_randoms(2, 2)
Out[101]: array([[ 1.07102161,  0.70846868],
                 [ 0.89437398, -0.86693007]])
And a first comparison of the performance:
In [102]: %timeit a = get_randoms(1000, 1000)
Out[102]: 10 loops, best of 3: 72 ms per loop
In [103]: %timeit a = get_cuda_randoms(1000, 1000)
Out[103]: 100 loops, best of 3: 14.8 ms per loop
Now, a more systematic routine to compare the performance:
In [104]: import time as t
          step = 1000
          def time_comparsion(factor):
              cuda_times = list()
              cpu_times = list()
              for j in range(1, 10002, step):
                  i = j * factor
                  t0 = t.time()
                  a = get_randoms(i, 1)
                  t1 = t.time()
                  cpu_times.append(t1 - t0)
                  t2 = t.time()
                  a = get_cuda_randoms(i, 1)
                  t3 = t.time()
                  cuda_times.append(t3 - t2)
              print "Bytes of largest array %i" % a.nbytes
              return cuda_times, cpu_times
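The same measurement idea can be expressed as a small, generator-agnostic helper. The sketch below (Python 3) is a variation for illustration, not the routine used in the text: time_generator is a hypothetical name, it times only the NumPy-based function (no GPU is required), and it reports the best of a few repetitions per array size to reduce the influence of one-off effects:

```python
import time
import numpy as np

def get_randoms(x, y):
    return np.random.standard_normal((x, y))

def time_generator(generator, sizes, repeat=3):
    # best-of-`repeat` wall time in seconds per array size
    times = []
    for n in sizes:
        best = float('inf')
        for _ in range(repeat):
            t0 = time.perf_counter()
            generator(n, 1)
            best = min(best, time.perf_counter() - t0)
        times.append(best)
    return times

sizes = [1000, 10000, 100000]
cpu_times = time_generator(get_randoms, sizes)
```

Any function with the same (x, y) signature, such as a GPU-based generator, could be passed as the first argument to obtain a comparable series of timings.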
And a helper function to visualize performance results:
In [105]: def plot_results(cpu_times, cuda_times, factor):
              plt.plot(x * factor, cpu_times, 'b', label='NUMPY')
              plt.plot(x * factor, cuda_times, 'r', label='CUDA')
              plt.legend(loc=0)
              plt.grid(True)
              plt.xlabel('size of random number array')
              plt.ylabel('time')
              plt.axis('tight')
Let’stakealookat thefirst test serieswithamediumworkload:
In [106]: cuda_times,
factor = 100cpu_times = time_comparsion(factor)
Out[106]: Bytes of largest array 8000800
CalculationtimefortherandomnumbersontheGPUi s almost
independentofthenumberstobegenerated.Byconstrast,timeontheCPU
risessharplywithincreasingsizeoftherandomnumberarraytobe
generated.BothstatementscanbeverifiedinFigure8­5:
In [107]: x = np.arange(1, 10002, step)
In [108]: plot_results(cpu_times, cuda_times, factor)

Figure 8-5. Random number generation on GPU and CPU (factor = 100)
Now let's look at the second test series, with a pretty low workload:
In [109]: factor = 10
          cuda_times, cpu_times = time_comparsion(factor)
Out[109]: Bytes of largest array 800080
The overhead of using the GPU is too large for low workloads, something
quite obvious from inspecting Figure 8-6:
In [110]: plot_results(cpu_times, cuda_times, factor)

Figure 8-6. Random number generation on GPU and CPU (factor = 10)
Now let's consider a test series with a comparatively heavy workload. The
largest random number array is 400 MB in size:
In [111]: %%time
          factor = 5000
          cuda_times, cpu_times = time_comparsion(factor)
Out[111]: Bytes of largest array 400040000
          CPU times: user 22 s, sys: 3.52 s, total: 25.5 s
          Wall time: 25.4 s
For heavy workloads the GPU clearly shows its advantages, as Figure 8-7
impressively illustrates:
In [112]: plot_results(cpu_times, cuda_times, factor)

Figure 8-7. Random number generation on GPU and CPU (factor = 5,000)
Conclusions
Nowadays, the Python ecosystem provides a number of ways to improve
the performance of code:
Paradigms
Some Python paradigms might be more performant than others, given a
specific problem.
Libraries
There is a wealth of libraries available for different types of problems,
which often lead to much higher performance given a problem that fits
into the scope of the library (e.g., numexpr).
Compiling
A number of powerful compiling solutions are available, including
static (e.g., Cython) and dynamic ones (e.g., Numba).
Parallelization
Some Python libraries have built-in parallelization capabilities (e.g.,
numexpr), while others allow us to harness the full power of multiple-
core CPUs, whole clusters (e.g., IPython.parallel), or GPUs (e.g.,
NumbaPro).
A major benefit of the Python ecosystem is that all these approaches
generally are easily implementable, meaning that the additional effort
included is generally quite low (even for nonexperts). In other words,
performance improvements often are low-hanging fruits given the
performance libraries available as of today.
Further Reading
For all performance libraries introduced in this chapter, there are valuable
web resources available:
For details on numexpr see http://github.com/pydata/numexpr.
IPython.parallel is explained here: http://ipython.org/ipython-doc/stable/parallel.
Find the documentation for the multiprocessing module here:
https://docs.python.org/2/library/multiprocessing.html.
Information on Numba can be found at http://github.com/numba/numba.
http://cython.org is the home of the Cython compiler project.
For the documentation of NumbaPro, refer to
http://docs.continuum.io/numbapro.
For a reference in book form, see the following:
Gorelick, Misha and Ian Ozsvald (2014): High Performance Python.
O'Reilly, Sebastopol, CA.
[31] Formerly, LLVM was meant to be an acronym for Low Level Virtual Machine;
now "it is the full name of the project."
[32] See also Chapter 10 on these topics.
Chapter 9. Mathematical Tools
The mathematicians are the priests of the modern world.
- Bill Gaede
Since the arrival of the so-called Rocket Scientists on Wall Street in the
'80s and '90s, finance has evolved into a discipline of applied mathematics.
While early research papers in finance came with few mathematical
expressions and equations, current ones are mainly comprised of
mathematical expressions and equations, with some explanatory text
around.
This chapter introduces a number of useful mathematical tools for finance,
without providing a detailed background for each of them. There are many
useful books on this topic available. Therefore, this chapter focuses on how
to use the tools and techniques with Python. Among other topics, it covers:
Approximation
Regression and interpolation are among the most often used numerical
techniques in finance.
Convex optimization
A number of financial disciplines need tools for convex optimization
(e.g., option pricing when it comes to model calibration).
Integration
In particular, the valuation of financial (derivative) assets often boils
down to the evaluation of integrals.
Symbolic mathematics
Python provides with SymPy a powerful tool for symbolic mathematics,
e.g., to solve (systems of) equations.
Approximation
To begin with, let us import the libraries that we need for the moment,
NumPy and matplotlib.pyplot:
In [1]: import numpy as np
        import matplotlib.pyplot as plt
        %matplotlib inline
Throughout this discussion, the main example function we will use is the
following, which is comprised of a trigonometric term and a linear term:
In [2]: def f(x):
            return np.sin(x) + 0.5 * x
The main focus is the approximation of this function over a given interval
by regression and interpolation. First, let us generate a plot of the function
to get a better view of what exactly the approximation shall achieve. The
interval of interest shall be [-2π, 2π]. Figure 9-1 displays the function over
the fixed interval defined via the linspace function. np.linspace(start,
stop, num) returns num points beginning with start and ending with
stop, with the subintervals between two consecutive points being evenly
spaced:
In [3]: x = np.linspace(-2 * np.pi, 2 * np.pi, 50)
In [4]: plt.plot(x, f(x), 'b')
plt.grid(True)
plt.xlabel('x')
plt.ylabel('f(x)')

Figure 9-1. Example function plot
Regression
Regression is a rather efficient tool when it comes to function
approximation. It is not only suited to approximate one-dimensional
functions but also works well in higher dimensions. The numerical
techniques needed to come up with regression results are easily
implemented and quickly executed. Basically, the task of regression, given
a set of so-called basis functions $b_d$, $d \in \{1, \ldots, D\}$, is to find optimal
parameters $\alpha_1^*, \ldots, \alpha_D^*$ according to Equation 9-1, where $y_i \equiv f(x_i)$ for
$i \in \{1, \ldots, I\}$ observation points. The $x_i$ are considered independent
observations and the $y_i$ dependent observations (in a functional or statistical
sense).
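Equation 9-1. Minimization problem of regression
$$\min_{\alpha_1, \ldots, \alpha_D} \frac{1}{I} \sum_{i=1}^{I} \left( y_i - \sum_{d=1}^{D} \alpha_d \cdot b_d(x_i) \right)^2$$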
Monomials as basis functions
One of the simplest cases is to take monomials as basis functions, i.e.,
$b_1 = 1$, $b_2 = x$, $b_3 = x^2$, $b_4 = x^3$, and so on. In such a case, NumPy has
built-in functions for both the determination of the optimal parameters
(namely, polyfit) and the evaluation of the approximation given a set of
input values (namely, polyval).
Table 9-1 lists the parameters the polyfit function takes. Given the
returned optimal regression coefficients p from polyfit, np.polyval(p, x)
then returns the regression values for the x coordinates.
Table 9-1. Parameters of polyfit function
Parameter  Description
x          x coordinates (independent variable values)
y          y coordinates (dependent variable values)
deg        Degree of the fitting polynomial
full       If True, returns diagnostic information in addition
w          Weights to apply to the y coordinates
cov        If True, covariance matrix is also returned
In typical vectorized fashion, the application of polyfit and polyval takes
on the following form for a linear regression (i.e., for deg=1):
In [5]: reg = np.polyfit(x, f(x), deg=1)
        ry = np.polyval(reg, x)
Given the regression estimates stored in the ry array, we can compare the
regression result with the original function as presented in Figure 9-2. Of
course, a linear regression cannot account for the sin part of the example
function:
In [6]: plt.plot(x, f(x), 'b', label='f(x)')
        plt.plot(x, ry, 'r.', label='regression')
        plt.legend(loc=0)
        plt.grid(True)
        plt.xlabel('x')
        plt.ylabel('f(x)')
Figure 9-2. Example function and linear regression
To account for the sin part of the example function, higher-order
monomials are necessary. The next regression attempt takes monomials up
to the order of 5 as basis functions. It should not be too surprising that the
regression result, as seen in Figure 9-3, now looks much closer to the
original function. However, it is still far away from being perfect:
In [7]: reg = np.polyfit(x, f(x), deg=5)
        ry = np.polyval(reg, x)
In [8]: plt.plot(x, f(x), 'b', label='f(x)')
        plt.plot(x, ry, 'r.', label='regression')
        plt.legend(loc=0)
        plt.grid(True)
        plt.xlabel('x')
        plt.ylabel('f(x)')

Figure 9-3. Regression with monomials up to order 5
The last attempt takes monomials up to order 7 to approximate the example
function. In this case the result, as presented in Figure 9-4, is quite
convincing:
In [9]: reg = np.polyfit(x, f(x), 7)
        ry = np.polyval(reg, x)
In [10]: plt.plot(x, f(x), 'b', label='f(x)')
         plt.plot(x, ry, 'r.', label='regression')
         plt.legend(loc=0)
         plt.grid(True)
         plt.xlabel('x')
         plt.ylabel('f(x)')
Figure 9-4. Regression with monomials up to order 7
A brief check reveals that the result is not perfect:
In [11]: np.allclose(f(x), ry)
Out[11]: False
However, the mean squared error (MSE) is not too large, at least over this
narrow range of x values:
In [12]: np.sum((f(x) - ry) ** 2) / len(x)
Out[12]: 0.0017769134759517413
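The effect of the polynomial degree on the fit quality can be made explicit with a short loop. The following is a minimal sketch, written for Python 3 and assuming only NumPy; the helper name mse_for_degree is hypothetical and does not appear in the original text.

```python
import numpy as np


def f(x):
    # example function from the text: trigonometric term plus linear term
    return np.sin(x) + 0.5 * x


def mse_for_degree(deg, num=50):
    """MSE of a polyfit/polyval regression of f on [-2*pi, 2*pi] (hypothetical helper)."""
    x = np.linspace(-2 * np.pi, 2 * np.pi, num)
    coeffs = np.polyfit(x, f(x), deg=deg)   # optimal monomial coefficients
    ry = np.polyval(coeffs, x)              # regression values at the x points
    return np.mean((f(x) - ry) ** 2)


for deg in (1, 5, 7):
    print(deg, mse_for_degree(deg))
```

The error should shrink as the degree grows, which mirrors the visual impression from the three regression figures.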
Individual basis functions
In general, you can reach better regression results when you can choose
better sets of basis functions, e.g., by exploiting knowledge about the
function to approximate. In this case, the individual basis functions have to
be defined via a matrix approach (i.e., using a NumPy ndarray object). First,
the case with monomials up to order 3:
In [13]: matrix = np.zeros((3 + 1, len(x)))
         matrix[3, :] = x ** 3
         matrix[2, :] = x ** 2
         matrix[1, :] = x
         matrix[0, :] = 1
The sublibrary numpy.linalg provides the function lstsq to solve least-
squares optimization problems like the one in Equation 9-1:
In [14]: reg = np.linalg.lstsq(matrix.T, f(x))[0]
Applying lstsq to our problem in this way yields the optimal parameters
for the single basis functions:
In [15]: reg
Out[15]: array([  1.13968447e-14,   5.62777448e-01,  -8.88178420e-16,
                 -5.43553615e-03])
To get the regression estimates we apply the dot function to the reg and
matrix arrays. Figure 9-5 shows the result. np.dot(a, b) simply gives the
dot product for the two arrays a and b:
In [16]: ry = np.dot(reg, matrix)
In [17]: plt.plot(x, f(x), 'b', label='f(x)')
         plt.plot(x, ry, 'r.', label='regression')
         plt.legend(loc=0)
         plt.grid(True)
         plt.xlabel('x')
         plt.ylabel('f(x)')
Figure 9-5. Regression via least-squares function
The result in Figure 9-5 is not really as good as expected based on our
previous experience with monomials. Using the more general approach
allows us to exploit our knowledge about the example function. We know
that there is a sin part in the function. Therefore, it makes sense to include
a sine function in the set of basis functions. For simplicity, we replace the
highest-order monomial:
In [18]: matrix[3, :] = np.sin(x)
         reg = np.linalg.lstsq(matrix.T, f(x))[0]
         ry = np.dot(reg, matrix)
Figure 9-6 illustrates that the regression is now pretty close to the original
function:
In [19]: plt.plot(x, f(x), 'b', label='f(x)')
         plt.plot(x, ry, 'r.', label='regression')
         plt.legend(loc=0)
         plt.grid(True)
         plt.xlabel('x')
         plt.ylabel('f(x)')
Figure 9-6. Regression using individual functions
Indeed, the regression now is "perfect" in a numerical sense:
In [20]: np.allclose(f(x), ry)
Out[20]: True
In [21]: np.sum((f(x) - ry) ** 2) / len(x)
Out[21]: 2.2749084503102031e-31
In fact, the minimization routine recovers the correct parameters of 1 for
the sin part and 0.5 for the linear part:
In [22]: reg
Out[22]: array([  1.55428020e-16,   5.00000000e-01,   0.00000000e+00,
                  1.00000000e+00])
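The matrix approach with an individual basis function can be condensed into a few lines. This is a minimal sketch for Python 3, assuming a recent NumPy; the column layout of the basis array and the explicit rcond=None argument are choices made here, not taken from the original listing.

```python
import numpy as np


def f(x):
    return np.sin(x) + 0.5 * x


x = np.linspace(-2 * np.pi, 2 * np.pi, 50)

# one column per basis function: 1, x, x**2, sin(x)
basis = np.column_stack([np.ones_like(x), x, x ** 2, np.sin(x)])

# rcond=None selects NumPy's current default cutoff for small singular values
coeffs, residuals, rank, sv = np.linalg.lstsq(basis, f(x), rcond=None)

ry = basis @ coeffs          # regression values
print(coeffs)
print(np.allclose(f(x), ry))
```

Because the example function lies exactly in the span of the chosen basis functions, the coefficients for x and sin(x) should come out as 0.5 and 1 up to floating-point noise.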
Noisy data
Regression can cope equally well with noisy data, be it data from
simulation or from (non-perfect) measurements. To illustrate this point, let
us generate both independent observations with noise and also dependent
observations with noise:
In [23]: xn = np.linspace(-2 * np.pi, 2 * np.pi, 50)
         xn = xn + 0.15 * np.random.standard_normal(len(xn))
         yn = f(xn) + 0.25 * np.random.standard_normal(len(xn))
The regression itself is the same:
In [24]: reg = np.polyfit(xn, yn, 7)
         ry = np.polyval(reg, xn)
Figure 9-7 reveals that the regression results are closer to the original
function than the noisy data points. In a sense, the regression averages out
the noise to some extent:
In [25]: plt.plot(xn, yn, 'b^', label='f(x)')
         plt.plot(xn, ry, 'ro', label='regression')
         plt.legend(loc=0)
         plt.grid(True)
         plt.xlabel('x')
         plt.ylabel('f(x)')
Figure 9-7. Regression with noisy data
Unsorted data
Another important aspect of regression is that the approach also works
seamlessly with unsorted data. The previous examples all rely on sorted x
data. This does not have to be the case. To make the point, let us randomize
the independent data points as follows:
In [26]: xu = np.random.rand(50) * 4 * np.pi - 2 * np.pi
         yu = f(xu)
In this case, you can hardly identify any structure by just visually
inspecting the raw data:
In [27]: print xu[:10].round(2)
         print yu[:10].round(2)
Out[27]: [ 4.09  0.5   1.48 -1.85  1.65  4.51 -5.7   1.83  4.42 -4.2 ]
         [ 1.23  0.72  1.74 -1.89  1.82  1.28 -2.3   1.88  1.25 -1.23]
As with the noisy data, the regression approach does not care for the order
of the observation points. This becomes obvious upon inspecting the
structure of the minimization problem in Equation 9-1. It is also obvious by
the results, as presented in Figure 9-8:
In [28]: reg = np.polyfit(xu, yu, 5)
         ry = np.polyval(reg, xu)
In [29]: plt.plot(xu, yu, 'b^', label='f(x)')
         plt.plot(xu, ry, 'ro', label='regression')
         plt.legend(loc=0)
         plt.grid(True)
         plt.xlabel('x')
         plt.ylabel('f(x)')
Figure 9-8. Regression with unsorted data
Multiple dimensions
Another convenient characteristic of the least-squares regression approach
is that it carries over to multiple dimensions without too many
modifications. As an example function we take fm, as presented next:
In [30]: def fm((x, y)):
             return np.sin(x) + 0.25 * x + np.sqrt(y) + 0.05 * y ** 2
To visualize this function, we need a grid of (independent) data points:
In [31]: x = np.linspace(0, 10, 20)
         y = np.linspace(0, 10, 20)
         X, Y = np.meshgrid(x, y)
           # generates 2-d grids out of the 1-d arrays
         Z = fm((X, Y))
         x = X.flatten()
         y = Y.flatten()
           # yields 1-d arrays from the 2-d grids
Based on the grid of independent and dependent data points as embodied
now by X, Y, Z, Figure 9-9 presents the shape of the function fm:
In [32]: from mpl_toolkits.mplot3d import Axes3D
         import matplotlib as mpl
         fig = plt.figure(figsize=(9, 6))
         ax = fig.gca(projection='3d')
         surf = ax.plot_surface(X, Y, Z, rstride=2, cstride=2,
                 cmap=mpl.cm.coolwarm,
                 linewidth=0.5, antialiased=True)
         ax.set_xlabel('x')
         ax.set_ylabel('y')
         ax.set_zlabel('f(x, y)')
         fig.colorbar(surf, shrink=0.5, aspect=5)
Figure 9-9. Function with two parameters
To get good regression results we compile a set of basis functions,
including both a sin and a sqrt function, which leverages our knowledge
of the example function:
In [33]: matrix = np.zeros((len(x), 6 + 1))
         matrix[:, 6] = np.sqrt(y)
         matrix[:, 5] = np.sin(x)
         matrix[:, 4] = y ** 2
         matrix[:, 3] = x ** 2
         matrix[:, 2] = y
         matrix[:, 1] = x
         matrix[:, 0] = 1
The statsmodels library offers the quite general and helpful function OLS
for least-squares regression both in one dimension and multiple
dimensions:[33]
In [34]: import statsmodels.api as sm
In [35]: model = sm.OLS(fm((x, y)), matrix).fit()
One advantage of using the OLS function is that it provides a wealth of
additional information about the regression and its quality. A summary of
the results is accessed by calling model.summary(). Single statistics, like the
coefficient of determination, can in general also be accessed directly:
In [36]: model.rsquared
Out[36]: 1.0
For our purposes, we of course need the optimal regression parameters,
which are stored in the params attribute of our model object:
In [37]: a = model.params
         a
Out[37]: array([  7.14706072e-15,   2.50000000e-01,  -2.22044605e-16,
                 -1.02348685e-16,   5.00000000e-02,   1.00000000e+00,
                  1.00000000e+00])
The function reg_func gives back, for the given optimal regression
parameters and the independent data points, the function values for the
regression function:
In [38]: def reg_func(a, (x, y)):
             f6 = a[6] * np.sqrt(y)
             f5 = a[5] * np.sin(x)
             f4 = a[4] * y ** 2
             f3 = a[3] * x ** 2
             f2 = a[2] * y
             f1 = a[1] * x
             f0 = a[0] * 1
             return (f6 + f5 + f4 + f3 +
                     f2 + f1 + f0)
These values can then be compared with the original shape of the example
function, as shown in Figure 9-10:
In [39]: RZ = reg_func(a, (X, Y))
In [40]: fig = plt.figure(figsize=(9, 6))
         ax = fig.gca(projection='3d')
         surf1 = ax.plot_surface(X, Y, Z, rstride=2, cstride=2,
                     cmap=mpl.cm.coolwarm, linewidth=0.5,
                     antialiased=True)
         surf2 = ax.plot_wireframe(X, Y, RZ, rstride=2, cstride=2,
                                   label='regression')
         ax.set_xlabel('x')
         ax.set_ylabel('y')
         ax.set_zlabel('f(x, y)')
         ax.legend()
         fig.colorbar(surf1, shrink=0.5, aspect=5)
Figure 9-10. Higher-dimension regression
REGRESSION
Least-squares regression approaches have multiple areas of
application, including simple function approximation and function
approximation based on noisy or unsorted data. These approaches can
be applied to single as well as multidimensional problems. Due to the
underlying mathematics, the application is always "almost the same."
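To illustrate the "almost the same" point, the two-dimensional regression can also be run through np.linalg.lstsq instead of statsmodels. This is a sketch for Python 3 under that substitution; the function signature fm(x, y) and the column order of the basis array are assumptions made here for readability.

```python
import numpy as np


def fm(x, y):
    # two-parameter example function from the text
    return np.sin(x) + 0.25 * x + np.sqrt(y) + 0.05 * y ** 2


X, Y = np.meshgrid(np.linspace(0, 10, 20), np.linspace(0, 10, 20))
x, y = X.ravel(), Y.ravel()      # 1-d arrays from the 2-d grids

# columns: 1, x, y, x**2, y**2, sin(x), sqrt(y)
basis = np.column_stack([np.ones_like(x), x, y, x ** 2, y ** 2,
                         np.sin(x), np.sqrt(y)])
coeffs, *_ = np.linalg.lstsq(basis, fm(x, y), rcond=None)

RZ = (basis @ coeffs).reshape(X.shape)   # regression surface on the grid
print(coeffs.round(6))
```

Only the construction of the basis matrix changes relative to the one-dimensional case; the solver call stays identical.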
Interpolation
Compared to regression, interpolation (e.g., with cubic splines) is much
more involved mathematically. It is also limited to low-dimensional
problems. Given an ordered set of observation points (ordered in the x
dimension), the basic idea is to do a regression between two neighboring
data points in such a way that not only are the data points perfectly matched
by the resulting, piecewise-defined interpolation function, but also that
the function is continuously differentiable at the data points. Continuous
differentiability requires at least interpolation of degree 3, i.e., with cubic
splines. However, the approach also works in general with quadratic and
even linear splines. First, the importing of the respective sublibrary:
In [41]: import scipy.interpolate as spi
In [42]: x = np.linspace(-2 * np.pi, 2 * np.pi, 25)
We take again the original example function for illustration purposes:
In [43]: def f(x):
             return np.sin(x) + 0.5 * x
The application itself, given an x-ordered set of data points, is as simple as
the application of polyfit and polyval. Here, the respective functions are
splrep and splev. Table 9-2 lists the major parameters that the splrep
function takes.
Table 9-2. Parameters of splrep function
Parameter    Description
x            (Ordered) x coordinates (independent variable values)
y            (x-ordered) y coordinates (dependent variable values)
w            Weights to apply to the y coordinates
xb, xe       Interval to fit, if None [x[0], x[-1]]
k            Order of the spline fit (1 <= k <= 5)
s            Smoothing factor (the larger, the more smoothing)
full_output  If True additional output is returned
quiet        If True suppress messages
Table 9-3 lists the parameters that the splev function takes.
Table 9-3. Parameters of splev function
Parameter  Description
x          (Ordered) x coordinates (independent variable values)
tck        Sequence of length 3 returned by splrep (knots, coefficients, degree)
der        Order of derivative (0 for function, 1 for first derivative)
ext        Behavior if x not in knot sequence (0 extrapolate, 1 return 0, 2 raise
           ValueError)
Applied to the current example, this translates into the following:
In [44]: ipo = spi.splrep(x, f(x), k=1)
In [45]: iy = spi.splev(x, ipo)
As Figure 9-11 shows, the interpolation already seems really good with
linear splines (i.e., k=1):
In [46]: plt.plot(x, f(x), 'b', label='f(x)')
         plt.plot(x, iy, 'r.', label='interpolation')
         plt.legend(loc=0)
         plt.grid(True)
         plt.xlabel('x')
         plt.ylabel('f(x)')

Figure 9-11. Example plot with linear interpolation
This can be confirmed numerically:
In [47]: np.allclose(f(x), iy)
Out[47]: True
Spline interpolation is often used in finance to generate estimates for
dependent values of independent data points not included in the original
observations. To this end, let us pick a much smaller interval and have a
closer look at the interpolated values with the linear splines:
In [48]: xd = np.linspace(1.0, 3.0, 50)
         iyd = spi.splev(xd, ipo)
Figure 9-12 reveals that the interpolation function indeed interpolates
linearly between two observation points. For certain applications this might
not be precise enough. In addition, it is evident that the function is not
continuously differentiable at the original data points, which is another drawback:
In [49]: plt.plot(xd, f(xd), 'b', label='f(x)')
         plt.plot(xd, iyd, 'r.', label='interpolation')
         plt.legend(loc=0)
         plt.grid(True)
         plt.xlabel('x')
         plt.ylabel('f(x)')
Figure 9-12. Example plot (detail) with linear interpolation
Therefore, let us repeat the complete exercise, this time using cubic splines:
In [50]: ipo = spi.splrep(x, f(x), k=3)
         iyd = spi.splev(xd, ipo)
Now, the detailed subinterval in Figure 9-13 shows a graphically perfect
interpolation:
In [51]: plt.plot(xd, f(xd), 'b', label='f(x)')
         plt.plot(xd, iyd, 'r.', label='interpolation')
         plt.legend(loc=0)
         plt.grid(True)
         plt.xlabel('x')
         plt.ylabel('f(x)')
Figure 9-13. Example plot (detail) with cubic spline interpolation
Numerically, the interpolation is not perfect, but the MSE is really small:
In [52]: np.allclose(f(xd), iyd)
Out[52]: False
In [53]: np.sum((f(xd) - iyd) ** 2) / len(xd)
Out[53]: 1.1349319851436252e-08
INTERPOLATION
In those cases where spline interpolation can be applied you can
expect better approximation results compared to a least-squares
regression approach. However, remember that you need to have sorted
(and "nonnoisy") data and that the approach is limited to low-
dimensional problems. It is also computationally more demanding and
might therefore take (much) longer than regression in certain use
cases.
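The difference between linear and cubic splines on the detail interval can be quantified directly. The following is a minimal Python 3 sketch, assuming SciPy is installed; the dictionary mse is a hypothetical container introduced only for this comparison.

```python
import numpy as np
import scipy.interpolate as spi


def f(x):
    return np.sin(x) + 0.5 * x


x = np.linspace(-2 * np.pi, 2 * np.pi, 25)   # observation points
xd = np.linspace(1.0, 3.0, 50)               # points between the observations

mse = {}
for k in (1, 3):
    # without weights, splrep uses s=0, i.e., pure interpolation
    tck = spi.splrep(x, f(x), k=k)
    iyd = spi.splev(xd, tck)
    mse[k] = np.mean((f(xd) - iyd) ** 2)

print(mse)
```

The cubic spline error should be several orders of magnitude below the linear one, in line with the visual impression from the two detail plots.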
Convex Optimization
In finance and economics, convex optimization plays an important role.
Examples are the calibration of option pricing models to market data or the
optimization of an agent's utility. As an example function that we want to
minimize, we take fm, as defined in the following:
In [54]: def fm((x, y)):
             return (np.sin(x) + 0.05 * x ** 2
                   + np.sin(y) + 0.05 * y ** 2)
In [55]: x = np.linspace(-10, 10, 50)
         y = np.linspace(-10, 10, 50)
         X, Y = np.meshgrid(x, y)
         Z = fm((X, Y))
Figure 9-14 shows the function graphically for the defined intervals for x
and y. Visual inspection already reveals that this function has multiple local
minima. The existence of a global minimum cannot really be confirmed by
this particular graphical representation:
In [56]: fig = plt.figure(figsize=(9, 6))
         ax = fig.gca(projection='3d')
         surf = ax.plot_surface(X, Y, Z, rstride=2, cstride=2,
                 cmap=mpl.cm.coolwarm,
                 linewidth=0.5, antialiased=True)
         ax.set_xlabel('x')
         ax.set_ylabel('y')
         ax.set_zlabel('f(x, y)')
         fig.colorbar(surf, shrink=0.5, aspect=5)

Figure 9-14. Function to minimize with two parameters
In what follows, we want to implement both a global minimization
approach and a local one. The functions brute and fmin that we want to
use can be found in the sublibrary scipy.optimize:
In [57]: import scipy.optimize as spo
Global Optimization
To have a closer look behind the scenes when we initiate the minimization
procedures, we amend the original function by an option to output current
parameter values as well as the function value:
In [58]: def fo((x, y)):
             z = np.sin(x) + 0.05 * x ** 2 + np.sin(y) + 0.05 * y ** 2
             if output == True:
                 print '%8.4f %8.4f %8.4f' % (x, y, z)
             return z
This allows us to keep track of all relevant information for the procedure, as
the following code with its respective output illustrates. brute takes the
parameter ranges as input. For example, providing parameter range (-10,
10.1, 5) for the x value will lead to "tested" values of -10, -5, 0, 5,
10:
In [59]: output = True
         spo.brute(fo, ((-10, 10.1, 5), (-10, 10.1, 5)), finish=None)
Out[59]: -10.0000 -10.0000  11.0880
         -10.0000 -10.0000  11.0880
         -10.0000  -5.0000   7.7529
         -10.0000   0.0000   5.5440
         -10.0000   5.0000   5.8351
         -10.0000  10.0000  10.0000
          -5.0000 -10.0000   7.7529
          -5.0000  -5.0000   4.4178
          -5.0000   0.0000   2.2089
          -5.0000   5.0000   2.5000
          -5.0000  10.0000   6.6649
           0.0000 -10.0000   5.5440
           0.0000  -5.0000   2.2089
           0.0000   0.0000   0.0000
           0.0000   5.0000   0.2911
           0.0000  10.0000   4.4560
           5.0000 -10.0000   5.8351
           5.0000  -5.0000   2.5000
           5.0000   0.0000   0.2911
           5.0000   5.0000   0.5822
           5.0000  10.0000   4.7471
          10.0000 -10.0000  10.0000
          10.0000  -5.0000   6.6649
          10.0000   0.0000   4.4560
          10.0000   5.0000   4.7471
          10.0000  10.0000   8.9120
         array([ 0.,  0.])
The optimal parameter values, given the initial parameterization of the
function, are x = y = 0. The resulting function value is also 0, as a quick
review of the preceding output reveals. The first parameterization here is
quite rough, in that we used steps of width 5 for both input parameters. This
can of course be refined considerably, leading to better results in this case:
In [60]: output = False
         opt1 = spo.brute(fo, ((-10, 10.1, 0.1), (-10, 10.1, 0.1)),
                          finish=None)
         opt1
Out[60]: array([-1.4, -1.4])
In [61]: fm(opt1)
Out[61]: -1.7748994599769203
The optimal parameter values are now x = y = -1.4 and the minimal
function value for the global minimization is about -1.7749.
Local Optimization
For the local convex optimization we want to draw on the results from the
global optimization. The function fmin takes as input the function to
minimize and the starting parameter values. In addition, you can define
levels for the input parameter tolerance and the function value tolerance, as
well as for the maximum number of iterations and function calls:
In [62]: output = True
         opt2 = spo.fmin(fo, opt1, xtol=0.001, ftol=0.001,
                         maxiter=15, maxfun=20)
         opt2
Out[62]:  -1.4000  -1.4000  -1.7749
          -1.4700  -1.4000  -1.7743
          -1.4000  -1.4700  -1.7743
          -1.3300  -1.4700  -1.7696
          -1.4350  -1.4175  -1.7756
          -1.4350  -1.3475  -1.7722
          -1.4088  -1.4394  -1.7755
          -1.4438  -1.4569  -1.7751
          -1.4328  -1.4427  -1.7756
          -1.4591  -1.4208  -1.7752
          -1.4213  -1.4347  -1.7757
          -1.4235  -1.4096  -1.7755
          -1.4305  -1.4344  -1.7757
          -1.4168  -1.4516  -1.7753
          -1.4305  -1.4260  -1.7757
          -1.4396  -1.4257  -1.7756
          -1.4259  -1.4325  -1.7757
          -1.4259  -1.4241  -1.7757
          -1.4304  -1.4177  -1.7757
          -1.4270  -1.4288  -1.7757
         Warning: Maximum number of function evaluations has been
         exceeded.
         array([-1.42702972, -1.42876755])
Again, we can observe a refinement of the solution and a somewhat lower
function value:
In [63]: fm(opt2)
Out[63]: -1.7757246992239009
For many convex optimization problems it is advisable to have a global
minimization before the local one. The major reason for this is that local
convex optimization algorithms can easily be trapped in a local minimum
(or do "basin hopping"), ignoring completely "better" local minima and/or
a global minimum. The following shows that setting the starting
parameterization to x = y = 2 gives a "minimum" value of above zero:
In [64]: output = False
         spo.fmin(fo, (2.0, 2.0), maxiter=250)
Out[64]: Optimization terminated successfully.
                  Current function value: 0.015826
                  Iterations: 46
                  Function evaluations: 86
         array([ 4.2710728 ,  4.27106945])
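The two-step procedure (coarse global search, then local refinement) can be sketched compactly. The code below is written for Python 3, so the function takes a single parameter vector p instead of a tuple argument; the use of slice objects for the grid and the tighter tolerances are choices made here, not taken from the original listings.

```python
import numpy as np
import scipy.optimize as spo


def fm(p):
    x, y = p
    return np.sin(x) + 0.05 * x ** 2 + np.sin(y) + 0.05 * y ** 2


# global step: evaluate fm on a grid with step 0.1, no polishing afterwards
grid = (slice(-10, 10.1, 0.1), slice(-10, 10.1, 0.1))
opt1 = spo.brute(fm, grid, finish=None)

# local step: Nelder-Mead simplex search started from the grid minimum
opt2 = spo.fmin(fm, opt1, xtol=1e-6, ftol=1e-6, disp=False)

# a poorly chosen starting point ends up in a different local minimum
opt_bad = spo.fmin(fm, (2.0, 2.0), disp=False)

print(opt1, fm(opt1))
print(opt2, fm(opt2))
print(opt_bad, fm(opt_bad))
```

Comparing the last two lines of output shows why the starting values matter: the local routine only explores the basin it starts in.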
Constrained Optimization
So far, we have only considered unconstrained optimization problems.
However, large classes of economic or financial optimization problems are
constrained by one or multiple constraints. Such constraints can formally
take on the form of equations or inequalities.
As a simple example, consider the utility maximization problem of an
(expected utility maximizing) investor who can invest in two risky
securities. Both securities cost $q_a = q_b = 10$ today. After one year, they have
a payoff of 15 USD and 5 USD, respectively, in state u, and of 5 USD and
12 USD, respectively, in state d. Both states are equally likely. Denote the
vector payoffs for the two securities by $r_a$ and $r_b$, respectively.
The investor has a budget of $w_0 = 100$ USD to invest and derives utility
from future wealth according to the utility function $u(w) = \sqrt{w}$, where w is
the wealth (USD amount) available. Equation 9-2 is a formulation of the
maximization problem where a, b are the numbers of securities bought by
the investor.
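Equation 9-2. Expected utility maximizing problem
$$\max_{a,b} \mathbf{E}(u(w_1)) = p \sqrt{w_{1u}} + (1 - p) \sqrt{w_{1d}}$$
$$w_1 = a r_a + b r_b$$
$$w_0 \geq a q_a + b q_b$$
$$a, b \geq 0$$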
Putting in all numerical assumptions, we get the problem in Equation 9-3.
Note that we also change to the minimization of the negative expected
utility.
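Equation 9-3. Expected utility maximizing problem
$$\min_{a,b} -\mathbf{E}(u(w_1)) = -\left(0.5 \cdot \sqrt{w_{1u}} + 0.5 \cdot \sqrt{w_{1d}}\right)$$
$$w_{1u} = a \cdot 15 + b \cdot 5$$
$$w_{1d} = a \cdot 5 + b \cdot 12$$
$$100 \geq a \cdot 10 + b \cdot 10$$
$$a, b \geq 0$$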
To solve this problem, we use the scipy.optimize.minimize function.
This function takes as input, in addition to the function to be minimized,
equations and inequalities (as a list of dict objects) as well as boundaries
for the parameters (as a tuple of tuple objects).[34] We can translate the
problem from Equation 9-3 into the following code:
In [65]: # function to be minimized
         from math import sqrt
         def Eu((s, b)):
             return -(0.5 * sqrt(s * 15 + b * 5) + 0.5 * sqrt(s * 5 +
         b * 12))

         # constraints
         cons = ({'type': 'ineq', 'fun': lambda (s, b): 100 - s * 10
         - b * 10})
           # budget constraint
         bnds = ((0, 1000), (0, 1000))  # upper bounds large enough
We have everything we need to use the minimize function; we just have
to add an initial guess for the optimal parameters:
In [66]: result = spo.minimize(Eu, [5, 5], method='SLSQP',
                                bounds=bnds, constraints=cons)
In [67]: result
Out[67]:   status: 0
          success: True
             njev: 5
             nfev: 21
              fun: -9.700883611487832
                x: array([ 8.02547122,  1.97452878])
          message: 'Optimization terminated successfully.'
              jac: array([-0.48508096, -0.48489535,  0.        ])
              nit: 5
The function returns a dict object. The optimal parameters can be read out
as follows:
In [68]: result['x']
Out[68]: array([ 8.02547122, 1.97452878])
The optimal function value is (changing the sign again):
In [69]: -result['fun']
Out[69]: 9.700883611487832
Given the parameterization for the simple model, it is optimal for the
investor to buy about eight units of security a and about two units of
security b. The budget constraint is binding; i.e., the investor invests his/her
total wealth of 100 USD into the securities. This is easily verified through
taking the dot product of the optimal parameter vector and the price vector:
In [70]: np.dot(result['x'], [10, 10])
Out[70]: 99.999999999999986
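For readers on Python 3, where tuple parameters in function signatures and lambdas are no longer allowed, the same constrained problem can be sketched as follows. The function name neg_expected_utility and the parameter vector p are naming choices made here; the numbers are those given in the text.

```python
import numpy as np
import scipy.optimize as spo


def neg_expected_utility(p):
    a, b = p
    # negative expected square-root utility over the two equally likely states
    return -(0.5 * np.sqrt(15 * a + 5 * b) + 0.5 * np.sqrt(5 * a + 12 * b))


# budget constraint: 100 - 10*a - 10*b >= 0
cons = ({'type': 'ineq', 'fun': lambda p: 100 - 10 * p[0] - 10 * p[1]},)
bnds = ((0, 1000), (0, 1000))   # upper bounds large enough

result = spo.minimize(neg_expected_utility, [5, 5], method='SLSQP',
                      bounds=bnds, constraints=cons)

print(result.x)                      # optimal numbers of securities a, b
print(-result.fun)                   # maximal expected utility
print(np.dot(result.x, [10, 10]))    # invested budget
```

The invested budget printed in the last line should be (numerically) 100, which confirms that the budget constraint is binding.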
Integration
Especially when it comes to valuation and option pricing, integration is an
important mathematical tool. This stems from the fact that risk-neutral
values of derivatives can be expressed in general as the discounted
expectation of their payoff under the risk-neutral (martingale) measure. The
expectation in turn is a sum in the discrete case and an integral in the
continuous case. The sublibrary scipy.integrate provides different
functions for numerical integration:
In [71]: import scipy.integrate as sci
Again, we stick to the example function comprised of a sin component and
a linear one:
In [72]: def f(x):
             return np.sin(x) + 0.5 * x
We are interested in the integral over the interval [0.5, 9.5]; i.e., the integral
as in Equation 9-4.
Equation 9-4. Integral of example function
$$\int_{0.5}^{9.5} f(x) \, dx = \int_{0.5}^{9.5} \left( \sin(x) + 0.5 x \right) dx$$
Figure 9-15 provides a graphical representation of the integral with a plot of
the function f(x) = sin(x) + 0.5x:
In [73]: a = 0.5  # left integral limit
         b = 9.5  # right integral limit
         x = np.linspace(0, 10)
         y = f(x)
In [74]: from matplotlib.patches import Polygon
         fig, ax = plt.subplots(figsize=(7, 5))
         plt.plot(x, y, 'b', linewidth=2)
         plt.ylim(ymin=0)
         # area under the function
         # between lower and upper limit
         Ix = np.linspace(a, b)
         Iy = f(Ix)
         verts = [(a, 0)] + list(zip(Ix, Iy)) + [(b, 0)]
         poly = Polygon(verts, facecolor='0.7', edgecolor='0.5')
         ax.add_patch(poly)
         # labels
         plt.text(0.75 * (a + b), 1.5, r"$\int_a^b f(x)dx$",
                  horizontalalignment='center', fontsize=20)
         plt.figtext(0.9, 0.075, '$x$')
         plt.figtext(0.075, 0.9, '$f(x)$')
         ax.set_xticks((a, b))
         ax.set_xticklabels(('$a$', '$b$'))
         ax.set_yticks([f(a), f(b)])

Figure 9-15. Example function with integral area
Numerical Integration
The integrate sublibrary contains a selection of functions to numerically
integrate a given mathematical function given upper and lower integration
limits. Examples are fixed_quad for fixed Gaussian quadrature, quad for
adaptive quadrature, and romberg for Romberg integration:
In [75]: sci.fixed_quad(f, a, b)[0]
Out[75]: 24.366995967084588
In [76]: sci.quad(f, a, b)[0]
Out[76]: 24.374754718086752
In [77]: sci.romberg(f, a, b)
Out[77]: 24.374754718086713
There are also a number of integration functions that take as input list or
ndarray objects with function values and input values. Examples in this
regard are trapz, using the trapezoidal rule, and simps, implementing
Simpson's rule:
In [78]: xi = np.linspace(0.5, 9.5, 25)
In [79]: sci.trapz(f(xi), xi)
Out[79]: 24.352733271544516
In [80]: sci.simps(f(xi), xi)
Out[80]: 24.374964184550748
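Since the example function has a closed-form antiderivative, the numerical results can be checked against the exact value. The following Python 3 sketch assumes only NumPy and scipy.integrate.quad; the antiderivative F and the hand-written trapezoidal rule are additions made here for illustration and are not part of the original listings.

```python
import numpy as np
import scipy.integrate as sci


def f(x):
    return np.sin(x) + 0.5 * x


def F(x):
    # antiderivative of f: -cos(x) + 0.25 * x**2
    return -np.cos(x) + 0.25 * x ** 2


a, b = 0.5, 9.5
exact = F(b) - F(a)

# adaptive quadrature; quad returns the value and an error estimate
value, abserr = sci.quad(f, a, b)

# trapezoidal rule written out by hand on 25 sample points
xi = np.linspace(a, b, 25)
yi = f(xi)
trapezoid = np.sum((yi[1:] + yi[:-1]) / 2 * np.diff(xi))

print(exact, value, trapezoid)
```

The adaptive routine should agree with the exact value to many digits, while the sample-based trapezoidal rule on 25 points is visibly less accurate.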
Integration by Simulation
The valuation of options and derivatives by Monte Carlo simulation (cf.
Chapter 10) rests on the insight that you can evaluate an integral by
simulation. To this end, draw I random values of x between the integral
limits and evaluate the integration function at every random value of x.
Sum up all the function values and take the average to arrive at an average
function value over the integration interval. Multiply this value by the
length of the integration interval to derive an estimate for the integral value.
The following code shows how the Monte Carlo estimated integral value
converges to the real one when one increases the number of random draws.
The estimator is already quite close for really small numbers of random
draws:
In [81]: for i in range(1, 20):
             np.random.seed(1000)
             x = np.random.random(i * 10) * (b - a) + a
             print np.sum(f(x)) / len(x) * (b - a)
Out[81]: 24.8047622793
26.5229188983
26.2655475192
26.0277033994
24.9995418144
23.8818101416
23.5279122748
23.507857659
23.6723674607
23.6794104161
24.4244017079
24.2390053468
24.115396925
24.4241919876
23.9249330805
24.1948421203
24.1173483782
24.1006909297
23.7690510985
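The averaging recipe described above fits into a small reusable function. This is a minimal Python 3 sketch using NumPy's Generator interface; the function name mc_integral, the seed, and the sample sizes are illustrative assumptions.

```python
import numpy as np


def f(x):
    return np.sin(x) + 0.5 * x


def mc_integral(func, a, b, n, rng):
    """Monte Carlo estimate of the integral of func over [a, b] (hypothetical helper)."""
    x = rng.uniform(a, b, n)            # n random draws between the limits
    return func(x).mean() * (b - a)     # average value times interval length


rng = np.random.default_rng(1000)
for n in (100, 10000, 1000000):
    print(n, mc_integral(f, 0.5, 9.5, n, rng))
```

The statistical error of such an estimator shrinks roughly with the square root of the number of draws, so each additional digit of accuracy costs about a hundred times more samples.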
SymbolicComputation
Theprevioussectionsaremainlyconcernedwithnumericalcomputation.
Thissectionnowintroducessymboliccomputation,whichcanbeapplied
beneficiallyinmanyareasoffinance.Tothisend,letus import SymPy, the
libraryspecificallydedicatedtosymboliccomputation:
In [82]: import sympy as sy
Basics
SymPy introducesnewclassesofobjects.Afundamentalclassis the Symbol
class:
In [83]: yx = sy.Symbol('y')
sy.Symbol('x')
In [84]: type(x)
Out[84]: sympy.core.symbol.Symbol
NumPy,SymPy hasanumberof(mathematical)functiondefinitions.
LikeForexample:
In [85]: sy.sqrt(x)
Out[85]: sqrt(x)
Thisalreadyi l ustrates amajordifference.Although
value,thesquarerootof xis x hasnonumerical
neverthelessdefinedwith SymPy since xisa
In
Symbol object. thatsense, sy.sqrt(x)SymPycanbepartofarbitrary
mathematicalexpressions.Noticethat ingeneralautomatically
simplifiesagivenmathematicalexpression:
In [86]: 3 + sy.sqrt(x) - 4 ** 2
Out[86]: sqrt(x) - 13
arenottobeconfusedwith Python functions: Symbol objects.They
Similarly,youcandefinearbitraryfunctionsusing
In [87]: f = x ** 2 + 3 + 0.5 * x ** 2 + 3 / 2
In [88]: 1.5*x**2
Out[88]: sy.simplify(f)
+
4
SymPy provides three basic renderers for mathematical expressions:
LaTeX-based
Unicode-based
ASCII-based
When working, for example, solely in the IPython Notebook, LaTeX
rendering is generally a good (i.e., visually appealing) choice. In what
follows, we stick to the simplest option, ASCII, to illustrate that there is no
handmade typesetting involved:
In [89]: sy.init_printing(pretty_print=False, use_unicode=False)
In [90]: print(sy.pretty(f))
Out[90]:      2
         1.5*x  + 4
As you can see from the output, multiple lines are used whenever needed.
Also, for example, see the following for the visual representation of the
square-root function:
In [91]: print(sy.pretty(sy.sqrt(x) + 0.5))
Out[91]:   ___
         \/ x  + 0.5
We cannot go into details here, but SymPy also provides many other useful
mathematical functions, for example, when it comes to numerically
evaluating π. The following shows the first 40 characters of the string
representation of π up to the 400,000th digit:
In [92]: pi_str = str(sy.N(sy.pi, 400000))
pi_str[:40]
Out[92]: '3.14159265358979323846264338327950288419'
And here are the last 40 digits of the first 400,000:
In [93]: pi_str[-40:]
Out[93]: '8245672736856312185020980470362464176198'
You can also look up your birthday if you wish; however, there is no
guarantee of a hit:
In [94]: pi_str.find('111272')
Out[94]: 366713
Equations
A strength of SymPy is solving equations, e.g., of the form $x^2 - 1 = 0$:
In [95]: sy.solve(x ** 2 - 1)
Out[95]: [-1, 1]
In general, SymPy presumes that you are looking for a solution to the
equation obtained by equating the given expression to zero. Therefore,
equations like $x^2 - 1 = 3$ might have to be reformulated to get the desired
result:
In [96]: sy.solve(x ** 2 - 1 - 3)
Out[96]: [-2, 2]
Of course, SymPy can cope with more complex expressions, like
$x^3 + 0.5x^2 - 1 = 0$:
In [97]: sy.solve(x ** 3 + 0.5 * x ** 2 - 1)
Out[97]: [0.858094329496553, -0.679047164748276 -
0.839206763026694*I,
-0.679047164748276 + 0.839206763026694*I]
However, there is obviously no guarantee of a solution, either from a
mathematical point of view (i.e., the existence of a solution) or from an
algorithmic point of view (i.e., an implementation).
SymPy works similarly with functions exhibiting more than one input
parameter, and to this end also with complex numbers. As a simple
example take the equation $x^2 + y^2 = 0$:
In [98]: sy.solve(x ** 2 + y ** 2)
Out[98]: [{x: -I*y}, {x: I*y}]
Integration
Another strength of SymPy is integration and differentiation. In what
follows, we revisit the example function used for numerical- and
simulation-based integration and derive now both a symbolic and a
numerically exact solution. We need symbols for the integration limits:
In [99]: a, b = sy.symbols('a b')
Having defined the new symbols, we can "pretty print" the symbolic
integral:
In [100]: print(sy.pretty(sy.Integral(sy.sin(x) + 0.5 * x, (x, a, b))))
Out[100]:   b
            /
           |
           |  (0.5*x + sin(x)) dx
           |
          /
          a
Using integrate, we can then derive the antiderivative of the integration
function:
In [101]: int_func = sy.integrate(sy.sin(x) + 0.5 * x, x)
In [102]: print(sy.pretty(int_func))
Out[102]:       2
          0.25*x  - cos(x)
Equipped with the antiderivative, the numerical evaluation of the integral is
only three steps away. To numerically evaluate a SymPy expression, replace
the respective symbol with the numerical value using the method subs and
call the method evalf on the new expression:
In [103]: Fb = int_func.subs(x, 9.5).evalf()
          Fa = int_func.subs(x, 0.5).evalf()
The difference between Fb and Fa then yields the exact integral value:
In [104]: Fb - Fa # exact value of integral
Out[104]: 24.3747547180867
The integral can also be solved symbolically with the symbolic integration
limits:
In [105]: int_func_limits = sy.integrate(sy.sin(x) + 0.5 * x, (x, a, b))
          print(sy.pretty(int_func_limits))
Out[105]:         2         2
          - 0.25*a  + 0.25*b  + cos(a) - cos(b)
As before, numerical substitution (this time using a dict object for
multiple substitutions) and evaluation then yields the integral value:
In [106]: int_func_limits.subs({a : 0.5, b : 9.5}).evalf()
Out[106]: 24.3747547180868
Finally, providing quantified integration limits yields the exact value in a
single step:
In [107]: sy.integrate(sy.sin(x) + 0.5 * x, (x, 0.5, 9.5))
Out[107]: 24.3747547180867
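The three routes to the integral value can be compared in one place. The sketch below is written for Python 3 and a recent SymPy, which is an assumption (the listings in this chapter use Python 2 syntax); the variable names are illustrative only.

```python
import sympy as sy

x, a, b = sy.symbols('x a b')
integrand = sy.sin(x) + 0.5 * x

# Route 1: antiderivative, then substitute both limits.
antiderivative = sy.integrate(integrand, x)
via_antiderivative = (antiderivative.subs(x, 9.5)
                      - antiderivative.subs(x, 0.5)).evalf()

# Route 2: symbolic limits, substituted afterwards with a dict.
symbolic_limits = sy.integrate(integrand, (x, a, b))
via_limits = symbolic_limits.subs({a: 0.5, b: 9.5}).evalf()

# Route 3: numerical limits passed directly.
direct = sy.integrate(integrand, (x, 0.5, 9.5))

# Differentiating the antiderivative should recover the integrand.
residual = sy.simplify(sy.diff(antiderivative, x) - integrand)

print(via_antiderivative, via_limits, direct, residual)
```

All three values should agree up to floating-point rounding in the last digits, which is the small discrepancy visible between Out[104] and Out[106].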
Differentiation
The derivative of the antiderivative shall yield in general the original
function. Let us check this by applying the diff function to the symbolic
antiderivative from before:
In [108]: int_func.diff()
Out[108]: 0.5*x + sin(x)
As with the integration example, we want to use differentiation now to
derive the exact solution of the convex minimization problem we looked at
earlier. To this end, we define the respective function symbolically as
follows:
In [109]: f = (sy.sin(x) + 0.05 * x ** 2
             + sy.sin(y) + 0.05 * y ** 2)
For the minimization, we need the two partial derivatives with respect to
both variables, x and y:
In [110]: del_x = sy.diff(f, x)
          del_x
Out[110]: 0.1*x + cos(x)
In [111]: del_y = sy.diff(f, y)
          del_y
Out[111]: 0.1*y + cos(y)
A necessary but not sufficient condition for a global minimum is that both
partial derivatives are zero. As stated before, there is no guarantee of a
symbolic solution. Both algorithmic and (multiple) existence issues come
into play here. However, we can solve the two equations numerically,
providing "educated" guesses based on the global and local minimization
efforts from before:
In [112]: xo = sy.nsolve(del_x, -1.5)
          xo
Out[112]: mpf('-1.4275517787645941')
In [113]: yo = sy.nsolve(del_y, -1.5)
          yo
Out[113]: mpf('-1.4275517787645941')
In [114]: f.subs({x : xo, y : yo}).evalf()
            # global minimum
Out[114]: -1.77572565314742
Again, providing uneducated/arbitrary guesses might lead the algorithm to
a different stationary point instead of the global minimum:
In [115]: xo = sy.nsolve(del_x, 1.5)
          xo
Out[115]: mpf('1.7463292822528528')
In [116]: yo = sy.nsolve(del_y, 1.5)
          yo
Out[116]: mpf('1.7463292822528528')
In [117]: f.subs({x : xo, y : yo}).evalf()
            # stationary point, but not a minimum
Out[117]: 2.27423381055640
This numerically illustrates that zero partial derivatives are necessary but
not sufficient. At this second point the second derivatives, 0.1 - sin(x) and
0.1 - sin(y), are both negative, so the point is in fact a local maximum.
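A second-derivative check makes the nature of each stationary point explicit. This is a minimal sketch in Python 3 with a recent SymPy (an assumption, as is the three-argument form `nsolve(expr, symbol, guess)` used here); it is an illustration added for this section, not part of the original listing. Because the function separates in x and y, the mixed derivative is zero and the two pure second derivatives are enough.

```python
import sympy as sy

x, y = sy.symbols('x y')
f = sy.sin(x) + 0.05 * x ** 2 + sy.sin(y) + 0.05 * y ** 2
del_x, del_y = sy.diff(f, x), sy.diff(f, y)
del_xx, del_yy = sy.diff(f, x, 2), sy.diff(f, y, 2)

results = {}
for guess in (-1.5, 1.5):
    # Solve each first-order condition numerically from the given guess.
    xo = sy.nsolve(del_x, x, guess)
    yo = sy.nsolve(del_y, y, guess)
    point = {x: xo, y: yo}
    # Store function value and both second derivatives at the point.
    results[guess] = (float(f.subs(point)),
                      float(del_xx.subs(point)),
                      float(del_yy.subs(point)))
    print(guess, results[guess])
```

Positive second derivatives indicate a local minimum, negative ones a local maximum; the first-order conditions alone cannot distinguish the two.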
SYMBOLIC COMPUTATIONS
When doing mathematics with Python, you should always think of
SymPy and symbolic computations. Especially for interactive financial
analytics, this can be a more efficient approach compared to
non-symbolic approaches.
Conclusions
This chapter covers some mathematical topics and tools important to
finance. For example, the approximation of functions is important in many
financial areas, like yield curve interpolation and regression-based Monte
Carlo valuation approaches for American options. Convex optimization
techniques are also regularly needed in finance; for example, when
calibrating parametric option pricing models to market quotes or implied
volatilities of options.
Numerical integration is, for example, central to the pricing of options and
derivatives. Having derived the risk-neutral probability measure for a (set
of) stochastic process(es), option pricing boils down to taking the
expectation of the option's payoff under the risk-neutral measure and
discounting this value back to the present date. Chapter 10 covers the
simulation of several types of stochastic processes under the risk-neutral
measure.
Finally, this chapter introduces symbolic computation with SymPy. For a
number of mathematical operations, like integration, differentiation, or the
solving of equations, symbolic computation can prove a really useful and
efficient tool.
Further Reading
For further information on the Python libraries used in this chapter, you
should consult the following web resources:
See http://docs.scipy.org/doc/numpy/reference/ for all functions used
from NumPy.
The statsmodels library is documented here:
http://statsmodels.sourceforge.net.
Visit http://docs.scipy.org/doc/scipy/reference/optimize.html for details
on scipy.optimize.
Integration with scipy.integrate is explained here:
http://docs.scipy.org/doc/scipy/reference/integrate.html.
The home of SymPy is http://sympy.org.
For a good reference to the mathematical topics covered, see:
Brandimarte, Paolo (2006): Numerical Methods in Finance and
Economics, 2nd ed. John Wiley & Sons, Hoboken, NJ.
[33] For details on the use of OLS, refer to the documentation.
[34] For details and examples of how to use the minimize function, refer to the
documentation.
Chapter 10. Stochastics
Predictability is not how things will go, but how they can go.
Raheel Farooq
Nowadays, stochastics is one of the most important mathematical and
numerical disciplines in finance. In the beginning of the modern era of
finance, mainly in the 1970s and 1980s, the major goal of financial research
was to come up with closed-form solutions for, e.g., option prices given a
specific financial model. The requirements have drastically changed in
recent years in that not only is the correct valuation of single financial
instruments important to participants in the financial markets, but also the
consistent valuation of whole derivatives books, for example. Similarly, to
come up with consistent risk measures across a whole financial institution,
like value-at-risk and credit value adjustments, one needs to take into
account the whole book of the institution and all its counterparties. Such
daunting tasks can only be tackled by flexible and efficient numerical
methods. Therefore, stochastics in general and Monte Carlo simulation in
particular have risen to prominence.
This chapter introduces the following topics from a Python perspective:
Random number generation
It all starts with (pseudo)random numbers, which build the basis for all
simulation efforts; although quasirandom numbers, e.g., based on Sobol
sequences, have gained some popularity in finance, pseudorandom
numbers still seem to be the benchmark.
Simulation
In finance, two simulation tasks are of particular importance: simulation
of random variables and of stochastic processes.
Valuation
The two main disciplines when it comes to valuation are the valuation
of derivatives with European exercise (at a specific date) and American
exercise (over a specific time interval); there are also instruments with
Bermudan exercise, or exercise at a finite set of specific dates.
Risk measures
Simulation lends itself pretty well to the calculation of risk measures
like value-at-risk, credit value-at-risk, and credit value adjustments.
Random Numbers
Throughout this chapter, to generate random numbers[35] we will work with
the functions provided by the numpy.random sublibrary:
In [1]: import numpy as np
        import numpy.random as npr
        import matplotlib.pyplot as plt
        %matplotlib inline
For example, the rand function returns random numbers from the half-open
interval [0, 1) in the shape provided as a parameter to the function. The
return object is an ndarray object:
In [2]: npr.rand(10)
Out[2]: array([ 0.2729951 ,  0.40628966,  0.43098644,  0.9435419 ,  0.26760198,
                0.67519064,  0.41349754,  0.3585647 ,  0.07450132,  0.95130158])
In [3]: npr.rand(5, 5)
Out[3]: array([[ 0.87263851,  0.8143348 ,  0.34154499,  0.56695052,  0.60645041],
               [ 0.39398181,  0.71671577,  0.63568321,  0.88085228,  0.93526172],
               [ 0.12632038,  0.35793789,  0.04241014,  0.61652708,  0.54260211],
               [ 0.14503456,  0.32939077,  0.28834351,  0.4050322 ,  0.21120017],
               [ 0.45345805,  0.29771411,  0.67157606,  0.73563706,  0.48003387]])
Such numbers can be easily transformed to cover other intervals of the real
line. For instance, if you want to generate random numbers from the
interval [a, b) = [5, 10), you can transform the returned numbers from rand as
follows:
In [4]: a = 5.
        b = 10.
        npr.rand(10) * (b - a) + a
Out[4]: array([ 7.27123881,  6.51309437,  7.51380629,  7.84258434,  8.10776244,
                7.62199611,  8.86229349,  6.78202851,  6.33248656,  9.48668419])
This also works for multidimensional shapes due to NumPy broadcasting:
In [5]: npr.rand(5, 5) * (b - a) + a
Out[5]: array([[ 6.65649828,  6.51657569,  9.7912274 ,  8.93721206,  6.66937996],
               [ 8.97919481,  8.27547365,  5.00975386,  8.99797249,  6.05374605],
               [ 7.50268777,  8.43810167,  9.33608096,  8.5513646 ,  5.53651748],
               [ 7.04179874,  6.98111966,  8.42677435,  6.22325043,  6.39226557],
               [ 9.88334499,  7.59597546,  5.93724861,  5.39285822,  5.28435207]])
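The affine transformation can be sanity-checked quickly. The sketch below is a Python 3 illustration that assumes NumPy's newer `default_rng` interface instead of the `npr.rand` function used in the text; the seed is arbitrary. It also shows `uniform`, which draws from [low, high) directly.

```python
import numpy as np

a, b = 5.0, 10.0
rng = np.random.default_rng(0)  # arbitrary seed

# Draw from [0, 1) and map onto [a, b) by scaling and shifting.
u = rng.random((5, 5))
scaled = u * (b - a) + a

# Equivalent direct draw from the half-open interval [a, b).
direct = rng.uniform(a, b, size=(5, 5))

print(scaled.min(), scaled.max())
print(direct.min(), direct.max())
```

Scaling by (b - a) stretches the unit interval to the desired length and adding a shifts its left edge, so the lower bound stays inclusive and the upper bound exclusive.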
Table 10-1 lists functions for generating simple random numbers.[36]
Table 10-1. Functions for simple random number generation
Function         Parameters              Description
rand             d0, d1, ..., dn         Randoms in the given shape
randn            d0, d1, ..., dn         A sample (or samples) from the standard normal distribution
randint          low[, high, size]       Random integers from low (inclusive) to high (exclusive)
random_integers  low[, high, size]       Random integers between low and high, inclusive
random_sample    [size]                  Random floats in the half-open interval [0.0, 1.0)
random           [size]                  Random floats in the half-open interval [0.0, 1.0)
ranf             [size]                  Random floats in the half-open interval [0.0, 1.0)
sample           [size]                  Random floats in the half-open interval [0.0, 1.0)
choice           a[, size, replace, p]   Random sample from a given 1D array
bytes            length                  Random bytes
Let us visualize some random draws generated by selected functions from
Table 10-1:
In [6]: sample_size = 500
        rn1 = npr.rand(sample_size, 3)
        rn2 = npr.randint(0, 10, sample_size)
        rn3 = npr.sample(size=sample_size)
        a = [0, 25, 50, 75, 100]
        rn4 = npr.choice(a, size=sample_size)
Figure 10-1 shows the results graphically for two continuous distributions
and two discrete ones:
In [7]: fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(nrows=2, ncols=2,
                                                     figsize=(7, 7))
ax1.hist(rn1, bins=25, stacked=True)
ax1.set_title('rand')
ax1.set_ylabel('frequency')
ax1.grid(True)
ax2.hist(rn2, bins=25)
ax2.set_title('randint')
ax2.grid(True)
ax3.hist(rn3, bins=25)
ax3.set_title('sample')
ax3.set_ylabel('frequency')
ax3.grid(True)
ax4.hist(rn4, bins=25)
ax4.set_title('choice')
ax4.grid(True)
Figure 10-1. Simple pseudorandom numbers
Table 10-2 lists functions for generating random numbers according to
different distributions.[37]
Table 10-2. Functions to generate random numbers according to
different distribution laws
Function              Parameters                     Description
beta                  a, b[, size]                   Samples for beta distribution over [0, 1]
binomial              n, p[, size]                   Samples from a binomial distribution
chisquare             df[, size]                     Samples from a chi-square distribution
dirichlet             alpha[, size]                  Samples from the Dirichlet distribution
exponential           [scale, size]                  Samples from the exponential distribution
f                     dfnum, dfden[, size]           Samples from an F distribution
gamma                 shape[, scale, size]           Samples from a gamma distribution
geometric             p[, size]                      Samples from the geometric distribution
gumbel                [loc, scale, size]             Samples from a Gumbel distribution
hypergeometric        ngood, nbad, nsample[, size]   Samples from a hypergeometric distribution
laplace               [loc, scale, size]             Samples from the Laplace or double exponential distribution
logistic              [loc, scale, size]             Samples from a logistic distribution
lognormal             [mean, sigma, size]            Samples from a log-normal distribution
logseries             p[, size]                      Samples from a logarithmic series distribution
multinomial           n, pvals[, size]               Samples from a multinomial distribution
multivariate_normal   mean, cov[, size]              Samples from a multivariate normal distribution
negative_binomial     n, p[, size]                   Samples from a negative binomial distribution
noncentral_chisquare  df, nonc[, size]               Samples from a noncentral chi-square distribution
noncentral_f          dfnum, dfden, nonc[, size]     Samples from the noncentral F distribution
normal                [loc, scale, size]             Samples from a normal (Gaussian) distribution
pareto                a[, size]                      Samples from a Pareto II or Lomax distribution with specified shape
poisson               [lam, size]                    Samples from a Poisson distribution
power                 a[, size]                      Samples in [0, 1] from a power distribution with positive exponent a - 1
rayleigh              [scale, size]                  Samples from a Rayleigh distribution
standard_cauchy       [size]                         Samples from standard Cauchy distribution with mode = 0
standard_exponential  [size]                         Samples from the standard exponential distribution
standard_gamma        shape[, size]                  Samples from a standard gamma distribution
standard_normal       [size]                         Samples from a standard normal distribution (mean=0, stdev=1)
standard_t            df[, size]                     Samples from a Student's t distribution with df degrees of freedom
triangular            left, mode, right[, size]      Samples from the triangular distribution
uniform               [low, high, size]              Samples from a uniform distribution
vonmises              mu, kappa[, size]              Samples from a von Mises distribution
wald                  mean, scale[, size]            Samples from a Wald, or inverse Gaussian, distribution
weibull               a[, size]                      Samples from a Weibull distribution
zipf                  a[, size]                      Samples from a Zipf distribution
Although there is much criticism around the use of (standard) normal
distributions in finance, they are an indispensable tool and still the most
widely used type of distribution, in analytical as well as numerical
applications. One reason is that many financial models directly rest in one
way or another on a normal distribution or a log-normal distribution.
Another reason is that many financial models that do not rest directly on a
(log-)normal assumption can be discretized, and therewith approximated
for simulation purposes, by the use of the normal distribution.
As an illustration, we want to visualize random draws from the following
distributions:
Standard normal with mean of 0 and standard deviation of 1
Normal with mean of 100 and standard deviation of 20
Chi square with 0.5 degrees of freedom
Poisson with lambda of 1
We do this as follows:
In [8]: sample_size = 500
        rn1 = npr.standard_normal(sample_size)
        rn2 = npr.normal(100, 20, sample_size)
        rn3 = npr.chisquare(df=0.5, size=sample_size)
        rn4 = npr.poisson(lam=1.0, size=sample_size)
Figure 10-2 shows the results for the three continuous distributions and the
discrete one (Poisson). The Poisson distribution is used, for example, to
simulate the arrival of (rare) external events, like a jump in the price of an
instrument or an exogenic shock. Here is the code that generates it:
In [9]: fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(nrows=2, ncols=2,
                                                     figsize=(7, 7))
        ax1.hist(rn1, bins=25)
        ax1.set_title('standard normal')
        ax1.set_ylabel('frequency')
        ax1.grid(True)
        ax2.hist(rn2, bins=25)
        ax2.set_title('normal(100, 20)')
        ax2.grid(True)
        ax3.hist(rn3, bins=25)
        ax3.set_title('chi square')
        ax3.set_ylabel('frequency')
        ax3.grid(True)
        ax4.hist(rn4, bins=25)
        ax4.set_title('Poisson')
        ax4.grid(True)
Figure 10-2. Pseudorandom numbers from different distributions
Simulation
Monte Carlo simulation (MCS) is among the most important numerical
techniques in finance, if not the most important and widely used. This
mainly stems from the fact that it is the most flexible numerical method
when it comes to the evaluation of mathematical expressions (e.g.,
integrals), and specifically the valuation of financial derivatives. The
flexibility comes at the cost of a relatively high computational burden,
though, since often hundreds of thousands or even millions of complex
computations have to be carried out to come up with a single value
estimate.
Random Variables
Consider, for example, the Black-Scholes-Merton setup for option pricing
(cf. also Chapter 3). In their setup, the level of a stock index $S_T$ at a future
date T given a level $S_0$ as of today is given according to Equation 10-1.
Equation 10-1. Simulating future index level in Black-Scholes-Merton setup
$$S_T = S_0 \exp\left(\left(r - \tfrac{1}{2}\sigma^2\right) T + \sigma \sqrt{T}\, z\right)$$
The variables and parameters have the following meaning:
$S_T$      Index level at date T
$r$        Constant riskless short rate
$\sigma$   Constant volatility (= standard deviation of returns) of S
$z$        Standard normally distributed random variable
This simple financial model is easily parameterized and simulated as
follows:
In [10]: S0 = 100  # initial value
         r = 0.05  # constant short rate
         sigma = 0.25  # constant volatility
         T = 2.0  # in years
         I = 10000  # number of random draws
         ST1 = S0 * np.exp((r - 0.5 * sigma ** 2) * T
                      + sigma * np.sqrt(T) * npr.standard_normal(I))
The output of this simulation code is shown in Figure 10-3:
In [11]: plt.hist(ST1, bins=50)
         plt.xlabel('index level')
         plt.ylabel('frequency')
         plt.grid(True)
Figure 10-3. Simulated geometric Brownian motion (via standard_normal)
Figure 10-3 suggests that the distribution of the random variable as defined
in Equation 10-1 is log-normal. We could therefore also try to use the
lognormal function to directly derive the values for the random variable. In
that case, we have to provide the mean and the standard deviation to the
function:
In [12]: ST2 = S0 * npr.lognormal((r - 0.5 * sigma ** 2) * T,
                                 sigma * np.sqrt(T), size=I)
Figure 10-4 shows the output of the following simulation code:
In [13]: plt.hist(ST2, bins=50)
         plt.xlabel('index level')
         plt.ylabel('frequency')
         plt.grid(True)
Figure 10-4. Simulated geometric Brownian motion (via lognormal)
By visual inspection, Figure 10-4 and Figure 10-3 indeed look pretty
similar. But let us verify this more rigorously by comparing statistical
moments of the resulting distributions.
To compare the distributional characteristics of simulation results we use
the scipy.stats sublibrary and the helper function print_statistics, as
defined here:
In [14]: import scipy.stats as scs
In [15]: def print_statistics(a1, a2):
             ''' Prints selected statistics.

             Parameters
             ==========
             a1, a2 : ndarray objects
                 results object from simulation
             '''
             sta1 = scs.describe(a1)
             sta2 = scs.describe(a2)
             print("%14s %14s %14s" %
                   ('statistic', 'data set 1', 'data set 2'))
             print(45 * "-")
             print("%14s %14.3f %14.3f" % ('size', sta1[0], sta2[0]))
             print("%14s %14.3f %14.3f" % ('min', sta1[1][0], sta2[1][0]))
             print("%14s %14.3f %14.3f" % ('max', sta1[1][1], sta2[1][1]))
             print("%14s %14.3f %14.3f" % ('mean', sta1[2], sta2[2]))
             print("%14s %14.3f %14.3f" % ('std', np.sqrt(sta1[3]),
                                           np.sqrt(sta2[3])))
             print("%14s %14.3f %14.3f" % ('skew', sta1[4], sta2[4]))
             print("%14s %14.3f %14.3f" % ('kurtosis', sta1[5], sta2[5]))
In [16]: print_statistics(ST1, ST2)
Out[16]:      statistic     data set 1     data set 2
         ---------------------------------------------
                   size      10000.000      10000.000
                    min         27.936         27.266
                    max        410.795        358.997
                   mean        110.442        110.528
                    std         39.932         40.894
                   skew          1.082          1.150
               kurtosis          1.927          2.273
Obviously, the statistics of both simulation results are quite similar. The
differences are mainly due to what is called the sampling error in
simulation. Error can also be introduced when discretely simulating
continuous stochastic processes (namely the discretization error), which
plays no role here due to the static nature of the simulation approach.
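The sampling error can be made visible by comparing the sample mean with the known risk-neutral expectation of the index level, S0 * exp(r * T). The following is a minimal Python 3 sketch using the parameter values from the example above; the seed, the sample sizes, and the use of NumPy's `default_rng` are assumptions made for illustration.

```python
import numpy as np

S0, r, sigma, T = 100.0, 0.05, 0.25, 2.0
theoretical_mean = S0 * np.exp(r * T)  # risk-neutral expectation of S_T

rng = np.random.default_rng(42)  # arbitrary seed
results = {}
for I in (100, 10000, 1000000):
    z = rng.standard_normal(I)
    ST = S0 * np.exp((r - 0.5 * sigma ** 2) * T + sigma * np.sqrt(T) * z)
    # Standard error of the sample mean: sample std divided by sqrt(I).
    std_error = ST.std(ddof=1) / np.sqrt(I)
    results[I] = (ST.mean(), std_error)
    print(I, ST.mean(), std_error, theoretical_mean)
```

The standard error falls by a factor of about 10 for every 100-fold increase in the number of draws, which is the usual square-root behavior of Monte Carlo estimators.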
Stochastic Processes
Roughly speaking, a stochastic process is a sequence of random variables.
In that sense, we should expect something similar to a sequence of repeated
simulations of a random variable when simulating a process. This is mainly
true, apart from the fact that the draws are in general not independent but
rather depend on the result(s) of the previous draw(s). In general, however,
stochastic processes used in finance exhibit the Markov property, which
mainly says that tomorrow's value of the process only depends on today's
state of the process, and not any other more "historic" state or even the
whole path history. The process then is also called memoryless.
Geometric Brownian motion
Consider now the Black-Scholes-Merton model in its dynamic form, as
described by the stochastic differential equation (SDE) in Equation 10-2.
Here, $Z_t$ is a standard Brownian motion. The SDE is called a geometric
Brownian motion. The values of $S_t$ are log-normally distributed and the
(marginal) returns $dS_t / S_t$ normally.
Equation 10-2. Stochastic differential equation in Black-Scholes-Merton
setup
$$dS_t = r S_t\, dt + \sigma S_t\, dZ_t$$
The SDE in Equation 10-2 can be discretized exactly by an Euler scheme.
Such a scheme is presented in Equation 10-3, with $\Delta t$ being the fixed
discretization interval and $z_t$ being a standard normally distributed random
variable.
Equation 10-3. Simulating index levels dynamically in Black-Scholes-
Merton setup
$$S_t = S_{t - \Delta t} \exp\left(\left(r - \tfrac{1}{2}\sigma^2\right) \Delta t + \sigma \sqrt{\Delta t}\, z_t\right)$$
As before, translation into Python and NumPy code is straightforward:
In [17]: I = 10000
         M = 50
         dt = T / M
         S = np.zeros((M + 1, I))
         S[0] = S0
         for t in range(1, M + 1):
             S[t] = S[t - 1] * np.exp((r - 0.5 * sigma ** 2) * dt
                     + sigma * np.sqrt(dt) * npr.standard_normal(I))
The resulting end values for the index level are log-normally distributed
again, as Figure 10-5 illustrates:
In [18]: plt.hist(S[-1], bins=50)
         plt.xlabel('index level')
         plt.ylabel('frequency')
         plt.grid(True)
Figure 10-5. Simulated geometric Brownian motion at maturity
The first four moments are also quite close to those resulting from the static
simulation approach:
In [19]: print_statistics(S[-1], ST2)
Out[19]:      statistic     data set 1     data set 2
         ---------------------------------------------
                   size      10000.000      10000.000
                    min         25.531         27.266
                    max        425.051        358.997
                   mean        110.900        110.528
                    std         40.135         40.894
                   skew          1.086          1.150
               kurtosis          2.224          2.273
Figure 10-6 shows the first 10 simulated paths:
In [20]: plt.plot(S[:, :10], lw=1.5)
         plt.xlabel('time')
         plt.ylabel('index level')
         plt.grid(True)
Figure 10-6. Simulated geometric Brownian motion paths
Using the dynamic simulation approach not only allows us to visualize
paths as displayed in Figure 10-6, but also to value options with
American/Bermudan exercise or options whose payoff is path-dependent.
You get the full dynamic picture, so to say.
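To illustrate why full paths matter, the sketch below values a path-dependent payoff next to a plain European one on the same simulated paths. It is an illustration only: the arithmetic-average (Asian) call, the strike K = 100, the seed, and the vectorized cumulative-sum construction of the paths are assumptions not taken from the text, and the code is written for Python 3 with NumPy's `default_rng`.

```python
import numpy as np

S0, r, sigma, T = 100.0, 0.05, 0.25, 2.0
M, I = 50, 10000
K = 100.0  # hypothetical strike
dt = T / M

rng = np.random.default_rng(7)  # arbitrary seed
z = rng.standard_normal((M, I))
# Log-increments of the exact scheme; cumulative sums give the log paths.
log_increments = (r - 0.5 * sigma ** 2) * dt + sigma * np.sqrt(dt) * z
S = np.empty((M + 1, I))
S[0] = S0
S[1:] = S0 * np.exp(np.cumsum(log_increments, axis=0))

discount = np.exp(-r * T)
# European call: payoff depends on the terminal value only.
european = discount * np.maximum(S[-1] - K, 0).mean()
# Asian call: payoff depends on the average level along each path.
asian = discount * np.maximum(S[1:].mean(axis=0) - K, 0).mean()
print(european, asian)
```

The European estimate only needs the last row of S, whereas the Asian estimate needs every row; a static simulation of the terminal value alone could not produce it.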
Square-root diffusion
Another important class of financial processes is mean-reverting processes,
which are used to model short rates or volatility processes, for example. A
popular and widely used model is the square-root diffusion, as proposed by
Cox, Ingersoll, and Ross (1985). Equation 10-4 provides the respective
SDE.
Equation 10-4. Stochastic differential equation for square-root diffusion
$$dx_t = \kappa (\theta - x_t)\, dt + \sigma \sqrt{x_t}\, dZ_t$$
The variables and parameters have the following meaning:
$x_t$      Process level at date t
$\kappa$   Mean-reversion factor
$\theta$   Long-term mean of the process
$\sigma$   Constant volatility parameter
$Z$        Standard Brownian motion
It is well known that the values of $x_t$ are chi-squared distributed. However,
as stated before, many financial models can be discretized and
approximated by using the normal distribution (i.e., a so-called Euler
discretization scheme). While the Euler scheme is exact for the geometric
Brownian motion, it is biased for the majority of other stochastic processes.
Even if there is an exact scheme available (one for the square-root
diffusion will be presented shortly), the use of an Euler scheme might be
desirable due to numerical and/or computational reasons. Defining
$s \equiv t - \Delta t$ and $x^+ \equiv \max(x, 0)$, Equation 10-5 presents such an Euler
scheme. This particular one is generally called full truncation in the
literature (cf. Hilpisch (2015)).
Equation 10-5. Euler discretization for square-root diffusion
$$\tilde{x}_t = \tilde{x}_s + \kappa (\theta - \tilde{x}_s^+) \Delta t + \sigma \sqrt{\tilde{x}_s^+} \sqrt{\Delta t}\, z_t$$
$$x_t = \tilde{x}_t^+$$
We parameterize the model for the simulations to follow with values that
could represent those of a short rate model:
In [21]: x0 = 0.05
         kappa = 3.0
         theta = 0.02
         sigma = 0.1
The square-root diffusion has the convenient and realistic characteristic that
the values of $x_t$ remain strictly positive. When discretizing it by an Euler
scheme, negative values cannot be excluded. That is the reason why one
works always with the positive version of the originally simulated process.
In the simulation code, one therefore needs two ndarray objects instead of
only one:
In [22]: I = 10000
         M = 50
         dt = T / M
         def srd_euler():
             xh = np.zeros((M + 1, I))
             x1 = np.zeros_like(xh)
             xh[0] = x0
             x1[0] = x0
             for t in range(1, M + 1):
                 xh[t] = (xh[t - 1]
                       + kappa * (theta - np.maximum(xh[t - 1], 0)) * dt
                       + sigma * np.sqrt(np.maximum(xh[t - 1], 0))
                       * np.sqrt(dt) * npr.standard_normal(I))
             x1 = np.maximum(xh, 0)
             return x1
         x1 = srd_euler()
Figure 10-7 shows the result of the simulation graphically as a histogram:
In [23]: plt.hist(x1[-1], bins=50)
         plt.xlabel('value')
         plt.ylabel('frequency')
         plt.grid(True)
Figure 10-7. Simulated square-root diffusion at maturity (Euler scheme)
Figure 10-8 then shows the first 10 simulated paths, illustrating the
resulting negative, average drift (due to $x_0 > \theta$) and the convergence to
$\theta = 0.02$:
In [24]: plt.plot(x1[:, :10], lw=1.5)
         plt.xlabel('time')
         plt.ylabel('index level')
         plt.grid(True)
Figure 10-8. Simulated square-root diffusion paths (Euler scheme)
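Let us now get more exact. Equation 10-6 presents the exact discretization
scheme for the square-root diffusion based on the noncentral chi-square
distribution $\chi'^2_d$ with $df = \frac{4 \theta \kappa}{\sigma^2}$ degrees of freedom and noncentrality
parameter $nc = \frac{4 \kappa e^{-\kappa \Delta t}}{\sigma^2 (1 - e^{-\kappa \Delta t})} x_s$.
Equation 10-6. Exact discretization for square-root diffusion
$$x_t = \frac{\sigma^2 (1 - e^{-\kappa \Delta t})}{4 \kappa}\, \chi'^2_d \left( \frac{4 \kappa e^{-\kappa \Delta t}}{\sigma^2 (1 - e^{-\kappa \Delta t})}\, x_s \right)$$
The Python implementation of this discretization scheme is a bit more
involved but still quite concise:
In [25]: def srd_exact():
             x2 = np.zeros((M + 1, I))
             x2[0] = x0
             for t in range(1, M + 1):
                 df = 4 * theta * kappa / sigma ** 2
                 c = (sigma ** 2 * (1 - np.exp(-kappa * dt))) / (4 * kappa)
                 nc = np.exp(-kappa * dt) / c * x2[t - 1]
                 x2[t] = c * npr.noncentral_chisquare(df, nc, size=I)
             return x2
         x2 = srd_exact()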
Figure 10-9 shows the output of the simulation with the exact scheme as a
histogram:
In [26]: plt.hist(x2[-1], bins=50)
         plt.xlabel('value')
         plt.ylabel('frequency')
         plt.grid(True)
Figure 10-9. Simulated square-root diffusion at maturity (exact scheme)
Figure 10-10 presents as before the first 10 simulated paths, again
displaying the negative average drift and the convergence to $\theta$:
In [27]: plt.plot(x2[:, :10], lw=1.5)
         plt.xlabel('time')
         plt.ylabel('index level')
         plt.grid(True)
Figure 10-10. Simulated square-root diffusion paths (exact scheme)
Comparing the main statistics from the different approaches reveals that the
biased Euler scheme indeed performs quite well when it comes to the
desired statistical properties:
In [28]: print_statistics(x1[-1], x2[-1])
Out[28]:      statistic     data set 1     data set 2
         ---------------------------------------------
                   size      10000.000      10000.000
                    min          0.004          0.005
                    max          0.049          0.050
                   mean          0.020          0.020
                    std          0.006          0.006
                   skew          0.529          0.503
               kurtosis          0.418          0.572
However, a major difference can be observed in terms of execution speed,
since sampling from the noncentral chi-square distribution is more
computationally demanding than from the standard normal distribution. To
illustrate this point, consider a larger number of paths to be simulated:
In [29]: I = 250000
         %time x1 = srd_euler()
Out[29]: CPU times: user 1.02 s, sys: 84 ms, total: 1.11 s
         Wall time: 1.11 s
In [30]: %time x2 = srd_exact()
Out[30]: CPU times: user 2.26 s, sys: 32 ms, total: 2.3 s
         Wall time: 2.3 s
The exact scheme takes roughly twice as much time for virtually the same
results as with the Euler scheme:
In [31]: print_statistics(x1[-1], x2[-1])
         x1 = 0.0; x2 = 0.0
Out[31]:      statistic     data set 1     data set 2
         ---------------------------------------------
                   size     250000.000     250000.000
                    min          0.003          0.004
                    max          0.069          0.060
                   mean          0.020          0.020
                    std          0.006          0.006
                   skew          0.554          0.502
               kurtosis          0.488          0.578
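Both schemes can also be checked against the known conditional mean of the square-root diffusion, E[x_T] = theta + (x0 - theta) * exp(-kappa * T), a standard result that is not stated in the text above. The sketch below is a compact Python 3 variant that keeps only the current time slice in memory; the function names, the seed, the number of paths, and T = 2.0 (assumed to carry over from the earlier examples) are illustrative assumptions.

```python
import numpy as np

x0, kappa, theta, sigma = 0.05, 3.0, 0.02, 0.1
T, M, I = 2.0, 50, 20000  # T assumed to carry over from earlier examples
dt = T / M
rng = np.random.default_rng(123)  # arbitrary seed


def srd_euler_final():
    # Full truncation Euler scheme; returns terminal values only.
    xh = np.full(I, x0)
    for _ in range(M):
        xp = np.maximum(xh, 0)
        xh = (xh + kappa * (theta - xp) * dt
              + sigma * np.sqrt(xp) * np.sqrt(dt) * rng.standard_normal(I))
    return np.maximum(xh, 0)


def srd_exact_final():
    # Exact scheme via the noncentral chi-square distribution.
    x = np.full(I, x0)
    df = 4 * theta * kappa / sigma ** 2
    c = sigma ** 2 * (1 - np.exp(-kappa * dt)) / (4 * kappa)
    for _ in range(M):
        nc = np.exp(-kappa * dt) / c * x
        x = c * rng.noncentral_chisquare(df, nc, size=I)
    return x


theoretical_mean = theta + (x0 - theta) * np.exp(-kappa * T)
euler_mean = srd_euler_final().mean()
exact_mean = srd_exact_final().mean()
print(theoretical_mean, euler_mean, exact_mean)
```

With these parameters the process has almost fully reverted by T, so all three numbers should sit very close to theta = 0.02.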
Stochastic volatility
One of the major simplifying assumptions of the Black-Scholes-Merton
model is the constant volatility. However, volatility in general is neither
constant nor deterministic; it is stochastic. Therefore, a major advancement
with regard to financial modeling was achieved in the early 1990s with the
introduction of so-called stochastic volatility models. One of the most
popular models that fall into that category is that of Heston (1993), which is
presented in Equation 10-7.
Equation 10-7. Stochastic differential equations for Heston stochastic
volatility model
$$dS_t = r S_t\, dt + \sqrt{v_t}\, S_t\, dZ_t^1$$
$$dv_t = \kappa_v (\theta_v - v_t)\, dt + \sigma_v \sqrt{v_t}\, dZ_t^2$$
$$dZ_t^1\, dZ_t^2 = \rho$$
The meaning of the single variables and parameters can now be inferred
easily from the discussion of the geometric Brownian motion and the
square-root diffusion. The parameter $\rho$ represents the instantaneous
correlation between the two standard Brownian motions $Z_t^1$, $Z_t^2$. This allows
us to account for a stylized fact called the leverage effect, which in essence
states that volatility goes up in times of stress (declining markets) and goes
down in times of a bull market (rising markets).
Consider the following parameterization of the model:
In [32]: S0 = 100.
         r = 0.05
         v0 = 0.1
         kappa = 3.0
         theta = 0.25
         sigma = 0.1
         rho = 0.6
         T = 1.0
To account for the correlation between the two stochastic processes, we
need to determine the Cholesky decomposition of the correlation matrix:
In [33]: corr_mat = np.zeros((2, 2))
         corr_mat[0, :] = [1.0, rho]
         corr_mat[1, :] = [rho, 1.0]
         cho_mat = np.linalg.cholesky(corr_mat)
In [34]: cho_mat
Out[34]: array([[ 1. ,  0. ],
                [ 0.6,  0.8]])
Before we start simulating the stochastic processes, we generate the whole
set of random numbers for both processes, looking to use set 0 for the index
process and set 1 for the volatility process:
In [35]: M = 50
         I = 10000
         ran_num = npr.standard_normal((2, M + 1, I))
For the volatility process modeled by the square-root diffusion process
type, we use the Euler scheme, taking into account the correlation
parameter:
In [36]: dt = T / M
         v = np.zeros_like(ran_num[0])
         vh = np.zeros_like(v)
         v[0] = v0
         vh[0] = v0
         for t in range(1, M + 1):
             ran = np.dot(cho_mat, ran_num[:, t, :])
             vh[t] = (vh[t - 1] + kappa * (theta - np.maximum(vh[t - 1], 0)) * dt
                   + sigma * np.sqrt(np.maximum(vh[t - 1], 0)) * np.sqrt(dt)
                   * ran[1])
         v = np.maximum(vh, 0)
For the index level process, we also take into account the correlation and
use the exact Euler scheme for the geometric Brownian motion:
In [37]: S = np.zeros_like(ran_num[0])
         S[0] = S0
         for t in range(1, M + 1):
             ran = np.dot(cho_mat, ran_num[:, t, :])
             S[t] = S[t - 1] * np.exp((r - 0.5 * v[t]) * dt +
                             np.sqrt(v[t]) * ran[0] * np.sqrt(dt))
This illustrates another advantage of working with the Euler scheme for the
square-root diffusion: correlation is easily and consistently accounted for
since we only draw standard normally distributed random numbers. There
is no simple way of achieving the same with a mixed approach, using Euler
for the index and the noncentral chi square-based exact approach for the
volatility process.
Figure 10-11 shows the simulation results as a histogram for both the index
level process and the volatility process:
In [38]: fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 5))
         ax1.hist(S[-1], bins=50)
         ax1.set_xlabel('index level')
         ax1.set_ylabel('frequency')
         ax1.grid(True)
         ax2.hist(v[-1], bins=50)
         ax2.set_xlabel('volatility')
         ax2.grid(True)
Figure 10-11. Simulated stochastic volatility model at maturity
An inspection of the first 10 simulated paths of each process (cf. Figure 10-
12) shows that the volatility process is drifting positively on average and
that it, as expected, converges to $\theta_v = 0.25$:
In [39]: fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=(7, 6))
         ax1.plot(S[:, :10], lw=1.5)
         ax1.set_ylabel('index level')
         ax1.grid(True)
         ax2.plot(v[:, :10], lw=1.5)
         ax2.set_xlabel('time')
         ax2.set_ylabel('volatility')
         ax2.grid(True)
Figure 10-12. Simulated stochastic volatility model paths
Finally, let us take a brief look at the statistics for the last point in time for
both data sets, showing a pretty high maximum value for the index level
process. In fact, this is much higher than a geometric Brownian motion with
constant volatility could ever climb, ceteris paribus:
In [40]: print_statistics(S[-1], v[-1])
Out[40]:      statistic     data set 1     data set 2
         ---------------------------------------------
                   size      10000.000      10000.000
                    min         19.814          0.174
                    max        600.080          0.322
                   mean        108.818          0.243
                    std         52.535          0.020
                   skew          1.702          0.151
               kurtosis          5.407          0.071
Jump diffusion
Stochastic volatility and the leverage effect are stylized (empirical) facts
found in a number of markets. Another important stylized empirical fact is
the existence of jumps in asset prices and, for example, volatility. In 1976,
Merton published his jump diffusion model, enhancing the Black-Scholes-
Merton setup by a model component generating jumps with log-normal
distribution. The risk-neutral SDE is presented in Equation 10-8.
Equation 10-8. Stochastic differential equation for Merton jump diffusion
model
\[ dS_t = (r - r_J)\, S_t\, dt + \sigma S_t\, dZ_t + J_t S_t\, dN_t \]
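For completeness, here is an overview of the variables' and parameters'
meaning:
S_t   Index level at date t
r     Constant riskless short rate
r_J   Drift correction for jump to maintain risk neutrality,
      r_J = λ · (e^(μ_J + δ²/2) − 1)
σ     Constant volatility of S
Z_t   Standard Brownian motion
J_t   Jump at date t with distribution log(1 + J_t) ≈ N(μ_J, δ²)
      (parameterization as in the simulation code that follows), with
      N as the cumulative distribution function of a standard normal
      random variable
N_t   Poisson process with intensity λ
Equation 10-9 presents an Euler discretization for the jump diffusion where
the z_t^n are standard normally distributed and the y_t are Poisson
distributed with intensity λ.
Equation 10-9. Euler discretization for Merton jump diffusion model
\[ S_t = S_{t-\Delta t} \left( e^{(r - r_J - \sigma^2/2)\Delta t + \sigma \sqrt{\Delta t}\, z_t^1} + \left( e^{\mu_J + \delta z_t^2} - 1 \right) y_t \right) \]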
Given the discretization scheme, consider the following numerical
parameterization:
In [41]: S0 = 100.
         r = 0.05
         sigma = 0.2
         lamb = 0.75
         mu = -0.6
         delta = 0.25
         T = 1.0
To simulate the jump diffusion, we need to generate three sets of
(independent) random numbers:
In [42]: M = 50
         I = 10000
         dt = T / M
         rj = lamb * (np.exp(mu + 0.5 * delta ** 2) - 1)
         S = np.zeros((M + 1, I))
         S[0] = S0
         sn1 = npr.standard_normal((M + 1, I))
         sn2 = npr.standard_normal((M + 1, I))
         poi = npr.poisson(lamb * dt, (M + 1, I))
         for t in range(1, M + 1, 1):
             S[t] = S[t - 1] * (np.exp((r - rj - 0.5 * sigma ** 2) * dt
                                + sigma * np.sqrt(dt) * sn1[t])
                                + (np.exp(mu + delta * sn2[t]) - 1)
                                * poi[t])
             S[t] = np.maximum(S[t], 0)
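The role of the drift correction rj can be checked numerically: under the risk-neutral measure the discounted expected index level should stay close to S0. The following is a minimal sketch in Python 3, not the listing above; the function name simulate_jump_diffusion, the use of np.random.default_rng, the seed, and the number of paths are assumptions made for illustration only.

```python
import numpy as np

def simulate_jump_diffusion(S0, r, sigma, lamb, mu, delta, T, M, I, seed=0):
    """Terminal index levels of the Euler-discretized Merton jump diffusion
    (hypothetical helper, same scheme as Equation 10-9)."""
    rng = np.random.default_rng(seed)
    dt = T / M
    # drift correction that keeps the discounted process (nearly) a martingale
    rj = lamb * (np.exp(mu + 0.5 * delta ** 2) - 1)
    S = np.full(I, float(S0))
    for _ in range(M):
        z1 = rng.standard_normal(I)    # diffusion shocks
        z2 = rng.standard_normal(I)    # jump-size shocks
        y = rng.poisson(lamb * dt, I)  # number of jumps within the step
        S = S * (np.exp((r - rj - 0.5 * sigma ** 2) * dt
                        + sigma * np.sqrt(dt) * z1)
                 + (np.exp(mu + delta * z2) - 1) * y)
        S = np.maximum(S, 0)
    return S

ST = simulate_jump_diffusion(100., 0.05, 0.2, 0.75, -0.6, 0.25,
                             T=1.0, M=50, I=100000)
discounted_mean = np.exp(-0.05 * 1.0) * ST.mean()
print(round(discounted_mean, 2))  # expected to be close to S0 = 100
```

Small deviations from 100 are to be expected: besides sampling error, the Euler scheme itself introduces a slight discretization bias that shrinks as M grows.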
Since we have assumed a highly negative mean for the jump, it should not
come as a surprise that the final values of the simulated index level are
more right-skewed in Figure 10-13 compared to a typical log-normal
distribution:
In [43]: plt.hist(S[-1], bins=50)
plt.ylabel('frequency')
plt.xlabel('value')
plt.grid(True)
Figure 10-13. Simulated jump diffusion at maturity
The highly negative jumps can also be found in the first 10 simulated index
level paths, as presented in Figure 10-14:
In [44]: plt.plot(S[:, :10], lw=1.5)
         plt.xlabel('time')
         plt.ylabel('index level')
         plt.grid(True)
Figure 10-14. Simulated jump diffusion paths
Variance Reduction
Not only because of the fact that the Python functions we have used so far
generate pseudorandom numbers, but also due to the varying sizes of the
samples drawn, resulting sets of numbers might not exhibit statistics really
close enough to the expected/desired ones. For example, you would expect
a set of standard normally distributed random numbers to show a mean of 0
and a standard deviation of 1. Let us check what statistics different sets of
random numbers exhibit. To achieve a realistic comparison, we fix the seed
value for the random number generator:
In [45]: print("%15s %15s" % ('Mean', 'Std. Deviation'))
         print(31 * "-")
         for i in range(1, 31, 2):
             npr.seed(1000)
             sn = npr.standard_normal(i ** 2 * 10000)
             print("%15.12f %15.12f" % (sn.mean(), sn.std()))
Out[45]:            Mean  Std. Deviation
         -------------------------------
         -0.011870394558  1.008752430725
         -0.002815667298  1.002729536352
         -0.003847776704  1.000594044165
         -0.003058113374  1.001086345326
         -0.001685126538  1.001630849589
         -0.001175212007  1.001347684642
         -0.000803969036  1.000159081432
         -0.000601970954  0.999506522127
         -0.000147787693  0.999571756099
         -0.000313035581  0.999646153704
         -0.000178447061  0.999677277878
          0.000096501709  0.999684346792
         -0.000135677013  0.999823841902
         -0.000015726986  0.999906493379
         -0.000039368519  1.000063091949
In [46]: i ** 2 * 10000
Out[46]: 8410000
The results show that the statistics "somehow" get better the larger the
number of draws becomes. But they still do not match the desired ones,
even in our largest sample with more than 8,000,000 random numbers.
Fortunately, there are easy-to-implement, generic variance reduction
techniques available to improve the matching of the first two moments of
the (standard) normal distribution. The first technique is to use antithetic
variates. This approach simply draws only half the desired number of
random draws, and adds the same set of random numbers with the opposite
sign afterward.[38] For example, if the random number generator (i.e., the
respective Python function) draws 0.5, then another number with value
–0.5 is added to the set.
With NumPy this is concisely implemented by using the function
concatenate:
In [47]: sn = npr.standard_normal(10000 // 2)
         sn = np.concatenate((sn, -sn))
         np.shape(sn)
Out[47]: (10000,)
The following repeats the exercise from before, this time using antithetic
variates:
In [48]: print("%15s %15s" % ('Mean', 'Std. Deviation'))
         print(31 * "-")
         for i in range(1, 31, 2):
             npr.seed(1000)
             sn = npr.standard_normal(i ** 2 * 10000 // 2)
             sn = np.concatenate((sn, -sn))
             print("%15.12f %15.12f" % (sn.mean(), sn.std()))
Out[48]:            Mean  Std. Deviation
         -------------------------------
          0.000000000000  1.009653753942
         -0.000000000000  1.000413716783
          0.000000000000  1.002925061201
         -0.000000000000  1.000755212673
          0.000000000000  1.001636910076
         -0.000000000000  1.000726758438
         -0.000000000000  1.001621265149
          0.000000000000  1.001203722778
         -0.000000000000  1.000556669784
          0.000000000000  1.000113464185
         -0.000000000000  0.999435175324
          0.000000000000  0.999356961431
         -0.000000000000  0.999641436845
         -0.000000000000  0.999642768905
         -0.000000000000  0.999638303451
As you immediately notice, this approach corrects the first moment
perfectly, which should not come as a surprise. This follows from the fact
that whenever a number n is drawn, –n is also added. Since we only have
such pairs, the mean is equal to 0 over the whole set of random numbers.
However, this approach does not have any influence on the second
moment, the standard deviation.
Using another variance reduction technique, called moment matching, helps
correct in one step both the first and second moments:
In [49]: sn = npr.standard_normal(10000)
In [50]: sn.mean()
Out[50]: -0.001165998295162494
In [51]: sn.std()
Out[51]: 0.99125592020460496
By subtracting the mean from every single random number and dividing
every single number by the standard deviation, we get a set of random
numbers matching the desired first and second moments of the standard
normal distribution (almost) perfectly:
In [52]: sn_new = (sn - sn.mean()) / sn.std()
In [53]: sn_new.mean()
Out[53]: -2.3803181647963357e-17
In [54]: sn_new.std()
Out[54]: 0.99999999999999989
The following function utilizes the insight with regard to variance reduction
techniques and generates standard normal random numbers for process
simulation using either two, one, or no variance reduction technique(s):
In [55]: def gen_sn(M, I, anti_paths=True, mo_match=True):
             ''' Function to generate random numbers for simulation.

             Parameters
             ==========
             M : int
                 number of time intervals for discretization
             I : int
                 number of paths to be simulated
             anti_paths : Boolean
                 use of antithetic variates
             mo_match : Boolean
                 use of moment matching
             '''
             if anti_paths is True:
                 # integer division keeps the shape argument an int
                 sn = npr.standard_normal((M + 1, I // 2))
                 sn = np.concatenate((sn, -sn), axis=1)
             else:
                 sn = npr.standard_normal((M + 1, I))
             if mo_match is True:
                 sn = (sn - sn.mean()) / sn.std()
             return sn
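Matching the moments of the random numbers is a means to an end: what matters in the applications that follow is the precision of the Monte Carlo estimate built from them. The following sketch (Python 3; the integrand e^Z, the seed, and the sample size are assumptions chosen only because E(e^Z) = e^0.5 is known exactly) compares the standard error of a plain estimator with that of an antithetic one using the same total number of function evaluations.

```python
import numpy as np

rng = np.random.default_rng(1000)
n = 100000  # total number of function evaluations in both cases

# plain Monte Carlo: n independent draws
z = rng.standard_normal(n)
plain = np.exp(z)
se_plain = plain.std(ddof=1) / np.sqrt(n)

# antithetic variates: n / 2 draws, each averaged with its mirror image
z_half = rng.standard_normal(n // 2)
pairs = 0.5 * (np.exp(z_half) + np.exp(-z_half))
se_anti = pairs.std(ddof=1) / np.sqrt(n // 2)

exact = np.exp(0.5)  # E(exp(Z)) for standard normal Z
print(plain.mean(), pairs.mean(), exact)
print(se_plain, se_anti)
```

The gain comes from the negative correlation between e^z and e^(-z); for integrands that are not monotone in z the benefit can be smaller or vanish.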
Valuation
One of the most important applications of Monte Carlo simulation is the
valuation of contingent claims (options, derivatives, hybrid instruments,
etc.). Simply stated, in a risk-neutral world, the value of a contingent claim
is the discounted expected payoff under the risk-neutral (martingale)
measure. This is the probability measure that makes all risk factors (stocks,
indices, etc.) drift at the riskless short rate. According to the Fundamental
Theorem of Asset Pricing, the existence of such a probability measure is
equivalent to the absence of arbitrage.
A financial option embodies the right to buy (call option) or sell (put
option) a specified financial instrument at a given (maturity) date
(European option), or over a specified period of time (American option), at
a given price (the so-called strike price). Let us first consider the much
simpler case of European options in terms of valuation.
European Options
The payoff of a European call option on an index at maturity is given by
h(S_T) ≡ max(S_T – K, 0), where S_T is the index level at maturity date T
and K is the strike price. Given a, or in complete markets the, risk-neutral
measure for the relevant stochastic process (e.g., geometric Brownian
motion), the price of such an option is given by the formula in Equation
10-10.
Equation 10-10. Pricing by risk-neutral expectation
\[ C_0 = e^{-rT}\, \mathbf{E}_0^Q\left( h(S_T) \right) = e^{-rT} \int_0^\infty h(s)\, q(s)\, ds \]
Here q denotes the risk-neutral density of S_T.
Chapter 9 briefly sketches how to numerically evaluate an integral by
Monte Carlo simulation. This approach is used in the following and applied
to Equation 10-10. Equation 10-11 provides the respective Monte Carlo
estimator for the European option, where \tilde{S}_T^i is the ith simulated
index level at maturity.
Equation 10-11. Risk-neutral Monte Carlo estimator
\[ \tilde{C}_0 = e^{-rT}\, \frac{1}{I} \sum_{i=1}^{I} h\left( \tilde{S}_T^i \right) \]

Consider now the following parameterization for the geometric Brownian
motion and the valuation function gbm_mcs_stat, taking as a parameter
only the strike price. Here, only the index level at maturity is simulated:
In [56]: S0 = 100.
         r = 0.05
         sigma = 0.25
         T = 1.0
         I = 50000
         def gbm_mcs_stat(K):
             ''' Valuation of European call option in Black-Scholes-Merton
             by Monte Carlo simulation (of index level at maturity)

             Parameters
             ==========
             K : float
                 (positive) strike price of the option

             Returns
             =======
             C0 : float
                 estimated present value of European call option
             '''
             sn = gen_sn(1, I)
             # simulate index level at maturity
             ST = S0 * np.exp((r - 0.5 * sigma ** 2) * T
                          + sigma * np.sqrt(T) * sn[1])
             # calculate payoff at maturity
             hT = np.maximum(ST - K, 0)
             # calculate MCS estimator
             C0 = np.exp(-r * T) * 1 / I * np.sum(hT)
             return C0
As a reference, consider the case with a strike price of K = 105:
In [57]: gbm_mcs_stat(K=105.)
Out[57]: 10.044221852841922
Next, we consider the dynamic simulation approach and allow for
European put options in addition to the call option. The function
gbm_mcs_dyna implements the algorithm:
In [58]: M = 50
         def gbm_mcs_dyna(K, option='call'):
             ''' Valuation of European options in Black-Scholes-Merton
             by Monte Carlo simulation (of index level paths)

             Parameters
             ==========
             K : float
                 (positive) strike price of the option
             option : string
                 type of the option to be valued ('call', 'put')

             Returns
             =======
             C0 : float
                 estimated present value of European option
             '''
             dt = T / M
             # simulation of index level paths
             S = np.zeros((M + 1, I))
             S[0] = S0
             sn = gen_sn(M, I)
             for t in range(1, M + 1):
                 S[t] = S[t - 1] * np.exp((r - 0.5 * sigma ** 2) * dt
                         + sigma * np.sqrt(dt) * sn[t])
             # case-based calculation of payoff
             if option == 'call':
                 hT = np.maximum(S[-1] - K, 0)
             else:
                 hT = np.maximum(K - S[-1], 0)
             # calculation of MCS estimator
             C0 = np.exp(-r * T) * 1 / I * np.sum(hT)
             return C0
Now, we can compare option price estimates for a call and a put struck at
the same level:
In [59]: gbm_mcs_dyna(K=110., option='call')
Out[59]: 7.9500085250284336
In [60]: gbm_mcs_dyna(K=110., option='put')
Out[60]: 12.629934942682004
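A quick plausibility check for such a pair of estimates is put-call parity for European options on a non-dividend-paying underlying, C0 – P0 = S0 – K·e^(–rT). With the two values above, 7.9500 – 12.6299 ≈ –4.68, while 100 – 110·e^(–0.05) ≈ –4.64; the remaining gap is simulation error, since the two calls use different random numbers. The sketch below (Python 3; the seed and the static simulation of S_T are assumptions, not the dynamic simulation from the text) values both options on the same set of terminal index levels.

```python
import numpy as np

S0, K, r, sigma, T, I = 100., 110., 0.05, 0.25, 1.0, 50000
rng = np.random.default_rng(42)
z = rng.standard_normal(I // 2)
z = np.concatenate((z, -z))          # antithetic variates
z = (z - z.mean()) / z.std()         # moment matching

# index level at maturity under geometric Brownian motion
ST = S0 * np.exp((r - 0.5 * sigma ** 2) * T + sigma * np.sqrt(T) * z)
disc = np.exp(-r * T)
call = disc * np.mean(np.maximum(ST - K, 0))
put = disc * np.mean(np.maximum(K - ST, 0))

# deviation from put-call parity; ideally close to zero
parity_gap = (call - put) - (S0 - K * disc)
print(call, put, parity_gap)
```

Because both payoffs are evaluated on the same paths, the gap reduces to the error in the simulated mean of S_T, which the two variance reduction techniques keep small.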
The question is how well these simulation-based valuation approaches
perform relative to the benchmark value from the Black-Scholes-Merton
valuation formula. To find out, let us generate respective option
values/estimates for a range of strike prices, using the analytical option
pricing formula for European calls in Black-Scholes-Merton found in the
module bsm_functions.py:
In [61]: from bsm_functions import bsm_call_value
         stat_res = []
         dyna_res = []
         anal_res = []
         k_list = np.arange(80., 120.1, 5.)
         np.random.seed(200000)
         for K in k_list:
             stat_res.append(gbm_mcs_stat(K))
             dyna_res.append(gbm_mcs_dyna(K))
             anal_res.append(bsm_call_value(S0, K, T, r, sigma))
         stat_res = np.array(stat_res)
         dyna_res = np.array(dyna_res)
         anal_res = np.array(anal_res)
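The module imported above is not reproduced in this chapter. For readers who want to run the comparison without it, the following is a minimal stand-in, assuming that bsm_call_value implements the standard Black-Scholes-Merton formula for a European call without dividends and takes its arguments in the order used in the call above; the helper norm_cdf is a hypothetical name introduced here.

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bsm_call_value(S0, K, T, r, sigma):
    """Black-Scholes-Merton value of a European call (no dividends).

    S0: initial index level, K: strike, T: time to maturity in years,
    r: constant short rate, sigma: constant volatility.
    """
    d1 = (log(S0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S0 * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

value = bsm_call_value(100., 105., 1.0, 0.05, 0.25)
print(round(value, 3))
```

For K = 105 the result lies close to the static Monte Carlo estimate of about 10.04 reported earlier.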
First, we compare the results from the static simulation approach with
precise analytical values:
In [62]: fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=(8, 6))
         ax1.plot(k_list, anal_res, 'b', label='analytical')
         ax1.plot(k_list, stat_res, 'ro', label='static')
         ax1.set_ylabel('European call option value')
         ax1.grid(True)
         ax1.legend(loc=0)
         ax1.set_ylim(bottom=0)
         wi = 1.0
         ax2.bar(k_list - wi / 2, (anal_res - stat_res) / anal_res * 100, wi)
         ax2.set_xlabel('strike')
         ax2.set_ylabel('difference in %')
         ax2.set_xlim(left=75, right=125)
         ax2.grid(True)
Figure 10-15 shows the results. All valuation differences are smaller than
1% in absolute terms. There are both negative and positive value differences.
Figure 10-15. Comparison of static and dynamic Monte Carlo estimator values
A similar picture emerges for the dynamic simulation and valuation
approach, whose results are reported in Figure 10-16. Again, all valuation
differences are smaller than 1% in absolute terms, with both positive and
negative deviations. As a general rule, the quality of the Monte Carlo
estimator can be controlled for by adjusting the number of time intervals M
used and/or the number of paths I simulated:
In [63]: fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=(8, 6))
         ax1.plot(k_list, anal_res, 'b', label='analytical')
         ax1.plot(k_list, dyna_res, 'ro', label='dynamic')
         ax1.set_ylabel('European call option value')
         ax1.grid(True)
         ax1.legend(loc=0)
         ax1.set_ylim(bottom=0)
         wi = 1.0
         ax2.bar(k_list - wi / 2, (anal_res - dyna_res) / anal_res * 100, wi)
         ax2.set_xlabel('strike')
         ax2.set_ylabel('difference in %')
         ax2.set_xlim(left=75, right=125)
         ax2.grid(True)
Figure 10-16. Comparison of static and dynamic Monte Carlo estimator values
American Options
The valuation of American options is more involved compared to European
options. In this case, an optimal stopping problem has to be solved to come
up with a fair value of the option. Equation 10-12 formulates the valuation
of an American option as such a problem. The problem formulation is
already based on a discrete time grid for use with numerical simulation. In a
sense, it is therefore more correct to speak of an option value given
Bermudan exercise. For the time interval converging to zero length, the
value of the Bermudan option converges to the one of the American option.
Equation 10-12. American option prices as optimal stopping problem
\[ V_0 = \sup_{\tau \in \{0, \Delta t, 2\Delta t, \ldots, T\}} e^{-r\tau}\, \mathbf{E}_0^Q\left( h_\tau(S_\tau) \right) \]
The algorithm we describe in the following is called Least-Squares Monte
Carlo (LSM) and is from the paper by Longstaff and Schwartz (2001). It
can be shown that the value of an American (Bermudan) option at any
given date t is given as V_t(s) = max(h_t(s), C_t(s)), where
C_t(s) = E_t^Q(e^(–rΔt) V_{t+Δt}(S_{t+Δt}) | S_t = s) is the so-called
continuation value of the option given an index level of S_t = s.
Consider now that we have simulated I paths of the index level over M time
intervals of equal size Δt. Define Y_{t,i} ≡ e^(–rΔt) V_{t+Δt,i} to be the
simulated continuation value for path i at time t. We cannot use this number
directly because it would imply perfect foresight. However, we can use the
cross section of all such simulated continuation values to estimate the
(expected) continuation value by least-squares regression.
Given a set of basis functions b_d, d = 1, …, D, the continuation value is
then given by the regression estimate
\hat{C}_{t,i} = \sum_{d=1}^{D} \alpha_{d,t}^* \cdot b_d(S_{t,i}), where the
optimal regression parameters α* are the solution of the least-squares
problem stated in Equation 10-13.
Equation 10-13. Least-squares regression for American option valuation
\[ \min_{\alpha_{1,t}, \ldots, \alpha_{D,t}} \frac{1}{I} \sum_{i=1}^{I} \left( Y_{t,i} - \sum_{d=1}^{D} \alpha_{d,t} \cdot b_d(S_{t,i}) \right)^2 \]
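To make the regression step of Equation 10-13 concrete in isolation, the following sketch (Python 3) runs it once on synthetic data. Everything in it is an assumption made for illustration: the index levels are drawn uniformly, the "true" continuation value true_cont is an arbitrary quadratic, and the basis consists of the monomials 1, s, s² as used implicitly by np.polyfit.

```python
import numpy as np

rng = np.random.default_rng(7)
I = 10000
S_t = rng.uniform(80., 120., I)   # hypothetical index levels at date t

def true_cont(s):
    # hypothetical "true" continuation value; unknown in practice
    return 0.01 * (120. - s) ** 2

# simulated (noisy) discounted next-period values Y_{t,i}, one per path
Y = true_cont(S_t) + rng.standard_normal(I)

# least-squares fit with the basis functions 1, s, s^2
alpha = np.polyfit(S_t, Y, 2)
C_hat = np.polyval(alpha, S_t)    # estimated continuation values

h_t = np.maximum(110. - S_t, 0)   # inner value of a put with K = 110
exercise = h_t > C_hat            # exercise where the inner value is larger

max_err = np.max(np.abs(C_hat - true_cont(S_t)))
print(max_err, exercise.mean())
```

Although each single Y value is very noisy, the cross-sectional regression recovers the continuation value function closely, which is what the exercise decision at date t relies on.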
The function gbm_mcs_amer implements the LSM algorithm for both
American call and put options:[39]
In [64]: def gbm_mcs_amer(K, option='call'):
             ''' Valuation of American option in Black-Scholes-Merton
             by Monte Carlo simulation by LSM algorithm

             Parameters
             ==========
             K : float
                 (positive) strike price of the option
             option : string
                 type of the option to be valued ('call', 'put')

             Returns
             =======
             C0 : float
                 estimated present value of American option
             '''
             dt = T / M
             df = np.exp(-r * dt)
             # simulation of index levels
             S = np.zeros((M + 1, I))
             S[0] = S0
             sn = gen_sn(M, I)
             for t in range(1, M + 1):
                 S[t] = S[t - 1] * np.exp((r - 0.5 * sigma ** 2) * dt
                         + sigma * np.sqrt(dt) * sn[t])
             # case-based calculation of payoff
             if option == 'call':
                 h = np.maximum(S - K, 0)
             else:
                 h = np.maximum(K - S, 0)
             # LSM algorithm
             V = np.copy(h)
             for t in range(M - 1, 0, -1):
                 reg = np.polyfit(S[t], V[t + 1] * df, 7)
                 C = np.polyval(reg, S[t])
                 V[t] = np.where(C > h[t], V[t + 1] * df, h[t])
             # MCS estimator
             C0 = df * 1 / I * np.sum(V[1])
             return C0
In [65]: gbm_mcs_amer(110., option='call')
Out[65]: 7.7789332794493156
In [66]: gbm_mcs_amer(110., option='put')
Out[66]: 13.614023206242445
The European value of an option represents a lower bound to the American
option's value. The difference is generally called the early exercise
premium. In what follows, we compare European and American option
values for the same range of strikes as before to estimate the option
premium. This time we take puts:[40]
In [67]: euro_res = []
         amer_res = []
         k_list = np.arange(80., 120.1, 5.)
         for K in k_list:
             euro_res.append(gbm_mcs_dyna(K, 'put'))
             amer_res.append(gbm_mcs_amer(K, 'put'))
         euro_res = np.array(euro_res)
         amer_res = np.array(amer_res)
Figure 10-17 shows that for the range of strikes chosen the premium can
rise to up to 10%:
In [68]: fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=(8, 6))
         ax1.plot(k_list, euro_res, 'b', label='European put')
         ax1.plot(k_list, amer_res, 'ro', label='American put')
         ax1.set_ylabel('put option value')
         ax1.grid(True)
         ax1.legend(loc=0)
         wi = 1.0
         ax2.bar(k_list - wi / 2, (amer_res - euro_res) / euro_res * 100, wi)
         ax2.set_xlabel('strike')
         ax2.set_ylabel('early exercise premium in %')
         ax2.set_xlim(left=75, right=125)
         ax2.grid(True)
Figure 10-17. Comparison of European and LSM Monte Carlo estimator values
Risk Measures
In addition to valuation, risk management is another important application
area of stochastic methods and simulation. This section illustrates the
calculation/estimation of two of the most common risk measures applied
today in the finance industry.
Value-at-Risk
Value-at-risk (VaR) is one of the most widely used risk measures, and a
much debated one. Loved by practitioners for its intuitive appeal, it is
widely discussed and criticized by many, mainly on theoretical grounds,
with regard to its limited ability to capture what is called tail risk (more on
this shortly). In words, VaR is a number denoted in currency units (e.g.,
USD, EUR, JPY) indicating a loss (of a portfolio, a single position, etc.)
that is not exceeded with some confidence level (probability) over a given
period of time.
Consider a stock position, worth 1 million USD today, that has a VaR of
50,000 USD at a confidence level of 99% over a time period of 30 days
(one month). This VaR figure says that with a probability of 99% (i.e., in
99 out of 100 cases), the loss to be expected over a period of 30 days will
not exceed 50,000 USD. However, it does not say anything about the size
of the loss once a loss beyond 50,000 USD occurs, i.e., whether the
maximum loss is 100,000 or 500,000 USD, or what the probability of such
a specific "higher than VaR" loss is. All it says is that there is a 1%
probability that a loss of at least 50,000 USD will occur.
Assume again that we are in a Black-Scholes-Merton setup and consider
the following parameterization and simulation of index levels at a future
date T = 30/365 (i.e., we assume a period of 30 days):
In [69]: S0 = 100
         r = 0.05
         sigma = 0.25
         T = 30 / 365.
         I = 10000
         ST = S0 * np.exp((r - 0.5 * sigma ** 2) * T
                      + sigma * np.sqrt(T) * npr.standard_normal(I))
To estimate VaR figures, we need the simulated absolute profits and losses
relative to the value of the position today in a sorted manner, i.e., from the
severest loss to the largest profit:
In [70]: R_gbm = np.sort(ST - S0)
Figure 10-18 shows the histogram of the simulated absolute performance
values:
In [71]: plt.hist(R_gbm, bins=50)
         plt.xlabel('absolute return')
         plt.ylabel('frequency')
         plt.grid(True)
Figure 10-18. Absolute returns of geometric Brownian motion (30d)
Having the ndarray object with the sorted results, the function
scoreatpercentile already does the trick. All we have to do is to define
the percentiles (in percent values) in which we are interested. In the list
object percs, 0.1 translates into a confidence level of 100% – 0.1% =
99.9%. The 30-day VaR given a confidence level of 99.9% in this case is
20.2 currency units, while it is 8.9 at the 90% confidence level:
In [72]: percs = [0.01, 0.1, 1., 2.5, 5.0, 10.0]
         var = scs.scoreatpercentile(R_gbm, percs)
         print("%16s %16s" % ('Confidence Level', 'Value-at-Risk'))
         print(33 * "-")
         for pair in zip(percs, var):
             print("%16.2f %16.3f" % (100 - pair[0], -pair[1]))
Out[72]: Confidence Level    Value-at-Risk
         ---------------------------------
                    99.99           26.072
                    99.90           20.175
                    99.00           15.753
                    97.50           13.265
                    95.00           11.298
                    90.00            8.942
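For the geometric Brownian motion the simulated figures can be cross-checked, because S_T is log-normally distributed and its quantiles are available in closed form. The following sketch (Python 3) does this for the 99% level; the seed, the larger number of paths, and the use of np.percentile and statistics.NormalDist in place of scoreatpercentile are assumptions made for illustration.

```python
import numpy as np
from statistics import NormalDist

S0, r, sigma, T, I = 100., 0.05, 0.25, 30 / 365., 200000
rng = np.random.default_rng(123)
ST = S0 * np.exp((r - 0.5 * sigma ** 2) * T
                 + sigma * np.sqrt(T) * rng.standard_normal(I))
pnl = ST - S0  # absolute profits and losses

conf = 0.99
# simulated VaR: negative of the (1 - conf) quantile of the P&L
var_sim = -np.percentile(pnl, 100 * (1 - conf))

# the same quantile in closed form for the log-normal S_T
z = NormalDist().inv_cdf(1 - conf)
var_exact = S0 - S0 * np.exp((r - 0.5 * sigma ** 2) * T
                             + sigma * np.sqrt(T) * z)
print(var_sim, var_exact)
```

The closed-form value is roughly 15.2 currency units; the estimate of 15.753 in the table above, based on 10,000 paths, differs from it by an amount that is plausible for a 1% quantile at that sample size.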
As a second example, recall the jump diffusion setup from Merton, which
we want to simulate dynamically:
In [73]: dt = 30. / 365 / M
         rj = lamb * (np.exp(mu + 0.5 * delta ** 2) - 1)
         S = np.zeros((M + 1, I))
         S[0] = S0
         sn1 = npr.standard_normal((M + 1, I))
         sn2 = npr.standard_normal((M + 1, I))
         poi = npr.poisson(lamb * dt, (M + 1, I))
         for t in range(1, M + 1, 1):
             S[t] = S[t - 1] * (np.exp((r - rj - 0.5 * sigma ** 2) * dt
                                + sigma * np.sqrt(dt) * sn1[t])
                                + (np.exp(mu + delta * sn2[t]) - 1)
                                * poi[t])
             S[t] = np.maximum(S[t], 0)
In [74]: R_jd = np.sort(S[-1] - S0)
In this case, with the jump component having a negative mean, we see
something like a bimodal distribution for the simulated profits/losses in
Figure 10-19. From a normal distribution point of view, we have a strongly
pronounced left fat tail:
In [75]: plt.hist(R_jd, bins=50)
plt.xlabel('absolute return')
plt.ylabel('frequency')
plt.grid(True)

Figure 10-19. Absolute returns of jump diffusion (30d)
For this process and parameterization, the VaR over 30 days at the 90%
level is almost identical, while it is more than three times as high at the
99.9% level as with the geometric Brownian motion (71.8 vs. 20.2 currency
units):
In [76]: percs = [0.01, 0.1, 1., 2.5, 5.0, 10.0]
         var = scs.scoreatpercentile(R_jd, percs)
         print("%16s %16s" % ('Confidence Level', 'Value-at-Risk'))
         print(33 * "-")
         for pair in zip(percs, var):
             print("%16.2f %16.3f" % (100 - pair[0], -pair[1]))
Out[76]: Confidence Level    Value-at-Risk
         ---------------------------------
                    99.99           75.029
                    99.90           71.833
                    99.00           55.901
                    97.50           45.697
                    95.00           25.993
                    90.00            8.773
This illustrates the problem of capturing the tail risk so often encountered in
financial markets by the standard VaR measure.
To further illustrate the point, we lastly show the VaR measures for both
cases in direct comparison graphically. As Figure 10-20 reveals, the VaR
measures behave completely differently given a range of typical confidence
levels:
In [77]: percs = list(np.arange(0.0, 10.1, 0.1))
         gbm_var = scs.scoreatpercentile(R_gbm, percs)
         jd_var = scs.scoreatpercentile(R_jd, percs)
In [78]: plt.plot(percs, gbm_var, 'b', lw=1.5, label='GBM')
         plt.plot(percs, jd_var, 'r', lw=1.5, label='JD')
         plt.legend(loc=4)
         plt.xlabel('100 - confidence level [%]')
         plt.ylabel('value-at-risk')
         plt.grid(True)
         plt.ylim(top=0.0)
Figure 10-20. Value-at-risk for geometric Brownian motion and jump diffusion
Credit Value Adjustments
Other important risk measures are the credit value-at-risk (CVaR) and the
credit value adjustment (CVA), which is derived from the CVaR. Roughly
speaking, CVaR is a measure for the risk resulting from the possibility that
a counterparty might not be able to honor its obligations, for example, if
the counterparty goes bankrupt. In such a case there are two main
assumptions to be made: probability of default and the (average) loss
level.
To make it specific, consider again the benchmark setup of Black-Scholes-
Merton with the following parameterization:
In [79]: S0 = 100.
         r = 0.05
         sigma = 0.2
         T = 1.
         I = 100000
         ST = S0 * np.exp((r - 0.5 * sigma ** 2) * T
                      + sigma * np.sqrt(T) * npr.standard_normal(I))
In the simplest case, one considers a fixed (average) loss level L and a fixed
probability p for default (per year) of a counterparty:
In [80]: L = 0.5
In [81]: p = 0.01
Using the Poisson distribution, default scenarios are generated as follows,
taking into account that a default can only occur once:
In [82]: D = npr.poisson(p * T, I)
         D = np.where(D > 1, 1, D)
Without default, the risk-neutral value of the future index level should be
equal to the current value of the asset today (up to differences resulting
from numerical errors):
In [83]: np.exp(-r * T) * 1 / I * np.sum(ST)
Out[83]: 99.981825216842921
The CVaR under our assumptions is calculated as follows:
In [84]: CVaR = np.exp(-r * T) * 1 / I * np.sum(L * D * ST)
         CVaR
Out[84]: 0.5152011134161355
Analogously, the present value of the asset, adjusted for the credit risk, is
given as follows:
In [85]: S0_CVA = np.exp(-r * T) * 1 / I * np.sum((1 - L * D) * ST)
         S0_CVA
Out[85]: 99.466624103426781
This should be (roughly) the same as subtracting the CVaR value from the
current asset value:
In [86]: S0_adj = S0 - CVaR
         S0_adj
Out[86]: 99.48479888658386
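Because the default indicator and the index level are simulated independently in this setup, the expectation behind the CVaR factorizes and a closed-form benchmark is available: L · (1 – e^(–pT)) · S0, where 1 – e^(–pT) is the probability of at least one Poisson event. The sketch below (Python 3; the seed, the larger number of paths, and the names cvar_sim and cvar_exact are assumptions) compares the simulated figure with this benchmark.

```python
import numpy as np

S0, r, sigma, T, I = 100., 0.05, 0.2, 1., 400000
L, p = 0.5, 0.01
rng = np.random.default_rng(2024)

ST = S0 * np.exp((r - 0.5 * sigma ** 2) * T
                 + sigma * np.sqrt(T) * rng.standard_normal(I))
D = np.minimum(rng.poisson(p * T, I), 1)  # at most one default per path

cvar_sim = np.exp(-r * T) * np.mean(L * D * ST)
# default and index level are independent, so the expectation factorizes
cvar_exact = L * (1 - np.exp(-p * T)) * S0
print(cvar_sim, cvar_exact)
```

The benchmark is about 0.4975; the value of 0.5152 obtained above with 100,000 paths is consistent with it given that only around 1,000 default events enter the estimate.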
In this particular simulation example, we observe roughly 1,000 losses due
to credit risk, which is to be expected given the assumed default probability
of 1% and 100,000 simulated paths:
In [87]: np.count_nonzero(L * D * ST)
Out[87]: 1031
Figure 10-21 shows the complete frequency distribution of the losses due to
a default. Of course, in the large majority of cases (i.e., in about 99,000 of
the 100,000 cases) there is no loss to observe:
In [88]: plt.hist(L * D * ST, bins=50)
         plt.xlabel('loss')
         plt.ylabel('frequency')
         plt.grid(True)
         plt.ylim(top=175)
Figure 10-21. Losses due to risk-neutrally expected default (stock)
Consider now the case of a European call option. Its value is about 10.4
currency units at a strike of 100:
In [89]: K = 100.
         hT = np.maximum(ST - K, 0)
         C0 = np.exp(-r * T) * 1 / I * np.sum(hT)
         C0
Out[89]: 10.427336109660052
The CVaR is about 5 cents given the same assumptions with regard to
probability of default and loss level:
In [90]: CVaR = np.exp(-r * T) * 1 / I * np.sum(L * D * hT)
         CVaR
Out[90]: 0.053822578452208093
Accordingly, the adjusted option value is roughly 5 cents lower:
In [91]: C0_CVA = np.exp(-r * T) * 1 / I * np.sum((1 - L * D) * hT)
         C0_CVA
Out[91]: 10.373513531207843
Compared to the case of a regular asset, the option case has somewhat
different characteristics. We only see a little more than 500 losses due to a
default, although we again have about 1,000 defaults. This results from the
fact that the payoff of the option at maturity has a high probability of being
zero:
In [92]: np.count_nonzero(L * D * hT) # number of losses
Out[92]: 582
In [93]: np.count_nonzero(D) # number of defaults
Out[93]: 1031
In [94]: I - np.count_nonzero(hT) # zero payoff
Out[94]: 43995
Figure 10-22 shows that the CVaR for the option has a completely different
frequency distribution compared to the regular asset case:
In [95]: plt.hist(L * D * hT, bins=50)
         plt.xlabel('loss')
         plt.ylabel('frequency')
         plt.grid(True)
         plt.ylim(top=350)
Figure 10-22. Losses due to risk-neutrally expected default (call option)
Conclusions
This chapter deals with methods and techniques important to the
application of Monte Carlo simulation in finance. In particular, it shows
how to generate (pseudo)random numbers based on different distribution
laws. It proceeds with the simulation of random variables and stochastic
processes, which is important in many financial areas. Two application
areas are discussed in some depth in this chapter: valuation of options with
European and American exercise and the estimation of risk measures like
value-at-risk and credit value adjustments.
The chapter illustrates that Python in combination with NumPy is well
suited to implementing even such computationally demanding tasks as the
valuation of American options by Monte Carlo simulation. This is mainly
due to the fact that the majority of functions and classes of NumPy are
implemented in C, which leads to considerable speed advantages in general
over pure Python code. A further benefit is the compactness and readability
of the resulting code due to vectorized operations.
Further Reading
The original article introducing Monte Carlo simulation to finance is:
Boyle, Phelim (1977): "Options: A Monte Carlo Approach." Journal of
Financial Economics, Vol. 4, No. 4, pp. 322–338.
Other original papers cited in this chapter are (see also Chapter 16):
Black, Fischer and Myron Scholes (1973): "The Pricing of Options and
Corporate Liabilities." Journal of Political Economy, Vol. 81, No. 3,
pp. 638–659.
Cox, John, Jonathan Ingersoll and Stephen Ross (1985): "A Theory of
the Term Structure of Interest Rates." Econometrica, Vol. 53, No. 2, pp.
385–407.
Heston, Steven (1993): "A Closed-Form Solution for Options with
Stochastic Volatility with Applications to Bond and Currency Options."
The Review of Financial Studies, Vol. 6, No. 2, 327–343.
Merton, Robert (1973): "Theory of Rational Option Pricing." Bell
Journal of Economics and Management Science, Vol. 4, pp. 141–183.
Merton, Robert (1976): "Option Pricing When the Underlying Stock
Returns Are Discontinuous." Journal of Financial Economics, Vol. 3,
No. 3, pp. 125–144.
The books by Glasserman (2004) and Hilpisch (2015) cover all topics of
this chapter in depth (however, the first one does not cover any technical
implementation details):
Glasserman, Paul (2004): Monte Carlo Methods in Financial
Engineering. Springer, New York.
Hilpisch, Yves (2015): Derivatives Analytics with Python. Wiley
Finance, Chichester, England. http://www.derivatives-analytics-with-
python.com.
It took until the turn of the century for an efficient method to value
American options by Monte Carlo simulation to finally be published:
Longstaff, Francis and Eduardo Schwartz (2001): "Valuing American
Options by Simulation: A Simple Least Squares Approach." Review of
Financial Studies, Vol. 14, No. 1, pp. 113–147.
A broad and in-depth treatment of credit risk is provided in:
Duffie, Darrell and Kenneth Singleton (2003): Credit Risk—Pricing,
Measurement, and Management. Princeton University Press, Princeton,
NJ.
[35] For simplicity, we will speak of random numbers knowing that all numbers used
will be pseudorandom.
[36] Cf. http://docs.scipy.org/doc/numpy/reference/routines.random.html.
[37] Cf. http://docs.scipy.org/doc/numpy/reference/routines.random.html.
[38] The described method works for symmetric median 0 random variables only,
like standard normally distributed random variables, which we almost exclusively
use throughout.
[39] For algorithmic details, refer to Hilpisch (2015).
[40] Since we do not assume any dividend payments (having an index in mind), there
generally is no early exercise premium for call options (i.e., no incentive to exercise
the option early).
Chapter 11. Statistics
I can prove anything by statistics except the truth.
- George Canning
Statistics is a vast field. The tools and results the field provides have
become indispensable for finance. This also explains the popularity of
domain-specific languages like R in the finance industry. The more
elaborate and complex statistical models become, the more important it is
to have available easy-to-use and high-performing computational solutions.
A single chapter in a book like this one cannot do justice to the richness and
the broadness of the field of statistics. Therefore, the approach, as in many
other chapters, is to focus on selected topics that seem of paramount
importance or that provide a good starting point when it comes to the use of
Python for the particular tasks at hand. The chapter has four focal points:
Normality tests
A large number of important financial models, like the mean-variance
portfolio theory and the capital asset pricing model (CAPM), rest on the
assumption that returns of securities are normally distributed; therefore,
this chapter presents some approaches to test a given time series for
normality of returns.
Portfolio theory
Modern portfolio theory (MPT) can be considered one of the biggest
successes of statistics in finance; starting in the early 1950s with the
work of pioneer Harry Markowitz, this theory began to replace people's
reliance on judgment and experience with rigorous mathematical and
statistical methods when it comes to the investment of money in
financial markets. In that sense, it is maybe the first real quantitative
approach in finance.
Principal component analysis
Principal component analysis (PCA) is quite a popular tool in finance,
for example, when it comes to implementing equity investment
strategies or analyzing the principal components that explain the
movement in interest rates. Its major benefit is "complexity reduction,"
achieved by deriving a small set of linearly independent (noncorrelated,
orthogonal) components from a potentially large set of maybe highly
correlated time series components; we illustrate the application based
on the German DAX index and the 30 stocks contained in that index.
Bayesian regression
On a fundamental level, Bayesian statistics introduces the notion of
beliefs of agents and the updating of beliefs to statistics; when it comes
to linear regression, for example, this might take on the form of having
a statistical distribution for regression parameters instead of single point
estimates (e.g., for the intercept and slope of the regression line).
Nowadays, Bayesian methods are rather popular and important in
finance, which is why we illustrate some (advanced) applications in this
chapter.
Many aspects in this chapter relate to date and/or time information. Refer to
Appendix C for an overview of handling such data with Python, NumPy,
and pandas.
NormalityTests
Thenormaldistributioncanbeconsideredthemostimportantdistribution
theory.Amongothers,thefollowingcornerstonesoffinancialtheoryrest
infinanceandoneofthemajorstatistical buildingblocksoffinancial to
Portfoliotheory
alargeextentonthenormaldistributionofstockmarketreturns:
Whenstockreturnsarenormallydistributed,optimalportfoliochoice
can becastintoasettingwhereonlythemeanreturnandthevariance
ofthereturns(orthev olatility) aswellasthecovariancesbetween
differentstocksarerelevantforaninvestmentdecision( i.e., anoptimal
portfoliocomposition).
Capitalassetpricingmodel
Again,whenstockreturnsarenormallydistributed,pricesofsingle
index;therelationshipisgenerallyexpressedbyameasure for the
stockscanbeelegantlyexpressedinrelationshiptoabroadmarket
comovementofasinglestockwiththemarketindexcalledbeta(ᵯ).
Efficientmarketshypothesis
Anefficientmarketisamarketwherepricesreflect al available
information,where“all”canbedefinedmorenarrowlyormorewidely
stockpricesfluctuaterandomlyandreturnsarenormallydistributed.
(e.g., asin“al publiclyavailable”informationvs.includingalso“only
privatelyavailable”information);i f this hypothesisholdstrue,then
Option pricing theory
Brownian motion is the standard and benchmark model for the modeling of random stock (and other security) price movements; the famous Black-Scholes-Merton option pricing formula uses a geometric Brownian motion as the model for a stock's random fluctuations over time, leading to normally distributed returns.
This by far nonexhaustive list underpins the importance of the normality assumption in finance.
Benchmark Case
To set the stage for further analyses, we start with the geometric Brownian motion as one of the canonical stochastic processes used in financial modeling. The following can be said about the characteristics of paths from a geometric Brownian motion S:
Normal log returns
Log returns between two times 0 < s < t are normally distributed.
Log-normal values
At any time t > 0, the values S_t are log-normally distributed.
For what follows we need a number of Python libraries, including scipy.stats and statsmodels.api:
In [1]: import numpy as np
        np.random.seed(1000)
        import scipy.stats as scs
        import statsmodels.api as sm
        import matplotlib as mpl
        import matplotlib.pyplot as plt
        %matplotlib inline
Let us define a function to generate Monte Carlo paths for the geometric Brownian motion (see also Chapter 10):
In [2]: def gen_paths(S0, r, sigma, T, M, I):
            ''' Generates Monte Carlo paths for geometric Brownian motion.

            Parameters
            ==========
            S0 : float
                initial stock/index value
            r : float
                constant short rate
            sigma : float
                constant volatility
            T : float
                final time horizon
            M : int
                number of time steps/intervals
            I : int
                number of paths to be simulated

            Returns
            =======
            paths : ndarray, shape (M + 1, I)
                simulated paths given the parameters
            '''
            dt = float(T) / M
            paths = np.zeros((M + 1, I), np.float64)
            paths[0] = S0
            for t in range(1, M + 1):
                rand = np.random.standard_normal(I)
                rand = (rand - rand.mean()) / rand.std()
                paths[t] = paths[t - 1] * np.exp((r - 0.5 * sigma ** 2) * dt +
                                                 sigma * np.sqrt(dt) * rand)
            return paths
The following is a possible parameterization for the Monte Carlo simulation, generating, in combination with the function gen_paths, 250,000 paths with 50 time steps each:
In [3]: S0 = 100.
        r = 0.05
        sigma = 0.2
        T = 1.0
        M = 50
        I = 250000
In [4]: paths = gen_paths(S0, r, sigma, T, M, I)
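The time loop in gen_paths can also be expressed without an explicit Python loop. The following is a minimal sketch under stated assumptions, not the book's implementation: the name gen_paths_vectorized and its seed argument are hypothetical, the code targets Python 3 with NumPy's Generator API, and it leaves out the moment matching (the standardization of rand) used above.

```python
import numpy as np


def gen_paths_vectorized(S0, r, sigma, T, M, I, seed=1000):
    """Simulate geometric Brownian motion paths, shape (M + 1, I)."""
    dt = T / M
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((M, I))
    # per-step log returns are normal with this drift and volatility
    log_increments = (r - 0.5 * sigma ** 2) * dt + sigma * np.sqrt(dt) * z
    # cumulative sums of the log returns give log(S_t / S_0)
    log_paths = np.vstack([np.zeros(I), np.cumsum(log_increments, axis=0)])
    return S0 * np.exp(log_paths)


# hypothetical, smaller run than in the text (10,000 instead of 250,000 paths)
sample_paths = gen_paths_vectorized(100., 0.05, 0.2, 1.0, 50, 10000)
sample_log_returns = np.log(sample_paths[1:] / sample_paths[:-1])
```

Because the log returns are summed before exponentiating, the first row equals S0 exactly and all simulated values stay positive.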
Figure 11-1 shows the first 10 simulated paths from the simulation:
In [5]: plt.plot(paths[:, :10])
        plt.grid(True)
        plt.xlabel('time steps')
        plt.ylabel('index level')
Figure 11-1. Ten simulated paths of geometric Brownian motion
Our main interest is in the distribution of the log returns. The following code generates an ndarray object with all log returns:
In [6]: log_returns = np.log(paths[1:] / paths[0:-1])
Consider the very first simulated path over the 50 time steps:
In [7]: paths[:, 0].round(4)
Out[7]: array([ 100. , 97.821 , 98.5573, 106.1546, 105.899 ,
99.8363, 100.0145, 102.6589, 105.6643, 107.1107, 108.7943,
108.2449, 106.4105, 101.0575, 102.0197, 102.6052, 109.6419,
109.5725, 112.9766, 113.0225, 112.5476, 114.5585, 109.942 ,
112.6271, 112.7502, 116.3453, 115.0443, 113.9586, 115.8831,
117.3705, 117.9185, 110.5539, 109.9687, 104.9957, 108.0679,
105.7822, 105.1585, 104.3304, 108.4387, 105.5963, 108.866 ,
108.3284, 107.0077, 106.0034, 104.3964, 101.0637, 98.3776,
97.135 , 95.4254, 96.4271, 96.3386])
A log-return series for a simulated path might then take on the form:
In [8]: log_returns[:, 0].round(4)
Out[8]: array([-0.022 ,  0.0075,  0.0743, -0.0024, -0.059 ,  0.0018,  0.0261,
                0.0289,  0.0136,  0.0156, -0.0051, -0.0171, -0.0516,  0.0095,
                0.0057,  0.0663, -0.0006,  0.0306,  0.0004, -0.0042,  0.0177,
               -0.0411,  0.0241,  0.0011,  0.0314, -0.0112, -0.0095,  0.0167,
                0.0128,  0.0047, -0.0645, -0.0053, -0.0463,  0.0288, -0.0214,
               -0.0059, -0.0079,  0.0386, -0.0266,  0.0305, -0.0049, -0.0123,
               -0.0094, -0.0153, -0.0324, -0.0269, -0.0127, -0.0178,  0.0104,
               -0.0009])
This is something one might experience in financial markets as well: days when you make a positive return on your investment and other days when you are losing money relative to your most recent wealth position.
The function print_statistics is a wrapper function for the describe function from the scipy.stats sublibrary. It mainly generates a more (human-)readable output for such statistics as the mean, the skewness, or the kurtosis of a given (historical or simulated) data set:
In [9]: def print_statistics(array):
            ''' Prints selected statistics.

            Parameters
            ==========
            array : ndarray
                object to generate statistics on
            '''
            sta = scs.describe(array)
            print "%14s %15s" % ('statistic', 'value')
            print 30 * "-"
            print "%14s %15.5f" % ('size', sta[0])
            print "%14s %15.5f" % ('min', sta[1][0])
            print "%14s %15.5f" % ('max', sta[1][1])
            print "%14s %15.5f" % ('mean', sta[2])
            print "%14s %15.5f" % ('std', np.sqrt(sta[3]))
            print "%14s %15.5f" % ('skew', sta[4])
            print "%14s %15.5f" % ('kurtosis', sta[5])
For example, the following shows the function in action, using a flattened version of the ndarray object containing the log returns. The method flatten returns a 1D array with all the data given in a multidimensional array:
In [10]: print_statistics(log_returns.flatten())
Out[10]:      statistic           value
         ------------------------------
                   size  12500000.00000
                    min        -0.15664
                    max         0.15371
                   mean         0.00060
                    std         0.02828
                   skew         0.00055
               kurtosis         0.00085
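The data set in this case consists of 12,500,000 data points with the values mainly lying between +/– 0.15. We would expect annualized values of 0.05 for the mean return and 0.2 for the standard deviation (volatility). The annualized values of the data set come close to these values, if not matching them perfectly (multiply the mean value by 50 and the standard deviation by the square root of 50). Note that the mean of the log returns annualizes to 0.0006 · 50 = 0.03, which is the short rate of 0.05 less the correction term 0.5 · sigma² = 0.02.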
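The scaling described above can be checked with a few lines of arithmetic. This is a small worked sketch using the per-step mean and standard deviation from the printed statistics; the variable names are chosen here for illustration only.

```python
import numpy as np

r, sigma, M = 0.05, 0.2, 50               # parameters from the text
step_mean, step_std = 0.00060, 0.02828    # per-step values printed above

# means scale with the number of steps per year
annual_mean = step_mean * M
# standard deviations scale with the square root of the number of steps
annual_std = step_std * np.sqrt(M)
# drift of the log returns of a geometric Brownian motion
expected_log_drift = r - 0.5 * sigma ** 2
```

The annualized standard deviation comes out at roughly 0.2, and the annualized mean of the log returns matches the drift r − 0.5 · sigma² = 0.03 rather than the short rate itself.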
Figure 11-2 compares the frequency distribution of the simulated log returns with the probability density function (pdf) of the normal distribution given the parameterizations for r and sigma. The function used is norm.pdf from the scipy.stats sublibrary. There is obviously quite a good fit:
In [11]: plt.hist(log_returns.flatten(), bins=70, normed=True,
                  label='frequency')
         plt.grid(True)
         plt.xlabel('log-return')
         plt.ylabel('frequency')
         x = np.linspace(plt.axis()[0], plt.axis()[1])
         plt.plot(x, scs.norm.pdf(x, loc=r / M, scale=sigma / np.sqrt(M)),
                  'r', lw=2.0, label='pdf')
         plt.legend()
Figure 11-2. Histogram of log returns and normal density function
Comparing a frequency distribution (histogram) with a theoretical pdf is not the only way to graphically "test" for normality. So-called quantile-quantile plots (qq plots) are also well suited for this task. Here, sample quantile values are compared to theoretical quantile values. For normally distributed sample data sets, such a plot might look like Figure 11-3, with the absolute majority of the quantile values (dots) lying on a straight line:
In [12]: sm.qqplot(log_returns.flatten()[::500], line='s')
         plt.grid(True)
         plt.xlabel('theoretical quantiles')
         plt.ylabel('sample quantiles')
Figure 11-3. Quantile-quantile plot for log returns
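The numbers behind a qq plot can also be inspected without plotting. The following sketch is an alternative to the sm.qqplot call, not what the text uses: it relies on scipy.stats.probplot and on a synthetic normal sample (an assumption for illustration, with mean and standard deviation roughly in line with the simulated log returns).

```python
import numpy as np
import scipy.stats as scs

rng = np.random.default_rng(1000)
# synthetic stand-in for a sample of log returns
sample = rng.normal(loc=0.0006, scale=0.0283, size=5000)

# probplot returns the theoretical and the ordered sample quantiles plus
# a least-squares line fitted through them
(theoretical_q, sample_q), (slope, intercept, r_value) = scs.probplot(
    sample, dist='norm')
```

For normally distributed data the correlation coefficient r_value is close to 1, and the slope of the fitted line approximates the sample standard deviation.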
However appealing the graphical approaches might be, they generally cannot replace more rigorous testing procedures. The function normality_tests combines three different statistical tests:
Skewness test (skewtest)
This tests whether the skew of the sample data is "normal" (i.e., has a value close enough to zero).
Kurtosis test (kurtosistest)
Similarly, this tests whether the kurtosis of the sample data is "normal" (again, close enough to zero).
Normality test (normaltest)
This combines the other two test approaches to test for normality.
We define this function as follows:
In [13]: def normality_tests(arr):
             ''' Tests for normality distribution of given data set.

             Parameters
             ==========
             arr : ndarray
                 object to generate statistics on
             '''
             print "Skew of data set  %14.3f" % scs.skew(arr)
             print "Skew test p-value %14.3f" % scs.skewtest(arr)[1]
             print "Kurt of data set  %14.3f" % scs.kurtosis(arr)
             print "Kurt test p-value %14.3f" % scs.kurtosistest(arr)[1]
             print "Norm test p-value %14.3f" % scs.normaltest(arr)[1]
The test values indicate that the log returns are indeed normally distributed (i.e., they show p-values of 0.05 or above):
In [14]: normality_tests(log_returns.flatten())
Out[14]: Skew of data set           0.001
         Skew test p-value          0.430
         Kurt of data set           0.001
         Kurt test p-value          0.541
         Norm test p-value          0.607
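For readers on Python 3, the same three tests can be collected in a function that returns the values instead of printing them. This is a hedged sketch: the name normality_summary and the dictionary keys are assumptions made here, and the Student-t sample is synthetic data chosen to show what a rejection looks like.

```python
import numpy as np
import scipy.stats as scs


def normality_summary(arr):
    """Return skew, kurtosis, and the p-values of the three tests."""
    return {
        'skew': scs.skew(arr),
        'skew_test_p': scs.skewtest(arr)[1],
        'kurtosis': scs.kurtosis(arr),
        'kurtosis_test_p': scs.kurtosistest(arr)[1],
        'normal_test_p': scs.normaltest(arr)[1],
    }


rng = np.random.default_rng(1000)
# Student-t draws with 3 degrees of freedom have heavy tails by construction
fat_tailed = rng.standard_t(df=3, size=5000)
summary = normality_summary(fat_tailed)
# a p-value below 0.05 rejects normality at the 5% level
reject_normality = summary['normal_test_p'] < 0.05
```

A p-value at or above the chosen level only means that normality cannot be rejected; it is not a proof that the data is normally distributed.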
Finally, let us check whether the end-of-period values are indeed log-normally distributed. This boils down to a normality test as well, since we only have to transform the data by applying the log function to it (to then arrive at normally distributed data, or maybe not). Figure 11-4 plots both the log-normally distributed end-of-period values and the transformed ones ("log index level"):
In [15]: f, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
         ax1.hist(paths[-1], bins=30)
         ax1.grid(True)
         ax1.set_xlabel('index level')
         ax1.set_ylabel('frequency')
         ax1.set_title('regular data')
         ax2.hist(np.log(paths[-1]), bins=30)
         ax2.grid(True)
         ax2.set_xlabel('log index level')
         ax2.set_title('log data')
Figure 11-4. Histogram of simulated end-of-period index levels
The statistics for the data set show expected behavior: for example, a mean value close to 105 and a standard deviation of about 21 index points (i.e., a volatility close to 20% of that level):
In [16]: print_statistics(paths[-1])
Out[16]:      statistic           value
         ------------------------------
                   size    250000.00000
                    min        42.74870
                    max       233.58435
                   mean       105.12645
                    std        21.23174
                   skew         0.61116
               kurtosis         0.65182
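These sample statistics can be compared with the theoretical moments of the log-normal distribution. The sketch below is an addition for illustration, assuming the parameterization from the text (S0 = 100, r = 0.05, sigma = 0.2, T = 1); it uses the frozen scipy.stats.lognorm distribution, whose shape parameter s is the standard deviation of the log values and whose scale is the exponential of their mean.

```python
import numpy as np
import scipy.stats as scs

S0, r, sigma, T = 100., 0.05, 0.2, 1.0
# log(S_T) is normal with mean log(S0) + (r - sigma^2 / 2) * T and
# standard deviation sigma * sqrt(T)
dist = scs.lognorm(s=sigma * np.sqrt(T),
                   scale=S0 * np.exp((r - 0.5 * sigma ** 2) * T))
# mean, variance, skewness, and excess kurtosis
mean, var, skew, kurt = (float(m) for m in dist.stats(moments='mvsk'))
std = np.sqrt(var)
```

The theoretical values (mean of about 105.13, standard deviation of about 21.24, skewness of about 0.61, and excess kurtosis of about 0.68) are in line with the simulated figures above, including the clearly nonzero skewness and kurtosis of the untransformed index levels.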
The log index level values also have skew and kurtosis values close to zero:
In [17]: print_statistics(np.log(paths[-1]))
Out[17]:      statistic           value
         ------------------------------
                   size    250000.00000
                    min         3.75534
                    max         5.45354
                   mean         4.63517
                    std         0.19998
                   skew        -0.00092
               kurtosis        -0.00327
This data set also shows high p-values, providing strong support for the normal distribution hypothesis:
In [18]: normality_tests(np.log(paths[-1]))
Out[18]: Skew of data set          -0.001
         Skew test p-value          0.851
         Kurt of data set          -0.003
         Kurt test p-value          0.744
         Norm test p-value          0.931
Figure 11-5 compares again the frequency distribution with the pdf of the normal distribution, showing a pretty good fit (as now is, of course, to be expected):
In [19]: log_data = np.log(paths[-1])
         plt.hist(log_data, bins=70, normed=True, label='observed')
         plt.grid(True)
         plt.xlabel('index levels')
         plt.ylabel('frequency')
         x = np.linspace(plt.axis()[0], plt.axis()[1])
         plt.plot(x, scs.norm.pdf(x, log_data.mean(), log_data.std()),
                  'r', lw=2.0, label='pdf')
         plt.legend()
Figure 11-5. Histogram of log index levels and normal density function
Figure 11-6 also supports the hypothesis that the log index levels are normally distributed:
In [20]: sm.qqplot(log_data, line='s')
         plt.grid(True)
         plt.xlabel('theoretical quantiles')
         plt.ylabel('sample quantiles')
Figure 11-6. Quantile-quantile plot for log index levels
NORMALITY
The normality assumption with regard to returns of securities is central to a number of important financial theories. Python provides efficient statistical and graphical means to test whether time series data is normally distributed or not.
Real-World Data
We are now pretty well equipped to attack real-world data and see how the normality assumption does beyond the financial laboratory. We are going to analyze four historical time series: two stock indices (the German DAX index and the American S&P 500 index) and two stocks (Yahoo! Inc. and Microsoft Inc.). The data management tool of choice is pandas (cf. Chapter 6), so we begin with a few imports:
In [21]: import pandas as pd
import pandas.io.data as web
Here are the symbols for the time series we are interested in. The curious reader might of course replace these with any other symbol of interest:
In [22]: symbols = ['^GDAXI', '^GSPC', 'YHOO', 'MSFT']
The following reads only the Adj Close time series data into a single DataFrame object for all symbols:
In [23]: data = pd.DataFrame()
         for sym in symbols:
             data[sym] = web.DataReader(sym, data_source='yahoo',
                                        start='1/1/2006')['Adj Close']
         data = data.dropna()
In [24]: data.info()
Out[24]: <class 'pandas.core.frame.DataFrame'>
         DatetimeIndex: 2179 entries, 2006-01-03 00:00:00 to 2014-09-26 00:00:00
         Data columns (total 4 columns):
         ^GDAXI    2179 non-null float64
         ^GSPC     2179 non-null float64
         YHOO      2179 non-null float64
         MSFT      2179 non-null float64
         dtypes: float64(4)
The four time series start at rather different absolute values:
In [25]: data.head()
Out[25]:              ^GDAXI    ^GSPC   YHOO   MSFT
         Date
         2006-01-03  5460.68  1268.80  40.91  22.09
         2006-01-04  5523.62  1273.46  40.97  22.20
         2006-01-05  5516.53  1273.48  41.53  22.22
         2006-01-06  5536.32  1285.45  43.21  22.15
         2006-01-09  5537.11  1290.15  43.42  22.11
Figure 11-7 therefore shows the four time series in direct comparison, but normalized to a starting value of 100:
In [26]: (data / data.ix[0] * 100).plot(figsize=(8, 6))
Figure 11-7. Evolution of stock and index levels over time
Calculating the log returns with pandas is a bit more convenient than with NumPy, since we can use the shift method:
In [27]: log_returns = np.log(data / data.shift(1))
         log_returns.head()
Out[27]:               ^GDAXI     ^GSPC      YHOO      MSFT
         Date
         2006-01-03       NaN       NaN       NaN       NaN
         2006-01-04  0.011460  0.003666  0.001466  0.004967
         2006-01-05 -0.001284  0.000016  0.013576  0.000900
         2006-01-06  0.003581  0.009356  0.039656 -0.003155
         2006-01-09  0.000143  0.003650  0.004848 -0.001808
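The difference between the two approaches is easy to see on a tiny example. The following sketch is an illustration added here, using the first three DAX and Microsoft values listed above rather than downloaded data; it contrasts the shift method with the slicing approach used for the simulated paths.

```python
import numpy as np
import pandas as pd

index = pd.to_datetime(['2006-01-03', '2006-01-04', '2006-01-05'])
levels = pd.DataFrame({'^GDAXI': [5460.68, 5523.62, 5516.53],
                       'MSFT': [22.09, 22.20, 22.22]}, index=index)

# pandas: divide each row by the previous one; the first row becomes NaN
pandas_returns = np.log(levels / levels.shift(1))
# NumPy: slice the array twice, which drops the first row instead
numpy_returns = np.log(levels.values[1:] / levels.values[:-1])
```

Both versions produce the same numbers; the pandas version keeps the date index and column labels aligned with the original data, which is why the first row shows NaN.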
Figure 11-8 provides all log returns in the form of histograms. Although not easy to judge, one can guess that these frequency distributions might not be normal:
In [28]: log_returns.hist(bins=50, figsize=(9, 6))
Figure 11-8. Histogram of respective log returns
As a next step, consider the different statistics for the time series data sets. The kurtosis values seem to be especially far from normal for all four data sets:
In [29]: for sym in symbols:
             print "\nResults for symbol %s" % sym
             print 30 * "-"
             log_data = np.array(log_returns[sym].dropna())
             print_statistics(log_data)
Out[29]: Results for symbol ^GDAXI
         ------------------------------
              statistic           value
         ------------------------------
                   size      2178.00000
                    min        -0.07739
                    max         0.10797
                   mean         0.00025
                    std         0.01462
                   skew         0.02573
               kurtosis         6.52461

         Results for symbol ^GSPC
         ------------------------------
              statistic           value
         ------------------------------
                   size      2178.00000
                    min        -0.09470
                    max         0.10957
                   mean         0.00020
                    std         0.01360
                   skew        -0.32017
               kurtosis        10.05425

         Results for symbol YHOO
         ------------------------------
              statistic           value
         ------------------------------
                   size      2178.00000
                    min        -0.24636
                    max         0.39182
                   mean        -0.00000
                    std         0.02620
                   skew         0.56530
               kurtosis        31.98659

         Results for symbol MSFT
         ------------------------------
              statistic           value
         ------------------------------
                   size      2178.00000
                    min        -0.12476
                    max         0.17039
                   mean         0.00034
                    std         0.01792
                   skew         0.04262
               kurtosis        10.18038
We will inspect the data of two symbols via a qq plot. Figure 11-9 shows the qq plot for the S&P 500. Obviously, the sample quantile values do not lie on a straight line, indicating "nonnormality." On the left and right sides there are many values that lie well below the line and well above the line, respectively. In other words, the time series data exhibits fat tails. This term refers to a (frequency) distribution where negative and positive outliers are observed far more often than a normal distribution would imply. The code to generate this plot is as follows:
In [30]: sm.qqplot(log_returns['^GSPC'].dropna(), line='s')
         plt.grid(True)
         plt.xlabel('theoretical quantiles')
         plt.ylabel('sample quantiles')
Figure 11-9. Quantile-quantile plot for S&P 500 log returns
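The notion of fat tails can be made concrete by counting outliers. The sketch below is an illustration on synthetic data, not on the market data used in the text: it compares a normal sample with a Student-t sample (5 degrees of freedom, an assumption chosen here for its heavy tails), and the helper name tail_share is hypothetical.

```python
import numpy as np
import scipy.stats as scs

rng = np.random.default_rng(1000)
normal_sample = rng.standard_normal(100000)
fat_sample = rng.standard_t(df=5, size=100000)


def tail_share(x, k=3.0):
    """Share of observations more than k sample standard deviations from the mean."""
    z = (x - x.mean()) / x.std()
    return np.mean(np.abs(z) > k)


# probability of a move beyond three standard deviations under normality
normal_benchmark = 2 * scs.norm.sf(3.0)
normal_share = tail_share(normal_sample)
fat_share = tail_share(fat_sample)
```

Under normality only about 0.27% of the observations lie beyond three standard deviations; the heavy-tailed sample exceeds that share several times over, which is the pattern the qq plot shows at its left and right ends.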
The same conclusions can be drawn from Figure 11-10, presenting the data for the Microsoft Inc. stock. There also seems to be strong evidence for a fat-tailed distribution:
In [31]: sm.qqplot(log_returns['MSFT'].dropna(), line='s')
         plt.grid(True)
         plt.xlabel('theoretical quantiles')
         plt.ylabel('sample quantiles')
Figure 11-10. Quantile-quantile plot for Microsoft log returns
All this leads us finally to the formal normality tests:
In [32]: for sym in symbols:
             print "\nResults for symbol %s" % sym
             print 32 * "-"
             log_data = np.array(log_returns[sym].dropna())
             normality_tests(log_data)
Out[32]: Results for symbol ^GDAXI
         --------------------------------
         Skew of data set           0.026
         Skew test p-value          0.623
         Kurt of data set           6.525
         Kurt test p-value          0.000
         Norm test p-value          0.000

         Results for symbol ^GSPC
         --------------------------------
         Skew of data set          -0.320
         Skew test p-value          0.000
         Kurt of data set          10.054
         Kurt test p-value          0.000
         Norm test p-value          0.000

         Results for symbol YHOO
         --------------------------------
         Skew of data set           0.565
         Skew test p-value          0.000
         Kurt of data set          31.987
         Kurt test p-value          0.000
         Norm test p-value          0.000

         Results for symbol MSFT
         --------------------------------
         Skew of data set           0.043
         Skew test p-value          0.415
         Kurt of data set          10.180
         Kurt test p-value          0.000
         Norm test p-value          0.000
Throughout, the p-values of the kurtosis and normality tests are zero (to three decimal places), strongly rejecting the test hypothesis that the different sample data sets are normally distributed; only the skewness tests for the DAX and Microsoft do not reject. This shows that the normal assumption for stock market returns (as, for example, embodied in the geometric Brownian motion model) cannot be justified in general and that one might have to use richer models generating fat tails (e.g., jump diffusion models or models with stochastic volatility).
Portfolio Optimization
Modern or mean-variance portfolio theory (MPT) is a major cornerstone of financial theory. Based on this theoretical breakthrough the Nobel Prize in Economics was awarded to its inventor, Harry Markowitz, in 1990. Although formulated in the 1950s,[41] it is still a theory taught to finance students and applied in practice today (often with some minor or major modifications). This section illustrates the fundamental principles of the theory.
Chapter 5 in the book by Copeland, Weston, and Shastri (2005) provides a good introduction to the formal topics associated with MPT. As pointed out previously, the assumption of normally distributed returns is fundamental to the theory:
By looking only at mean and variance, we are necessarily assuming that no other statistics are necessary to describe the distribution of end-of-period wealth. Unless investors have a special type of utility function (quadratic utility function), it is necessary to assume that returns have a normal distribution, which can be completely described by mean and variance.
The Data
Let us begin our Python session by importing a couple of by now well-known libraries:
In [33]: import numpy as np
         import pandas as pd
         import pandas.io.data as web
         import matplotlib.pyplot as plt
         %matplotlib inline
We pick five different assets for the analysis: American tech stocks Apple Inc., Yahoo! Inc., and Microsoft Inc., as well as German Deutsche Bank AG and gold as a commodity via an exchange-traded fund (ETF). The basic idea of MPT is diversification to achieve a minimal portfolio risk or maximal portfolio returns given a certain level of risk. One would expect such results for the right combination of a large enough number of assets and a certain diversity in the assets. However, to convey the basic ideas and to show typical effects, these five assets shall suffice:
In [34]: symbols = ['AAPL', 'MSFT', 'YHOO', 'DB', 'GLD']
         noa = len(symbols)
Using the DataReader function of pandas (cf. Chapter 6) makes getting the time series data rather efficient. We are only interested, as in the previous example, in the adjusted close prices (Adj Close) of each stock:
In [35]: data = pd.DataFrame()
         for sym in symbols:
             data[sym] = web.DataReader(sym, data_source='yahoo',
                                        end='2014-09-12')['Adj Close']
         data.columns = symbols
Figure 11-11 shows the time series data in normalized fashion graphically:
In [36]: (data / data.ix[0] * 100).plot(figsize=(8, 5))
Figure 11-11. Stock prices over time
Mean-variance refers to the mean and variance of the (log) returns of the different securities, which are calculated as follows:
In [37]: rets = np.log(data / data.shift(1))
Over the period of the time series data, we see significant differences in the annualized performance. We use a factor of 252 trading days to annualize the daily returns:
In [38]: rets.mean() * 252
Out[38]: AAPL    0.266036
         MSFT    0.114476
         YHOO    0.196165
         DB     -0.125170
         GLD     0.016054
         dtype: float64
The covariance matrix for the assets to be invested in is the central piece of the whole portfolio selection process. pandas has a built-in method to generate the covariance matrix:
In [39]: rets.cov() * 252
Out[39]:           AAPL      MSFT      YHOO        DB       GLD
         AAPL  0.072813  0.020426  0.023254  0.041044  0.005234
         MSFT  0.020426  0.049384  0.024247  0.046100  0.002105
         YHOO  0.023254  0.024247  0.093349  0.051528 -0.000864
         DB    0.041044  0.046100  0.051528  0.177477  0.008775
         GLD   0.005234  0.002105 -0.000864  0.008775  0.032406
The Basic Theory
In what follows, we assume that an investor is not allowed to set up short positions in a security. Only long positions are allowed, which means that 100% of the investor's wealth has to be divided among the available assets in such a way that all positions are long (positive) and that the positions add up to 100%. Given the five securities, you could for example invest equal amounts into every security (i.e., 20% of your wealth in each). The following code generates five random numbers between 0 and 1 and then normalizes the values such that the sum of all values equals 1:
In [40]: weights = np.random.random(noa)
         weights /= np.sum(weights)
In [41]: weights
Out[41]: array([ 0.0346395 ,  0.02726489,  0.2868883 ,  0.10396806,
                 0.54723926])
You can now check that the asset weights indeed add up to 1; i.e., $\sum_I w_i = 1$, where $I$ is the number of assets and $w_i \geq 0$ is the weight of asset $i$. Equation 11-1 provides the formula for the expected portfolio return given the weights for the single securities. This is expected portfolio return in the sense that historical mean performance is assumed to be the best estimator for future (expected) performance. Here, the $r_i$ are the state-dependent future returns (vector with return values assumed to be normally distributed) and $\mu_i$ is the expected return for security $i$. Finally, $w^T$ is the transpose of the weights vector and $\mu$ is the vector of the expected security returns.
Equation 11-1. General formula for expected portfolio return
\[
\mu_p = \mathbf{E}\left(\sum_I w_i r_i\right) = \sum_I w_i \mathbf{E}(r_i) = \sum_I w_i \mu_i = w^T \mu
\]
Translated into Python this boils down to the following line of code, where we multiply again by 252 to get annualized return values:
In [42]: np.sum(rets.mean() * weights) * 252
           # expected portfolio return
Out[42]: 0.064385749262353215
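The second object of choice in MPT is the expected portfolio variance. The covariance between two securities is defined by $\sigma_{ij} = \sigma_{ji} = \mathbf{E}((r_i - \mu_i)(r_j - \mu_j))$. The variance of a security is the special case of the covariance with itself: $\sigma_i^2 = \mathbf{E}((r_i - \mu_i)^2)$. Equation 11-2 provides the covariance matrix for a portfolio of securities (assuming an equal weight of 1 for every security).
Equation 11-2. Portfolio covariance matrix
\[
\Sigma = \begin{bmatrix}
\sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1I} \\
\sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2I} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{I1} & \sigma_{I2} & \cdots & \sigma_I^2
\end{bmatrix}
\]
Equipped with the portfolio covariance matrix, Equation 11-3 then provides the formula for the expected portfolio variance, where $r_p = \sum_I w_i r_i$ denotes the portfolio return.
Equation 11-3. General formula for expected portfolio variance
\[
\sigma_p^2 = \mathbf{E}\left((r_p - \mu_p)^2\right) = \sum_{i \in I} \sum_{j \in I} w_i w_j \sigma_{ij} = w^T \Sigma w
\]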
In Python this all again boils down to a single line of code, making heavy use of NumPy's vectorization capabilities. The dot function gives the dot product of two vectors/matrices. The T or transpose method gives the transpose of a vector or matrix:
In [43]: np.dot(weights.T, np.dot(rets.cov() * 252, weights))
           # expected portfolio variance
Out[43]: 0.024929484097150213
The (expected) portfolio standard deviation or volatility is then only one square root away:
In [44]: np.sqrt(np.dot(weights.T, np.dot(rets.cov() * 252,
weights)))
# expected portfolio standard deviation/volatility
Out[44]: 0.15789073467797346
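The three results can be reproduced from the numbers printed so far. The following is a worked sketch that hard-codes the random weights, the annualized mean returns, and the annualized covariance matrix shown above (assets ordered AAPL, MSFT, YHOO, DB, GLD); because those printed values are rounded, the results agree with the outputs above only up to rounding. It uses the @ operator of Python 3 in place of np.dot.

```python
import numpy as np

weights = np.array([0.0346395, 0.02726489, 0.2868883, 0.10396806, 0.54723926])
mean_returns = np.array([0.266036, 0.114476, 0.196165, -0.125170, 0.016054])
cov = np.array([
    [0.072813, 0.020426, 0.023254, 0.041044, 0.005234],
    [0.020426, 0.049384, 0.024247, 0.046100, 0.002105],
    [0.023254, 0.024247, 0.093349, 0.051528, -0.000864],
    [0.041044, 0.046100, 0.051528, 0.177477, 0.008775],
    [0.005234, 0.002105, -0.000864, 0.008775, 0.032406],
])

port_return = weights @ mean_returns        # weighted sum of expected returns
port_variance = weights @ cov @ weights     # quadratic form w^T Sigma w
port_volatility = np.sqrt(port_variance)
```

The portfolio return comes out at about 0.0644, the variance at about 0.0249, and the volatility at about 0.158.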
LANGUAGE
The MPT example shows again how efficient it is with Python to translate mathematical concepts, like portfolio return or portfolio variance, into executable, vectorized code (an argument made in Chapter 1).
This mainly completes the tool set for mean-variance portfolio selection. Of paramount interest to investors is what risk-return profiles are possible for a given set of securities, and their statistical characteristics. To this end, we implement a Monte Carlo simulation (cf. Chapter 10) to generate random portfolio weight vectors on a larger scale. For every simulated allocation, we record the resulting expected portfolio return and variance:
In [45]: prets = []
         pvols = []
         for p in range (2500):
             weights = np.random.random(noa)
             weights /= np.sum(weights)
             prets.append(np.sum(rets.mean() * weights) * 252)
             pvols.append(np.sqrt(np.dot(weights.T,
                                  np.dot(rets.cov() * 252, weights))))
         prets = np.array(prets)
         pvols = np.array(pvols)
Figure 11-12 illustrates the results of the Monte Carlo simulation. In addition it provides results for the so-called Sharpe ratio, defined as $SR \equiv \frac{\mu_p - r_f}{\sigma_p}$ (i.e., the expected excess return of the portfolio over the risk-free short rate $r_f$ divided by the expected standard deviation of the portfolio). For simplicity, we assume $r_f = 0$:
In [46]: plt.figure(figsize=(8, 4))
         plt.scatter(pvols, prets, c=prets / pvols, marker='o')
         plt.grid(True)
         plt.xlabel('expected volatility')
         plt.ylabel('expected return')
         plt.colorbar(label='Sharpe ratio')
Figure 11-12. Expected return and volatility for different/random portfolio weights
It is clear by inspection of Figure 11-12 that not all weight distributions perform well when measured in terms of mean and variance. For example, for a fixed risk level of, say, 20%, there are multiple portfolios that all show different returns. As an investor one is generally interested in the maximum return given a fixed risk level or the minimum risk given a fixed return expectation. This set of portfolios then makes up the so-called efficient frontier. This is what we derive later in the section.
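The random-portfolio experiment can also be run in vectorized form. The sketch below is an illustration under stated assumptions: it hard-codes the rounded annualized mean returns and covariance matrix printed earlier instead of the rets object, uses NumPy's Generator API with an arbitrary seed, and evaluates all 2,500 quadratic forms at once with np.einsum.

```python
import numpy as np

mean_returns = np.array([0.266036, 0.114476, 0.196165, -0.125170, 0.016054])
cov = np.array([
    [0.072813, 0.020426, 0.023254, 0.041044, 0.005234],
    [0.020426, 0.049384, 0.024247, 0.046100, 0.002105],
    [0.023254, 0.024247, 0.093349, 0.051528, -0.000864],
    [0.041044, 0.046100, 0.051528, 0.177477, 0.008775],
    [0.005234, 0.002105, -0.000864, 0.008775, 0.032406],
])

rng = np.random.default_rng(1000)
w = rng.random((2500, 5))
w /= w.sum(axis=1, keepdims=True)          # each row is one long-only portfolio

prets = w @ mean_returns                   # expected returns, one per portfolio
# row-wise quadratic form w_i^T Sigma w_i without a Python loop
pvols = np.sqrt(np.einsum('ij,jk,ik->i', w, cov, w))
sharpe = prets / pvols                     # Sharpe ratios for rf = 0
best = sharpe.argmax()                     # index of the best random portfolio
```

Random sampling only approximates the efficient portfolios: the best Sharpe ratio found this way stays below the optimum that a numerical optimization delivers.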
Portfolio Optimizations
To make our lives a bit easier, first we have a convenience function giving back the major portfolio statistics for an input weights vector/array:
In [47]: def statistics(weights):
             ''' Returns portfolio statistics.

             Parameters
             ==========
             weights : array-like
                 weights for different securities in portfolio

             Returns
             =======
             pret : float
                 expected portfolio return
             pvol : float
                 expected portfolio volatility
             pret / pvol : float
                 Sharpe ratio for rf=0
             '''
             weights = np.array(weights)
             pret = np.sum(rets.mean() * weights) * 252
             pvol = np.sqrt(np.dot(weights.T, np.dot(rets.cov() * 252, weights)))
             return np.array([pret, pvol, pret / pvol])
The derivation of the optimal portfolios is a constrained optimization problem for which we use the function minimize from the scipy.optimize sublibrary (cf. Chapter 9):
In [48]: import scipy.optimize as sco
The minimization function minimize is quite general and allows for (in)equality constraints and bounds for the parameters. Let us start with the maximization of the Sharpe ratio. Formally, we minimize the negative value of the Sharpe ratio:
In [49]: def min_func_sharpe(weights):
return -statistics(weights)[2]
The constraint is that all parameters (weights) add up to 1. This can be formulated as follows using the conventions of the minimize function (cf. the documentation for this function).[42]
In [50]: cons = ({'type': 'eq', 'fun': lambda x: np.sum(x) - 1})
We also bound the parameter values (weights) to be within 0 and 1. These values are provided to the minimization function as a tuple of tuples in this case:
In [51]: bnds = tuple((0, 1) for x in range(noa))
The only input that is missing for a call of the optimization function is a starting parameter list (initial guesses for the weights). We simply use an equal distribution:
In [52]: noa * [1. / noa,]
Out[52]: [0.2, 0.2, 0.2, 0.2, 0.2]
Calling the function returns not only optimal parameter values, but much more. We store the results in an object we call opts:
In [53]: %%time
         opts = sco.minimize(min_func_sharpe, noa * [1. / noa,], method='SLSQP',
                             bounds=bnds, constraints=cons)
Out[53]: CPU times: user 52 ms, sys: 0 ns, total: 52 ms
         Wall time: 50.3 ms
Here are the results:
In [54]: opts
Out[54]:   status: 0
          success: True
             njev: 6
             nfev: 42
              fun: -1.0597540702789927
                x: array([  6.59141408e-01,   8.82635668e-02,   2.52595026e-01,
                  8.34564622e-17,  -8.91214186e-17])
          message: 'Optimization terminated successfully.'
              jac: array([  3.27527523e-05,  -1.61930919e-04,  -2.88933516e-05,
                  1.51561590e+00,   1.24186277e-03,   0.00000000e+00])
              nit: 6
Our main interest lies in getting the optimal portfolio composition. To this end, we access the results object by providing the key of interest (i.e., x in our case). The optimization yields a portfolio that only consists of three out of the five assets:
In [55]: opts['x'].round(3)
Out[55]: array([ 0.659, 0.088, 0.253, 0. , -0. ])
Using the portfolio weights from the optimization, the following statistics emerge:
In [56]: statistics(opts['x']).round(3)
Out[56]: array([ 0.235, 0.222, 1.06 ])
The expected return is about 23.5%, the expected volatility is about 22.2%, and the resulting optimal Sharpe ratio is 1.06.
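The Sharpe ratio maximization can be reproduced without the downloaded data. The following is a self-contained Python 3 sketch under stated assumptions: it substitutes the rounded annualized mean returns and covariance matrix printed earlier for the rets object, and the function name neg_sharpe is chosen here for illustration. Results should therefore match the ones above only approximately.

```python
import numpy as np
import scipy.optimize as sco

mean_returns = np.array([0.266036, 0.114476, 0.196165, -0.125170, 0.016054])
cov = np.array([
    [0.072813, 0.020426, 0.023254, 0.041044, 0.005234],
    [0.020426, 0.049384, 0.024247, 0.046100, 0.002105],
    [0.023254, 0.024247, 0.093349, 0.051528, -0.000864],
    [0.041044, 0.046100, 0.051528, 0.177477, 0.008775],
    [0.005234, 0.002105, -0.000864, 0.008775, 0.032406],
])
noa = len(mean_returns)


def neg_sharpe(w):
    """Negative Sharpe ratio for rf = 0 (minimizing it maximizes the ratio)."""
    return -(w @ mean_returns) / np.sqrt(w @ cov @ w)


cons = ({'type': 'eq', 'fun': lambda w: np.sum(w) - 1},)   # weights sum to 1
bnds = tuple((0, 1) for _ in range(noa))                   # long-only positions
res = sco.minimize(neg_sharpe, noa * [1. / noa], method='SLSQP',
                   bounds=bnds, constraints=cons)
best_weights, best_sharpe = res.x, -res.fun
```

With these inputs the optimizer should again concentrate the portfolio in the first three assets and arrive at a Sharpe ratio of about 1.06.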
Next, let us minimize the variance of the portfolio. This is the same as minimizing the volatility, but we will define a function to minimize the variance:
In [57]: def min_func_variance(weights):
             return statistics(weights)[1] ** 2
Everything else can remain the same for the call of the minimize function:
In [58]: optv = sco.minimize(min_func_variance, noa * [1. / noa,],
                             method='SLSQP', bounds=bnds,
                             constraints=cons)
In [59]: optv
Out[59]:   status: 0
          success: True
             njev: 9
             nfev: 64
              fun: 0.018286019968366075
                x: array([  1.07591814e-01,   2.49124471e-01,   1.09219925e-01,
                  1.01101853e-17,   5.34063791e-01])
          message: 'Optimization terminated successfully.'
              jac: array([ 0.03636634,  0.03643877,  0.03613905,  0.05222051,
                 0.03676446,  0.        ])
              nit: 9
This time a fourth asset is added to the portfolio. This portfolio mix leads to the absolute minimum variance portfolio:
In [60]: optv['x'].round(3)
Out[60]: array([ 0.108, 0.249, 0.109, 0. , 0.534])
For the expected return, volatility, and Sharpe ratio, we get:
In [61]: statistics(optv['x']).round(3)
Out[61]: array([ 0.087, 0.135, 0.644])
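As a cross-check, the minimum variance problem has a closed-form solution when the long-only restriction is dropped. The sketch below is an aside that departs from the setting of the text (short positions are allowed here): it uses the well-known formula w = Σ⁻¹1 / (1ᵀΣ⁻¹1) together with the rounded covariance matrix printed earlier, so the numbers are illustrative only.

```python
import numpy as np

cov = np.array([
    [0.072813, 0.020426, 0.023254, 0.041044, 0.005234],
    [0.020426, 0.049384, 0.024247, 0.046100, 0.002105],
    [0.023254, 0.024247, 0.093349, 0.051528, -0.000864],
    [0.041044, 0.046100, 0.051528, 0.177477, 0.008775],
    [0.005234, 0.002105, -0.000864, 0.008775, 0.032406],
])

ones = np.ones(len(cov))
raw = np.linalg.solve(cov, ones)        # Sigma^{-1} 1 without an explicit inverse
gmv_weights = raw / raw.sum()           # normalize so the weights sum to 1
gmv_variance = gmv_weights @ cov @ gmv_weights
```

Since the unconstrained problem has a larger feasible set, its variance can never exceed the long-only minimum of about 0.0183 found by the optimizer; any negative entries in gmv_weights indicate where the long-only bounds are binding.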
Efficient Frontier
The derivation of all optimal portfolios (i.e., all portfolios with minimum volatility for a given target return level, or all portfolios with maximum return for a given risk level) is similar to the previous optimizations. The only difference is that we have to iterate over multiple starting conditions. The approach we take is that we fix a target return level and derive for each such level those portfolio weights that lead to the minimum volatility value. For the optimization, this leads to two conditions: one for the target return level tret and one for the sum of the portfolio weights as before. The boundary values for each parameter stay the same:
In [62]: cons = ({'type': 'eq', 'fun': lambda x: statistics(x)[0] - tret},
                 {'type': 'eq', 'fun': lambda x: np.sum(x) - 1})
         bnds = tuple((0, 1) for x in weights)
For clarity, we define a dedicated function min_func_port for use in the minimization procedure. It merely returns the volatility value from the statistics function:
In [63]: def min_func_port(weights):
             return statistics(weights)[1]
When iterating over different target return levels (trets), one condition for the minimization changes. That is why the conditions dictionary is updated during every loop:
In [64]: %%time
         trets = np.linspace(0.0, 0.25, 50)
         tvols = []
         for tret in trets:
             cons = ({'type': 'eq', 'fun': lambda x: statistics(x)[0] - tret},
                     {'type': 'eq', 'fun': lambda x: np.sum(x) - 1})
             res = sco.minimize(min_func_port, noa * [1. / noa,], method='SLSQP',
                                bounds=bnds, constraints=cons)
             tvols.append(res['fun'])
         tvols = np.array(tvols)
Out[64]: CPU times: user 4.35 s, sys: 4 ms, total: 4.36 s
         Wall time: 4.36 s
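The loop over target returns can be tried out without the market data. The following is a self-contained Python 3 sketch under stated assumptions: it uses the rounded annualized mean returns and covariance matrix printed earlier in place of rets, a coarser grid of 26 target returns, and a hypothetical helper named volatility. The default argument t=tret binds the current target to each constraint function explicitly, a design choice made here so that the constraint does not depend on the loop variable's later values.

```python
import numpy as np
import scipy.optimize as sco

mean_returns = np.array([0.266036, 0.114476, 0.196165, -0.125170, 0.016054])
cov = np.array([
    [0.072813, 0.020426, 0.023254, 0.041044, 0.005234],
    [0.020426, 0.049384, 0.024247, 0.046100, 0.002105],
    [0.023254, 0.024247, 0.093349, 0.051528, -0.000864],
    [0.041044, 0.046100, 0.051528, 0.177477, 0.008775],
    [0.005234, 0.002105, -0.000864, 0.008775, 0.032406],
])
noa = len(mean_returns)
bnds = tuple((0, 1) for _ in range(noa))


def volatility(w):
    """Expected portfolio volatility for weights w."""
    return np.sqrt(w @ cov @ w)


trets = np.linspace(0.0, 0.25, 26)
tvols = []
for tret in trets:
    cons = ({'type': 'eq', 'fun': lambda w, t=tret: w @ mean_returns - t},
            {'type': 'eq', 'fun': lambda w: np.sum(w) - 1})
    res = sco.minimize(volatility, noa * [1. / noa], method='SLSQP',
                       bounds=bnds, constraints=cons)
    tvols.append(res.fun)
tvols = np.array(tvols)
```

The smallest value in tvols should lie close to 0.135, the volatility of the minimum variance portfolio derived before.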
Figure 11-13 shows the optimization results. Crosses indicate the optimal portfolios given a certain target return; the dots are, as before, the random portfolios. In addition, the figure shows two larger stars: one for the minimum volatility/variance portfolio (the leftmost portfolio) and one for the portfolio with the maximum Sharpe ratio:
In [65]: plt.figure(figsize=(8, 4))
         plt.scatter(pvols, prets,
                     c=prets / pvols, marker='o')
                     # random portfolio composition
         plt.scatter(tvols, trets,
                     c=trets / tvols, marker='x')
                     # efficient frontier
         plt.plot(statistics(opts['x'])[1], statistics(opts['x'])[0],
                  'r*', markersize=15.0)
                     # portfolio with highest Sharpe ratio
         plt.plot(statistics(optv['x'])[1], statistics(optv['x'])[0],
                  'y*', markersize=15.0)
                     # minimum variance portfolio
         plt.grid(True)
         plt.xlabel('expected volatility')
         plt.ylabel('expected return')
         plt.colorbar(label='Sharpe ratio')
Figure 11-13. Minimum risk portfolios for given return level (crosses)
The efficient frontier is comprised of all optimal portfolios with a higher return than the absolute minimum variance portfolio. These portfolios dominate all other portfolios in terms of expected returns given a certain risk level.
Capital Market Line
In addition to risky securities like stocks or commodities (such as gold), there is in general one universal, riskless investment opportunity available: cash or cash accounts. In an idealized world, money held in a cash account with a large bank can be considered riskless (e.g., through public deposit insurance schemes). The downside is that such a riskless investment generally yields only a small return, sometimes close to zero.
However, taking into account such a riskless asset enhances the efficient investment opportunity set for investors considerably. The basic idea is that investors first determine an efficient portfolio of risky assets and then add the riskless asset to the mix. By adjusting the proportion of the investor's wealth to be invested in the riskless asset it is possible to achieve any risk-return profile that lies on the straight line (in the risk-return space) between the riskless asset and the efficient portfolio.
Which efficient portfolio (out of the many options) is to be taken to invest in optimal fashion? It is the one portfolio where the tangent line of the efficient frontier goes exactly through the risk-return point of the riskless portfolio. For example, consider a riskless interest rate of $r_f = 0.01$. We look for that portfolio on the efficient frontier for which the tangent goes through the point $(\sigma_f, r_f) = (0, 0.01)$ in risk-return space.
For the calculations to follow, we need a functional approximation and the first derivative for the efficient frontier. We use cubic splines interpolation to this end (cf. Chapter 9):
In [66]: import scipy.interpolate as sci
For the spline interpolation, we only use the portfolios from the efficient frontier. The following code selects exactly these portfolios from our previously used sets tvols and trets:
In [67]: ind = np.argmin(tvols)
         evols = tvols[ind:]
         erets = trets[ind:]
The new ndarray objects evols and erets are used for the interpolation:
In [68]: tck = sci.splrep(evols, erets)
Via this numerical route we end up being able to define a continuously differentiable function f(x) for the efficient frontier and the respective first derivative function df(x):
In [69]: def f(x):
             ''' Efficient frontier function (splines approximation). '''
             return sci.splev(x, tck, der=0)
         def df(x):
             ''' First derivative of efficient frontier function. '''
             return sci.splev(x, tck, der=1)
theefficientf
functiont
Whatwearelookingfori
(x) hastosatisfy.
rontier. Equation11­4describesa · xdescribingthelinethat
passesthroughtherisklessassetinrisk­returnspaceandthati
s afunctiont(x) =a+bl threeconditionsthatthe
s tangentto
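Equation 11-4. Mathematical conditions for capital market line
$$
\begin{aligned}
t(x) &= a + b \cdot x \\
t(0) &= r_f \quad\Leftrightarrow\quad a = r_f \\
t(x) &= f(x) \quad\Leftrightarrow\quad a + b \cdot x = f(x) \\
t'(x) &= f'(x) \quad\Leftrightarrow\quad b = f'(x)
\end{aligned}
$$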
Since we do not have a closed formula for the efficient frontier or the first
derivative of it, we have to solve the system of equations in Equation 11-4
numerically. To this end, we define a Python function that returns the
values of all three equations given the parameter set p = (a, b, x):
In [70]: def equations(p, rf=0.01):
             eq1 = rf - p[0]
             eq2 = rf + p[1] * p[2] - f(p[2])
             eq3 = p[1] - df(p[2])
             return eq1, eq2, eq3
The function fsolve from scipy.optimize is capable of solving such a
system of equations. We provide an initial parameterization in addition to
the function equations. Note that success or failure of the optimization
might depend on the initial parameterization, which therefore has to be
chosen carefully, generally by a combination of educated guesses with
trial and error:
In [71]: opt = sco.fsolve(equations, [0.01, 0.5, 0.15])
The numerical optimization yields the following values. As desired, we
have a = r_f = 0.01:
In [72]: opt
Out[72]: array([ 0.01 , 1.01498858, 0.22580367])
The three equations are also, as desired, zero:
In [73]: np.round(equations(opt), 6)
Out[73]: array([ 0., -0., -0.])
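The tangency search can be reproduced in isolation on a frontier whose answer is known in closed form. The sketch below is an illustration under stated assumptions: the quadratic "frontier" and all demo_ names are hypothetical stand-ins for evols, erets, f, df, and equations, chosen so that the tangent through (0, 0.01) touches at x = 0.2 with slope 0.6.

```python
import numpy as np
import scipy.interpolate as sci
import scipy.optimize as sco

# Hypothetical frontier: return as a concave quadratic in volatility.
# For f(x) = -0.03 + x - x**2 and rf = 0.01 the tangent line through
# (0, rf) touches the curve at x = 0.2 with slope f'(0.2) = 0.6.
demo_vols = np.linspace(0.1, 0.4, 30)
demo_rets = -0.03 + demo_vols - demo_vols ** 2

demo_tck = sci.splrep(demo_vols, demo_rets)  # interpolating cubic spline

def demo_f(x):
    return float(sci.splev(x, demo_tck, der=0))

def demo_df(x):
    return float(sci.splev(x, demo_tck, der=1))

def demo_equations(p, rf=0.01):
    a, b, x = p
    eq1 = rf - a                   # line starts at the riskless rate
    eq2 = rf + b * x - demo_f(x)   # line touches the frontier at x
    eq3 = b - demo_df(x)           # line has the frontier's slope at x
    return eq1, eq2, eq3

demo_opt = sco.fsolve(demo_equations, [0.01, 0.5, 0.15])
print(demo_opt)                    # intercept a, slope b, tangency x
```

Because the closed-form solution is available here, the example doubles as a sanity check of the numerical route before applying it to real frontier data.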
Figure 11-14 presents the results graphically: the star represents the optimal
portfolio from the efficient frontier where the tangent line passes through
the riskless asset point (0, r_f = 0.01). Given the values in Out[72], the
optimal portfolio has an expected volatility of about 22.6% (opt[2]) and an
expected return of about 23.9% (opt[0] + opt[1] * opt[2]). The plot is
generated with the following code:
In [74]: plt.figure(figsize=(8, 4))
         plt.scatter(pvols, prets,
                     c=(prets - 0.01) / pvols, marker='o')
                     # random portfolio composition
         plt.plot(evols, erets, 'g', lw=4.0)
                     # efficient frontier
         cx = np.linspace(0.0, 0.3)
         plt.plot(cx, opt[0] + opt[1] * cx, lw=1.5)
                     # capital market line
         plt.plot(opt[2], f(opt[2]), 'r*', markersize=15.0)
         plt.grid(True)
         plt.axhline(0, color='k', ls='--', lw=2.0)
         plt.axvline(0, color='k', ls='--', lw=2.0)
         plt.xlabel('expected volatility')
         plt.ylabel('expected return')
         plt.colorbar(label='Sharpe ratio')
Figure 11-14. Capital market line and tangency portfolio (star) for risk-free
rate of 1%
The portfolio weights of the optimal (tangent) portfolio are as follows. Only
three of the five assets are in the mix:
In [75]: cons = ({'type': 'eq', 'fun': lambda x: statistics(x)[0] - f(opt[2])},
                 {'type': 'eq', 'fun': lambda x: np.sum(x) - 1})
         res = sco.minimize(min_func_port, noa * [1. / noa,],
                            method='SLSQP',
                            bounds=bnds, constraints=cons)
In [76]: res['x'].round(3)
Out[76]: array([ 0.684, 0.059, 0.257, -0. , 0. ])
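The constrained minimization pattern used above (fix the expected return, require the weights to sum to 1, minimize variance) can be tried on a small self-contained problem. This is a sketch only: the three-asset mean vector, the covariance matrix, and the names port_var and target_ret are hypothetical, not the chapter's data or helper functions.

```python
import numpy as np
import scipy.optimize as sco

# Hypothetical annualized inputs for three assets.
mu = np.array([0.08, 0.12, 0.15])
cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.16]])
target_ret = 0.12

def port_var(w):
    """Portfolio variance w' * cov * w."""
    return float(np.dot(w, np.dot(cov, w)))

cons = ({'type': 'eq', 'fun': lambda w: np.dot(w, mu) - target_ret},
        {'type': 'eq', 'fun': lambda w: np.sum(w) - 1})
bnds = tuple((0, 1) for _ in mu)   # long-only weights

res = sco.minimize(port_var, len(mu) * [1. / len(mu)],
                   method='SLSQP', bounds=bnds, constraints=cons)
print(res['x'].round(3))
```

The two equality constraints play the same roles as in In [75]; only the objective and data are simplified.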
Principal Component Analysis
Principal component analysis (PCA) has become a popular tool in finance.
Wikipedia defines the technique as follows:
Principal component analysis (PCA) is a statistical procedure that uses
orthogonal transformation to convert a set of observations of possibly
correlated variables into a set of values of linearly uncorrelated variables
called principal components. The number of principal components is less
than or equal to the number of original variables. This transformation is
defined in such a way that the first principal component has the largest
possible variance (that is, accounts for as much of the variability in the data
as possible), and each succeeding component in turn has the highest
variance possible under the constraint that it is orthogonal to (i.e.,
uncorrelated with) the preceding components.
Consider, for example, a stock index like the German DAX index,
composed of 30 different stocks. The stock price movements of all stocks
taken together determine the movement in the index (via some
well-documented formula). In addition, the stock price movements of the
single stocks are generally correlated, for example, due to general economic
conditions or certain developments in certain sectors.
For statistical applications, it is generally quite hard to use 30 correlated
factors to explain the movements of a stock index. This is where PCA
comes into play. It derives single, uncorrelated components that are “well
suited” to explain the movements in the stock index. One can think of these
components as linear combinations (i.e., portfolios) of selected stocks from
the index. Instead of working with 30 correlated index constituents, one can
then work with maybe 5, 3, or even only 1 principal component.
The example of this section illustrates the use of PCA in such a context. We
retrieve data for both the German DAX index and all stocks that make up
the index. We then use PCA to derive principal components, which we use
to construct what we call a pca_index.
First, some imports. In particular, we use the KernelPCA function of the
scikit-learn machine learning library (cf. the documentation for
KernelPCA):
In [1]: import numpy as np
        import pandas as pd
        import pandas.io.data as web
        from sklearn.decomposition import KernelPCA
The DAX Index and Its 30 Stocks
The following list object contains the 30 symbols for the stocks contained
in the German DAX index, as well as the symbol for the index itself:
In [2]: symbols = ['ADS.DE', 'ALV.DE', 'BAS.DE', 'BAYN.DE', 'BEI.DE',
                   'BMW.DE', 'CBK.DE', 'CON.DE', 'DAI.DE', 'DB1.DE',
                   'DBK.DE', 'DPW.DE', 'DTE.DE', 'EOAN.DE', 'FME.DE',
                   'FRE.DE', 'HEI.DE', 'HEN3.DE', 'IFX.DE', 'LHA.DE',
                   'LIN.DE', 'LXS.DE', 'MRK.DE', 'MUV2.DE', 'RWE.DE',
                   'SAP.DE', 'SDF.DE', 'SIE.DE', 'TKA.DE', 'VOW3.DE',
                   '^GDAXI']
We work only with the closing values of each data set that we retrieve (for
details on how to retrieve stock data with pandas, see Chapter 6):
In [3]: %%time
        data = pd.DataFrame()
        for sym in symbols:
            data[sym] = web.DataReader(sym, data_source='yahoo')['Close']
        data = data.dropna()
Out[3]: CPU times: user 408 ms, sys: 68 ms, total: 476 ms
        Wall time: 5.61 s
Let us separate the index data since we need it regularly:
In [4]: dax = pd.DataFrame(data.pop('^GDAXI'))
The DataFrame object data now has closing price data for the 30 DAX
stocks:
In [5]: data[data.columns[:6]].head()
Out[5]:             ADS.DE  ALV.DE  BAS.DE  BAYN.DE  BEI.DE  BMW.DE
        Date
        2010-01-04   38.51   88.54   44.85    56.40   46.44   32.05
        2010-01-05   39.72   88.81   44.17    55.37   46.20   32.31
        2010-01-06   39.40   89.50   44.45    55.02   46.17   32.81
        2010-01-07   39.74   88.47   44.15    54.30   45.70   33.10
        2010-01-08   39.60   87.99   44.02    53.82   44.38   32.65
Applying PCA
Usually, PCA works with normalized data sets. Therefore, the following
convenience function proves helpful:
In [6]: scale_function = lambda x: (x - x.mean()) / x.std()
For the beginning, consider a PCA with multiple components (i.e., we do
not restrict the number of components):[43]
In [7]: pca = KernelPCA().fit(data.apply(scale_function))
The importance or explanatory power of each component is given by its
Eigenvalue. These are found in an attribute of the KernelPCA object. The
analysis gives too many components:
In [8]: len(pca.lambdas_)
Out[8]: 655
Therefore, let us only have a look at the first 10 components. The tenth
component already has almost negligible influence:
In [9]: pca.lambdas_[:10].round()
Out[9]: array([ 22816.,   6559.,   2535.,   1558.,    697.,    442.,
                  378.,    255.,    183.,    151.])
We are mainly interested in the relative importance of each component, so
we will normalize these values. Again, we use a convenience function for
this:
In [10]: get_we = lambda x: x / x.sum()
In [11]: get_we(pca.lambdas_)[:10]
Out[11]: array([ 0.6295725 ,  0.1809903 ,  0.06995609,  0.04300101,
                 0.01923256,  0.01218984,  0.01044098,  0.00704461,
                 0.00505794,  0.00416612])
With this information, the picture becomes much clearer. The first
component already explains about 63% of the variability in the 30 time
series. The first five components explain about 94% of the variability:
In [12]: get_we(pca.lambdas_)[:5].sum()
Out[12]: 0.94275246704834414
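The idea of reading relative importance off normalized eigenvalues can be illustrated without market data or scikit-learn. The sketch below uses plain NumPy on synthetic series driven by one common factor; this is ordinary linear PCA on the correlation matrix, swapped in for KernelPCA for illustration, and all names and parameters are hypothetical.

```python
import numpy as np

rng = np.random.RandomState(0)
n_obs, n_series = 500, 6

# Synthetic data: one common factor plus idiosyncratic noise.
common = rng.standard_normal((n_obs, 1))
noise = rng.standard_normal((n_obs, n_series))
series = common * np.linspace(0.8, 1.2, n_series) + 0.5 * noise

# Normalize each column, as scale_function does in the text.
scaled = (series - series.mean(axis=0)) / series.std(axis=0)

# Eigenvalues of the correlation matrix, largest first.
corr = np.dot(scaled.T, scaled) / n_obs
eigenvalues = np.linalg.eigvalsh(corr)[::-1]

explained = eigenvalues / eigenvalues.sum()   # same role as get_we
print(explained.round(3))
print(explained.cumsum().round(3))
```

With a single strong common factor, the first normalized eigenvalue dominates, mirroring the pattern seen for the DAX constituents.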
Constructing a PCA Index
Next, we use PCA to construct a PCA (or factor) index over time and
compare it with the original index. First, we have a PCA index with a single
component only:
In [13]: pca = KernelPCA(n_components=1).fit(data.apply(scale_function))
         dax['PCA_1'] = pca.transform(-data)
Figure 11-15 shows the results for normalized data, which are already not
too bad given the rather simple application of the approach:
In [14]: import matplotlib.pyplot as plt
         %matplotlib inline
         dax.apply(scale_function).plot(figsize=(8, 4))
Figure 11-15. German DAX index and PCA index with one component
Let us see if we can improve the results by adding more components. To
this end, we need to calculate a weighted average from the single resulting
components:
In [15]: pca = KernelPCA(n_components=5).fit(data.apply(scale_function))
         pca_components = pca.transform(-data)
         weights = get_we(pca.lambdas_)
         dax['PCA_5'] = np.dot(pca_components, weights)
The results as presented in Figure 11-16 are still “good,” but not that much
better than before, at least upon visual inspection:
In [16]: import matplotlib.pyplot as plt
         %matplotlib inline
         dax.apply(scale_function).plot(figsize=(8, 4))
Figure 11-16. German DAX index and PCA indices with one and five
components
In view of the results so far, we want to inspect the relationship between the
DAX index and the PCA index in a different way: via a scatter plot,
adding date information to the mix. First, we convert the DatetimeIndex
of the DataFrame object to a matplotlib-compatible format:
In [17]: import matplotlib as mpl
         mpl_dates = mpl.dates.date2num(data.index)
         mpl_dates
Out[17]: array([ 733776.,  733777.,  733778., ...,  735500.,  735501.,
                 735502.])
This new date list can be used for a scatter plot, highlighting through
different colors which date each data point is from. Figure 11-17 shows the
data in this fashion:
In [18]: plt.figure(figsize=(8, 4))
         plt.scatter(dax['PCA_5'], dax['^GDAXI'], c=mpl_dates)
         lin_reg = np.polyval(np.polyfit(dax['PCA_5'],
                                         dax['^GDAXI'], 1),
                                         dax['PCA_5'])
         plt.plot(dax['PCA_5'], lin_reg, 'r', lw=3)
         plt.grid(True)
         plt.xlabel('PCA_5')
         plt.ylabel('^GDAXI')
         plt.colorbar(ticks=mpl.dates.DayLocator(interval=250),
                      format=mpl.dates.DateFormatter('%d %b %y'))

Figure 11-17. DAX index values against PCA index values with linear
regression
Figure 11-17 reveals that there is obviously some kind of structural break
sometime in the middle of 2011. If the PCA index were to perfectly
replicate the DAX index, we would expect all the points to lie on a straight
line and to see the regression line going through these points. Perfection is
hard to achieve, but we can maybe do better.
To this end, let us divide the total time frame into two subintervals. We can
then implement an early and a late regression:
In [19]: cut_date = '2011/7/1'
         early_pca = dax[dax.index < cut_date]['PCA_5']
         early_reg = np.polyval(np.polyfit(early_pca,
                         dax['^GDAXI'][dax.index < cut_date], 1),
                         early_pca)
In [20]: late_pca = dax[dax.index >= cut_date]['PCA_5']
         late_reg = np.polyval(np.polyfit(late_pca,
                         dax['^GDAXI'][dax.index >= cut_date], 1),
                         late_pca)
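The effect of splitting a regression at a structural break can be seen on synthetic data where the break is built in. The following sketch is an assumption-laden illustration: the break point, the two linear regimes, and all variable names are invented, and only np.polyfit and np.polyval are used as in the listing above.

```python
import numpy as np

rng = np.random.RandomState(1)
x_all = np.linspace(0, 10, 400)
is_early = x_all < 5                      # hypothetical cutoff

# Two regimes with different intercepts and slopes, plus small noise.
y_all = np.where(is_early, 1.0 + 2.0 * x_all, 20.0 + 0.5 * x_all)
y_all = y_all + 0.1 * rng.standard_normal(len(x_all))

full_fit = np.polyfit(x_all, y_all, 1)
early_fit = np.polyfit(x_all[is_early], y_all[is_early], 1)
late_fit = np.polyfit(x_all[~is_early], y_all[~is_early], 1)

def rmse(coeffs, x, y):
    """Root mean squared error of a fitted polynomial."""
    return np.sqrt(np.mean((np.polyval(coeffs, x) - y) ** 2))

print('single line RMSE: %.3f' % rmse(full_fit, x_all, y_all))
print('early RMSE: %.3f' % rmse(early_fit, x_all[is_early], y_all[is_early]))
print('late RMSE:  %.3f' % rmse(late_fit, x_all[~is_early], y_all[~is_early]))
```

A single line has to compromise between the regimes, while each sub-sample fit recovers its own slope.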
Figure 11-18 shows the new regression lines, which indeed display the high
explanatory power both before our cutoff date and thereafter. This heuristic
approach will be made a bit more formal in the next section on Bayesian
statistics:
In [21]: plt.figure(figsize=(8, 4))
         plt.scatter(dax['PCA_5'], dax['^GDAXI'], c=mpl_dates)
         plt.plot(early_pca, early_reg, 'r', lw=3)
         plt.plot(late_pca, late_reg, 'r', lw=3)
         plt.grid(True)
         plt.xlabel('PCA_5')
         plt.ylabel('^GDAXI')
         plt.colorbar(ticks=mpl.dates.DayLocator(interval=250),
                      format=mpl.dates.DateFormatter('%d %b %y'))
Figure 11-18. DAX index values against PCA index values with early and late
regression (regime switch)
Bayesian Regression
Bayesian statistics nowadays is a cornerstone in empirical finance. This
chapter cannot lay the foundations for all concepts of the field. You should
therefore consult, if needed, a textbook like that by Geweke (2005) for a
general introduction or Rachev (2008) for one that is financially motivated.
Bayes’s Formula
The most common interpretation of Bayes’ formula in finance is the
diachronic interpretation. This mainly states that over time we learn new
information about certain variables or parameters of interest, like the mean
return of a time series. Equation 11-5 states the theorem formally. Here, H
stands for an event, the hypothesis, and D represents the data an experiment
or the real world might present.[44] On the basis of these fundamental
definitions, we have:
p(H) is called the prior probability.
p(D) is the probability for the data under any hypothesis, called the
normalizing constant.
p(D|H) is the likelihood (i.e., the probability) of the data under
hypothesis H.
p(H|D) is the posterior probability; i.e., after we have seen the data.
Equation 11-5. Bayes’s formula
$$
p(H|D) = \frac{p(H) \cdot p(D|H)}{p(D)}
$$
Consider a simple example. We have two boxes, B1 and B2. Box B1
contains 20 black balls and 70 red balls, while box B2 contains 40 black
balls and 50 red balls. We randomly draw a ball from one of the two boxes.
Assume the ball is black. What are the probabilities for the hypotheses “H1:
Ball is from box B1” and “H2: Ball is from box B2,” respectively?
Before we randomly draw the ball, both hypotheses are equally likely.
After it is clear that the ball is black, we have to update the probability for
both hypotheses according to Bayes’ formula. Consider hypothesis H1:
Prior: p(H1) = 0.5
Normalizing constant: p(D) = 0.5 · 2/9 + 0.5 · 4/9 = 1/3
Likelihood: p(D|H1) = 20/90 = 2/9
This gives for the updated probability of H1:
$p(H_1|D) = \frac{0.5 \cdot 2/9}{1/3} = \frac{1}{3}$.
This result also makes sense intuitively. The probability for drawing a black
ball from box B2 is twice as high as for the same event happening with box
B1. Therefore, having drawn a black ball, the hypothesis H2 has with
p(H2|D) = 2/3 an updated probability two times as high as the updated
probability for hypothesis H1.
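The update for the two-box example can be verified with exact fractions. This is a minimal sketch; the dictionary layout and variable names are illustrative choices, while the ball counts are those given in the text.

```python
from fractions import Fraction

# Ball counts per box as stated in the example.
boxes = {'B1': {'black': 20, 'red': 70},
         'B2': {'black': 40, 'red': 50}}

# Both boxes are equally likely before the draw.
prior = {name: Fraction(1, 2) for name in boxes}

# Likelihood of drawing a black ball from each box.
likelihood = {name: Fraction(c['black'], c['black'] + c['red'])
              for name, c in boxes.items()}

# Normalizing constant: probability of a black ball under any hypothesis.
evidence = sum(prior[name] * likelihood[name] for name in boxes)

# Bayes' formula applied to each hypothesis.
posterior = {name: prior[name] * likelihood[name] / evidence
             for name in boxes}

for name in sorted(posterior):
    print(name, posterior[name])
```

Working with Fraction avoids rounding, so the posterior probabilities sum to exactly 1.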
PyMC3
With PyMC3 the Python ecosystem provides a powerful and performant
library to technically implement Bayesian statistics. PyMC3 is (at the time of
this writing) not part of the Anaconda distribution recommended in
Chapter 2. On a Linux or a Mac OS X operating system, the installation
comprises mainly the following steps.
First, you need to install the Theano compiler package needed for PyMC3
(cf. http://bit.ly/install_theano). In the shell, execute the following
commands:
$ git clone git://github.com/Theano/Theano.git
$ cd Theano
$ sudo python setup.py install
On a Mac OS X system you might need to add the following line to your
.bash_profile file (to be found in your home/user directory):
export DYLD_FALLBACK_LIBRARY_PATH=$DYLD_FALLBACK_LIBRARY_PATH:/Library/anaconda/lib:
Once Theano is installed, the installation of PyMC3 is straightforward:
$ git clone https://github.com/pymc-devs/pymc.git
$ cd pymc
$ sudo python setup.py install
If successful, you should be able to import the library named pymc as usual:
In [22]: import warnings
         warnings.simplefilter('ignore')
         import pymc as pm
         import numpy as np
         np.random.seed(1000)
         import matplotlib.pyplot as plt
         %matplotlib inline
PYMC3
PyMC3 is already a powerful library at the time of this writing.
However, it is still in its early stages, so you should expect further
enhancements, changes to the API, etc. Make sure to stay up to date
by regularly checking the website when using PyMC3.
Introductory Example
Consider now an example where we have noisy data around a straight line:
[45]
In [23]: x = np.linspace(0, 10, 500)
         y = 4 + 2 * x + np.random.standard_normal(len(x)) * 2
As a benchmark, consider first an ordinary least-squares regression given
the noisy data, using NumPy’s polyfit function (cf. Chapter 9). The
regression is implemented as follows:
In [24]: reg = np.polyfit(x, y, 1)
           # linear regression
Figure 11-19 shows the data and the regression line graphically:
In [25]: plt.figure(figsize=(8, 4))
         plt.scatter(x, y, c=y, marker='v')
         plt.plot(x, reg[1] + reg[0] * x, lw=2.0)
         plt.colorbar()
         plt.grid(True)
         plt.xlabel('x')
         plt.ylabel('y')

Figure 11-19. Sample data points and regression line
The result of the “standard” regression approach is fixed values for the
parameters of the regression line:
In [26]: reg
Out[26]: array([ 2.03384161, 3.77649234])
Note that the highest-order monomial factor (in this case, the slope of the
regression line) is at index level 0 and that the intercept is at index level 1.
The original parameters 2 and 4 are not perfectly recovered, but this of
course is due to the noise included in the data.
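The coefficient ordering is easy to confirm on noise-free data, where the fit must return the true parameters. A small sketch, with x_demo and y_demo as hypothetical stand-ins for the noisy sample above:

```python
import numpy as np

x_demo = np.linspace(0, 10, 50)
y_demo = 4 + 2 * x_demo            # same line as in the text, without noise

coeffs = np.polyfit(x_demo, y_demo, 1)
slope, intercept = coeffs          # highest-order coefficient comes first

# np.polyval expects the same ordering, so the pair can be passed as is.
fitted = np.polyval(coeffs, x_demo)

print('slope: %.4f, intercept: %.4f' % (slope, intercept))
```

Using np.polyval with the unmodified coefficient array avoids having to remember which index holds which parameter.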
Next, the Bayesian regression. Here, we assume that the parameters are
distributed in a certain way. For example, consider the equation describing
the regression line ŷ(x) = α + β · x. We now assume the following priors:
α is normally distributed with mean 0 and a standard deviation of 20.
β is normally distributed with mean 0 and a standard deviation of 20.
For the likelihood, we assume a normal distribution with mean of ŷ(x) and a
uniformly distributed standard deviation between 0 and 10.
A major element of Bayesian regression is (Markov Chain) Monte Carlo
(MCMC) sampling.[46] In principle, this is the same as drawing balls
multiple times from boxes, as in the previous simple example, just in a
more systematic, automated way.
For the technical sampling, there are three different functions to call:
find_MAP finds the starting point for the sampling algorithm by
deriving the local maximum a posteriori point.
NUTS implements the so-called “efficient No-U-Turn Sampler with dual
averaging” (NUTS) algorithm for MCMC sampling given the assumed
priors.
sample draws a number of samples given the starting value from
find_MAP and the optimal step size from the NUTS algorithm.
All this is to be wrapped into a PyMC3 Model object and executed within a
with statement:
In [27]: with pm.Model() as model:
                 # model specifications in PyMC3
                 # are wrapped in a with statement
             # define priors
             alpha = pm.Normal('alpha', mu=0, sd=20)
             beta = pm.Normal('beta', mu=0, sd=20)
             sigma = pm.Uniform('sigma', lower=0, upper=10)

             # define linear regression
             y_est = alpha + beta * x

             # define likelihood
             likelihood = pm.Normal('y', mu=y_est, sd=sigma,
                                    observed=y)

             # inference
             start = pm.find_MAP()
               # find starting value by optimization
             step = pm.NUTS(state=start)
               # instantiate MCMC sampling algorithm
             trace = pm.sample(100, step, start=start,
                               progressbar=False)
               # draw 100 posterior samples using NUTS sampling
Have a look at the estimates from the first sample:
In [28]: trace[0]
Out[28]: {'alpha': 3.8783781152509031,
'beta': 2.0148472296530033,
'sigma': 2.0078134493352975}
All three values are rather close to the original values (4, 2, 2). However,
the whole procedure yields, of course, many more estimates. They are best
illustrated with the help of a trace plot, as in Figure 11-20, i.e., a plot
showing the resulting posterior distribution for the different parameters as
well as all single estimates per sample. The posterior distribution gives us
an intuitive sense about the uncertainty in our estimates:
In [29]: fig = pm.traceplot(trace, lines={'alpha': 4, 'beta': 2,
                                          'sigma': 2})
         plt.figure(figsize=(8, 8))

Figure 11-20. Trace plots for alpha, beta, and sigma
Taking only the alpha and beta values from the regression, we can draw
all resulting regression lines as shown in Figure 11-21:
In [30]: plt.figure(figsize=(8, 4))
         plt.scatter(x, y, c=y, marker='v')
         plt.colorbar()
         plt.grid(True)
         plt.xlabel('x')
         plt.ylabel('y')
         for i in range(len(trace)):
             plt.plot(x, trace['alpha'][i] + trace['beta'][i] * x)

Figure 11-21. Sample data and regression lines from Bayesian regression
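To make the sampling idea of this subsection concrete without installing PyMC3, the sketch below draws posterior samples for the same model (normal priors on alpha and beta, uniform prior on sigma, normal likelihood) with a plain random-walk Metropolis sampler. This is a deliberately simpler algorithm than NUTS, swapped in for illustration; the function names, step sizes, sample count, and burn-in length are assumptions.

```python
import numpy as np

def log_posterior(theta, x, y):
    """Unnormalized log posterior for y ~ N(alpha + beta * x, sigma)."""
    alpha, beta, sigma = theta
    if not 0.0 < sigma < 10.0:               # uniform prior on sigma
        return -np.inf
    log_prior = -0.5 * (alpha / 20.0) ** 2 - 0.5 * (beta / 20.0) ** 2
    resid = y - (alpha + beta * x)
    log_like = (-len(y) * np.log(sigma)
                - 0.5 * np.sum(resid ** 2) / sigma ** 2)
    return log_prior + log_like

def metropolis(x, y, start, n_samples=5000,
               step=(0.1, 0.02, 0.05), seed=0):
    """Random-walk Metropolis sampler (not NUTS) for the three parameters."""
    rng = np.random.RandomState(seed)
    step = np.array(step)
    theta = np.array(start, dtype=float)
    logp = log_posterior(theta, x, y)
    samples = np.empty((n_samples, 3))
    for i in range(n_samples):
        proposal = theta + rng.standard_normal(3) * step
        logp_new = log_posterior(proposal, x, y)
        # accept with probability min(1, posterior ratio)
        if np.log(rng.uniform()) < logp_new - logp:
            theta, logp = proposal, logp_new
        samples[i] = theta
    return samples

rng = np.random.RandomState(1000)
x_mc = np.linspace(0, 10, 500)
y_mc = 4 + 2 * x_mc + rng.standard_normal(len(x_mc)) * 2

# Start at the least-squares fit, similar in spirit to find_MAP.
ols = np.polyfit(x_mc, y_mc, 1)
start = [ols[1], ols[0], np.std(y_mc - np.polyval(ols, x_mc))]

samples = metropolis(x_mc, y_mc, start)
post_mean = samples[1000:].mean(axis=0)      # discard burn-in
print('posterior means (alpha, beta, sigma):', post_mean.round(2))
```

Each row of samples is one (alpha, beta, sigma) draw, so its columns play the role of trace['alpha'], trace['beta'], and trace['sigma'] in the listings above.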
Real Data
Having seen Bayesian regression with PyMC3 in action with dummy data,
we now move on to real market data. In this context, we introduce yet
another Python library: zipline
(cf. https://github.com/quantopian/zipline
and https://pypi.python.org/pypi/zipline). zipline is a Pythonic, open
source algorithmic trading library that powers the community backtesting
platform Quantopian.
It is also to be installed separately, e.g., by using pip:
$ pip install zipline
After installation, import zipline as well as pytz and datetime as follows:
In [31]: import warnings
         warnings.simplefilter('ignore')
         import zipline
         import pytz
         import datetime as dt
Similar to pandas, zipline provides a convenience function to load
financial data from different sources. Under the hood, zipline also uses
pandas.
The example we use is a “classical” pair trading strategy, namely with gold
and stocks of gold mining companies. These are represented by ETFs with
the following symbols, respectively:
GLD
GDX
We can load the data using zipline as follows:
In [32]: data = zipline.data.load_from_yahoo(stocks=['GLD', 'GDX'],
                  end=dt.datetime(2014, 3, 15, 0, 0, 0, 0,
                  pytz.utc)).dropna()
         data.info()
Out[32]: <class 'pandas.core.frame.DataFrame'>
         DatetimeIndex: 1967 entries, 2006-05-22 00:00:00+00:00 to
         2014-03-14 00:00:00+00:00
         Data columns (total 2 columns):
         GDX    1967 non-null float64
         GLD    1967 non-null float64
         dtypes: float64(2)
Figure 11-22 shows the historical data for both ETFs:
In [33]: data.plot(figsize=(8, 4))

Figure 11-22. Comovements of trading pair
The absolute performance differs significantly:
In [34]: data.ix[-1] / data.ix[0] - 1
Out[34]: GDX   -0.216002
         GLD    1.038285
         dtype: float64
However, both time series seem to be quite strongly positively correlated
when inspecting Figure 11-22, which is also reflected in the correlation
data:
In [35]: data.corr()
Out[35]:           GDX       GLD
         GDX  1.000000  0.466962
         GLD  0.466962  1.000000
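The two summary statistics above can be reproduced on a tiny hand-made table. The sketch below uses invented prices (the GDX and GLD column labels are reused only for readability) and positional indexing via iloc, since the ix indexer used in the listing is no longer available in current pandas versions.

```python
import pandas as pd

idx = pd.date_range('2014-01-01', periods=5, freq='D')
prices = pd.DataFrame({'GDX': [40.0, 38.0, 36.0, 35.0, 32.0],
                       'GLD': [60.0, 63.0, 61.0, 66.0, 72.0]},
                      index=idx)   # hypothetical price levels

# Total performance over the period: last row relative to first row.
performance = prices.iloc[-1] / prices.iloc[0] - 1
print(performance)

# Pairwise correlation of the price levels.
print(prices.corr())
```

Note that correlation of price levels and correlation of returns can differ considerably, which is one reason the chapter later recommends return data for real applications.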
As usual, the DatetimeIndex object of the DataFrame object consists of
Timestamp objects:
In [36]: data.index
Out[36]: <class 'pandas.tseries.index.DatetimeIndex'>
         [2006-05-22, ..., 2014-03-14]
         Length: 1967, Freq: None, Timezone: UTC
To use the date-time information with matplotlib in the way we want to
in the following, we have to first convert it to an ordinal date
representation:
In [37]: import matplotlib as mpl
         mpl_dates = mpl.dates.date2num(data.index)
         mpl_dates
Out[37]: array([ 732453.,  732454.,  732455., ...,  735304.,  735305.,
                 735306.])
Figure 11-23 shows a scatter plot of the time series data, plotting the GLD
values against the GDX values and illustrating the dates of each data pair
with different colorings:[47]
In [38]: plt.figure(figsize=(8, 4))
         plt.scatter(data['GDX'], data['GLD'], c=mpl_dates,
                     marker='o')
         plt.grid(True)
         plt.xlabel('GDX')
         plt.ylabel('GLD')
         plt.colorbar(ticks=mpl.dates.DayLocator(interval=250),
                      format=mpl.dates.DateFormatter('%d %b %y'))

Figure 11-23. Scatter plot of prices for GLD and GDX
Let us implement a Bayesian regression on the basis of these two time
series. The parameterizations are essentially the same as in the previous
example with dummy data; we just replace the dummy data with the real
data we now have available:
In [39]: with pm.Model() as model:
             alpha = pm.Normal('alpha', mu=0, sd=20)
             beta = pm.Normal('beta', mu=0, sd=20)
             sigma = pm.Uniform('sigma', lower=0, upper=50)
             y_est = alpha + beta * data['GDX'].values
             likelihood = pm.Normal('GLD', mu=y_est, sd=sigma,
                                    observed=data['GLD'].values)
             start = pm.find_MAP()
             step = pm.NUTS(state=start)
             trace = pm.sample(100, step, start=start,
                               progressbar=False)
Figure 11-24 shows the results from the MCMC sampling procedure given
the assumptions about the prior probability distributions for the three
parameters:
In [40]: fig = pm.traceplot(trace)
         plt.figure(figsize=(8, 8))
Figure 11-24. Trace plots for alpha, beta, and sigma based on GDX and GLD
data
Figure 11-25 adds all the resulting regression lines to the scatter plot from
before. All the regression lines are pretty close to each other:
In [41]: plt.figure(figsize=(8, 4))
         plt.scatter(data['GDX'], data['GLD'], c=mpl_dates,
                     marker='o')
         plt.grid(True)
         plt.xlabel('GDX')
         plt.ylabel('GLD')
         for i in range(len(trace)):
             plt.plot(data['GDX'],
                      trace['alpha'][i] + trace['beta'][i] * data['GDX'])
         plt.colorbar(ticks=mpl.dates.DayLocator(interval=250),
                      format=mpl.dates.DateFormatter('%d %b %y'))
Figure 11-25. Scatter plot with “simple” regression lines
The figure reveals a major drawback of the regression approach used: the
approach does not take into account evolutions over time. That is, the most
recent data is treated the same way as the oldest data.
As pointed out at the beginning of this section, the Bayesian approach in
finance is generally most useful when seen as diachronic, i.e., in the sense
that new data revealed over time allows for better regressions and
estimates. To incorporate this concept in the current example, we assume
that the regression parameters are not only random and distributed in some
fashion, but that they follow some kind of random walk over time. It is the
same generalization used when making the transition in finance theory from
random variables to stochastic processes (which are essentially ordered
sequences of random variables).
To this end, we define a new PyMC3 model, this time specifying parameter
values as random walks with the variance parameter values transformed to
log space (for better sampling characteristics).
In [42]: model_randomwalk = pm.Model()
         with model_randomwalk:
             # std of random walk best sampled in log space
             sigma_alpha, log_sigma_alpha = \
                 model_randomwalk.TransformedVar('sigma_alpha',
                                 pm.Exponential.dist(1. / .02, testval=.1),
                                 pm.logtransform)
             sigma_beta, log_sigma_beta = \
                 model_randomwalk.TransformedVar('sigma_beta',
                                 pm.Exponential.dist(1. / .02, testval=.1),
                                 pm.logtransform)
After having specified the distributions of the random walk parameters, we
can proceed with specifying the random walks for alpha and beta. To
make the whole procedure more efficient, 50 data points at a time share
common coefficients:
In [43]: from pymc.distributions.timeseries import GaussianRandomWalk
         # to make the model simpler, we will apply the same
         # coefficients to 50 data points at a time
         subsample_alpha = 50
         subsample_beta = 50

         with model_randomwalk:
             alpha = GaussianRandomWalk('alpha', sigma_alpha**-2,
                                        shape=len(data) / subsample_alpha)
             beta = GaussianRandomWalk('beta', sigma_beta**-2,
                                       shape=len(data) / subsample_beta)

             # make coefficients have the same length as prices
             alpha_r = np.repeat(alpha, subsample_alpha)
             beta_r = np.repeat(beta, subsample_beta)
The time series data sets have a length of 1,967 data points:
In [44]: len(data.dropna().GDX.values) # a bit longer than 1,950
Out[44]: 1967
For the sampling to follow, the number of data points must be divisible by
50. Therefore, only the first 1,950 data points are taken for the regression:
In [45]: with model_randomwalk:
             # define regression
             regression = alpha_r + beta_r * data.GDX.values[:1950]

             # assume prices are normally distributed
             # the mean comes from the regression
             sd = pm.Uniform('sd', 0, 20)
             likelihood = pm.Normal('GLD', mu=regression,
                                    sd=sd,
                                    observed=data.GLD.values[:1950])
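The bookkeeping behind the block-wise coefficients (39 values, each shared by 50 consecutive data points) can be followed with plain NumPy arrays. In this sketch the random walks are simulated directly rather than inferred, and the step sizes, the starting level of beta, and the stand-in regressor x_vals are hypothetical.

```python
import numpy as np

n_points, block = 1967, 50
n_blocks = n_points // block              # 39 complete blocks

rng = np.random.RandomState(0)

# Gaussian random walks: cumulative sums of normal increments.
alpha_blocks = np.cumsum(rng.standard_normal(n_blocks) * 0.1)
beta_blocks = 1.0 + np.cumsum(rng.standard_normal(n_blocks) * 0.05)

# Stretch each coefficient over its block of 50 observations.
alpha_r = np.repeat(alpha_blocks, block)
beta_r = np.repeat(beta_blocks, block)

# Stand-in regressor, truncated to a multiple of the block size.
x_vals = np.linspace(20, 60, n_points)[:n_blocks * block]
regression = alpha_r + beta_r * x_vals

print(n_blocks, alpha_r.shape, regression.shape)
```

The truncation to n_blocks * block elements is what makes the element-wise product well defined, which is why the text restricts the data to the first 1,950 points.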
All these definitions are a bit more involved than before due to the use of
random walks instead of a single random variable. However, the inference
steps with the MCMC remain essentially the same. Note, though, that the
computational burden increases substantially since we have to estimate per
random walk sample 1,950 / 50 = 39 parameter pairs (instead of 1, as
before):
In [46]: import scipy.optimize as sco
         with model_randomwalk:
             # first optimize random walk
             start = pm.find_MAP(vars=[alpha, beta],
                                 fmin=sco.fmin_l_bfgs_b)

             # sampling
             step = pm.NUTS(scaling=start)
             trace_rw = pm.sample(100, step, start=start,
                                  progressbar=False)
In total, we have 100 estimates with 39 time intervals:
In [47]: np.shape(trace_rw['alpha'])
Out[47]: (100, 39)
We can illustrate the evolution of the regression factors alpha and beta
over time by plotting a subset of the estimates and the average over all
samples, as in Figure 11-26:
In [48]: part_dates = np.linspace(min(mpl_dates), max(mpl_dates), 39)
In [49]: fig, ax1 = plt.subplots(figsize=(10, 5))
         plt.plot(part_dates, np.mean(trace_rw['alpha'], axis=0),
                  'b', lw=2.5, label='alpha')
         for i in range(45, 55):
             plt.plot(part_dates, trace_rw['alpha'][i], 'b-.',
                      lw=0.75)
         plt.xlabel('date')
         plt.ylabel('alpha')
         plt.axis('tight')
         plt.grid(True)
         plt.legend(loc=2)
         ax1.xaxis.set_major_formatter(mpl.dates.DateFormatter('%d %b %y'))
         ax2 = ax1.twinx()
         plt.plot(part_dates, np.mean(trace_rw['beta'], axis=0),
                  'r', lw=2.5, label='beta')
         for i in range(45, 55):
             plt.plot(part_dates, trace_rw['beta'][i], 'r-.',
                      lw=0.75)
         plt.ylabel('beta')
         plt.legend(loc=4)
         fig.autofmt_xdate()

Figure 11-26. Evolution of (mean) alpha and (mean) beta over time (updated
estimates over time)
ABSOLUTE PRICE DATA VERSUS RELATIVE RETURN DATA
Both when presenting the PCA analysis implementation and for this
example about Bayesian statistics, we’ve worked with absolute price
levels instead of relative (log) return data. This is for illustration
purposes only, because the respective graphical results are easier to
understand and interpret (they are visually “more appealing”).
However, for real-world financial applications you would instead rely
on relative return data.
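Converting price levels into log returns takes a single line with pandas. The sketch below uses a hypothetical three-row table (the column labels are reused only for readability); the same expression would apply to the data object used in this section.

```python
import numpy as np
import pandas as pd

levels = pd.DataFrame({'GDX': [40.0, 42.0, 39.9],
                       'GLD': [120.0, 121.2, 118.776]})  # invented prices

# Log return: log of today's price relative to the previous price.
log_rets = np.log(levels / levels.shift(1)).dropna()
print(log_rets.round(4))

# Log returns are additive over time: their sum equals the log of the
# total price change over the whole period.
print(log_rets.sum().round(4))
```

The first row is dropped because there is no previous price to compare with, so a series of n prices yields n - 1 returns.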
Using the mean alpha and beta values, we can illustrate how the
regression is updated over time. Figure 11-27 again shows the data points
as a scatter plot. In addition, the 39 regression lines resulting from the
mean alpha and beta values are displayed. It is obvious that updating over
time increases the regression fit (for the current/most recent data)
tremendously; in other words, every time period needs its own regression:
In [50]: plt.figure(figsize=(10, 5))
         plt.scatter(data['GDX'], data['GLD'], c=mpl_dates,
                     marker='o')
         plt.colorbar(ticks=mpl.dates.DayLocator(interval=250),
                      format=mpl.dates.DateFormatter('%d %b %y'))
         plt.grid(True)
         plt.xlabel('GDX')
         plt.ylabel('GLD')
         x = np.linspace(min(data['GDX']), max(data['GDX']))
         for i in range(39):
             alpha_rw = np.mean(trace_rw['alpha'].T[i])
             beta_rw = np.mean(trace_rw['beta'].T[i])
             plt.plot(x, alpha_rw + beta_rw * x,
                      color=plt.cm.jet(256 * i / 39))
Figure 11-27. Scatter plot with time-dependent regression lines (updated
estimates)
This concludes the section on Bayesian regression, which shows that
Python offers with PyMC3 a powerful library to implement different
approaches from Bayesian statistics. Bayesian regression in particular is a
tool that has become quite popular and important recently in quantitative
finance.
Conclusions
Statistics is not only an important discipline in its own right, but also
provides indispensable tools for many other disciplines, like finance and the
social sciences. It is impossible to give a broad overview of statistics in a
single chapter. This chapter therefore concentrates on four important topics,
illustrating the use of Python and several statistics libraries on the basis of
realistic examples:
Normality tests
The normality assumption with regard to financial market returns is an
important one for many financial theories and applications; it is
therefore important to be able to test whether certain time series data
conforms to this assumption. As we have seen, via graphical and
statistical means, real-world return data generally is not normally
distributed.
Modern portfolio theory
MPT, with its focus on the mean and variance/volatility of returns, can
be considered one of the major conceptual and intellectual successes of
statistics in finance; the important concept of investment diversification
is beautifully illustrated in this context.
Principal component analysis
PCA provides a pretty helpful method to reduce complexity for
factor/component analysis tasks; we have shown that five principal
components, constructed from the 30 stocks contained in the DAX
index, suffice to explain about 94% of the variability in the 30 stock
price series.
Bayesian regression
Bayesian statistics in general (and Bayesian regression in particular) has
become a popular tool in finance, since this approach overcomes
shortcomings of other approaches, as introduced in Chapter 9; even if
the mathematics and the formalism are more involved, the fundamental
ideas, like the updating of probability/distribution beliefs over time,
are easily grasped intuitively.
Further Reading
The following online resources are helpful:
Information about the SciPy statistical functions is found here:
http://docs.scipy.org/doc/scipy/reference/stats.html.
Also consult the documentation of the statsmodels library:
http://statsmodels.sourceforge.net/stable/.
For the optimization functions used in this chapter, refer to
http://docs.scipy.org/doc/scipy/reference/optimize.html.
There is a short tutorial available for PyMC3; at the time of this writing
the library is still in early release mode and not yet fully documented.
Useful references in book form are:
Copeland, Thomas, Fred Weston, and Kuldeep Shastri (2005):
Financial Theory and Corporate Policy, 4th ed. Pearson, Boston, MA.
Downey, Allen (2013): Think Bayes. O’Reilly, Sebastopol, CA.
Geweke, John (2005): Contemporary Bayesian Econometrics and
Statistics. John Wiley & Sons, Hoboken, NJ.
Rachev, Svetlozar et al. (2008): Bayesian Methods in Finance. John
Wiley & Sons, Hoboken, NJ.
[41] Cf. Markowitz, Harry (1952): “Portfolio Selection.” Journal of Finance,
Vol. 7, 77-91.
[42] An alternative to np.sum(x) - 1 would be to write np.sum(x) == 1 taking
into account that with Python the Boolean True value equals 1 and the False value
equals 0.
[43] Note that we work here (and in the section to follow on Bayesian statistics)
with absolute stock prices and not with return data, which would be more
statistically sound. The reason for this is that it simplifies intuition and makes
graphical plots easier to interpret. In real-world applications, you would use return
data.
[44] For a Python-based introduction into these and other fundamental concepts of
Bayesian statistics, refer to Downey (2013).
[45] This example and the one in the following subsection are from a presentation by
Thomas Wiecki, one of the lead developers of PyMC3; he allowed me to use them
for this chapter, for which I am most grateful.
[46] Cf. http://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo. For example, the
Monte Carlo algorithms used throughout the book and analyzed in detail in
Chapter 10 all generate so-called Markov chains, since the immediate next
step/value only depends on the current state of the process and not on any other
historic state or value.
[47] Note also here that we are working with absolute price levels and not return
data, which would be statistically more sound. For a real-world (trading)
application, you would rather choose the return data to implement such an analysis.
Chapter 12. Excel Integration
Microsoft Excel is probably the most successful data analytics platform of all times.
- Kirat Singh
It is fair to say that Microsoft Excel, as part of Microsoft’s Office suite of productivity tools, is one of the most widely used tools and applications in the finance industry and the finance functions of corporate and other institutions. What started out as a computerized version of paper spreadsheets has become a multipurpose tool for financial analysis and financial application building (in addition to the many use cases in other fields and industries).
Spreadsheet applications, like Microsoft Excel and LibreOffice Calc, are characterized by a few main features:
Organization
A workbook is a spreadsheet application file that is organized in single sheets that in turn are organized in cells.
Data
Data is generally stored in tabular form in single cells; the cells contain the data itself (e.g., a floating-point number or a text string), formatting information for display purposes (e.g., font type, color), and maybe some computer code (if, for example, the data in the cell is the result of a numerical operation).
Functionality
Given the data stored in single cells, you can do computational and other operations with that data, like adding or multiplying integers.
Visualization
Data can be easily visualized, for example, as a pie chart.
Programmability
Modern spreadsheet applications allow highly flexible programmability, e.g., via Visual Basic for Applications (VBA) within an Excel spreadsheet.
References
The major tool for implementing functionality or writing, e.g., VBA code is the cell reference; every cell has unique coordinates (workbook, sheet name, column, and row) identifying the cell.
This brief characterization might explain the popularity: all technical elements needed to implement financial analyses or applications are found in a single place. Thinking of Python and the previous chapters, you need a couple of libraries and tools (Python, IPython, NumPy, matplotlib, PyTables, etc.) combined to have available all of the features just listed.
Such convenience and one-size-fits-all approaches generally come at a cost, though. To pick just one area, spreadsheets are not suited to storing large amounts of data or data with complex relationships. This is the reason why Microsoft Excel in the finance industry has developed more as a general graphical user interface (GUI) “only.” In many cases, it is mainly used to display and visualize data and aggregate information and to implement ad hoc analyses. For example, there are interfaces available to get data from leading data service providers, like Bloomberg and Thomson Reuters, into Excel (and maybe the other way around).
This chapter works on the assumption that Microsoft Excel is available on almost every desktop or notebook computer and that it is used as a general GUI. In this sense, Python can play the following roles:
Manipulation tool
Using Python, you can interact with and manipulate Excel spreadsheets.
Data processor
Python can provide data to a spreadsheet and read data from a spreadsheet.
Analytics engine
Python can provide its whole analytics capabilities to spreadsheets, becoming a full-fledged substitute for VBA programming.
Basic Spreadsheet Interaction
Fundamental Python libraries to work with Excel spreadsheet files are xlrd and xlwt (cf. http://www.python­excel.org). Although quite popular, a major drawback of xlwt is that it can only write spreadsheet files compatible with Microsoft Excel 97/2000/XP/2003, OpenOffice.org Calc, and Gnumeric, i.e., those with the suffix .xls. Therefore, we also use the libraries xlsxwriter and OpenPyxl, which generate spreadsheet files in the current .xlsx format. We’ll begin, then, with a few imports.
In [1]: import numpy as np
        import pandas as pd
        import xlrd, xlwt
        import xlsxwriter
        path = 'data/'
Generating Workbooks (.xls)
We start by generating a workbook with two sheets.[48] First, the Workbook object wb. Note that this is an in-memory version of the workbook only (so far):
In [2]: wb = xlwt.Workbook()
In [3]: wb
Out[3]: <xlwt.Workbook.Workbook at 0x7f7dcc49df10>
The second step is to add one or multiple sheets to the Workbook object:
In [4]: wb.add_sheet('first_sheet', cell_overwrite_ok=True)
Out[4]: <xlwt.Worksheet.Worksheet at 0x7f7dac9dde90>
We now have one Worksheet object, which has index number 0:
In [5]: wb.get_active_sheet()
Out[5]: 0
To further work with the sheet, define an alias for it:
In [6]: ws_1 = wb.get_sheet(0)
        ws_1
Out[6]: <xlwt.Worksheet.Worksheet at 0x7f7dac9dde90>
Of course, these two steps (instantiation and alias definition) can be combined into a single step:
In [7]: ws_2 = wb.add_sheet('second_sheet')
Both Worksheet objects are still empty. Therefore, let us generate a NumPy ndarray object containing some numbers:
In [8]: data = np.arange(1, 65).reshape((8, 8))
In [9]: data
Out[9]: array([[ 1,  2,  3,  4,  5,  6,  7,  8],
               [ 9, 10, 11, 12, 13, 14, 15, 16],
               [17, 18, 19, 20, 21, 22, 23, 24],
               [25, 26, 27, 28, 29, 30, 31, 32],
               [33, 34, 35, 36, 37, 38, 39, 40],
               [41, 42, 43, 44, 45, 46, 47, 48],
               [49, 50, 51, 52, 53, 54, 55, 56],
               [57, 58, 59, 60, 61, 62, 63, 64]])
Using the write method and providing row and column information (with zero-based indexing), data is easily written to a certain cell in a certain worksheet:
In [10]: ws_1.write(0, 0, 100)
           # write 100 in cell "A1"
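The zero-based (row, column) pair used by the write method corresponds to Excel's A1-style cell names. The following is a minimal sketch of that mapping in plain Python. The helper names a1_to_rowcol and rowcol_to_a1 are hypothetical and introduced only for illustration; they are not taken from any of the libraries used in this chapter.

```python
import re


def a1_to_rowcol(ref):
    """Convert an A1-style reference such as 'B4' to zero-based (row, col)."""
    match = re.fullmatch(r"([A-Za-z]+)([1-9][0-9]*)", ref)
    if match is None:
        raise ValueError("not an A1-style reference: %r" % ref)
    letters, digits = match.groups()
    col = 0
    for ch in letters.upper():
        # column letters form a base-26 number without a zero digit
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return int(digits) - 1, col - 1


def rowcol_to_a1(row, col):
    """Convert zero-based (row, col) indices to an A1-style reference."""
    letters = ""
    col += 1
    while col > 0:
        col, rem = divmod(col - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return "%s%d" % (letters, row + 1)


print(a1_to_rowcol("A1"))   # (0, 0)
print(a1_to_rowcol("B4"))   # (3, 1)
print(rowcol_to_a1(0, 0))   # A1
```

With such a mapping, write(0, 0, ...) addresses cell A1 and write(3, 1, ...) addresses cell B4.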
This way, the sample data can be written “in bulk” to the two Worksheet objects:
In [11]: for c in range(data.shape[0]):
             for r in range(data.shape[1]):
                 ws_1.write(r, c, data[c, r])
                 ws_2.write(r, c, data[r, c])
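Note that the two write calls use different index orders, so the first sheet receives the transposed data and the second sheet receives the data as is. The sketch below reproduces this with plain Python containers; the dictionaries sheet_1 and sheet_2 are hypothetical stand-ins for the Worksheet objects, and the nested list stands in for the ndarray object.

```python
# Plain-Python stand-in for the 8x8 sample data (1..64, filled row by row).
data = [[8 * r + c + 1 for c in range(8)] for r in range(8)]

# Hypothetical in-memory "sheets": dictionaries keyed by (row, column).
sheet_1, sheet_2 = {}, {}
for c in range(len(data)):
    for r in range(len(data[0])):
        sheet_1[(r, c)] = data[c][r]  # transposed copy of the data
        sheet_2[(r, c)] = data[r][c]  # same layout as the data

first_row_1 = [sheet_1[(0, c)] for c in range(8)]
first_row_2 = [sheet_2[(0, c)] for c in range(8)]
print(first_row_1)  # [1, 9, 17, 25, 33, 41, 49, 57]
print(first_row_2)  # [1, 2, 3, 4, 5, 6, 7, 8]
```

This is why the first row of the first sheet later reads 1, 9, 17, ... while the first row of the second sheet reads 1, 2, 3, ....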
The save method of the Workbook class allows us to save the whole Workbook object to disk:
In [12]: wb.save(path + 'workbook.xls')
On Windows systems, the path might look like r"C:\path\data\workbook.xls".
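Instead of concatenating strings, file paths can also be assembled in a platform-aware way. The following sketch assumes Python 3.4 or later (the pathlib module is not part of the Python 2 standard library) and uses the "pure" path classes so that it behaves the same on any operating system; the directory names are taken from the text.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Build the file name from a directory and a file component.
win_file = PureWindowsPath(r"C:\path\data") / "workbook.xls"
posix_file = PurePosixPath("data") / "workbook.xls"

print(win_file)    # C:\path\data\workbook.xls
print(posix_file)  # data/workbook.xls
```

The string form of such an object, e.g. str(posix_file), can be handed to a save method like the one used above.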
Generating Workbooks (.xlsx)
The creation of spreadsheet files in the new format works essentially the same way. First, we create a Workbook object:
In [13]: wb = xlsxwriter.Workbook(path + 'workbook.xlsx')
Second, the Worksheet objects:
In [14]: ws_1 = wb.add_worksheet('first_sheet')
         ws_2 = wb.add_worksheet('second_sheet')
Third, we write data to the Worksheet objects:
In [15]: for c in range(data.shape[0]):
             for r in range(data.shape[1]):
                 ws_1.write(r, c, data[c, r])
                 ws_2.write(r, c, data[r, c])
Fourth, we close the Workbook file object:
In [16]: wb.close()
In [17]: ll $path*
Out[17]: -rw------- 1 yhilpisch 7375 Sep 28 18:18 data/chart.xlsx
         -rw------- 1 yhilpisch 5632 Sep 28 18:18 data/workbook.xls
         -rw------- 1 yhilpisch 6049 Sep 28 18:18 data/workbook.xlsx
If everything went well, the file opened in Microsoft Excel should look like Figure 12-1.
Figure 12-1. Screenshot of workbook in Excel
xlsxwriter has many more options to generate Workbook objects, for example with charts. Consider the following code (cf. the xlsxwriter documentation):
In [18]: wb = xlsxwriter.Workbook(path + 'chart.xlsx')
         ws = wb.add_worksheet()

         # write cumsum of random values in first column
         values = np.random.standard_normal(15).cumsum()
         ws.write_column('A1', values)

         # create a new chart object
         chart = wb.add_chart({'type': 'line'})

         # add a series to the chart
         chart.add_series({'values': '=Sheet1!$A$1:$A$15',
                           'marker': {'type': 'diamond'},})
           # series with markers (here: diamond)

         # insert the chart
         ws.insert_chart('C1', chart)

         wb.close()
The resulting spreadsheet file is shown as a screenshot in Figure 12-2.
Figure 12-2. Screenshot of workbook in Excel with a chart
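Besides the workbook libraries used in this section, a plain comma-separated value (CSV) file is a simple way to hand data over to Excel, which can open such files directly. The following is a minimal sketch using only the standard library of Python 3; it writes to an in-memory buffer instead of a file, and the sample rows are a plain-Python stand-in for the 8x8 sample data.

```python
import csv
import io

# Stand-in for the sample data: 8 rows with the numbers 1..64.
rows = [[8 * r + c + 1 for c in range(8)] for r in range(8)]

# Write the rows in CSV format; for a real file one would use
# open('some_file.csv', 'w', newline='') instead of the buffer.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()

# Read the CSV text back and convert the strings to integers again.
read_back = [[int(v) for v in row]
             for row in csv.reader(io.StringIO(csv_text))]

print(csv_text.splitlines()[0])  # 1,2,3,4,5,6,7,8
```

The trade-off is that a CSV file carries values only: sheet names, formatting, formulas, and charts are not preserved.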
Reading from Workbooks
The sister library xlrd is responsible for reading data from spreadsheet files (i.e., workbooks):
In [19]: book = xlrd.open_workbook(path + 'workbook.xlsx')
In [20]: book
Out[20]: <xlrd.book.Book at 0x7f7dabec4890>
Once a workbook is opened, the sheet_names method provides the names of all Worksheet objects in this particular Workbook object:
In [21]: book.sheet_names()
Out[21]: [u'first_sheet', u'second_sheet']
Worksheets can be accessed via their names or index values:
In [22]: sheet_1 = book.sheet_by_name('first_sheet')
         sheet_2 = book.sheet_by_index(1)
         sheet_1
Out[22]: <xlrd.sheet.Sheet at 0x7f7dabec4a10>
In [23]: sheet_2.name
Out[23]: u'second_sheet'
Important attributes of a Worksheet object are ncols and nrows, indicating the number of columns and rows, respectively, that contain data:
In [24]: sheet_1.ncols, sheet_1.nrows
Out[24]: (8, 8)
Single cells (i.e., Cell objects) are accessed via the cell method, providing the numbers for both the row and the column (again, numbering is zero-based). The value attribute then gives the data stored in this particular cell:
In [25]: cl = sheet_1.cell(0, 0)
         cl.value
Out[25]: 1.0
The attribute ctype gives the cell type:
In [26]: cl.ctype
Out[26]: 2
Table 12-1 lists all Excel cell types.
Table 12-1. Excel cell types
Type             Number  Python type
XL_CELL_EMPTY    0       Empty string
XL_CELL_TEXT     1       A Unicode string
XL_CELL_NUMBER   2       float
XL_CELL_DATE     3       float
XL_CELL_BOOLEAN  4       int (1 = TRUE, 0 = FALSE)
XL_CELL_ERROR    5       int representing internal Excel codes
XL_CELL_BLANK    6       Empty string, only when formatting_info=True
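Since Table 12-1 maps dates to float objects, a date cell needs an extra conversion step before it is useful in Python. The sketch below shows one way to interpret such a float, under the assumption that the workbook uses Excel's 1900 date system and that the serial number is 61 or larger (i.e., March 1, 1900 or later). The dictionary and the function name are hypothetical helpers for illustration; xlrd ships its own date utilities, which should be preferred in practice.

```python
from datetime import datetime, timedelta

# Type numbers copied from Table 12-1 (not imported from xlrd).
CELL_TYPES = {
    0: 'XL_CELL_EMPTY',
    1: 'XL_CELL_TEXT',
    2: 'XL_CELL_NUMBER',
    3: 'XL_CELL_DATE',
    4: 'XL_CELL_BOOLEAN',
    5: 'XL_CELL_ERROR',
    6: 'XL_CELL_BLANK',
}


def excel_serial_to_datetime(serial):
    """Convert an Excel serial date (1900 date system, serial >= 61)."""
    # The fractional part of the serial number encodes the time of day.
    return datetime(1899, 12, 30) + timedelta(days=serial)


print(CELL_TYPES[2])                      # XL_CELL_NUMBER
print(excel_serial_to_datetime(41640))    # 2014-01-01 00:00:00
print(excel_serial_to_datetime(41640.5))  # 2014-01-01 12:00:00
```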
Similarly, you can access whole rows by providing the number of the row to the row method:
In [27]: sheet_2.row(3)
Out[27]: [number:25.0,
          number:26.0,
          number:27.0,
          number:28.0,
          number:29.0,
          number:30.0,
          number:31.0,
          number:32.0]
And, analogously, whole columns:
In [28]: sheet_2.col(3)
Out[28]: [number:4.0,
          number:12.0,
          number:20.0,
          number:28.0,
          number:36.0,
          number:44.0,
          number:52.0,
          number:60.0]
The methods row_values and col_values only deliver the values contained in the respective row or column:
In [29]: sheet_1.col_values(3, start_rowx=3, end_rowx=7)
Out[29]: [28.0, 29.0, 30.0, 31.0]
In [30]: sheet_1.row_values(3, start_colx=3, end_colx=7)
Out[30]: [28.0, 36.0, 44.0, 52.0]
To read out all the data in a Worksheet object, just iterate over all columns and rows that contain data:
In [31]: for c in range(sheet_1.ncols):
             for r in range(sheet_1.nrows):
                 print '%i' % sheet_1.cell(r, c).value,
             print
Out[31]: 1 2 3 4 5 6 7 8
         9 10 11 12 13 14 15 16
         17 18 19 20 21 22 23 24
         25 26 27 28 29 30 31 32
         33 34 35 36 37 38 39 40
         41 42 43 44 45 46 47 48
         49 50 51 52 53 54 55 56
         57 58 59 60 61 62 63 64
Using OpenPyxl
There is yet another library to generate and read Excel spreadsheet files in .xlsx format with Python: OpenPyxl. This library allows us to both create spreadsheet files and read from them. In addition, while basic usage is similar to the other libraries, the interface is in some cases a bit more Pythonic and might therefore be worth taking a look at. Import the library as follows:
In [32]: import openpyxl as oxl
Let us proceed as before. First, generate a Workbook object:
In [33]: wb = oxl.Workbook()
Second, create a Worksheet object:
In [34]: ws = wb.create_sheet(index=0, title='oxl_sheet')
Third, write the data to the worksheet:
In [35]: for c in range(data.shape[0]):
             for r in range(data.shape[1]):
                 ws.cell(row=r, column=c).value = data[c, r]
                 # creates a Cell object and assigns a value
Fourth, save the workbook to disk:
In [36]: wb.save(path + 'oxl_book.xlsx')
With OpenPyxl, you can also read workbooks:
In [37]: wb = oxl.load_workbook(path + 'oxl_book.xlsx')
Now, single cells are easily accessed via their cell names:
In [38]: ws = wb.get_active_sheet()
In [39]: cell = ws['B4']
In [40]: cell.column
Out[40]: 'B'
In [41]: cell.row
Out[41]: 4
In [42]: cell.value
Out[42]: 12
Similarly, you can access cell ranges as in Excel:
In [43]: ws['B1':'B4']
Out[43]: ((<Cell oxl_sheet.B1>,),
          (<Cell oxl_sheet.B2>,),
          (<Cell oxl_sheet.B3>,),
          (<Cell oxl_sheet.B4>,))
In [44]: for cell in ws['B1':'B4']:
             print cell[0].value
Out[44]: 9
         10
         11
         12
There is also a range method to which you can provide the cell range in Excel syntax as a string:
In [45]: ws.range('B1:C4')
           # same as ws['B1':'C4']
Out[45]: ((<Cell oxl_sheet.B1>, <Cell oxl_sheet.C1>),
          (<Cell oxl_sheet.B2>, <Cell oxl_sheet.C2>),
          (<Cell oxl_sheet.B3>, <Cell oxl_sheet.C3>),
          (<Cell oxl_sheet.B4>, <Cell oxl_sheet.C4>))
In [46]: for row in ws.range('B1:C4'):
             for cell in row:
                 print cell.value,
             print
Out[46]: 9 17
         10 18
         11 19
         12 20
Refer to the library’s website for more details.
Using pandas for Reading and Writing
Chapter 7 shows how to interact with Excel spreadsheet files using the pandas library. Let us use these approaches to read the data written earlier to the file workbook.xlsx. We need a DataFrame object for each sheet. With header=None, pandas does not interpret the first data row as the header for the data set:
In [47]: df_1 = pd.read_excel(path + 'workbook.xlsx',
                              'first_sheet', header=None)
         df_2 = pd.read_excel(path + 'workbook.xlsx',
                              'second_sheet', header=None)
To recover the column names/values of the spreadsheet file, let us generate a list with capital letters as column names for the DataFrame objects:
In [48]: import string
         columns = []
         for c in range(data.shape[0]):
             columns.append(string.uppercase[c])
         columns
Out[48]: ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
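The attribute string.uppercase exists in Python 2 only. A variant that works in both Python 2 and Python 3 uses string.ascii_uppercase; the sketch below assumes eight columns, matching the shape of the sample data, and the variable name n_columns is introduced only for this illustration.

```python
import string

n_columns = 8  # assumption: equals data.shape[0] for the 8x8 sample data

# Slice the first n_columns capital letters and turn them into a list.
columns = list(string.ascii_uppercase[:n_columns])
print(columns)  # ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
```

Note that this simple approach covers at most 26 columns; beyond column Z, Excel continues with AA, AB, and so on.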
We pass this list as the new column names to the two objects:
In [49]: df_1.columns = columns
         df_2.columns = columns
Indeed, the output of the two DataFrame objects now resembles the spreadsheet style pretty well:
In [50]: df_1
Out[50]:    A   B   C   D   E   F   G   H
         0  1   9  17  25  33  41  49  57
         1  2  10  18  26  34  42  50  58
         2  3  11  19  27  35  43  51  59
         3  4  12  20  28  36  44  52  60
         4  5  13  21  29  37  45  53  61
         5  6  14  22  30  38  46  54  62
         6  7  15  23  31  39  47  55  63
         7  8  16  24  32  40  48  56  64
In [51]: df_2
Out[51]:     A   B   C   D   E   F   G   H
         0   1   2   3   4   5   6   7   8
         1   9  10  11  12  13  14  15  16
         2  17  18  19  20  21  22  23  24
         3  25  26  27  28  29  30  31  32
         4  33  34  35  36  37  38  39  40
         5  41  42  43  44  45  46  47  48
         6  49  50  51  52  53  54  55  56
         7  57  58  59  60  61  62  63  64
Similarly, pandas allows us to write the data to Excel spreadsheet files:
In [52]: df_1.to_excel(path + 'new_book_1.xlsx', 'my_sheet')
Note that when writing DataFrame objects to spreadsheet files pandas adds both column names and index values, as seen in Figure 12-3.
Of course, pandas-generated Excel workbooks can be read as before with the xlrd library:
In [53]: wbn = xlrd.open_workbook(path + 'new_book_1.xlsx')
In [54]: wbn.sheet_names()
Out[54]: [u'my_sheet']
To write multiple DataFrame objects to a single spreadsheet file, one needs an ExcelWriter object:
In [55]: wbw = pd.ExcelWriter(path + 'new_book_2.xlsx')
         df_1.to_excel(wbw, 'first_sheet')
         df_2.to_excel(wbw, 'second_sheet')
         wbw.save()
Let us inspect if we indeed have generated the two sheets in the single spreadsheet file:
In [56]: wbn = xlrd.open_workbook(path + 'new_book_2.xlsx')
In [57]: wbn.sheet_names()
Out[57]: [u'first_sheet', u'second_sheet']
Figure 12-3. Screenshot of workbook in Excel written with pandas
As a final use case for pandas and Excel, consider the reading and writing of larger amounts of data. Although this is not a fast operation, it might be useful in some circumstances. First, the sample data to be used:
In [58]: data = np.random.rand(20, 100000)
In [59]: data.nbytes
Out[59]: 16000000
Second, generate a DataFrame object out of the sample data:
In [60]: df = pd.DataFrame(data)
Third, write it as an Excel file to the disk:
In [61]: %time df.to_excel(path + 'data.xlsx', 'data_sheet')
Out[61]: CPU times: user 1min 25s, sys: 460 ms, total: 1min 26s
         Wall time: 1min 25s
This takes quite a while. For comparison, see how fast native storage of the NumPy ndarray object is (on an SSD drive):
In [62]: %time np.save(path + 'data', data)
Out[62]: CPU times: user 8 ms, sys: 20 ms, total: 28 ms
         Wall time: 159 ms
In [63]: ll $path*
Out[63]: -rw------- 1 yhilpisch     7372 Sep 28 18:18 data/chart.xlsx
         -rw------- 1 yhilpisch 16000080 Sep 28 18:20 data/data.npy
         -rw------- 1 yhilpisch  3948600 Sep 28 18:20 data/data.xlsx
         -rw------- 1 yhilpisch     5828 Sep 28 18:18 data/new_book_1.xlsx
         -rw------- 1 yhilpisch     6688 Sep 28 18:18 data/new_book_2.xlsx
         -rw------- 1 yhilpisch     6079 Sep 28 18:18 data/oxl_book.xlsx
         -rw------- 1 yhilpisch     5632 Sep 28 18:18 data/workbook.xls
         -rw------- 1 yhilpisch     6049 Sep 28 18:18 data/workbook.xlsx
Fourth, read it from disk. This is significantly faster than writing it:
In [64]: %time df = pd.read_excel(path + 'data.xlsx', 'data_sheet')
Out[64]: CPU times: user 6.53 s, sys: 44 ms, total: 6.58 s
         Wall time: 6.51 s
However, see again the speed difference compared to native storage:
In [65]: %time data = np.load(path + 'data.npy')
Out[65]: CPU times: user 16 ms, sys: 8 ms, total: 24 ms
         Wall time: 40.5 ms
In [66]: data, df = 0.0, 0.0
         !rm $path*
Scripting Excel with Python
The previous section shows how to generate, read, and manipulate Excel spreadsheet files (i.e., workbooks). Although there are some beneficial use cases, Python is not the only way, and sometimes also not the best way, to achieve the results presented there.
Much more interesting is to expose the analytical power of Python to Excel spreadsheets. However, this is a technically more demanding task. For example, the Python library PyXLL provides means to expose Python functions via so-called Excel add-ins, Microsoft’s technology to enhance the functionality of Excel. Additionally, the company DataNitro provides a solution that allows the full integration of Python and Excel and makes Python a full substitute for VBA programming. Both solutions, however, are commercial products that need to be licensed.
In what follows, we provide an overview of how to use DataNitro for Excel scripting, since this is a rather flexible approach to integrating Python with Excel.
Installing DataNitro
DataNitro works on Windows operating systems and Excel installations only. On Mac OS systems it can be used in a Windows virtual machine environment. It is compatible with Office 2007 and higher. Refer to the website http://www.datanitro.com for further instructions on how to get a (trial) license for the solution and how to install it.
When installing DataNitro you have the option to install Python as well. However, if you have already installed Anaconda (cf. Chapter 2), for example, there is no need to install another Python version or distribution. You then just have to customize the DataNitro solution (via the Settings menu) to use the existing Anaconda installation. DataNitro works with all Python versions 2.6 and higher as well as with versions 3.x.
If successfully installed, you then find the DataNitro ribbon within Excel, as displayed in Figure 12-4.
Figure 12-4. Screenshot of Excel with DataNitro ribbon
Working with DataNitro
There are two main methods to combine DataNitro with Excel:
Scripting
With this method, you control Excel spreadsheets via Python scripts, similar to the approach presented in the previous section.
User-defined functions
Using this approach, you expose your own Python functions to Excel in such a way that they can be called from within Excel.
Both methods need an installation of the DataNitro solution to work, i.e., you cannot distribute something that you have worked on to somebody else who does not have the DataNitro solution installed.
Scripting with DataNitro
Open the previously generated spreadsheet file workbook.xlsx in Excel. We want to work with DataNitro and this particular file. When you then click on the Python Shell symbol in the DataNitro ribbon, your screen should look like Figure 12-5.
Figure 12-5. Screenshot of Excel with DataNitro IPython shell
A simple session could then look like the following:
In [1]: Cell("B1")
Out[1]: B1
In [2]: Cell("B1").value
Out[2]: 9
In [3]: Cell("B1").value = 'Excel with Python'
          # this immediately changes the (displayed) value
          # in the spreadsheet
In [4]: Cell("B1").value
Out[4]: u'Excel with Python'
In the same way as you change the value attribute of a Cell object, you can assign a formula to it:
In [5]: Cell("A9").formula = '=Sum(A1:A8)'
In [6]: Cell("A9").value
Out[6]: 36
Table 12-2 lists the attributes of the DataNitro Cell object.
Table 12-2. DataNitro Cell attributes
Attribute         Description
row               Row of the cell
col               Column of the cell
position          Position as a (row, col) tuple
sheet             Name of the sheet the cell is in
name              Name of the cell in Excel fashion
value             Value of the cell
vertical          All cell values including and below the cell
vertical_range    Excel range for all cells including and below the cell
horizontal        All cell values including and right of the cell
horizontal_range  Excel range for all cells including and right of the cell
table             All values including and below/right of the cell as nested list object
table_range       Excel range for table object
formula           Excel formula
comment           Comment attached to cell
hyperlink         Hyperlink or email address as string object
alignment         Text/value alignment for display
color             Cell color
df                Lets you write a pandas DataFrame object directly to the spreadsheet
Table 12-3 shows typesetting options for the Cell object. All options are attributes of the font object, which is a property of the Cell object. For example, this:
Cell("A1").font.size = 15
sets a (new) font size.
Table 12-3. DataNitro Cell typesetting options
Attribute      Description
size           Font size
color          Font color
bold           Bold font via Cell("A1").font.bold = True
italic         Italic font
underline      Underlines text
strikethrough  Puts strikethrough line through text
subscript      Subscripts text
superscript    Superscripts text
Finally, there are also a couple of methods for the Cell object. They are listed in Table 12-4.
Table 12-4. DataNitro Cell methods
Attribute         Description
clear             Resets all properties/attributes of the cell
copy_from         Copies all properties/attributes from another cell
copy_format_from  Copies all properties/attributes from another cell except value and formula
is_empty          Returns True if empty
offset            Returns cell object given relative offset as (row, col) tuple
subtraction       Subtraction gives the offset; e.g., Cell("B4") - Cell("A2") gives (2, 1)
print             Gives name and sheet of cell
set_name          Sets named range in Excel; e.g., Cell("A1").set_name("upper_left")
Often, it is helpful to work with CellRange instead of Cell objects only. One can think of this as an approach to vectorize certain operations on multiple Cell objects. Consider the following examples, still based on the same spreadsheet file workbook.xlsx with our previous changes:
In [6]: CellRange("A1:A8").value
Out[6]: [1, 2, 3, 4, 5, 6, 7, 8]
In [7]: CellRange("A1:A8").value = 1
          # like broadcasting
In [8]: CellRange("A1:A8").value
Out[8]: [1, 1, 1, 1, 1, 1, 1, 1]
In [9]: CellRange("A1:A8").value = 2 * [1, 2, 3, 4]
In [10]: CellRange("A1:A8").value
Out[10]: [1, 2, 3, 4, 1, 2, 3, 4]
In [11]: Cell("A9").value
           # value of Sum function is
           # automatically updated
Out[11]: 20
Of course, you can also use CellRange for iteration:
In [12]: for cell in CellRange("A1:B2"):
   ....:     print cell.name, cell.value
   ....:
A1 1
B1 Python with Excel
A2 2
B2 10
The majority of the Cell attributes and methods can also be used with CellRange.
When writing complex Python scripts for interaction with Excel spreadsheets, performance might be an issue. Basically, performance is bound by Excel input/output (I/O) speed. The following rules should be followed whenever possible:
Reading/writing
Do not alternate reading with writing operations, since this might lower performance significantly.
Vectorization
Use CellRange objects or Cell().table objects to read and write data in (large) blocks instead of loops.[49]
Use Python
For example, when you have to transform a data block, it is better to read it in total with Python, to manipulate it with Python, and to write it back to the spreadsheet as a block; cell-by-cell operations can be really slow.
Store data in Python
Store values in Python when possible rather than rereading them, especially for performance-critical loops or similar operations.
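The effect of these rules can be illustrated without Excel by counting I/O operations. The following sketch uses a hypothetical CountingSheet class, a stand-in introduced only for this illustration and not part of DataNitro, in which every cell or block access counts as one simulated I/O call.

```python
class CountingSheet(object):
    """Hypothetical in-memory sheet that counts simulated I/O calls."""

    def __init__(self, values):
        self._values = list(values)
        self.io_calls = 0

    def read_cell(self, i):
        self.io_calls += 1
        return self._values[i]

    def write_cell(self, i, value):
        self.io_calls += 1
        self._values[i] = value

    def read_block(self):
        self.io_calls += 1
        return list(self._values)

    def write_block(self, values):
        self.io_calls += 1
        self._values = list(values)


def double_cell_by_cell(sheet, n):
    # Alternates reading and writing: two I/O calls per cell.
    for i in range(n):
        sheet.write_cell(i, 2 * sheet.read_cell(i))


def double_as_block(sheet):
    # One read, transformation in Python, one write.
    block = sheet.read_block()
    sheet.write_block([2 * v for v in block])


slow = CountingSheet(range(1, 9))
fast = CountingSheet(range(1, 9))
double_cell_by_cell(slow, 8)
double_as_block(fast)
print(slow.io_calls, fast.io_calls)  # 16 2
```

Both variants produce the same values, but the block version needs a constant number of I/O calls regardless of the block size.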
See the relevant sections in the DataNitro documentation for details on how to work with whole Worksheet and Workbook objects.
Plotting with DataNitro
A special topic when scripting Excel spreadsheets with DataNitro is plotting data contained in a spreadsheet with Python instead of using Excel’s plotting capabilities. Example 12-1 shows a Python script that is only executable if DataNitro is installed. It retrieves Apple Inc. stock price data with the DataReader function from pandas (cf. Chapter 6), writes the data to a newly generated Workbook object, and then plots the data stored in the respective Worksheet object with Python (i.e., with the help of DataNitro’s matplotlib.pyplot wrapper nitroplot) and exposes the result to the spreadsheet.
Example 12-1. Plotting data stored in a spreadsheet with DataNitro and displaying a matplotlib plot in the same spreadsheet
#
# Plotting with DataNitro in Excel
# dn_plotting.py
#
import pandas.io.data as web
import nitroplot as nplt
  # wrapper for matplotlib.pyplot (plt)

# make a new workbook
wb = new_wkbk()
active_wkbk(wb)
rename_sheet("Sheet1", "Apple_Stock")

# read Apple Inc. stock data
aapl = web.DataReader('aapl', data_source='yahoo')[['Open', 'Close']]

# write the data to the new workbook
Cell("A1").df = aapl

# generate matplotlib plot
nplt.figure(figsize=(8, 4))
nplt.plot(Cell("A2").vertical, Cell("C2").vertical, label='AAPL')
nplt.legend(loc=0)
nplt.grid(True)
nplt.xticks(rotation=35)

# expose plot to Excel spreadsheet
nplt.graph()
  # as plt.show()

# save the new workbook with data and plot
save('dn_plot.xlsx')
From a DataNitro IPython shell, execute the script with:
In [1]: %run dn_plotting.py
If the script is successfully executed, the workbook/worksheet in Excel should look as displayed in Figure 12-6.
Figure 12-6. Screenshot of Excel with DataNitro plot of Apple stock price data
User-defined functions
From a finance point of view, it seems most interesting to expose user-defined functions (UDFs) via DataNitro to Excel. This option has to be enabled in the Settings menu of DataNitro. Once this is enabled, you can import a Python script with DataNitro called functions.py. All Python functions included in this file (and they have to be in this particular file) will then be directly callable from Excel. Consider the by now well-known function to value European call options in the Black-Scholes-Merton model in Example 12-2.
Example 12-2. Python script for import with DataNitro into Excel
#
# Valuation of European call options in BSM model
# for use with DataNitro and Excel spreadsheets
# functions.py
#

# analytical Black-Scholes-Merton (BSM) formula


def bsm_call_value(S0, K, T, r, sigma):
    ''' Valuation of European call option in BSM model.
    Analytical formula.

    Parameters
    ==========
    S0 : float
        initial stock/index level
    K : float
        strike price
    T : float
        time-to-maturity (for t=0)
    r : float
        constant risk-free short rate
    sigma : float
        volatility factor in diffusion term

    Returns
    =======
    value : float
        present value of the European call option
    '''
    from math import log, sqrt, exp
    from scipy import stats

    S0 = float(S0)
    d1 = (log(S0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = (log(S0 / K) + (r - 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    value = (S0 * stats.norm.cdf(d1, 0.0, 1.0)
            - K * exp(-r * T) * stats.norm.cdf(d2, 0.0, 1.0))
    return value
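For quick checks outside of Excel, the same formula can be evaluated without SciPy. The following sketch is a variant, not the script from Example 12-2: it replaces stats.norm.cdf with an expression based on math.erf from the standard library, and the names norm_cdf and bsm_call_value_stdlib as well as the parameter values are assumptions chosen for illustration.

```python
from math import log, sqrt, exp, erf


def norm_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bsm_call_value_stdlib(S0, K, T, r, sigma):
    """European call value in the BSM model (standard library only)."""
    S0 = float(S0)
    d1 = (log(S0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)  # equivalent to the explicit d2 formula
    return S0 * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)


# Illustrative parameters: S0=100, K=105, T=1 year, r=5%, sigma=20%.
value = bsm_call_value_stdlib(100.0, 105.0, 1.0, 0.05, 0.2)
print(round(value, 2))  # approximately 8.02
```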
If this script is imported via DataNitro, with UDFs enabled, you can use the valuation formula from Excel. In practice, you can then type the following into an Excel cell:
= bsm_call_value(B1, 100, 2.0, B4, B5)
This is the same then as with any other Excel formula. For example, have a look at Figure 12-7. In the upper-left corner you see a parameterization of the Black-Scholes-Merton model. When you click on Insert Function in the FORMULAS tab of Excel, you can enter a function dialog for the option valuation formula from Python (you find it under DataNitro functions). Once you have provided references to the cells containing the single parameter values, Python calculates the option value and returns the result to Excel.
Figure 12-7. Screenshot of Excel function dialog for Python function
Although this function is not that computationally demanding, it illustrates how to harness the analytical power of Python from Excel and how to expose the results directly to Excel (cells). Similarly, see Figure 12-8. Here, we use a parameter grid to calculate multiple option values at once. The formula in cell D11 then takes on the form:
= bsm_call_value($B$1, $A11, D$8, $B$4, $B$5)
Figure 12-8. Screenshot of Excel with parameter grid for European option values
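The mixed references in the grid formula fix the strike per row ($A11) and the maturity per column (D$8), while the remaining parameters are absolute references. A rough pure-Python counterpart of such a grid might look as follows. The parameter values, the strike and maturity lists, and the erf-based normal distribution function are assumptions for illustration, since the spreadsheet contents are only shown in the screenshot.

```python
from math import log, sqrt, exp, erf


def bsm_call_value(S0, K, T, r, sigma):
    """European call value in the BSM model (erf-based normal CDF)."""
    d1 = (log(float(S0) / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    cdf = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))
    return S0 * cdf(d1) - K * exp(-r * T) * cdf(d2)


# Hypothetical stand-ins for the cells $B$1, $B$4, and $B$5.
S0, r, sigma = 100.0, 0.05, 0.2
strikes = [90.0, 100.0, 110.0]  # one strike per grid row (column A)
maturities = [0.5, 1.0, 2.0]    # one maturity per grid column (row 8)

# One row per strike, one column per maturity.
grid = [[bsm_call_value(S0, K, T, r, sigma) for T in maturities]
        for K in strikes]

for K, row in zip(strikes, grid):
    print(K, ['%.3f' % v for v in row])
```

As expected for call options in this model, values fall with higher strikes and rise with longer maturities.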
Whereas in a previous example we plotted data contained in an Excel spreadsheet, we can now also plot data generated with Python in our spreadsheet. Figure 12-9 shows a 3D plot generated with Excel for the European option value surface.
Figure 12-9. Screenshot of Excel with European option value surface plot
xlwings
At the time of this writing, a new contender in the Python-Excel integration world has emerged: xlwings. xlwings provides almost all the functionality for interacting with and scripting Excel spreadsheets with Python. It is, in contrast to the DataNitro solution, an open source library and can be freely shipped with any spreadsheet. The receiver of an xlwings “powered” spreadsheet only needs a (minimal) Python installation. One advantage of xlwings is that it works with Excel both on Windows and Apple/Mac operating systems. In addition, it is well documented, although it is only in early release (0.3 at the time of this writing). The whole solution and approach look promising, and anybody interested in integrating Python and Excel should give it a try.
Conclusions
There are several options to integrate Python with Excel. Some Python libraries (like xlwt or xlsxwriter) allow the creation of Excel spreadsheets. Other libraries like xlrd allow the reading of arbitrary spreadsheet files, or they allow both reading and writing of spreadsheet files.
pandas is, at least for some tasks, also helpful. For example, it is useful when it comes to writing larger data sets to a spreadsheet file or when it comes to reading data stored in such a file format.
The most powerful solution, however, at the time of this writing is the one by DataNitro that offers a tight integration of both worlds. It has similar (or even better) spreadsheet manipulation capabilities than other libraries. In addition, DataNitro allows us, for example, to expose Python plots to Excel spreadsheets. More importantly, it allows us to define user-defined Python functions (UDFs) for usage with Excel that are callable in the same way as Excel’s built-in functions are. xlwings, a new, open source library that has been made available recently, is similar in scope and capabilities to the DataNitro solution.
In particular, the DataNitro and xlwings approaches allow us to use Excel as a flexible and powerful general GUI (available on almost every computer in the finance industry) and combine it with the analytical capabilities of Python. The best of both worlds, so to say.
Further Reading
For all libraries and solutions presented in this chapter, there are helpful web resources available:
For xlrd and xlwt, see http://www.python­excel.org for the online documentation; there is also a tutorial available in PDF format at http://www.simplistix.co.uk/presentations/python­excel.pdf.
xlsxwriter is nicely documented on the website http://xlsxwriter.readthedocs.org.
OpenPyxl has its home here: http://pythonhosted.org/openpyxl/.
For detailed information about PyXLL, see https://www.pyxll.com.
Free trials and detailed documentation for DataNitro can be found at http://www.datanitro.com.
You can find the documentation and everything else you need regarding xlwings at http://xlwings.org.
[48] Note that a simple mechanism to generate Excel spreadsheets from Python is to export data in the form of a comma-separated value (CSV) file and to import this with Excel. This might sometimes be more efficient than the ways presented in the following discussion.
[49] Cf. Chapter 4 for a similar discussion in the context of NumPy ndarray objects and the benefits of vectorization. The rule of thumb there as well as here is to avoid loops on the Python level.
Chapter 13. Object Orientation and Graphical User Interfaces
First, solve the problem. Then, write the code.
- Jon Johnson
Object orientation has its fans and critics. Referring to the quote above, object-oriented implementation styles might provide the most benefit when they are applied by programmers who really understand the problem at hand and when there is much to gain from abstraction and generalization. On the other hand, if you do not know what exactly to do, a different, more interactive and exploratory programming style, like procedural programming, might be a better choice.
In this chapter, we do not want to discuss the risks and merits of using object orientation. We take it for granted that this approach has its place when it comes to the development of more complex financial applications (cf. the project implemented in Part III of the book) and that it brings along a number of measurable benefits in these cases. When it comes to building graphical user interfaces (GUIs), object orientation in general is a conditio sine qua non.
Therefore, we combine the two topics in this chapter and introduce first fundamental concepts of Python classes and objects. Equipped with this knowledge, it is much easier to introduce the development of GUIs.
Object Orientation
Wikipedia provides the following definition for object-oriented programming:
Object-oriented programming (OOP) is a programming paradigm that represents concepts as “objects” that have data fields (attributes that describe the object) and associated procedures known as methods. Objects, which are usually instances of classes, are used to interact with one another to design applications and computer programs.
This already provides the main technical terms that are also used in the Python world for classes and objects and that will be made clearer in the remainder of this section.
BasicsofPythonClasses
Wes t a r t bydefininganewclass(ofobjects).Tot h i s end,usethestatement
followingcodedefines anew
class, whichisappliedlikeaPython named
def statement for ExampleOne.This class
class functiondefinitions.The
doesnothingbut“exist.”The
says—itpassesanddoesnothing: pass commandsimplydoeswhatitsname
In [1]: classpassExampleOne(object):
However,theexistenceoftheclass ExampleOne
instancesoftheclassasnew Python objects: allowsustogenerate
In [2]: c = ExampleOne()
Inaddition,sincet his classinheritsfromthegeneral object clas , it
alreadyhassomebatteriesincluded.Forexample,thefollowingprovides
thestringrepresentationofthenewlygeneratedobjectbasedonourclass:
In [3]: c.__str__()
Out[3]: '<__main__.ExampleOne object at 0x7f8fcc28ef10>'
Wecanalsouse typetolearnabout
instanceoftheclass ExampleOne: thetypeoftheobject—int h i s case,an
In [4]: type(c)
Out[4]: __main__.ExampleOne
Let us now define a class that has two attributes, say, a and b. To this end, we define a special method called __init__ that is automatically invoked at every instantiation of the class. Note that the object itself (by Python convention, self) is also a parameter of this function:
In [5]: class ExampleTwo(object):
            def __init__(self, a, b):
                self.a = a
                self.b = b
Instantiating the new class ExampleTwo now takes two values, one for attribute a and one for attribute b. Note in the preceding definition that these attributes are referenced internally (i.e., in the class definition) by self.a and self.b, respectively:
In [6]: c = ExampleTwo(1, 'text')
Similarly, we can access the values of the attributes of the object c as follows:
In [7]: c.a
Out[7]: 1
In [8]: c.b
Out[8]: 'text'
We can also overwrite our initial values by simply assigning new values to the attributes:
In [9]: c.a = 100
In [10]: c.a
Out[10]: 100
Python is quite flexible when it comes to the use of classes and objects. For example, attributes of an object can be defined even after instantiation, as the following example illustrates:
In [11]: c = ExampleOne()
In [12]: c.first_name = 'Jason'
         c.last_name = 'Bourne'
         c.movies = 4
In [13]: print c.first_name, c.last_name, c.movies
Out[13]: Jason Bourne 4
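To make the mechanics of attaching attributes after instantiation more tangible, the following minimal sketch (written for Python 3, which is an assumption; the surrounding listings use Python 2) inspects the instance dictionary with the built-in vars. The second instance d is introduced here purely for illustration.

```python
class ExampleOne(object):
    pass

c = ExampleOne()
c.first_name = 'Jason'
c.last_name = 'Bourne'
c.movies = 4

# attributes attached after instantiation live in the instance dictionary
attributes = vars(c)

# a second instance (hypothetical, for illustration) does not share them
d = ExampleOne()
has_first_name = hasattr(d, 'first_name')
```

The attributes belong to the single object c, not to the class, which is why d knows nothing about first_name.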
The class definition that follows introduces methods for classes. In this case, the class ExampleThree is the same as ExampleTwo apart from the fact that there is a definition for a custom method, addition:
In [14]: class ExampleThree(object):
             def __init__(self, a, b):
                 self.a = a
                 self.b = b
             def addition(self):
                 return self.a + self.b
Instantiation works as before. This time we use only integers for the attributes a and b:
In [15]: c = ExampleThree(10, 15)
A call of the method addition then returns the sum of the two attribute values (as long as it is defined, given the types of the attributes):
In [16]: c.addition()
Out[16]: 25
In [17]: c.a += 10
         c.addition()
Out[17]: 35
One of the advantages of the object-oriented programming paradigm is reusability. As pointed out, the class definitions for ExampleTwo and ExampleThree are only different with respect to the custom method definition. Another way of defining the class ExampleThree is therefore to use the class ExampleTwo and to inherit from this class the definition of the special function __init__:
In [18]: class ExampleFour(ExampleTwo):
             def addition(self):
                 return self.a + self.b
The behavior of instances of class ExampleFour is now exactly the same as that of instances of class ExampleThree:
In [19]: c = ExampleFour(10, 15)
In [20]: c.addition()
Out[20]: 25
Python allows for multiple inheritance as well as inheritance chains of arbitrary depth, as in the following example. However, one should be careful with regard to readability and maintainability, especially by others:
In [21]: class ExampleFive(ExampleFour):
             def multiplication(self):
                 return self.a * self.b
In [22]: c = ExampleFive(10, 15)
In [23]: c.addition()
Out[23]: 25
In [24]: c.multiplication()
Out[24]: 150
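The inheritance chain behind ExampleFive can be made visible through the method resolution order. The sketch below (Python 3 syntax, an assumption) repeats the class definitions from above so that it is self-contained; the variable name lookup_order is chosen here for illustration.

```python
class ExampleTwo(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b

class ExampleFour(ExampleTwo):
    def addition(self):
        return self.a + self.b

class ExampleFive(ExampleFour):
    def multiplication(self):
        return self.a * self.b

c = ExampleFive(10, 15)

# order in which Python searches the classes for attributes and methods
lookup_order = [cls.__name__ for cls in ExampleFive.__mro__]
```

Python walks this order from left to right, so __init__ is found in ExampleTwo, addition in ExampleFour, and multiplication in ExampleFive itself.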
For example, custom method definitions do not necessarily need to be included in the class definition itself. They can be placed somewhere else, and as long as they are in the global namespace, they can be used within a class definition. The following code illustrates this approach:
In [25]: def multiplication(self):
             return self.a * self.b
In [26]: class ExampleSix(ExampleFour):
             multiplication = multiplication
And again, the instance of the class ExampleSix behaves exactly the same as the instance of the earlier class ExampleFive:
In [27]: c = ExampleSix(10, 15)
In [28]: c.addition()
Out[28]: 25
In [29]: c.multiplication()
Out[29]: 150
It might be helpful to have (class/object) private attributes. These are generally indicated by one or two leading underscores, as the following class definition illustrates:
In [30]: class ExampleSeven(object):
             def __init__(self, a, b):
                 self.a = a
                 self.b = b
                 self.__sum = a + b
             multiplication = multiplication
             def addition(self):
                 return self.__sum
The behavior is the same as before when it comes to a call of the method addition:
In [31]: c = ExampleSeven(10, 15)
In [32]: c.addition()
Out[32]: 25
Here, you cannot directly access the private attribute __sum. However, via the following syntax, it is still possible:
In [33]: c._ExampleSeven__sum
Out[33]: 25
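The access path via _ExampleSeven__sum works because of name mangling: inside a class body, Python rewrites names with two leading underscores by prefixing the class name. The following sketch (a reduced version of ExampleSeven without the multiplication method; the helper variables are illustrative names) shows the effect:

```python
class ExampleSeven(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b
        self.__sum = a + b  # stored on the instance as _ExampleSeven__sum

c = ExampleSeven(10, 15)

# outside the class body no mangling takes place, so c.__sum is not found
try:
    c.__sum
    direct_access_works = True
except AttributeError:
    direct_access_works = False

# the mangled name is what actually sits in the instance dictionary
mangled_names = [name for name in vars(c) if name.endswith('__sum')]
```

Name mangling is a convention-supporting mechanism rather than real access protection, as the successful access in the listing above shows.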
As the class ExampleSeven is defined, one must be careful with the inner workings. For example, a change of an attribute value does not change the result of the addition method call:
In [34]: c.a += 10
         c.a
Out[34]: 20
In [35]: c.addition()
Out[35]: 25
This, of course, is due to the fact that the private attribute is not updated:
In [36]: c._ExampleSeven__sum
Out[36]: 25
Calling the multiplication method, however, works as desired:
In [37]: c.multiplication()
Out[37]: 300
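One possible way to avoid a stale cached sum, sketched here as an alternative and not as the approach taken in the text, is to compute the value on access with a property. The class name ExampleSevenLive and the property name total are hypothetical.

```python
class ExampleSevenLive(object):  # hypothetical variant of ExampleSeven
    def __init__(self, a, b):
        self.a = a
        self.b = b

    @property
    def total(self):
        # computed at access time, so it always reflects current a and b
        return self.a + self.b

    def addition(self):
        return self.total

c = ExampleSevenLive(10, 15)
before = c.addition()
c.a += 10
after = c.addition()
```

The trade-off is that the sum is recomputed on every call, which is negligible here but may matter for expensive calculations.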
To conclude the introduction into the main concepts of Python classes and objects, we want to pick one other special method of importance: the __iter__ method. It is called whenever an iteration over an instance of a class is asked for. To begin with, define a list of first names as follows:
In [38]: name_list = ['Sandra', 'Lilli', 'Guido', 'Zorro', 'Henry']
In Python it is usual to iterate over such lists directly, i.e., without the use of integer counters or indexes:
In [39]: for name in name_list:
             print name
Out[39]: Sandra
Lilli
Guido
Zorro
Henry
We are now going to define a new Python class that also returns values from a list, but the list is sorted before the iterator starts returning values from the list. The class sorted_list contains the following definitions:
__init__
To initialize the attribute elements we expect a list object, which we sort at instantiation.
__iter__
This special method is called whenever an iteration is desired; it needs a definition of a next method.
next
This method defines what happens per iteration step; it starts at index value self.position = -1 and increases the value by 1 per call; it then returns the value of elements at the current index value of self.position.
The class definition looks like this:
In [40]: class sorted_list(object):
             def __init__(self, elements):
                 self.elements = sorted(elements)  # sorted list object
             def __iter__(self):
                 self.position = -1
                 return self
             def next(self):
                 if self.position == len(self.elements) - 1:
                     raise StopIteration
                 self.position += 1
                 return self.elements[self.position]
Instantiate the class now with the name_list object:
In [41]: sorted_name_list = sorted_list(name_list)
The outcome is as desired: iterating over the new object returns the elements in alphabetical order:
In [42]: for name in sorted_name_list:
             print name
Out[42]: Guido
Henry
Lilli
Sandra
Zorro
In principle, we have replicated a call of the function sorted, which takes as input a list object and returns as output a list object:
In [43]: type(sorted(name_list))
Out[43]: list
In [44]: for name in sorted(name_list):
             print name
Out[44]: Guido
Henry
Lilli
Sandra
Zorro
Our approach, however, works on a completely new type of object, namely a sorted_list:
In [45]: type(sorted_name_list)
Out[45]: __main__.sorted_list
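The listing above follows the Python 2 iterator protocol, where the per-step method is called next. Under Python 3 the protocol expects __next__ instead. The sketch below shows a version that works under both (class name SortedList is hypothetical), plus a generator-based variant (SortedListGen, also hypothetical) that needs no explicit position bookkeeping.

```python
class SortedList(object):  # hypothetical name; explicit iterator protocol
    def __init__(self, elements):
        self.elements = sorted(elements)

    def __iter__(self):
        self.position = -1
        return self

    def __next__(self):
        if self.position == len(self.elements) - 1:
            raise StopIteration
        self.position += 1
        return self.elements[self.position]

    next = __next__  # alias so that the Python 2 protocol is served as well


class SortedListGen(object):  # hypothetical name; generator-based variant
    def __init__(self, elements):
        self.elements = sorted(elements)

    def __iter__(self):
        # a generator function returns a fresh iterator on every call
        for element in self.elements:
            yield element


name_list = ['Sandra', 'Lilli', 'Guido', 'Zorro', 'Henry']
names_a = list(SortedList(name_list))
names_b = list(SortedListGen(name_list))
```

A design difference worth noting: the first variant keeps the iteration state on the object itself, so two simultaneous loops over the same instance interfere with each other, while the generator variant creates independent iterators.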
This concludes the rather concise introduction into selected concepts of object orientation in Python. In the following discussion, these concepts are illustrated by introductory financial use cases. In addition, Part III makes extensive use of object-oriented programming to implement a derivatives analytics library.
Simple Short Rate Class
One of the most fundamental concepts in finance is discounting. Since it is so fundamental, it might justify the definition of a discounting class. In a constant short rate world with continuous discounting, the factor to discount a future cash flow due at date t > 0 to the present t = 0 is defined by D_0(t) = e^{-rt}.
Consider first the following function definition, which returns the discount factor for a given future date and a value for the constant short rate. Note that a NumPy universal function is used in the function definition for the exponential function to allow for vectorization:
In [46]: import numpy as np
         def discount_factor(r, t):
             ''' Function to calculate a discount factor.

             Parameters
             ==========
             r : float
                 positive, constant short rate
             t : float, array of floats
                 future date(s), in fraction of years;
                 e.g. 0.5 means half a year from now

             Returns
             =======
             df : float
                 discount factor
             '''
             df = np.exp(-r * t)
               # use of NumPy universal function for vectorization
             return df
Figure 13-1 illustrates how the discount factors behave for different values for the constant short rate over five years. The factors for t = 0 are all equal to 1; i.e., "no discounting" of today's cash flows. However, given a short rate of 10% and a cash flow due in five years, the cash flow would be discounted to a value slightly above 0.6 per currency unit (i.e., to 60%). We generate the plot as follows:
In [47]: import matplotlib.pyplot as plt
         %matplotlib inline
In [48]: t = np.linspace(0, 5)
         for r in [0.01, 0.05, 0.1]:
             plt.plot(t, discount_factor(r, t), label='r=%4.2f' % r, lw=1.5)
         plt.xlabel('years')
         plt.ylabel('discount factor')
         plt.grid(True)
         plt.legend(loc=0)
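The numbers quoted for the plot can be checked directly. The following sketch repeats a compact version of discount_factor (without the docstring) and evaluates it for today, for five years at a 10% short rate, and for a small array of dates at 5%; the variable names are chosen here for illustration.

```python
import numpy as np

def discount_factor(r, t):
    # continuous discounting with constant short rate r
    return np.exp(-r * t)

df_today = discount_factor(0.10, 0.0)  # no discounting at t = 0
df_5y = discount_factor(0.10, 5.0)     # exp(-0.5), slightly above 0.6
df_vector = discount_factor(0.05, np.array([0.0, 1.0, 2.0]))  # vectorized
```

Because np.exp is a universal function, the same function body handles both scalar and array inputs without any loop.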
For comparison, now let us look at the class-based implementation approach. We call it short_rate since this is the central entity/object and the derivation of discount factors is accomplished via a method call:

Figure 13-1. Discount factors for different short rates over five years
In [49]: class short_rate(object):
             ''' Class to model a constant short rate object.

             Parameters
             ==========
             name : string
                 name of the object
             rate : float
                 positive, constant short rate

             Methods
             =======
             get_discount_factors :
                 returns discount factors for given list/array
                 of dates/times (as year fractions)
             '''
             def __init__(self, name, rate):
                 self.name = name
                 self.rate = rate
             def get_discount_factors(self, time_list):
                 ''' time_list : list/array-like '''
                 time_list = np.array(time_list)
                 return np.exp(-self.rate * time_list)
To start with, define sr to be an instance of the class short_rate:
In [50]: sr = short_rate('r', 0.05)
In [51]: sr.name, sr.rate
Out[51]: ('r', 0.05)
To get discount factors from the new object, a time list with year fractions is needed:
In [52]: time_list = [0.0, 0.5, 1.0, 1.25, 1.75, 2.0]  # in year fractions
In [53]: sr.get_discount_factors(time_list)
Out[53]: array([ 1. , 0.97530991, 0.95122942, 0.93941306,
0.91621887, 0.90483742])
Using this object, it is quite simple to generate a plot as before (see Figure 13-2). The major difference is that we first update the attribute rate and then provide the time list t to the method get_discount_factors:
In [54]: for r in [0.025, 0.05, 0.1, 0.15]:
             sr.rate = r
             plt.plot(t, sr.get_discount_factors(t),
                      label='r=%4.2f' % sr.rate, lw=1.5)
         plt.xlabel('years')
         plt.ylabel('discount factor')
         plt.grid(True)
         plt.legend(loc=0)

Figure 13-2. Discount factors for different short rates over five years
Generally, discount factors are "only" a means to an end. For example, you might want to use them to discount future cash flows. With our short rate object, this is an easy exercise when we have the cash flows and the dates/times of their occurrence available. Consider the following cash flow example, where there is a negative cash flow today and positive cash flows after one year and two years, respectively. This could be the cash flow profile of an investment opportunity:
In [55]: sr.rate = 0.05
         cash_flows = np.array([-100, 50, 75])
         time_list = [0.0, 1.0, 2.0]
With the time_list object, discount factors are only one method call away:
In [56]: disc_facts = sr.get_discount_factors(time_list)
In [57]: disc_facts
Out[57]: array([ 1. , 0.95122942, 0.90483742])
Present values for all cash flows are obtained by multiplying the discount factors by the cash flows:
In [58]: # present values
         disc_facts * cash_flows
Out[58]: array([-100. , 47.56147123, 67.86280635])
A typical decision rule in investment theory says that a decision maker should invest into a project whenever the net present value (NPV), given a certain (short) rate representing the opportunity costs of the investment, is positive. In our case, the NPV is simply the sum of the single present values:
In [59]: # net present value
         np.sum(disc_facts * cash_flows)
Out[59]: 15.424277577732667
Obviously, for a short rate of 5% the investment should be made. What about a rate of 15%? Then the NPV becomes negative, and the investment should not be made:
In [60]: sr.rate = 0.15
np.sum(sr.get_discount_factors(time_list) * cash_flows)
Out[60]: -1.4032346276182679
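Since the NPV is positive at 5% and negative at 15%, there must be a break-even short rate in between at which the NPV is zero. The following sketch locates it by bisection. It is an illustration added here, not part of the original example; the function names npv and break_even_rate and the bracket values are assumptions.

```python
import numpy as np

cash_flows = np.array([-100.0, 50.0, 75.0])
time_list = np.array([0.0, 1.0, 2.0])

def npv(rate):
    # net present value under continuous discounting
    return np.sum(np.exp(-rate * time_list) * cash_flows)

def break_even_rate(low=0.05, high=0.15, tol=1e-10):
    # bisection; assumes npv(low) > 0 > npv(high) and a decreasing NPV
    while high - low > tol:
        mid = 0.5 * (low + high)
        if npv(mid) > 0:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)

rate_star = break_even_rate()  # roughly 0.141, i.e., about 14.1%
```

For this cash flow profile the NPV falls monotonically as the rate rises, because all positive cash flows lie in the future, so the bracket contains exactly one root.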
Cash Flow Series Class
With the experience gained through the previous example, the definition of another class to model a cash flow series should be straightforward. This class should provide methods to give back a list/array of present values and also the net present value for a given cash flow series, i.e., cash flow values and dates/times:
In [61]: class cash_flow_series(object):
             ''' Class to model a cash flow series.

             Attributes
             ==========
             name : string
                 name of the object
             time_list : list/array-like
                 list of (positive) year fractions
             cash_flows : list/array-like
                 corresponding list of cash flow values
             short_rate : instance of short_rate class
                 short rate object used for discounting

             Methods
             =======
             present_value_list :
                 returns an array with present values
             net_present_value :
                 returns NPV for cash flow series
             '''
             def __init__(self, name, time_list, cash_flows, short_rate):
                 self.name = name
                 self.time_list = time_list
                 self.cash_flows = cash_flows
                 self.short_rate = short_rate
             def present_value_list(self):
                 df = self.short_rate.get_discount_factors(self.time_list)
                 return np.array(self.cash_flows) * df
             def net_present_value(self):
                 return np.sum(self.present_value_list())
We use all objects from the previous example to instantiate the class:
In [62]: sr.rate = 0.05
         cfs = cash_flow_series('cfs', time_list, cash_flows, sr)
In [63]: cfs.cash_flows
Out[63]: array([-100, 50, 75])
In [64]: cfs.time_list
Out[64]: [0.0, 1.0, 2.0]
We can now compare the present values and the NPV with the results from before. Fortunately, we get the same results:
In [65]: cfs.present_value_list()
Out[65]: array([-100. , 47.56147123, 67.86280635])
In [66]: cfs.net_present_value()
Out[66]: 15.424277577732667
There is further potential to generalize the steps of the previous example. One option is to define a new class that provides a method for calculating the NPV for different short rates, i.e., a sensitivity analysis. We use, of course, the cash_flow_series class to inherit from:
In [67]: class cfs_sensitivity(cash_flow_series):
             def npv_sensitivity(self, short_rates):
                 npvs = []
                 for rate in short_rates:
                     # update the object's own short rate, not a global one
                     self.short_rate.rate = rate
                     npvs.append(self.net_present_value())
                 return np.array(npvs)
In [68]: cfs_sens = cfs_sensitivity('cfs', time_list, cash_flows, sr)
For example, defining a list containing different short rates, we can easily compare the resulting NPVs:
In [69]: short_rates = [0.01, 0.025, 0.05, 0.075, 0.1, 0.125, 0.15, 0.2]
In [70]: npvs = cfs_sens.npv_sensitivity(short_rates)
         npvs
Out[70]: array([ 23.01739219, 20.10770244, 15.42427758, 10.94027255,
                  6.64667738, 2.53490386, -1.40323463, -8.78945889])
Figure 13-3 shows the result graphically. The thicker horizontal line (at 0) shows the cutoff point between a profitable investment and one that should be dismissed given the respective (short) rate:
In [71]: plt.plot(short_rates, npvs, 'b')
         plt.plot(short_rates, npvs, 'ro')
         plt.plot((0, max(short_rates)), (0, 0), 'r', lw=2)
         plt.grid(True)
         plt.xlabel('short rate')
         plt.ylabel('net present value')
Figure 13-3. Net present values of cash flow list for different short rates
Graphical User Interfaces
For the majority of computer users, as compared to developers or data scientists, a graphical user interface (GUI) is what they are used to. Such a GUI does not only bring along visual appeal and simplicity; it also allows us to guide and control user interaction much better than alternative approaches like interactive scripting, or use of a command line interface or shell. In what follows, we build on the examples of the previous section and build simple GUIs for our short rate and cash flow series classes.
To build the GUIs we use the traits library, documentation of which you can find at http://code.enthought.com/projects/traits/docs/html/index.html. traits is generally used for rapid GUI building on top of existing classes and only seldom for more complex applications. In what follows, we will reimplement the two example classes from before, taking into account that we want to use a GUI for interacting with instances of the respective classes.
Short Rate Class with GUI
To start, we need to import the traits.api sublibrary:
In [72]: import numpy as np
         import traits.api as trapi
For the definition of our new short_rate class, we use the HasTraits class to inherit from. Also note in the following class definition that traits has its own data types, which are generally closely intertwined with visual elements of a GUI. To put it differently, traits knows which graphical elements (e.g., for a text field) to use to build a GUI (semi)automatically:
In [73]: class short_rate(trapi.HasTraits):
             name = trapi.Str
             rate = trapi.Float
             time_list = trapi.Array(dtype=np.float, shape=(5,))
             def get_discount_factors(self):
                 return np.exp(-self.rate * self.time_list)
Instantiation of such a traits-based class is done as usual:
In [74]: sr = short_rate()
However, via a call of the method configure_traits (inherited from HasTraits) a GUI is automatically generated, and we can use this GUI to input values for the attributes of the new object sr:
In [75]: sr.configure_traits()
Figure 13-4 shows such a simple GUI, which in this case is still empty (i.e., no input values have been put in the different fields). Note that the lower five fields all belong to "Time list"; this layout is generated by default.
Figure 13-5 shows the same simple GUI, this time however with values in every single field. Pushing the OK button assigns the values from the input fields to the respective attributes of the object.
Figure 13-4. Screenshot of traits GUI (empty)
Figure 13-5. Screenshot of traits GUI (with data)
In effect, this gives the same results as the following lines of code:
In [76]: sr.name = 'sr_class'
         sr.rate = 0.05
         sr.time_list = [0.0, 0.5, 1.0, 1.5, 2.0]
By providing the traits-specific data types, traits is able to generate the correct visual elements to accomplish these operations via a GUI, i.e., a text input field for sr.name and five input elements for the list object sr.time_list. The behavior of the new object after the input operations is the same as with our short_rate from the previous section:
In [77]: sr.rate
Out[77]: 0.05
In [78]: sr.time_list
Out[78]: array([ 0. , 0.5, 1. , 1.5, 2. ])
In [79]: sr.get_discount_factors()
Out[79]: array([ 1. , 0.97530991, 0.95122942, 0.92774349, 0.90483742])
Updating of Values
So far, the new short_rate class using traits allows us to input data for initializing attributes of an instance of the class. However, a GUI usually is also used to present results. You would generally want to avoid providing input data via a GUI and then making the user access the results via interactive scripting. To this end, we need another sublibrary, traitsui.api:
In [80]: import traits.api as trapi
         import traitsui.api as trui
This sublibrary allows us to generate different views on the same class/object. It also provides more options for, e.g., labeling and formatting. The key in the following class definition is what happens when the Update button is pushed. In this case, the private method _update_fired is called, which updates the list object containing the discount factors. This updated list is then displayed in the GUI window. A prerequisite for this is that all input parameters have been made available by the user:
In [81]: class short_rate(trapi.HasTraits):
             name = trapi.Str
             rate = trapi.Float
             time_list = trapi.Array(dtype=np.float, shape=(1, 5))
             disc_list = trapi.Array(dtype=np.float, shape=(1, 5))
             update = trapi.Button
             def _update_fired(self):
                 self.disc_list = np.exp(-self.rate * self.time_list)
             v = trui.View(trui.Group(trui.Item(name='name'),
                                      trui.Item(name='rate'),
                                      trui.Item(name='time_list',
                                                label='Insert Time List Here'),
                                      trui.Item('update', show_label=False),
                                      trui.Item(name='disc_list',
                                                label='Press Update for Factors'),
                                      show_border=True,
                                      label='Calculate Discount Factors'),
                           buttons=[trui.OKButton, trui.CancelButton],
                           resizable=True)
Again,instantiationandconfigurationareachievedasbefore:
In [82]: sr = short_rate()
In [83]: sr.configure_traits()
Figure 13-6 shows the new, enhanced GUI, which is still empty. You see the new elements, like the Update button and the output fields for the discount factors.

Figure 13-6. Screenshot of traits GUI with updating (empty)
Figure 13-7 illustrates what happens with this new GUI "in action." Providing values for the object attributes and pushing the Update button returns the calculated discount factors, this time within the GUI window.
The following Python code shows step-by-step the equivalent operations without a GUI. First, the assigning of values to the attributes:
In [84]: sr.name = 'sr_class'
         sr.rate = 0.05
         sr.time_list = np.array(([0.0, 0.5, 1.0, 1.5, 2.0],), dtype=np.float32)
Second, the update of the list object containing the discount factors:
In [85]: sr._update_fired()

Figure 13-7. Screenshot of traits GUI with updating (after update)
Finally, the output of the calculated/updated list with the discount factors:
In [86]: sr.disc_list
Out[86]: array([[ 1. , 0.97530991, 0.95122942, 0.92774349, 0.90483742]])
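The application logic behind the Update button does not depend on the GUI library. As a rough sketch, the following plain-Python stand-in (the class ShortRateModel and its update method are hypothetical and do not use traits at all) reproduces the computation of _update_fired for the same 1 x 5 input array:

```python
import numpy as np

class ShortRateModel(object):  # hypothetical stand-in without any GUI
    def __init__(self):
        self.rate = 0.0
        self.time_list = np.zeros((1, 5))
        self.disc_list = np.zeros((1, 5))

    def update(self):
        # same calculation as in _update_fired of the traits-based class
        self.disc_list = np.exp(-self.rate * self.time_list)

model = ShortRateModel()
model.rate = 0.05
model.time_list = np.array(([0.0, 0.5, 1.0, 1.5, 2.0],))
model.update()
```

Keeping the calculation in one small method like this makes it possible to test the logic without opening a window; the GUI then only has to trigger the method and display the result.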
Cash Flow Series Class with GUI
The last example in this section is about the cash_flow_series class. In principle, we have seen in the previous example the basic workings of traits when it comes to presenting results within a GUI window. Here, we only want to add some twists to the story: for example, a slider to easily change the value for the short rate. In the class definition that follows, this is accomplished by using the Range function, where we provide a minimum, a maximum, and a default value. There are also more output fields to account for the calculation of the present values and the net present value:
In [87]: class cash_flow_series(trapi.HasTraits):
             name = trapi.Str
             short_rate = trapi.Range(0.0, 0.5, 0.05)
             time_list = trapi.Array(dtype=np.float, shape=(1, 6))
             cash_flows = trapi.Array(dtype=np.float, shape=(1, 6))
             disc_values = trapi.Array(dtype=np.float, shape=(1, 6))
             present_values = trapi.Array(dtype=np.float, shape=(1, 6))
             net_present_value = trapi.Float
             update = trapi.Button
             def _update_fired(self):
                 self.disc_values = np.exp(-self.short_rate * self.time_list)
                 self.present_values = self.disc_values * self.cash_flows
                 self.net_present_value = np.sum(self.present_values)
             v = trui.View(trui.Group(trui.Item(name='name'),
                                      trui.Item(name='short_rate'),
                                      trui.Item(name='time_list', label='Time List'),
                                      trui.Item(name='cash_flows', label='Cash Flows'),
                                      trui.Item('update', show_label=False),
                                      trui.Item(name='disc_values',
                                                label='Discount Factors'),
                                      trui.Item(name='present_values',
                                                label='Present Values'),
                                      trui.Item(name='net_present_value',
                                                label='Net Present Value'),
                                      show_border=True,
                                      label='Calculate Present Values'),
                           buttons=[trui.OKButton, trui.CancelButton],
                           resizable=True)
Apart from the slightly more complex class definition, the usage is still the same:
In [88]: cfs = cash_flow_series()
In [89]: cfs.configure_traits()
Figure 13-8 shows the new GUI without any actions taken so far (i.e., empty). Notice the slider and all the new fields for the cash flow values, the present values, and the net present value.
Figure 13-9 shows a version of the GUI where input data has been typed in already, but no other action has taken place.
Finally, Figure 13-10 presents the GUI with both input data and results data, i.e., after pushing the Update button. Although this is still quite a simple example, the result can almost be considered an application. We have:
Input
The GUI allows for inputting data to initialize all object attributes.
Logic
There is application logic that calculates discount factors, present values, and an NPV.
Output
The GUI presents the results of applying the logic to the input data.
Figure 13-8. Screenshot of traits GUI for Cash Flow Series (empty)
Figure 13-9. Screenshot of traits GUI for Cash Flow Series (with input data)
Figure 13-10. Screenshot of traits GUI for Cash Flow Series (with results)
Conclusions
Object-oriented paradigms are an indispensable tool for modern application development. Python provides a rather flexible framework for the definition of user-defined classes and for working with instances of these classes. This chapter provides only the fundamentals of Python class definitions; Part III of the book illustrates the use of (financial) Python classes in a more complex and realistic application scenario.
Modern application design generally builds on graphical user interfaces. The efficient building of GUIs therefore is generally quite important, even in a rapid application development scenario. This chapter uses the traits library, which allows simple and efficient building of GUIs based on a Pythonic, object-oriented approach. The subsequent chapter shows how to build GUIs based on web technologies, a technical alternative nowadays even used for in-house applications in financial institutions.
Further Reading
The following web resources are good starting points for Python classes and object orientation, and for traits:
The Python class documentation: https://docs.python.org/2/tutorial/classes.html
The traits documentation: http://code.enthought.com/projects/traits/docs/html/index.html
Helpful resources in book form are:
Downey, Allen (2012): Think Python. O'Reilly, Sebastopol, CA.
Goodrich, Michael et al. (2013): Data Structures and Algorithms in Python. John Wiley & Sons, Hoboken, NJ.
Langtangen, Hans Petter (2009): A Primer on Scientific Programming with Python. Springer Verlag, Berlin, Heidelberg.
Chapter 14. Web Integration
I have been quoted saying that, in the future, all companies will be Internet companies. I still believe that. More than ever, really.
- Andrew Grove
The Internet, or the Web, has evolved from some separate world into something that is everywhere and in everything. It has become a technology platform enabling a multitude of different use cases. From a finance perspective, the following seem particularly noteworthy:
Data provision/gathering
Web technology allows the provision of data and the gathering thereof in a simplified manner and generally at reduced costs; it also speeds up in general all associated processes. Large financial data providers, like Bloomberg and Thomson Reuters, rely heavily on the Web and related technologies to provide the financial world with data in real time.
Trading/buying/selling
Using the Web also facilitates trading of financial securities; even private investors today have access to professional trading facilities (e.g., online brokers like Interactive Brokers) and can trade securities in real time.
Application providing
Models like Software-as-a-Service (SaaS) allow both small companies, like startups, and large ones to provide applications in an efficient manner; even the smallest outfit can today reach a global target audience at very little cost. Large corporations benefit from web technologies, for example, when they use them to provide internal applications that are accessible and usable via any standard web browser, instead of installing such applications on hundreds or even thousands of different machines.
Communication
Of course, the Web facilitates communication within organizations and across organizations; the majority of today's business and financial communication has moved from paper to the Web.
Commoditization/scalability
Recent web technologies also allow for better virtualization, making web servers and servers in general a commodity that everybody can rent at rather low variable costs and that is easily scalable when requirements change; computing power and storage capacity become more and more comparable to electricity, which we are all used to getting from the plug sockets at home.
Again, Python for the Web is a broad topic in itself that cannot be covered by a single chapter in this book. However, this chapter is able to cover a number of important topics from a finance perspective. In particular, it covers:
Web protocols
The first section shows how to transfer files via FTP and how to access websites via HTTP.
Web plotting
Web technologies generally allow for better interactivity and for better real-time support than standard approaches, for example, for plotting data; the second section introduces the plotting library Bokeh to generate interactive web plots and to realize real-time plotting of financial data.
Web applications
One of Python's strengths is its powerful web frameworks to develop web-based applications; one that is really Pythonic and that has become quite popular recently is Flask. This chapter illustrates techniques for developing web-based applications using this framework.
Web services
Web services have become an important aspect of web-enabled applications; the last section shows how to develop a simple web service for the valuation of European options on the VSTOXX volatility index.
Web Basics
This section gives a rather brief overview of selected Python libraries for working with web technologies and protocols. Several topics, like the handling of email functionality with Python, are not touched upon.
ftplib
The File Transfer Protocol (FTP) is, as the name suggests, a protocol to transfer files over the Web.[50] Python provides a dedicated library to work with FTP called ftplib:
In [1]: import ftplib
        import numpy as np
In what follows, we will connect to an FTP server, log in, transfer a file to the server, transfer it back to the local machine, and delete the file on the server. First, the connection:
In [2]: ftp = ftplib.FTP('quant-platform.com')
Not every FTP server is password protected, but this one is:
In [3]: ftp.login(user='python', passwd='python')
Out[3]: '230 Login successful.'
To have a file that we can transfer, we generate a NumPy ndarray object with some random data and save it to disk:
In [4]: np.save('./data/array', np.random.standard_normal((100, 100)))
For the FTP file transfer to follow, we have to open the file for reading (in binary mode, since the .npy format is binary):
In [5]: f = open('./data/array.npy', 'rb')
This open file can now be written, choosing here binary transfer, by the STOR command in combination with the target filename:
In [6]: ftp.storbinary('STOR array.npy', f)
Out[6]: '226 Transfer complete.'
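Binary transfer is the natural choice here because the .npy format written by np.save is a binary format. The following sketch illustrates this without any server by writing the array into an in-memory buffer (io.BytesIO stands in for the file on disk; the variable names are chosen here for illustration) and reading it back:

```python
import io
import numpy as np

data = np.random.standard_normal((100, 100))

# write the array in .npy format into an in-memory binary buffer
buffer = io.BytesIO()
np.save(buffer, data)
payload = buffer.getvalue()

# every .npy file starts with a fixed binary signature
magic = payload[:6]

# reading the same bytes back restores the array exactly
buffer.seek(0)
restored = np.load(buffer)
```

Any byte that is altered on the way, for example by a text-mode conversion of line endings, would make the payload unreadable for np.load.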
Let us have a look at the directory of the FTP server. Indeed, the file was transferred:
In [7]: ftp.retrlines('LIST')
        -rw------- 1 1001 1001 80080 Sep 29 11:05 array.npy
Out[7]: '226 Directory send OK.'
The other way around is pretty similar. To retrieve a distant file and to save it to disk, we need to open a new file, this time in write mode:
In [8]: f = open('./data/array_ftp.npy', 'wb').write
Again, we choose binary transfer, and we use the RETR command for retrieving the file from the FTP server:
In [9]: ftp.retrbinary('RETR array.npy', f)
Out[9]: '226 Transfer complete.'
Since we do not need the file on the server anymore, we can delete it:
In [10]: ftp.delete('array.npy')
Out[10]: '250 Delete operation successful.'
In [11]: ftp.retrlines('LIST')
Out[11]: '226 Directory send OK.'
Finally, we should close the connection to the FTP server:
In [12]: ftp.close()
In the local directory there are now two files, the one that was generated locally and the one generated by retrieving the file from the server:
In [13]: !ls -n ./data
Out[13]: total 156
         -rw------- 1 1000 1000 77824 Sep 29 17:05 array_ftp.npy
         -rw------- 1 1000 1000 80080 Sep 29 17:05 array.npy
In [14]: !rm -f ./data/arr*  # cleanup directory
All that has happened so far was done without encryption (i.e., was fully insecure). Both login information and data were transferred in readable form. However, for most applications such operations should be encrypted so others are not able to read the data and/or steal the login information and do even worse things.
ftplib can connect to FTP servers securely via the function FTP_TLS. Once such a secure connection is established, all other operations remain the same:
In [15]: ftps = ftplib.FTP_TLS('quant-platform.com')
In [16]: ftps.login(user='python', passwd='python')
Out[16]: '230 Login successful.'
In [17]: ftps.prot_p()
Out[17]: '200 PROT now Private.'
In [18]: ftps.retrlines('LIST')
Out[18]: '226 Directory send OK.'
In [19]: ftps.close()
httplib
Another important protocol, if not the most important one on the Web, is the HyperText Transfer Protocol (HTTP).[51] This protocol is used whenever a (HTML-based) web page is displayed in the browser. The Python library to work with HTTP is called httplib:
In [20]: import httplib
As with FTP, we first need a connection to the HTTP server:
In [21]: http = httplib.HTTPConnection('hilpisch.com')
Once the connection is established, we can send requests, for example asking for the index.htm page (file):
In [22]: http.request('GET', '/index.htm')
To test whether this was successful, use the getresponse method:
In [23]: resp = http.getresponse()
The returned object provides status information. Fortunately, our request was successful:
In [24]: resp.status, resp.reason
Out[24]: (200, 'OK')
Equipped with the response object, we can now read the content as follows:
In [25]: content = resp.read()
         content[:100]
           # first 100 characters of the file
Out[25]: '<!doctype html>\n<html lang="en">\n\n\t<head>\n\t\t<meta charset="utf-8">\n\n\t\t<title>Dr. Yves J. Hilpisch \xe2\x80'
Once you have the content of a particular web page, there are many potential use cases. You might want to look up certain information, for example. You might know that you can find the email address on the page by looking for E (in this very particular case). Since content is a string object, you can apply the find method to look for E:[52]
In [26]: index = content.find(' E ')
         index
Out[26]: 2071
Equipped with the index value for the information you are looking for, you can inspect the subsequent characters of the object:
In [27]: content[index:index + 29]
Out[27]: ' E contact [at] dyjh [dot] de'
Once you are finished, you should again close the connection to the server:
In [28]: http.close()
urllib
There is another Python library that supports the use of different web protocols. It is called urllib. There is also a related library called urllib2. Both libraries are designed to work with arbitrary web resources, in the spirit of the "uniform" in URLs (uniform resource locator).[53] A standard use case, for example, is to retrieve files, like CSV data files, via the Web. Begin by importing urllib:
In [29]: import urllib
The application of the library’s functions resembles that of both ftplib and httplib. Of course, we need a URL representing the web resource of interest (HTTP or FTP server, in general). For this example, we use the URL of Yahoo! Finance to retrieve stock price information in CSV format:
In [30]: url = 'http://ichart.finance.yahoo.com/table.csv?g=d&ignore=.csv'
         url += '&s=YHOO&a=01&b=1&c=2014&d=02&e=6&f=2014'
Next, one has to establish a connection to the resource:
In [31]: connect = urllib.urlopen(url)
With the connection established, read out the content by calling the read method on the connection object:
In [32]: data = connect.read()
The result in this case is historical stock price information for Yahoo! itself:
In [33]: print data
Out[33]: Date,Open,High,Low,Close,Volume,Adj Close
2014-03-06,39.60,39.98,39.50,39.66,10626700,39.66
2014-03-05,39.83,40.15,39.19,39.50,12536800,39.50
2014-03-04,38.76,39.79,38.68,39.63,16139400,39.63
2014-03-03,37.65,38.66,37.43,38.25,14714700,38.25
2014-02-28,38.55,39.38,38.22,38.67,16957100,38.67
2014-02-27,37.80,38.48,37.74,38.47,15489400,38.47
2014-02-26,37.35,38.10,37.34,37.62,15778900,37.62
2014-02-25,37.48,37.58,37.02,37.26,9756900,37.26
2014-02-24,37.23,37.71,36.82,37.42,15738900,37.42
2014-02-21,37.90,37.96,37.22,37.29,12351900,37.29
2014-02-20,37.83,38.04,37.30,37.79,11155900,37.79
2014-02-19,38.06,38.33,37.68,37.81,15851900,37.81
2014-02-18,38.31,38.59,38.09,38.31,12096400,38.31
2014-02-14,38.43,38.45,38.11,38.23,9975800,38.23
2014-02-13,37.92,38.69,37.79,38.52,12088100,38.52
2014-02-12,38.60,38.91,38.03,38.11,14088500,38.11
2014-02-11,38.15,38.86,38.09,38.50,18348000,38.50
2014-02-10,38.00,38.13,37.25,37.76,17642900,37.76
2014-02-07,36.65,37.27,36.24,37.23,16178500,37.23
2014-02-06,35.65,36.75,35.61,36.24,14250000,36.24
2014-02-05,35.60,35.94,34.99,35.49,14022900,35.49
2014-02-04,35.11,35.86,34.86,35.66,21082500,35.66
2014-02-03,35.94,36.01,34.66,34.90,22195200,34.90
The library also provides convenience functions to customize URL strings. For example, you might want to be able to parameterize the symbol to look up and the starting date. To this end, define a new URL string with a string replacement part where you can insert the parameters:
In [34]: url = 'http://ichart.finance.yahoo.com/table.csv?g=d&ignore=.csv'
         url += '&%s'  # for replacement with parameters
         url += '&d=06&e=30&f=2014'
The function urlencode takes as an argument a Python dictionary with the parameter names and the values to associate:
In [35]: params = urllib.urlencode({'s': 'MSFT', 'a': '05', 'b': 1,
                                    'c': 2014})
As a result, there is a string object that can be inserted into the preceding URL string to complete it:
In [36]: params
Out[36]: 'a=05&s=MSFT&b=1&c=2014'
In [37]: url % params
Out[37]: 'http://ichart.finance.yahoo.com/table.csv?g=d&ignore=.csv&a=05&s=MSFT&b=1&c=2014&d=06&e=30&f=2014'
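The sessions in this chapter use Python 2, where urlencode lives directly in urllib. As a rough sketch for readers on Python 3, the same URL construction can be written with urllib.parse.urlencode. No request is sent here, and whether the Yahoo! Finance endpoint still answers is not something this sketch assumes:

```python
from urllib.parse import urlencode  # Python 3 location of urlencode

base = 'http://ichart.finance.yahoo.com/table.csv?g=d&ignore=.csv'
template = base + '&%s' + '&d=06&e=30&f=2014'

# dicts keep insertion order in Python 3.7+, so the parameter order is stable
params = urlencode({'s': 'MSFT', 'a': '05', 'b': 1, 'c': 2014})
full_url = template % params
print(params)
print(full_url)
```

Unlike the Python 2 output shown above, the parameters appear in the order in which they were written in the dictionary.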
Equipped with this new URL string, establish a connection and read the data from the connection:
In [38]: connect = urllib.urlopen(url % params)
In [39]: data = connect.read()
The result again is stock price data, this time for more dates and for Microsoft:
In [40]: print data
Out[40]: Date,Open,High,Low,Close,Volume,Adj Close
2014-07-30,44.07,44.10,43.29,43.58,31921400,43.31
2014-07-29,43.91,44.09,43.64,43.89,27763100,43.62
2014-07-28,44.36,44.51,43.93,43.97,29684200,43.70
2014-07-25,44.30,44.66,44.30,44.50,26737700,44.22
2014-07-24,44.93,45.00,44.32,44.40,30725300,44.12
2014-07-23,45.45,45.45,44.62,44.87,52362900,44.59
2014-07-22,45.00,45.15,44.59,44.83,43095800,44.55
2014-07-21,44.56,45.16,44.22,44.84,37604400,44.56
2014-07-18,44.65,44.84,44.25,44.69,43407500,44.41
2014-07-17,45.45,45.71,44.25,44.53,82180300,44.25
2014-07-16,42.51,44.31,42.48,44.08,63318000,43.81
2014-07-15,42.33,42.47,42.03,42.45,28748700,42.19
2014-07-14,42.22,42.45,42.04,42.14,21881100,41.88
2014-07-11,41.70,42.09,41.48,42.09,24083000,41.83
2014-07-10,41.37,42.00,41.05,41.69,21854700,41.43
2014-07-09,41.98,41.99,41.53,41.67,18445900,41.41
2014-07-08,41.87,42.00,41.61,41.78,31218200,41.52
2014-07-07,41.75,42.12,41.71,41.99,21952400,41.73
2014-07-03,41.91,41.99,41.56,41.80,15969300,41.54
2014-07-02,41.73,41.90,41.53,41.90,20208100,41.64
2014-07-01,41.86,42.15,41.69,41.87,26917000,41.61
2014-06-30,42.17,42.21,41.70,41.70,30805500,41.44
2014-06-27,41.61,42.29,41.51,42.25,74640000,41.99
2014-06-26,41.93,41.94,41.43,41.72,23604400,41.46
2014-06-25,41.70,42.05,41.46,42.03,20049100,41.77
2014-06-24,41.83,41.94,41.56,41.75,26509100,41.49
2014-06-23,41.73,42.00,41.69,41.99,18743900,41.73
2014-06-20,41.45,41.83,41.38,41.68,47764900,41.42
2014-06-19,41.57,41.77,41.33,41.51,19828200,41.25
2014-06-18,41.61,41.74,41.18,41.65,27097000,41.39
2014-06-17,41.29,41.91,40.34,41.68,22518600,41.42
2014-06-16,41.04,41.61,41.04,41.50,24205300,41.24
2014-06-13,41.10,41.57,40.86,41.23,26310000,40.97
2014-06-12,40.81,40.88,40.29,40.58,29818900,40.33
2014-06-11,40.93,41.07,40.77,40.86,18040000,40.61
2014-06-10,41.03,41.16,40.86,41.11,15117700,40.85
2014-06-09,41.39,41.48,41.02,41.27,15019200,41.01
2014-06-06,41.48,41.66,41.24,41.48,24060500,41.22
2014-06-05,40.59,41.25,40.40,41.21,31865200,40.95
2014-06-04,40.21,40.37,39.86,40.32,23209000,40.07
2014-06-03,40.60,40.68,40.25,40.29,18068900,40.04
2014-06-02,40.95,41.09,40.68,40.79,18504300,40.54
The function urlretrieve allows us to retrieve content and save it to disk in a single step, which is quite convenient in many circumstances:
In [41]: urllib.urlretrieve(url % params, './data/msft.csv')
Out[41]: ('./data/msft.csv', <httplib.HTTPMessage instance at
0x7f92ca59afc8>)
A brief inspection of the content of the saved file shows that we have indeed retrieved and saved the same content as before:
In [42]: csv = open('./data/msft.csv', 'r')
         csv.readlines()[:5]
Out[42]: ['Date,Open,High,Low,Close,Volume,Adj Close\n',
'2014-07-30,44.07,44.10,43.29,43.58,31921400,43.31\n',
'2014-07-29,43.91,44.09,43.64,43.89,27763100,43.62\n',
'2014-07-28,44.36,44.51,43.93,43.97,29684200,43.70\n',
'2014-07-25,44.30,44.66,44.30,44.50,26737700,44.22\n']
In [43]: !rm -f ./data/*
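The lines read back from the file are plain strings. As a small sketch (Python 3, standard library only), such CSV text can be turned into records with named fields via csv.DictReader. The sample rows are copied from the output above; the variable names are chosen for illustration:

```python
import csv
import io

# first rows of the saved file, copied from the output above
raw = ('Date,Open,High,Low,Close,Volume,Adj Close\n'
       '2014-07-30,44.07,44.10,43.29,43.58,31921400,43.31\n'
       '2014-07-29,43.91,44.09,43.64,43.89,27763100,43.62\n'
       '2014-07-28,44.36,44.51,43.93,43.97,29684200,43.70\n')

# each row becomes a dict keyed by the header line
rows = list(csv.DictReader(io.StringIO(raw)))

# map each date to its closing price as a float
closes = {row['Date']: float(row['Close']) for row in rows}

# date with the highest close among the sample rows
highest = max(closes, key=closes.get)
print(highest, closes[highest])
```

For a real file, passing the open file object to csv.DictReader instead of io.StringIO works the same way.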
Web Plotting
Chapter 5 introduces matplotlib, the most popular plotting library for Python. However, as powerful as it might be for 2D and 3D plotting, its strength lies in static plotting. In fact, matplotlib is also able to generate interactive plots, e.g., with sliders for variables. But it is safe to say that this is not one of its strengths.[54] This section starts with generating static plots, then proceeds to interactive plots to finally arrive at real-time plotting.
Static Plots
First, a brief benchmark example using the pandas library based on a financial time series from the Yahoo! Finance API, as used in the previous section:
In [44]: import numpy as np
         import pandas as pd
         %matplotlib inline
As shown in Chapter 6, using pandas makes data retrieval from the Web in general quite convenient. We do not even have to use additional libraries, such as urllib; almost everything happens under the hood. The following retrieves historical stock price quotes for Microsoft and stores the data in a DataFrame object:
In [45]: url = 'http://ichart.yahoo.com/table.csv?s=MSFT&a=0&b=1&c=2009'
         data = pd.read_csv(url, parse_dates=['Date'])
pandas accepts column names as parameter values for the x and y coordinates. The result is shown in Figure 14-1:
In [46]: data.plot(x='Date', y='Close')
Figure 14-1. Historical stock prices for Microsoft since January 2009 (matplotlib)
Graphics and plots like Figure 14-1 can of course also be used in a web context. For example, it is straightforward to save plots generated with matplotlib as files in the PNG (Portable Network Graphics) format and to include such files in a website. However, recent web technologies typically also provide interactivity, like panning or zooming. Bokeh is a library that explicitly aims at providing modern, interactive web-based plots to Python. According to its website:
Bokeh is a Python interactive visualization library for large datasets that natively uses the latest web technologies. Its goal is to provide elegant, concise construction of novel graphics in the style of Protovis/D3, while delivering high-performance interactivity over large data to thin clients.
Three elements of this description are noteworthy:
Large data sets
It is a “plotting problem” in itself to plot large data sets. Just imagine a scatter plot with 1,000,000 points: in general, large parts of the information get lost; Bokeh provides built-in help in this regard.
Latest web technologies
In general, JavaScript is the language of choice as of today when it comes to web development and visualization; it underlies libraries such as D3 (Data-Driven Documents) and also Bokeh.
High-performance interactivity
On the Web, people are used to real-time interactivity (think modern browser games), which can become an issue when visualizing and interacting with large data sets; Bokeh also provides built-in capabilities to reach this goal.
On a fundamental level, working with Bokeh is not that different from working with matplotlib. However, the default output generally is not a standard window or, for example, an IPython Notebook (which is also an option). It is a separate HTML file:
In [47]: import bokeh.plotting as bp
In [48]: bp.output_file("../images/msft_1.html", title="Bokeh Example (Static)")
           # use: bp.output_notebook("default")
           # for output within an IPython Notebook
In terms of plotting, Bokeh provides a wealth of different plotting styles that are continuously enhanced. To start with the simplest one, consider the following code that generates a line plot similar to our pandas/matplotlib benchmark plot. The result is shown as Figure 14-2. Apart from the x and y coordinates, all other parameters are optional:
In [49]: bp.line(
             data['Date'],
               # x coordinates
             data['Close'],
               # y coordinates
             color='#0066cc',
               # set a color for the line
             legend='MSFT',
               # attach a legend label
             title='Historical Stock Quotes',
               # plot title
             x_axis_type='datetime',
               # datetime information on x-axis
             tools=''
         )
         bp.show()
In the tradition of matplotlib, Bokeh also has a gallery showcasing different plot styles.
Figure 14-2. Screenshot of HTML-based Bokeh plot
Interactive Plots
The next step is to add interactivity to the web-based plot. Available interactivity elements (“tools”) include:
pan
Supports panning of the plot (like panning with a movie camera); i.e., moving the plot (including x and y coordinates) relative to the fixed plotting frame
wheel_zoom
Enables zooming into the plot by using the mouse wheel
box_zoom
Enables zooming into the plot by marking a box with the mouse
reset
Resets the original/default view of the plot
previewsave
Generates a static (bitmap) version of the plot that can be saved in PNG format
The following code demonstrates adding these tools:
In [50]: bp.output_file("../images/msft_2.html",
                         title="Bokeh Example (Interactive)")
         bp.line(
             data['Date'],
             data['Close'],
             color='#0066cc',
             legend='MSFT',
             title='Historical Stock Quotes',
             x_axis_type="datetime",
             tools='pan, wheel_zoom, box_zoom, reset, previewsave'
               # adding a list of interactive tools
         )
         bp.show()
The output of this code is shown as Figure 14-3, where the panning function is used to move the plot within the plotting frame (compare this with Figure 14-2).
In principle, all the features shown so far can also be implemented by using matplotlib. In fact, the interactive tools shown for Bokeh are available by default with matplotlib when you plot into a separate window. Figure 14-4 shows a zoomed and panned version of the pandas plot in Figure 14-1 in a separate (Python-controlled) window. However, in contrast to Bokeh, matplotlib cannot “export” this functionality to be included in a separate, standalone graphics file.[55]
Figure 14-3. Screenshot of HTML-based Bokeh plot with interactive elements
Figure 14-4. Screenshot of pandas/matplotlib-based plot with interactive elements
Real-Time Plots
The previous subsection shows how easy it is to generate interactive, web-based plots with Bokeh. However, Bokeh shines when it comes to real-time visualization of, for example, high-frequency financial data. Therefore, this subsection contains examples for two different real-time APIs, one for FX (foreign exchange) data in JSON (JavaScript Object Notation) format and one for intraday tick data for stock prices delivered in CSV text file format. Apart from the visualization aspect, how to read out data from such APIs is also of interest.
Real-time FX data
Our first example is based on a JSON API for, among others, FX rates. Some imports first:
In [51]: import time
         import pandas as pd
         import datetime as dt
         import requests
The API we use is from OANDA, an FX online broker. This broker offers an API sandbox that provides random/dummy data that resembles real exchange rates. Our example is based on the EUR-USD exchange rate (cf. the API guide):
In [52]: url = 'http://api-sandbox.oanda.com/v1/prices?instruments=%s'
           # real-time FX (dummy!) data from JSON API
To connect to the API we use the requests library whose aim is to improve the interface for “humans” when interacting with web resources:
In [53]: instrument = 'EUR_USD'
api = requests.get(url % instrument)
With the open connection, data in JSON format is simply read by calling the method json on the connection object:
In [54]: data = api.json()
         data
Out[54]: {u'prices': [{u'ask': 1.25829,
            u'bid': 1.2582,
            u'instrument': u'EUR_USD',
            u'time': u'2014-09-29T06:14:34.749878Z'}]}
Unfortunately, the data is not yet completely in the format we would like it to have. Therefore, we transform it a bit. The following code takes only the first element of the list object stored under the key “prices.” The resulting object is a standard dict object:
In [55]: data = data['prices'][0]
         data
Out[55]: {u'ask': 1.25829,
          u'bid': 1.2582,
          u'instrument': u'EUR_USD',
          u'time': u'2014-09-29T06:14:34.749878Z'}
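The transformation can also be tried without access to the sandbox. The following Python 3 sketch parses a response body shaped like the output above with the standard json module and extracts the first price record. The helper first_price and the hardcoded payload are illustrative assumptions, not part of the API:

```python
import json

# response body shaped like the sandbox output shown above
payload = ('{"prices": [{"ask": 1.25829, "bid": 1.2582, '
           '"instrument": "EUR_USD", '
           '"time": "2014-09-29T06:14:34.749878Z"}]}')


def first_price(text):
    """Parse a JSON response and return its first price record, or None."""
    prices = json.loads(text).get('prices', [])
    return dict(prices[0]) if prices else None


tick = first_price(payload)
spread = tick['ask'] - tick['bid']  # difference between ask and bid quote
print(tick['instrument'], tick['time'], spread)
```

Checking for an empty list before indexing avoids an IndexError when a response carries no prices.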
Since we collect such small data sets at a high frequency, we use a DataFrame object to store all the data. The following code initializes an appropriate DataFrame object:
In [56]: ticks = pd.DataFrame({'bid': data['bid'],
                               'ask': data['ask'],
                               'instrument': data['instrument'],
                               'time': pd.Timestamp(data['time'])},
                              index=[pd.Timestamp(data['time']),])
           # initialization of ticks DataFrame
In [57]: ticks[['ask', 'bid', 'instrument']]
Out[57]:                                       ask     bid instrument
         2014-09-29 06:14:34.749878+00:00  1.25829  1.2582    EUR_USD
Implementing a real-time plot requires two things: real-time data collection and real-time updates of the plot. With Bokeh, this is accomplished by using the Bokeh server, which handles real-time updates of a plot given new data. It has to be started via the shell or command-line interface as follows:
$ bokeh-server
With the server running in the background, let us implement the real-time data update routine:
In [58]: import bokeh.plotting as bp
         from bokeh.objects import Glyph
Before any updating takes place, there needs to be an object to be updated. This again is a line plot, if only with very little data at first. The output is directed to the IPython Notebook the code is executed in. However, in fact it is redirected again to the server, which in this case can be accessed locally via http://localhost:5006/:
In [59]: bp.output_notebook("default")
         bp.line(ticks['time'], ticks['bid'],
                 x_axis_type='datetime', legend=instrument)
Out[59]: Using saved session configuration for http://localhost:5006/
         To override, pass 'load_from_config=False' to Session
         <bokeh.objects.Plot at 0x7fdb7e1b2e10>
We need to get access to our current plot (i.e., the most recently generated plot). Calling the function curplot returns the object we are looking for:
In [60]: bp.curplot()
Out[60]: <bokeh.objects.Plot at 0x7fdb7e1b2e10>
Such a Plot object consists of a number of rendering objects that accomplish different plotting tasks, like plotting a Grid or plotting the line (= Glyph) representing the financial data. All rendering objects are stored in a list attribute called renderers:
In [61]: bp.curplot().renderers
Out[61]: [<bokeh.objects.DatetimeAxis at 0x7fdbaece6b50>,
          <bokeh.objects.Grid at 0x7fdb7e161190>,
          <bokeh.objects.LinearAxis at 0x7fdb7e161090>,
          <bokeh.objects.Grid at 0x7fdb7e1614d0>,
          <bokeh.objects.BoxSelectionOverlay at 0x7fdb7e161490>,
          <bokeh.objects.BoxSelectionOverlay at 0x7fdb7e161650>,
          <bokeh.objects.Legend at 0x7fdb7e161550>,
          <bokeh.objects.Glyph at 0x7fdb7e161610>]
The following list comprehension returns the first rendering object of type Glyph:
In [62]: renderer = [r for r in bp.curplot().renderers
                     if isinstance(r, Glyph)][0]
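The selection idiom itself does not depend on Bokeh. The sketch below reproduces it with two placeholder classes, which are hypothetical stand-ins and not the Bokeh classes, and adds an alternative based on next that stops at the first match and does not raise an error when nothing matches:

```python
class Grid(object):       # hypothetical stand-in, not a Bokeh class
    pass


class Glyph(object):      # hypothetical stand-in, not a Bokeh class
    pass


renderers = [Grid(), Glyph(), Grid(), Glyph()]

# same idiom as in the text: filter by type, then take element 0
first_glyph = [r for r in renderers if isinstance(r, Glyph)][0]

# alternative: stop at the first match; None if nothing matches
lazy_glyph = next((r for r in renderers if isinstance(r, Glyph)), None)
print(first_glyph is lazy_glyph)
```

The list comprehension raises an IndexError on an empty result, which may or may not be the desired behavior.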
The glyph attribute of the object contains the type of the Glyph object; in this case, as expected, a Line object:
In [63]: renderer.glyph
Out[63]: <bokeh.glyphs.Line at 0x7fdb7e161590>
With the rendering object, we can access its data source directly:
In [64]: renderer.data_source
Out[64]: <bokeh.objects.ColumnDataSource at 0x7fdb7e1b2ed0>
In [65]: renderer.data_source.data
Out[65]: {'x': 2014-09-29 06:14:34.749878+00:00    2014-09-29 06:14:34.749878+00:00
          Name: time, dtype: object,
          'y': 2014-09-29 06:14:34.749878+00:00    1.2582
          Name: bid, dtype: float64}
In [66]: ds = renderer.data_source
This is the object that we will work with and that is to be updated whenever new data arrives. The following while loop runs for a predetermined period of time only. During the loop, a new request object is generated and the JSON data is read. The new data is appended to the existing DataFrame object. The x and y coordinates of the rendering object are updated and then stored to the current session:
In [67]: start = time.time()
         # run for 60 seconds
         while (time.time() - start) < 60:
             data = requests.get(url % instrument).json()
               # connect and read data
             data = dict(data['prices'][0])
               # transform data to dict object
             ticks = ticks.append(pd.DataFrame({'bid': data['bid'],
                                                'ask': data['ask'],
                                                'instrument': data['instrument'],
                                                'time': pd.Timestamp(data['time'])},
                                  index=[pd.Timestamp(data['time']),]))
               # append DataFrame object with new data to existing object
             ds.data['x'] = ticks['time']
               # update x coordinates in rendering object
             ds.data['y'] = ticks['bid']
               # update y coordinates in rendering object
             bp.cursession().store_objects(ds)
               # store data objects
             time.sleep(0.1)
               # wait for a bit
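The timing logic of such a loop can be separated from the data source and the plotting. A minimal sketch, assuming a caller-supplied fetch function (here a simple counter that stands in for the API request), could look as follows; poll and its parameter names are illustrative and not part of Bokeh or requests:

```python
import time


def poll(fetch, duration, pause):
    """Call fetch() repeatedly for about `duration` seconds."""
    records = []
    start = time.monotonic()  # monotonic clock ignores system clock changes
    while time.monotonic() - start < duration:
        records.append(fetch())
        time.sleep(pause)     # wait for a bit between requests
    return records


counter = iter(range(1000))   # stand-in for a real data request
ticks_seen = poll(lambda: next(counter), duration=0.2, pause=0.05)
print(len(ticks_seen))
```

How many records are collected depends on the pause and on how long each fetch call takes, so the count is not fixed in advance.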
Figure 14-5 shows the output of the plotting exercise, i.e., a static snapshot of a real-time plot. This approach and the underlying technology of course have many interesting application areas, both in finance, with its focus today on real-time, high-frequency data, and far beyond.
Figure 14-5. Screenshot of real-time Bokeh plot via Bokeh Server (exchange rate)
Real-time stock price quotes
The second example uses real-time, high-frequency stock price data. First, make sure to correctly direct the output (i.e., in this case to the Bokeh server for the real-time plot):
In [68]: bp.output_notebook("default")
Out[68]: Using saved session configuration for http://localhost:5006/
         To override, pass 'load_from_config=False' to Session
Chapter 6 provides an example based on the data source and API that we use in what follows. It is the stock price API for intraday real-time data provided by Netfonds, a Norwegian online broker. The API and web service, respectively, have the following basic URL format:
In [69]: url1 = 'http://hopey.netfonds.no/posdump.php?'
         url2 = 'date=%s%s%s&paper=%s.O&csv_format=csv'
         url = url1 + url2
This URL is to be customized by providing date information and the symbol one is interested in:
In [70]: today = dt.datetime.now()
         y = '%d' % today.year
           # current year
         m = '%02d' % today.month
           # current month, add leading zero if needed
         d = '%02d' % (today.day)
           # current day, add leading zero if needed
         sym = 'AAPL'
           # Apple Inc. stocks
In [71]: y, m, d, sym
Out[71]: ('2014', '09', '29', 'AAPL')
In [72]: urlreq = url % (y, m, d, sym)
         urlreq
Out[72]: 'http://hopey.netfonds.no/posdump.php?date=20140929&paper=AAPL.O&csv_format=csv'
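The three separate format strings for year, month, and day can be collapsed into one strftime call. The following sketch uses a fixed date so that the result is reproducible; the helper build_url is an illustrative name, and the sketch only builds the string without contacting the service:

```python
import datetime as dt

URL = 'http://hopey.netfonds.no/posdump.php?date=%s&paper=%s.O&csv_format=csv'


def build_url(day, symbol):
    """Insert the date as YYYYMMDD and the ticker symbol into the URL."""
    return URL % (day.strftime('%Y%m%d'), symbol)


# fixed date matching the session output above
urlreq = build_url(dt.date(2014, 9, 29), 'AAPL')
print(urlreq)
```

The format codes %m and %d always produce two digits, so no manual zero padding is needed.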
Equipped with the right URL string, retrieving data is only one line of code away:
In [73]: data = pd.read_csv(urlreq, parse_dates=['time'])
           # initialize DataFrame object
The details of what follows are known from the previous example. First, the initial plot:
In [74]: bp.line(data['time'], data['bid'],
                 x_axis_type='datetime', legend=sym)
           # initial plot
Out[74]: <bokeh.objects.Plot at 0x7f92bedc8dd0>
Second, selection of the rendering object:
In [75]: renderer = [r for r in bp.curplot().renderers
                     if isinstance(r, Glyph)][0]
         ds = renderer.data_source
Third, the while loop updating the financial data and the plot per loop:
In [76]: start = time.time()
         while (time.time() - start) < 60:
             data = pd.read_csv(urlreq, parse_dates=['time'])
             data = data[data['time'] > dt.datetime(int(y), int(m),
                                                    int(d), 10, 0, 0)]
               # only data from trading start at 10am
             ds.data['x'] = data['time']
             ds.data['y'] = data['bid']
             ds._dirty = True
             bp.cursession().store_objects(ds)
             time.sleep(0.5)
Figure 14-6 shows the resulting output; again, unfortunately, only a static snapshot of a real-time plot.
Figure 14-6. Screenshot of real-time Bokeh plot via Bokeh Server (stock quotes)
Rapid Web Applications
If the Python world were to be divided into continents, there might be, among others, the science and finance continent, the system administration continent, and for sure the web development continent. Although not really transparent, it is highly probable that the web development continent, to stay with this concept, might be one of the largest when it comes to people (developers) populating it and houses (applications) built on it.
One of the major reasons for Python being strong in web development is the availability of different high-level, full-stack frameworks. As the Python web page states:
A web application may use a combination of a base HTTP application server, a storage mechanism such as a database, a template engine, a request dispatcher, an authentication module and an AJAX toolkit. These can be individual components or be provided together in a high-level framework.
Among the most popular frameworks are:
Django
Flask
Pyramid/Pylons
TurboGears
Zope
It is safe to say that there is not a single framework that is best suited for everybody and every different application type.[56] All have their strengths (and sometimes weaknesses), and often it is more a matter of taste (regarding architecture, style, syntax, APIs, etc.) what framework is chosen.
One framework that has recently gained popularity quite rapidly is Flask. It is the framework we use here, mainly for the following reasons:
Pythonic
Application development with Flask is really Pythonic, with a lot of the web-related details being taken care of behind the scenes.
Compactness
It is not too complex and can therefore be learned quite rapidly; it is based mainly on standard components and libraries widely used elsewhere.
Documentation
It is well documented, with both an online HTML version and a PDF with around 300 pages available at the time of this writing.[57]
The two main libraries that Flask relies on are:
Jinja2, a web templating language/engine for Python
Werkzeug, a WSGI (Web Server Gateway Interface) toolkit for Python
Traders’ Chat Room
We will now dive into the example application called Tradechat for a traders’ chat room, which basically relies on the example used in the tutorial of the Flask documentation but includes a couple of changes and adds some further functionality.[58]
The basic idea is to build a web-based application for which traders can register that provides one central chat room to exchange ideas and talk markets. The main screen shall allow a user who is logged in to type in text that is, after pushing a button, added to the timeline, indicating who added the comment and when this happened. The main screen also shows all the historical entries in descending order (from newest to oldest).
Data Modeling
We start by generating the needed directories. tradechat shall be the main directory. In addition, at a minimum, we need the two subdirectories static and templates (by Flask convention):
$ mkdir tradechat
$ mkdir tradechat/static
$ mkdir tradechat/templates
To store data, both for registered users and for comments made in the chat room, we use SQLite3 (cf. http://www.sqlite.org and http://docs.python.org/2/library/sqlite3.html) as a database. Two different tables are needed that can be generated by the SQL schema presented in Example 14-1, the details of which we do not discuss here. You should store this under the filename tables.sql in the main directory of the application, tradechat.
Example 14-1. SQL schema to generate tables in SQLite3
drop table if exists comments;
create table comments (
  id integer primary key autoincrement,
  comment text not null,
  user text not null,
  time text not null
);
drop table if exists users;
create table users (
  id integer primary key autoincrement,
  name text not null,
  password text not null
);
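To see what the schema produces without touching any file, it can be run against an in-memory SQLite3 database. The following sketch does this with the standard sqlite3 module; the inserted comment, user name, and timestamp are made-up sample values:

```python
import sqlite3

SCHEMA = '''
drop table if exists comments;
create table comments (
  id integer primary key autoincrement,
  comment text not null,
  user text not null,
  time text not null
);
drop table if exists users;
create table users (
  id integer primary key autoincrement,
  name text not null,
  password text not null
);
'''

con = sqlite3.connect(':memory:')  # throwaway database, nothing on disk
con.executescript(SCHEMA)

# made-up sample row to exercise the comments table
con.execute('insert into comments (comment, user, time) values (?, ?, ?)',
            ['EUR/USD looks range-bound.', 'trader1', '2014-09-29 06:14:34'])
con.commit()

# user tables only; SQLite's internal bookkeeping tables are filtered out
tables = sorted(row[0] for row in con.execute(
    "select name from sqlite_master where type = 'table' "
    "and name not like 'sqlite_%'"))
newest = con.execute(
    'select comment, user, time from comments order by id desc').fetchone()
print(tables, newest)
```

Because each table is dropped before it is created, running the schema a second time empties both tables.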
The Python Code
The SQL schema is a main input for the Python/Flask application to follow. We will go through the single elements step by step to finally arrive at the complete Python script to be stored under tradechat.py in the main directory, tradechat.
Imports and database preliminaries
At the beginning we need to import a couple of libraries and also some main functions from Flask. We import the functions directly to shorten the code throughout and increase readability somewhat:
# Tradechat
#
# A simple example for a web-based chat room
# based on Flask and SQLite3.
#
import os
import datetime as dt
from sqlite3 import dbapi2 as sqlite3
from flask import Flask, request, session, g, redirect, url_for, abort, \
     render_template, flash
The whole application hinges on a Flask object, an instance of the main class of the framework. Instantiating the class with __name__ lets the object inherit the application name (i.e., __main__) when the script is executed, for example, from a shell:
# the application object from the main Flask class
app = Flask(__name__)
The next step is to do some configuration for the new application object. In particular, we need to provide a database filename:
# override config from environment variable
app.config.update(dict(
    DATABASE=os.path.join(app.root_path, 'tradechat.db'),
      # the SQLite3 database file ("TC database")
    DEBUG=True,
    SECRET_KEY='secret_key',
      # use secure key here for real applications
))
app.config.from_envvar('TC_SETTINGS', silent=True)
  # do not complain if no config file exists
Having provided the path and filename of the database, the function connect_db connects to the database and returns the connection object:
def connect_db():
    ''' Connects to the TC database. '''
    rv = sqlite3.connect(app.config['DATABASE'])
    rv.row_factory = sqlite3.Row
    return rv
Flask uses an object called g to store global data and other objects. For example, web applications serving large numbers of users make it necessary to connect regularly to databases. It would be inefficient to instantiate a connection object every time a database operation has to be executed. One can rather store such a connection object in the attribute sqlite_db of the g object. The function get_db makes use of this approach in that a new database connection is opened only when there is no connection object stored in the g object already:
def get_db():
    ''' Opens a new connection to the TC database. '''
    if not hasattr(g, 'sqlite_db'):
        # open only if none exists yet
        g.sqlite_db = connect_db()
    return g.sqlite_db
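The caching idea can be checked in isolation. In the following sketch a plain SimpleNamespace object stands in for Flask's g object (an assumption made so that the example runs without Flask), and a list records how often a connection is actually created:

```python
import sqlite3
from types import SimpleNamespace

g = SimpleNamespace()  # hypothetical stand-in for Flask's g object
opened = []            # records every connection that gets created


def connect_db():
    """Open a new in-memory database connection."""
    con = sqlite3.connect(':memory:')
    opened.append(con)
    return con


def get_db():
    """Return the cached connection, opening one only on first use."""
    if not hasattr(g, 'sqlite_db'):
        g.sqlite_db = connect_db()
    return g.sqlite_db


first = get_db()
second = get_db()
print(first is second, len(opened))
```

Both calls return the same connection object, and connect_db runs only once.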
At least once, we need to create the tables in the database. Calling the function init_db for a second time will delete all information previously stored in the database (according to the SQL schema used):
def init_db():
    ''' Creates the TC database tables. '''
    with app.app_context():
        db = get_db()
        with app.open_resource('tables.sql', mode='r') as f:
            db.cursor().executescript(f.read())
              # creates comments and users tables
        db.commit()
The function close_db closes the database connection if one exists in the g object. For the first time (and for sure not the last time), we encounter a Flask function decorator, i.e., @app.teardown_appcontext. This decorator ensures that the respective function is called whenever the application context tears down; that is, roughly speaking, when the execution of the application is terminated by the user or by an error/exception:
@app.teardown_appcontext
def close_db(error):
    ''' Closes the TC database at the end of the request. '''
    if hasattr(g, 'sqlite_db'):
        g.sqlite_db.close()
Core functionality
Building on the database infrastructure, we can now proceed and implement the core functionality for the application. First, we have to define what happens when we connect to the main/home page of the application. To this end, we use the Flask function decorator @app.route("/"). The function decorated in that way will be called whenever a connection is established to the main page. The function show_entries basically establishes a database connection, retrieves all comments posted so far (maybe none, maybe many), and sends them to a template-based rendering engine to return an HTML document based on the template and the data provided (more on the templating part soon):
@app.route('/')
def show_entries():
    ''' Renders all entries of the TC database. '''
    db = get_db()
    query = 'select comment, user, time from comments order by id desc'
    cursor = db.execute(query)
    comments = cursor.fetchall()
    return render_template('show_entries.html', comments=comments)
We only want to allow registered users to post comments in the chat room. Therefore, we must provide functionality for a user to register. To this end, technically, we must allow use of the POST method for the respective HTML to be rendered by the application and to be accessed by the user. To register, a user must provide a username and a password. Otherwise, an error is reported. The function register should be considered a simple illustration only. It is missing a number of ingredients important for real-world applications, like checking whether a username already exists and encryption of the passwords (they are stored as plain text). Once users have successfully registered, their status is automatically changed to logged_in and they are redirected to the main page via redirect(url_for("show_entries")):
@app.route('/register', methods=['GET', 'POST'])
def register():
    ''' Registers a new user in the TC database. '''
    error = None
    if request.method == 'POST':
        db = get_db()
        if request.form['username'] == '' or request.form['password'] == '':
            error = 'Provide both a username and a password.'
              # both fields have to be nonempty
        else:
            db.execute('insert into users (name, password) values (?, ?)',
                       [request.form['username'], request.form['password']])
            db.commit()
            session['logged_in'] = True
              # directly log in new user
            flash('You were successfully registered.')
            app.config.update(dict(USERNAME=request.form['username']))
            return redirect(url_for('show_entries'))
    return render_template('register.html', error=error)
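One of the missing ingredients named above is protecting the stored passwords. A common approach, which is not what the Tradechat code does, is to store a salted hash instead of the plain text. The following is a minimal sketch using PBKDF2 from the standard library; the function names, the iteration count, and the sample password are illustrative assumptions:

```python
import hashlib
import hmac
import os

ITERATIONS = 100000  # illustrative work factor, an assumption


def hash_password(password, salt=None):
    """Return (salt, digest) derived with PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per user
    digest = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'),
                                 salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, digest):
    """Recompute the digest and compare in constant time."""
    candidate = hash_password(password, salt)[1]
    return hmac.compare_digest(candidate, digest)


salt, stored = hash_password('s3cret')
print(verify_password('s3cret', salt, stored))
print(verify_password('wrong', salt, stored))
```

With this scheme the users table would hold the salt and the digest per user, and the login check would call verify_password rather than compare plain-text values in the SQL query.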
For such a web application, there are probably returning users that do not need or want to reregister anew. We therefore need to provide a form to log in with an existing account. This is what the function login does. The functionality is similar to that provided by register:
@app.route('/login', methods=['GET', 'POST'])
def login():
    ''' Logs in a user. '''
    error = None
    if request.method == 'POST':
        db = get_db()
        try:
            query = 'select id from users where name = ? and password = ?'
            id = db.execute(query, (request.form['username'],
                            request.form['password'])).fetchone()[0]
              # fails if record with provided username and password
              # is not found
            session['logged_in'] = True
            flash('You are now logged in.')
            app.config.update(dict(USERNAME=request.form['username']))
            return redirect(url_for('show_entries'))
        except:
            error = 'User not found or wrong password.'
    return render_template('login.html', error=error)
Once users have registered or logged in again, they should be able to add comments in the chat room. The function add_entry stores the comment text, the username of the user who commented, and the exact time (to the second) of the posting. The function also checks whether the user is logged in or not:
@app.route('/add', methods=['POST'])
def add_entry():
    ''' Adds entry to the TC database. '''
    if not session.get('logged_in'):
        abort(401)
    db = get_db()
    now = dt.datetime.now()
    db.execute('insert into comments (comment, user, time) values (?, ?, ?)',
               [request.form['text'], app.config['USERNAME'], str(now)[:-7]])
    db.commit()
    flash('Your comment was successfully added.')
    return redirect(url_for('show_entries'))
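The expression str(now)[:-7] cuts off the fractional seconds by dropping the last seven characters. The sketch below shows that this relies on a nonzero microsecond value, since str omits the fraction when it is zero, and that strftime gives the same second-level timestamp without that dependency. The two datetime values are fixed examples:

```python
import datetime as dt

with_micro = dt.datetime(2014, 9, 29, 6, 14, 34, 749878)
no_micro = dt.datetime(2014, 9, 29, 6, 14, 34)

sliced_ok = str(with_micro)[:-7]   # drops the '.749878' part
sliced_bad = str(no_micro)[:-7]    # no fraction present, cuts into the time
formatted = no_micro.strftime('%Y-%m-%d %H:%M:%S')

print(sliced_ok)
print(sliced_bad)
print(formatted)
```

A call to datetime.now() rarely returns exactly zero microseconds, so the slicing works almost always; the explicit format is simply the more robust variant.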
Finally, to end the session, the user must log out. This is what the function logout supports:
@app.route('/logout')
def logout():
    ''' Logs out the current user. '''
    session.pop('logged_in', None)
    flash('You were logged out')
    return redirect(url_for('show_entries'))
If we want to run the Python script as a standalone application we should add the following lines, which make sure that a server is fired up and that the application is served:
# main routine
if __name__ == '__main__':
    init_db()  # comment out if data in current
               # TC database is to be kept
    app.run()
Puttinga l thesepiecestogether,weendupwiththe Python scriptshown
Example14­2.
asExample14­2.
application PythonscriptembodyingthecoreoftheTradechat
#Tradechat
##Asimple example for a web-based chat room
## based on Flask and SQLite3.
import os
import
from datetimeimportasdtdbapi2 as sqlite3
sqlite3
from flaskimport
\ render_template,Flask,flashrequest, session, g, redirect, url_for, abort,
#appthe= application
Flask(__name__)object from the main Flask class
override configfrom environment variable'tradechat.db'),
#app.config.update(dict(
DATABASE=os.path.join(app.root_path,
# the SQLite3 database file ("TC database")
DEBUG=True,
SECRET_KEY='secret_key',
# usesecure key here for real applications
))app.config.from_envvar('TC_SETTINGS',
# do notcomplain if no config file silent=True)
exists
def connect_db():
    ''' Connects to the TC database.'''
    rv = sqlite3.connect(app.config['DATABASE'])
    rv.row_factory = sqlite3.Row
    return rv

def get_db():
    ''' Opens a new connection to the TC database. '''
    if not hasattr(g, 'sqlite_db'):
        # open only if none exists yet
        g.sqlite_db = connect_db()
    return g.sqlite_db

def init_db():
    ''' Creates the TC database tables.'''
    with app.app_context():
        db = get_db()
        with app.open_resource('tables.sql', mode='r') as f:
            db.cursor().executescript(f.read())
              # creates entries and users tables
        db.commit()

@app.teardown_appcontext
def close_db(error):
    ''' Closes the TC database at the end of the request. '''
    if hasattr(g, 'sqlite_db'):
        g.sqlite_db.close()

@app.route('/')
def show_entries():
    ''' Renders all entries of the TC database. '''
    db = get_db()
    query = 'select comment, user, time from comments order by id desc'
    cursor = db.execute(query)
    comments = cursor.fetchall()
    return render_template('show_entries.html', comments=comments)

@app.route('/register', methods=['GET', 'POST'])
def register():
    ''' Registers a new user in the TC database. '''
    error = None
    if request.method == 'POST':
        db = get_db()
        if request.form['username'] == '' or request.form['password'] == '':
            error = 'Provide both a username and a password.'
            # both fields have to be nonempty
        else:
            db.execute('insert into users (name, password) values (?, ?)',
                       [request.form['username'], request.form['password']])
            db.commit()
            session['logged_in'] = True
            # directly log in new user
            flash('You were successfully registered.')
            app.config.update(dict(USERNAME=request.form['username']))
            return redirect(url_for('show_entries'))
    return render_template('register.html', error=error)

@app.route('/login', methods=['GET', 'POST'])
def login():
    ''' Logs in a user. '''
    error = None
    if request.method == 'POST':
        db = get_db()
        try:
            query = 'select id from users where name = ? and password = ?'
            id = db.execute(query, (request.form['username'],
                                    request.form['password'])).fetchone()[0]
            # fails if record with provided username and password
            # is not found
            session['logged_in'] = True
            flash('You are now logged in.')
            app.config.update(dict(USERNAME=request.form['username']))
            return redirect(url_for('show_entries'))
        except TypeError:
            # fetchone() returned None, i.e., no matching record
            error = 'User not found or wrong password.'
    return render_template('login.html', error=error)

@app.route('/add', methods=['POST'])
def add_entry():
    ''' Adds entry to the TC database. '''
    if not session.get('logged_in'):
        abort(401)
    db = get_db()
    now = dt.datetime.now()
    db.execute('insert into comments (comment, user, time) values (?, ?, ?)',
               [request.form['text'], app.config['USERNAME'], str(now)[:-7]])
    db.commit()
    flash('Your comment was successfully added.')
    return redirect(url_for('show_entries'))

@app.route('/logout')
def logout():
    ''' Logs out the current user. '''
    session.pop('logged_in', None)
    flash('You were logged out')
    return redirect(url_for('show_entries'))

# main routine
if __name__ == '__main__':
    init_db()  # comment out if data in current
               # TC database is to be kept
    app.run()
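The database calls in the script can be tried out in isolation. The following is a minimal sketch, not part of the Tradechat application itself: it uses an in-memory SQLite3 database, and the table layout is an assumption that only mirrors the column names appearing in the queries above. It is written for Python 3, whereas the listings in this chapter use Python 2.

```python
import sqlite3

# In-memory database; the table layout is an assumption that only mirrors
# the column names used in the queries of the Tradechat script.
con = sqlite3.connect(':memory:')
con.row_factory = sqlite3.Row  # rows can be accessed by column name
con.executescript('''
    create table users (id integer primary key autoincrement,
                        name text not null, password text not null);
    create table comments (id integer primary key autoincrement,
                           comment text not null, user text not null,
                           time text not null);
''')

# The ? placeholders let the driver handle the values, which prevents
# SQL injection through the form fields.
con.execute('insert into users (name, password) values (?, ?)',
            ['trader1', 'not-a-real-password'])
for text in ['first comment', 'second comment']:
    con.execute('insert into comments (comment, user, time) values (?, ?, ?)',
                [text, 'trader1', '2014-01-01 12:00:00'])
con.commit()

# Same lookup as in login(): fetchone() returns None if nothing matches.
query = 'select id from users where name = ? and password = ?'
found = con.execute(query, ('trader1', 'not-a-real-password')).fetchone()
missing = con.execute(query, ('trader1', 'wrong')).fetchone()

# Same query as in show_entries(): newest comment first.
rows = con.execute('select comment, user, time from comments '
                   'order by id desc').fetchall()
newest = rows[0]['comment']
```

Because fetchone() yields None for an unknown user, indexing the result with [0] raises an exception, which is what the login function relies on to detect a failed login.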
SECURITY
Although the example in this section illustrates the basic design of a web application in Python with Flask, it barely addresses security issues, which are of paramount importance when it comes to web applications. However, Flask and other web frameworks provide complete tool sets to tackle typical security issues (e.g., encryption) with due diligence.
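One concrete issue in the example is that passwords are written to the users table in plain text. The following sketch shows one common remedy, salted password hashing, using only the Python 3 standard library. The helper names hash_password and check_password and the iteration count are assumptions made for illustration; Werkzeug ships comparable helpers in werkzeug.security.

```python
import hashlib
import hmac
import os

ITERATIONS = 200000  # assumed work factor; tune for your hardware


def hash_password(password):
    ''' Returns (salt, digest) for storage instead of the plain password. '''
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'),
                                 salt, ITERATIONS)
    return salt, digest


def check_password(password, salt, digest):
    ''' Recomputes the digest and compares in constant time. '''
    candidate = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'),
                                    salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)


salt, digest = hash_password('s3cret')
ok = check_password('s3cret', salt, digest)
bad = check_password('guess', salt, digest)
```

With such helpers, the registration step would store the salt and the digest, and the login step would fetch both by username and call check_password rather than comparing passwords inside the SQL query.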
Templating
Basically, templating with Flask (Jinja2) works similarly to simple string replacements in Python: you have a basic string indicating where to replace what and some data to be inserted into the string object. Consider the following examples:
In [77]: '%d, %d, %d' % (1, 2, 3)
Out[77]: '1, 2, 3'
In [78]: '{}, {}, {}'.format(1, 2, 3)
Out[78]: '1, 2, 3'
In [79]: '{}, {}, {}'.format(*'123')
Out[79]: '1, 2, 3'
Templating to generate HTML pages works pretty similarly. The major difference is that the string object "resembles" an HTML document (or a part thereof) and has commands for replacements and also, for example, ways of controlling the flow when rendering the template (e.g., the for loop). Missing information is added during the rendering procedure, as we added the integers to the string object in the previous examples. Consider now the following string object, containing partly standard HTML code and some template-specific code:
In [80]: templ = '''<!doctype html>
           Just print out <b>numbers</b> provided to the template.
           <br><br>
           {% for number in numbers %}
             {{ number }}
           {% endfor %}
         '''
So far, this is a string object only. We have to generate a Jinja2 Template object out of it before proceeding:
In [81]: from jinja2 import Template
In [82]: t = Template(templ)
This Template object has a method called render to make valid HTML code out of the template and some input values (in this case, some numbers via the parameter numbers):
In [83]: html = t.render(numbers=range(5))
The code is again a string object:
In [84]: html
Out[84]: u'<!doctype html>\n Just print out <b>numbers</b> provided to the template.\n <br><br>\n \n 0\n \n 1\n \n 2\n \n 3\n \n 4\n '
Such an object containing HTML code can be rendered in IPython Notebook as follows:
In [85]: from IPython.display import HTML
         HTML(html)
Out[85]: <IPython.core.display.HTML at 0x7fdb7e1eb890>
Of course, templating involves much more than this simple example can illustrate (e.g., inheritance). More details can be found at http://jinja.pocoo.org. However, the templates for the Tradechat application already include a number of important aspects. Specifically, we need the following templates:
layout.html
    Defines the basic layout from which the other templates inherit
register.html
    The template for the user registration page
login.html
    The corresponding template for the user login
show_entries.html
    The main page showing the comments in the chat room and, if the user is logged in, the text field for writing and posting comments
These files have to be stored in templates, the default (sub)directory for templates when using Flask. Example 14-3 shows the template containing the basic layout and some meta-information (like the site title). This is the template all other templates inherit from.
Example 14-3. Template for basic layout of Tradechat application
<!doctype html>
<title>Tradechat</title>
<link rel=stylesheet type=text/css
      href="{{ url_for('static', filename='style.css') }}">
<div class=page>
  <h1>Tradechat</h1>
  <div class=metanav>
  {% if not session.logged_in %}
    <a href="{{ url_for('login') }}">log in</a><br>
    <a href="{{ url_for('register') }}">register</a>
  {% else %}
    <a href="{{ url_for('logout') }}">log out</a>
  {% endif %}
  </div>
  {% for message in get_flashed_messages() %}
    <div class=flash>{{ message }}</div>
  {% endfor %}
  {% block body %}{% endblock %}
</div>
Figure 14-7 shows a screenshot of the main page after starting the application for the first time. No users are registered (or logged in, of course). No comments have been posted yet.
Figure 14-7. Screenshot of "empty" home page of Tradechat
Example 14-4 provides the templating code for the user registration page. Here, forms are used to allow users to provide information to the page via the POST method.
Example 14-4. Template for Tradechat user registration
{% extends "layout.html" %}
{% block body %}
  <h2>Register</h2>
  {% if error %}<p class=error><strong>Error:</strong> {{ error }}{% endif %}
  <form action="{{ url_for('register') }}" method=post>
    <dl>
      <dd><font size="-1">Username</font>
      <dd><input type=text name=username>
      <dd><font size="-1">Password</font>
      <dd><input type=password name=password>
      <dd><input type=submit value=Register>
    </dl>
  </form>
{% endblock %}
Figure 14-8 shows a screenshot of the registration page.
Figure 14-8. Screenshot of Tradechat registration page
The templating code for the login page, as shown in Example 14-5, is pretty similar to the code for the registration page. Again, the user can provide login information via a form.
Example 14-5. Template for Tradechat user login
{% extends "layout.html" %}
{% block body %}
  <h2>Login</h2>
  {% if error %}<p class=error><strong>Error:</strong> {{ error }}{% endif %}
  <form action="{{ url_for('login') }}" method=post>
    <dl>
      <dd><font size="-1">Username</font>
      <dd><input type=text name=username>
      <dd><font size="-1">Password</font>
      <dd><input type=password name=password>
      <dd><input type=submit value=Login>
    </dl>
  </form>
{% endblock %}
The login page, as shown in Figure 14-9, not only looks pretty similar to the registration page but also provides mainly the same functionality.
Figure 14-9. Screenshot of Tradechat login page
Finally, Example 14-6 provides the templating code for the main page. This template does mainly two things:
Enables commenting
    If the user is logged in, a text field and a Post button are shown to allow the user to post comments.
Displays comments
    All comments found in the database are displayed in reverse chronological order (newest first, oldest last).
Example 14-6. Template for Tradechat main page with chat room comments
{% extends "layout.html" %}
{% block body %}
  {% if session.logged_in %}
    <form action="{{ url_for('add_entry') }}" method=post class=add-comment>
      <dl>
        <dd>What's up?
        <dd><textarea name=text rows=3 cols=40></textarea>
        <dd><input type=submit value=Post>
      </dl>
    </form>
  {% endif %}
  <ul class=comments>
  {% for comment in comments %}
    <li>{{ comment.comment|safe }}
    <font size="-2">({{ comment.user }} @ {{ comment.time }})</font>
  {% else %}
    <li><em>No comments so far.</em>
  {% endfor %}
  </ul>
{% endblock %}
Once a user is logged in and has posted some comments, the main page shows the text field and the Post button as well as all comments stored in the database (cf. Figure 14-10).
Figure 14-10. Screenshot of Tradechat main page
Just showing the screenshots in combination with the templates is cheating, in a sense. What is missing in the mix is the styling information.
Styling
Today's standard when it comes to the styling of web pages and web-based applications is CSS (Cascading Style Sheets). If you take a closer look at the single templates, you will find in many places parameterizations like class=comments or class=add-comment. Without a corresponding CSS file, these parameterizations are essentially meaningless.
Therefore, let us have a look at the file style.css, stored in the (sub)directory static and shown in Example 14-7. Here you find the aforementioned parameters (comments, add-comment) again. You also find references to standard HTML tags, like h1 for the highest-ranking header. All information provided after a custom class name, like comments, or a standard tag, like h1, defines or changes certain style elements (e.g., font type and/or size) of the relevant object.
This style information is the final ingredient defining the look of the Tradechat application and explaining why, for example, the "Tradechat" heading is displayed in blue (namely, due to the line a, h1, h2 { color: #0066cc; }).
Example 14-7. CSS stylesheet for Tradechat application
body            { font-family: sans-serif; background: #eee; }
a, h1, h2       { color: #0066cc; }
h1, h2          { font-family: 'Helvetica', sans-serif; margin: 0; }
h1              { font-size: 1.4em; border-bottom: 2px solid #eee; }
h2              { font-size: 1.0em; }
.page           { margin: 2em auto; width: 35em; border: 1px solid #ccc;
                  padding: 0.8em; background: white; }
.comments       { list-style: none; margin: 0; padding: 0; }
.comments li    { margin: 0.8em 1.2em; }
.comments li h2 { margin-left: -1em; }
.add-comment    { color: #0066cc; font-size: 0.7em;
                  border-bottom: 1px solid #ccc; }
.add-comment dl { font-weight: bold; }
.metanav        { text-align: right; font-size: 0.8em; padding: 0.3em;
                  margin-bottom: 1em; background: #fafafa; }
.flash          { color: #b9b9b9; font-size: 0.7em; }
.error          { color: #ff4629; font-size: 0.7em; padding: 0.5em; }
If you have followed every step, your tradechat directory should now contain the same files listed here:
In [86]: import os
         for path, dirs, files in os.walk('../python/tradechat'):
             print path
             for f in files:
                 print f
Out[86]: ../python/tradechat
         tradechat.py
         tradechat.db
         tables.sql
         ../python/tradechat/static
         style.css
         ../python/tradechat/templates
         layout.html
         login.html
         register.html
         show_entries.html
You can now run the main script from the shell as follows and start the application:
$ python tradechat.py
You can then access the application via your web browser at http://127.0.0.1:5000. Click on register to register as a user, and after having provided a username and a password you will be able to post your comments.
Web Services
The last topic in this chapter, and a very interesting and important one, is web services. Web services provide a simple and efficient means to access server-based functionality via web protocols. For example, one of the web services with the highest traffic is the Google search functionality. We are used to visiting http://www.google.com and typing some words of interest into the search/text input field provided on the website. However, what happens after you press the Return key or push the Search button is that the page translates all the information it has (from the search field and maybe your personal preferences) into a more or less complex URL.
Such a URL could, for example, take on the form http://www.google.de/search?num=5&q=yves+python. When you click this link or copy it into your web browser, Google Search returns those five search results (num=5) that the engine considers the best matches given the words provided (q=yves+python). Your web browser then displays something similar to Figure 14-11.
Using web services, any kind of data- and transaction-oriented financial service can be provided via web technologies. For instance, Yahoo! Finance and Google Finance offer historical stock price information via such a web service approach. More complex services such as derivatives pricing and risk analytics are also available via such services (for example, the web-based analytics solution DEXISION; cf. http://derivatives-analytics.com). The following example illustrates the implementation of such a service in the context of option pricing.
Figure 14-11. Screenshot of Google search results via web service
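How parameters travel inside such a URL can be inspected with the standard library. The following short sketch (Python 3, where the relevant functions live in urllib.parse) takes the search URL from the text apart and puts an equivalent query string back together; no request is sent.

```python
from urllib.parse import urlsplit, parse_qs, urlencode

url = 'http://www.google.de/search?num=5&q=yves+python'

parts = urlsplit(url)
params = parse_qs(parts.query)  # '+' in a query string decodes to a space

host = parts.netloc
path = parts.path
num = int(params['num'][0])
words = params['q'][0].split()

# Going the other way: encode a dictionary back into a query string
query = urlencode({'num': 5, 'q': 'yves python'})
```

Everything after the question mark is a list of key=value pairs joined by &, which is exactly the format the pricing service later in this section expects.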
The Financial Model
In this section, we are going to implement a web service that allows us to value volatility options (e.g., on a volatility index). The model we use is the one of Gruenbichler and Longstaff (1996). They model the volatility process (e.g., the process of a volatility index) in direct fashion by a square-root diffusion, provided in Equation 14-1. This process is known to exhibit convenient features for volatility modeling, like positivity and mean reversion.[59]
Equation 14-1. Square-root diffusion for volatility modeling
$$dV_t = \kappa_V (\theta_V - V_t) \, dt + \sigma_V \sqrt{V_t} \, dZ_t$$
The variables and parameters in Equation 14-1 have the following meanings:
$V_t$: The time $t$ value of the volatility index (for example, the VSTOXX)
$\theta_V$: The long-run mean of the volatility index
$\kappa_V$: The rate at which $V_t$ reverts to $\theta_V$
$\sigma_V$: The volatility of the volatility ("vol-vol")
$\theta_V$, $\kappa_V$, and $\sigma_V$: Assumed to be constant and positive
$Z_t$: A standard Brownian motion
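The mean reversion of the square-root diffusion can be made tangible with a small simulation. The sketch below is an illustration only and not part of the valuation approach of this section: it discretizes the process with a simple Euler scheme with full truncation (one of several possible schemes), uses only the Python 3 standard library, and borrows its parameter values from the example web service call shown later in the section. The function name simulate_srd is hypothetical.

```python
import math
import random


def simulate_srd(V0, kappa, theta, sigma, T, steps, seed=None):
    ''' Euler scheme (full truncation) for the square-root diffusion. '''
    rng = random.Random(seed)
    dt = T / steps
    v = V0  # auxiliary process, may dip below zero
    path = [V0]
    for _ in range(steps):
        vp = max(v, 0.0)
        z = rng.gauss(0.0, 1.0)
        v = v + kappa * (theta - vp) * dt + sigma * math.sqrt(vp * dt) * z
        path.append(max(v, 0.0))  # reported values are floored at zero
    return path


# assumed parameter values: V0=25, kappa=2.0, theta=20, sigma=1.0, T=1.5
paths = [simulate_srd(25.0, 2.0, 20.0, 1.0, T=1.5, steps=150, seed=i)
         for i in range(500)]
mean_end = sum(p[-1] for p in paths) / len(paths)
# theoretical mean of V_T: theta + (V0 - theta) * exp(-kappa * T)
mean_theory = 20.0 + (25.0 - 20.0) * math.exp(-2.0 * 1.5)
```

Starting above the long-run mean, the average simulated terminal value should end up close to the theoretical mean, which has already moved most of the way from 25 toward 20.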
Based on this model, Gruenbichler and Longstaff (1996) derive the formula provided in Equation 14-2 for the value of a European call option. In the formula, $D(T)$ is the appropriate discount factor. The parameter $\zeta$ denotes the expected premium for volatility risk, while $Q(\cdot)$ is the complementary noncentral $\chi^2$ distribution.
Equation 14-2. Call option formula of Gruenbichler and Longstaff (1996)
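For reference, the formula can be written out as follows. This rendering is read off the implementation in Example 14-8 rather than copied from the original typeset equation, so the layout and the names of the auxiliary quantities ($\alpha$, $\beta$, $\gamma$, $\nu$, $\lambda$, which follow the variable names in the code) are assumptions. $Q(x \mid \nu, \lambda)$ denotes one minus the noncentral $\chi^2$ distribution function with $\nu$ degrees of freedom and noncentrality parameter $\lambda$, evaluated at $x$.

```latex
\begin{aligned}
C(V_0, K, T) &= D(T) \, e^{-\beta T} \, V_0 \, Q(\gamma K \mid \nu + 4, \lambda) \\
&\quad + D(T) \, \frac{\alpha}{\beta} \left(1 - e^{-\beta T}\right) Q(\gamma K \mid \nu + 2, \lambda) \\
&\quad - D(T) \, K \, Q(\gamma K \mid \nu, \lambda) \\
\alpha &= \kappa_V \theta_V, \qquad \beta = \kappa_V + \zeta, \qquad
\gamma = \frac{4 \beta}{\sigma_V^2 \left(1 - e^{-\beta T}\right)} \\
\nu &= \frac{4 \alpha}{\sigma_V^2}, \qquad \lambda = \gamma \, e^{-\beta T} \, V_0
\end{aligned}
```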

The Implementation
The translation of the formula as presented in Equation 14-2 to Python is, as usual, quite straightforward. Example 14-8 shows the code of a Python module with such a valuation function. We call the script vol_pricing_formula.py and store it in a sub-directory, volservice.
Example 14-8. Python script for volatility option valuation
#
# Valuation of European volatility call options
# in Gruenbichler-Longstaff (1996) model
# square-root diffusion framework
# -- semianalytical formula
#
from scipy.stats import ncx2
import numpy as np

# Semianalytical option pricing formula of GL96
def calculate_option_value(V0, kappa, theta, sigma, zeta, T, r, K):
    ''' Calculation of European call option price in GL96 model.

    Parameters
    ==========
    V0 : float
        current volatility level
    kappa : float
        mean reversion factor
    theta : float
        long-run mean of volatility
    sigma : float
        volatility of volatility
    zeta : float
        volatility risk premium
    T : float
        time-to-maturity
    r : float
        risk-free short rate
    K : float
        strike price of the option

    Returns
    =======
    value : float
        net present value of volatility call option
    '''
    D = np.exp(-r * T)  # discount factor

    # variables
    alpha = kappa * theta
    beta = kappa + zeta
    gamma = 4 * beta / (sigma ** 2 * (1 - np.exp(-beta * T)))
    nu = 4 * alpha / sigma ** 2
    lamb = gamma * np.exp(-beta * T) * V0
    cx1 = 1 - ncx2.cdf(gamma * K, nu + 4, lamb)
    cx2 = 1 - ncx2.cdf(gamma * K, nu + 2, lamb)
    cx3 = 1 - ncx2.cdf(gamma * K, nu, lamb)

    # formula for European call price
    value = (D * np.exp(-beta * T) * V0 * cx1
             + D * (alpha / beta) * (1 - np.exp(-beta * T))
             * cx2 - D * K * cx3)
    return value
To simplify the implementation of the web service we write a convenience function, get_option_value, which will check for the provision of all necessary parameters to calculate a call option value. The function is stored in a Python module called vol_pricing_service.py, the code of which is shown in Example 14-9. This script also contains a dictionary with all the necessary parameters and brief descriptions of these parameters. The function will return an error message detailing what is missing whenever one or more parameters are missing. If all necessary parameters are provided during the web service call, the function calls the pricing function calculate_option_value from the vol_pricing_formula.py script.
Example 14-9. Python script for volatility option valuation and web service helper function
#
# Valuation of European volatility options
# in Gruenbichler-Longstaff (1996) model
# square-root diffusion framework
# -- parameter dictionary & web service function
#
from vol_pricing_formula import calculate_option_value

# model parameters
PARAMS = {
    'V0' : 'current volatility level',
    'kappa' : 'mean reversion factor',
    'theta' : 'long-run mean of volatility',
    'sigma' : 'volatility of volatility',
    'zeta' : 'factor of the expected volatility risk premium',
    'T' : 'time horizon in years',
    'r' : 'risk-free interest rate',
    'K' : 'strike'
    }

# function for web service
def get_option_value(data):
    ''' A helper function for web service. '''
    errorline = 'Missing parameter %s (%s)\n'
    errormsg = ''
    for para in PARAMS:
        if para not in data:
            # check if all parameters are provided
            errormsg += errorline % (para, PARAMS[para])
    if errormsg != '':
        return errormsg
    else:
        result = calculate_option_value(
                    float(data['V0']),
                    float(data['kappa']),
                    float(data['theta']),
                    float(data['sigma']),
                    float(data['zeta']),
                    float(data['T']),
                    float(data['r']),
                    float(data['K'])
                    )
        return str(result)
To begin with, we add the path of the aforementioned Python scripts:
In [87]: import sys
         sys.path.append("../python/volservice")
           # adjust if necessary to your path
We use the library Werkzeug to handle our WSGI application-based web service (recall that Werkzeug is an integral part of Flask). To this end, we need to import two classes from a Werkzeug sublibrary:
In [88]: from werkzeug.wrappers import Request, Response
Furthermore, for our core WSGI application to follow, we need the function get_option_value that we defined earlier:
In [89]: from vol_pricing_service import get_option_value
The only thing that remains is to implement the WSGI application (function) itself. This function might in our case look as follows:
In [90]: def application(environ, start_response):
             request = Request(environ)
               # wrap environ in new object
             text = get_option_value(request.args)
               # provide all parameters of call to function
               # get back either error message or option value
             response = Response(text, mimetype='text/html')
               # generate response object based on the returned text
             return response(environ, start_response)
Here, environ is a dictionary containing all incoming information. The Request class wraps all information in a manner that makes accessing the environ information a bit more convenient. start_response is usually used to indicate the start of a response. However, with Werkzeug you have the Response class, which takes care of the response.
All parameters provided to the web service are found in the request.args attribute, and this is what we provide to the get_option_value function. This function returns either an error message in text form or the calculated option value in text form.
To be better able to serve this function (e.g., via a local web server), we put the function into a separate WSGI script and add the serving functionality to it. Example 14-10 shows the code of this script, called vol_pricing.py.
Example 14-10. Python script for volatility option valuation and web service helper function
#
# Valuation of European volatility options
# in Gruenbichler-Longstaff (1996) model
# square-root diffusion framework
# -- WSGI application for web service
#
from vol_pricing_service import get_option_value
from werkzeug.wrappers import Request, Response
from werkzeug.serving import run_simple

def application(environ, start_response):
    request = Request(environ)
      # wrap environ in new object
    text = get_option_value(request.args)
      # provide all parameters of call to function
      # get back either error message or option value
    response = Response(text, mimetype='text/html')
      # generate response object based on the returned text
    return response(environ, start_response)

if __name__ == '__main__':
    run_simple('localhost', 4000, application)
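To see what the Werkzeug wrappers take care of, it can help to look at a bare WSGI application that does the same bookkeeping by hand. The following is a sketch in Python 3 using only the standard library. It is not the pricing service: instead of calling get_option_value, it simply echoes the query parameters back, and it is invoked directly with a fake environ rather than through a server.

```python
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    ''' Bare WSGI application echoing back the query parameters. '''
    # roughly what Request(environ).args provides: the parsed query string
    args = parse_qs(environ.get('QUERY_STRING', ''))
    text = ', '.join('%s=%s' % (k, v[0]) for k, v in sorted(args.items()))
    body = text.encode('utf-8')
    # roughly what Response(...) provides: status line, headers, and body
    headers = [('Content-Type', 'text/html; charset=utf-8'),
               ('Content-Length', str(len(body)))]
    start_response('200 OK', headers)
    return [body]


# Call the application directly, without a server, using a fake environ
environ = {}
setup_testing_defaults(environ)
environ['QUERY_STRING'] = 'V0=25&K=22.5'
captured = {}


def start_response(status, headers):
    captured['status'] = status
    captured['headers'] = dict(headers)


result = b''.join(application(environ, start_response))
```

The standard library can also serve such a function, for example via wsgiref.simple_server.make_server('localhost', 4000, application).serve_forever(), which plays the role of run_simple in this minimal setting.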
Being in the right subdirectory (volservice), you can now start the application by executing the following command via the shell or command-line interface:
$ python vol_pricing.py
 * Running on http://localhost:4000/
This fires up a separate Python process that serves the WSGI application. Using urllib, we can now access the "full power" of the web service. Copying the URL in your web browser and pressing the Return key yields something like the result shown in Figure 14-12.
Figure 14-12. Screenshot of the error message of the web service
However, usually you want to use a web service quite a bit differently, for example, from a scripting environment like IPython. To this end, we can use the functionality the urllib library provides:
In [91]: import numpy as np
         import urllib
         url = 'http://localhost:4000/'
A simple call to the web service without providing any parameters returns the following error message, which (apart from formatting issues) is the same as in the screenshot in Figure 14-12:
In [92]: print urllib.urlopen(url).read()
Out[92]: Missing parameter V0 (current volatility level)
         Missing parameter r (risk-free interest rate)
         Missing parameter kappa (mean-reversion factor)
         Missing parameter T (time horizon in years)
         Missing parameter theta (long-run mean of volatility)
         Missing parameter zeta (factor of the expected volatility risk premium)
         Missing parameter sigma (volatility of volatility)
         Missing parameter K (strike)
Of course, we need to provide a number of parameters. Therefore, we first build a URL string object in which we can replace specific parameter values during later calls:
In [93]: urlpara = url + 'application?V0=%s&kappa=%s&theta=%s&sigma=%s&zeta=%s'
         urlpara += '&T=%s&r=%s&K=%s'
A possible parameterization might be the following one:
In [94]: urlval = urlpara % (25, 2.0, 20, 1.0, 0.0, 1.5, 0.02, 22.5)
         urlval
Out[94]: 'http://localhost:4000/application?V0=25&kappa=2.0&theta=20&sigma=1.0&zeta=0.0&T=1.5&r=0.02&K=22.5'
Using this particular URL string returns an option value, as desired:
In [95]: print urllib.urlopen(urlval).read()
Out[95]: 0.202937705934
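Building the URL by positional %s replacement works, but it is easy to mix up the order of eight parameters. As an alternative sketch (Python 3; build_url is a hypothetical helper name, not part of the book's code), the query string can be generated from keyword arguments so that each value is tied to its parameter name:

```python
from urllib.parse import urlencode

BASE_URL = 'http://localhost:4000/application'  # address used in the text


def build_url(**params):
    ''' Hypothetical helper: appends the encoded parameters to BASE_URL. '''
    return BASE_URL + '?' + urlencode(params)


urlval = build_url(V0=25, kappa=2.0, theta=20, sigma=1.0,
                   zeta=0.0, T=1.5, r=0.02, K=22.5)

# With the service from vol_pricing.py running, the value could be read via
#     from urllib.request import urlopen
#     value = float(urlopen(urlval).read())
```

The resulting string is identical to the one produced by the %-based version; in Python 3, urllib.request.urlopen takes the place of the urllib.urlopen call used in the listings.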
With such a web service, you can of course do multiple calls to calculate multiple option values quite easily:
In [96]: %%time
         urlpara = 'http://localhost:4000/application?V0=25&kappa=2.0'
         urlpara += '&theta=25&sigma=1.0&zeta=0.0&T=1&r=0.02&K=%s'
         strikes = np.linspace(20, 30, 50)
         results = []
         for K in strikes:
             results.append(float(urllib.urlopen(urlpara % K).read()))
         results = np.array(results)
Out[96]: CPU times: user 64 ms, sys: 20 ms, total: 84 ms
         Wall time: 196 ms
In [97]: results
Out[97]: array([ 4.91296701,  4.71661296,  4.52120153,  4.32692516,  4.1339945 ,
                 3.94264561,  3.75313813,  3.56575972,  3.38079846,  3.19858765,
                 3.01946028,  2.8437621 ,  2.67184576,  2.50406508,  2.34078693,
                 2.18230495,  2.02898213,  1.88111287,  1.738968  ,  1.60280064,
                 1.47281111,  1.34917004,  1.23204859,  1.12141092,  1.01739405,
                 0.9199686 ,  0.82907686,  0.74462353,  0.66647327,  0.59445387,
                 0.52843174,  0.46798166,  0.41300694,  0.36319553,  0.31824647,
                 0.27785656,  0.24171678,  0.20951651,  0.18094732,  0.1557064 ,
                 0.1334996 ,  0.11414975,  0.09710449,  0.08234678,  0.06958767,
                 0.05859317,  0.04915788,  0.04109348,  0.03422854,  0.02840802])
One advantage of this approach is that you do not use your local resources to get the results, but rather the resources of a web server, which might also use, for example, parallelization techniques. Of course, in our example all is local and the web service uses the local computing resources. Figure 14-13 shows the valuation results graphically, concluding this section:
In [98]: import matplotlib.pyplot as plt
         %matplotlib inline
         plt.plot(strikes, results, 'b')
         plt.plot(strikes, results, 'ro')
         plt.grid(True)
         plt.xlabel('strike')
         plt.ylabel('European call option value')
Figure 14-13. Value of European volatility call option for different strikes
WEB SERVICES ARCHITECTURE
The web services architecture is often a powerful and efficient alternative to the provision of Python-based analytical functionality, or even whole applications. This holds true for the Internet as well as for models where private networks are used. This architecture also simplifies updates and maintenance, since such services are generally provided in a centralized fashion.
Conclusions
Nowadays, web technologies are an integral part of almost any application architecture. They are not only beneficial for communicating with the outside world and providing simple to sophisticated web services to external entities, but also within (financial) organizations.
This chapter first illustrates some basic techniques with regard to the most common communication protocols (mainly FTP and HTTP). It also shows how to implement interactive web plotting, how to interface in real time with web-based financial data APIs (e.g., JSON-based), and how to visualize such high-frequency data in real time with Bokeh. These basic tools and techniques are helpful in almost any context.
However, the Python ecosystem also provides a number of powerful, high-level frameworks to develop even complex web applications in rapid fashion. We use Flask, a framework which has gained some popularity recently, to implement a simple chat room for traders with simple user administration (registration and login). All elements of a typical web application (core functionality in Python, templating with Jinja2, and styling with CSS) are illustrated.
Finally, the last section in this chapter addresses the important topic of web services. Using the Werkzeug library for a somewhat simplified handling of WSGI applications, we implement a web-based pricing service for volatility options based on the model and formula of Gruenbichler and Longstaff (1996).
Further Reading
The following web resources are helpful with regard to the topics covered in this chapter:
The Python documentation should be a starting point for the basic tools and techniques shown in this chapter: http://docs.python.org; see also this overview page: http://docs.python.org/2/howto/webservers.html.
You should consult the home page of Bokeh for more on this web-focused plotting library: http://bokeh.pydata.org.
For more on Flask, start with the home page of the framework: http://flask.pocoo.org; also, download the PDF documentation: https://media.readthedocs.org/pdf/flask/latest/flask.pdf.
Apart from the Python documentation itself, consult the home page of the Werkzeug library for more on web services: http://werkzeug.pocoo.org.
For a Flask reference in book form, see the following:
Grinberg, Miguel (2014): Flask Web Development: Developing Web Applications with Python. O'Reilly, Sebastopol, CA.
Finally, here is the research paper about the valuation of volatility options:
Gruenbichler, Andreas and Francis Longstaff (1996): "Valuing Futures and Options on Volatility." Journal of Banking and Finance, Vol. 20, pp. 985–1001.
[50] For details and background refer to http://en.wikipedia.org/wiki/Ftp.
[51] For details and background refer to http://en.wikipedia.org/wiki/Http.
[52] This example is for illustration purposes only. In general, you would want to use specialized libraries such as lxml or Beautiful Soup.
[53] There are alternatives to these libraries, like Requests, that come with a more modern API.
[54] For more information on interactive plots with matplotlib, refer to the library's home page.
[55] The majority of graphics formats matplotlib can export to are static by nature (i.e., bitmaps). A counterexample is graphics in SVG (Scalable Vector Graphics) format, which can be programmed in JavaScript/ECMAScript. The library's website provides some examples of how to do this.
[56] See http://wiki.python.org/moin/WebFrameworks for further information on Python web frameworks. See https://wiki.python.org/moin/ContentManagementSystems for an overview of content management systems (CMSs) for Python.
[57] Although the framework is still quite recent (it all started in 2010), there are already books about Flask available. Cf. Grinberg (2014).
[58] The example application is called Flaskr and represents a microblog application. Our example is, more or less, a mixture between Flaskr and Minitwit, another Flask example application resembling a simple Twitter clone.
[59] See also the larger case study about volatility options presented in Chapter 19.
Part III. Derivatives Analytics Library
This part of the book is concerned with the development of a smaller, but nevertheless still powerful, real-world application for the pricing of options and derivatives by Monte Carlo simulation.[60] The goal is to have, in the end, a set of Python classes (a library we call DX, for Derivatives AnalytiX) that allows us to do the following:
Modeling
    To model short rates for discounting purposes; to model European and American options, including their underlying risk factors, as well as their relevant market environments; to model even complex portfolios consisting of multiple options with multiple, possibly correlated, underlying risk factors
Simulation
    To simulate risk factors based on geometric Brownian motions and jump diffusions as well as on square-root diffusions; to simulate a number of such risk factors simultaneously and consistently, whether they are correlated or not
Valuation
    To value, by the risk-neutral valuation approach, European and American options with arbitrary payoffs; to value portfolios composed of such options in a consistent, integrated fashion
Risk management
    To estimate numerically the most important Greeks (i.e., the Delta and the Vega of an option/derivative) independently of the underlying risk factor or the exercise type
Application
    To use the library to value and manage a VSTOXX volatility options portfolio in a market-based manner (i.e., with a calibrated model for the VSTOXX)
The material presented in this part of the book relies on the DX Analytics library, which is developed and offered by the author and The Python Quants GmbH (in combination with the Python Quant Platform). The full-fledged version allows, for instance, the modeling, pricing, and risk management of complex, multi-risk derivatives and trading books composed thereof.
The part is divided into the following chapters:
Chapter 15 presents the valuation framework in both theoretical and technical form. Theoretically, the Fundamental Theorem of Asset Pricing and the risk-neutral valuation approach are central. Technically, the chapter presents Python classes for risk-neutral discounting and for market environments.
Chapter 16 is concerned with the simulation of risk factors based on geometric Brownian motions, jump diffusions, and square-root diffusion processes; a generic class and three specialized classes are discussed.
Chapter 17 addresses the valuation of single derivatives with European or American exercise based on a single underlying risk factor; again, a generic and two specialized classes represent the major building blocks. The generic class allows the estimation of the Delta and the Vega independent of the option type.
Chapter 18 is about the valuation of possibly complex derivatives portfolios with multiple derivatives based on multiple, possibly correlated underlyings; a simple class for the modeling of a derivatives position is presented as well as a more complex class for a consistent portfolio valuation.
Chapter 19 uses the DX library developed in the other chapters to value and manage a portfolio of options on the VSTOXX volatility index.
[60] Cf. Bittman, James (2009): Trading Options as a Professional (McGraw Hill, New York) for an introduction to and a comprehensive overview of options trading and related topics like market fundamentals and the role of the so-called Greeks in options risk management.
Chapter 15. Valuation Framework
Compound interest is the greatest mathematical discovery of all time.
- Albert Einstein
This chapter provides the framework for the development of the DX library by introducing the most fundamental concepts needed for such an undertaking. It briefly reviews the Fundamental Theorem of Asset Pricing, which provides the theoretical background for the simulation and valuation. It then proceeds by addressing the fundamental concepts of date handling and risk-neutral discounting. We take only the simplest case of constant short rates for the discounting, but more complex and realistic models can be added to the library quite easily. This chapter also introduces the concept of a market environment, i.e., a collection of constants, lists, and curves needed for the instantiation of almost any other class to come in subsequent chapters.
Fundamental Theorem of Asset Pricing
The Fundamental Theorem of Asset Pricing is one of the cornerstones and success stories of modern financial theory and mathematics.[61] The central notion underlying the Fundamental Theorem of Asset Pricing is the concept of a martingale measure; i.e., a probability measure that removes the drift from a discounted risk factor (stochastic process). In other words, under a martingale measure, all risk factors drift with the risk-free short rate, and not with any other market rate involving some kind of risk premium over the risk-free short rate.

A Simple Example

Consider a simple economy at the dates today and tomorrow with a risky asset, a "stock," and a riskless asset, a "bond." The bond costs 10 USD today and pays off 10 USD tomorrow (zero interest rates). The stock costs 10 USD today and, with a probability of 60% and 40%, respectively, pays off 20 USD and 0 USD tomorrow. The riskless return of the bond is 0. The expected return of the stock is (0.6 · 20 + 0.4 · 0) / 10 - 1 = 0.2, or 20%. This is the risk premium the stock pays for its riskiness.

Consider now a call option with strike price of 15 USD. What is the fair value of such a contingent claim that pays 5 USD with 60% probability and 0 USD otherwise? We can take the expectation, for example, and discount the resulting value back (here with zero interest rates). This approach yields a value of 0.6 · 5 = 3 USD, since the option pays 5 USD in the case where the stock price moves up to 20 USD and 0 USD otherwise.

However, there is another approach that has been successfully applied to option pricing problems like this: replication of the option's payoff through a portfolio of traded securities. It is easily verified that buying 0.25 of the stock perfectly replicates the option's payoff (in the 60% case we then have 0.25 · 20 = 5 USD). A quarter of the stock only costs 2.5 USD and not 3 USD. Taking expectations under the real-world probability measure overvalues the option.

Why is this the case? The real-world measure implies a risk premium of 20% for the stock since the risk involved in the stock (gaining 100% or losing 100%) is "real" in the sense that it cannot be diversified or hedged away. On the other hand, there is a portfolio available that replicates the option's payoff without any risk. This also implies that someone writing (selling) such an option can completely hedge away any risk.[62] Such a perfectly hedged portfolio of an option and a hedge position must yield the riskless rate in order to avoid arbitrage opportunities (i.e., the opportunity to make some money out of no money with a positive probability).

Can we save the approach of taking expectations to value the call option? Yes, we can. We "only" have to change the probability in such a way that the risky asset, the stock, drifts with the riskless short rate of zero. Obviously, a (martingale) measure giving equal mass of 50% to both scenarios accomplishes this; the calculation is (0.5 · 20 + 0.5 · 0) / 10 - 1 = 0. Now, taking expectations of the option's payoff under the new martingale measure yields the correct (arbitrage-free) fair value: 0.5 · 5 + 0.5 · 0 = 2.5 USD.
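
The three valuation routes of the simple example (real-world expectation, replication, martingale measure) can be checked with a few lines of Python. This is a minimal sketch; all variable names are illustrative and not part of the DX library.

```python
# One-period, two-state economy from the example (zero interest rates).
S0 = 10.0                  # stock price today
S_up, S_down = 20.0, 0.0   # stock payoffs tomorrow
p_up = 0.6                 # real-world probability of the up state
K = 15.0                   # strike price of the call option

# option payoffs in the two states
C_up, C_down = max(S_up - K, 0.0), max(S_down - K, 0.0)

# 1) expectation under the real-world measure (overvalues the option)
value_real_world = p_up * C_up + (1 - p_up) * C_down

# 2) replication: stock and bond positions that match both payoffs;
#    with zero interest rates one unit of bond payoff costs one unit today
delta = (C_up - C_down) / (S_up - S_down)
bond_position = C_down - delta * S_down
value_replication = delta * S0 + bond_position

# 3) martingale measure: the probability under which the stock drifts
#    with the riskless short rate of zero, i.e. E^Q(S_1) = S0
q_up = (S0 - S_down) / (S_up - S_down)
value_martingale = q_up * C_up + (1 - q_up) * C_down

print(value_real_world, value_replication, value_martingale)
```

Replication and the martingale measure agree on 2.5 USD, while the real-world expectation gives 3 USD.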

The General Results

The beauty of this approach is that it carries over to even the most complex economies with, for example, continuous time modeling (i.e., a continuum of points in time to consider), large numbers of risky assets, complex derivative payoffs, etc.

Therefore, consider a general market model in discrete time:[63]

A general market model ℳ in discrete time is a collection of:
A finite state space Ω
A filtration 𝔽
A strictly positive probability measure P defined on ℘(Ω)
A terminal date T ∈ ℕ, T < ∞
A set $\mathbb{S} \equiv \{(S_t^k)_{t \in \{0, \ldots, T\}} : k \in \{0, \ldots, K\}\}$ of K + 1 strictly positive security price processes

We write $\mathcal{M} = \{(\Omega, \wp(\Omega), \mathbb{F}, P), T, \mathbb{S}\}$.

Based on such a general market model, we can formulate the Fundamental Theorem of Asset Pricing as follows:[64]

Consider the general market model ℳ. According to the Fundamental Theorem of Asset Pricing, the following three statements are equivalent:
There are no arbitrage opportunities in the market model ℳ.
The set ℚ of P-equivalent martingale measures is nonempty.
The set ℙ of consistent linear price systems is nonempty.

When it comes to valuation and pricing of contingent claims (i.e., options, derivatives, futures, forwards, swaps, etc.), the importance of the theorem is illustrated by the following corollary:

If the market model ℳ is arbitrage-free, then there exists a unique price $V_0$ associated with any attainable (i.e., replicable) contingent claim (option, derivative, etc.) $V_T$. It satisfies $\forall Q \in \mathbb{Q}: V_0 = \mathbf{E}_0^Q\left(e^{-rT} V_T\right)$, where $e^{-rT}$ is the relevant risk-neutral discount factor for a constant short rate r.

This result illustrates the importance of the theorem, and shows that our simple reasoning from the introductory example above indeed carries over to the general market model.
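
The pricing formula of the corollary can be illustrated in a one-period, two-state model with a positive short rate. The following sketch uses made-up numbers and illustrative variable names; it computes the martingale probability, the discounted expectation of the payoff, and, as a cross-check, the cost of the replicating portfolio.

```python
import math

# Illustrative one-period model (all numbers are assumptions).
r, T = 0.05, 1.0            # constant short rate and horizon in years
S0 = 10.0                   # stock price today
S_up, S_down = 20.0, 5.0    # stock prices at T in the two states
K = 15.0                    # strike price of a call option

V_up, V_down = max(S_up - K, 0.0), max(S_down - K, 0.0)
discount = math.exp(-r * T)  # risk-neutral discount factor e^{-rT}

# martingale probability: the discounted stock price must be a martingale,
# i.e. S0 = e^{-rT} * (q * S_up + (1 - q) * S_down)
q = (S0 / discount - S_down) / (S_up - S_down)

# risk-neutral valuation: V_0 = E^Q(e^{-rT} V_T)
V0_martingale = discount * (q * V_up + (1 - q) * V_down)

# cross-check by replication with stock and riskless bond
delta = (V_up - V_down) / (S_up - S_down)
bond_value_today = discount * (V_down - delta * S_down)
V0_replication = delta * S0 + bond_value_today

print(q, V0_martingale, V0_replication)
```

Both values coincide, as the corollary requires for an attainable claim in an arbitrage-free market model.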

Due to the role of the martingale measure, this approach to valuation is also often called the martingale approach, or (since under the martingale measure all risky assets drift with the riskless short rate) the risk-neutral valuation approach. The second term might, for our purposes, be the better one because in numerical applications, we "simply" let the risk factors (stochastic processes) drift by the risk-neutral short rate. One does not have to deal with the probability measures directly for our applications. They are, however, what theoretically justifies the central theoretical results we apply and the technical approach we implement.

Finally, consider market completeness in the general market model:

The market model ℳ is complete if it is arbitrage-free and if every contingent claim (option, derivative, etc.) is attainable (i.e., replicable).

Suppose that the market model ℳ is arbitrage-free. The market model is complete if and only if ℚ is a singleton; i.e., if there is a unique P-equivalent martingale measure.

This mainly completes the discussion of the theoretical background for what follows. For a detailed exposition of the concepts, notions, definitions, and results, refer to Chapter 4 of Hilpisch (2015).

Risk-Neutral Discounting

Obviously, risk-neutral discounting is central to the risk-neutral valuation approach. We therefore start by developing a Python class for risk-neutral discounting. However, it pays to first have a closer look at the modeling and handling of relevant dates for a valuation.

Modeling and Handling Dates

A necessary prerequisite for discounting is the modeling of dates (see also Appendix C). For valuation purposes, one typically divides the time interval between today and the final date of the general market model T into discrete time intervals. These time intervals can be homogeneous (i.e., of equal length), or they can be heterogeneous (i.e., of varying length). A valuation library should be able to handle the more general case of heterogeneous time intervals, since the simpler case is then automatically included. Therefore, we work with lists of dates, assuming that the smallest relevant time interval is one day. This implies that we do not care about intraday events, for which we would have to model time (in addition to dates).[65]

To compile a list of relevant dates, one can basically take one of two approaches: constructing a list of concrete dates (e.g., as datetime.datetime objects in Python) or of year fractions (as decimal numbers, as is often done in theoretical works).

For example, the following two definitions of dates and fractions are (roughly) equivalent:

In [1]: import datetime as dt

In [2]: dates = [dt.datetime(2015, 1, 1), dt.datetime(2015, 7, 1),
   ...:          dt.datetime(2016, 1, 1)]

In [3]: (dates[1] - dates[0]).days / 365.
Out[3]: 0.4958904109589041

In [4]: (dates[2] - dates[1]).days / 365.
Out[4]: 0.5041095890410959

In [5]: fractions = [0.0, 0.5, 1.0]

They are only roughly equivalent since year fractions seldom lie on the beginning (0 a.m.) of a certain day. Just consider the result of dividing a year by 50.

Sometimes it is necessary to get year fractions out of a list of dates. The function get_year_deltas presented in Example 15-1 does the job.

Example 15-1. Function to get year fractions from a list or array of datetime objects

#
# DX Library Frame
# get_year_deltas.py
#
import numpy as np


def get_year_deltas(date_list, day_count=365.):
    ''' Return vector of floats with day deltas in years.
    Initial value normalized to zero.

    Parameters
    ==========
    date_list : list or array
        collection of datetime objects
    day_count : float
        number of days for a year
        (to account for different conventions)

    Results
    =======
    delta_list : array
        year fractions
    '''
    start = date_list[0]
    delta_list = [(date - start).days / day_count
                  for date in date_list]
    return np.array(delta_list)

This function can then be applied as follows:

In [1]: import datetime as dt

In [2]: dates = [dt.datetime(2015, 1, 1), dt.datetime(2015, 7, 1),
   ...:          dt.datetime(2016, 1, 1)]

In [3]: get_year_deltas(dates)
Out[3]: array([ 0.        ,  0.49589041,  1.        ])

When modeling the short rate, it becomes clear what the benefit of this is.
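
The day_count parameter is what allows for different conventions. The following sketch repeats a compact copy of the function so that it runs on its own and compares the default of 365 days with a 360-day year; the value 360 is chosen purely for illustration.

```python
import datetime as dt
import numpy as np


def get_year_deltas(date_list, day_count=365.):
    # compact copy of the function from Example 15-1
    start = date_list[0]
    return np.array([(date - start).days / day_count
                     for date in date_list])


dates = [dt.datetime(2015, 1, 1), dt.datetime(2015, 7, 1),
         dt.datetime(2016, 1, 1)]

deltas_365 = get_year_deltas(dates)                   # default: 365 days per year
deltas_360 = get_year_deltas(dates, day_count=360.)   # illustrative: 360 days per year

print(deltas_365)
print(deltas_360)
```

With a 360-day year the same calendar dates map to slightly larger year fractions, which in turn leads to slightly smaller discount factors.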

Constant Short Rate

We focus on the simplest case for discounting by the short rate; namely, the case where the short rate is constant through time. Many option pricing models, like the ones of Black-Scholes-Merton (1973), Merton (1976), and Cox-Ross-Rubinstein (1979), make this assumption.[66] We assume continuous discounting, as is usual for option pricing applications. In such a case, the general discount factor as of today, given a future date t and a constant short rate of r, is then given by $D_0(t) = e^{-rt}$. Of course, for the end of the economy we have the special case $D_0(T) = e^{-rT}$. Note that here both t and T are in year fractions.

The discount factors can also be interpreted as the value of a unit zero-coupon bond (ZCB) as of today, maturing at t and T, respectively.[67] Given two dates $t \geq s \geq 0$, the discount factor relevant for discounting from t to s is then given by the equation $D_s(t) = D_0(t) / D_0(s) = e^{-rt} / e^{-rs} = e^{-rt} \cdot e^{rs} = e^{-r(t-s)}$.
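
The relation between the discount factors can be verified numerically. This is a small sketch with illustrative values for r, s, and t; the helper name D0 is chosen here only to mirror the notation.

```python
import numpy as np

r = 0.05           # constant short rate (illustrative value)
s, t = 0.5, 1.0    # two dates as year fractions with t >= s >= 0


def D0(x):
    # discount factor as of today for date x: D_0(x) = exp(-r * x)
    return np.exp(-r * x)


# discount factor for discounting from t back to s
D_s_t = D0(t) / D0(s)

print(D0(s), D0(t), D_s_t)
print(np.isclose(D_s_t, np.exp(-r * (t - s))))
```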

Example 15-2 presents a Python class that translates all these considerations into Python code.[68]

Example 15-2. Class for risk-neutral discounting with constant short rate

#
# DX Library Frame
# constant_short_rate.py
#
import numpy as np

from get_year_deltas import *


class constant_short_rate(object):
    ''' Class for constant short rate discounting.

    Attributes
    ==========
    name : string
        name of the object
    short_rate : float (positive)
        constant rate for discounting

    Methods
    =======
    get_discount_factors :
        get discount factors given a list/array of datetime objects
        or year fractions
    '''

    def __init__(self, name, short_rate):
        self.name = name
        self.short_rate = short_rate
        if short_rate < 0:
            raise ValueError('Short rate negative.')

    def get_discount_factors(self, date_list, dtobjects=True):
        if dtobjects is True:
            dlist = get_year_deltas(date_list)
        else:
            dlist = np.array(date_list)
        # sorting -dlist puts the factor for the longest period first
        # and the factor 1.0 last (see the output of the example below)
        dflist = np.exp(self.short_rate * np.sort(-dlist))
        return np.array((date_list, dflist)).T

The application of the class constant_short_rate is best illustrated by a simple, concrete example. We stick to the same list of datetime objects as before:

In [1]: import datetime as dt

In [2]: dates = [dt.datetime(2015, 1, 1), dt.datetime(2015, 7, 1),
   ...:          dt.datetime(2016, 1, 1)]

In [3]: from constant_short_rate import *

In [4]: csr = constant_short_rate('csr', 0.05)

In [5]: csr.get_discount_factors(dates)
Out[5]:
array([[datetime.datetime(2015, 1, 1, 0, 0), 0.95122942450071402],
       [datetime.datetime(2015, 7, 1, 0, 0), 0.9755103387657228],
       [datetime.datetime(2016, 1, 1, 0, 0), 1.0]], dtype=object)

The main result is a two-dimensional ndarray object containing pairs of a datetime object and the relevant discount factor. The class in general and the object csr in particular work with year fractions as well:

In [7]: deltas = get_year_deltas(dates)

In [8]: csr.get_discount_factors(deltas, dtobjects=False)
Out[8]:
array([[ 0.        ,  0.95122942],
       [ 0.49589041,  0.97551034],
       [ 1.        ,  1.        ]])

This class will take care of all discounting operations needed in other classes.
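
To see what such discount factors are used for, the following sketch computes the present value of two future cash flows with the formula $D_0(t) = e^{-rt}$. It does not use the class itself; the cash flows and all names are made up for illustration.

```python
import datetime as dt
import numpy as np

short_rate = 0.05
pricing_date = dt.datetime(2015, 1, 1)

# hypothetical cash flows as (payment date, amount) pairs
cash_flows = [(dt.datetime(2015, 7, 1), 5.0),
              (dt.datetime(2016, 1, 1), 105.0)]

present_value = 0.0
for date, amount in cash_flows:
    # year fraction between pricing date and payment date (365-day year)
    t = (date - pricing_date).days / 365.
    # discount the cash flow with D_0(t) = exp(-r * t)
    present_value += np.exp(-short_rate * t) * amount

print(present_value)
```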

Market Environments

Market environment is "just" a name for a collection of other data and Python objects. However, it is rather convenient to work with this abstraction since it simplifies a number of operations and also allows for a consistent modeling of recurring aspects.[69] A market environment mainly consists of three dictionaries to store the following types of data and Python objects:

Constants
These can be, for example, model parameters or option maturity dates.
Lists
These are sequences of objects in general, like a list object of objects modeling (risky) securities.
Curves
These are objects for discounting; for example, like an instance of the constant_short_rate class.

Example 15-3 presents the market_environment class. Refer to Chapter 4 for a refresher on the handling of dict objects.

Example 15-3. Class for modeling a market environment with constants, lists, and curves

#
# DX Library Frame
# market_environment.py
#
class market_environment(object):
    ''' Class to model a market environment relevant for valuation.

    Attributes
    ==========
    name : string
        name of the market environment
    pricing_date : datetime object
        date of the market environment

    Methods
    =======
    add_constant :
        adds a constant (e.g. model parameter)
    get_constant :
        gets a constant
    add_list :
        adds a list (e.g. underlyings)
    get_list :
        gets a list
    add_curve :
        adds a market curve (e.g. yield curve)
    get_curve :
        gets a market curve
    add_environment :
        adds and overwrites whole market environments
        with constants, lists, and curves
    '''

    def __init__(self, name, pricing_date):
        self.name = name
        self.pricing_date = pricing_date
        self.constants = {}
        self.lists = {}
        self.curves = {}

    def add_constant(self, key, constant):
        self.constants[key] = constant

    def get_constant(self, key):
        return self.constants[key]

    def add_list(self, key, list_object):
        self.lists[key] = list_object

    def get_list(self, key):
        return self.lists[key]

    def add_curve(self, key, curve):
        self.curves[key] = curve

    def get_curve(self, key):
        return self.curves[key]

    def add_environment(self, env):
        # overwrites existing values, if they exist
        for key in env.constants:
            self.constants[key] = env.constants[key]
        for key in env.lists:
            self.lists[key] = env.lists[key]
        for key in env.curves:
            self.curves[key] = env.curves[key]

Although there is nothing special in the market_environment class, a simple example shall illustrate how convenient it is to work with instances of the class:

In [1]: from market_environment import *

In [2]: import datetime as dt

In [3]: dates = [dt.datetime(2015, 1, 1), dt.datetime(2015, 7, 1),
   ...:          dt.datetime(2016, 1, 1)]

In [4]: csr = constant_short_rate('csr', 0.05)

In [5]: me_1 = market_environment('me_1', dt.datetime(2015, 1, 1))

In [6]: me_1.add_list('symbols', ['AAPL', 'MSFT', 'FB'])

In [7]: me_1.get_list('symbols')
Out[7]: ['AAPL', 'MSFT', 'FB']

In [8]: me_2 = market_environment('me_2', dt.datetime(2015, 1, 1))

In [9]: me_2.add_constant('volatility', 0.2)

In [10]: me_2.add_curve('short_rate', csr)  # add instance of discounting class

In [11]: me_2.get_curve('short_rate')
Out[11]: <constant_short_rate.constant_short_rate at 0x104ac3c90>

In [12]: me_1.add_environment(me_2)  # add complete environment

In [13]: me_1.get_curve('short_rate')
Out[13]: <constant_short_rate.constant_short_rate at 0x104ac3c90>

In [14]: me_1.constants
Out[14]: {'volatility': 0.2}

In [15]: me_1.lists
Out[15]: {'symbols': ['AAPL', 'MSFT', 'FB']}

In [16]: me_1.curves
Out[16]: {'short_rate': <constant_short_rate.constant_short_rate at 0x104ac3c90>}

In [17]: me_1.get_curve('short_rate').short_rate
Out[17]: 0.05

This illustrates the basic handling of this rather generic "storage" class. For practical applications, market data and other data as well as Python objects are first collected, then a market_environment object is instantiated and filled with the relevant data and objects. This is then delivered in a single step to other classes that need the data and objects stored in the respective market_environment object.

A major advantage of this object-oriented modeling approach is, for example, that instances of the constant_short_rate class can live in multiple environments. Once the instance is updated (for example, when a new constant short rate is set), all the instances of the market_environment class containing that particular instance of the discounting class will be updated automatically.
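
The automatic update works because every environment stores a reference to the same object rather than a copy. The following sketch demonstrates this with two minimal stand-in classes (ShortRate and Env are hypothetical simplifications, not the classes of the DX library).

```python
class ShortRate(object):
    # minimal stand-in for constant_short_rate (illustration only)
    def __init__(self, short_rate):
        self.short_rate = short_rate


class Env(object):
    # minimal stand-in for market_environment: only the curves dictionary
    def __init__(self):
        self.curves = {}

    def add_curve(self, key, curve):
        self.curves[key] = curve

    def get_curve(self, key):
        return self.curves[key]


csr = ShortRate(0.05)
env_a, env_b = Env(), Env()
env_a.add_curve('short_rate', csr)
env_b.add_curve('short_rate', csr)   # same object, not a copy

csr.short_rate = 0.03                # update the single shared instance

print(env_a.get_curve('short_rate').short_rate)
print(env_b.get_curve('short_rate').short_rate)
```

Both environments report the new rate, since they hold the very same object.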

Conclusions

This chapter provides the framework for the larger project of building a Python library to value options and other derivatives by Monte Carlo simulation. The chapter introduces the Fundamental Theorem of Asset Pricing, illustrating it by a rather simple numerical example. Important results in this regard are provided for a general market model in discrete time.

The chapter also develops a Python class for risk-neutral discounting purposes to make numerical use of the machinery of the Fundamental Theorem of Asset Pricing. Based on a list of either Python datetime objects or floats representing year fractions, instances of the class constant_short_rate provide the respective discount factors (present values of unit zero-coupon bonds).

The chapter concludes with the rather generic market_environment class, which allows for the collection of relevant data and Python objects for modeling, simulation, valuation, and other purposes.

To simplify future imports we will use a wrapper module called dx_frame.py, as presented in Example 15-4.

Example 15-4. Wrapper module for framework components

#
# DX Library Frame
# dx_frame.py
#
import datetime as dt

from get_year_deltas import get_year_deltas
from constant_short_rate import constant_short_rate
from market_environment import market_environment

A single import statement like the following then makes all framework components available in a single step:

from dx_frame import *

Thinking of a Python library and a package of modules, there is also the option to store all relevant Python modules in a (sub)directory and to put in that directory a special init file that does all the imports. For example, when storing all modules in a directory called dx, say, the file presented in Example 15-5 does the job. However, notice the naming convention for this particular file.

Example 15-5. Python packaging file

#
# DX Library
# packaging file
# __init__.py
#
import datetime as dt

from get_year_deltas import get_year_deltas
from constant_short_rate import constant_short_rate
from market_environment import market_environment

In that case you can just use the directory name to accomplish all the imports at once:

from dx import *

Or via the alternative approach:

import dx
Usefulreferencesinbookformforthetopicscoveredinthis chapterare:
Delbaen,FreddyandWalterSchachermayer(2004):TheMathematics
ofFletcher,ShayneandChristopherGardner(2009):
Arbitrage.SpringerVerlag,Berlin,Heidelberg.
Python.JohnWiley&Sons,Chichester,England.FinancialModelling
inHilpisch,Yves(2015):DerivativesAnalyticswithPython.
Wiley
Finance,Chichester,England.http://derivatives­analytics­with­
Williams,
python.com.
David(1991):ProbabilitywithMartingales.Cambridge
Fortheoriginalresearchpapersdefiningthemodelscitedinthis
UniversityPress,Cambridge,England.
refertothe“FurtherReading”sectionsinsubsequentchapters. chapter,

[61] Cf. the book by Delbaen and Schachermayer (2004) for a comprehensive review and details of the mathematical machinery involved. See also Chapter 4 of Hilpisch (2015) for a shorter introduction, in particular for the discrete time version.
[62] The strategy would involve selling an option at a price of 2.5 USD and buying 0.25 stocks for 2.5 USD. The payoff of such a portfolio is 0 no matter what scenario plays out in the simple economy.
[63] Cf. Williams (1991) on the probabilistic concepts.
[64] Cf. Delbaen and Schachermayer (2004).
[65] Adding a time component is actually a straightforward undertaking, which is nevertheless not done here for the ease of the exposition.
[66] For the pricing of, for example, short-dated options, this assumption seems satisfied in many circumstances.
[67] A unit zero-coupon bond pays exactly one currency unit at its maturity and no coupons between today and maturity.
[68] See Chapter 13 for the basics of object-oriented development in Python. Here, and for the rest of this part, we deviate from the standard PEP 8 naming conventions with regard to Python class names. PEP 8 recommends using "CapWords" or "CamelCase" convention in general for Python class names. We rather use the function name convention as mentioned in PEP 8 as a valid alternative "in cases where the interface is documented and used primarily as a callable."
[69] On this concept see also Fletcher and Gardner (2009), who use market environments extensively.

Chapter 16. Simulation of Financial Models

The purpose of science is not to analyze or describe but to make useful models of the world.
- Edward de Bono

Chapter 10 introduces in some detail the Monte Carlo simulation of stochastic processes using Python and NumPy. This chapter applies the basic techniques presented there to implement simulation classes as a central component of the DX library. We restrict our attention to three widely used stochastic processes:

Geometric Brownian motion
This is the process that was introduced to the option pricing literature by the seminal work of Black and Scholes (1973); it is used several times throughout this book and still represents (despite its known shortcomings and given the mounting empirical evidence from financial reality) a benchmark process for option and derivative valuation purposes.
Jump diffusion
The jump diffusion, as introduced by Merton (1976), adds a log-normally distributed jump component to the geometric Brownian motion (GBM); this allows us to take into account that, for example, short-term out-of-the-money (OTM) options often seem to have priced in the possibility of large jumps. In other words, relying on GBM as a financial model often cannot explain the market values of such OTM options satisfactorily, while a jump diffusion may be able to do so.
Square-root diffusion
The square-root diffusion, popularized for finance by Cox, Ingersoll, and Ross (1985), is used to model mean-reverting quantities like interest rates and volatility; in addition to being mean-reverting, the process stays positive, which is generally a desirable characteristic for those quantities.

The chapter proceeds in the first section with developing a function to generate standard normally distributed random numbers using variance reduction techniques.[70] Subsequent sections then develop a generic simulation class and three specific simulation classes, one for each of the aforementioned stochastic processes of interest.

For further details on the simulation of the models presented in this chapter, refer also to Hilpisch (2015). In particular, that book also contains a complete case study based on the jump diffusion model of Merton (1976).

Random Number Generation

Random number generation is a central task of Monte Carlo simulation.[71] Chapter 10 shows how to use Python and libraries such as numpy.random to generate random numbers with different distributions. For our project at hand, standard normally distributed random numbers are the most important ones. That is why it pays off to have a convenience function available for generating this particular type of random numbers. Example 16-1 presents such a function.

Example 16-1. Function to generate standard normally distributed random numbers

import numpy as np


def sn_random_numbers(shape, antithetic=True, moment_matching=True,
                      fixed_seed=False):
    ''' Returns an array of shape shape with (pseudo)random numbers
    that are standard normally distributed.

    Parameters
    ==========
    shape : tuple (o, n, m)
        generation of array with shape (o, n, m)
    antithetic : Boolean
        generation of antithetic variates
    moment_matching : Boolean
        matching of first and second moments
    fixed_seed : Boolean
        flag to fix the seed

    Results
    =======
    ran : (o, n, m) array of (pseudo)random numbers
    '''
    if fixed_seed:
        np.random.seed(1000)
    if antithetic:
        # integer division so that the shape stays an integer tuple
        ran = np.random.standard_normal((shape[0], shape[1],
                                         shape[2] // 2))
        ran = np.concatenate((ran, -ran), axis=2)
    else:
        ran = np.random.standard_normal(shape)
    if moment_matching:
        ran = ran - np.mean(ran)
        ran = ran / np.std(ran)
    if shape[0] == 1:
        return ran[0]
    else:
        return ran

The variance reduction techniques used in this function, namely antithetic paths and moment matching, are also illustrated in Chapter 10.[72]

The application of the function is straightforward:

In [1]: from sn_random_numbers import *

In [2]: snrn = sn_random_numbers((2, 2, 2), antithetic=False,
   ...:                          moment_matching=False,
   ...:                          fixed_seed=True)

In [3]: snrn
Out[3]:
array([[[-0.8044583 ,  0.32093155],
        [-0.02548288,  0.64432383]],

       [[-0.30079667,  0.38947455],
        [-0.1074373 , -0.47998308]]])

In [4]: snrn_mm = sn_random_numbers((2, 3, 2), antithetic=False,
   ...:                             moment_matching=True,
   ...:                             fixed_seed=True)

In [5]: snrn_mm
Out[5]:
array([[[-1.47414161,  0.67072537],
        [ 0.01049828,  1.28707482],
        [-0.51421897,  0.80136066]],

       [[-0.14569767, -0.85572818],
        [ 1.19313679, -0.82653845],
        [ 1.3308292 , -1.47730025]]])

In [6]: snrn_mm.mean()
Out[6]: 1.8503717077085941e-17

In [7]: snrn_mm.std()
Out[7]: 1.0

This function will prove a workhorse for the simulation classes to follow.
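
The effect of the two variance reduction techniques on the sample moments can be seen directly with NumPy. The following is a stand-alone sketch that does not call sn_random_numbers; the sample size and variable names are illustrative.

```python
import numpy as np

np.random.seed(1000)     # fixed seed for reproducibility
n = 10000                # even number, needed for the antithetic pairing

# plain draws: sample mean and standard deviation are only close to 0 and 1
plain = np.random.standard_normal(n)

# antithetic variates: draw half of the numbers, append their negatives
half = np.random.standard_normal(n // 2)
anti = np.concatenate((half, -half))

# moment matching: enforce sample mean 0 and sample standard deviation 1
matched = (anti - anti.mean()) / anti.std()

print(plain.mean(), plain.std())
print(anti.mean(), anti.std())
print(matched.mean(), matched.std())
```

Antithetic variates correct the first moment (up to floating-point error), while moment matching corrects the first and the second moment.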

Generic Simulation Class

Object-oriented modeling, as introduced in Chapter 13, allows inheritance of attributes and methods. This is what we want to make use of when building our simulation classes: we start with a generic simulation class containing those attributes and methods that all other simulation classes share.

To begin with, it is noteworthy that we instantiate an object of any simulation class by "only" providing three attributes:

name
A string object as a name for the model simulation object
mar_env
An instance of the market_environment class
corr
A flag (bool) indicating whether the object is correlated or not

This again illustrates the role of a market environment: to provide in a single step all data and objects required for simulation and valuation. The methods of the generic class are:

generate_time_grid
This method generates the time grid of relevant dates used for the simulation; this task is the same for every simulation class.
get_instrument_values
Every simulation class has to return the ndarray object with the simulated instrument values (e.g., simulated stock prices, commodities prices, volatilities).

Example 16-2 presents such a generic model simulation class. The methods make use of other methods that the model-tailored classes will provide, like self.generate_paths. All details in this regard will become clear when we have the full picture of a specialized, nongeneric simulation class.

Example 16-2. Generic financial model simulation class

#
# DX Library Simulation
# simulation_class.py
#
import numpy as np
import pandas as pd


class simulation_class(object):
    ''' Providing base methods for simulation classes.

    Attributes
    ==========
    name : string
        name of the object
    mar_env : instance of market_environment
        market environment data for simulation
    corr : Boolean
        True if correlated with other model object

    Methods
    =======
    generate_time_grid :
        returns time grid for simulation
    get_instrument_values :
        returns the current instrument values (array)
    '''

    def __init__(self, name, mar_env, corr):
        try:
            self.name = name
            self.pricing_date = mar_env.pricing_date
            self.initial_value = mar_env.get_constant('initial_value')
            self.volatility = mar_env.get_constant('volatility')
            self.final_date = mar_env.get_constant('final_date')
            self.currency = mar_env.get_constant('currency')
            self.frequency = mar_env.get_constant('frequency')
            self.paths = mar_env.get_constant('paths')
            self.discount_curve = mar_env.get_curve('discount_curve')
            try:
                # if time_grid in mar_env take this
                # (for portfolio valuation)
                self.time_grid = mar_env.get_list('time_grid')
            except:
                self.time_grid = None
            try:
                # if there are special dates, then add these
                self.special_dates = mar_env.get_list('special_dates')
            except:
                self.special_dates = []
            self.instrument_values = None
            self.correlated = corr
            if corr is True:
                # only needed in a portfolio context when
                # risk factors are correlated
                self.cholesky_matrix = mar_env.get_list('cholesky_matrix')
                self.rn_set = mar_env.get_list('rn_set')[self.name]
                self.random_numbers = mar_env.get_list('random_numbers')
        except:
            # print as a function call works in Python 2 and Python 3
            print("Error parsing market environment.")

    def generate_time_grid(self):
        start = self.pricing_date
        end = self.final_date
        # pandas date_range function
        # freq = e.g. 'B' for Business Day,
        # 'W' for Weekly, 'M' for Monthly
        time_grid = pd.date_range(start=start, end=end,
                                  freq=self.frequency).to_pydatetime()
        time_grid = list(time_grid)
        # enhance time_grid by start, end, and special_dates
        if start not in time_grid:
            time_grid.insert(0, start)
            # insert start date if not in list
        if end not in time_grid:
            time_grid.append(end)
            # insert end date if not in list
        if len(self.special_dates) > 0:
            # add all special dates
            time_grid.extend(self.special_dates)
            # delete duplicates
            time_grid = list(set(time_grid))
            # sort list
            time_grid.sort()
        self.time_grid = np.array(time_grid)

    def get_instrument_values(self, fixed_seed=True):
        if self.instrument_values is None:
            # only initiate simulation if there are no instrument values
            self.generate_paths(fixed_seed=fixed_seed, day_count=365.)
        elif fixed_seed is False:
            # also initiate resimulation when fixed_seed is False
            self.generate_paths(fixed_seed=fixed_seed, day_count=365.)
        return self.instrument_values

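
The logic of generate_time_grid can be tried out in isolation. The following sketch rebuilds the same steps outside the class; it passes a pandas offset object instead of a frequency string such as 'M' (an assumption made here because the accepted frequency strings can differ between pandas versions), and the special date is hypothetical.

```python
import datetime as dt
import pandas as pd

start = dt.datetime(2015, 1, 1)
end = dt.datetime(2015, 12, 31)

# month-end dates between start and end
grid = list(pd.date_range(start=start, end=end,
                          freq=pd.offsets.MonthEnd()).to_pydatetime())

# add start and end dates if they are missing
if start not in grid:
    grid.insert(0, start)
if end not in grid:
    grid.append(end)

# hypothetical special date, e.g. a payment date within the horizon
special_dates = [dt.datetime(2015, 6, 15)]
grid.extend(special_dates)

# drop duplicates and restore chronological order
grid = sorted(set(grid))

print(len(grid))
print(grid[:3])
```

The result contains the pricing date, the twelve month ends, and the special date in chronological order.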
Parsing of the market environment is embedded in a single try-except clause, which catches any exception raised during parsing and prints an error message. To keep the code concise, there are no sanity checks implemented. For example, the following line of code is considered a "success," no matter if the content is indeed an instance of a discounting class or not. Therefore, one has to be rather careful when compiling and passing market_environment objects to any simulation class:

self.discount_curve = mar_env.get_curve('discount_curve')
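
A sanity check of the kind that is left out could look as follows. The helper check_discount_curve is hypothetical and not part of the DX library; it only tests, in duck-typing fashion, for the attribute and the method that the classes in this part rely on.

```python
def check_discount_curve(curve):
    ''' Hypothetical helper: raises a TypeError if the object does not
    offer the interface of a discounting object. '''
    if not hasattr(curve, 'short_rate'):
        raise TypeError('discount_curve has no short_rate attribute.')
    if not callable(getattr(curve, 'get_discount_factors', None)):
        raise TypeError('discount_curve has no get_discount_factors method.')
    return curve


class dummy_curve(object):
    # stand-in with the same interface as constant_short_rate
    short_rate = 0.05

    def get_discount_factors(self, date_list, dtobjects=True):
        return None


check_discount_curve(dummy_curve())      # passes silently

try:
    check_discount_curve(0.05)           # a plain float is rejected
except TypeError as error:
    print(error)
```

Such a check could be called right after the curve is read from the market environment, at the cost of slightly longer parsing code.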

Table 16-1 shows all components that a market_environment object must contain for the generic and therefore for all other simulation classes.

Table 16-1. Elements of market environment for all simulation classes

Element          Type      Mandatory  Description
initial_value    Constant  Yes        Initial value of process at pricing_date
volatility       Constant  Yes        Volatility coefficient of process
final_date       Constant  Yes        Simulation horizon
currency         Constant  Yes        Currency of the financial entity
frequency        Constant  Yes        Date frequency, as pandas freq parameter
paths            Constant  Yes        Number of paths to be simulated
discount_curve   Curve     Yes        Instance of constant_short_rate
time_grid        List      No         Time grid of relevant dates (in portfolio context)
random_numbers   List      No         Random number array (for correlated objects)
cholesky_matrix  List      No         Cholesky matrix (for correlated objects)
rn_set           List      No         dict object with pointer to relevant random number set

Everything that has to do with the correlation of model simulation objects is explained in subsequent chapters. In this chapter, we focus on the simulation of single, uncorrelated processes. Similarly, the option to pass a time_grid is only relevant in a portfolio context, something also explained later.

Geometric Brownian Motion

Geometric Brownian motion is a stochastic process as described in Equation 16-1 (see also Equation 10-2 in Chapter 10, in particular for the meaning of the parameters and variables). The drift of the process is already set equal to the riskless, constant short rate r, implying that we operate under the equivalent martingale measure (see Chapter 15).

Equation 16-1. Stochastic differential equation of geometric Brownian motion

$dS_t = r S_t\,dt + \sigma S_t\,dZ_t$

Equation 16-2 presents an Euler discretization of the stochastic differential equation for simulation purposes (see also Equation 10-3 in Chapter 10 for further details). We work in a discrete time market model, such as the general market model ℳ from Chapter 15, with a finite set of relevant dates $0 < t_1 < t_2 < \ldots < T$.

Equation 16-2. Difference equation to simulate the geometric Brownian motion

$S_{t_{m+1}} = S_{t_m} \exp\left(\left(r - \frac{1}{2}\sigma^2\right)(t_{m+1} - t_m) + \sigma \sqrt{t_{m+1} - t_m}\, z_t\right)$

Here, $0 \leq t_m < t_{m+1} \leq T$ and $z_t$ is a standard normally distributed random number.

The Simulation Class

Example 16-3 now presents the specialized class for the GBM model. We present it in its entirety first and highlight selected aspects afterward.

Example 16-3. Simulation class for geometric Brownian motion

#
# DX Library Simulation
# geometric_brownian_motion.py
#
import numpy as np

from sn_random_numbers import sn_random_numbers
from simulation_class import simulation_class


class geometric_brownian_motion(simulation_class):
    ''' Class to generate simulated paths based on
    the Black-Scholes-Merton geometric Brownian motion model.

    Attributes
    ==========
    name : string
        name of the object
    mar_env : instance of market_environment
        market environment data for simulation
    corr : Boolean
        True if correlated with other model simulation object

    Methods
    =======
    update :
        updates parameters
    generate_paths :
        returns Monte Carlo paths given the market environment
    '''

    def __init__(self, name, mar_env, corr=False):
        super(geometric_brownian_motion, self).__init__(name, mar_env, corr)

    def update(self, initial_value=None, volatility=None, final_date=None):
        if initial_value is not None:
            self.initial_value = initial_value
        if volatility is not None:
            self.volatility = volatility
        if final_date is not None:
            self.final_date = final_date
        self.instrument_values = None

    def generate_paths(self, fixed_seed=False, day_count=365.):
        if self.time_grid is None:
            self.generate_time_grid()
            # method from generic simulation class
        # number of dates for time grid
        M = len(self.time_grid)
        # number of paths
        I = self.paths
        # array initialization for path simulation
        paths = np.zeros((M, I))
        # initialize first date with initial_value
        paths[0] = self.initial_value
        if not self.correlated:
            # if not correlated, generate random numbers
            rand = sn_random_numbers((1, M, I),
                                     fixed_seed=fixed_seed)
        else:
            # if correlated, use random number object as provided
            # in market environment
            rand = self.random_numbers
        short_rate = self.discount_curve.short_rate
        # get short rate for drift of process
        for t in range(1, len(self.time_grid)):
            # select the right time slice from the relevant
            # random number set
            if not self.correlated:
                ran = rand[t]
            else:
                ran = np.dot(self.cholesky_matrix, rand[:, t, :])
                ran = ran[self.rn_set]
            dt = (self.time_grid[t] - self.time_grid[t - 1]).days / day_count
            # difference between two dates as year fraction
            paths[t] = paths[t - 1] * np.exp((short_rate - 0.5
                                              * self.volatility ** 2) * dt
                                    + self.volatility * np.sqrt(dt) * ran)
            # generate simulated values for the respective date
        self.instrument_values = paths

In this particular case, the market_environment object has to contain only the data and objects shown in Table 16-1, i.e., the minimum set of components.

The method update does what its name suggests: it allows the updating of selected important parameters of the model. The method generate_paths is, of course, a bit more involved. However, it has a number of inline comments that should make clear the most important aspects. Some complexity is brought into this method by, in principle, allowing for the correlation between different model simulation objects. This will become clearer, especially in Example 18-2.
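
Stripped of the class machinery and the correlation logic, the core of generate_paths is the difference equation of Equation 16-2. The following stand-alone sketch shows this core for a single, uncorrelated process; the parameter values, the time grid in year fractions, and the use of plain NumPy random numbers (instead of sn_random_numbers) are simplifying assumptions.

```python
import numpy as np

np.random.seed(1000)

S0, r, sigma = 36.0, 0.05, 0.2                       # illustrative parameters
time_grid = np.array([0.0, 0.25, 0.5, 0.75, 1.0])    # year fractions
I = 50000                                            # number of paths

paths = np.zeros((len(time_grid), I))
paths[0] = S0
for t in range(1, len(time_grid)):
    # length of the time interval as year fraction
    dt = time_grid[t] - time_grid[t - 1]
    # standard normally distributed random numbers for this time step
    z = np.random.standard_normal(I)
    # difference equation for geometric Brownian motion
    paths[t] = paths[t - 1] * np.exp((r - 0.5 * sigma ** 2) * dt
                                     + sigma * np.sqrt(dt) * z)

# under the martingale measure, the discounted mean of the final values
# should be close to the initial value
print(np.exp(-r * time_grid[-1]) * paths[-1].mean())
```

The printed value should lie close to the initial value of 36, which reflects that the discounted process is a martingale under the measure used for the simulation.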
A Use Case
The following interactive IPython session illustrates the use of the geometric_brownian_motion class. First, we have to generate a market_environment object with all mandatory elements:
In [1]: from dx import *
In [2]: me_gbm = market_environment('me_gbm', dt.datetime(2015, 1, 1))
In [3]: me_gbm.add_constant('initial_value', 36.)
        me_gbm.add_constant('volatility', 0.2)
        me_gbm.add_constant('final_date', dt.datetime(2015, 12, 31))
        me_gbm.add_constant('currency', 'EUR')
        # monthly frequency (respective month end)
        me_gbm.add_constant('frequency', 'M')
        me_gbm.add_constant('paths', 10000)
In [4]: csr = constant_short_rate('csr', 0.05)
In [5]: me_gbm.add_curve('discount_curve', csr)
Second, we instantiate a model simulation object:
In [6]: from dx_simulation import *
In [7]: gbm = geometric_brownian_motion('gbm', me_gbm)
Third, we can work with the object. For example, let us generate and inspect the time_grid. You will notice that we have 13 datetime objects in the time_grid array object (all the month ends in the relevant year, plus the pricing_date):
In [8]: gbm.generate_time_grid()
In [9]: gbm.time_grid
Out[9]: array([datetime.datetime(2015, 1, 1, 0, 0),
               datetime.datetime(2015, 1, 31, 0, 0),
               datetime.datetime(2015, 2, 28, 0, 0),
               datetime.datetime(2015, 3, 31, 0, 0),
               datetime.datetime(2015, 4, 30, 0, 0),
               datetime.datetime(2015, 5, 31, 0, 0),
               datetime.datetime(2015, 6, 30, 0, 0),
               datetime.datetime(2015, 7, 31, 0, 0),
               datetime.datetime(2015, 8, 31, 0, 0),
               datetime.datetime(2015, 9, 30, 0, 0),
               datetime.datetime(2015, 10, 31, 0, 0),
               datetime.datetime(2015, 11, 30, 0, 0),
               datetime.datetime(2015, 12, 31, 0, 0)], dtype=object)
Next, we might ask for the simulated instrument values:
In [10]: %time paths_1 = gbm.get_instrument_values()
Out[10]: CPU times: user 10.7 ms, sys: 2.91 ms, total: 13.6 ms
         Wall time: 12.8 ms
In [11]: paths_1
Out[11]: array([[ 36.        ,  36.        ,  36.        , ...,  36.        ,
                  36.        ,  36.        ],
                [ 37.37221481,  38.08890977,  34.37156575, ...,  36.22258915,
                  35.05503522,  39.63544014],
                [ 39.45866146,  42.18817025,  32.38579992, ...,  34.80319951,
                  33.60600939,  37.62733874],
                ...,
                [ 40.15717404,  33.16701733,  23.32556112, ...,  37.5619937 ,
                  29.89282508,  30.2202427 ],
                [ 42.0974104 ,  36.59006321,  21.70771374, ...,  35.70950512,
                  30.64670854,  30.45901309],
                [ 43.33170027,  37.42993532,  23.8840177 , ...,  35.92624556,
                  27.87720187,  28.77424561]])
Let us generate instrument values for a higher volatility as well:
In [12]: gbm.update(volatility=0.5)
In [13]: %time paths_2 = gbm.get_instrument_values()
Out[13]: CPU times: user 9.78 ms, sys: 1.36 ms, total: 11.1 ms
         Wall time: 10.2 ms
The difference in the two sets of paths is illustrated in Figure 16-1:
In [14]: import matplotlib.pyplot as plt
         %matplotlib inline
         plt.figure(figsize=(8, 4))
         p1 = plt.plot(gbm.time_grid, paths_1[:, :10], 'b')
         p2 = plt.plot(gbm.time_grid, paths_2[:, :10], 'r-.')
         plt.grid(True)
         l1 = plt.legend([p1[0], p2[0]],
                         ['low volatility', 'high volatility'], loc=2)
         plt.gca().add_artist(l1)
         plt.xticks(rotation=30)
Figure 16-1. Simulated paths from GBM simulation class
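Jump Diffusion
Equipped with the background knowledge from the geometric_brownian_motion class, it is now straightforward to implement a class for the jump diffusion model described by Merton (1976). Recall the stochastic differential equation of the jump diffusion, as shown in Equation 16-3 (see also Equation 10-8 in Chapter 10, in particular for the meaning of the parameters and variables).
Equation 16-3. Stochastic differential equation for Merton jump diffusion model
$$dS_t = (r - r_J) S_t \, dt + \sigma S_t \, dZ_t + J_t S_t \, dN_t$$
An Euler discretization for simulation purposes is presented in Equation 16-4 (see also Equation 10-9 in Chapter 10 and the more detailed explanations given there).
Equation 16-4. Euler discretization for Merton jump diffusion model
$$S_t = S_{t - \Delta t} \left( e^{\left(r - r_J - \sigma^2 / 2\right) \Delta t + \sigma \sqrt{\Delta t} \, z_t^1} + \left(e^{\mu_J + \delta z_t^2} - 1\right) y_t \right)$$
Here, $z_t^1$ and $z_t^2$ are standard normally distributed random numbers, $y_t$ is Poisson distributed with intensity $\lambda \Delta t$, and $r_J = \lambda \left(e^{\mu_J + \delta^2 / 2} - 1\right)$ is the drift correction for the jump component.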
The Simulation Class
Example 16-4 presents the Python code for the jump_diffusion simulation class. This class should by now contain no surprises. Of course, the model is different, but the design and the methods are essentially the same.
Example 16-4. Simulation class for jump diffusion
#
# DX Library Simulation
# jump_diffusion.py
#
import numpy as np
from sn_random_numbers import sn_random_numbers
from simulation_class import simulation_class


class jump_diffusion(simulation_class):
    ''' Class to generate simulated paths based on
    the Merton (1976) jump diffusion model.

    Attributes
    ==========
    name : string
        name of the object
    mar_env : instance of market_environment
        market environment data for simulation
    corr : Boolean
        True if correlated with other model object

    Methods
    =======
    update :
        updates parameters
    generate_paths :
        returns Monte Carlo paths given the market environment
    '''

    def __init__(self, name, mar_env, corr=False):
        super(jump_diffusion, self).__init__(name, mar_env, corr)
        try:
            # additional parameters needed
            self.lamb = mar_env.get_constant('lambda')
            self.mu = mar_env.get_constant('mu')
            self.delt = mar_env.get_constant('delta')
        except:
            print("Error parsing market environment.")

    def update(self, initial_value=None, volatility=None, lamb=None,
               mu=None, delta=None, final_date=None):
        if initial_value is not None:
            self.initial_value = initial_value
        if volatility is not None:
            self.volatility = volatility
        if lamb is not None:
            self.lamb = lamb
        if mu is not None:
            self.mu = mu
        if delta is not None:
            self.delt = delta
        if final_date is not None:
            self.final_date = final_date
        self.instrument_values = None

    def generate_paths(self, fixed_seed=False, day_count=365.):
        if self.time_grid is None:
            # method from generic simulation class
            self.generate_time_grid()
        # number of dates for time grid
        M = len(self.time_grid)
        # number of paths
        I = self.paths
        # array initialization for path simulation
        paths = np.zeros((M, I))
        # initialize first date with initial_value
        paths[0] = self.initial_value
        if self.correlated is False:
            # if not correlated, generate random numbers
            sn1 = sn_random_numbers((1, M, I),
                                    fixed_seed=fixed_seed)
        else:
            # if correlated, use random number object as provided
            # in market environment
            sn1 = self.random_numbers

        # standard normally distributed pseudorandom numbers
        # for the jump component
        sn2 = sn_random_numbers((1, M, I),
                                fixed_seed=fixed_seed)

        rj = self.lamb * (np.exp(self.mu + 0.5 * self.delt ** 2) - 1)

        short_rate = self.discount_curve.short_rate
        for t in range(1, len(self.time_grid)):
            # select the right time slice from the relevant
            # random number set
            if self.correlated is False:
                ran = sn1[t]
            else:
                # only with correlation in portfolio context
                ran = np.dot(self.cholesky_matrix, sn1[:, t, :])
                ran = ran[self.rn_set]
            # difference between two dates as year fraction
            dt = (self.time_grid[t] - self.time_grid[t - 1]).days / day_count
            # Poisson-distributed pseudorandom numbers for jump component
            poi = np.random.poisson(self.lamb * dt, I)
            paths[t] = paths[t - 1] * (np.exp((short_rate - rj
                                        - 0.5 * self.volatility ** 2) * dt
                                    + self.volatility * np.sqrt(dt) * ran)
                                    + (np.exp(self.mu + self.delt *
                                        sn2[t]) - 1) * poi)
        self.instrument_values = paths
Of course, since we are dealing now with a different model, we need a different set of elements in the market_environment object. In addition to those for the geometric_brownian_motion class (see Table 16-1), there are three additions, as outlined in Table 16-2: namely, the parameters of the log-normal jump component, lambda, mu, and delta.
Table 16-2. Specific elements of market environment for jump_diffusion class
Element   Type       Mandatory   Description
lambda    Constant   Yes         Jump intensity (probability p.a.)
mu        Constant   Yes         Expected jump size
delta     Constant   Yes         Standard deviation of jump size
For the generation of the paths, this class of course needs further random numbers because of the jump component. Inline comments in the method generate_paths highlight the two spots where these additional random numbers are generated. For the generation of Poisson-distributed random numbers, see also Chapter 10.
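To make the role of the additional random numbers concrete, the following sketch reproduces the discretization step of the jump diffusion without the class infrastructure. It is an illustration under stated assumptions: the function name simulate_jump_diffusion_paths is hypothetical, the time grid is given as year fractions, and the two sets of standard normal numbers are drawn independently from NumPy's default generator instead of via sn_random_numbers.

```python
import numpy as np


def simulate_jump_diffusion_paths(initial_value, volatility, short_rate,
                                  lamb, mu, delta, year_fractions,
                                  n_paths, seed=None):
    """Euler scheme for the Merton (1976) jump diffusion (illustrative)."""
    rng = np.random.default_rng(seed)
    times = np.asarray(year_fractions, dtype=float)
    paths = np.zeros((len(times), n_paths))
    paths[0] = initial_value
    # drift correction for the jump component
    rj = lamb * (np.exp(mu + 0.5 * delta ** 2) - 1)
    for t in range(1, len(times)):
        dt = times[t] - times[t - 1]
        # first set: standard normal numbers for the diffusion part
        sn1 = rng.standard_normal(n_paths)
        # second set: standard normal numbers for the jump sizes
        sn2 = rng.standard_normal(n_paths)
        # third set: Poisson-distributed number of jumps per interval
        poi = rng.poisson(lamb * dt, n_paths)
        paths[t] = paths[t - 1] * (
            np.exp((short_rate - rj - 0.5 * volatility ** 2) * dt
                   + volatility * np.sqrt(dt) * sn1)
            + (np.exp(mu + delta * sn2) - 1) * poi)
    return paths


grid = np.linspace(0.0, 1.0, 13)
jd_paths = simulate_jump_diffusion_paths(36.0, 0.2, 0.05, 0.3, -0.75, 0.1,
                                         grid, 10000, seed=1000)
```

Because this is an Euler scheme, a single step with several large negative jumps can in principle push a path below zero; the sketch does not guard against that.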
A Use Case
In what follows, we again illustrate the use of the simulation class jump_diffusion interactively. We make use of the market_environment object defined for the GBM object in the previous section:
In [15]: me_jd = market_environment('me_jd', dt.datetime(2015, 1, 1))
In [16]: # add jump diffusion specific parameters
         me_jd.add_constant('lambda', 0.3)
         me_jd.add_constant('mu', -0.75)
         me_jd.add_constant('delta', 0.1)
To this environment, we add the complete environment of the GBM simulation class, which completes the input needed:
In [17]: me_jd.add_environment(me_gbm)
Based on this market_environment object, we can instantiate the simulation class for the jump diffusion:
In [18]: from jump_diffusion import jump_diffusion
In [19]: jd = jump_diffusion('jd', me_jd)
Due to the modeling approach we have implemented, the generation of instrument values is now formally the same. The method call in this case is a bit slower, however, since we need to simulate more numerical values due to the jump component:
In [20]: %time paths_3 = jd.get_instrument_values()
Out[20]: CPU times: user 19.7 ms, sys: 2.92 ms, total: 22.6 ms
         Wall time: 21.9 ms
With the aim of again comparing two different sets of paths, change, for example, the jump probability:
In [21]: jd.update(lamb=0.9)
In [22]: %time paths_4 = jd.get_instrument_values()
Out[22]: CPU times: user 26.3 ms, sys: 2.07 ms, total: 28.4 ms
         Wall time: 27.7 ms
Figure 16-2 compares a couple of simulated paths from the two sets with low and high intensity (jump probability), respectively. You can spot a few jumps for the low intensity case and multiple jumps for the high intensity case in the figure:
In [23]: plt.figure(figsize=(8, 4))
         p1 = plt.plot(gbm.time_grid, paths_3[:, :10], 'b')
         p2 = plt.plot(gbm.time_grid, paths_4[:, :10], 'r-.')
         plt.grid(True)
         l1 = plt.legend([p1[0], p2[0]],
                         ['low intensity', 'high intensity'], loc=3)
         plt.gca().add_artist(l1)
         plt.xticks(rotation=30)
Figure 16-2. Simulated paths from jump diffusion simulation class
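Square-Root Diffusion
The third stochastic process to be simulated is the square-root diffusion as used by Cox, Ingersoll, and Ross (1985) to model stochastic short rates. Equation 16-5 shows the stochastic differential equation of the process (see also Equation 10-4 in Chapter 10 for further details).
Equation 16-5. Stochastic differential equation of square-root diffusion
$$dx_t = \kappa (\theta - x_t) \, dt + \sigma \sqrt{x_t} \, dZ_t$$
We use the discretization scheme as presented in Equation 16-6 (see also Equation 10-5 in Chapter 10, as well as Equation 10-6, for an alternative, exact scheme).
Equation 16-6. Euler discretization for square-root diffusion (full truncation scheme)
$$\tilde{x}_t = \tilde{x}_s + \kappa (\theta - \tilde{x}_s^+) \Delta t + \sigma \sqrt{\tilde{x}_s^+} \sqrt{\Delta t} \, z_t$$
$$x_t = \tilde{x}_t^+$$
with $s = t - \Delta t$, $x^+ \equiv \max(x, 0)$, and $z_t$ standard normally distributed.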
The Simulation Class
Example 16-5 presents the Python code for the square_root_diffusion simulation class. Apart from, of course, a different model and discretization scheme, the class does not contain anything new compared to the other two specialized classes.
Example 16-5. Simulation class for square-root diffusion
#
# DX Library Simulation
# square_root_diffusion.py
#
import numpy as np
from sn_random_numbers import sn_random_numbers
from simulation_class import simulation_class


class square_root_diffusion(simulation_class):
    ''' Class to generate simulated paths based on
    the Cox-Ingersoll-Ross (1985) square-root diffusion model.

    Attributes
    ==========
    name : string
        name of the object
    mar_env : instance of market_environment
        market environment data for simulation
    corr : Boolean
        True if correlated with other model object

    Methods
    =======
    update :
        updates parameters
    generate_paths :
        returns Monte Carlo paths given the market environment
    '''

    def __init__(self, name, mar_env, corr=False):
        super(square_root_diffusion, self).__init__(name, mar_env, corr)
        try:
            self.kappa = mar_env.get_constant('kappa')
            self.theta = mar_env.get_constant('theta')
        except:
            print("Error parsing market environment.")

    def update(self, initial_value=None, volatility=None, kappa=None,
               theta=None, final_date=None):
        if initial_value is not None:
            self.initial_value = initial_value
        if volatility is not None:
            self.volatility = volatility
        if kappa is not None:
            self.kappa = kappa
        if theta is not None:
            self.theta = theta
        if final_date is not None:
            self.final_date = final_date
        self.instrument_values = None

    def generate_paths(self, fixed_seed=True, day_count=365.):
        if self.time_grid is None:
            self.generate_time_grid()
        M = len(self.time_grid)
        I = self.paths
        paths = np.zeros((M, I))
        paths_ = np.zeros_like(paths)
        paths[0] = self.initial_value
        paths_[0] = self.initial_value
        if self.correlated is False:
            rand = sn_random_numbers((1, M, I),
                                     fixed_seed=fixed_seed)
        else:
            rand = self.random_numbers

        for t in range(1, len(self.time_grid)):
            dt = (self.time_grid[t] - self.time_grid[t - 1]).days / day_count
            if self.correlated is False:
                ran = rand[t]
            else:
                ran = np.dot(self.cholesky_matrix, rand[:, t, :])
                ran = ran[self.rn_set]

            # full truncation Euler discretization
            paths_[t] = (paths_[t - 1] + self.kappa
                         * (self.theta - np.maximum(0, paths_[t - 1, :])) * dt
                         + np.sqrt(np.maximum(0, paths_[t - 1, :]))
                         * self.volatility * np.sqrt(dt) * ran)
            paths[t] = np.maximum(0, paths_[t])
        self.instrument_values = paths
Table 16-3 lists the two elements of the market environment that are specific to this class.
Table 16-3. Specific elements of market environment for square_root_diffusion class
Element   Type       Mandatory   Description
kappa     Constant   Yes         Mean reversion factor
theta     Constant   Yes         Long-term mean of process
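The full truncation scheme keeps two arrays: an auxiliary one that may become negative and the reported one, which is floored at zero. The sketch below shows this mechanism in isolation. It is not the DX class; the function name simulate_srd_paths is hypothetical, the time grid is given as year fractions, and NumPy's default generator replaces sn_random_numbers.

```python
import numpy as np


def simulate_srd_paths(initial_value, kappa, theta, volatility,
                       year_fractions, n_paths, seed=None):
    """Full truncation Euler scheme for a square-root diffusion."""
    rng = np.random.default_rng(seed)
    times = np.asarray(year_fractions, dtype=float)
    paths = np.zeros((len(times), n_paths))
    # auxiliary process that is allowed to become negative
    paths_ = np.zeros_like(paths)
    paths[0] = initial_value
    paths_[0] = initial_value
    for t in range(1, len(times)):
        dt = times[t] - times[t - 1]
        ran = rng.standard_normal(n_paths)
        # positive part of the previous auxiliary values
        prev = np.maximum(0, paths_[t - 1])
        paths_[t] = (paths_[t - 1] + kappa * (theta - prev) * dt
                     + volatility * np.sqrt(prev) * np.sqrt(dt) * ran)
        # reported values are floored at zero
        paths[t] = np.maximum(0, paths_[t])
    return paths


# weekly grid over one year, parameters as in the use case below
grid = np.linspace(0.0, 1.0, 53)
srd_paths = simulate_srd_paths(0.25, 4.0, 0.2, 0.05, grid, 10000, seed=1000)
```

With a mean reversion factor of 4.0, the cross-sectional average of the paths at the final date should lie very close to the long-term mean theta.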
A Use Case
A rather brief use case illustrates the use of the simulation class. As usual, we need a market environment, for example to model a volatility (index) process:
In [35]: me_srd = market_environment('me_srd', dt.datetime(2015, 1, 1))
In [36]: me_srd.add_constant('initial_value', .25)
         me_srd.add_constant('volatility', 0.05)
         me_srd.add_constant('final_date', dt.datetime(2015, 12, 31))
         me_srd.add_constant('currency', 'EUR')
         me_srd.add_constant('frequency', 'W')
         me_srd.add_constant('paths', 10000)
Two components of the market environment are specific to the class:
In [37]: # specific to simulation class
         me_srd.add_constant('kappa', 4.0)
         me_srd.add_constant('theta', 0.2)
Although we do not need it here to implement the simulation, the generic simulation class requires a discounting object. This requirement can be justified from a risk-neutral valuation perspective, which is the overarching goal of the whole DX analytics library:
In [38]: # required but not needed for the class
         me_srd.add_curve('discount_curve', constant_short_rate('r', 0.0))
In [39]: from square_root_diffusion import square_root_diffusion
In [40]: srd = square_root_diffusion('srd', me_srd)
As before, we get simulation paths, given the market_environment object as input, by calling the get_instrument_values method:
In [41]: srd_paths = srd.get_instrument_values()[:, :10]
Figure 16-3 illustrates the mean-reverting characteristic by showing how the single simulated paths on average revert to the long-term mean theta (dashed line):
In [42]: plt.figure(figsize=(8, 4))
         plt.plot(srd.time_grid, srd.get_instrument_values()[:, :10])
         plt.axhline(me_srd.get_constant('theta'), color='r', ls='--', lw=2.0)
         plt.grid(True)
         plt.xticks(rotation=30)
Figure 16-3. Simulated paths from square-root diffusion simulation class (dashed line = long-term mean theta)
Conclusions
This chapter develops all the tools and classes needed for the simulation of the three stochastic processes of interest: geometric Brownian motions, jump diffusions, and square-root diffusions. The chapter presents a function to conveniently generate standard normally distributed random numbers. It then proceeds by introducing a generic model simulation class. Based on this foundation, the chapter introduces three specialized simulation classes and presents use cases for these classes.
To simplify future imports, we can again use a wrapper module called dx_simulation.py, as presented in Example 16-6.
Example 16-6. Wrapper module for simulation components
#
# DX Library Simulation
# dx_simulation.py
#
import numpy as np
import pandas as pd

from dx_frame import *
from sn_random_numbers import sn_random_numbers
from simulation_class import simulation_class
from geometric_brownian_motion import geometric_brownian_motion
from jump_diffusion import jump_diffusion
from square_root_diffusion import square_root_diffusion
As with the first wrapper module, dx_frame.py, the benefit is that a single import statement makes available all simulation components in a single step:
from dx_simulation import *
Since dx_simulation.py also imports everything from dx_frame.py, this single import in fact exposes all functionality developed so far. The same holds true for the enhanced init file in the dx directory, as shown in Example 16-7.
Example 16-7. Enhanced Python packaging file
#
# DX Library
# packaging file
# __init__.py
#
import numpy as np
import pandas as pd
import datetime as dt

# frame
from get_year_deltas import get_year_deltas
from constant_short_rate import constant_short_rate
from market_environment import market_environment

# simulation
from sn_random_numbers import sn_random_numbers
from simulation_class import simulation_class
from geometric_brownian_motion import geometric_brownian_motion
from jump_diffusion import jump_diffusion
from square_root_diffusion import square_root_diffusion
Further Reading
Useful references in book form for the topics covered in this chapter are:
Glasserman, Paul (2004): Monte Carlo Methods in Financial Engineering. Springer, New York.
Hilpisch, Yves (2015): Derivatives Analytics with Python. Wiley Finance, Chichester, England. http://derivatives-analytics-with-python.com.
Original papers cited in this chapter are:
Black, Fischer and Myron Scholes (1973): “The Pricing of Options and Corporate Liabilities.” Journal of Political Economy, Vol. 81, No. 3, pp. 638–659.
Cox, John, Jonathan Ingersoll, and Stephen Ross (1985): “A Theory of the Term Structure of Interest Rates.” Econometrica, Vol. 53, No. 2, pp. 385–407.
Merton, Robert (1973): “Theory of Rational Option Pricing.” Bell Journal of Economics and Management Science, Vol. 4, pp. 141–183.
Merton, Robert (1976): “Option Pricing When the Underlying Stock Returns Are Discontinuous.” Journal of Financial Economics, Vol. 3, No. 3, pp. 125–144.
[70] We speak of “random” numbers knowing that they are in general “pseudorandom” only.
[71] Cf. Glasserman (2004), Chapter 2, on generating random numbers and random variables.
[72] Glasserman (2004) presents in Chapter 4 an overview and theoretical details of different variance reduction techniques.
Chapter 17. Derivatives Valuation
Derivatives are a huge, complex issue.
- Judd Gregg
Options and derivatives valuation has long been the domain of so-called rocket scientists on Wall Street, i.e., people with a Ph.D. in physics or a similarly demanding discipline when it comes to the mathematics involved. However, the application of the models by the means of numerical methods like Monte Carlo simulation is generally a little less involved than the theoretical models themselves.
This is particularly true for the valuation of options and derivatives with European exercise, i.e., where exercise is only possible at a certain, predetermined date. It is a bit less true for options and derivatives with American exercise, where exercise is allowed at any point over a prespecified period of time. This chapter introduces and uses the Least-Squares Monte Carlo (LSM) algorithm, which has become a benchmark algorithm when it comes to American options valuation based on Monte Carlo simulation.
The current chapter is similar in structure to Chapter 16 in that it first introduces a generic valuation class and then provides two specialized valuation classes, one for European exercise and another one for American exercise.
The generic valuation class contains methods to numerically estimate the most important Greeks of an option: the Delta and the Vega. Therefore, the valuation classes are important not only for valuation purposes, but also for risk management purposes.
Generic Valuation Class
As with the generic simulation class, we instantiate an object of the valuation class by providing only a few inputs (in this case, four):
name
    A string object as a name for the object
underlying
    An instance of a simulation class representing the underlying
mar_env
    An instance of the market_environment class
payoff_func
    A Python string containing the payoff function for the option/derivative
The generic class has three methods:
update
    This method updates selected valuation parameters (attributes).
delta
    This method calculates a numerical value for the Delta of an option/derivative.
vega
    This method calculates the Vega of an option/derivative.
Equipped with the background knowledge from the previous chapters about the DX library, the generic valuation class as presented in Example 17-1 should be almost self-explanatory; where appropriate, inline comments are also provided. We again present the class in its entirety first and highlight selected topics immediately afterward and in the subsequent sections.
Example 17-1. Generic valuation class
#
# DX Library Valuation
# valuation_class.py
#


class valuation_class(object):
    ''' Basic class for single-factor valuation.

    Attributes
    ==========
    name : string
        name of the object
    underlying :
        instance of simulation class
    mar_env : instance of market_environment
        market environment data for valuation
    payoff_func : string
        derivatives payoff in Python syntax
        Example: 'np.maximum(maturity_value - 100, 0)'
        where maturity_value is the NumPy vector with
        respective values of the underlying
        Example: 'np.maximum(instrument_values - 100, 0)'
        where instrument_values is the NumPy matrix with
        values of the underlying over the whole time/path grid

    Methods
    =======
    update:
        updates selected valuation parameters
    delta :
        returns the Delta of the derivative
    vega :
        returns the Vega of the derivative
    '''

    def __init__(self, name, underlying, mar_env, payoff_func=''):
        try:
            self.name = name
            self.pricing_date = mar_env.pricing_date
            try:
                # strike is optional
                self.strike = mar_env.get_constant('strike')
            except:
                pass
            self.maturity = mar_env.get_constant('maturity')
            self.currency = mar_env.get_constant('currency')
            # simulation parameters and discount curve from
            # simulation object
            self.frequency = underlying.frequency
            self.paths = underlying.paths
            self.discount_curve = underlying.discount_curve
            self.payoff_func = payoff_func
            self.underlying = underlying
            # provide pricing_date and maturity to underlying
            self.underlying.special_dates.extend([self.pricing_date,
                                                  self.maturity])
        except:
            print("Error parsing market environment.")

    def update(self, initial_value=None, volatility=None,
               strike=None, maturity=None):
        if initial_value is not None:
            self.underlying.update(initial_value=initial_value)
        if volatility is not None:
            self.underlying.update(volatility=volatility)
        if strike is not None:
            self.strike = strike
        if maturity is not None:
            self.maturity = maturity
            # add new maturity date if not in time_grid
            if maturity not in self.underlying.time_grid:
                self.underlying.special_dates.append(maturity)
                self.underlying.instrument_values = None

    def delta(self, interval=None, accuracy=4):
        if interval is None:
            interval = self.underlying.initial_value / 50.
        # forward-difference approximation
        # calculate left value for numerical Delta
        value_left = self.present_value(fixed_seed=True)
        # numerical underlying value for right value
        initial_del = self.underlying.initial_value + interval
        self.underlying.update(initial_value=initial_del)
        # calculate right value for numerical Delta
        value_right = self.present_value(fixed_seed=True)
        # reset the initial_value of the simulation object
        self.underlying.update(initial_value=initial_del - interval)
        delta = (value_right - value_left) / interval
        # correct for potential numerical errors
        if delta < -1.0:
            return -1.0
        elif delta > 1.0:
            return 1.0
        else:
            return round(delta, accuracy)

    def vega(self, interval=0.01, accuracy=4):
        if interval < self.underlying.volatility / 50.:
            interval = self.underlying.volatility / 50.
        # forward-difference approximation
        # calculate the left value for numerical Vega
        value_left = self.present_value(fixed_seed=True)
        # numerical volatility value for right value
        vola_del = self.underlying.volatility + interval
        # update the simulation object
        self.underlying.update(volatility=vola_del)
        # calculate the right value for numerical Vega
        value_right = self.present_value(fixed_seed=True)
        # reset volatility value of simulation object
        self.underlying.update(volatility=vola_del - interval)
        vega = (value_right - value_left) / interval
        return round(vega, accuracy)
One topic covered by the generic valuation_class class is the estimation of Greeks. This is something we should take a closer look at. To this end, consider that we have a continuously differentiable function $V(S_0, \sigma_0)$ available that represents the present value of an option. The Delta of the option is then defined as the first partial derivative with respect to the current value of the underlying $S_0$; i.e., $\Delta = \partial V / \partial S_0$.
Suppose now that we have from Monte Carlo valuation (see Chapter 10 and subsequent sections in this chapter) a numerical Monte Carlo estimator $\bar{V}(S_0, \sigma_0)$ for the option value. A numerical approximation for the Delta of the option is then given in Equation 17-1.[73] This is what the delta method of the generic valuation class implements. The method assumes the existence of a present_value method that returns the Monte Carlo estimator given a certain set of parameter values.
Equation 17-1. Numerical Delta of an option
$$\bar{\Delta} = \frac{\bar{V}(S_0 + \Delta S, \sigma_0) - \bar{V}(S_0, \sigma_0)}{\Delta S}, \quad \Delta S > 0$$
Similarly, the Vega of the instrument is defined as the first partial derivative of the present value with respect to the current (instantaneous) volatility $\sigma_0$, i.e., $\mathbf{V} = \partial V / \partial \sigma_0$. Again assuming the existence of a Monte Carlo estimator for the value of the option, Equation 17-2 provides a numerical approximation for the Vega. This is what the vega method of the valuation_class class implements.
Equation 17-2. Numerical Vega of an option
$$\bar{\mathbf{V}} = \frac{\bar{V}(S_0, \sigma_0 + \Delta \sigma) - \bar{V}(S_0, \sigma_0)}{\Delta \sigma}, \quad \Delta \sigma > 0$$
Note that the discussion of Delta and Vega is based only on the existence of either a differentiable function or a Monte Carlo estimator for the present value of an option. This is the very reason why we can define methods to numerically estimate these quantities without knowledge of the exact definition and numerical implementation of the Monte Carlo estimator.
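The forward-difference idea can be tried out on any function that returns a present value. The sketch below uses the Black-Scholes-Merton closed-form value of a European call as a stand-in for the present_value method; this substitution, as well as the names bsm_call_value, numerical_delta, and numerical_vega, are assumptions made for illustration and are not part of the DX library. The interval rules mirror those of the delta and vega methods.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bsm_call_value(s0, strike, t, r, sigma):
    """Black-Scholes-Merton value of a European call option."""
    d1 = (log(s0 / strike) + (r + 0.5 * sigma ** 2) * t) / (sigma * sqrt(t))
    d2 = d1 - sigma * sqrt(t)
    return s0 * norm_cdf(d1) - strike * exp(-r * t) * norm_cdf(d2)


def numerical_delta(value_func, s0, sigma, interval=None):
    """Forward-difference Delta; value_func maps (s0, sigma) to a value."""
    if interval is None:
        interval = s0 / 50.
    value_left = value_func(s0, sigma)
    value_right = value_func(s0 + interval, sigma)
    return (value_right - value_left) / interval


def numerical_vega(value_func, s0, sigma, interval=0.01):
    """Forward-difference Vega; value_func maps (s0, sigma) to a value."""
    if interval < sigma / 50.:
        interval = sigma / 50.
    value_left = value_func(s0, sigma)
    value_right = value_func(s0, sigma + interval)
    return (value_right - value_left) / interval


def call_value(s0, sigma):
    # strike 40, one year to maturity, short rate 6%
    return bsm_call_value(s0, 40., 1.0, 0.06, sigma)


delta_fd = numerical_delta(call_value, 36., 0.2)
vega_fd = numerical_vega(call_value, 36., 0.2)
```

When the value function is a Monte Carlo estimator rather than a closed-form formula, both evaluations should use the same random numbers; otherwise the simulation noise is divided by a small interval and dominates the estimate. This is why the class methods call present_value with fixed_seed=True.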
European Exercise
The first case to which we want to specialize the generic valuation class is European exercise. To this end, consider the following simplified recipe to generate a Monte Carlo estimator for an option value:
1. Simulate the relevant underlying risk factor S under the risk-neutral measure I times to come up with as many simulated values of the underlying at the maturity of the option T, i.e., $\bar{S}_T(i), i \in \{1, 2, \ldots, I\}$.
2. Calculate the payoff $h_T$ of the option at maturity for every simulated value of the underlying, i.e., $h_T(\bar{S}_T(i)), i \in \{1, 2, \ldots, I\}$.
3. Derive the Monte Carlo estimator for the option's present value as $\bar{V}_0 \equiv e^{-rT} \frac{1}{I} \sum_{i=1}^{I} h_T(\bar{S}_T(i))$.
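The three steps of the recipe translate almost line by line into NumPy. The following sketch assumes a geometric Brownian motion for the underlying and simulates the value at maturity in a single step; the function name mcs_european_value and its signature are hypothetical and chosen only for this illustration.

```python
import numpy as np


def mcs_european_value(s0, r, sigma, t, payoff, n_paths=100000, seed=1000):
    """Monte Carlo estimator for a European payoff under GBM (sketch)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_paths)
    # step 1: simulate the underlying at maturity, risk-neutral drift r
    s_t = s0 * np.exp((r - 0.5 * sigma ** 2) * t + sigma * np.sqrt(t) * z)
    # step 2: payoff for every simulated value
    h_t = payoff(s_t)
    # step 3: discounted average of the payoffs
    return np.exp(-r * t) * np.sum(h_t) / len(h_t)


# European call with strike 40 on an underlying starting at 36
call_value = mcs_european_value(36., 0.06, 0.2, 1.0,
                                lambda s: np.maximum(s - 40., 0))
```

For this parameterization the closed-form Black-Scholes-Merton value is roughly 2.17, so the estimate should land close to that figure, with the deviation shrinking as the number of paths grows.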
The Valuation Class
Example 17-2 shows the class implementing the present_value method based on this recipe. In addition, it contains the method generate_payoff to generate the simulated paths and the payoff of the option given the simulated paths. This, of course, builds the very basis for the Monte Carlo estimator.
Example 17-2. Valuation class for European exercise
#
# DX Library Valuation
# valuation_mcs_european.py
#
import numpy as np

from valuation_class import valuation_class


class valuation_mcs_european(valuation_class):
    ''' Class to value European options with arbitrary payoff
    by single-factor Monte Carlo simulation.

    Methods
    =======
    generate_payoff :
        returns payoffs given the paths and the payoff function
    present_value :
        returns present value (Monte Carlo estimator)
    '''

    def generate_payoff(self, fixed_seed=False):
        '''
        Parameters
        ==========
        fixed_seed : Boolean
            use same/fixed seed for valuation
        '''
        try:
            # strike defined?
            strike = self.strike
        except:
            pass
        paths = self.underlying.get_instrument_values(fixed_seed=fixed_seed)
        time_grid = self.underlying.time_grid
        try:
            time_index = np.where(time_grid == self.maturity)[0]
            time_index = int(time_index)
        except:
            print("Maturity date not in time grid of underlying.")
        maturity_value = paths[time_index]
        # average value over whole path
        mean_value = np.mean(paths[:time_index], axis=1)
        # maximum value over whole path
        max_value = np.amax(paths[:time_index], axis=1)[-1]
        # minimum value over whole path
        min_value = np.amin(paths[:time_index], axis=1)[-1]
        try:
            payoff = eval(self.payoff_func)
            return payoff
        except:
            print("Error evaluating payoff function.")

    def present_value(self, accuracy=6, fixed_seed=False, full=False):
        '''
        Parameters
        ==========
        accuracy : int
            number of decimals in returned result
        fixed_seed : Boolean
            use same/fixed seed for valuation
        full : Boolean
            return also full 1d array of present values
        '''
        cash_flow = self.generate_payoff(fixed_seed=fixed_seed)
        discount_factor = self.discount_curve.get_discount_factors(
            (self.pricing_date, self.maturity))[0, 1]
        result = discount_factor * np.sum(cash_flow) / len(cash_flow)
        if full:
            return round(result, accuracy), discount_factor * cash_flow
        else:
            return round(result, accuracy)
The generate_payoff method provides some special objects to be used for the definition of the payoff of the option:
strike is the strike of the option.
maturity_value represents the 1D ndarray object with the simulated values of the underlying at maturity of the option.
mean_value is the average of the underlying over a whole path from today until maturity.
max_value is the maximum value of the underlying over a whole path.
min_value gives the minimum value of the underlying over a whole path.
The last three especially allow for the efficient handling of options with Asian or lookback features.
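How such helper objects feed into a payoff string can be shown on a small, hand-made array. The sketch below assumes the layout used by the simulation classes (dates along axis 0, paths along axis 1) and therefore aggregates along axis 0 to obtain one statistic per path; it also includes the maturity date in the window. Both choices, and the explicit namespace passed to eval, are assumptions of this illustration rather than a description of Example 17-2.

```python
import numpy as np

# three paths observed on four dates (rows: dates, columns: paths)
paths = np.array([[36., 36., 36.],
                  [38., 35., 37.],
                  [41., 33., 36.],
                  [39., 34., 42.]])
time_index = 3  # index of the maturity date in the time grid
strike = 40.

# one value per path: aggregate over the date axis (axis=0)
names = {
    'strike': strike,
    'maturity_value': paths[time_index],
    'mean_value': np.mean(paths[:time_index + 1], axis=0),
    'max_value': np.amax(paths[:time_index + 1], axis=0),
    'min_value': np.amin(paths[:time_index + 1], axis=0),
}

# payoff strings are evaluated against an explicit namespace
lookback_payoff = eval('np.maximum(max_value - strike, 0)',
                       {'np': np}, names)
asian_payoff = eval('np.maximum(mean_value - strike, 0)',
                    {'np': np}, names)
```

Passing the namespace explicitly keeps the set of names available to a payoff string visible in one place. Note that eval executes arbitrary code, so payoff strings should only come from trusted sources.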
A Use Case
The application of the valuation class valuation_mcs_european can best be illustrated by a specific use case. However, before a valuation class can be instantiated, we need a simulation object, i.e., an underlying for the option to be valued. From Chapter 16, we use the geometric_brownian_motion class to model the underlying. We also use the example parameterization of the respective use case there:
In [1]: from dx import *
In [2]: me_gbm = market_environment('me_gbm', dt.datetime(2015, 1, 1))
In [3]: me_gbm.add_constant('initial_value', 36.)
        me_gbm.add_constant('volatility', 0.2)
        me_gbm.add_constant('final_date', dt.datetime(2015, 12, 31))
        me_gbm.add_constant('currency', 'EUR')
        me_gbm.add_constant('frequency', 'M')
        me_gbm.add_constant('paths', 10000)
In [4]: csr = constant_short_rate('csr', 0.06)
In [5]: me_gbm.add_curve('discount_curve', csr)
In [6]: gbm = geometric_brownian_motion('gbm', me_gbm)
In addition to a simulation object, we need to provide a market environment for the option itself. It has to contain at least a maturity and a currency. Optionally, we can provide a strike:
In [7]: me_call = market_environment('me_call', me_gbm.pricing_date)
In [8]: me_call.add_constant('strike', 40.)
        me_call.add_constant('maturity', dt.datetime(2015, 12, 31))
        me_call.add_constant('currency', 'EUR')
A central element, of course, is the payoff function, provided here as a string object containing Python code that the eval function can evaluate. We want to define a European call option. Such an option has a payoff of $h_T = \max(S_T - K, 0)$, with $S_T$ being the value of the underlying at maturity and $K$ being the strike price of the option. In Python and NumPy, i.e., with vectorized storage of all simulated values, this takes on the following form:
In [9]: payoff_func = 'np.maximum(maturity_value - strike, 0)'
We can now put all the ingredients together to instantiate the valuation_mcs_european class:
In [10]: from valuation_mcs_european import valuation_mcs_european
In [11]: eur_call = valuation_mcs_european('eur_call', underlying=gbm,
                                 mar_env=me_call, payoff_func=payoff_func)
With this valuation object available, all quantities of interest are only one method call away. Let us start with the present value of the option:
In [12]: %time eur_call.present_value()
Out[12]: CPU times: user 41.7 ms, sys: 11 ms, total: 52.7 ms
         Wall time: 44.6 ms
         2.180511
The Delta of the option is, as expected for a European call option, positive, i.e., the present value of the option increases with increasing initial value of the underlying:
In [13]: %time eur_call.delta()
Out[13]: CPU times: user 10.9 ms, sys: 1.09 ms, total: 12 ms
         Wall time: 11.1 ms
         0.4596
The Vega is calculated similarly. It approximates the change in the present value of the option per unit change in the initial volatility; for an increase of one percentage point (e.g., from 24% to 25%), the present value changes by roughly 0.01 times the Vega. The Vega is positive for both European put and call options:
In [14]: %time eur_call.vega()
Out[14]: CPU times: user 15.2 ms, sys: 1.34 ms, total: 16.5 ms
         Wall time: 15.6 ms
         14.2782
Once we have the valuation object, a more comprehensive analysis of the present value and the Greeks is easily implemented. The following code calculates the present value, Delta, and Vega for initial values of the underlying ranging from 34 to 46 EUR:
In [15]: %%time
         s_list = np.arange(34., 46.1, 2.)
         p_list = []; d_list = []; v_list = []
         for s in s_list:
             eur_call.update(initial_value=s)
             p_list.append(eur_call.present_value(fixed_seed=True))
             v_list.append(eur_call.vega())
             d_list.append(eur_call.delta())
Out[15]: CPU times: user 239 ms, sys: 8.18 ms, total: 248 ms
         Wall time: 248 ms
Equipped with all these values, we can graphically inspect the results. To this end, we use a helper function as shown in Example 17-3.
Example 17-3. Helper function to plot options statistics
#
# DX Library Valuation
# plot_option_stats.py
#
import matplotlib.pyplot as plt


def plot_option_stats(s_list, p_list, d_list, v_list):
    ''' Plots option prices, Deltas, and Vegas for a set of
    different initial values of the underlying.

    Parameters
    ==========
    s_list : array or list
        set of initial values of the underlying
    p_list : array or list
        present values
    d_list : array or list
        results for Deltas
    v_list : array or list
        results for Vegas
    '''
    plt.figure(figsize=(9, 7))
    sub1 = plt.subplot(311)
    plt.plot(s_list, p_list, 'ro', label='present value')
    plt.plot(s_list, p_list, 'b')
    plt.grid(True); plt.legend(loc=0)
    plt.setp(sub1.get_xticklabels(), visible=False)
    sub2 = plt.subplot(312)
    plt.plot(s_list, d_list, 'go', label='Delta')
    plt.plot(s_list, d_list, 'b')
    plt.grid(True); plt.legend(loc=0)
    plt.ylim(min(d_list) - 0.1, max(d_list) + 0.1)
    plt.setp(sub2.get_xticklabels(), visible=False)
    sub3 = plt.subplot(313)
    plt.plot(s_list, v_list, 'yo', label='Vega')
    plt.plot(s_list, v_list, 'b')
    plt.xlabel('initial value of underlying')
    plt.grid(True); plt.legend(loc=0)
Importingt his functionandprovidingthevaluationresultstoit generatesa
picturelikethatshowninFigure17­1:
In [16]: from plot_option_stats
%matplotlib inline import plot_option_stats
In [17]: plot_option_stats(s_list, p_list, d_list, v_list)
Figure17­1.Presentvalue,Delta,andVegaestimatesforEuropeancalloption
This illustrates that working with the DX library, although heavy numerics are involved, boils down to an approach that is comparable to having a closed-form option pricing formula available. However, this approach does not only apply to such simple payoffs as the one considered so far. With exactly the same approach, we can handle much more complex payoffs. To this end, consider the following payoff, a mixture of a regular and an Asian payoff:
In [18]: payoff_func = 'np.maximum(0.33 * (maturity_value + max_value) - 40, 0)'
           # payoff dependent on both the simulated maturity value
           # and the maximum value
Everything else shall remain the same:
In [19]: eur_as_call = valuation_mcs_european('eur_as_call',
                                 underlying=gbm, mar_env=me_call,
                                 payoff_func=payoff_func)
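How such a payoff string turns into numbers can be sketched independently of the DX classes. The following is only an illustration under stated assumptions: the helper evaluate_payoff and the randomly generated paths are hypothetical, and only the names maturity_value and max_value are taken from the payoff string above. The sketch evaluates the string against an explicit namespace, so that only the intended names are visible to eval:

```python
import numpy as np


def evaluate_payoff(payoff_func, paths, strike=None):
    """Evaluate a payoff string per path (hypothetical helper).

    paths has shape (time steps + 1, number of paths).
    """
    # names made available to the payoff string
    namespace = {
        'maturity_value': paths[-1],          # value at maturity per path
        'max_value': np.amax(paths, axis=0),  # maximum over time per path
        'strike': strike,
    }
    return eval(payoff_func, {'np': np}, namespace)


# illustrative paths only (not the simulation classes of the library)
rng = np.random.default_rng(42)
steps, n_paths = 50, 1000
log_returns = rng.normal(0.0, 0.03, size=(steps, n_paths))
paths = 36.0 * np.exp(np.vstack([np.zeros((1, n_paths)),
                                 np.cumsum(log_returns, axis=0)]))

payoff_func = 'np.maximum(0.33 * (maturity_value + max_value) - 40, 0)'
payoff = evaluate_payoff(payoff_func, paths)
print(payoff.shape)
```

Since eval executes arbitrary code, payoff strings should only come from trusted sources.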
All statistics, of course, change in this case:
In [20]: %%time
         s_list = np.arange(34., 46.1, 2.)
         p_list = []; d_list = []; v_list = []
         for s in s_list:
             eur_as_call.update(s)
             p_list.append(eur_as_call.present_value(fixed_seed=True))
             d_list.append(eur_as_call.delta())
             v_list.append(eur_as_call.vega())
Out[20]: CPU times: user 286 ms, sys: 14.5 ms, total: 300 ms
         Wall time: 303 ms
Figure 17-2 shows that Delta becomes 1 when the initial value of the underlying reaches the strike price of 40 in this case. Every (marginal) increase of the initial value of the underlying leads to the same (marginal) increase in the option's value from this particular point on:
In [21]: plot_option_stats(s_list, p_list, d_list, v_list)
Figure 17-2. Present value, Delta, and Vega estimates for European–Asian call option
American Exercise
The valuation of options with American exercise (or Bermudan exercise, to this end[74]) is much more involved than with European exercise. Therefore, we have to introduce a bit more valuation theory first before proceeding to the valuation class.
Least-Squares Monte Carlo
Although Cox, Ross, and Rubinstein (1979) presented with their binomial model a simple numerical method to value European and American options in the same framework, only with the Longstaff-Schwartz (2001) model was the valuation of American options by Monte Carlo simulation (MCS) satisfactorily solved. The major problem is that MCS per se is a forward-moving algorithm, while the valuation of American options is generally accomplished by backward induction, estimating the continuation value of the American option starting at maturity and working back to the present.
The major insight of the Longstaff-Schwartz (2001) model is to use an ordinary least-squares regression[75] to estimate the continuation value based on the cross section of all available simulated values, taking into account, per path:
The simulated value of the underlying(s)
The inner value of the option
The actual continuation value given the specific path
In discrete time, the value of a Bermudan option (and in the limit of an American option) is given by the optimal stopping problem, as presented in Equation 17-3 for a finite set of points in time 0 < t_1 < t_2 < ... < T.[76]
Equation 17-3. Optimal stopping problem in discrete time for Bermudan option
Equation 17-4 presents the continuation value of the American option at date 0 ≤ t_m < T. It is just the risk-neutral expectation at date t_m, under the martingale measure, of the value of the American option at the subsequent date t_{m+1}.
Equation 17-4. Continuation value for the American option
The value of the American option at date t_m can be shown to equal the formula in Equation 17-5, i.e., the maximum of the payoff of immediate exercise (inner value) and the expected payoff of not exercising (continuation value).
Equation 17-5. Value of American option at any given date
In Equation 17-5, the inner value is of course easily calculated. The continuation value is what makes it a bit trickier. The Longstaff-Schwartz (2001) model approximates this value by a regression, as presented in Equation 17-6. There, i stands for the current simulated path, D is the number of basis functions for the regression used, α* are the optimal regression parameters, and b_d is the regression function numbered d.
Equation 17-6. Regression-based approximation of continuation value
The optimal regression parameters are the result of the solution of the least-squares regression problem presented in Equation 17-7. Here, the regressand is the actual continuation value at date t_m for path i (and not a regressed/estimated one).
Equation 17-7. Ordinary least-squares regression
This completes the basic (mathematical) tool set to value an American option by MCS.
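The displayed formulas for Equations 17-3 through 17-7 do not survive in this text. As a hedged reconstruction, the standard Longstaff-Schwartz formulation that matches the verbal definitions above can be written as follows. The symbols are assumptions chosen here and are not confirmed by the surrounding text: r is a constant short rate, h the payoff (inner value) function, I the number of simulated paths, and Y_{t_m,i} = e^{-r(t_{m+1}-t_m)} V_{t_{m+1},i} the actual (discounted) continuation value on path i.

```latex
\begin{align}
V_0 &= \sup_{\tau \in \{0, t_1, t_2, \ldots, T\}}
       e^{-r\tau}\, \mathbf{E}_0^Q\big(h_\tau(S_\tau)\big) \tag{17-3} \\
C_{t_m}(s) &= e^{-r(t_{m+1}-t_m)}\,
       \mathbf{E}_{t_m}^Q\big(V_{t_{m+1}}(S_{t_{m+1}}) \mid S_{t_m} = s\big) \tag{17-4} \\
V_{t_m}(s) &= \max\big[h_{t_m}(s),\, C_{t_m}(s)\big] \tag{17-5} \\
\bar{C}_{t_m,i} &= \sum_{d=1}^{D} \alpha^{*}_{d,t_m}\, b_d(S_{t_m,i}) \tag{17-6} \\
&\min_{\alpha_{1,t_m},\ldots,\alpha_{D,t_m}} \frac{1}{I} \sum_{i=1}^{I}
       \Big(Y_{t_m,i} - \sum_{d=1}^{D} \alpha_{d,t_m}\, b_d(S_{t_m,i})\Big)^2 \tag{17-7}
\end{align}
```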
The Valuation Class
Example 17-4 presents the class for the valuation of options and derivatives with American exercise. There is one noteworthy step in the implementation of the LSM algorithm in the present_value method (which is also commented on inline): the optimal decision step. Here, it is important that, based on the decision that is made, the LSM algorithm takes either the inner value or the actual continuation value, and not the estimated continuation value.[77]
Example 17-4. Valuation class for American exercise
#
# DX Library Valuation
# valuation_mcs_american.py
#
import numpy as np

from valuation_class import valuation_class


class valuation_mcs_american(valuation_class):
    ''' Class to value American options with arbitrary payoff
    by single-factor Monte Carlo simulation.

    Methods
    =======
    generate_payoff :
        returns payoffs given the paths and the payoff function
    present_value :
        returns present value (LSM Monte Carlo estimator)
        according to Longstaff-Schwartz (2001)
    '''

    def generate_payoff(self, fixed_seed=False):
        '''
        Parameters
        ==========
        fixed_seed :
            use same/fixed seed for valuation
        '''
        try:
            # strike is referenced by name in the payoff string
            strike = self.strike
        except:
            pass
        paths = self.underlying.get_instrument_values(fixed_seed=fixed_seed)
        time_grid = self.underlying.time_grid
        try:
            time_index_start = int(np.where(time_grid == self.pricing_date)[0])
            time_index_end = int(np.where(time_grid == self.maturity)[0])
        except:
            print("Maturity date not in time grid of underlying.")
        instrument_values = paths[time_index_start:time_index_end + 1]
        try:
            payoff = eval(self.payoff_func)
            return instrument_values, payoff, time_index_start, time_index_end
        except:
            print("Error evaluating payoff function.")

    def present_value(self, accuracy=6, fixed_seed=False, bf=5, full=False):
        '''
        Parameters
        ==========
        accuracy : int
            number of decimals in returned result
        fixed_seed : boolean
            use same/fixed seed for valuation
        bf : int
            number of basis functions for regression
        full : Boolean
            return also full 1d array of present values
        '''
        instrument_values, inner_values, time_index_start, time_index_end = \
                    self.generate_payoff(fixed_seed=fixed_seed)
        time_list = self.underlying.time_grid[time_index_start:time_index_end + 1]
        discount_factors = self.discount_curve.get_discount_factors(
                            time_list, dtobjects=True)
        V = inner_values[-1]
        for t in range(len(time_list) - 2, 0, -1):
            # derive relevant discount factor for given time interval
            df = discount_factors[t, 1] / discount_factors[t + 1, 1]
            # regression step
            rg = np.polyfit(instrument_values[t], V * df, bf)
            # calculation of continuation values per path
            C = np.polyval(rg, instrument_values[t])
            # optimal decision step:
            # if condition is satisfied (inner value > regressed cont. value)
            # then take inner value; take actual cont. value otherwise
            V = np.where(inner_values[t] > C, inner_values[t], V * df)
        df = discount_factors[0, 1] / discount_factors[1, 1]
        result = df * np.sum(V) / len(V)
        if full:
            return round(result, accuracy), df * V
        else:
            return round(result, accuracy)
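The backward induction of the present_value method can also be illustrated without the DX classes. The following is a minimal sketch, not the library's code. It assumes a geometric Brownian motion with a constant short rate, an American put payoff, and an equidistant time grid; the function name lsm_american_put and all of its parameters are hypothetical and chosen here for illustration. It mirrors the regression step (np.polyfit), the continuation values (np.polyval), and the optimal decision step (np.where) of Example 17-4:

```python
import numpy as np


def lsm_american_put(S0=36.0, K=40.0, r=0.06, sigma=0.2, T=1.0,
                     steps=50, paths=20000, degree=5, seed=1000):
    """LSM estimate for an American put (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    dt = T / steps
    df = np.exp(-r * dt)  # one-period discount factor
    # simulate GBM paths; array has shape (steps + 1, paths)
    z = rng.standard_normal((steps, paths))
    log_inc = (r - 0.5 * sigma ** 2) * dt + sigma * np.sqrt(dt) * z
    S = np.empty((steps + 1, paths))
    S[0] = S0
    S[1:] = S0 * np.exp(np.cumsum(log_inc, axis=0))
    h = np.maximum(K - S, 0.0)  # inner values per date and path
    V = h[-1].copy()            # option value at maturity
    for t in range(steps - 1, 0, -1):
        # regression of discounted future values on current levels
        coeffs = np.polyfit(S[t], V * df, degree)
        # estimated continuation values per path
        C = np.polyval(coeffs, S[t])
        # exercise if inner value exceeds the estimated continuation
        # value; otherwise keep the actual discounted value
        V = np.where(h[t] > C, h[t], V * df)
    return df * V.mean()


value = lsm_american_put()
print(round(value, 3))
```

With these default parameters, which are taken from the use case that follows, the estimate should land in the neighborhood of the values reported there, subject to Monte Carlo and discretization error.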
A Use Case
As has become by now the means of choice, a use case shall illustrate how to work with the valuation_mcs_american class. The use case replicates all American option values as presented in Table 1 of the seminal paper by Longstaff and Schwartz (2001). The underlying is the same as before, a geometric_brownian_motion object. The starting parameterization for the underlying is as follows:
In [22]: from dx_simulation import *
In [23]: me_gbm = market_environment('me_gbm', dt.datetime(2015, 1, 1))
In [24]: me_gbm.add_constant('initial_value', 36.)
         me_gbm.add_constant('volatility', 0.2)
         me_gbm.add_constant('final_date', dt.datetime(2016, 12, 31))
         me_gbm.add_constant('currency', 'EUR')
         me_gbm.add_constant('frequency', 'W')
           # weekly frequency
         me_gbm.add_constant('paths', 50000)
In [25]: csr = constant_short_rate('csr', 0.06)
In [26]: me_gbm.add_curve('discount_curve', csr)
In [27]: gbm = geometric_brownian_motion('gbm', me_gbm)
The option type is an American put option with payoff:
In [28]: payoff_func = 'np.maximum(strike - instrument_values, 0)'
The first option in Table 1 of the paper has a maturity of one year, and the strike price is 40 throughout:
In [29]: me_am_put = market_environment('me_am_put', dt.datetime(2015, 1, 1))
In [30]: me_am_put.add_constant('maturity', dt.datetime(2015, 12, 31))
         me_am_put.add_constant('strike', 40.)
         me_am_put.add_constant('currency', 'EUR')
The next step is to instantiate the valuation object based on the numerical assumptions:
In [31]: from valuation_mcs_american import valuation_mcs_american
In [32]: am_put = valuation_mcs_american('am_put', mar_env=me_am_put,
                                  underlying=gbm, payoff_func=payoff_func)
The valuation of the American put option takes much longer than the same task for the European options. Not only have we increased the number of paths and the frequency for the valuation, but the algorithm is much more computationally demanding due to the backward induction and the regression per induction step. Our numerical value is pretty close to the correct one reported in the original paper of 4.478:
In [33]: %time am_put.present_value(fixed_seed=True, bf=5)
Out[33]: CPU times: user 1.36 s, sys: 239 ms, total: 1.6 s
         Wall time: 1.6 s
         4.470627
Due to the very construction of the LSM Monte Carlo estimator, it represents a lower bound of the mathematically correct American option value.[78] Therefore, we would expect the numerical estimate to lie under the true value in any numerically realistic case. Alternative dual estimators can provide upper bounds as well.[79] Taken together, two such different estimators then define an interval for the true American option value.
The main stated goal of this use case is to replicate all American option values of Table 1 in the original paper. To this end, we only need to combine the valuation object with a nested loop. During the innermost loop, the valuation object has to be updated according to the then-current parameterization:
In [34]: %%time
         ls_table = []
         for initial_value in (36., 38., 40., 42., 44.):
             for volatility in (0.2, 0.4):
                 for maturity in (dt.datetime(2015, 12, 31),
                                  dt.datetime(2016, 12, 31)):
                     am_put.update(initial_value=initial_value,
                                   volatility=volatility,
                                   maturity=maturity)
                     ls_table.append([initial_value,
                                      volatility,
                                      maturity,
                                      am_put.present_value(bf=5)])
Out[34]: CPU times: user 31.1 s, sys: 3.22 s, total: 34.3 s
         Wall time: 33.9 s
Following is our simplified version of Table 1 in the paper by Longstaff and Schwartz (2001). Overall, our numerical values come pretty close to those reported in the paper, where some different parameters have been used (they use, for example, double the number of paths):
In [35]: print("S0  | Vola | T | Value")
         print(22 * "-")
         for r in ls_table:
             print("%d  | %3.1f  | %d | %5.3f" %
                   (r[0], r[1], r[2].year - 2014, r[3]))
Out[35]: S0  | Vola | T | Value
         ----------------------
         36  | 0.2  | 1 | 4.444
         36  | 0.2  | 2 | 4.769
         36  | 0.4  | 1 | 7.000
         36  | 0.4  | 2 | 8.378
         38  | 0.2  | 1 | 3.210
         38  | 0.2  | 2 | 3.645
         38  | 0.4  | 1 | 6.066
         38  | 0.4  | 2 | 7.535
         40  | 0.2  | 1 | 2.267
         40  | 0.2  | 2 | 2.778
         40  | 0.4  | 1 | 5.203
         40  | 0.4  | 2 | 6.753
         42  | 0.2  | 1 | 1.554
         42  | 0.2  | 2 | 2.099
         42  | 0.4  | 1 | 4.459
         42  | 0.4  | 2 | 6.046
         44  | 0.2  | 1 | 1.056
         44  | 0.2  | 2 | 1.618
         44  | 0.4  | 1 | 3.846
         44  | 0.4  | 2 | 5.494
To conclude the use case, note that the estimation of Greeks for American options is formally the same as for European options, which is a major advantage of our approach over alternative numerical methods (like the binomial model):
In [36]: am_put.update(initial_value=36.)
         am_put.delta()
Out[36]: -0.4655
In [37]: am_put.vega()
Out[37]: 17.3411
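The reason the Greeks estimation carries over unchanged is that a forward-difference scheme only needs a function that maps parameters to a present value. The following sketch illustrates the idea under stated assumptions: the helper names (bs_call, price, delta_fd, vega_fd) and the bump sizes are hypothetical, and a Black-Scholes-Merton call formula stands in for the Monte Carlo valuation object so that the example is self-contained:

```python
import math


def bs_call(S0, K, T, r, sigma):
    """Black-Scholes-Merton call value; stands in for any pricing routine."""
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return S0 * N(d1) - K * math.exp(-r * T) * N(d2)


def price(S0, sigma):
    # fixed contract and market data (illustrative values only)
    return bs_call(S0, K=40.0, T=1.0, r=0.06, sigma=sigma)


def delta_fd(price, S0, sigma, rel_bump=0.001):
    """Forward-difference Delta: one additional revaluation."""
    h = S0 * rel_bump
    return (price(S0 + h, sigma) - price(S0, sigma)) / h


def vega_fd(price, S0, sigma, rel_bump=0.001):
    """Forward-difference Vega: one additional revaluation."""
    h = sigma * rel_bump
    return (price(S0, sigma + h) - price(S0, sigma)) / h


delta = delta_fd(price, 36.0, 0.2)
vega = vega_fd(price, 36.0, 0.2)
print(round(delta, 4), round(vega, 4))
```

With a simulation-based pricing routine, both valuations in each difference should use the same random numbers (a fixed seed); otherwise, Monte Carlo noise can dominate the small difference in the numerator.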
Conclusions
This chapter is about the numerical valuation of both European and American options based on Monte Carlo simulation. The chapter introduces a generic valuation class, called valuation_class. This class provides methods, for example, to estimate the most important Greeks (Delta, Vega) for both types of options, independent of the simulation object (risk factor/stochastic process) used for the valuation.
Based on the generic valuation class, the chapter presents two specialized classes, valuation_mcs_european and valuation_mcs_american. The class for the valuation of European options is mainly a straightforward implementation of the risk-neutral valuation approach presented in Chapter 15 in combination with the numerical estimation of an expectation term (i.e., an integral by Monte Carlo simulation, as discussed in Chapter 9).
The class for the valuation of American options needs a certain kind of regression-based valuation algorithm. This is due to the fact that for American options an optimal exercise policy has to be derived for a valuation. This is theoretically and numerically a bit more involved. However, the respective present_value method of the class is still concise.
The approach taken with the DX derivatives analytics library proves to be beneficial. Without too much effort we are able to value a pretty large class of options with the following features:
Single risk factor options
European or American exercise
Arbitrary payoff
In addition, we can estimate the most important Greeks for this class of options. To simplify future imports, we will again use a wrapper module, this time called dx_valuation.py, as presented in Example 17-5.
Example 17-5. Wrapper module for all components of the library including valuation classes
#
# DX Library Valuation
# dx_valuation.py
#
import numpy as np
import pandas as pd

from dx_simulation import *
from valuation_class import valuation_class
from valuation_mcs_european import valuation_mcs_european
from valuation_mcs_american import valuation_mcs_american
Again, let us enhance the init file in the dx directory (see Example 17-6) to stay consistent here.
Example 17-6. Enhanced Python packaging file
#
# DX Library
# packaging file
# __init__.py
#
import numpy as np
import pandas as pd
import datetime as dt

# frame
from get_year_deltas import get_year_deltas
from constant_short_rate import constant_short_rate
from market_environment import market_environment
from plot_option_stats import plot_option_stats

# simulation
from sn_random_numbers import sn_random_numbers
from simulation_class import simulation_class
from geometric_brownian_motion import geometric_brownian_motion
from jump_diffusion import jump_diffusion
from square_root_diffusion import square_root_diffusion

# valuation
from valuation_class import valuation_class
from valuation_mcs_european import valuation_mcs_european
from valuation_mcs_american import valuation_mcs_american
Further Reading
References for the topics of this chapter in book form are:
Glasserman, Paul (2004): Monte Carlo Methods in Financial Engineering. Springer, New York.
Hilpisch, Yves (2015): Derivatives Analytics with Python. Wiley Finance, Chichester, England. http://derivatives-analytics-with-python.com.
Original papers cited in this chapter:
Cox, John, Stephen Ross, and Mark Rubinstein (1979): “Option Pricing: A Simplified Approach.” Journal of Financial Economics, Vol. 7, No. 3, pp. 229–263.
Kohler, Michael (2010): “A Review on Regression-Based Monte Carlo Methods for Pricing American Options.” In Luc Devroye et al. (eds.): Recent Developments in Applied Probability and Statistics. Physica-Verlag, Heidelberg, pp. 37–58.
Longstaff, Francis, and Eduardo Schwartz (2001): “Valuing American Options by Simulation: A Simple Least Squares Approach.” Review of Financial Studies, Vol. 14, No. 1, pp. 113–147.
[73] For details on how to estimate Greeks numerically by Monte Carlo simulation, refer to Chapter 7 of Glasserman (2004). We only use forward-difference schemes here since this leads to only one additional simulation and revaluation of the option. For example, a central-difference approximation would lead to two option revaluations and therefore a higher computational burden.
[74] American exercise refers to a situation where exercise is possible at every instant of time over a fixed time interval (at least during trading hours). Bermudan exercise generally refers to a situation where there are multiple, discrete exercise dates. In numerical applications, American exercise is approximated by Bermudan exercise, and maybe letting the number of exercise dates go to infinity in the limit.
[75] That is why their algorithm is generally abbreviated as LSM, for Least-Squares Monte Carlo.
[76] Kohler (2010) provides a concise overview of the theory of American option valuation in general and the use of regression-based methods in particular.
[77] See also Chapter 6 of Hilpisch (2015).
[78] The main reason is that the “optimal exercise policy” based on the regression estimates of the continuation value is only “suboptimal.”
[79] Cf. Chapter 6 in Hilpisch (2015) for a dual algorithm leading to an upper bound and a Python implementation thereof.
Chapter 18. Portfolio Valuation
Price is what you pay. Value is what you get.
(Warren Buffett)
By now, the whole approach for building the DX derivatives analytics library, and its associated benefits, should be rather clear. By strictly relying on Monte Carlo simulation as the only numerical method, we accomplish an almost complete modularization of the analytics library:
Discounting
The relevant risk-neutral discounting is taken care of by an instance of the constant_short_rate class.
Relevant data
Relevant data, parameters, and other input are stored in (several) instances of the market_environment class.
Simulation objects
Relevant risk factors (underlyings) are modeled as instances of one of three simulation classes:
geometric_brownian_motion
jump_diffusion
square_root_diffusion
Valuation objects
Options and derivatives to be valued are modeled as instances of one of two valuation classes:
valuation_mcs_european
valuation_mcs_american
One last step is missing: the valuation of possibly complex portfolios of options and derivatives. To this end, we require the following:
Nonredundancy
Every risk factor (underlying) is modeled only once and potentially used by multiple valuation objects.
Correlations
Correlations between risk factors have to be accounted for.
Positions
An options position, for example, can consist of certain multiples of an options contract.
However, although we have in principle allowed (and even required) providing a currency for both simulation and valuation objects, we assume that we value portfolios denominated in a single currency only. This simplifies the aggregation of values within a portfolio significantly, because we can abstract from exchange rates and currency risks.
The chapter presents two new classes: a simple one to model a derivatives position, and a more complex one to model and value a derivatives portfolio.
Derivatives Positions
In principle, a derivatives position is nothing more than a combination of a valuation object and a quantity for the instrument modeled.
The Class
Example 18-1 presents the class to model a derivatives position. It is mainly a container for other data and objects. In addition, it provides a get_info method, printing the data and object information stored in an instance of the class.
Example 18-1. A simple class to model a derivatives position
#
# DX Library Portfolio
# derivatives_position.py
#


class derivatives_position(object):
    ''' Class to model a derivatives position.

    Attributes
    ==========
    name : string
        name of the object
    quantity : float
        number of assets/derivatives making up the position
    underlying : string
        name of asset/risk factor for the derivative
    mar_env : instance of market_environment
        constants, lists, and curves relevant for valuation_class
    otype : string
        valuation class to use
    payoff_func : string
        payoff string for the derivative

    Methods
    =======
    get_info :
        prints information about the derivative position
    '''

    def __init__(self, name, quantity, underlying, mar_env, otype, payoff_func):
        self.name = name
        self.quantity = quantity
        self.underlying = underlying
        self.mar_env = mar_env
        self.otype = otype
        self.payoff_func = payoff_func

    def get_info(self):
        print("NAME")
        print(self.name, '\n')
        print("QUANTITY")
        print(self.quantity, '\n')
        print("UNDERLYING")
        print(self.underlying, '\n')
        print("MARKET ENVIRONMENT")
        print("\n**Constants**")
        for key, value in self.mar_env.constants.items():
            print(key, value)
        print("\n**Lists**")
        for key, value in self.mar_env.lists.items():
            print(key, value)
        print("\n**Curves**")
        # unpack key and value so that the curve itself is printed
        for key, value in self.mar_env.curves.items():
            print(key, value)
        print("\nOPTION TYPE")
        print(self.otype, '\n')
        print("PAYOFF FUNCTION")
        print(self.payoff_func)
To define a derivatives position we need to provide the following information, which is almost the same as for the instantiation of a valuation class:
name
Name of the position as a string object
quantity
Quantity of options/derivatives
underlying
Name of the risk factor (simulation object) as a string object
mar_env
Instance of market_environment
otype
string, either “European” or “American”
payoff_func
Payoff as a Python string object
A Use Case
The following interactive session illustrates the use of the class. However, we need to first define a simulation object, but not in full; only the most important, object-specific information is needed. Here, we basically stick to the numerical examples from the previous two chapters:
In [1]: from dx import *
For the definition of the derivatives position, we do not need a “full” market_environment object. Missing information is provided later (during the portfolio valuation), when the simulation object is instantiated:
In [2]: me_gbm = market_environment('me_gbm', dt.datetime(2015, 1, 1))
In [3]: me_gbm.add_constant('initial_value', 36.)
        me_gbm.add_constant('volatility', 0.2)
        me_gbm.add_constant('currency', 'EUR')
However, for the portfolio valuation, one additional constant is needed, namely for the model to be used. This will become clear in the subsequent section:
In [4]: me_gbm.add_constant('model', 'gbm')
With the simulation object available, we can proceed to define a derivatives position as follows:
In [5]: from derivatives_position import derivatives_position
In [6]: me_am_put = market_environment('me_am_put', dt.datetime(2015, 1, 1))
In [7]: me_am_put.add_constant('maturity', dt.datetime(2015, 12, 31))
        me_am_put.add_constant('strike', 40.)
        me_am_put.add_constant('currency', 'EUR')
In [8]: payoff_func = 'np.maximum(strike - instrument_values, 0)'
In [9]: am_put_pos = derivatives_position(
                     name='am_put_pos',
                     quantity=3,
                     underlying='gbm',
                     mar_env=me_am_put,
                     otype='American',
                     payoff_func=payoff_func)
Information about such an object is provided by the get_info method:
In [10]: am_put_pos.get_info()
Out[10]: NAME
         am_put_pos

         QUANTITY
         3

         UNDERLYING
         gbm

         MARKET ENVIRONMENT

         **Constants**
         strike 40.0
         maturity 2015-12-31 00:00:00
         currency EUR

         **Lists**

         **Curves**

         OPTION TYPE
         American

         PAYOFF FUNCTION
         np.maximum(strike - instrument_values, 0)
Derivatives Portfolios
From a portfolio perspective, a “relevant market” is mainly composed of the relevant risk factors (underlyings) and their correlations, as well as the derivatives and derivatives positions, respectively, to be valued. Theoretically, we are now dealing with a general market model ℳ as defined in Chapter 15, and applying the Fundamental Theorem of Asset Pricing (with its corollaries) to it.[80]
The Class
A somewhat complex Python class implementing a portfolio valuation based on the Fundamental Theorem of Asset Pricing, taking into account multiple relevant risk factors and multiple derivatives positions, is presented as Example 18-2. The class is rather comprehensively documented inline, especially during passages that implement functionality specific to the purpose at hand.
Example 18-2. A class to value a derivatives portfolio
#
# DX Library Portfolio
# derivatives_portfolio.py
#
import numpy as np
import pandas as pd

from dx_valuation import *

# models available for risk factor modeling
models = {'gbm' : geometric_brownian_motion,
          'jd' : jump_diffusion,
          'srd' : square_root_diffusion}

# allowed exercise types
otypes = {'European' : valuation_mcs_european,
          'American' : valuation_mcs_american}


class derivatives_portfolio(object):
    ''' Class for building portfolios of derivatives positions.

    Attributes
    ==========
    name : str
        name of the object
    positions : dict
        dictionary of positions (instances of derivatives_position class)
    val_env : market_environment
        market environment for the valuation
    assets : dict
        dictionary of market environments for the assets
    correlations : list
        correlations between assets
    fixed_seed : Boolean
        flag for fixed rng seed

    Methods
    =======
    get_positions :
        prints information about the single portfolio positions
    get_statistics :
        returns a pandas DataFrame object with portfolio statistics
    '''

    def __init__(self, name, positions, val_env, assets,
                 correlations=None, fixed_seed=False):
        self.name = name
        self.positions = positions
        self.val_env = val_env
        self.assets = assets
        self.underlyings = set()
        self.correlations = correlations
        self.time_grid = None
        self.underlying_objects = {}
        self.valuation_objects = {}
        self.fixed_seed = fixed_seed
        self.special_dates = []
        for pos in self.positions:
            # determine earliest starting_date
            self.val_env.constants['starting_date'] = \
                    min(self.val_env.constants['starting_date'],
                        positions[pos].mar_env.pricing_date)
            # determine latest date of relevance
            self.val_env.constants['final_date'] = \
                    max(self.val_env.constants['final_date'],
                        positions[pos].mar_env.constants['maturity'])
            # collect all underlyings
            # add to set; avoids redundancy
            self.underlyings.add(positions[pos].underlying)

        # generate general time grid
        start = self.val_env.constants['starting_date']
        end = self.val_env.constants['final_date']
        time_grid = pd.date_range(start=start, end=end,
                        freq=self.val_env.constants['frequency']
                        ).to_pydatetime()
        time_grid = list(time_grid)
        for pos in self.positions:
            maturity_date = positions[pos].mar_env.constants['maturity']
            if maturity_date not in time_grid:
                time_grid.insert(0, maturity_date)
                self.special_dates.append(maturity_date)
        if start not in time_grid:
            time_grid.insert(0, start)
        if end not in time_grid:
            time_grid.append(end)
        # delete duplicate entries
        time_grid = list(set(time_grid))
        # sort dates in time_grid
        time_grid.sort()
        self.time_grid = np.array(time_grid)
        self.val_env.add_list('time_grid', self.time_grid)

        if correlations is not None:
            # take care of correlations
            ul_list = sorted(self.underlyings)
            correlation_matrix = np.zeros((len(ul_list), len(ul_list)))
            np.fill_diagonal(correlation_matrix, 1.0)
            correlation_matrix = pd.DataFrame(correlation_matrix,
                                 index=ul_list, columns=ul_list)
            for i, j, corr in correlations:
                corr = min(corr, 0.999999999999)
                # fill correlation matrix
                correlation_matrix.loc[i, j] = corr
                correlation_matrix.loc[j, i] = corr
            # determine Cholesky matrix
            cholesky_matrix = np.linalg.cholesky(np.array(correlation_matrix))

            # dictionary with index positions for the
            # slice of the random number array to be used by
            # respective underlying
            rn_set = {asset: ul_list.index(asset)
                      for asset in self.underlyings}

            # random numbers array, to be used by
            # all underlyings (if correlations exist)
            random_numbers = sn_random_numbers((len(rn_set),
                                         len(self.time_grid),
                                         self.val_env.constants['paths']),
                                         fixed_seed=self.fixed_seed)

            # add all to valuation environment that is
            # to be shared with every underlying
            self.val_env.add_list('cholesky_matrix', cholesky_matrix)
            self.val_env.add_list('random_numbers', random_numbers)
            self.val_env.add_list('rn_set', rn_set)

        for asset in self.underlyings:
            # select market environment of asset
            mar_env = self.assets[asset]
            # add valuation environment to market environment
            mar_env.add_environment(val_env)
            # select right simulation class
            model = models[mar_env.constants['model']]
            # instantiate simulation object
            if correlations is not None:
                self.underlying_objects[asset] = model(asset, mar_env,
                                                       corr=True)
            else:
                self.underlying_objects[asset] = model(asset, mar_env,
                                                       corr=False)

        for pos in positions:
            # select right valuation class (European, American)
            val_class = otypes[positions[pos].otype]
            # pick market environment and add valuation environment
            mar_env = positions[pos].mar_env
            mar_env.add_environment(self.val_env)
            # instantiate valuation class
            self.valuation_objects[pos] = \
                val_class(name=positions[pos].name,
                          mar_env=mar_env,
                          underlying=self.underlying_objects[
                                            positions[pos].underlying],
                          payoff_func=positions[pos].payoff_func)

    def get_positions(self):
        ''' Convenience method to get information about
        all derivatives positions in a portfolio. '''
        for pos in self.positions:
            bar = '\n' + 50 * '-'
            print(bar)
            self.positions[pos].get_info()
            print(bar)

    def get_statistics(self, fixed_seed=False):
        ''' Provides portfolio statistics. '''
        res_list = []
        # iterate over all positions in portfolio
        for pos, value in self.valuation_objects.items():
            p = self.positions[pos]
            pv = value.present_value(fixed_seed=fixed_seed)
            res_list.append([
                p.name,
                p.quantity,
                # calculate all present values for the single instruments
                pv,
                value.currency,
                # single instrument value times quantity
                pv * p.quantity,
                # calculate Delta of position
                value.delta() * p.quantity,
                # calculate Vega of position
                value.vega() * p.quantity,
            ])
        # generate a pandas DataFrame object with all results
        res_df = pd.DataFrame(res_list,
                     columns=['name', 'quant.', 'value', 'curr.',
                              'pos_value', 'pos_delta', 'pos_vega'])
        return res_df
A Use Case
In terms of the DX analytics library, the modeling capabilities are, on a high level, restricted to a combination of a simulation and a valuation class. There are a total of six possible combinations:
models = {'gbm' : geometric_brownian_motion,
          'jd' : jump_diffusion,
          'srd' : square_root_diffusion}
otypes = {'European' : valuation_mcs_european,
          'American' : valuation_mcs_american}
In the interactive use case that follows, we combine selected elements to define two different derivatives positions that we then combine into a portfolio.
We build on the use case for the derivatives_position class with the gbm and am_put_pos objects from the previous section. To illustrate the use of the derivatives_portfolio class, let us define both an additional underlying and an additional options position. First, a jump_diffusion object:
In [11]: me_jd = market_environment('me_jd', me_gbm.pricing_date)
In [12]: # add jump diffusion-specific parameters
         me_jd.add_constant('lambda', 0.3)
         me_jd.add_constant('mu', -0.75)
         me_jd.add_constant('delta', 0.1)
         # add other parameters from gbm
         me_jd.add_environment(me_gbm)
In [13]: # needed for portfolio valuation
         me_jd.add_constant('model', 'jd')
Second, a European call option based on this new simulation object:
In [14]: me_eur_call = market_environment('me_eur_call', me_jd.pricing_date)
In [15]: me_eur_call.add_constant('maturity', dt.datetime(2015, 6, 30))
         me_eur_call.add_constant('strike', 38.)
         me_eur_call.add_constant('currency', 'EUR')
In [16]: payoff_func = 'np.maximum(maturity_value - strike, 0)'
In [17]: eur_call_pos = derivatives_position(
                      name='eur_call_pos',
                      quantity=5,
                      underlying='jd',
                      mar_env=me_eur_call,
                      otype='European',
                      payoff_func=payoff_func)
From a portfolio perspective, the relevant market now is:
In [18]: underlyings = {'gbm': me_gbm, 'jd' : me_jd}
         positions = {'am_put_pos' : am_put_pos, 'eur_call_pos' : eur_call_pos}
For the moment we abstract from correlations between the underlyings. Compiling a market_environment for the portfolio valuation is the last step before we can instantiate a derivatives_portfolio class:
In [19]: # discounting object for the valuation
         csr = constant_short_rate('csr', 0.06)
In [20]: val_env = market_environment('general', me_gbm.pricing_date)
         val_env.add_constant('frequency', 'W')
           # weekly frequency
         val_env.add_constant('paths', 25000)
         val_env.add_constant('starting_date', val_env.pricing_date)
         val_env.add_constant('final_date', val_env.pricing_date)
           # not yet known; take pricing_date temporarily
         val_env.add_curve('discount_curve', csr)
           # select single discount_curve for whole portfolio
In [21]: from derivatives_portfolio import derivatives_portfolio
In [22]: portfolio = derivatives_portfolio(
                         name='portfolio',
                         positions=positions,
                         val_env=val_env,
                         assets=underlyings,
                         fixed_seed=True)
Now we can harness the power of the valuation class and get a bunch of different statistics for the derivatives_portfolio object just defined:
In [23]: portfolio.get_statistics()
Out[23]:
                    name  quant.     value curr.  pos_value  pos_delta  pos_vega
         0  eur_call_pos       5  2.814638   EUR  14.073190     3.3605   42.7900
         1    am_put_pos       3  4.472021   EUR  13.416063    -2.0895   30.5181
The sum of the position values, Deltas, and Vegas is also easily calculated. This portfolio is slightly long Delta (almost neutral) and long Vega:
In [24]: portfolio.get_statistics()[['pos_value', 'pos_delta', 'pos_vega']].sum()
           # aggregate over all positions
Out[24]: pos_value    27.489253
         pos_delta     1.271000
         pos_vega     73.308100
         dtype: float64
A complete overview of all positions is conveniently obtained by the get_positions method. Such output can, for example, be used for reporting purposes (but is omitted here due to reasons of space):
In [25]: portfolio.get_positions()
Of course, you can also access and use all (simulation, valuation, etc.) objects of the derivatives_portfolio object in direct fashion:
In [26]: portfolio.valuation_objects['am_put_pos'].present_value()
Out[26]: 4.450573
In [27]: portfolio.valuation_objects['eur_call_pos'].delta()
Out[27]: 0.6498
This derivatives portfolio valuation is conducted based on the assumption that the risk factors are not correlated. This is easily verified by inspecting two simulated paths, one for each simulation object:
In [28]: path_no = 777
         path_gbm = portfolio.underlying_objects['gbm'].get_instrument_values()[
                                                                    :, path_no]
         path_jd = portfolio.underlying_objects['jd'].get_instrument_values()[
                                                                    :, path_no]
Figure 18-1 shows the selected paths in direct comparison (no jump occurs for the jump diffusion):
In [29]: import matplotlib.pyplot as plt
         %matplotlib inline
In [30]: plt.figure(figsize=(7, 4))
         plt.plot(portfolio.time_grid, path_gbm, 'r', label='gbm')
         plt.plot(portfolio.time_grid, path_jd, 'b', label='jd')
         plt.xticks(rotation=30)
         plt.legend(loc=0); plt.grid(True)
Figure 18-1. Noncorrelated risk factors
Nowconsiderthecasewherethetworiskfactors are highlypositivelycorrelated:
In [31]: correlations = [['gbm', 'jd', 0.9]]
With this additional information, a new derivatives_portfolio object is to be
instantiated:
In [32]: port_corr = derivatives_portfolio(
                         name='portfolio',
                         positions=positions,
                         val_env=val_env,
                         assets=underlyings,
                         correlations=correlations,
                         fixed_seed=True)
In this case, there is no direct influence on the values of the positions in the portfolio:
In [33]: port_corr.get_statistics()
Out[33]:
           name  quant.     value curr.  pos_value  pos_delta  pos_vega
0  eur_call_pos       5  2.804464   EUR  14.022320     3.3760   42.3500
1    am_put_pos       3  4.458565   EUR  13.375695    -2.0313   30.1416
However, the correlation takes place behind the scenes. For the graphical illustration, we
take the same two paths as before:
In [34]: path_gbm = port_corr.underlying_objects['gbm'].\
                     get_instrument_values()[:, path_no]
         path_jd = port_corr.underlying_objects['jd'].\
                     get_instrument_values()[:, path_no]
Figure 18-2 now shows a development almost in perfect parallelism between the two
risk factors:
In [35]: plt.figure(figsize=(7, 4))
         plt.plot(portfolio.time_grid, path_gbm, 'r', label='gbm')
         plt.plot(portfolio.time_grid, path_jd, 'b', label='jd')
         plt.xticks(rotation=30)
         plt.legend(loc=0); plt.grid(True)
Figure 18-2. Highly correlated risk factors
As a last numerical and conceptual example, consider the frequency distribution of the
portfolio present value. This is something impossible to generate in general with other
approaches, like the application of analytical formulae or the binomial option pricing
model. We get the complete set of present values per option position by calculating a
present value and passing the parameter flag full=True:
In [36]: pv1 = 5 * port_corr.valuation_objects['eur_call_pos'].\
                     present_value(full=True)[1]
         pv1
Out[36]: array([  0.        ,  22.55857473,   8.2552922 , ...,   0.        ,
                  0.        ,   0.        ])
In [37]: pv2 = 3 * port_corr.valuation_objects['am_put_pos'].\
                     present_value(full=True)[1]
         pv2
Out[37]: array([ 22.04450095,  10.90940926,  20.25092898, ...,  21.68232889,
                 17.7583897 ,   0.        ])
First, we compare the frequency distribution of the two positions. The payoff profiles of
the two positions, as displayed in Figure 18-3, are quite different. Note that we limit both
the x- and y-axes for better readability:
In [38]: plt.hist([pv1, pv2], bins=25,
                  label=['European call', 'American put']);
         plt.axvline(pv1.mean(), color='r', ls='dashed',
                     lw=1.5, label='call mean = %4.2f' % pv1.mean())
         plt.axvline(pv2.mean(), color='r', ls='dotted',
                     lw=1.5, label='put mean = %4.2f' % pv2.mean())
         plt.xlim(0, 80); plt.ylim(0, 10000)
         plt.grid(); plt.legend()
Figure 18-3. Portfolio frequency distribution of present values
The following figure finally shows the full frequency distribution of the portfolio present
values. You can clearly see in Figure 18-4 the offsetting diversification effects of
combining a call with a put option:
In [39]: pvs = pv1 + pv2
         plt.hist(pvs, bins=50, label='portfolio');
         plt.axvline(pvs.mean(), color='r', ls='dashed',
                     lw=1.5, label='mean = %4.2f' % pvs.mean())
         plt.xlim(0, 80); plt.ylim(0, 7000)
         plt.grid(); plt.legend()
Figure 18-4. Portfolio frequency distribution of present values
What impact does the correlation between the two risk factors have on the risk of the
portfolio, measured in the standard deviation of the present values? The statistics for the
portfolio with correlation are easily calculated as follows:
In [40]: # portfolio with correlation
         pvs.std()
Out[40]: 16.736290069957963
Similarly, for the portfolio without correlation, we have:
In [41]: # portfolio without correlation
         pv1 = 5 * portfolio.valuation_objects['eur_call_pos'].\
                     present_value(full=True)[1]
         pv2 = 3 * portfolio.valuation_objects['am_put_pos'].\
                     present_value(full=True)[1]
         (pv1 + pv2).std()
Out[41]: 21.71542409437863
Although the mean value stays constant (ignoring numerical deviations), correlation
obviously significantly decreases the portfolio risk when measured in this way. Again,
this is an insight that it is not really possible to gain when using alternative numerical
methods or valuation approaches.
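The direction of this effect can be reproduced without the DX library. The sketch below is a simplified stand-in under stated assumptions: both risk factors follow a geometric Brownian motion with hypothetical parameters, both options are treated as European payoffs at a one-year horizon (the American exercise feature is ignored), and the function name simulate_std is made up for this illustration. Only the correlation rho differs between the two runs.

```python
import math
import random
import statistics

def simulate_std(rho, n_paths=20000, seed=1000):
    """Std. dev. of the discounted payoff of 5 calls on factor 1 plus 3 puts on factor 2."""
    rng = random.Random(seed)
    # Hypothetical parameters, not taken from the text.
    s0, strike, vol, rate, horizon = 36.0, 36.0, 0.2, 0.05, 1.0
    drift = (rate - 0.5 * vol ** 2) * horizon
    scale = vol * math.sqrt(horizon)
    values = []
    for _ in range(n_paths):
        z1 = rng.gauss(0.0, 1.0)
        # Second standard normal draw with correlation rho to the first.
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        s1 = s0 * math.exp(drift + scale * z1)
        s2 = s0 * math.exp(drift + scale * z2)
        call = max(s1 - strike, 0.0)
        put = max(strike - s2, 0.0)
        values.append(math.exp(-rate * horizon) * (5 * call + 3 * put))
    return statistics.pstdev(values)

std_uncorrelated = simulate_std(rho=0.0)
std_correlated = simulate_std(rho=0.9)
print(round(std_uncorrelated, 2), round(std_correlated, 2))
```

With positive correlation, paths on which the call pays off tend to be paths on which the put expires worthless, so the covariance between the two positions is negative and the standard deviation of the sum falls.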
Conclusions
This chapter addresses the valuation and risk management of a portfolio of multiple
derivatives positions dependent on multiple, possibly correlated, risk factors. To this
end, a new class called derivatives_position is introduced to model an
options/derivatives position. The main focus, however, lies on the
derivatives_portfolio class, which implements some rather complex tasks. For
example, the class takes care of:
Correlations between risk factors (the class generates a single, consistent set of
random numbers for the simulation of all risk factors)
Instantiation of simulation objects given the single market environments and the
general valuation environment, as well as the derivatives positions
Generation of portfolio statistics based on all the assumptions, the risk factors
involved, and the terms of the derivatives positions
The examples presented in this chapter can only show some simple versions of
derivatives portfolios that can be managed and valued with the DX library and the
derivatives_portfolio class. Natural extensions to the DX library would be the
addition of more sophisticated financial models, like a stochastic volatility model, and
the addition of multirisk valuation classes to model and value derivatives dependent on
multiple risk factors, like a European basket option or an American maximum call
option, to name just two. At this stage, the modular modeling and the application of a
valuation framework as general as the Fundamental Theorem of Asset Pricing (or
“Global Valuation”) plays out its strengths: the nonredundant modeling of the risk
factors and the accounting for the correlations between them will then also have a direct
influence on the values and Greeks of multirisk derivatives.
Example 18-3 is a final, brief wrapper module bringing all components of the DX
analytics library together for a single import statement.
Example 18-3. The final wrapper module bringing all DX components together
#
# DX Library Simulation
# dx_library.py
#
from dx_valuation import *
from derivatives_position import derivatives_position
from derivatives_portfolio import derivatives_portfolio
Also, the now-complete init file for the dx directory is in Example 18-4.
Example 18-4. Final Python packaging file
#
# DX Library
# packaging file
# __init__.py
#
import numpy as np
import pandas as pd
import datetime as dt

# frame
from get_year_deltas import get_year_deltas
from constant_short_rate import constant_short_rate
from market_environment import market_environment
from plot_option_stats import plot_option_stats

# simulation
from sn_random_numbers import sn_random_numbers
from simulation_class import simulation_class
from geometric_brownian_motion import geometric_brownian_motion
from jump_diffusion import jump_diffusion
from square_root_diffusion import square_root_diffusion

# valuation
from valuation_class import valuation_class
from valuation_mcs_european import valuation_mcs_european
from valuation_mcs_american import valuation_mcs_american

# portfolio
from derivatives_position import derivatives_position
from derivatives_portfolio import derivatives_portfolio
Further Reading
As for the preceding chapters on the DX derivatives analytics library, Glasserman (2004)
is a comprehensive resource for Monte Carlo simulation in the context of financial
engineering and applications. Hilpisch (2015) also provides Python-based
implementations of the most important Monte Carlo algorithms:
Glasserman, Paul (2004): Monte Carlo Methods in Financial Engineering. Springer,
New York.
Hilpisch, Yves (2015): Derivatives Analytics with Python. Wiley Finance,
Chichester, England. http://derivatives-analytics-with-python.com.
However, there is hardly any research available when it comes to the valuation of
(complex) portfolios of derivatives in a consistent, nonredundant fashion by Monte
Carlo simulation. A notable exception, at least from a conceptual point of view, is the
brief article by Albanese, Gimonet, and White (2010a). A bit more detailed is the white
paper by the same team of authors:
Albanese, Claudio, Guillaume Gimonet, and Steve White (2010a): “Towards a
Global Valuation Model.” Risk Magazine, May issue. http://bit.ly/risk_may_2010.
Albanese, Claudio, Guillaume Gimonet, and Steve White (2010b): “Global
Valuation and Dynamic Risk Management.”
http://www.albanese.co.uk/Global_Valuation_and_Dynamic_Risk_Management.pdf.
[80] In practice, the approach we choose here is sometimes called global valuation instead of
instrument-specific valuation. Cf. the article by Albanese, Gimonet, and White (2010a) in Risk
Magazine.
Chapter 19. Volatility Options
We are facing extreme volatility.
Carlos Ghosn
Volatility derivatives have become an important risk management and
trading tool. While first-generation financial models for option pricing take
volatility as just one of a number of input parameters, second-generation
models and products consider volatility as an asset class of its own. For
example, the VIX volatility index (cf.
http://en.wikipedia.org/wiki/CBOE_Volatility_Index), introduced in 1993,
has since 2003 been calculated as a weighted implied volatility measure of
certain out-of-the-money put and call options with a constant maturity of 30
days on the S&P 500 index. Generally, the fixed 30-day maturity main
index values can only be calculated by interpolating between a shorter and a
longer maturity value for the index, i.e., between two subindices with
varying maturity.
The VSTOXX volatility index, introduced in 2005 by Eurex, the
derivatives exchange operated by Deutsche Börse AG in Germany (cf.
http://www.eurexchange.com/advanced-services/), is calculated similarly;
however, it is based on implied volatilities from options on the EURO
STOXX 50 index.[81]
This chapter is about the use of the DX derivatives analytics library
developed in Chapters 15 to 18 to value a portfolio of American put options
on the VSTOXX volatility index. As of today, Eurex only offers futures
contracts and European call and put options on the VSTOXX. There are no
American options on the VSTOXX available on public markets.
This is quite a typical situation for a bank marketing and writing options on
indices that are not offered by the respective exchanges themselves. For
simplicity, we assume that the maturity of the American put options
coincides with the maturity of one of the traded options series.
As a model for the VSTOXX volatility index, we take the
square_root_diffusion class from the DX library. This model satisfies
the major requirements when it comes to the modeling of a quantity like
volatility, i.e., mean reversion and positivity (see also Chapters 10, 14, and
16).[82]
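Both properties can be made visible with a few lines of simulation. The following is a minimal sketch and not the square_root_diffusion class of the DX library: it assumes the standard square-root dynamics dx = kappa * (theta - x) dt + sigma * sqrt(x) dZ, discretizes them with an Euler scheme with full truncation, and uses hypothetical parameter values. The function name simulate_srd is made up for this illustration.

```python
import math
import random

def simulate_srd(x0, kappa, theta, sigma, horizon, steps, n_paths, seed=1000):
    """Terminal values of a square-root diffusion (Euler scheme, full truncation)."""
    rng = random.Random(seed)
    dt = horizon / steps
    terminal = []
    for _path in range(n_paths):
        x = x0  # auxiliary process; it may dip below zero
        for _step in range(steps):
            x_pos = max(x, 0.0)  # truncated value used in drift and diffusion
            x += (kappa * (theta - x_pos) * dt
                  + sigma * math.sqrt(x_pos) * math.sqrt(dt) * rng.gauss(0.0, 1.0))
        terminal.append(max(x, 0.0))  # the reported value is never negative
    return terminal

# Hypothetical parameter values, chosen only for illustration.
x0, kappa, theta, sigma, horizon = 17.5, 1.5, 20.0, 4.5, 0.5
values = simulate_srd(x0, kappa, theta, sigma, horizon, steps=125, n_paths=2000)

mc_mean = sum(values) / len(values)
# Mean reversion: the expected value moves from x0 toward theta at speed kappa.
exact_mean = theta + (x0 - theta) * math.exp(-kappa * horizon)
print(round(mc_mean, 2), round(exact_mean, 2))
```

The simulated mean should be close to the analytical expectation, which lies between the starting value and the long-run level theta, and no simulated value is negative.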
In particular, this chapter implements the following major tasks:
Data collection
We need three types of data, namely for the VSTOXX index itself, the
futures on the index, and options data.
Model calibration
To value the nontraded options in a market-consistent fashion, one
generally first calibrates the chosen model to quoted option prices in
such a way that the model based on the optimal parameters replicates
the market prices as well as possible.
Portfolio valuation
Equipped with all the data and a market-calibrated model for the
VSTOXX volatility index, the final task then is to model and value the
nontraded options.
The VSTOXX Data
This section collects step by step the necessary data to value the American
put options on the VSTOXX. First, let us import our libraries of choice
when it comes to the gathering and management of data:
In [1]: import numpy as np
        import pandas as pd
VSTOXX Index Data
In Chapter 6, there is a regression example based on the VSTOXX and
EURO STOXX 50 indices. There, we also use the following public source
for VSTOXX daily closing data:
In [2]: url = 'http://www.stoxx.com/download/historical_values/h_vstoxx.txt'
        vstoxx_index = pd.read_csv(url, index_col=0, header=2,
                                   parse_dates=True, dayfirst=True)
In [3]: vstoxx_index.info()
Out[3]: <class 'pandas.core.frame.DataFrame'>
        DatetimeIndex: 4010 entries, 1999-01-04 00:00:00 to 2014-09-26 00:00:00
        Data columns (total 9 columns):
        V2TX    4010 non-null float64
        V6I1    3591 non-null float64
        V6I2    4010 non-null float64
        V6I3    3960 non-null float64
        V6I4    4010 non-null float64
        V6I5    4010 non-null float64
        V6I6    3995 non-null float64
        V6I7    4010 non-null float64
        V6I8    3999 non-null float64
        dtypes: float64(9)
For the options analysis to follow, we only need VSTOXX index data for
the first quarter of 2014. Therefore, we can delete both older and newer
data contained now in the DataFrame vstoxx_index:
In [4]: vstoxx_index = vstoxx_index[('2013/12/31' < vstoxx_index.index)
                                    & (vstoxx_index.index < '2014/4/1')]
Taking a look at the data reveals that the data set not only contains daily
closing values for the main index V2TX, but also for all subindices from
V6I1 to V6I8, where the last figure represents the maturity (1 = closest
maturity, 8 = longest maturity). As pointed out before, the main index
generally is an interpolation of two subindices, in particular V6I1 and V6I2,
representing in the first case a maturity of under 30 days and in the second
case of between 30 and 60 days:
In [5]: np.round(vstoxx_index.tail(), 2)
Out[5]:
             V2TX   V6I1   V6I2   V6I3   V6I4   V6I5   V6I6   V6I7   V6I8
Date
2014-03-25  18.26  18.23  18.31  19.04  19.84  20.31  18.11  20.83  21.20
2014-03-26  17.59  17.48  17.70  18.45  19.42  20.00  20.26  20.45  20.86
2014-03-27  17.64  17.50  17.76  18.62  19.49  20.05  20.11  20.49  20.94
2014-03-28  17.03  16.68  17.29  18.33  19.30  19.83  20.14  20.38  20.82
2014-03-31  17.66  17.61  17.69  18.57  19.43  20.04  19.98  20.44  20.90
VSTOXX Futures Data
The data set we use for the futures and options data is not publicly available
in this form. It is a complete data set with daily prices for all instruments
traded on the VSTOXX volatility index provided by Eurex. The data set
covers the complete first quarter of 2014:
In [6]: vstoxx_futures = pd.read_excel('./source/vstoxx_march_2014.xlsx',
                                       'vstoxx_futures')
In [7]: vstoxx_futures.info()
Out[7]: <class 'pandas.core.frame.DataFrame'>
        Int64Index: 504 entries, 0 to 503
        Data columns (total 8 columns):
        A_DATE                       504 non-null datetime64[ns]
        A_EXP_YEAR                   504 non-null int64
        A_EXP_MONTH                  504 non-null int64
        A_CALL_PUT_FLAG              504 non-null object
        A_EXERCISE_PRICE             504 non-null int64
        A_SETTLEMENT_PRICE_SCALED    504 non-null int64
        A_PRODUCT_ID                 504 non-null object
        SETTLE                       504 non-null float64
        dtypes: datetime64[ns](1), float64(1), int64(4), object(2)
Several columns are not populated or not needed, such that we can delete
them without loss of any relevant information:
In [8]: del vstoxx_futures['A_SETTLEMENT_PRICE_SCALED']
        del vstoxx_futures['A_CALL_PUT_FLAG']
        del vstoxx_futures['A_EXERCISE_PRICE']
        del vstoxx_futures['A_PRODUCT_ID']
For brevity, we rename the remaining columns:
In [9]: columns = ['DATE', 'EXP_YEAR', 'EXP_MONTH', 'PRICE']
        vstoxx_futures.columns = columns
As is common market practice, exchange-traded options expire on the third
Friday of the expiry month. To this end, it is helpful to have a helper
function third_friday available that gives, for a given year and month,
the date of the third Friday:
In [10]: import datetime as dt
         import calendar

         def third_friday(date):
             day = 21 - (calendar.weekday(date.year, date.month, 1) + 2) % 7
             return dt.datetime(date.year, date.month, day)
For both VSTOXX futures and options, there are at any time eight relevant
maturities with monthly differences starting either on the third Friday of the
current month (before this third Friday) or on the third Friday of the next
month (one day before, on, or after this third Friday).[83] In our data set,
there are 11 relevant maturities, ranging from January 2014 to November
2014:
In [11]: set(vstoxx_futures['EXP_MONTH'])
Out[11]: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}
We calculate the specific dates of all third Fridays once to reuse them later.
Note that April 18, 2014 was a public holiday in Germany, although that is
irrelevant for the following analysis:
In [12]: third_fridays = {}
         for month in set(vstoxx_futures['EXP_MONTH']):
             third_fridays[month] = third_friday(dt.datetime(2014, month, 1))
In [13]: third_fridays
Out[13]: {1: datetime.datetime(2014, 1, 17, 0, 0),
          2: datetime.datetime(2014, 2, 21, 0, 0),
          3: datetime.datetime(2014, 3, 21, 0, 0),
          4: datetime.datetime(2014, 4, 18, 0, 0),
          5: datetime.datetime(2014, 5, 16, 0, 0),
          6: datetime.datetime(2014, 6, 20, 0, 0),
          7: datetime.datetime(2014, 7, 18, 0, 0),
          8: datetime.datetime(2014, 8, 15, 0, 0),
          9: datetime.datetime(2014, 9, 19, 0, 0),
          10: datetime.datetime(2014, 10, 17, 0, 0),
          11: datetime.datetime(2014, 11, 21, 0, 0)}
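The closed-form expression in third_friday is compact but not self-explanatory. With Monday counted as 0, the first Friday of a month falls on day 1 + (4 - w) % 7, where w is the weekday of the first of the month, and the third Friday is 14 days later. The sketch below cross-checks the formula against a brute-force search; the helper third_friday_by_search is a hypothetical name introduced only for this check.

```python
import calendar
import datetime as dt

def third_friday(date):
    """Closed-form version, as defined in the text."""
    day = 21 - (calendar.weekday(date.year, date.month, 1) + 2) % 7
    return dt.datetime(date.year, date.month, day)

def third_friday_by_search(year, month):
    """Brute-force alternative: collect all Fridays of the month, take the third."""
    days_in_month = calendar.monthrange(year, month)[1]
    fridays = [day for day in range(1, days_in_month + 1)
               if calendar.weekday(year, month, day) == calendar.FRIDAY]
    return dt.datetime(year, month, fridays[2])

# Compare both versions for the 11 expiry months present in the data set.
for month in range(1, 12):
    assert (third_friday(dt.datetime(2014, month, 1))
            == third_friday_by_search(2014, month))

print(third_friday(dt.datetime(2014, 10, 1)))
```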
Wrapping the maturity date dict object in a lambda function allows for
easy application to the respective EXP_MONTH column of the DataFrame
object. For convenience, we store the maturity dates alongside the other
futures data:
In [14]: tf = lambda x: third_fridays[x]
         vstoxx_futures['MATURITY'] = vstoxx_futures['EXP_MONTH'].apply(tf)
In [15]: vstoxx_futures.tail()
Out[15]:
          DATE  EXP_YEAR  EXP_MONTH  PRICE   MATURITY
499 2014-03-31      2014          7  20.40 2014-07-18
500 2014-03-31      2014          8  20.70 2014-08-15
501 2014-03-31      2014          9  20.95 2014-09-19
502 2014-03-31      2014         10  21.05 2014-10-17
503 2014-03-31      2014         11  21.25 2014-11-21
VSTOXX Options Data
At any time, there are eight futures traded on the VSTOXX. In comparison,
there are of course many more options, such that we expect a much larger
data set for the volatility options. In fact, we have almost 47,000 option
quotes for the first quarter of 2014:
In [16]: vstoxx_options = pd.read_excel('./source/vstoxx_march_2014.xlsx',
                                        'vstoxx_options')
In [17]: vstoxx_options.info()
Out[17]: <class 'pandas.core.frame.DataFrame'>
         Int64Index: 46960 entries, 0 to 46959
         Data columns (total 8 columns):
         A_DATE                       46960 non-null datetime64[ns]
         A_EXP_YEAR                   46960 non-null int64
         A_EXP_MONTH                  46960 non-null int64
         A_CALL_PUT_FLAG              46960 non-null object
         A_EXERCISE_PRICE             46960 non-null int64
         A_SETTLEMENT_PRICE_SCALED    46960 non-null int64
         A_PRODUCT_ID                 46960 non-null object
         SETTLE                       46960 non-null float64
         dtypes: datetime64[ns](1), float64(1), int64(4), object(2)
As before, not all columns are needed:
In [18]: del vstoxx_options['A_SETTLEMENT_PRICE_SCALED']
         del vstoxx_options['A_PRODUCT_ID']
A renaming of the columns simplifies later queries a bit:
In [19]: columns = ['DATE', 'EXP_YEAR', 'EXP_MONTH', 'TYPE',
                    'STRIKE', 'PRICE']
         vstoxx_options.columns = columns
We use the tf function to again store the maturity dates alongside the
options data:
In [20]: vstoxx_options['MATURITY'] = vstoxx_options['EXP_MONTH'].apply(tf)
In [21]: vstoxx_options.head()
Out[21]:
        DATE  EXP_YEAR  EXP_MONTH TYPE  STRIKE  PRICE   MATURITY
0 2014-01-02      2014          1    C    1000   7.95 2014-01-17
1 2014-01-02      2014          1    C    1500   3.05 2014-01-17
2 2014-01-02      2014          1    C    1600   2.20 2014-01-17
3 2014-01-02      2014          1    C    1700   1.60 2014-01-17
4 2014-01-02      2014          1    C    1800   1.15 2014-01-17
A single options contract is on 100 times the index value. Therefore, the
strike price is also scaled up accordingly. To have a view of a single unit,
we rescale the strike price by dividing it by 100:
In [22]: vstoxx_options['STRIKE'] = vstoxx_options['STRIKE'] / 100.
All data from the external resources has now been collected and prepared.
If needed, one can save the three DataFrame objects for later reuse:
In [23]: save = False
         if save is True:
             import warnings
             warnings.simplefilter('ignore')
             h5 = pd.HDFStore('./source/vstoxx_march_2014.h5',
                              complevel=9, complib='blosc')
             h5['vstoxx_index'] = vstoxx_index
             h5['vstoxx_futures'] = vstoxx_futures
             h5['vstoxx_options'] = vstoxx_options
             h5.close()
Model Calibration
The next important step is the calibration of the financial model used to
value the VSTOXX options to available market data. For an in-depth
discussion of this topic and example code in Python see Hilpisch (2015), in
particular Chapter 11.
Relevant Market Data
The first step when calibrating a model is to decide on the relevant market
data to be used. For the example, let us assume the following:
Pricing date shall be 31 March 2014.
Option maturity shall be October 2014.
The following Python code defines the pricing_date and maturity,
reads the initial_value for the VSTOXX from the respective DataFrame
object, and also reads the corresponding value forward for the VSTOXX
future with the appropriate maturity.
In [24]: pricing_date = dt.datetime(2014, 3, 31)
           # last trading day in March 2014
         maturity = third_fridays[10]
           # October maturity
         initial_value = vstoxx_index['V2TX'][pricing_date]
           # VSTOXX on pricing_date
         forward = vstoxx_futures[(vstoxx_futures.DATE == pricing_date)
                     & (vstoxx_futures.MATURITY == maturity)]['PRICE'].values[0]
Out of the many options quotes in the data set, we take only those that are:
From the pricing date
For the right maturity date
For call options that are less than 20% out-of-the-money or in-the-money
We therefore have:
In [25]: tol = 0.20
         option_selection = \
             vstoxx_options[(vstoxx_options.DATE == pricing_date)
                          & (vstoxx_options.MATURITY == maturity)
                          & (vstoxx_options.TYPE == 'C')
                          & (vstoxx_options.STRIKE > (1 - tol) * forward)
                          & (vstoxx_options.STRIKE < (1 + tol) * forward)]
This leaves the following option quotes for the calibration procedure:
In [26]: option_selection
Out[26]:
            DATE  EXP_YEAR  EXP_MONTH TYPE  STRIKE  PRICE   MATURITY
46482 2014-03-31      2014         10    C      17   4.85 2014-10-17
46483 2014-03-31      2014         10    C      18   4.30 2014-10-17
46484 2014-03-31      2014         10    C      19   3.80 2014-10-17
46485 2014-03-31      2014         10    C      20   3.40 2014-10-17
46486 2014-03-31      2014         10    C      21   3.05 2014-10-17
46487 2014-03-31      2014         10    C      22   2.75 2014-10-17
46488 2014-03-31      2014         10    C      23   2.50 2014-10-17
46489 2014-03-31      2014         10    C      24   2.25 2014-10-17
46490 2014-03-31      2014         10    C      25   2.10 2014-10-17
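The strike range of this selection follows directly from the forward level and the tolerance. The sketch below redoes the moneyness filter in plain Python; it takes the October futures price of 21.05 on the pricing date from the futures data shown earlier and applies it to a hypothetical strike ladder of whole index points (the ladder is an assumption for illustration, not the actual list of quoted strikes).

```python
forward = 21.05  # October 2014 futures price on the pricing date (from the futures data)
tol = 0.20       # moneyness tolerance used in the text

lower = (1 - tol) * forward
upper = (1 + tol) * forward

# Hypothetical strike ladder in index points, for illustration only.
strikes = list(range(10, 41))
selected = [k for k in strikes if lower < k < upper]

print(round(lower, 2), round(upper, 2))
print(selected)
```

The bounds are roughly 16.84 and 25.26, which is why the nine strikes from 17 to 25 survive the filter.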
Option Modeling
For the calibration of the square_root_diffusion model, the options
selected before have to be modeled. This is the first time that the DX
analytics library comes into play; everything else so far was “just”
preparation for the following derivatives analytics tasks. We begin by
importing the library:
In [27]: from dx import *
The first task is then the definition of a market_environment object for the
VSTOXX index, in which we mainly store the previously collected and/or
defined data:
In [28]: me_vstoxx = market_environment('me_vstoxx', pricing_date)
In [29]: me_vstoxx.add_constant('initial_value', initial_value)
         me_vstoxx.add_constant('final_date', maturity)
         me_vstoxx.add_constant('currency', 'EUR')
In [30]: me_vstoxx.add_constant('frequency', 'B')
         me_vstoxx.add_constant('paths', 10000)
In [31]: csr = constant_short_rate('csr', 0.01)
           # somewhat arbitrarily chosen here
In [32]: me_vstoxx.add_curve('discount_curve', csr)
The major goal of the calibration procedure is to derive optimal parameters
for the square_root_diffusion simulation class, namely kappa, theta,
and volatility. These are the, so to say, degrees of freedom that this class
offers. All other parameters are in general dictated by the market or the task
at hand.
Although the three (optimal) parameters are to be numerically derived, we
need to provide some dummy values to instantiate the simulation class. For
the volatility parameter, we take the historical volatility given our data
set:
In [33]: # parameters to be calibrated later
         me_vstoxx.add_constant('kappa', 1.0)
         me_vstoxx.add_constant('theta', 1.2 * initial_value)
         vol_est = vstoxx_index['V2TX'].std() \
                     * np.sqrt(len(vstoxx_index['V2TX']) / 252.)
         me_vstoxx.add_constant('volatility', vol_est)
In [34]: vol_est
Out[34]: 1.0384283035169406
Then we provide the market_environment object to the simulation class:
In [35]: vstoxx_model = square_root_diffusion('vstoxx_model',
                                              me_vstoxx)
Although the DX library is designed to be completely modular, to model risk
factors independently (and nonredundantly) from the derivatives to be
valued, this does not necessarily have to be the case when it comes to a
market_environment object. A single such object can be used for both the
underlying risk factor and the option to be valued. To complete the market
environment for use with a valuation class, just add values for the strike
and the option maturity:
In [36]: me_vstoxx.add_constant('strike', forward)
         me_vstoxx.add_constant('maturity', maturity)
Of course, a payoff function is also needed to instantiate the valuation
class:
In [37]: payoff_func = 'np.maximum(maturity_value - strike, 0)'
In [38]: vstoxx_eur_call = valuation_mcs_european('vstoxx_eur_call',
vstoxx_model, me_vstoxx,
payoff_func)
A brief sanity check to see if the modeling so far works “in principle”:
In [39]: vstoxx_eur_call.present_value()
Out[39]: 0.379032
To calibrate the model to the previously selected option quotes, we need to
model all relevant European call options. They only differentiate
themselves by the relevant strike price; everything else in the market
environment is the same. We store the single valuation objects in a dict
object. As keys for the dict object, we take the index values of the option
quotes in the DataFrame object option_selection for unique
identification:
In [40]: option_models = {}
         for option in option_selection.index:
             strike = option_selection['STRIKE'].loc[option]
             me_vstoxx.add_constant('strike', strike)
             option_models[option] = \
                 valuation_mcs_european(
                     'eur_call_%d' % strike,
                     vstoxx_model,
                     me_vstoxx,
                     payoff_func)
A single step in the calibration routine makes the updating of all valuation
objects and a revaluation of all options necessary. For convenience, we put
this functionality into a separate function:
In [41]: def calculate_model_values(p0):
             ''' Returns all relevant option values.

             Parameters
             ===========
             p0 : tuple/list
                 tuple of kappa, theta, volatility

             Returns
             =======
             model_values : dict
                 dictionary with model values
             '''
             kappa, theta, volatility = p0
             vstoxx_model.update(kappa=kappa,
                                 theta=theta,
                                 volatility=volatility)
             model_values = {}
             for option in option_models:
                 model_values[option] = \
                     option_models[option].present_value(fixed_seed=True)
             return model_values
Providing a parameter tuple of kappa, theta, and volatility to the
function calculate_model_values gives back, ceteris paribus, model
option values for all relevant options:
In [42]: calculate_model_values((0.5, 27.5, vol_est))
Out[42]: {46482: 3.206401,
          46483: 2.412354,
          46484: 1.731028,
          46485: 1.178823,
          46486: 0.760421,
          46487: 0.46249,
          46488: 0.263662,
          46489: 0.142177,
          46490: 0.07219}
Calibration Procedure
Calibration of an option pricing model is, in general, a convex optimization
problem. The most widely used function for the calibration (i.e., the
minimization) is the mean-squared error (MSE) of the model option
values given the market quotes of the options. Assume there are N relevant
options, and also model and market quotes. The problem of calibrating a
financial model to the market quotes based on the MSE is then given in
Equation 19-1. There, $C_n^*$ and $C_n^{mod}$ are the market price and the model price
of the nth option, respectively. p is the parameter set provided as input to
the option pricing model.
Equation 19-1. Model calibration based on mean-squared error
$$\min_p \frac{1}{N} \sum_{n=1}^{N} \left( C_n^* - C_n^{mod}(p) \right)^2$$
The Python function mean_squared_error implements this approach to
model calibration technically. A global variable is used to control the
output of intermediate parameter tuple objects and the resulting MSE:
In [43]: i = 0
         def mean_squared_error(p0):
             ''' Returns the mean-squared error given
             the model and market values.

             Parameters
             ===========
             p0 : tuple/list
                 tuple of kappa, theta, volatility

             Returns
             =======
             MSE : float
                 mean-squared error
             '''
             global i
             model_values = np.array(
                     list(calculate_model_values(p0).values()))
             market_values = option_selection['PRICE'].values
             option_diffs = model_values - market_values
             MSE = np.sum(option_diffs ** 2) / len(option_diffs)
               # vectorized MSE calculation
             if i % 20 == 0:
                 if i == 0:
                     print('%4s  %6s  %6s  %6s --> %6s' %
                           ('i', 'kappa', 'theta', 'vola', 'MSE'))
                 print('%4d  %6.3f  %6.3f  %6.3f --> %6.3f' %
                       (i, p0[0], p0[1], p0[2], MSE))
             i += 1
             return MSE
Again, a brief check to see if the function works in principle:
In [44]: mean_squared_error((0.5, 27.5, vol_est))
Out[44]:    i   kappa   theta    vola -->    MSE
            0   0.500  27.500   1.038 -->  4.390
         4.3899900376937779
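The reported value can be retraced by hand from numbers already shown: the model values returned by calculate_model_values((0.5, 27.5, vol_est)) and the market quotes in option_selection. The sketch below is a plain-Python recomputation and not the function from the text. It pairs model value and quote by their shared key, which avoids depending on the iteration order of the dict.

```python
# Model values for (kappa, theta, volatility) = (0.5, 27.5, vol_est), from the text.
model_values = {46482: 3.206401, 46483: 2.412354, 46484: 1.731028,
                46485: 1.178823, 46486: 0.760421, 46487: 0.46249,
                46488: 0.263662, 46489: 0.142177, 46490: 0.07219}

# Market quotes of the selected call options, from the text.
market_quotes = {46482: 4.85, 46483: 4.30, 46484: 3.80,
                 46485: 3.40, 46486: 3.05, 46487: 2.75,
                 46488: 2.50, 46489: 2.25, 46490: 2.10}

# Pair model value and market quote by option key, then average the squares.
diffs = [model_values[key] - market_quotes[key] for key in sorted(market_quotes)]
mse = sum(diff ** 2 for diff in diffs) / len(diffs)
print(round(mse, 6))
```

Up to the rounding of the printed model values, this reproduces the MSE of about 4.39 for the uncalibrated starting parameters.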
Chapter 9 introduces the Python and SciPy functions for convex
optimization problems. We will apply these here as well, so we begin with
an import:
In [45]: import scipy.optimize as spo
The following calibration uses both global optimization via the brute
function and local optimization via the fmin function. First, the global
optimization:
In [46]: %%time
         i = 0
         opt_global = spo.brute(mean_squared_error,
                         ((0.5, 3.01, 0.5),  # range for kappa
                          (15., 30.1, 5.),  # range for theta
                          (0.5, 5.51, 1)),  # range for volatility
                          finish=None)
Out[46]:    i   kappa   theta    vola -->    MSE
            0   0.500  15.000   0.500 --> 10.393
           20   0.500  30.000   1.500 -->  2.071
           40   1.000  25.000   3.500 -->  0.180
           60   1.500  20.000   5.500 -->  0.718
           80   2.000  20.000   1.500 -->  5.501
          100   2.500  15.000   3.500 -->  5.571
          120   2.500  30.000   5.500 --> 22.992
          140   3.000  30.000   1.500 --> 14.493
         CPU times: user 18.6 s, sys: 1.68 s, total: 20.3 s
         Wall time: 20.3 s
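The size of this brute-force search follows from the three range tuples. The sketch below enumerates the grid in plain Python, assuming that each tuple is read as (start, stop, step) with the stop value excluded, which is consistent with the parameter values appearing in the output. The helper grid_axis is a hypothetical name; the sketch only illustrates the grid, not SciPy's internals.

```python
import itertools
import math

def grid_axis(start, stop, step):
    """Points start, start + step, ... that lie strictly below stop."""
    count = int(math.ceil((stop - start) / step))
    return [start + k * step for k in range(count)]

ranges = ((0.5, 3.01, 0.5),   # kappa
          (15., 30.1, 5.),    # theta
          (0.5, 5.51, 1))     # volatility

axes = [grid_axis(*r) for r in ranges]
grid = list(itertools.product(*axes))

print([len(axis) for axis in axes], len(grid))
```

Six values for kappa, four for theta, and six for volatility give 144 parameter combinations, each requiring a revaluation of all nine options. This matches the counter in the output running past 140 and explains the run time of about 20 seconds.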
The intermediate optimal results are as follows. The MSE is already quite
low:
In [47]: i = 0
         mean_squared_error(opt_global)
Out[47]:    i   kappa   theta    vola -->    MSE
            0   1.500  20.000   4.500 -->  0.008
         0.0076468730485555626
Next, we use the intermediate optimal parameters as input for the local
optimization:
In [48]: %%time
         i = 0
         opt_local = spo.fmin(mean_squared_error, opt_global,
                              xtol=0.00001, ftol=0.00001,
                              maxiter=100, maxfun=350)
Out[48]:    i   kappa   theta    vola -->    MSE
            0   1.500  20.000   4.500 -->  0.008
           20   1.510  19.235   4.776 -->  0.008
           40   1.563  18.926   4.844 -->  0.005
           60   1.555  18.957   4.828 -->  0.005
           80   1.556  18.947   4.832 -->  0.005
          100   1.556  18.948   4.831 -->  0.005
          120   1.556  18.948   4.831 -->  0.005
         Optimization terminated successfully.
                  Current function value: 0.004654
                  Iterations: 64
                  Function evaluations: 138
         CPU times: user 17.7 s, sys: 1.67 s, total: 19.3 s
         Wall time: 19.4 s
This time the results are:
In [49]: i = 0
         mean_squared_error(opt_local)
Out[49]:    i   kappa   theta    vola -->    MSE
            0   1.556  18.948   4.831 -->  0.005
         0.0046542736439999875
The resulting model values are:
In [50]: calculate_model_values(opt_local)
Out[50]: {46482: 4.746597,
          46483: 4.286923,
          46484: 3.863346,
          46485: 3.474144,
          46486: 3.119211,
          46487: 2.793906,
          46488: 2.494882,
          46489: 2.224775,
          46490: 1.98111}
Let us store these in the option_selection DataFrame and calculate the
differences from the market prices:
In [51]: option_selection['MODEL'] = \
                 np.array(list(calculate_model_values(opt_local).values()))
         option_selection['ERRORS'] = \
                 option_selection['MODEL'] - option_selection['PRICE']
We get the following results:
In [52]: option_selection[['MODEL', 'PRICE', 'ERRORS']]
Out[52]:
          MODEL  PRICE    ERRORS
46482  4.746597   4.85 -0.103403
46483  4.286923   4.30 -0.013077
46484  3.863346   3.80  0.063346
46485  3.474144   3.40  0.074144
46486  3.119211   3.05  0.069211
46487  2.793906   2.75  0.043906
46488  2.494882   2.50 -0.005118
46489  2.224775   2.25 -0.025225
46490  1.981110   2.10 -0.118890
The average pricing error is relatively low, at less than 1 cent (positive and
negative errors largely cancel out in this signed average):
In [53]: round(option_selection['ERRORS'].mean(), 3)
Out[53]: -0.002
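Because positive and negative errors offset each other in a signed mean, it helps to look at the mean absolute error and the root-mean-squared error as well. The following sketch recomputes all three from the model values and market quotes listed in the results table; it uses plain Python lists instead of the DataFrame, and the variable names are chosen for this illustration.

```python
import math

# Calibrated model values and market quotes, strikes 17 to 25, from the results table.
model = [4.746597, 4.286923, 3.863346, 3.474144, 3.119211,
         2.793906, 2.494882, 2.224775, 1.981110]
market = [4.85, 4.30, 3.80, 3.40, 3.05, 2.75, 2.50, 2.25, 2.10]

errors = [m - q for m, q in zip(model, market)]

mean_error = sum(errors) / len(errors)                       # signed, errors cancel
mean_abs_error = sum(abs(e) for e in errors) / len(errors)   # typical error size
mse = sum(e ** 2 for e in errors) / len(errors)              # calibration objective
rmse = math.sqrt(mse)

print(round(mean_error, 3), round(mean_abs_error, 3), round(rmse, 3))
```

The signed mean is about -0.002, while the typical absolute deviation per option is closer to 6 cents; the MSE agrees with the final function value of the local optimization.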
Figure 19-1 shows all the results graphically. The largest difference is
observed for the call option that is farthest out of the money:
In [54]: import matplotlib.pyplot as plt
         %matplotlib inline
         fig, (ax1, ax2) = plt.subplots(2, sharex=True, figsize=(8, 8))
         strikes = option_selection['STRIKE'].values
         ax1.plot(strikes, option_selection['PRICE'], label='market quotes')
         ax1.plot(strikes, option_selection['MODEL'], 'ro',
                  label='model values')
         ax1.set_ylabel('option values')
         ax1.grid(True)
         ax1.legend(loc=0)
         wi = 0.25
         ax2.bar(strikes - wi / 2., option_selection['ERRORS'],
                 label='market quotes', width=wi)
         ax2.grid(True)
         ax2.set_ylabel('differences')
         ax2.set_xlabel('strikes')
Figure 19-1. Calibrated model values for VSTOXX call options vs. market
quotes
American Options on the VSTOXX
A major prerequisite for valuing and managing options not traded at
exchanges is a calibrated model that is as consistent as possible with market
realities, i.e., quotes for liquidly traded options in the relevant market.
This is what the previous section has as the main result. This main result is
used in this section to value American put options on the VSTOXX, a kind
of derivative instrument not traded in the market. We assume a portfolio
consisting of American put options with the same maturity and strikes as
the European call options used for the model calibration.
Modeling Option Positions
The first step when valuing a derivatives portfolio with the DX analytics
library is to define the relevant risk factors by a market_environment
object. At this stage, it does not necessarily have to be complete; missing
data and objects might be added during the portfolio valuation (e.g., paths
or frequency):
In [55]: me_vstoxx = market_environment('me_vstoxx', pricing_date)
         me_vstoxx.add_constant('initial_value', initial_value)
         me_vstoxx.add_constant('final_date', pricing_date)
         me_vstoxx.add_constant('currency', 'NONE')
Of course, we use the optimal parameters from the model calibration:
In [56]: # adding optimal parameters to environment
         me_vstoxx.add_constant('kappa', opt_local[0])
         me_vstoxx.add_constant('theta', opt_local[1])
         me_vstoxx.add_constant('volatility', opt_local[2])
In a portfolio context, the specification of a simulation class/model is
necessary:
In [57]: me_vstoxx.add_constant('model', 'srd')
To define the valuation classes for the American put options, we are mainly
missing an appropriate payoff function:
In [58]: payoff_func = 'np.maximum(strike - instrument_values, 0)'
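The two payoff strings used in this chapter describe the inner values of a call and a put. As a minimal sketch, independent of how the DX classes evaluate these strings, the same payoffs can be written as ordinary scalar functions; the function names, the strike, and the index levels below are hypothetical and serve only as an illustration.

```python
def call_payoff(maturity_value, strike):
    """European call: pays the excess of the index level over the strike."""
    return max(maturity_value - strike, 0.0)

def put_payoff(instrument_value, strike):
    """Put inner value: pays the shortfall of the index level below the strike."""
    return max(strike - instrument_value, 0.0)

strike = 21.0                 # hypothetical strike in index points
levels = [15.0, 21.0, 27.0]   # hypothetical VSTOXX levels

calls = [call_payoff(level, strike) for level in levels]
puts = [put_payoff(level, strike) for level in levels]
print(calls, puts)
```

For the American put, the inner value matters at every point of the time grid and not only at maturity, which is why the payoff string refers to instrument_values rather than maturity_value.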
As before, all American options differ only with respect to their strike
prices. It therefore makes sense to define a shared market_environment
object first:
In [59]: shared = market_environment('share', pricing_date)
         shared.add_constant('maturity', maturity)
         shared.add_constant('currency', 'EUR')
It remains to loop over all relevant options, pick the relevant strike, and
define one derivatives_position after the other, using the defining
market_environment object:
In [60]: option_positions = {}
           # dictionary for option positions
         option_environments = {}
           # dictionary for option environments
         for option in option_selection.index:
             option_environments[option] = \
                 market_environment('am_put_%d' % option, pricing_date)
               # define new option environment, one for each option
             strike = option_selection['STRIKE'].loc[option]
               # pick the relevant strike
             option_environments[option].add_constant('strike', strike)
               # add it to the environment
             option_environments[option].add_environment(shared)
               # add the shared data
             option_positions['am_put_%d' % strike] = \
                 derivatives_position(
                     'am_put_%d' % strike,
                     quantity=100.,
                     mar_env=option_environments[option],
                     underlying='vstoxx_model',
                     otype='American',
                     payoff_func=payoff_func)
Note that we use 100 as the position quantity throughout, which is the typical contract size for VSTOXX options.
The Options Portfolio
To compose the portfolio, we need to specify a couple of parameters that together define our valuation environment, i.e., those parameters shared by all objects in the portfolio:
In [61]: val_env = market_environment('val_env', pricing_date)
         val_env.add_constant('starting_date', pricing_date)
         val_env.add_constant('final_date', pricing_date)
           # temporary value, is updated during valuation
         val_env.add_curve('discount_curve', csr)
         val_env.add_constant('frequency', 'B')
         val_env.add_constant('paths', 25000)
The market is rather simple; it consists of a single risk factor:
In [62]: underlyings = {'vstoxx_model' : me_vstoxx}
Taking all this together allows us to define a derivatives_portfolio object:
In [63]: portfolio = derivatives_portfolio('portfolio',
option_positions, val_env, underlyings)
The valuation takes quite a bit of time, since multiple American options are valued by the Least-Squares Monte Carlo approach and multiple Greeks also have to be estimated by revaluations using the same computationally demanding algorithm:
In [64]: %time results = portfolio.get_statistics(fixed_seed=True)
Out[64]: CPU times: user 38.6 s, sys: 1.96 s, total: 40.6 s
         Wall time: 40.6 s
The results DataFrame object is best sorted by the name column to have a better comparative view of the statistics:
In [65]: results.sort(columns='name')
Out[65]:         name  quant.      value  curr.  pos_value  pos_delta  pos_vega
         8  am_put_17     100   4.575197    EUR   457.5197     -24.85    102.77
         1  am_put_18     100   5.203648    EUR   520.3648     -30.62    107.93
         0  am_put_19     100   5.872686    EUR   587.2686     -33.31    107.79
         2  am_put_20     100   6.578714    EUR   657.8714     -34.82    110.01
         6  am_put_21     100   7.320523    EUR   732.0523     -39.46    105.20
         7  am_put_22     100   8.081625    EUR   808.1625     -40.61    102.38
         3  am_put_23     100   8.871962    EUR   887.1962     -43.26    104.37
         4  am_put_24     100   9.664272    EUR   966.4272     -40.14    101.04
         5  am_put_25     100  10.475168    EUR  1047.5168     -45.74    102.81
This portfolio is, as expected for a portfolio of long American put options, short (negative) Delta and long (positive) Vega:
In [66]: results[['pos_value','pos_delta','pos_vega']].sum()
Out[66]: pos_value    6664.3795
         pos_delta    -332.8100
         pos_vega      944.3000
         dtype: float64
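The portfolio totals are plain column sums over the nine positions. The following sketch re-adds the position figures reported above in pure Python, without pandas, as a quick arithmetic cross-check; the tuple layout and variable names are illustrative only.

```python
# (name, pos_value, pos_delta, pos_vega) as reported in the results table
positions = [
    ("am_put_17", 457.5197, -24.85, 102.77),
    ("am_put_18", 520.3648, -30.62, 107.93),
    ("am_put_19", 587.2686, -33.31, 107.79),
    ("am_put_20", 657.8714, -34.82, 110.01),
    ("am_put_21", 732.0523, -39.46, 105.20),
    ("am_put_22", 808.1625, -40.61, 102.38),
    ("am_put_23", 887.1962, -43.26, 104.37),
    ("am_put_24", 966.4272, -40.14, 101.04),
    ("am_put_25", 1047.5168, -45.74, 102.81),
]

total_value = sum(p[1] for p in positions)
total_delta = sum(p[2] for p in positions)
total_vega = sum(p[3] for p in positions)

print("pos_value %10.4f" % total_value)
print("pos_delta %10.4f" % total_delta)
print("pos_vega  %10.4f" % total_vega)
```

Up to floating-point rounding, the three sums agree with the aggregated values shown in Out[66].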
Conclusions
This chapter presents a larger, realistic use case for the application of the DX analytics library to the valuation of a portfolio of nontraded American options on the VSTOXX volatility index. The chapter addresses three main tasks involved in any real-world application:
Data gathering
Current, correct market data builds the basis of any modeling and valuation effort in derivatives analytics; we need index data and futures data, as well as options data for the VSTOXX.
Model calibration
To value, manage, and hedge nontraded options and derivatives in a market-consistent fashion, one needs to calibrate the model parameters to the relevant option market quotes (relevant with regard to maturity and strikes). Our model of choice is the square-root diffusion, which is appropriate for modeling a volatility index; the calibration results are quite good although the model only offers three degrees of freedom (kappa as the mean-reversion factor, theta as the long-term volatility, and volatility as the volatility of the volatility, or so-called “vol-vol”).
Portfolio valuation
Based on the market data and the calibrated model, a portfolio with the American put options on the VSTOXX is modeled and major statistics (position values, Deltas, and Vegas) are generated.
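To make the role of the three calibrated parameters concrete, the following sketch simulates a single path of a square-root diffusion, dx = kappa * (theta - x) dt + sigma * sqrt(x) dZ, in pure Python. It is an illustration under stated assumptions: the Euler scheme with full truncation is one common discretization and is not necessarily what DX uses internally, the function name simulate_srd_path is hypothetical, and the parameter values are made up rather than taken from the calibration.

```python
import math
import random


def simulate_srd_path(x0, kappa, theta, sigma, T, steps, rng):
    """Simulate one square-root diffusion path (Euler, full truncation)."""
    dt = T / steps
    x = x0
    path = [x0]
    for _ in range(steps):
        x_pos = max(x, 0.0)  # truncate negative values inside the update
        x = (x + kappa * (theta - x_pos) * dt
             + sigma * math.sqrt(x_pos) * math.sqrt(dt) * rng.gauss(0.0, 1.0))
        path.append(max(x, 0.0))  # reported values are never negative
    return path


rng = random.Random(1000)  # fixed seed for reproducibility
# hypothetical parameters: start at 20, revert to 20, vol-vol of 1
path = simulate_srd_path(x0=20.0, kappa=2.0, theta=20.0, sigma=1.0,
                         T=1.0, steps=250, rng=rng)

# with sigma = 0 the path is deterministic and drifts toward theta
drift_only = simulate_srd_path(10.0, 2.0, 20.0, 0.0, 1.0, 100, rng)
```

The drift-only path shows the mean reversion governed by kappa and theta in isolation; the volatility parameter scales the random shocks, which shrink as the process approaches zero.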
The realistic use case in this chapter shows the flexibility and the power of the DX library; it essentially allows us to address any analytical task with regard to derivatives. The very approach and architecture make the application largely comparable to the benchmark case of a Black-Scholes-Merton analytical formula for European options. Once the valuation objects are defined, you can use them similarly to an analytical formula, and this despite the fact that underneath the surface, heavy numerical routines and algorithms are applied.
Further Reading
Eurex’s “VSTOXX Advanced Services” tutorial pages provide a wealth of information about the VSTOXX index and related volatility derivatives. These pages also provide lots of readily usable Python scripts to replicate the results and analyses presented in the tutorials:
The VSTOXX Advanced Services tutorial pages from Eurex are available at http://www.eurexchange.com/advanced-services/vstoxx/, while a backtesting application is provided at http://www.eurexchange.com/advanced-services/app2/.
The following book is a good general reference for the topics covered in this chapter, especially when it comes to the calibration of option pricing models:
Hilpisch, Yves (2015): Derivatives Analytics with Python. Wiley Finance, Chichester, England. http://derivatives-analytics-with-python.com.
With regard to the consistent valuation and management of derivatives portfolios, see also the hints at the end of Chapter 18.
[81] For details on how the VSTOXX is calculated and how you can calculate it by yourself (using Python to collect the necessary data and to do the calculations), see the Python-based tutorial.
[82] One of the earlier volatility option pricing models by Gruenbichler and Longstaff (1996) is also based on the square-root diffusion. However, they only consider European options, for which they come up with a closed-form solution. For a review of the model and a Python implementation of it, refer to http://www.eurexchange.com/advanced-services/vstoxx/. See also the web service example in Chapter 14, which is based on their model and analytical valuation formula.
[83] VSTOXX volatility derivatives have their last trading day two days before expiry.
Appendix A. Selected Best Practices
Best practices in general are those rules, either written down formally or just practiced in daily life, that may distinguish the expert Python developer from the casual Python user. There are many of these, and this appendix will introduce some of the more important ones.
Python Syntax
One really helpful feature of Spyder as an integrated development environment is its automatic syntax and code checking, which checks Python code for compliance with the PEP 8 recommendations for Python syntax. But what is codified in “Python Enhancement Proposal 8”? Principally, there are some code formatting rules that should both establish a common standard and allow for better readability of the code. In that sense, this approach is not too dissimilar from a written or printed natural language where certain syntax rules also apply.
For example, consider the code in Example 1-1 of Chapter 1 for the valuation of a European call option via Monte Carlo simulation. First, have a look at the version of this code in Example A-1 that does not conform to PEP 8. It is rather packed, because there are blank lines and spaces missing (sometimes there are also too many spaces or blank lines).
Example A-1. A Python script that does not conform to PEP 8
# Monte Carlo valuation of European call option
# in Black-Scholes-Merton model
# bsm_mcs_euro_syntax_false.py
import numpy as np
#Parameter Values
S0=100.#initial index level
K=105.#strike price
T=1.0#time-to-maturity
r=0.05#riskless short rate
sigma    =0.2#volatility
I=100000  #number of simulations
# Valuation Algorithm
z=np.random.standard_normal(I)#pseudorandom numbers
ST=S0*np.exp((r- 0.5*sigma** 2)*T+sigma*np.sqrt(T)* z)#index values at maturity
hT=np.maximum(ST-K,0)#inner values at maturity
C0=np.exp(-r*T)*sum(hT)/I# Monte Carlo estimator
#Result Output
print"Value of the European Call Option %5.3f"%C0
Now, take a look at the version in Example A-2 that conforms to PEP 8 (i.e., exactly the one found in Example 1-1). The main difference in readability stems from two facts:
Use of blank lines to indicate code blocks
Use of spaces around Python operators (e.g., = or *) as well as before any hash character for comments (here: two spaces)
Example A-2. A Python script that conforms to PEP 8
#
# Monte Carlo valuation of European call option
# in Black-Scholes-Merton model
# bsm_mcs_euro_syntax_correct.py
#
import numpy as np

# Parameter Values
S0 = 100.  # initial index level
K = 105.  # strike price
T = 1.0  # time-to-maturity
r = 0.05  # riskless short rate
sigma = 0.2  # volatility

I = 100000  # number of simulations

# Valuation Algorithm
z = np.random.standard_normal(I)  # pseudorandom numbers
ST = S0 * np.exp((r - 0.5 * sigma ** 2) * T + sigma * np.sqrt(T) * z)
  # index values at maturity
hT = np.maximum(ST - K, 0)  # inner values at maturity
C0 = np.exp(-r * T) * np.sum(hT) / I  # Monte Carlo estimator

# Result Output
print "Value of the European Call Option %5.3f" % C0
Although the first version is perfectly executable by the Python interpreter, the second version for sure is more readable for both the programmer and any others who may try to understand it.
Some special rules apply to functions and classes when it comes to formatting. In general, there are supposed to be two blank lines before any new top-level function definition as well as any new class definition (PEP 8 separates method definitions inside a class by a single blank line). With functions, indentation also comes into play. In general, indentation is achieved through spaces and not through tabulators. As a general rule, take four spaces per level of indentation.[84] Consider now Example A-3.
Example A-3. A Python function with multiple indentations
#
# Function to check prime characteristic of integer
# is_prime_no_doc.py
#


def is_prime(I):
    if type(I) != int:
        raise TypeError("Input has not the right type.")
    if I <= 3:
        raise ValueError("Number too small.")
    else:
        if I % 2 == 0:
            print "Number is even, therefore not prime."
        else:
            end = int(I / 2.) + 1
            for i in range(3, end, 2):
                if I % i == 0:
                    print "Number is not prime, it is divided by %d." % i
                    break
                if i >= end - 2:
                    print "Number is prime."
We immediately notice the role indentation plays in Python. There are multiple levels of indentation to indicate code blocks, here mainly “caused” by control structure elements (e.g., if or else) or loops (e.g., the for loop).
Control structure elements are explained in Chapter 4, but the basic working of the function should be clear even if you are not yet used to Python syntax. Table A-1 lists a number of heavily used Python operators. Whenever there is a question mark in the description column of Table A-1, the operation returns a Boolean object (i.e., True or False).
Table A-1. Selected Python operators
Symbol  Description
+       Addition
-       Subtraction
/       Division
*       Multiplication
%       Modulo
==      Is equal?
!=      Is not equal?
<       Is smaller?
<=      Is equal or smaller?
>       Is larger?
>=      Is equal or larger?
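The following short sketch applies the operators from Table A-1 to two arbitrary integers. It assumes Python 3, where / is true division even for integers (under Python 2, which the listings in this book use, 7 / 2 gives 3). The dictionary names are illustrative only.

```python
a, b = 7, 2

# arithmetic operators return numbers
arithmetic = {
    "a + b": a + b,
    "a - b": a - b,
    "a / b": a / b,  # true division in Python 3
    "a * b": a * b,
    "a % b": a % b,  # remainder of the division
}

# comparison operators (the "question mark" rows) return Booleans
comparisons = {
    "a == b": a == b,
    "a != b": a != b,
    "a < b": a < b,
    "a <= b": a <= b,
    "a > b": a > b,
    "a >= b": a >= b,
}

all_boolean = all(isinstance(v, bool) for v in comparisons.values())
```

Every entry in comparisons is either True or False, which is what makes these operators usable directly in if statements such as those in Example A-3.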
Documentation
The two main elements of Python documentation are:
Inline documentation
Inline documentation can in principle be placed anywhere in the code; it is indicated by the use of one or more leading hash characters (#). In general, there should be at least two spaces before a hash.
Documentation strings
Such strings are used to provide documentation for Python functions (methods) and classes, and are generally placed within their definition (at the beginning of the indented code).
The code in Example A-2 contains multiple examples of inline documentation. Example A-4 shows the same function definition as in Example A-3, but this time with a documentation string added.
Example A-4. The Python function is_prime with documentation string
#
# Function to check prime characteristic of integer
# is_prime_with_doc.py
#


def is_prime(I):
    '''Function to test for prime characteristic of an integer.

    Parameters
    ==========
    I : int
        number to be checked for prime characteristic

    Returns
    =======
    output: string
        states whether number is prime or not;
        if not, provide a prime factor

    Raises
    ======
    TypeError
        if argument is not an integer
    ValueError
        if the integer is too small (3 or smaller)

    Examples
    ========
    >>> is_prime(11)
    Number is prime.
    >>> is_prime(8)
    Number is even, therefore not prime.
    >>> is_prime(int(1e8 + 7))
    Number is prime.
    >>>
    '''
    if type(I) != int:
        raise TypeError("Input has not the right type.")
    if I <= 3:
        raise ValueError("Number too small.")
    else:
        if I % 2 == 0:
            print "Number is even, therefore not prime."
        else:
            end = int(I / 2.) + 1
            for i in range(3, end, 2):
                if I % i == 0:
                    print "Number is not prime, it is divided by %d." % i
                    break
                if i >= end - 2:
                    print "Number is prime."
In general, such a documentation string provides information about the following elements:
Input
Which parameters/arguments to provide, and in which format (e.g., int)
Output
What the function/method returns, and in which format
Errors
Which (“special”) errors might be raised
Examples
Example usage of the function/methods
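The Examples part of a documentation string can double as an executable check via the doctest module from the standard library. The sketch below is written for Python 3 and differs from Example A-4 by design: the function returns a Boolean instead of printing a message, and it uses trial division up to the square root. The helper failed_docstring_examples is a hypothetical name introduced here for illustration.

```python
import doctest


def is_prime(n):
    """Return True if the integer n is prime, False otherwise.

    Examples
    ========
    >>> is_prime(11)
    True
    >>> is_prime(8)
    False
    >>> is_prime(5)
    True
    """
    if type(n) != int:
        raise TypeError("Input has not the right type.")
    if n < 2:
        raise ValueError("Number too small.")
    if n % 2 == 0:
        return n == 2  # 2 is the only even prime
    i = 3
    while i * i <= n:  # trial division by odd numbers up to sqrt(n)
        if n % i == 0:
            return False
        i += 2
    return True


def failed_docstring_examples(func):
    """Run the examples in func's docstring; return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    globs = {func.__name__: func}
    for test in finder.find(func, name=func.__name__, module=False,
                            globs=globs):
        runner.run(test)
    results = runner.summarize(verbose=False)
    return results.failed, results.attempted


failed, attempted = failed_docstring_examples(is_prime)
print("%d of %d docstring examples failed" % (failed, attempted))
```

Keeping the examples executable means that the documentation cannot silently drift away from what the function actually does.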
The use of documentation strings is not only helpful for those who take a look at the code itself. The majority of Python tools, like IPython and Spyder, allow direct access to this documentation and help source. Figure A-1 shows a screenshot of Spyder, this time with the function is_prime shown in the editor and the rendered documentation string of the function in the object inspector (upper right). This illustrates how helpful it is to always include meaningful documentation strings in functions and classes.
Figure A-1. Screenshot of Spyder with custom function and nicely rendered documentation string
Unit Testing
As a final best practice, we want to consider unit testing. Among the different testing approaches, unit testing can indeed be considered a best practice because it tests Python code on a rather fundamental level, i.e., the single units. What it does not test, however, is the integration of the single units. Typically, such units are functions, classes, or methods of classes. As a pretty simple example of a Python function that is also easily testable, consider the one in Example A-5.
Example A-5. A rather simple Python function
#
# Simple function to calculate
# the square of the square root
# of a positive number
# simple_function.py
#
from math import sqrt


def f(x):
    ''' Function to calculate the square of the square root.

    Parameters
    ==========
    x : float or int
        input number

    Returns
    =======
    fx : float
        square of the square root, i.e. sqrt(x) ** 2

    Raises
    ======
    TypeError
        if argument is neither float nor integer
    ValueError
        if argument is negative

    Examples
    ========
    >>> f(1)
    1.0
    >>> f(10.5)
    10.5
    '''
    if type(x) != float and type(x) != int:
        raise TypeError("Input has not the right type.")
    if x < 0:
        raise ValueError("Number negative.")
    fx = sqrt(x) ** 2
    return fx
There are many tools available that help support unit tests. We will make use of nose in what follows. Example A-6 contains a small test suite for the simple function f from Example A-5.
Example A-6. A test suite for the function f
#
# Test suite for simple function f
# nose_test.py
#
import nose.tools as nt
from simple_function import f


def test_f_calculation():
    ''' Tests if function f calculates correctly. '''
    nt.assert_equal(f(4.), 4.)
    nt.assert_equal(f(1000), 1000)
    nt.assert_equal(f(5000.5), 5000.5)


def test_f_type_error():
    ''' Tests if type error is raised. '''
    nt.assert_raises(TypeError, f, 'test string')
    nt.assert_raises(TypeError, f, [3, 'string'])


def test_f_value_error():
    ''' Tests if value error is raised. '''
    nt.assert_raises(ValueError, f, -1)
    nt.assert_raises(ValueError, f, -2.5)


def test_f_test_fails():
    ''' Tests if function test fails. '''
    nt.assert_equal(f(5.), 10)
Table A-2 describes the test functions that are implemented.
Table A-2. Test functions for simple function f
Function             Description
test_f_calculation   Tests if the function generates correct results
test_f_type_error    Checks if the function raises a type error when expected
test_f_value_error   Checks if the function raises a value error when expected
test_f_test_fails    Tests if the calculation test fails as expected (for illustration)
From the command line/shell, you can run the following tests:
$ nosetests nose_test.py
...F
======================================================================
FAIL: Test if function test fails.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Library/anaconda/lib/python2.7/site-packages/nose/case.py",
line 197, in runTest
    self.test(*self.arg)
  File
"//Users/yhilpisch/Documents/Work/Python4Finance/python/nose_test.py",
line 30, in test_f_test_fails
    nt.assert_equal(f(5.), 10)
AssertionError: 5.000000000000001 != 10
----------------------------------------------------------------------
Ran 4 tests in 0.002s

FAILED (failures=1)
$
Obviously, the first three tests are successful, while the last one fails as expected. Using such tools (and more importantly, implementing a rigorous approach to unit testing) may require more effort up front, but you and those working with your code will benefit in the long run.
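The same kinds of checks can also be written with the unittest module from the standard library, which needs no additional installation. The following Python 3 sketch mirrors the three passing tests of the nose suite; it is an alternative formulation rather than the approach used in this appendix. The class name TestF and the helper run_suite are hypothetical, f is repeated so that the block is self-contained, and assertAlmostEqual is used for the calculation test because sqrt(x) ** 2 is a floating-point result.

```python
import unittest
from math import sqrt


def f(x):
    """Square of the square root, as in the simple example function."""
    if type(x) != float and type(x) != int:
        raise TypeError("Input has not the right type.")
    if x < 0:
        raise ValueError("Number negative.")
    return sqrt(x) ** 2


class TestF(unittest.TestCase):

    def test_f_calculation(self):
        # compare with a tolerance instead of exact float equality
        self.assertAlmostEqual(f(4.), 4.)
        self.assertAlmostEqual(f(1000), 1000)
        self.assertAlmostEqual(f(5000.5), 5000.5)

    def test_f_type_error(self):
        with self.assertRaises(TypeError):
            f('test string')
        with self.assertRaises(TypeError):
            f([3, 'string'])

    def test_f_value_error(self):
        with self.assertRaises(ValueError):
            f(-1)
        with self.assertRaises(ValueError):
            f(-2.5)


def run_suite():
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestF)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_suite()
```

When the tests live in their own file, running python -m unittest from the shell discovers and executes them in a similar way to nosetests.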
[84] The majority of (Python) editors allow us to configure the use of a certain number of spaces even when pushing the Tab key. Some editors also allow semiautomatic replacement of tabs with spaces.
Appendix B. Call Option Class
Example B-1 contains a class definition for a European call option in the Black-Scholes-Merton (1973) model (cf. Chapter 3, and in particular Example 3-1).
Example B-1. Implementation of a Black-Scholes-Merton call option class
#
# Valuation of European call options in Black-Scholes-Merton Model
# incl. Vega function and implied volatility estimation
# -- class-based implementation
# bsm_option_class.py
#

from math import log, sqrt, exp
from scipy import stats


class call_option(object):
    ''' Class for European call options in BSM model.

    Attributes
    ==========
    S0 : float
        initial stock/index level
    K : float
        strike price
    T : float
        maturity (in year fractions)
    r : float
        constant risk-free short rate
    sigma : float
        volatility factor in diffusion term

    Methods
    =======
    value : float
        return present value of call option
    vega : float
        return Vega of call option
    imp_vol: float
        return implied volatility given option quote
    '''

    def __init__(self, S0, K, T, r, sigma):
        self.S0 = float(S0)
        self.K = K
        self.T = T
        self.r = r
        self.sigma = sigma

    def value(self):
        ''' Returns option value. '''
        d1 = ((log(self.S0 / self.K)
            + (self.r + 0.5 * self.sigma ** 2) * self.T)
            / (self.sigma * sqrt(self.T)))
        d2 = ((log(self.S0 / self.K)
            + (self.r - 0.5 * self.sigma ** 2) * self.T)
            / (self.sigma * sqrt(self.T)))
        value = (self.S0 * stats.norm.cdf(d1, 0.0, 1.0)
            - self.K * exp(-self.r * self.T) * stats.norm.cdf(d2, 0.0, 1.0))
        return value

    def vega(self):
        ''' Returns Vega of option. '''
        d1 = ((log(self.S0 / self.K)
            + (self.r + 0.5 * self.sigma ** 2) * self.T)
            / (self.sigma * sqrt(self.T)))
        # as printed, the cumulative distribution (cdf) is used here; the
        # standard BSM Vega uses the density, stats.norm.pdf(d1, 0.0, 1.0)
        vega = self.S0 * stats.norm.cdf(d1, 0.0, 1.0) * sqrt(self.T)
        return vega

    def imp_vol(self, C0, sigma_est=0.2, it=100):
        ''' Returns implied volatility given option price. '''
        option = call_option(self.S0, self.K, self.T, self.r, sigma_est)
        for i in range(it):
            option.sigma -= (option.value() - C0) / option.vega()
        return option.sigma
This class can be used in an interactive IPython session as follows:
In [1]: from bsm_option_class import call_option
In [2]: o = call_option(100., 105., 1.0, 0.05, 0.2)
        type(o)
Out[2]: bsm_option_class.call_option
In [3]: value = o.value()
        value
Out[3]: 8.0213522351431763
In [4]: o.vega()
Out[4]: 54.222833358480528
In [5]: o.imp_vol(C0=value)
Out[5]: 0.20000000000000001
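For readers without SciPy, the same quantities can be sketched with the standard library only. The block below is a Python 3 illustration with hypothetical function names, not the class from Example B-1. It builds the normal distribution function from math.erf and, for Vega, uses the standard normal density as in the textbook Black-Scholes-Merton formula. The Vega it reports therefore differs from Out[4] above, which results from the cdf-based expression in the printed class.

```python
from math import erf, exp, log, pi, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def norm_pdf(x):
    """Standard normal probability density function."""
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)


def bsm_d1(S0, K, T, r, sigma):
    return (log(S0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))


def bsm_call_value(S0, K, T, r, sigma):
    """Present value of a European call in the BSM model."""
    d1 = bsm_d1(S0, K, T, r, sigma)
    d2 = d1 - sigma * sqrt(T)
    return S0 * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)


def bsm_vega(S0, K, T, r, sigma):
    """Vega: sensitivity of the call value to the volatility."""
    return S0 * norm_pdf(bsm_d1(S0, K, T, r, sigma)) * sqrt(T)


def bsm_call_imp_vol(S0, K, T, r, C0, sigma_est=0.2, it=100):
    """Implied volatility by Newton iterations on the volatility."""
    sigma = sigma_est
    for _ in range(it):
        sigma -= ((bsm_call_value(S0, K, T, r, sigma) - C0)
                  / bsm_vega(S0, K, T, r, sigma))
    return sigma


value = bsm_call_value(100., 105., 1.0, 0.05, 0.2)
vega = bsm_vega(100., 105., 1.0, 0.05, 0.2)
# start the iteration away from the true volatility on purpose
imp_vol = bsm_call_imp_vol(100., 105., 1.0, 0.05, C0=value, sigma_est=0.3)
```

The option value agrees with Out[3], and the Newton iteration recovers the volatility of 0.2 that was used to generate the quote.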
The option class can be easily used to visualize, for example, the value and Vega of the option for different strikes and maturities. This is, in the end, one of the major advantages of having such formulae available. The following Python code generates the option statistics for different maturity-strike combinations:
In [6]: import numpy as np
        maturities = np.linspace(0.05, 2.0, 20)
        strikes = np.linspace(80, 120, 20)
        T, K = np.meshgrid(strikes, maturities)
        C = np.zeros_like(K)
        V = np.zeros_like(C)
        for t in enumerate(maturities):
            for k in enumerate(strikes):
                o.T = t[1]
                o.K = k[1]
                C[t[0], k[0]] = o.value()
                V[t[0], k[0]] = o.vega()
First, let us have a look at the option values. For plotting, we need to import some libraries and functions:
In [7]: from mpl_toolkits.mplot3d import Axes3D
        import matplotlib.pyplot as plt
        from pylab import cm
        %matplotlib inline
The output of the following code is presented in Figure B-1:
In [8]: fig = plt.figure(figsize=(12, 7))
        ax = fig.gca(projection='3d')
        surf = ax.plot_surface(T, K, C, rstride=1, cstride=1,
                    cmap=cm.coolwarm, linewidth=0.5, antialiased=True)
        ax.set_xlabel('strike')
        ax.set_ylabel('maturity')
        ax.set_zlabel('European call option value')
        fig.colorbar(surf, shrink=0.5, aspect=5)
Second, we have the results for the Vega of the call option, as shown in Figure B-2:
In [9]: fig = plt.figure(figsize=(12, 7))
        ax = fig.gca(projection='3d')
        surf = ax.plot_surface(T, K, V, rstride=1, cstride=1,
                    cmap=cm.coolwarm, linewidth=0.5, antialiased=True)
        ax.set_xlabel('strike')
        ax.set_ylabel('maturity')
        ax.set_zlabel('Vega of European call option')
        fig.colorbar(surf, shrink=0.5, aspect=5)
        plt.show()
Figure B-1. Value of European call option
Figure B-2. Vega of European call option
Compared with the code in Example 3-1 of Chapter 3, the code in Example B-1 of this appendix shows a number of advantages:
Better overall code structure and readability
Avoidance of redundant definitions as far as possible
Better reusability and more compact method calls
The option class also lends itself pretty well to the visualization of option statistics.
Appendix C. Dates and Times
As in the majority of scientific disciplines, dates and times play an important role in finance. This appendix introduces different aspects of this topic when it comes to Python programming. It cannot, of course, be exhaustive. However, it provides an introduction into the main areas of the Python ecosystem that support the modeling of date and time information.
Python
The datetime module from the Python standard library allows for the implementation of the most important date and time-related tasks.[85] We start by importing the module:
In [1]: import datetime as dt
Two different functions provide the exact current date and time:
In [2]: dt.datetime.now()
Out[2]: datetime.datetime(2014, 9, 14, 19, 22, 24, 366619)
In [3]: to = dt.datetime.today()
        to
Out[3]: datetime.datetime(2014, 9, 14, 19, 22, 24, 491234)
The resulting object is a datetime object:
In [4]: type(to)
Out[4]: datetime.datetime
The method weekday provides the number for the day of the week, given a datetime object:
In [5]: dt.datetime.today().weekday()
          # zero-based numbering; 0 = Monday
Out[5]: 6
Such an object can, of course, be directly constructed:
In [6]: d = dt.datetime(2016, 10, 31, 10, 5, 30, 500000)
        d
Out[6]: datetime.datetime(2016, 10, 31, 10, 5, 30, 500000)
In [7]: print d
Out[7]: 2016-10-31 10:05:30.500000
In [8]: str(d)
Out[8]: '2016-10-31 10:05:30.500000'
From such an object you can easily extract, for example, year, month, day information, and so forth:
In [9]: d.year
Out[9]: 2016
In [10]: d.month
Out[10]: 10
In [11]: d.day
Out[11]: 31
In [12]: d.hour
Out[12]: 10
Via the method toordinal, you can translate the date information to ordinal number representation:
In [13]: o = d.toordinal()
         o
Out[13]: 736268
This also works the other way around. However, you lose the time information during this process:
In [14]: dt.datetime.fromordinal(o)
Out[14]: datetime.datetime(2016, 10, 31, 0, 0)
On the other hand, you can separate out the time information from the datetime object, which then gives you a time object:
In [15]: t = dt.datetime.time(d)
         t
Out[15]: datetime.time(10, 5, 30, 500000)
In [16]: type(t)
Out[16]: datetime.time
Similarly, you can separate out the date information only, ending up with a date object:
In [17]: dd = dt.datetime.date(d)
         dd
Out[17]: datetime.date(2016, 10, 31)
Often, a certain degree of precision is sufficient. To this end, you can simply replace certain attributes of the datetime object with literals:
In [18]: d.replace(second=0, microsecond=0)
Out[18]: datetime.datetime(2016, 10, 31, 10, 5)
timedelta is another class of objects that result from arithmetic operations on the other date-time-related objects:
In [19]: td = d - dt.datetime.now()
         td
Out[19]: datetime.timedelta(777, 52983, 863732)
In [20]: type(td)
Out[20]: datetime.timedelta
Again, you can access the attributes directly to extract detailed information:
In [21]: td.days
Out[21]: 777
In [22]: td.seconds
Out[22]: 52983
In [23]: td.microseconds
Out[23]: 863732
In [24]: td.total_seconds()
Out[24]: 67185783.863732
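Because now() changes with every call, the numbers above cannot be reproduced exactly. The following sketch repeats the calculation with a fixed reference timestamp (an arbitrary stand-in for the moment of the original session, chosen here for illustration) and shows how days, seconds, and microseconds combine into total_seconds().

```python
import datetime as dt

d = dt.datetime(2016, 10, 31, 10, 5, 30, 500000)
ref = dt.datetime(2014, 9, 14, 19, 22, 24)  # fixed stand-in for now()

td = d - ref
parts = (td.days, td.seconds, td.microseconds)

# total_seconds() combines the three attributes into a single float
total = td.total_seconds()
rebuilt = td.days * 86400 + td.seconds + td.microseconds / 1e6

print(parts)
print(total)
```

Adding the timedelta back to the reference timestamp returns the original datetime object, so no information is lost in the subtraction.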
There are multiple ways to transform a datetime object into different representations, as well as to generate datetime objects out of, say, a string object. Details are found in the documentation of the datetime module. Here are a few examples:
In [25]: d.isoformat()
Out[25]: '2016-10-31T10:05:30.500000'
In [26]: d.strftime("%A, %d. %B %Y %I:%M%p")
Out[26]: 'Monday, 31. October 2016 10:05AM'
In [27]: dt.datetime.strptime('2017-03-31', '%Y-%m-%d')
           # year first and four-digit year
Out[27]: datetime.datetime(2017, 3, 31, 0, 0)
In [28]: dt.datetime.strptime('30-4-16', '%d-%m-%y')
           # day first and two-digit year
Out[28]: datetime.datetime(2016, 4, 30, 0, 0)
In [29]: ds = str(d)
         ds
Out[29]: '2016-10-31 10:05:30.500000'
In [30]: dt.datetime.strptime(ds, '%Y-%m-%d %H:%M:%S.%f')
Out[30]: datetime.datetime(2016, 10, 31, 10, 5, 30, 500000)
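A round trip through a string is lossless as long as the same format is used in both directions. The sketch below makes the format explicit with strftime instead of relying on str, since str leaves out the fractional part when the microsecond attribute is zero and the '%f' directive would then fail to parse the result. The variable names are illustrative.

```python
import datetime as dt

d = dt.datetime(2016, 10, 31, 10, 5, 30, 500000)
fmt = '%Y-%m-%d %H:%M:%S.%f'

ds = d.strftime(fmt)                    # datetime -> string
d_back = dt.datetime.strptime(ds, fmt)  # string -> datetime

# str() drops the fractional seconds if microsecond == 0
no_micro = str(d.replace(microsecond=0))

print(ds)
print(no_micro)
```

Using one explicit format string for both directions keeps stored timestamps parseable regardless of their microsecond value.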
In addition to the now and today functions, there is also the utcnow function, which gives the exact date and time information in UTC (Coordinated Universal Time, formerly known as Greenwich Mean Time, or GMT). This represents a two-hour difference from the author’s time zone (CET) during the summer:
In [31]: dt.datetime.now()
Out[31]: datetime.datetime(2014, 9, 14, 19, 22, 28, 123943)
In [32]: dt.datetime.utcnow()
           # Coordinated Universal Time
Out[32]: datetime.datetime(2014, 9, 14, 17, 22, 28, 240319)
In [33]: dt.datetime.now() - dt.datetime.utcnow()
           # UTC + 2h = CET (summer)
Out[33]: datetime.timedelta(0, 7199, 999982)
Another class of the datetime module is the tzinfo class, a generic time zone class with methods utcoffset, dst, and tzname. dst stands for Daylight Saving Time (DST). A definition for UTC time might look as follows:
In [34]: class UTC(dt.tzinfo):
             def utcoffset(self, d):
                 return dt.timedelta(hours=0)
             def dst(self, d):
                 return dt.timedelta(hours=0)
             def tzname(self, d):
                 return "UTC"
This can be used as an attribute to a datetime object and be defined via the replace method:
In [35]: u = dt.datetime.utcnow()
         u = u.replace(tzinfo=UTC())
           # attach time zone information
         u
Out[35]: datetime.datetime(2014, 9, 14, 17, 22, 28, 597383, tzinfo=<__main__.UTC object at 0x7f59e496ec10>)
Similarly, the following definition is for CET during the summer:
In [36]: class CET(dt.tzinfo):
             def utcoffset(self, d):
                 return dt.timedelta(hours=2)
             def dst(self, d):
                 return dt.timedelta(hours=1)
             def tzname(self, d):
                 return "CET + 1"
Making use of the astimezone method then makes it straightforward to transform the UTC-based datetime object u into a CET-based one:
In [37]: u.astimezone(CET())
Out[37]: datetime.datetime(2014, 9, 14, 19, 22, 28, 597383, tzinfo=<__main__.CET object at 0x7f59e79d8f10>)
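Under Python 3, fixed-offset time zones like the two classes above do not have to be written by hand, since the standard library provides datetime.timezone. The following sketch assumes Python 3 and uses a fixed timestamp in place of utcnow(). The name "CEST" is only a label passed to the constructor, and the object represents a constant offset of two hours without any daylight saving rules.

```python
import datetime as dt

utc = dt.timezone.utc
# fixed UTC+2 offset (Central European summer time), no DST logic
cest = dt.timezone(dt.timedelta(hours=2), "CEST")

# fixed timestamp instead of utcnow(), to keep the example reproducible
u = dt.datetime(2014, 9, 14, 17, 22, 28, tzinfo=utc)
local = u.astimezone(cest)

print(u.isoformat())
print(local.isoformat())
```

Both objects describe the same instant, so they compare as equal; only the wall-clock representation and the attached offset differ.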
There is a Python module available called pytz that implements the most important time zones from around the world:
In [38]: import pytz
country_names and country_timezones are dictionaries containing the countries and time zones covered:
In [39]: pytz.country_names['US']
Out[39]: u'United States'
In [40]: pytz.country_timezones['BE']
Out[40]: [u'Europe/Brussels']
In [41]: pytz.common_timezones[-10:]
Out[41]: ['Pacific/Wake',
          'Pacific/Wallis',
          'US/Alaska',
          'US/Arizona',
          'US/Central',
          'US/Eastern',
          'US/Hawaii',
          'US/Mountain',
          'US/Pacific',
          'UTC']
With pytz, there is generally no need to define your own tzinfo objects:
In [42]: u = dt.datetime.utcnow()
         u = u.replace(tzinfo=pytz.utc)
         u
Out[42]: datetime.datetime(2014, 9, 14, 17, 22, 29, 503702, tzinfo=<UTC>)
In [43]: u.astimezone(pytz.timezone("CET"))
Out[43]: datetime.datetime(2014, 9, 14, 19, 22, 29, 503702, tzinfo=<DstTzInfo 'CET' CEST+2:00:00 DST>)
In [44]: u.astimezone(pytz.timezone("GMT"))
Out[44]: datetime.datetime(2014, 9, 14, 17, 22, 29, 503702, tzinfo=<StaticTzInfo 'GMT'>)
In [45]: u.astimezone(pytz.timezone("US/Central"))
Out[45]: datetime.datetime(2014, 9, 14, 12, 22, 29, 503702, tzinfo=<DstTzInfo 'US/Central' CDT-1 day, 19:00:00 DST>)
NumPy
Since NumPy 1.7, there has been native date-time information support in NumPy. The basic class is called datetime64:
In [46]: import numpy as np
In [47]: nd = np.datetime64('2015-10-31')
         nd
Out[47]: numpy.datetime64('2015-10-31')
Like datetime objects, datetime64 objects can be represented as string objects:
In [48]: np.datetime_as_string(nd)
Out[48]: '2015-10-31'
Every such object has metadata stored with it, which can be accessed via the datetime_data function. The two main components are the frequency information (e.g., D for day) and the unit (e.g., 1 for one day in our case):
In [49]: np.datetime_data(nd)
Out[49]: ('D', 1)
A datetime64 object can easily be constructed from a datetime object:
In [50]: d
Out[50]: datetime.datetime(2016, 10, 31, 10, 5, 30, 500000)
In [51]: nd = np.datetime64(d)
         nd
Out[51]: numpy.datetime64('2016-10-31T11:05:30.500000+0100')
Similarly, using the astype method, a datetime64 object can be converted into a datetime object:
In [52]: nd.astype(dt.datetime)
Out[52]: datetime.datetime(2016, 10, 31, 10, 5, 30, 500000)
Another way to construct such an object is by providing a string object, e.g., with year and month, and the frequency information. Note that in the following case, the object value defaults to the first day of the month:
In [53]: nd = np.datetime64('2015-10', 'D')
         nd
Out[53]: numpy.datetime64('2015-10-01')
Comparing two datetime64 objects yields a True value whenever the information given is the same, even if the level of detail is different:
In [54]: np.datetime64('2015-10') == np.datetime64('2015-10-01')
Out[54]: True
Of course, you can also define ndarray objects containing multiple datetime64 objects:
In [55]: np.array(['2016-06-10', '2016-07-10', '2016-08-10'],
                  dtype='datetime64')
Out[55]: array(['2016-06-10', '2016-07-10', '2016-08-10'],
               dtype='datetime64[D]')
In [56]: np.array(['2016-06-10T12:00:00', '2016-07-10T12:00:00',
                   '2016-08-10T12:00:00'], dtype='datetime64[s]')
Out[56]: array(['2016-06-10T12:00:00+0200', '2016-07-10T12:00:00+0200',
                '2016-08-10T12:00:00+0200'], dtype='datetime64[s]')
You can also generate ranges of dates by using the function arange. Different frequencies (e.g., days, months, or weeks) are easily taken care of:
In [57]: np.arange('2016-01-01', '2016-01-04', dtype='datetime64')
           # daily frequency as default in this case
Out[57]: array(['2016-01-01', '2016-01-02', '2016-01-03'],
               dtype='datetime64[D]')
In [58]: np.arange('2016-01-01', '2016-10-01', dtype='datetime64[M]')
           # monthly frequency
Out[58]: array(['2016-01', '2016-02', '2016-03', '2016-04', '2016-05',
                '2016-06', '2016-07', '2016-08', '2016-09'],
               dtype='datetime64[M]')
In [59]: np.arange('2016-01-01', '2016-10-01', dtype='datetime64[W]')[:10]
           # weekly frequency
Out[59]: array(['2015-12-31', '2016-01-07', '2016-01-14', '2016-01-21',
                '2016-01-28', '2016-02-04', '2016-02-11', '2016-02-18',
                '2016-02-25', '2016-03-03'], dtype='datetime64[W]')
You can also easily use subday frequencies, like hours or seconds (refer to the documentation for all options):
In [60]: dtl = np.arange('2016-01-01T00:00:00', '2016-01-02T00:00:00',
                         dtype='datetime64[h]')
           # hourly frequency
         dtl[:10]
Out[60]: array(['2016-01-01T00+0100', '2016-01-01T01+0100',
                '2016-01-01T02+0100', '2016-01-01T03+0100',
                '2016-01-01T04+0100', '2016-01-01T05+0100',
                '2016-01-01T06+0100', '2016-01-01T07+0100',
                '2016-01-01T08+0100', '2016-01-01T09+0100'],
               dtype='datetime64[h]')
Plotting date-time and/or time series data can sometimes be tricky. matplotlib has good support for standard datetime objects. Transforming datetime64 information into datetime information generally does the trick, as the following example, whose result is shown in Figure C-1, illustrates:
In [61]: import matplotlib.pyplot as plt
         %matplotlib inline
In [62]: np.random.seed(3000)
         rnd = np.random.standard_normal(len(dtl)).cumsum() ** 2
In [63]: fig = plt.figure()
         plt.plot(dtl.astype(dt.datetime), rnd)
           # convert np.datetime to datetime.datetime
         plt.grid(True)
         fig.autofmt_xdate()
           # autoformatting of datetime x-ticks
Figure C-1. Plot with datetime.datetime x-ticks autoformatted
Finally, we also have an illustration of using arange with seconds and milliseconds as frequencies:
In [64]: np.arange('2016-01-01T00:00:00', '2016-01-02T00:00:00',
                   dtype='datetime64[s]')[:10]
           # seconds as frequency
Out[64]: array(['2016-01-01T00:00:00+0100', '2016-01-01T00:00:01+0100',
                '2016-01-01T00:00:02+0100', '2016-01-01T00:00:03+0100',
                '2016-01-01T00:00:04+0100', '2016-01-01T00:00:05+0100',
                '2016-01-01T00:00:06+0100', '2016-01-01T00:00:07+0100',
                '2016-01-01T00:00:08+0100', '2016-01-01T00:00:09+0100'],
               dtype='datetime64[s]')
In [65]: np.arange('2016-01-01T00:00:00', '2016-01-02T00:00:00',
                   dtype='datetime64[ms]')[:10]
           # milliseconds as frequency
Out[65]: array(['2016-01-01T00:00:00.000+0100', '2016-01-01T00:00:00.001+0100',
                '2016-01-01T00:00:00.002+0100', '2016-01-01T00:00:00.003+0100',
                '2016-01-01T00:00:00.004+0100', '2016-01-01T00:00:00.005+0100',
                '2016-01-01T00:00:00.006+0100', '2016-01-01T00:00:00.007+0100',
                '2016-01-01T00:00:00.008+0100', '2016-01-01T00:00:00.009+0100'],
               dtype='datetime64[ms]')
pandas
The pandas librarywasspecificallydesignedwithtimeseriesdatainmind.
date­timeinformation,likethe DatetimeIndex classfortimeindices(cf.
Therefore,thelibraryprovidesclassesthatareabletoefficientlyhandle
thedocumentation):
In [66]: import pandas as pd
Date­timeinformationin pandas is generallystoredasa Timestamp object:
In [67]: tsts = pd.Timestamp('2016-06-30')
Out[67]: Timestamp('2016-06-30 00:00:00')
Such objects are easily transformed into regular datetime objects with the to_datetime method:
In [68]: d = ts.to_datetime()
         d
Out[68]: datetime.datetime(2016, 6, 30, 0, 0)
Similarly, a Timestamp object is straightforwardly constructed from a datetime object:
In [69]: pd.Timestamp(d)
Out[69]: Timestamp('2016-06-30 00:00:00')
or from a NumPy datetime64 object:
In [70]: pd.Timestamp(nd)
Out[70]: Timestamp('2015-10-01 00:00:00')
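The conversions between the three representations can be collected in one round trip. The following is a sketch under the assumption of a recent pandas version, where the conversion method on Timestamp objects is called to_pydatetime (to our knowledge, the to_datetime method of Timestamp is no longer available in current releases); the variable names are illustrative:

```python
import datetime as dt
import numpy as np
import pandas as pd

ts = pd.Timestamp('2016-06-30')

# Timestamp -> datetime.datetime
d = ts.to_pydatetime()

# Timestamp -> numpy.datetime64
nd = ts.to_datetime64()

# Both objects can be passed back to the Timestamp constructor.
same_from_py = pd.Timestamp(d) == ts
same_from_np = pd.Timestamp(nd) == ts
```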
Another important class is the DatetimeIndex class, which is a collection of Timestamp objects with a number of powerful methods attached (cf. http://bit.ly/date_range_doc and http://bit.ly/datetimeindex_doc). Such an object can be instantiated with the date_range function, which is rather flexible and powerful for constructing time indices (see Chapter 6 for more details on this function):
In [71]: dti = pd.date_range('2016/01/01', freq='M', periods=12)
         dti
Out[71]: <class 'pandas.tseries.index.DatetimeIndex'>
[2016-01-31, ..., 2016-12-31]
Length: 12, Freq: M, Timezone: None
Single elements of the object are accessed by the usual indexing operations:
In [72]: dti[6]
Out[72]: Timestamp('2016-07-31 00:00:00', offset='M')
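Beyond single elements, the usual slicing and Boolean selection operations work as well. A minimal sketch, assuming a recent pandas version; the month-end frequency is passed as an explicit offset object here (an assumption made to stay independent of the spelling of the frequency alias, which has changed across pandas versions), and the variable names are illustrative:

```python
import pandas as pd

# Twelve month-end dates for 2016.
dti = pd.date_range('2016-01-01', periods=12, freq=pd.offsets.MonthEnd())

seventh = dti[6]             # positional indexing returns a Timestamp
second_half = dti[6:]        # slicing returns a DatetimeIndex
july = dti[dti.month == 7]   # Boolean selection on a date attribute
```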
DatetimeIndex objects can be transformed into arrays of datetime objects through the method to_pydatetime:
In [73]: pdi = dti.to_pydatetime()
         pdi
Out[73]: array([datetime.datetime(2016, 1, 31, 0, 0),
                datetime.datetime(2016, 2, 29, 0, 0),
                datetime.datetime(2016, 3, 31, 0, 0),
                datetime.datetime(2016, 4, 30, 0, 0),
                datetime.datetime(2016, 5, 31, 0, 0),
                datetime.datetime(2016, 6, 30, 0, 0),
                datetime.datetime(2016, 7, 31, 0, 0),
                datetime.datetime(2016, 8, 31, 0, 0),
                datetime.datetime(2016, 9, 30, 0, 0),
                datetime.datetime(2016, 10, 31, 0, 0),
                datetime.datetime(2016, 11, 30, 0, 0),
                datetime.datetime(2016, 12, 31, 0, 0)], dtype=object)
Using the DatetimeIndex constructor also allows the opposite operation:
In [74]: pd.DatetimeIndex(pdi)
Out[74]: <class 'pandas.tseries.index.DatetimeIndex'>
[2016-01-31, ..., 2016-12-31]
Length: 12, Freq: None, Timezone: None
In the case of NumPy datetime64 objects, the astype method has to be used:
In [75]: pd.DatetimeIndex(dtl.astype(pd.datetime))
Out[75]: <class 'pandas.tseries.index.DatetimeIndex'>
         [2015-12-31 23:00:00, ..., 2016-01-01 22:00:00]
         Length: 24, Freq: None, Timezone: None
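With more recent library versions the intermediate cast seems to be unnecessary. The following sketch assumes a recent pandas version, in which, to our knowledge, the pd.datetime alias is no longer available and the DatetimeIndex constructor accepts a datetime64 array directly; the array dtl is re-created here as a hypothetical stand-in for the hourly array used in the text:

```python
import datetime as dt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the hourly datetime64 array used in the text.
dtl = np.arange('2016-01-01T00', '2016-01-02T00', dtype='datetime64[h]')

# NumPy datetime64 array -> DatetimeIndex, without an explicit astype step.
idx = pd.DatetimeIndex(dtl)

# And back again: to a datetime64 array or to datetime.datetime objects.
back = idx.values
py = idx.to_pydatetime()
```

Since datetime64 values are timezone-naive in current NumPy versions, the index in this sketch starts at midnight rather than one hour earlier.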
pandas takes care of proper plotting of date-time information (see Figure C-2 and also Chapter 6):
In [76]: rnd = np.random.standard_normal(len(dti)).cumsum() ** 2
In [77]: df = pd.DataFrame(rnd, columns=['data'], index=dti)
In [78]: df.plot()
Figure C-2. pandas plot with Timestamp x-ticks autoformatted
pandas also integrates well with the pytz module to manage time zones:
In [79]: pd.date_range('2016/01/01', freq='M', periods=12,
                       tz=pytz.timezone('CET'))
Out[79]: <class 'pandas.tseries.index.DatetimeIndex'>
         [2016-01-31 00:00:00+01:00, ..., 2016-12-31 00:00:00+01:00]
         Length: 12, Freq: M, Timezone: CET
In [80]: dti = pd.date_range('2016/01/01', freq='M', periods=12,
                             tz='US/Eastern')
         dti
Out[80]: <class 'pandas.tseries.index.DatetimeIndex'>
[2016-01-31 00:00:00-05:00, ..., 2016-12-31 00:00:00-05:00]
Length: 12, Freq: M, Timezone: US/Eastern
Using the tz_convert method, DatetimeIndex objects can be transformed from one time zone to another:
In [81]: dti.tz_convert('GMT')
Out[81]: <class 'pandas.tseries.index.DatetimeIndex'>
         [2016-01-31 05:00:00+00:00, ..., 2016-12-31 05:00:00+00:00]
         Length: 12, Freq: M, Timezone: GMT
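Attaching a time zone and converting between time zones are two different operations. A minimal sketch, assuming a recent pandas version with time zone data installed; the names naive, eastern, and utc are illustrative. It also makes visible what the abbreviated output above hides, namely that the offset to UTC changes during daylight saving time:

```python
import pandas as pd

# A naive month-end index first gets a time zone attached (tz_localize) ...
naive = pd.date_range('2016-01-01', periods=12, freq=pd.offsets.MonthEnd())
eastern = naive.tz_localize('US/Eastern')

# ... and is then expressed in another time zone (tz_convert).
utc = eastern.tz_convert('UTC')

jan_hour = utc[0].hour   # midnight in winter (UTC-5) is 05:00 UTC
jul_hour = utc[6].hour   # midnight in summer (UTC-4) is 04:00 UTC

# Both indices describe the same points in time.
same_instants = bool((eastern == utc).all())
```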
[85] For more information on this module, see the online documentation.
Index
A NOTE ON THE DIGITAL INDEX
A link in an index entry is displayed as the section title in which that entry appears. Because some sections have multiple index markers, it is not unusual for an entry to have several links to the same section. Clicking on any link will take you directly to the place in the text in which the marker appears.
Symbols
64-bit double precision standard, Floats
A
absolute minimum variance portfolio, Portfolio Optimizations
actual continuation value, The Valuation Class
adaptive quadrature, Numerical Integration
American exercise
    definition of, Derivatives Valuation, American Exercise
    Least-Squares Monte Carlo (LSM) algorithm, Least-Squares Monte Carlo
    use case, A Use Case–A Use Case
    valuation class, The Valuation Class
American options
    definition of, Valuation
    on the VSTOXX, American Options on the VSTOXX–The Options Portfolio
    valuation of contingent claims, American Options
Anaconda, Anaconda–Anaconda
    benefits of, Anaconda
    conda package manager, Anaconda
    downloading, Anaconda
    installing, Anaconda
    libraries/packages available, Anaconda
    multiple Python environments, Anaconda
analytics
    basic, Basic Analytics
    derivatives analytics library
        derivatives valuation, Derivatives Valuation–A Use Case
        extensions to, Conclusions
        modularization offered by, Portfolio Valuation
        portfolio valuation, Portfolio Valuation–A Use Case
        simulation of financial models, Simulation of Financial Models–A Use Case
        valuation framework, Valuation Framework–Market Environments
        volatility options, Volatility Options–The Options Portfolio
    financial
        definition of, The Rise of Real-Time Analytics
        implied volatilities example, Implied Volatilities–Implied Volatilities
        Monte Carlo simulation example, Monte Carlo Simulation–Graphical Analysis
        retrieving data, Financial Data–Financial Data
        size of data sets, Input/Output Operations
        technical analysis example, Technical Analysis–Technical Analysis
    interactive
        benefits of Python for, Shorter time-to-results–Ensuring high performance
        publishing platform for sharing, Basic usage
        tools for, Tools–Spyder
    real-time, The Rise of Real-Time Analytics
annualized performance, The Data
antithetic paths, Random Number Generation
antithetic variates, Variance Reduction
application development
    benefits of Python for end-to-end, From Prototyping to Production
    documentation best practices, Documentation
    rapid web applications, Rapid Web Applications–Styling
    syntax best practices, Python Syntax
    tools for, Tools–Spyder
    unit testing best practices, Unit Testing
approximation of functions, Approximation–Interpolation
    interpolation, Interpolation
    regression, Regression–Multiple dimensions
arbitrary precision floats, Floats
arrays
    DataFrames and, Second Steps with DataFrame Class
    input-output operations with PyTables, Working with Arrays
    memory layout and, Memory Layout
    regular NumPy arrays, Regular NumPy Arrays–Regular NumPy Arrays
    structure of, NumPy Data Structures
    structured arrays, Structured Arrays
    with Python lists, Arrays with Python Lists
    writing/reading NumPy, Writing and Reading NumPy Arrays
average loss level, Credit Value Adjustments
B
basic analytics, Basic Analytics
Bayesian regression, Bayesian Regression–Real Data
    diachronic interpretation of Bayes’s formula, Bayes’s Formula
    introductory example, Introductory Example
    overview of, Statistics, Conclusions
    PyMC3 library, PyMC3
    real data, Real Data–Real Data
beliefs of agents, Statistics
Bermudan exercises, American Options, American Exercise
best practices
    documentation, Documentation
    functional programming tools, Excursion: Functional Programming
    syntax, Python Syntax
    unit testing, Unit Testing
bfill parameter, Regression Analysis
big data, The Rise of Real-Time Analytics, Input/Output Operations
binomial model, Least-Squares Monte Carlo
binomial option pricing, Binomial Option Pricing–Binomial Option Pricing
Black-Scholes-Merton model
    class definition for European call option, Call Option Class–Call Option Class
    European call option, Finance and Python Syntax–Finance and Python Syntax
    formula for, Implied Volatilities
    LaTeX code for, Markdown and LaTeX
    parameters meanings, Implied Volatilities
    simulating future index level, Random Variables
    stochastic differential equation, Monte Carlo Simulation
    Vega of a European option, Implied Volatilities
Bokeh library
    benefits of, Static Plots
    default output, Static Plots
    interactive plots, Interactive Plots
    plotting styles, Static Plots
    real-time plots, Real-Time Plots
    stand-alone graphics files, Interactive Plots
box plots, Other Plot Styles
broadcasting, Basic Vectorization
brute function, Convex Optimization, Calibration Procedure
C
call options
    class definition for European, Call Option Class–Call Option Class
    definition of, Valuation
candlestick plots, Financial Plots
capital asset pricing model, Normality Tests
capital market line, Capital Market Line
cash flow series, Cash Flow Series Class, Cash Flow Series Class with GUI
cells
    in DataNitro, Scripting with DataNitro
    in Excel spreadsheets, Reading from Workbooks
    in IPython, Basic usage
characters, symbols for, One-Dimensional Data Set
classes
    accessing attribute values, Basics of Python Classes
    assigning new attribute values, Basics of Python Classes
    attributes and, Basics of Python Classes
    cash flow series example, Cash Flow Series Class
    defining, Basics of Python Classes
    defining object attributes, Basics of Python Classes
    for risk-neutral discounting, Constant Short Rate
    generic simulation class, Generic Simulation Class
    generic valuation class, Generic Valuation Class
    geometric Brownian motion, Geometric Brownian Motion
    inheritance in, Basics of Python Classes
    iteration over, Basics of Python Classes
    jump diffusion, Jump Diffusion
    private attributes, Basics of Python Classes
    readability and maintainability of, Basics of Python Classes
    reusability and, Basics of Python Classes
    simple short rate class example, Simple Short Rate Class
    square-root diffusion, Square-Root Diffusion
    to model derivatives portfolios, The Class
    to model derivatives positions, The Class
    valuation class for American exercise, The Valuation Class
    valuation class for European exercise, The Valuation Class
coefficient of determination, Multiple dimensions
color abbreviations, One-Dimensional Data Set
comma-separated value (CSV) files
    generating Excel spreadsheets with, Generating Workbooks (.xls)
    input-output operations with pandas, Data as CSV File
    parameters of read_csv function, Regression Analysis
    reading/writing, Reading and Writing Text Files
    regular expressions and, Strings
    retrieving via the Web, urllib
communication protocols
    file transfer protocol, Web Basics
    hypertext transfer protocol, httplib
    providing web services via, Web Services–The Implementation
    secure connections, ftplib
    uniform resource locators, urllib
compilation
    dynamic, Dynamic Compiling–Binomial Option Pricing
    static, Static Compiling with Cython–Static Compiling with Cython
compiled languages, Basic Data Types
compressed tables, working with, Working with Compressed Tables
concatenate function, Variance Reduction
conda package manager, Anaconda
configure_traits method, Short Rate Class with GUI
constant short rate, Constant Short Rate
constrained optimization, Constrained Optimization
contingent claims, valuation of, Valuation–American Options
    American options, American Options
    European options, European Options
continuation value, American Options, Least-Squares Monte Carlo
control structures, Excursion: Control Structures
convenience methods, Basic Analytics
convex optimization, Convex Optimization–Constrained Optimization
    constrained, Constrained Optimization
    functions for, Calibration Procedure
    global, Global Optimization
    local, Local Optimization
covariance matrix, The Data
covariances, Normality Tests
Cox-Ingersoll-Ross SDE, Square-root diffusion
Cox-Ross-Rubinstein binomial model, Least-Squares Monte Carlo
credit value adjustment (CVA), Credit Value Adjustments
credit value-at-risk (CVaR), Credit Value Adjustments
CSS (Cascading Style Sheets), Styling
cubic splines, Interpolation
Cython library, Basic Data Types, Static Compiling with Cython
D
data
    basic data structures, Basic Data Structures–Sets
    basic data types, Basic Data Types–Strings
    big data, The Rise of Real-Time Analytics, Input/Output Operations
    formats supported by pandas library, I/O with pandas
    high frequency, High-Frequency Data
    high-frequency, Real-time stock price quotes
    missing data, First Steps with DataFrame Class, Basic Analytics
    noisy data, Noisy data
    NumPy data structures, NumPy Data Structures–Structured Arrays
    provision/gathering with web technology, Web Integration
    quality of web sources, Financial Plots, Financial Data
    real-time foreign exchange, Real-time FX data
    real-time stock price quotes, Real-time stock price quotes
    resampling of, High-Frequency Data
    retrieving, Financial Data–Financial Data
    sources of, Financial Data
    storage of, Input/Output Operations
    unsorted data, Unsorted data
    VSTOXX data, The VSTOXX Data–VSTOXX Options Data
data visualization
    3D plotting, 3D Plotting
    Bokeh library for, Static Plots
    financial plots, Financial Plots
    for implied volatilities, Implied Volatilities
    graphical analysis of Monte Carlo simulation, Graphical Analysis
    interactive plots, Interactive Plots
    panning/zooming, Interactive Plots
    plot_surface parameters, 3D Plotting
    plt.axis options, One-Dimensional Data Set
    plt.candlestick parameters, Financial Plots
    plt.hist parameters, Other Plot Styles
    plt.legend options, Two-Dimensional Data Set
    real-time plots, Real-Time Plots
    standard color abbreviations, One-Dimensional Data Set
    standard style characters, One-Dimensional Data Set
    static plots, Static Plots
    two-dimensional plotting, Two-Dimensional Plotting–Other Plot Styles
DataFrame class, First Steps with DataFrame Class–Second Steps with DataFrame Class
    arrays and, Second Steps with DataFrame Class
    features of, First Steps with DataFrame Class
    frequency parameters for date-range function, Second Steps with DataFrame Class
    line plot of DataFrame object, Basic Analytics
    parameters of DataFrame function, Second Steps with DataFrame Class, I/O with pandas
    parameters of date-range function, Second Steps with DataFrame Class
    similarity to SQL database table, First Steps with DataFrame Class
    vectorization with, Financial Data
DataNitro
    benefits of, Scripting Excel with Python
    cell attributes, Scripting with DataNitro
    cell methods, Scripting with DataNitro
    cell typesetting options, Scripting with DataNitro
    combining with Excel, Working with DataNitro
    installing, Installing DataNitro
    optimizing performance, Scripting with DataNitro
    plotting with, Plotting with DataNitro
    scripting with, Scripting with DataNitro
    user-defined functions, User-defined functions
DataReader function, Financial Data
dates and times
    described by regular expressions, Strings
    implied volatilities example, Implied Volatilities–Implied Volatilities
    in risk-neutral discounting, Modeling and Handling Dates
    Monte Carlo simulation example, Monte Carlo Simulation–Graphical Analysis
    NumPy support for, NumPy–pandas
    pandas support for, pandas–pandas
    Python datetime module, Python–NumPy
    technical analysis example, Technical Analysis–Technical Analysis
    (see also financial time series data)
datetime module, Python–NumPy
datetime64 class, NumPy–pandas
date_range function, Second Steps with DataFrame Class
default, probability of, Credit Value Adjustments
Deltas, Generic Valuation Class
dependent observations, Regression
deployment
    Anaconda, Anaconda–Anaconda
    Python Quant platform, Python Quant Platform
    via web browser, Python Quant Platform
derivatives analytics library
    derivatives valuation, Derivatives Valuation–A Use Case
    extensions to, Conclusions
    goals for, Derivatives Analytics Library
    modularization offered by, Portfolio Valuation
    portfolio valuation, Portfolio Valuation–A Use Case
    simulation of financial models, Simulation of Financial Models–A Use Case
    valuation framework, Valuation Framework–Market Environments
    volatility options, Volatility Options–The Options Portfolio
derivatives portfolios
    class for valuation, The Class
    relevant market for, Derivatives Portfolios
    use case, A Use Case–A Use Case
derivatives positions
    definition of, Derivatives Positions
    modeling class, The Class
    use case, A Use Case
derivatives valuation
    American exercise, American Exercise–A Use Case
    European exercise, European Exercise–A Use Case
    generic valuation class, Generic Valuation Class
    methods available, Derivatives Valuation
deserialization, Writing Objects to Disk
diachronic interpretation (of Bayes’s formula), Bayes’s Formula
dicts, Dicts
differentiation, Differentiation
discounting, Simple Short Rate Class, Risk-Neutral Discounting
discretization error, Random Variables
diversification, The Data
documentation
    best practices, Documentation
    documentation strings, Documentation
    IPython Notebook for, Basic usage
dot function, Individual basis functions, The Basic Theory
DX (Derivatives AnalytiX) library, Derivatives Analytics Library
dynamic compiling, Dynamic Compiling–Binomial Option Pricing
    binomial option pricing, Binomial Option Pricing–Binomial Option Pricing
    example of, Introductory Example
dynamically typed languages, Basic Data Types
E
early exercise premium, American Options
editors
    configuring, Spyder
    Spyder, Spyder
efficiency, Efficiency and Productivity Through Python–Ensuring high performance
efficient frontier, Efficient Frontier
efficient markets hypothesis, Normality Tests
encryption, ftplib
errors
    discretization error, Random Variables
    mean-squared error (MSE), Calibration Procedure
    sampling error, Random Variables
estimated continuation value, The Valuation Class
Euler scheme, Full Vectorization with Log Euler Scheme, Square-root diffusion, Square-Root Diffusion
European exercise
    definition of, Derivatives Valuation
    Monte Carlo estimator for option values, European Exercise
    use case, A Use Case–A Use Case
    valuation class, The Valuation Class
European options
    definition of, Valuation
    valuation of contingent claims, European Options
Excel
    basic spreadsheet interaction, Basic Spreadsheet Interaction–Using pandas for Reading and Writing
    benefits of, Excel Integration
    cell types in, Reading from Workbooks
    drawbacks of, Excel Integration
    features of, Excel Integration
    file input-output operations, Data as Excel File
    integration with Python, Excel Integration
    integration with xlwings, xlwings
    scripting with Python, Scripting Excel with Python–User-defined functions
excursion
    control structures, Excursion: Control Structures
    functional programming, Excursion: Functional Programming
expected portfolio return, The Basic Theory
expected portfolio variance, The Basic Theory
F
fat tails, Value-at-Risk, Real-World Data
ffill parameter, Regression Analysis
file transfer protocol, Web Basics
fillna method, Regression Analysis
finance
    mathematical tools for, Mathematical Tools–Differentiation
    role of Python in, Python for Finance–Conclusions
    role of technology in, Technology in Finance–The Rise of Real-Time Analytics
    role of web technologies in, Web Integration
financial analytics
    basic analytics, Basic Analytics
    (see also financial time series data)
    definition of, The Rise of Real-Time Analytics
    implied volatilities example, Implied Volatilities–Implied Volatilities
    Monte Carlo simulation example, Monte Carlo Simulation–Graphical Analysis
    retrieving data, Financial Data–Financial Data
    size of data sets, Input/Output Operations
    technical analysis example, Technical Analysis–Technical Analysis
financial plots, Financial Plots–Financial Plots
financial time series data
    definition of, Financial Time Series
    financial data, Financial Data–Financial Data
    high frequency data, High-Frequency Data
    pandas library, pandas Basics–GroupBy Operations
    regression analysis, Regression Analysis–Regression Analysis
first in, first out (FIFO) principle, Writing Objects to Disk
fixed Gaussian quadrature, Numerical Integration
flash trading, Ever-Increasing Speeds, Frequencies, Data Volumes
Flask framework
    benefits of, Rapid Web Applications
    commenting functionality, Core functionality
    connection/login, Core functionality
    data modeling, Data Modeling
    database infrastructure, Imports and database preliminaries
    importing libraries, The Python Code
    libraries required, Rapid Web Applications
    security issues, Core functionality
    styling web pages in, Styling
    templating in, Templating
    traders’ chat room application, Traders’ Chat Room
floats, Floats–Floats
fmin function, Convex Optimization, Calibration Procedure
frequency distribution, A Use Case
ftplib library, Web Basics
full truncation, Square-root diffusion, Square-Root Diffusion
functional programming, Excursion: Functional Programming
Fundamental Theorem of Asset Pricing, Valuation, Fundamental Theorem of Asset Pricing, Derivatives Portfolios
FX (foreign exchange) data, Real-time FX data
G
general market model, The General Results, Derivatives Portfolios
General Purpose Graphical Processing Units (GPGPUs), Generation of Random Numbers on GPUs
generate_payoff method, The Valuation Class
geometric Brownian motion, Simulation of Financial Models, Geometric Brownian Motion
get_info method, The Class
global optimization, Global Optimization, Calibration Procedure
graphical analysis, Graphical Analysis
    (see also matplotlib library)
graphical user interfaces (GUIs)
    cash flow series with, Cash Flow Series Class with GUI
    libraries required, Graphical User Interfaces
    Microsoft Excel as, Excel Integration
    short rate class with, Short Rate Class with GUI
    updating values, Updating of Values
Greeks, estimation of, Generic Valuation Class
groupby operations, GroupBy Operations
Gruenbichler and Longstaff model, The Financial Model
Gaussian quadrature, Numerical Integration
H
HDF5 database format, Working with Arrays
Heston stochastic volatility model, Stochastic volatility
high frequency data, High-Frequency Data
histograms, Other Plot Styles
HTML-based web pages, httplib
httplib library, httplib
hypertext transfer protocol, httplib
I
immutability, Lists
implied volatilities
    Black-Scholes-Merton formula, Implied Volatilities
    definition of, Implied Volatilities
    futures data, Implied Volatilities
    Newton scheme for, Implied Volatilities
    option quotes, Implied Volatilities
    visualizing data, Implied Volatilities
    volatility smile, Implied Volatilities
importing, definition of, The Python Ecosystem
independent observations, Regression
inline documentation, Documentation
input-output operations
    with pandas
        data as CSV file, Data as CSV File
        data as Excel file, Data as Excel File
        from SQL to pandas, From SQL to pandas
        SQL databases, SQL Database
    with PyTables
        out-of-memory computations, Out-of-Memory Computations
        working with arrays, Working with Arrays
        working with compressed tables, Working with Compressed Tables
        working with tables, Working with Tables
    with Python
        reading/writing text files, Reading and Writing Text Files
        SQL databases, SQL Databases
        writing objects to disk, Writing Objects to Disk
        writing/reading NumPy arrays, Writing and Reading NumPy Arrays
input/output operations
    importance of, Input/Output Operations
integer index, Implied Volatilities
integers, Integers
integrate sublibrary, Numerical Integration
integration
    by simulation, Integration by Simulation
    numerical, Numerical Integration
    scipy.integrate sublibrary, Integration
    symbolic computation, Integration
interactive analytics
    benefits of Python for, Shorter time-to-results–Ensuring high performance
    publishing platform for sharing, Basic usage
    rise of real-time, The Rise of Real-Time Analytics
    tools for, Tools–Spyder
interactive web plots, Interactive Plots
interpolation, Interpolation–Interpolation
interpreters
    IPython, IPython–System shell commands
    standard, Python
IPython, IPython–System shell commands
    basic usage, Basic usage
    benefits of, The Python Ecosystem
    documentation with, Basic usage
    help functions in, Magic commands
    importing libraries, From shell to browser
    invoking, From shell to browser
    IPython.parallel library, Parallel Computing–Performance Comparison
    magic commands, Magic commands
    Markdown commands, Markdown and LaTeX
    rendering capabilities, Markdown and LaTeX
    system shell commands, System shell commands
    versions of, From shell to browser
iter method, Basics of Python Classes
J
Jinja2 library, Rapid Web Applications
jump diffusion, Jump diffusion, Simulation of Financial Models, Jump Diffusion
K
KernelPCA function, Principal Component Analysis
killer application, The Python Ecosystem
kurtosis test, Benchmark Case
L
large integers, Integers
LaTeX
    commands, Markdown and LaTeX
    IPython Notebook cells and, Basic usage
least-squares function, Individual basis functions
Least-Squares Monte Carlo (LSM) algorithm, American Options, Derivatives Valuation, Least-Squares Monte Carlo
leverage effect, Financial Data, Stochastic volatility
libraries
    available in Anaconda, Anaconda
    Cython library, Basic Data Types
    importing, The Python Ecosystem, Basic Vectorization, Approximation
    importing to IPython, From shell to browser
    standard, The Python Ecosystem
list comprehensions, Excursion: Control Structures
lists, Lists, Arrays with Python Lists
LLVM compiler infrastructure, Dynamic Compiling
local maximum a posteriori point, Introductory Example
local optimization, Local Optimization, Calibration Procedure
lognormal function, Random Variables
Longstaff-Schwartz model, Least-Squares Monte Carlo, A Use Case
loss level, Credit Value Adjustments
M
magic commands/functions, Magic commands
Markdown commands, Markdown and LaTeX
market environments, Market Environments
(Markov Chain) Monte Carlo (MCMC) sampling, Introductory Example
Markov property, Stochastic Processes
martingale measures, Fundamental Theorem of Asset Pricing, Least-Squares Monte Carlo
mathematical syntax, Finance and Python Syntax
mathematical tools
    approximation of functions, Approximation–Differentiation
    convex optimization, Convex Optimization–Constrained Optimization
    integration, Integration
    symbolic computation, Symbolic Computation
matplotlib library
    3D plotting, 3D Plotting
    benefits of, The Scientific Stack
    financial plots, Financial Plots–Financial Plots
    importing matplotlib.pyplot, Approximation
    NumPy arrays and, One-Dimensional Data Set
    pandas library wrapper for, Basic Analytics
    strengths of, Web Plotting
    two-dimensional plotting, Two-Dimensional Plotting–Other Plot Styles
maximization of Sharpe ratio, Portfolio Optimizations
mean returns, Normality Tests
mean-squared error (MSE), Calibration Procedure
mean-variance, The Data
mean-variance portfolio theory (MPT), Portfolio Optimization
memory layout, Memory Layout
memory-less processes, Stochastic Processes
Microsoft Excel (see Excel)
minimization function, Portfolio Optimizations
missing data, First Steps with DataFrame Class, Basic Analytics
model calibration
    option modeling, Option Modeling
    procedure for, Calibration Procedure
    relevant market data, Relevant Market Data
modern portfolio theory (MPT), Statistics, Portfolio Optimization
moment matching, Variance Reduction, Random Number Generation
Monte Carlo simulation
    approaches to, Monte Carlo Simulation
    benefits of, Monte Carlo Simulation
    BSM stochastic differential equation, Monte Carlo Simulation
    drawbacks of, Monte Carlo Simulation, Least-Squares Monte Carlo
    for European call option, Monte Carlo Simulation
    full vectorization with log Euler scheme, Full Vectorization with Log Euler Scheme
    graphical analysis of, Graphical Analysis
    importance of, Simulation
    integration by simulation, Integration by Simulation
    Least-Squares Monte Carlo (LSM) algorithm, American Options, Derivatives Valuation
    pure Python approach, Pure Python
    valuation of contingent claims, Valuation–American Options
    vectorization with NumPy, Vectorization with NumPy
moving averages, Financial Data
multiple dimensions, Multiple dimensions
multiprocessing module, multiprocessing
mutability, Lists
N
ndarray class, Vectorization with NumPy
Newton scheme, Implied Volatilities
noisy data, Noisy data
normality tests, Normality Tests–Real-World Data
    benchmark case, Benchmark Case
    importance of, Normality Tests
    normality assumption, Benchmark Case
    overview of, Statistics, Conclusions
    real-world data, Real-World Data
Numba library, Dynamic Compiling–Binomial Option Pricing
NumbaPro library, Generation of Random Numbers on GPUs
numexpr library, Python Paradigms and Performance
NumPy
    benefits of, The Scientific Stack
    concatenate function, Variance Reduction
    data structures, NumPy Data Structures–Structured Arrays
    date-time information support in, NumPy
    importing, Approximation
    Monte Carlo simulation with, Vectorization with NumPy
    numpy.linalg sublibrary, Individual basis functions
    numpy.random sublibrary, Random Numbers
    universal functions, Basic Analytics
    writing/reading arrays, Writing and Reading NumPy Arrays
NUTS algorithm, Introductory Example
O
OANDA online broker, Real-time FX data
object orientation, Object Orientation–Cash Flow Series Class
    cash flow series class example, Cash Flow Series Class
    definition of, Object Orientation
    Python classes, Basics of Python Classes
    simple short rate class example, Simple Short Rate Class
observation points, Regression, Unsorted data
OpenPyxl library, Using OpenPyxl
operators, Documentation
optimal decision step, The Valuation Class
optimal stopping problems, American Options, Least-Squares Monte Carlo
optimization
    constrained, Constrained Optimization
    convex, Convex Optimization–Constrained Optimization
    global, Global Optimization
    local, Local Optimization
option pricing theory, Normality Tests
ordinary least-squares regression (OLS), Regression Analysis, Multiple dimensions, Least-Squares Monte Carlo
out-of-memory computations, Out-of-Memory Computations
P
pandas library, pandas Basics–GroupBy Operations
    basic analytics, Basic Analytics
    benefits of, The Scientific Stack, Technical Analysis
    data formats supported, I/O with pandas
    data sources supported, Financial Data
    DataFrame class, First Steps with DataFrame Class–Second Steps with DataFrame Class
    date-time information support in, pandas–pandas
    development of, Financial Time Series
    error tolerance in, Basic Analytics
    groupby operations, GroupBy Operations
    hierarchically indexed data sets and, Implied Volatilities
    input-output operations
        data as CSV file, Data as CSV File
        data as Excel file, Data as Excel File
        from SQL to pandas, From SQL to pandas
        SQL databases, SQL Database
    reading/writing spreadsheets with, Using pandas for Reading and Writing
    Series class, Series Class
    working with missing data, First Steps with DataFrame Class
    wrapper for matplotlib library, Basic Analytics
parallel computing, Parallel Computing–Performance Comparison
    Monte Carlo algorithm, The Monte Carlo Algorithm
    parallel calculation, The Parallel Calculation
    performance comparison, Performance Comparison
    sequential calculation, The Sequential Calculation
PEP (Python Enhancement Proposal) 20, What Is Python?
PEP (Python Enhancement Proposal) 8, Python Syntax
performance computing
    benefits of Python for, Ensuring high performance
    dynamic compiling, Dynamic Compiling–Binomial Option Pricing
    memory layout and, Memory Layout and Performance
    multiprocessing module, multiprocessing
    parallel computing, Parallel Computing–Performance Comparison
    Python paradigms and, Python Paradigms and Performance
    random number generation on GPUs, Generation of Random Numbers on GPUs
    static compiling with Cython, Static Compiling with Cython
petascale processing, Input/Output Operations
pickle module, Writing Objects to Disk
plot function, One-Dimensional Data Set
plot method, Basic Analytics
plot_surface function, 3D Plotting
plt.axis method, One-Dimensional Data Set
plt.candlestick, Financial Plots
plt.hist function, Other Plot Styles
plt.legend function, Two-Dimensional Data Set
PNG (portable network graphics) format, Static Plots
Poisson distribution, Random Numbers
polyfit function, Monomials as basis functions
portfolio theory/portfolio optimization
    basic idea of, The Data
    basic theory, The Basic Theory
    capital market line, Capital Market Line
    data collection for, The Data
    efficient frontier, Efficient Frontier
    importance of, Portfolio Optimization
    overview of, Normality Tests, Conclusions
    portfolio covariance matrix, The Basic Theory
    portfolio optimizations, Portfolio Optimizations
portfolio valuation
    benefits of analytics library for, Portfolio Valuation
    derivatives portfolios, Derivatives Portfolios–A Use Case
    derivatives positions, Derivatives Positions–A Use Case
    requirements for complex portfolios, Portfolio Valuation
precision floats, Floats
presentation, IPython Notebook for, Basic usage
present_value method, The Valuation Class
principal component analysis (PCA), Principal Component Analysis–Constructing a PCA Index
    applying, Applying PCA
    constructing PCA indices, Constructing a PCA Index
    DAX index stocks, The DAX Index and Its 30 Stocks
    definition of, Principal Component Analysis
    overview of, Statistics, Conclusions
print_statistics helper function, Random Variables
private attributes, Basics of Python Classes
probability of default, Credit Value Adjustments
productivity, Efficiency and Productivity Through Python–Ensuring high performance
pseudocode, Finance and Python Syntax
pseudorandom numbers, Random Numbers, Variance Reduction
publishing platform, Basic usage
put options, definition of, Valuation
PyMC3 library, PyMC3
pyplot sublibrary, One-Dimensional Data Set
PyTables
    benefits of, The Scientific Stack, Fast I/O with PyTables
    importing, Fast I/O with PyTables
    input-output operations
        out-of-memory computations, Out-of-Memory Computations
        working with arrays, Working with Arrays
        working with compressed tables, Working with Compressed Tables
        working with tables, Working with Tables
Python
    as ecosystem vs. language, The Python Ecosystem
    benefits for finance, Python for Finance–Conclusions, Input/Output Operations, Web Integration
    benefits of, What Is Python?
    classes in, Basics of Python Classes–Cash Flow Series Class, Constant Short Rate
    deployment, Python Deployment–Python Quant Platform
    features of, What Is Python?
    history of, Brief History of Python
    input-output operations
        reading/writing text files, Reading and Writing Text Files
        SQL databases, SQL Databases
        writing objects to disk, Writing Objects to Disk
        writing/reading NumPy arrays, Writing and Reading NumPy Arrays
    invoking interpreter, Python
    multiple environments for, Anaconda
    Quant platform, Python Quant Platform, Derivatives Analytics Library
    rapid web application development, Rapid Web Applications–Styling
    scientific stack, The Scientific Stack, Technical Analysis
    user spectrum, Python User Spectrum
    zero-based numbering in, Tuples
Python Quants GmbH
    benefits of, Python Quant Platform
    features of, Python Quant Platform, Derivatives Analytics Library
Q
quadratures, fixed Gaussian and adaptive, Numerical Integration
Quant platform
    benefits of, Python Quant Platform
    features of, Python Quant Platform, Derivatives Analytics Library
quantile-quantile (qq) plots, Benchmark Case
queries, Working with Tables
R
random number generation, Generation of Random Numbers on GPUs, Random Numbers–Random Numbers, Random Number Generation
    functions according to distribution laws, Random Numbers
    functions for simple, Random Numbers
random variables, Random Variables
rapid web application development
    benefits of Python for, Rapid Web Applications
    commenting functionality, Core functionality
    connection/login, Core functionality
    data modeling, Data Modeling
    database infrastructure, Imports and database preliminaries
    Flask framework for, Rapid Web Applications
    importing libraries, The Python Code
    popular frameworks for, Rapid Web Applications
    security issues, Core functionality
    styling web pages, Styling
    templating, Templating
    traders’ chat room, Traders’ Chat Room
read_csv function, Regression Analysis
real-time analytics, The Rise of Real-Time Analytics
real-time economy, The Rise of Real-Time Analytics
real-time plots, Real-Time Plots
real-time stock price quotes, Real-time stock price quotes
regression analysis
    mathematical tools for
        individual basis functions, Individual basis functions
        monomials as basis functions, Monomials as basis functions
        multiple dimensions and, Multiple dimensions
        noisy data and, Noisy data
        strengths of, Regression
        unsorted data and, Unsorted data
    of financial time series data, Regression Analysis–Regression Analysis
regular expressions, Strings
reg_func function, Multiple dimensions
requests library, Real-time FX data
resampling, High-Frequency Data
risk management, Derivatives Valuation
    (see also derivatives valuation; risk measures)
risk measures, Risk Measures–Credit Value Adjustments
    credit value adjustments, Credit Value Adjustments
    value-at-risk (VaR), Value-at-Risk
risk-neutral discounting, Risk-Neutral Discounting
risk-neutral valuation approach, The General Results
rolling functions, Financial Data
Romberg integration, Numerical Integration
S
sampling error, Random Variables
scatter plots, Other Plot Styles
scientific stack, The Scientific Stack, Technical Analysis
scikit-learn library, Principal Component Analysis
SciPy
    benefits of, The Scientific Stack
    scipy.integrate sublibrary, Integration
    scipy.optimize sublibrary, Convex Optimization
    scipy.optimize.minimize function, Constrained Optimization
    scipy.stats sublibrary, Random Variables, Benchmark Case
sensitivity analysis, Cash Flow Series Class
serialization, Writing Objects to Disk
Series class, Series Class
sets, Sets
Sharpe ratio, Portfolio Optimizations
short rates, Simple Short Rate Class, Short Rate Class with GUI, Constant Short Rate
simple random number generation, Random Numbers
Simpson’s rule, Numerical Integration
simulation
    discretization error in, Random Variables
    generic simulation class, Generic Simulation Class
    geometric Brownian motion, Random Variables–Random Variables, Geometric Brownian Motion
    jump diffusion, Jump Diffusion
    noisy data from, Noisy data
    numerical integration by, Integration by Simulation
    random number generation, Random Number Generation
    random variables, Random Variables
    sampling error in, Random Variables
    square-root diffusion, Square-Root Diffusion
    stochastic processes, Stochastic Processes–Variance Reduction, Simulation of Financial Models
    variance reduction, Variance Reduction
skewness test, Benchmark Case
Software-as-a-Service (SaaS), Web Integration
splev function, Interpolation
spline interpolation, Interpolation
splrep function, Interpolation
spreadsheets
    Excel cell types, Reading from Workbooks
    generating xls workbooks, Generating Workbooks (.xls)
    generating xlsx workbooks, Generating Workbooks (.xslx)
    OpenPyxl library for, Using OpenPyxl
    Python libraries for, Basic Spreadsheet Interaction
    reading from workbooks, Reading from Workbooks
    reading/writing with pandas, Using pandas for Reading and Writing
Spyder
    benefits of, Spyder
    features of, Spyder
SQL databases
    input-output operations with pandas, SQL Database
    input-output operations with Python, SQL Databases
square-root diffusion, Square-root diffusion, Simulation of Financial Models, Square-Root Diffusion, Option Modeling
standard color abbreviations, One-Dimensional Data Set
standard interpreter, Python
standard normally distributed random numbers, Random Number Generation
standard style characters, One-Dimensional Data Set
star import, The Python Ecosystem, Basic Vectorization
static plots, Static Plots
statically typed languages, Basic Data Types
statistics, Statistics–Real Data
    Bayesian regression, Statistics, Bayesian Regression–Real Data
    focus areas covered, Statistics
    normality tests, Statistics–Real-World Data
    portfolio theory, Statistics, Portfolio Optimization–Capital Market Line
    principal component analysis, Statistics, Principal Component Analysis–Constructing a PCA Index
statsmodels library, Multiple dimensions
stochastic differential equation (SDE), Geometric Brownian motion
stochastic processes, Stochastic Processes–Variance Reduction
    definition of, Stochastic Processes
    geometric Brownian motion, Geometric Brownian motion, Simulation of Financial Models, Geometric Brownian Motion
    importance of, Stochastics
    jump diffusion, Jump diffusion, Simulation of Financial Models, Jump Diffusion
    square-root diffusion, Square-root diffusion, Simulation of Financial Models, Square-Root Diffusion
    stochastic volatility model, Stochastic volatility
strings
    documentation strings, Documentation
    Python string class, Strings–Strings
    selected string methods, Strings
    string objects, Strings
structured arrays, Structured Arrays
Symbol class, Basics
symbolic computation
    basics of, Basics
    differentiation, Differentiation
    equations, Equations
    integration, Integration
SymPy library
    benefits for symbolic computations, Differentiation
    differentiation with, Differentiation
    equation solving with, Equations
    integration with, Integration
    mathematical function definitions, Basics
    Symbol class, Basics
syntax
    benefits of Python for finance, Finance and Python Syntax–Finance and Python Syntax
    best practices, What Is Python?, Python Syntax
    mathematical, Finance and Python Syntax
    Python 2.7 vs. 3.x, Anaconda
T
tables
    compressed, Working with Compressed Tables
    working with, Working with Tables
tail risk, Value-at-Risk
technical analysis
    backtesting example, Technical Analysis
    definition of, Technical Analysis
    retrieving time series data, Technical Analysis
    testing investment strategy, Technical Analysis
    trading signal rules, Technical Analysis
    trend strategy, Technical Analysis
technology, role in finance, Technology in Finance–The Rise of Real-Time Analytics
templating, Templating
testing, unit testing, Unit Testing
text
    reading/writing text files, Reading and Writing Text Files
    representation with strings, Strings
three-dimensional plotting, 3D Plotting
tools, Tools–Spyder
    IPython, IPython–System shell commands
    Python interpreter, Python
    Spyder, Spyder–Spyder
    (see also mathematical tools)
traders’ chat room application
    basic idea of, Traders’ Chat Room
    commenting functionality, Core functionality
    connection/login, Core functionality
    data modeling, Data Modeling
    database infrastructure, Imports and database preliminaries
    importing libraries, The Python Code
    security issues, Core functionality
    styling, Styling
    templating, Templating
traits library, Graphical User Interfaces
traitsui.api library, Updating of Values
trapezoidal rule, Numerical Integration
tuples, Tuples
two-dimensional plotting
    importing libraries, Two-Dimensional Plotting
    one-dimensional data set, One-Dimensional Data Set–One-Dimensional Data Set
    other plot styles, Other Plot Styles–Other Plot Styles
    two-dimensional data set, Two-Dimensional Data Set–Two-Dimensional Data Set
U
unit testing best practices, Unit Testing
universal functions, Basic Vectorization, Basic Analytics
unsorted data, Unsorted data
updating of beliefs, Statistics
urllib library, urllib
URLs (uniform resource locators), urllib
user-defined functions (UDF), User-defined functions
V
valuation framework
    Fundamental Theorem of Asset Pricing, Fundamental Theorem of Asset Pricing
    overview of, Valuation Framework
    risk-neutral discounting, Risk-Neutral Discounting
valuation of contingent claims, Valuation–American Options
    American options, American Options
    European options, European Options
valuation theory, Least-Squares Monte Carlo
value-at-risk (VaR), Value-at-Risk
values, updating in GUI, Updating of Values
variance of returns, Normality Tests
variance reduction, Variance Reduction
vectorization
    basic, Basic Vectorization
    full with log Euler scheme, Full Vectorization with Log Euler Scheme
    fundamental idea of, Vectorization of Code
    memory layout, Memory Layout
    speed increase achieved by, Vectorization with NumPy
    with DataFrames, Financial Data
    with NumPy, Vectorization with NumPy
Vega
    definition of, Generic Valuation Class
    of a European option in BSM model, Implied Volatilities
visualization (see data visualization)
VIX volatility index, Volatility Options
volatility clustering, Financial Data
volatility index, The Financial Model
volatility options
    American on the VSTOXX, American Options on the VSTOXX–The Options Portfolio
    main index, Volatility Options
    model calibration, Model Calibration–Calibration Procedure
    tasks undertaken, Volatility Options
    VSTOXX data, The VSTOXX Data–VSTOXX Options Data
volatility smile, Implied Volatilities
volatility, stochastic model, Stochastic volatility
VSTOXX data
    futures data, VSTOXX Futures Data
    index data, VSTOXX Index Data
    libraries required, The VSTOXX Data
    options data, VSTOXX Options Data
W
web browser deployment, Python Quant Platform
web technologies
    communication protocols, Web Basics–urllib
    rapid web applications, Rapid Web Applications–Styling
    role in finance, Web Integration
    web plotting, Web Plotting–Real-time stock price quotes
    web services, Web Services–The Implementation
Werkzeug library, Rapid Web Applications
workbooks
    generating xls workbooks, Generating Workbooks (.xls)
    generating xlsx workbooks, Generating Workbooks (.xslx)
    OpenPyxl library for, Using OpenPyxl
    pandas generated, Using pandas for Reading and Writing
    reading from, Reading from Workbooks
X
xlrd library, Basic Spreadsheet Interaction
xlsxwriter library, Basic Spreadsheet Interaction
xlwings library, xlwings
xlwt library, Basic Spreadsheet Interaction
Y
Yahoo! Finance, Financial Plots, Financial Data
Z
Zen of Python, What Is Python?
zero-based numbering schemes, Tuples
About the Author
Yves Hilpisch is the founder and managing partner of The Python Quants, an analytics software provider and financial engineering group. The Python Quants offer, among others, the Python Quant Platform (http://quant-platform.com) and DX Analytics (http://dx-analytics.com). Yves also lectures on mathematical finance and organizes meetups and conferences about Python for Quantitative Finance in New York and London.
Colophon
The animal on the cover of Python for Finance is a Hispaniolan solenodon. The Hispaniolan solenodon (Solenodon paradoxus) is an endangered mammal that lives on the Caribbean island of Hispaniola, which comprises Haiti and the Dominican Republic. It’s particularly rare in Haiti and a bit more common in the Dominican Republic.
Solenodons are known to eat arthropods, worms, snails, and reptiles. They also consume roots, fruit, and leaves on occasion. A solenodon weighs a pound or two and has a foot-long head and body plus a ten-inch tail, give or take. This ancient mammal looks somewhat like a big shrew. It’s quite furry, with reddish-brown coloring on top and lighter fur on its undersides, while its tail, legs, and prominent snout lack hair.
It has a rather sedentary lifestyle and often stays out of sight. When it does come out, its movements tend to be awkward, and it sometimes trips when running. However, being a night creature, it has developed an acute sense of hearing, smell, and touch. Its own distinctive scent is said to be “goatlike.”
It gets toxic saliva from a groove in the second lower incisor and uses it to paralyze and attack its invertebrate prey. As such, it is one of few venomous mammals. Sometimes the venom is released when fighting among each other, and can be fatal to the solenodon itself. Often, after initial conflict, they establish a dominance relationship and get along in the same living quarters. Families tend to live together for a long time. Apparently, it only drinks while bathing.
Many of the animals on O’Reilly covers are endangered; all of them are important to the world. To learn more about how you can help, go to animals.oreilly.com.
The cover image is from Wood’s Illustrated Natural History. The cover fonts are URW Typewriter and Guardian Sans. The text font is Adobe Minion Pro; the heading font is Adobe Myriad Condensed; and the code font is Dalton Maag’s Ubuntu Mono.
Python for Finance
Yves Hilpisch
Editor
Brian MacDonald
Editor
Meghan Blanchette
Revision History
2014-12-09 First release
Copyright © 2014 Yves Hilpisch
O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-[email protected].
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Python for Finance, the cover image of a Hispaniolan solenodon, and related trade dress are trademarks of O’Reilly Media, Inc.
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps.
While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
This book is not intended as financial advice. Please consult a qualified professional if you require financial advice.
O’Reilly Media
1005 Gravenstein Highway North
Sebastopol, CA 95472
2014­12­10T07:08:11­08:00
