Python For Actuaries
A New Paradigm!
Moderator:
David L. Snell, ASA, MAAA
Presenters:
Brian D. Holland, FSA, MAAA
Dihui Lai, Ph.D.
Sheamus Kee Parkes, FSA, MAAA
Why learn Python?
• We hear a ton about machine learning, data science, big data.
• To actually do these things yourself, you need the technical skills – programming/hacking skills included.
• Python has a lot of traction in data science applications and is now quite
popular. You don’t have to look long before seeing it.
• Some data science companies are Python shops.
Purpose today: shake hands with Python
See what you might want to dig into
What is Python?
• an object-oriented language
• with extensive scientific, numeric libraries
• with many special-purpose libraries
• with an expanding user base
• that is designed for readability
• Enforced indentation; many accessible ways to comment your work
• around since 1991
• in two active versions: 2 and 3
For new work there is not much case for sticking with 2 now: the big libraries have been ported to 3.
• named after Monty Python, not the snake
Applications for actuaries
Ways to use Python
System command: for scripts
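For the script route, a minimal sketch: a plain .py file run from a system command such as `python lapse_rate.py 120 1000`. The file name, its two arguments, and the crude-rate calculation are hypothetical illustrations, not part of the original talk.

```python
"""lapse_rate.py: a hypothetical script to run from a system command, e.g.

    python lapse_rate.py 120 1000
"""
import argparse


def crude_lapse_rate(lapses: float, exposure: float) -> float:
    """Return lapses per unit of exposure (e.g. per policy-year)."""
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    return lapses / exposure


def main(argv=None) -> None:
    parser = argparse.ArgumentParser(description="Compute a crude lapse rate.")
    parser.add_argument("lapses", type=float, nargs="?", help="observed lapse count")
    parser.add_argument("exposure", type=float, nargs="?", help="exposure, e.g. policy-years")
    args = parser.parse_args(argv)  # argv=None means "read sys.argv[1:]"
    if args.lapses is None or args.exposure is None:
        parser.print_usage()  # nothing to compute; show how to call the script
        return
    rate = crude_lapse_rate(args.lapses, args.exposure)
    print(f"Crude lapse rate: {rate:.4f}")


# This block runs only when the file is executed as a script, not when imported.
if __name__ == "__main__":
    main()
```

The `if __name__ == "__main__":` guard lets the same file be run as a script or imported into a notebook to reuse `crude_lapse_rate`.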
Ways to use Python: IPython notebooks
Notebooks convert easily to slides, HTML, and plain Python files, and from there on to MS Word
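Jupyter's own converter does this from the command line, e.g. `jupyter nbconvert --to script analysis.ipynb` (also `--to html` or `--to slides`); the notebook name is hypothetical. To show what the plain-Python conversion amounts to, here is a minimal standard-library sketch, assuming the nbformat 4 JSON layout, in which a notebook holds a `cells` list and each cell has a `cell_type` and a `source` (a string or a list of line strings). The function names are hypothetical.

```python
import json


def _text(source) -> str:
    """nbformat stores cell source as a string or a list of line strings."""
    return source if isinstance(source, str) else "".join(source)


def notebook_to_script(notebook: dict) -> str:
    """Hypothetical sketch: keep code cells, turn markdown cells into comments."""
    chunks = []
    for cell in notebook.get("cells", []):
        text = _text(cell.get("source", "")).rstrip("\n")
        if cell.get("cell_type") == "code":
            chunks.append(text)
        elif cell.get("cell_type") == "markdown":
            chunks.append("\n".join("# " + line for line in text.splitlines()))
        # Other cell types (e.g. raw) are skipped in this sketch.
    return "\n\n".join(chunks) + "\n"


def convert_file(ipynb_path: str, py_path: str) -> None:
    """Read an .ipynb file (plain JSON) and write the extracted script."""
    with open(ipynb_path, encoding="utf-8") as f:
        notebook = json.load(f)
    with open(py_path, "w", encoding="utf-8") as f:
        f.write(notebook_to_script(notebook))
```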
What is “knowing Python”?
Graphics libraries:
Death by choice
• Bokeh for interactive plots in the browser
• Seaborn
• ggplot port for R fans and experts
• VisPy – bleeding edge, GPU, interactive, 2D, 3D, wow
• Matplotlib – the main one
Data I/O with Pandas
The Pandas library can import many file types directly into a DataFrame object (similar to R’s data.frame):
• Fixed-width text
• Delimited text
• Spreadsheets
• HTML, JSON
• SQL queries, using an open connection to the database
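A minimal sketch of three of these readers, assuming pandas is installed. The inputs are small in-memory stand-ins built from three records of the lapse-study extract shown in the R demo (POLICY_YEAR omitted); the fixed-width column widths are a hypothetical layout, and the SQL source is an in-memory SQLite database opened with Python's built-in `sqlite3`. `pd.read_excel`, `pd.read_html`, and `pd.read_json` follow the same pattern for the other formats.

```python
import io
import sqlite3

import pandas as pd

COLS = ["STUDY_YEAR", "ISSUE_AGE", "EXPOSURE", "LAPSE_CNT", "FA_BAND"]
ROWS = [
    ("2009-2010", "33-37", 1, 1, "B. 100k-249k"),
    ("2009-2010", "63-67", 1, 0, "B. 100k-249k"),
    ("2008-2009", "28-32", 2, 2, "C. 250k-999k"),
]
WIDTHS = [11, 10, 9, 10, 13]  # hypothetical fixed-width layout


def fixed_width_line(values):
    """Left-justify each value in its column width (mimics a fixed-width file)."""
    return "".join(f"{str(v):<{w}}" for v, w in zip(values, WIDTHS)).rstrip()


# 1. Fixed-width text: columns are cut at known character positions.
fwf_text = "\n".join(fixed_width_line(r) for r in [COLS, *ROWS]) + "\n"
df_fwf = pd.read_fwf(io.StringIO(fwf_text), widths=WIDTHS)

# 2. Delimited text: comma-separated here; pass sep="\t", "|", etc. for others.
csv_text = "\n".join(",".join(str(v) for v in r) for r in [COLS, *ROWS]) + "\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# 3. SQL query over an open connection (in-memory SQLite for illustration).
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE lapse (STUDY_YEAR TEXT, ISSUE_AGE TEXT, "
    "EXPOSURE REAL, LAPSE_CNT INTEGER, FA_BAND TEXT)"
)
con.executemany("INSERT INTO lapse VALUES (?, ?, ?, ?, ?)", ROWS)
df_sql = pd.read_sql(
    "SELECT FA_BAND, SUM(LAPSE_CNT) AS LAPSES, SUM(EXPOSURE) AS EXPOSURE "
    "FROM lapse GROUP BY FA_BAND ORDER BY FA_BAND",
    con,
)
con.close()

print(df_fwf, df_csv, df_sql, sep="\n\n")
```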
Machine learning: scikit-learn – the “killer app”?
Many examples at http://scikit-learn.org/stable/auto_examples/index.html.
A very small sample from the page:
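Much of scikit-learn's appeal is that its models share one estimator interface: `fit`, then `predict` or `score`. A minimal sketch, assuming scikit-learn is installed; it uses the bundled iris dataset rather than actuarial data, and swapping `LogisticRegression` for another classifier such as `RandomForestClassifier` (from `sklearn.ensemble`) leaves the rest of the code unchanged.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Bundled toy dataset: feature matrix X and class labels y.
X, y = load_iris(return_X_y=True)

# Hold out 30% of rows for an honest accuracy estimate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000)  # any estimator follows the same pattern
model.fit(X_train, y_train)                # learn parameters from training data
predictions = model.predict(X_test[:5])    # class labels for new rows
accuracy = model.score(X_test, y_test)     # mean accuracy on held-out rows

print(f"Hold-out accuracy: {accuracy:.3f}")
```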
Cooperation with other software:
RPy2 in a Notebook
“R Magic” (IPython and Jupyter notebooks have many “magic” functions)
• They let you send commands to other tools directly from the notebook
More on RPy2: accessing R objects
PypeR: another way to talk to R
PypeR uses pipes to communicate with R.
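The pipe idea itself can be sketched with the standard library: start a child interpreter, write commands to its standard input, and read results from its standard output. To keep the sketch runnable without R, a second Python process stands in for R here; this is not PypeR's API, and with R installed the child command would launch R instead. The helper `run_through_pipe` is hypothetical.

```python
import subprocess
import sys


def run_through_pipe(command: list, script: str, timeout: float = 30.0) -> str:
    """Send `script` to a child process over its stdin pipe; return its stdout."""
    result = subprocess.run(
        command,
        input=script,         # written to the child's stdin pipe
        capture_output=True,  # child's stdout/stderr read back through pipes
        text=True,
        timeout=timeout,
        check=True,           # raise CalledProcessError if the child fails
    )
    return result.stdout


# A Python child stands in for R so the sketch runs anywhere Python does.
# "-" tells the interpreter to read its program from stdin.
output = run_through_pipe([sys.executable, "-"], "x = [1, 2, 3]\nprint(sum(x) / len(x))\n")
print(output.strip())
```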
Good luck, have fun!
R for Actuarial Science
R Demo
Database
Visualization tools
Use R for Actuarial Science
Example: Term Tail Lapse Study
load("LapseData.Rdata")
head(LapseData)
## STUDY_YEAR ISSUE_AGE POLICY_YEAR EXPOSURE LAPSE_CNT FA_BAND
## 9 2009-2010 33-37 10 1 1 B. 100k-249k
## 71 2009-2010 63-67 10 1 0 B. 100k-249k
## 121 2008-2009 28-32 10 2 2 C. 250k-999k
## 210 2008-2009 53-57 10 2 1 B. 100k-249k
## 223 2009-2010 38-42 10 1 1 C. 250k-999k
## 237 2008-2009 23-27 10 1 0 B. 100k-249k
summary(LapseData)
## STUDY_YEAR ISSUE_AGE POLICY_YEAR EXPOSURE
## 2010-2011:98630 33-37 :92930 Min. :10.00 Min. : 0.002732
## 2011-2012:88353 38-42 :91723 1st Qu.:10.00 1st Qu.: 1.000000
## 2009-2010:83321 43-47 :76142 Median :10.00 Median : 1.000000
## 2008-2009:77505 28-32 :69777 Mean :10.87 Mean : 1.226270
## 2007-2008:59968 48-52 :57920 3rd Qu.:11.00 3rd Qu.: 1.000000
## 2006-2007:41000 53-57 :41278 Max. :19.00 Max. :26.000000
## (Other) :64476 (Other):83483
## LAPSE_CNT FA_BAND
## Min. : 0.000 A. < 100k : 39121
## 1st Qu.: 0.000 B. 100k-249k :230897
## Median : 1.000 C. 250k-999k :208131
## Mean : 0.615 D. 1M - 1.99M: 26042
## 3rd Qu.: 1.000 E. 2M+ : 7232
## Max. :24.000 D. 1M-1.99M : 1830
Model1 <- glm(LAPSE_CNT ~ offset(log(EXPOSURE)) + FA_BAND, family = poisson(), data = LapseData)
summary(Model1)
##
## Call:
## glm(formula = LAPSE_CNT ~ offset(log(EXPOSURE)) + FA_BAND, family = poisson(),
## data = LapseData)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -4.6517 -0.9669 -0.2003 0.6752 2.8462
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.987363 0.007434 -132.81 <2e-16 ***
## FA_BANDB. 100k-249k 0.226844 0.007926 28.62 <2e-16 ***
## FA_BANDC. 250k-999k 0.372967 0.007905 47.18 <2e-16 ***
## FA_BANDD. 1M - 1.99M 0.488017 0.010462 46.65 <2e-16 ***
## FA_BANDE. 2M+ 0.615627 0.015559 39.57 <2e-16 ***
## FA_BANDD. 1M-1.99M 0.857298 0.020445 41.93 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 413195 on 513252 degrees of freedom
## Residual deviance: 408135 on 513247 degrees of freedom
## AIC: 951877
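A handy sanity check on output like this: with a log link, a log(EXPOSURE) offset, and FA_BAND as the only predictor, the Poisson likelihood equations force each band's fitted lapse total to equal its observed total, so exp(b0 + b_k) = L_k / E_k, where L_k and E_k are band k's total LAPSE_CNT and total EXPOSURE. The intercept is therefore the log of the base band's crude lapse rate, and each FA_BAND coefficient is the log of that band's crude rate divided by the base band's. The sketch below computes these directly from a few made-up records (not the study data), assuming every band has positive lapses and exposure; the function name `poisson_one_factor_fit` is hypothetical, and `glm()` on the same data should agree up to its convergence tolerance.

```python
import math
from collections import defaultdict


def poisson_one_factor_fit(records, base_level):
    """Closed-form MLE for log E[lapses] = log(exposure) + b0 + b_band.

    records: iterable of (band, exposure, lapse_count). With one factor the
    model has one free rate per band, so fitted totals equal observed totals.
    """
    lapses = defaultdict(float)
    exposure = defaultdict(float)
    for band, expo, count in records:
        lapses[band] += count
        exposure[band] += expo
    # Log crude rate per band; fails if a band has zero lapses (log of 0).
    log_rate = {band: math.log(lapses[band] / exposure[band]) for band in lapses}
    intercept = log_rate[base_level]
    effects = {b: lr - intercept for b, lr in log_rate.items() if b != base_level}
    return intercept, effects


# Made-up records: (FA_BAND, EXPOSURE, LAPSE_CNT)
records = [
    ("A. < 100k", 2.0, 1), ("A. < 100k", 2.0, 0),
    ("B. 100k-249k", 1.0, 1), ("B. 100k-249k", 3.0, 1),
]
intercept, effects = poisson_one_factor_fit(records, base_level="A. < 100k")
print(round(intercept, 4), {k: round(v, 4) for k, v in effects.items()})
# -1.3863 {'B. 100k-249k': 0.6931}
```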
Use R for Actuarial Science
Example: Hierarchical Clustering
Use R for Actuarial Science
Examples: Other Potentials
SVM
Have Fun
R Demo
Table: elapsed time (s) and memory (Mb), by function and by approach
Memory allocation: ff, bigmemory
Integrate R with clusters: RHadoop, SparkR
Database
Visualization tools
Questions?
R vs Python
SOA Health Meeting – June 2015
Presented by
Shea Parkes, FSA, MAAA
Limitations
The views expressed in this presentation are those of the
presenter, and not those of Milliman. Nothing in this
presentation is intended to represent a professional opinion
or be an interpretation of actuarial standards of practice.
Data Science – A Useful Perspective
http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram