Advanced Python for Scientific Computing
Michael Milligan
milligan@msi.umn.edu

Follow along!
https://www.msi.umn.edu/content/programming
Files in: /home/support/public/tutorials/PythonSciComp/
To get the most out of this…
• Basic knowledge of Python
• Working Python install (feel free to use ours!)
– Enthought Python Distribution / Canopy provides scientific and math libraries pre-installed
• MSI login + SSH or NX
• Follow along!
Why Python for Scientific Computing?
• Rapid development
– Easy, readable syntax
– Versatile tools for experimentation/learning
– Comprehensive libraries
• Powerful features
– Process data at near “native code” speeds
– Excellent visualization packages
– Comprehensive libraries
When you leave today, you should be able to…
• Program interactively with ipython
• Understand the basics of numpy and scipy
• Efficiently compute with large arrays of data
• Load and save data to/from files on disk
• Use matplotlib to plot data
• Take advantage of supercomputing resources with parallel computing
• Know where to turn for more help with these topics
Details
• We are describing Enthought Python Distribution.
– Essentially: pre-assembled compilation of Python 2.7 + numpy, scipy, other useful libraries
– Free for academic use; a basic version is free for non-commercial use
– Your computers, departments, etc. may have a different version of Python installed. Everything we will see today is open source.
• In MSI: module load python-epd
Workshop Conventions
• UNIX shell commands are indicated with a leading percent sign (%).
• IPython interpreter commands have In/Out labels.
• Lines with neither marker are Python code that should be entered into a text file.
IPython: Interactive Python
• Powerful environment for interactive work
• Run as “ipython” from any terminal
• --pylab option auto-loads numpy, sets up graphics for plotting
• Inspect any object with “?” or help()
IPython: Interactive Python
• Build up a workspace of objects and functions
• Full history access through Out[], %recall, up/down arrow keys
• %load, %edit, or %run external files
• Lots more, type %magic
NumPy and SciPy
• NumPy provides:
– The basic array and matrix data types
– Efficient implementations of low-level math operations
– A large library of high-level math functions built from efficient primitives
• SciPy provides:
– A home for a wide variety of open-source mathematical and scientific algorithms
– Modules for optimization, signal processing, linear algebra, statistics, interpolation, and more
NumPy arrays
• Array data type with vectorized operations (similar to Matlab or IDL)
• Supports much the same indexing and slicing as the Python list type
• …except every element is of the same data type
• …so they can be stored in memory packed like C arrays
NumPy arrays are fast
Here we are comparing a “pure Python” loop to the equivalent in numpy
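The comparison itself did not survive extraction, so the following is only a minimal sketch of that kind of timing test, written for Python 3. The array size, the squaring operation, and the function names are assumptions chosen for illustration, not the original benchmark.

```python
import timeit

import numpy as np

n = 100_000                          # illustrative size, not from the slides
values = list(range(n))              # plain Python list
array = np.arange(n, dtype=np.int64) # equivalent numpy array


def python_loop():
    # Square every element one at a time in the interpreter.
    return [v * v for v in values]


def numpy_vectorized():
    # Square every element with one vectorized operation.
    return array * array


loop_time = timeit.timeit(python_loop, number=20)
numpy_time = timeit.timeit(numpy_vectorized, number=20)
print(f"pure Python: {loop_time:.4f} s   numpy: {numpy_time:.4f} s")
```

Exact timings depend on the machine, so no figures are quoted here; the point is only that both functions compute the same values.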
Multidimensional arrays
• NumPy arrays are rectangles in arbitrarily many dimensions
• + - * / operate element-by-element for same-shape arrays
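A small sketch of element-by-element arithmetic on two same-shape 2-D arrays; the array contents are made up for illustration.

```python
import numpy as np

# Two arrays with the same 2 x 3 shape.
a = np.arange(6, dtype=float).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
b = np.full((2, 3), 2.0)                      # every element is 2.0

total = a + b        # element-wise sum
product = a * b      # element-wise product, not a matrix product

print(a.shape, a.ndim)
print(total)
print(product)
```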
Array slicing
• Index notation gives access to any “slice” of an array
• Array slices can be assigned – this changes the original array
• X = M[1,:,:].copy() would avoid changing M
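The view-versus-copy behaviour can be sketched as follows. The name M and the expression M[1,:,:].copy() come from the slide; the array shape and the values written are assumptions.

```python
import numpy as np

M = np.zeros((3, 4, 5))

view = M[1, :, :]        # a slice is a view onto M's memory
view[0, 0] = 7.0         # writing through the view changes M

X = M[1, :, :].copy()    # an independent copy
X[0, 0] = -1.0           # M is not affected by this write

M[2, :, :] = 1.0         # assigning to a slice fills that part of M

print(M[1, 0, 0], X[0, 0], M[2].sum())
```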
Other common methods
• NumPy arrays have many useful built-in methods
Other common methods
• …and the numpy module provides more
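The methods shown on these slides were not preserved, so this is only an illustrative selection of array methods and module-level functions, not necessarily the ones presented.

```python
import numpy as np

data = np.array([[3.0, 1.0, 2.0],
                 [6.0, 5.0, 4.0]])

# Methods on the array object itself.
total = data.sum()                 # sum of all elements
column_means = data.mean(axis=0)   # mean down each column
largest = data.max()
flat_index_of_min = data.argmin()  # index into the flattened array
transposed = data.T

# Functions provided by the numpy module.
sorted_rows = np.sort(data, axis=1)
grid = np.linspace(0.0, 1.0, 5)    # 5 evenly spaced points
gram = np.dot(data, data.T)        # matrix product

print(total, column_means, largest, flat_index_of_min)
```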
Conditions and tests
• Vectorized logical operators + indexing functions
• Output of index functions can be used to slice arrays
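A short sketch of vectorized comparisons combined with indexing functions such as numpy.where(); the sample values are invented for illustration.

```python
import numpy as np

x = np.array([4, -1, 7, 0, -3, 9])

mask = x > 0                           # boolean array, one entry per element
positives = x[mask]                    # boolean mask used as an index

idx = np.where((x > 0) & (x < 8))[0]   # integer indices where both tests hold
selected = x[idx]                      # index output used to pick elements

has_negative = np.any(x < 0)
all_small = np.all(x < 10)
clipped = np.where(x < 0, 0, x)        # three-argument form: choose per element

print(positives, idx, selected, clipped)
```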
More useful numpy modules
• numpy.fft – FFTs, forward/inverse, 1-D and N-D
• numpy.random – generate random numbers, many distributions to choose from
• numpy.matrix – special arrays that obey matrix math
• numpy.polynomial – module for representing and manipulating arbitrary polynomials
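A brief sketch touching three of these: numpy.fft, numpy.random, and numpy.polynomial. The test signal, the seed, and the example polynomial are assumptions made for illustration.

```python
import numpy as np

# numpy.fft: a sine wave with 5 cycles over 64 samples peaks in frequency bin 5.
t = np.arange(64) / 64.0
signal = np.sin(2 * np.pi * 5 * t)
spectrum = np.fft.rfft(signal)               # forward FFT of real input
peak_bin = int(np.argmax(np.abs(spectrum)))
recovered = np.fft.irfft(spectrum, n=64)     # inverse FFT

# numpy.random: draw from a normal distribution with a fixed seed.
rng = np.random.RandomState(0)
samples = rng.normal(loc=0.0, scale=1.0, size=1000)

# numpy.polynomial: coefficients in increasing order, so this is x**2 - 1.
p = np.polynomial.Polynomial([-1.0, 0.0, 1.0])
roots = p.roots()

print(peak_bin, samples.mean(), roots)
```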
Plotting made easy
• Matplotlib provides high-quality 2-D (and some 3-D) plotting
– Display in window or output to PDF, SVG, PNG, etc.
– Implemented as modular object-oriented system
• Pylab provides a Matlab-ish interactive interface to Matplotlib
– Access with ipython --pylab
– Defaults to popping up plots in a separate window
Some basic examples…
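The example figures were images and are not recoverable, so here is a minimal stand-in: two curves with labels and a legend, saved to PNG. It assumes the non-interactive "Agg" backend so that it runs without a display, and it writes to a temporary directory; both choices are illustrative.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")            # assumption: render to file, no window needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 200)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), "--", label="cos(x)")
ax.set_xlabel("x")
ax.set_ylabel("value")
ax.legend()

# Hypothetical output location; any writable path works.
out_path = os.path.join(tempfile.mkdtemp(), "basic_plot.png")
fig.savefig(out_path)
n_curves = len(ax.lines)
plt.close(fig)
print("wrote", out_path)
```

In an interactive pylab session the same plot calls would pop up a window instead of writing a file.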
Some advanced examples…
• These examples are from the matplotlib.org examples section…
SciPy expands the menu
• Clustering algorithms (scipy.cluster)
• Integration and ODEs (scipy.integrate)
• Interpolation (scipy.interpolate)
• Input and output (scipy.io)
• Linear algebra (scipy.linalg)
• Multi-dimensional image processing (scipy.ndimage)
• Optimization and root finding (scipy.optimize)
• Signal processing (scipy.signal)
• Sparse matrices (scipy.sparse)
• Spatial algorithms and data structures (scipy.spatial)
• Special functions (scipy.special)
• Statistical functions (scipy.stats)
• And then some…
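As a taste of two modules from the list above, this sketch integrates a function with scipy.integrate.quad() and finds a root with scipy.optimize.brentq(). The integrand and the equation are assumptions picked because their answers are known.

```python
import numpy as np
from scipy import integrate, optimize

# scipy.integrate: the integral of sin(x) from 0 to pi is exactly 2.
area, error_estimate = integrate.quad(np.sin, 0.0, np.pi)


def f(x):
    # x**2 - 2 changes sign between 0 and 2, with its root at sqrt(2).
    return x ** 2 - 2.0


# scipy.optimize: bracketing root finder on the interval [0, 2].
root = optimize.brentq(f, 0.0, 2.0)

print(area, root)
```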
SciPy is also fast
• Most SciPy routines use fast NumPy low-level math operations
• Some SciPy routines use highly optimized external libraries
– E.g. scipy.linalg links to BLAS, LAPACK or MKL behind the scenes
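A minimal sketch of calling scipy.linalg on a small system. The matrix and right-hand side are made-up values; which BLAS/LAPACK build does the work depends on how SciPy was installed.

```python
import numpy as np
from scipy import linalg

# Solve 3x + y = 9 and x + 2y = 8.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

x = linalg.solve(A, b)            # dense linear solve
eigenvalues = linalg.eigvalsh(A)  # eigenvalues of a symmetric matrix

print(x, eigenvalues)
```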
Data on disk
• Chances are you want to load and save data
• numpy and scipy.io offer a variety of facilities
Data on disk: text files
• Very common for smaller data sets: simple columns of numbers
• numpy.loadtxt() – simple interface, good defaults
• numpy.genfromtxt() – more complex, handles unusual formatting, comments, missing values, etc.
Data on disk: text files
• numpy.savetxt() – write to columns of numbers
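A round trip through the three text functions might look like the sketch below. File names, the temporary directory, and the sample values are assumptions for illustration.

```python
import os
import tempfile

import numpy as np

workdir = tempfile.mkdtemp()   # hypothetical scratch location

# Write two columns of numbers, then read them back.
table = np.column_stack([np.arange(5.0), np.arange(5.0) ** 2])
columns_path = os.path.join(workdir, "columns.txt")
np.savetxt(columns_path, table, fmt="%.6f", header="x x_squared")
loaded = np.loadtxt(columns_path)     # the '#' header line is skipped

# genfromtxt copes with a missing value by filling in nan.
ragged_path = os.path.join(workdir, "ragged.csv")
with open(ragged_path, "w") as handle:
    handle.write("1,2\n3,\n5,6\n")
ragged = np.genfromtxt(ragged_path, delimiter=",")

print(loaded)
print(ragged)
```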
Data on disk: binary formats
• Binary data is much more scalable
– Smaller files on disk
– Faster to load and save
• May be necessary to exchange data with other software
• Stick to portable (machine-independent) formats
Data on disk: binary formats
• NumPy native format (.npy)
– numpy.load() and numpy.save()
– Or use numpy.savez() to store many arrays in one .npz archive (numpy.savez_compressed() to compress it)
– Fast, portable, but mostly only supported by Python
• scipy.io.matlab – support for Matlab (.mat)
– scipy.io.loadmat() and scipy.io.savemat()
• scipy.io.idl – read (no save) IDL .sav files
– scipy.io.readsav()
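The save and load calls named above can be exercised as in this sketch. The file names and arrays are assumptions; note that loadmat() returns 1-D arrays as 2-D, so the example flattens before comparing.

```python
import os
import tempfile

import numpy as np
from scipy import io as sio

workdir = tempfile.mkdtemp()   # hypothetical scratch location
a = np.linspace(0.0, 1.0, 5)
b = np.arange(12).reshape(3, 4)

# Single array in NumPy's native .npy format.
npy_path = os.path.join(workdir, "a.npy")
np.save(npy_path, a)
a_back = np.load(npy_path)

# Several named arrays in one .npz archive.
npz_path = os.path.join(workdir, "arrays.npz")
np.savez(npz_path, a=a, b=b)
with np.load(npz_path) as archive:
    b_back = archive["b"]

# Matlab .mat file: a dict of names to arrays.
mat_path = os.path.join(workdir, "arrays.mat")
sio.savemat(mat_path, {"a": a, "b": b})
mat = sio.loadmat(mat_path)
a_from_mat = mat["a"].ravel()  # loadmat returns at least 2-D arrays

print(a_back, b_back.shape, a_from_mat)
```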
Data on disk: binary formats
• Many standard formats supported
– scipy.io.netcdf – NetCDF3 interface
– h5py exposes HDF5 API
– PyTables is an excellent high-level interface to HDF5
– pyfits for FITS datasets
– Etc…
Scaling up with parallelization
• For big jobs you will eventually want to parallelize your code
• The Python interpreter has trouble with multithreading – multi-process is usually best
• Approach depends on the problem you need to solve
Parallel processes
• Many jobs need to process lots of data, don’t need to communicate amongst themselves
– Sometimes called “embarrassingly parallel”
• GNU Parallel – a simple way to launch jobs
– Launch one job for every file in a dir, line in a file, etc.
– Can work with PBS on itasca to use many nodes
GNU Parallel example
• -j should match ppn (unless you know what you’re doing) – this is processes per node
• Will run one job per line of input on stdin or in argfile – max of nodes * ppn running at once
• See “man parallel” for more features
MPI for Python
• MPI “Message Passing Interface” enables parallel processes to communicate efficiently
– Commonly one process will be “controller” and manage worker processes
– Inherent support for scatter-gather operations
• MPI is well-supported on our clusters
• mpi4py interfaces to MPI from inside Python
• Caveat for MPI gurus: numpy does not have distributed arrays yet, complicates some algorithms
Example with mpi4py
• Simple “Hello world” script
More with mpi4py
• Possible to pass numpy arrays like buffers
More with mpi4py
• Also works with (pickle-able) Python objects
– Much slower than C-based arrays, but very convenient
Too much to cover…
• IPython notebook – connect to ipython with a browser for a Mathematica-like notebook interface
• PyCUDA and PyOpenCL – GPU computing
• SymPy – Mathematica-style symbolic math
• Databases are easy to connect to Python; or use an advanced big data toolkit like Pandas or PyTables
IPython notebook example
• Example: IPython notebook with pylab and sympy
• notebook creates graphical log in a browser
• sympy: symbolic CAS
• To try this:
Community and Documentation
• Actively developed and supported
• Excellent documentation
– www.python.org/doc
– scipy.org
– wiki.scipy.org
– ipython.org
– matplotlib.org
– mpi4py.scipy.org
Next Step
• Hands-on
• You can also run the examples on your laptop’s Python distribution
• Enthought is installed in all labs and on supercomputers at MSI
• Full academic version of Enthought Canopy installed (not default yet)
• Questions!
