\chapter{Python for Scientific Computing}
\begin{flushright}
With material contributed by Perry Greenfield, Robert Jedrzejewski,
Vicki Laidler and John Hunter
\par\end{flushright}
\section{Who is using Python?}
The use of Python in scientific computing is as wide as the field
itself. A sampling of current work is provided here to indicate the
breadth of disciplines represented and the scale of the problems addressed.
The NASA Jet Propulsion Laboratory (JPL) uses Python as an interface
language to \textsc{FORTRAN} and C++ libraries which form a suite
of tools for plotting and visualization of spacecraft trajectory parameters
in mission design and navigation. The Space Telescope Science Institute
(STScI) uses Python in many phases of its pipeline: scheduling Hubble
data acquisitions, managing large volumes of data, and analyzing astronomical
images \cite{BarrettEtal2004}. The National Oceanic and Atmospheric Administration
(NOAA) uses Python for a wide variety of scientific computing tasks,
including simple scripts to parse and translate data files, prototyping
of computational algorithms, writing user interfaces and web front ends,
and developing models \cite{NOAA2000,BarkerHealy2001,ParkerHallBarker2001}.
At the Fundamental Symmetries Lab at Princeton University, Python
is used to efficiently analyze large data sets from an experiment
that searches for CPT and Lorentz violation using an atomic magnetometer
\cite{Kornack2002,Kominis2003}. The Pediatric Clinical Electrophysiology
unit at The University of Chicago, which collects approximately 100\,GB
of data per week, uses Python to explore novel approaches to the localization
and detection of epileptic seizures \cite{HunterEtal2005}. The Enthought
Corporation is using Python to build customized oil exploration
applications for the petroleum industry. At the world's largest radio
telescopes, e.g., Arecibo and the Green Bank Telescope, Python is
used for data processing, modelling, and scripting high-performance
computing jobs in order to search for and monitor binary and millisecond
pulsars in terabyte datasets \cite{Ransometal2004a,Ransom2005}. At
the Computational Genomics Laboratory at the Australian National University,
researchers are using Python to build a toolkit which enables the
specification of novel statistical models of sequence evolution on
parallel hardware \cite{Huttley2004,Butterfield2004}. Michel Sanner's
group at the Scripps Research Institute uses Python extensively to
build a suite of applications for molecular visualization and exploration
of drug/molecule interactions using virtual reality and 3D printing
technology \cite{Sanner2005a,Sanner2005b}. Engineers at Google use
Python for the automation, control and tuning of their computational grid,
and use \texttt{SWIG}-generated Python wrappers of their in-house C++ libraries
in virtually all facets of their work \cite{Beazley1998,Stein2005}.
Many other use cases -- ranging from animation at Industrial Light
and Magic, to space shuttle mission control, to grid monitoring and
control at Rackspace, to drug discovery, meteorology and air traffic
control -- are detailed in O'Reilly's two volumes of \emph{Python
Success Stories} \cite{PySuccess2002,PySuccess2005}.
\section{Advantages of Python}
\begin{quotation}
\textit{The canonical, {}``Python is a great first language'',
elicited, {}``Python is a great last language!''} --
Noah Spurrier
\end{quotation}
This quotation summarizes an important reason scientists migrate to
Python as a programming language. As a {}``great first language'',
Python has a simple, expressive syntax that is accessible to the newcomer.
The phrase {}``Python as executable pseudocode'' reflects the fact that Python
syntax mirrors the obvious and intuitive pseudocode used in
many journals \cite{Strous2001}. Python also does
not impose a single programming paradigm on scientists, as Java does
with object-oriented programming, but rather allows one to code at
many levels of sophistication, including the BASIC/FORTRAN/Matlab style of
procedural programming familiar to many scientists. Here is the canonical
first program, {}``hello world'', in Python:
\noindent {\small \begin{verbatim}
# Python
print('hello world')
\end{verbatim}} Contrast the simplicity of that program with the complexity of {}``hello
world'' in Java: {\small \begin{verbatim}
// Java
class myfirstjavaprog
{
    public static void main(String args[])
    {
        System.out.println("Hello World!");
    }
}
\end{verbatim}}
\noindent In addition to being accessible to new programmers and scientists,
Python is powerful enough to manage the complexity of large applications,
supporting functional programming, object-oriented programming, generic
programming and metaprogramming. That Python supports these paradigms
suggests why it is also a {}``great last language'': as programmers
grow in sophistication, the language scales with them. By
contrast, commercial languages like Matlab and IDL, which also support
a simple syntax for simple programs, do not scale well to complex programming
tasks.

The built-in Python data types and standard library provide a powerful
platform in every distribution \cite{PyLibRef,Lundh2001}. The standard
data types encompass regular and arbitrary-length integers, complex
numbers, floating point numbers, strings, lists, associative arrays
(dictionaries), sets and more. The standard library included with every Python
distribution contains modules for regular expressions, data encodings,
multimedia formats, math, networking protocols, binary arrays and
files, and much more. Thus one can open a file on a remote web server
and work with it as easily as with a local file: \begin{verbatim}
# this 3 line script downloads and prints the Yahoo web site
from urllib.request import urlopen
for line in urlopen('http://yahoo.com'):
    print(line.decode('utf-8', errors='replace'), end='')
\end{verbatim}
Complementing these built-in features, Python is also readily extensible,
giving it a wealth of libraries for scientific computing that have
been in development for many years \cite{Dubois1996b,Dubois1996c}.
\texttt{NumPy} supports large array manipulations, math,
optimized linear algebra, efficient Fourier transforms and random
numbers. \texttt{scipy} is a collection of Python wrappers of high
performance FORTRAN code (e.g., LAPACK, ODEPACK) for numerical analysis
\cite{LAPACK}. \texttt{IPython} is a command shell, \`a la Mathematica,
Matlab and IDL, for interactive programming, data exploration and visualization,
with support for command history, completion, debugging and more.
\texttt{Matplotlib} is a 2D graphics package for making publication
quality graphics with a Matlab-compatible syntax; it is also embeddable
in applications. \texttt{f2py}, \texttt{SWIG}, \texttt{weave}, and
\texttt{pyrex} are tools for rapidly building Python interfaces to
high performance compiled code. \texttt{MayaVi} is a user-friendly
graphical user interface for 3D visualizations built on top of the
state-of-the-art Visualization Toolkit \cite{SchroederEtal2002}.
\texttt{pympi}, \texttt{pypar}, \texttt{pyro}, \texttt{scipy.cow},
and \texttt{pyxg} are tools for cluster building and doing parallel,
remote and distributed computations. This is only a sampling of general
purpose libraries for scientific computing in Python; it does not
begin to cover the many high-quality, domain-specific libraries
that are also available.

All of the infrastructure described above is open source software
that is freely distributable for academic and commercial use. In both
the educational and scientific arenas, this is a critical point. For
education, this platform provides students with tools that they can
take with them outside the classroom to their homes, jobs and careers
beyond. By contrast, the use of commercial tools such as Matlab and IDL
largely limits access to people at major institutions. For scientists, the use of open
source tools is consistent with the scientific principle that all
of the steps in an analysis or simulation should be open for review,
and with the principle of reproducible research \cite{BuckheitDonoho1995}.
\section{Mixed Language Programming}
The programming languages of each generation evolve in part to fix
the problems of those that came before \cite{BerginEtal1996}. \textsc{FORTRAN},
the original high level language of scientific computing \cite{Rosen1967},
was designed to allow scientists to express code at a level closer
to the language of the problem domain. \textsc{ALGOL} and its successor
Pascal, widely used in education in the 1970s, were designed to alleviate
some of the perceived problems with \textsc{FORTRAN} and to create
a language with a simpler and more expressive syntax \cite{Backus1963,Naur1963}.
Object-oriented programming languages evolved to allow a closer correspondence
between the code and the physical system it models \cite{GoldbergRobson1989},
and C++ provided a relatively high-performance object-oriented implementation
compatible with the popular C programming language \cite{Stroustrup1994,Stroustrup2000}.
But implementing object orientation efficiently required programmers to
stay close to the machine, managing memory and pointers, which
added considerable complexity to programs while limiting portability.
Interpreted languages such as Tcl, Perl, Python, and Java evolved
to manage some of the low-level and platform-specific details, making
programs easier to write and maintain, but with a performance penalty
\cite{Ousterhout1998,ArnoldEtal2005}. For many scientists, however,
purely object-oriented systems like Java are unfamiliar, and languages
like Matlab and Python provide the safety, portability and ease of
use of an interpreted language without imposing an object-oriented
approach to coding \cite{VanRossumDrake2003,HanselmanLittlefield2004}.

The result of these several decades is that there are many platforms
for scientific computing in use today. The many person-hours invested
in numerical methods in \textsc{FORTRAN}, visualization libraries
in C++, bioinformatics toolkits in Perl, object frameworks in Java,
domain-specific toolkits in Matlab, and so on, call for an approach
that integrates this work. Python excels at integration with other
languages, with tools for transparently and
semi-automatically interfacing with \textsc{FORTRAN}, C, C++, Java,
.NET, Matlab, and Mathematica code \cite{Hugunin1997,Beazley1998}.
In our view, the ability to work seamlessly with code from many languages
is the present and the future of scientific computing, and Python
effectively integrates these languages into a single environment.
\section{Getting started}
We'll get started with Python by introducing arrays and plotting,
working with a simple ASCII text file \texttt{mydata.dat} of two columns:
the first column contains the times at which a measurement was acquired,
and the second contains the voltages sampled at those times. The file
looks like
\begin{lyxcode}
0.0000~0.4911
0.0500~0.5012
0.1000~0.7236
0.1500~1.1756
...~and~so~on
\end{lyxcode}
\noindent While it would be easy enough to process this file by writing
a Python function to do it, there is no need to, since the matplotlib
pylab module has a Matlab-compatible \texttt{load} function for loading
ASCII array data (Figure \ref{fig:load_ascii}). To complete these
exercises, you should have IPython and matplotlib installed, and start
IPython in pylab mode with
\begin{lyxcode}
>~ipython~-{}-pylab
\end{lyxcode}
\begin{figure}
\begin{centering}
\includegraphics[width=4in]{fig/load_ascii}
\par\end{centering}
\caption{\label{fig:load_ascii}Loading \texttt{ASCII} data and displaying
with \texttt{plot}}
\end{figure}
\lstinputlisting[caption={Loading an ASCII text file and plotting the columns}]{snippets/load_data.ipy}
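Since the listing above depends on pylab's \texttt{load}, which is not available in every matplotlib release, here is a minimal sketch of the same step using NumPy's \texttt{loadtxt} instead (a swapped-in function, not the one the listing uses). It assumes the whitespace-separated two-column layout shown earlier and, so that it runs on its own, reads a few rows from an in-memory buffer rather than from \texttt{mydata.dat}.

```python
import io

import numpy as np

# A few rows in the same two-column format as mydata.dat; with the real
# file you would pass the file name 'mydata.dat' to loadtxt instead.
sample = io.StringIO(
    "0.0000 0.4911\n"
    "0.0500 0.5012\n"
    "0.1000 0.7236\n"
    "0.1500 1.1756\n"
)

# unpack=True transposes the result so each column comes back separately
t, v = np.loadtxt(sample, unpack=True)

print(t.shape, v.shape)   # one value per row of the file
print(v.max())

# To plot the columns (requires matplotlib):
#   import matplotlib.pyplot as plt
#   plt.plot(t, v); plt.xlabel('time'); plt.ylabel('voltage'); plt.show()
```

Returning one array per column is convenient because the two arrays can be handed straight to \texttt{plot(t, v)}.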
It is also easy to load data from binary files. In the example below,
we have some image data in raw binary string format. The image is
$256\times256$ pixels, and each pixel is a 2-byte integer. We read the raw
bytes into a string using Python's \texttt{file} function (use \texttt{open}
in Python 3, where the result is a \texttt{bytes} object); the \texttt{'rb'} flag says
to open the file in binary read mode. We can then use the
numpy \texttt{fromstring} function to convert this to an array, passing
the type of the data (\texttt{int16}) as an argument. We reshape the
array by setting its \texttt{shape} attribute to 256 by 256, and pass
it to the matplotlib pylab command \texttt{imshow} for plotting.
matplotlib has a number of colormaps; the default one here is jet
(recent matplotlib releases default to viridis). The data are automatically
normalized and color-mapped, producing the image in Figure \ref{fig:array_hothead}.
\lstinputlisting[caption={Loading binary image data and plotting it in matplotlib}]{snippets/load_binary.ipy}
%
\begin{figure}
\begin{centering}
\includegraphics[width=4in]{fig/hothead}
\par\end{centering}
\caption{\label{fig:array_hothead}Loading binary image data and displaying
with \texttt{imshow}}
\end{figure}
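The binary-loading steps can be sketched outside IPython as follows. This is an illustration under stated assumptions, not the contents of \texttt{load\_binary.ipy}: the bytes are synthetic, the byte order is assumed to be little-endian (\texttt{'<i2'}), the file name in the comment is hypothetical, and \texttt{np.frombuffer} stands in for \texttt{fromstring}, whose binary mode is deprecated in recent NumPy versions.

```python
import numpy as np

nrows, ncols = 256, 256

# Synthetic stand-in for the raw file: 256*256 little-endian 2-byte ints.
# With a real file you would instead do something like
#     with open('image.raw', 'rb') as f:   # hypothetical file name
#         raw = f.read()
raw = (np.arange(nrows * ncols) % 4096).astype('<i2').tobytes()

# Interpret the bytes as 16-bit signed integers (byte order assumed),
# then give the flat array its 2-D shape.
im = np.frombuffer(raw, dtype='<i2').reshape(nrows, ncols)

print(im.shape, im.dtype.itemsize)   # (256, 256) 2

# To display it (requires matplotlib):
#   import matplotlib.pyplot as plt
#   plt.imshow(im); plt.colorbar(); plt.show()
```

Because \texttt{frombuffer} shares memory with the \texttt{bytes} object, the resulting array is read-only; call \texttt{.copy()} on it if you need to modify the pixel values.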
\section[Arrays]{An Introduction to Arrays}
\subsection{Creating arrays}
Besides loading arrays from data files, there are a few different ways
to create arrays. For example,
\begin{lyxcode}
>\,{}>\,{}>~x~=~zeros((20,30))
\end{lyxcode}
creates a $20\times30$ array of zeros (of the default floating point type, \texttt{float64}; details on how
to specify other types will follow). Note that the dimensions ({}``shape''
in numpy parlance) are specified by giving the dimensions as a
comma-separated list within parentheses. The parentheses aren't necessary
for a single dimension. As an aside, the parentheses used this way
are being used to specify a Python tuple; more will be said about
those in a later tutorial. For now you only need to imitate this usage.
Likewise one can create an array of 1's using the \texttt{ones()}
function.
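For readers working in a plain Python script rather than a pylab session, the same constructors are available under a module prefix; the \texttt{np} alias used here is a common convention and an assumption of this sketch. It checks the shape and default type discussed above.

```python
import numpy as np

x = np.zeros((20, 30))            # the shape is passed as a tuple
print(x.shape, x.dtype)           # (20, 30) float64

y = np.ones(5)                    # one dimension: no parentheses needed
print(y)                          # [1. 1. 1. 1. 1.]

z = np.zeros((2, 3), dtype=int)   # ask for an integer type explicitly
print(z.dtype.kind)               # i: a signed integer kind
```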
The \texttt{arange()} function can be used to create arrays with sequential
values. E.g.,
\begin{lyxcode}
>\,{}>\,{}>~arange(10)
array({[}0,~1,~2,~3,~4,~5,~6,~7,~8,~9])
\end{lyxcode}
Note that the array starts at 0 by default and does
not include the value specified (though the array does have a length
equal to the argument).

Other variants:
\begin{lyxcode}
>\,{}>\,{}>~arange(10.)
array({[}~0.,~1.,~2.,~3.,~4.,~5.,~6.,~7.,~8.,~9.])
>\,{}>\,{}>~arange(3,10)
array({[}3,~4,~5,~6,~7,~8,~9])
>\,{}>\,{}>~arange(1.,~10.,~1.1)~\#~note~trickiness
array({[}1.~,~2.1,~3.2,~4.3,~5.4,~6.5,~7.6,~8.7,~9.8])
\end{lyxcode}
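The {}``trickiness'' flagged in the last example comes from floating-point rounding: NumPy computes the number of elements as $\lceil(\mathit{stop}-\mathit{start})/\mathit{step}\rceil$, so a step such as 0.1, which is not exactly representable in binary, can let an extra value at or just past the nominal stop slip in. A minimal sketch of the pitfall and two common workarounds (\texttt{linspace} and integer-based arithmetic) follows; the particular numbers are only an illustration.

```python
import numpy as np

# (1.3 - 1.0) / 0.1 evaluates to slightly more than 3 in floating point,
# so the ceiling gives 4 elements and a value near 1.3 is included.
a = np.arange(1.0, 1.3, 0.1)
print(len(a))              # 4, not the 3 one might expect

# Safer: fix the number of points and derive the values from integers...
b = 1.0 + 0.1 * np.arange(3)

# ...or let linspace place a given number of points between two endpoints.
c = np.linspace(1.0, 1.2, 3)
print(np.allclose(b, c))   # True
```

With \texttt{linspace} the endpoint and the count are stated explicitly, so rounding can affect the values slightly but never the length.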
Finally, one can create arrays from literal arguments:
\begin{lyxcode}
>\,{}>\,{}>~print(array({[}3,1,7]))
{[}3~1~7]
>\,{}>\,{}>~print(array({[}{[}2,3],{[}4,4]]))
{[}{[}2~3]
~{[}4~4]]
\end{lyxcode}
The brackets, like the parentheses in the zeros example above, have
a special meaning in Python which will be covered later (Python lists).
For now, just mimic the syntax used here.
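A brief sketch of how list nesting determines the number of dimensions, and how NumPy infers the element type from the literal values (again written with the assumed \texttt{np} prefix):

```python
import numpy as np

v = np.array([3, 1, 7])           # a flat list gives a 1-D array
m = np.array([[2, 3], [4, 4]])    # a list of lists gives a 2-D array
f = np.array([1, 2.5])            # mixing ints and floats gives a float array

print(v.shape, m.shape)           # (3,) (2, 2)
print(f.dtype)                    # float64
print(m[1, 0])                    # row 1, column 0: 4
```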
\subsection{Array numeric types}
numpy supports all standard numeric types. The default integer type
is platform dependent: typically \texttt{int64} on 64-bit systems and
\texttt{int32} on 32-bit ones. The default float is the 64-bit double,
called \texttt{float64} in numpy, and the default complex type is
\texttt{complex128} (a pair of \texttt{float64} values). Many of the functions
accept a type argument. For example,
\begin{lyxcode}
>\,{}>\,{}>~zeros(3,~int8)~\#~Signed~byte
>\,{}>\,{}>~zeros(3,~dtype=uint8)~\#~Unsigned~byte
>\,{}>\,{}>~array({[}2,3],~dtype=float32)
>\,{}>\,{}>~arange(4,~dtype=complex64)
\end{lyxcode}
The most common types are \texttt{int8, uint8, int16, uint16, int32,
uint32, int64, uint64, float32, float64, complex64, complex128.} To
find out the type of an array, use its \texttt{dtype} attribute (it is not a method). E.g.,
\begin{lyxcode}
>\,{}>\,{}>~arr~=~array({[}2,3],~dtype=float32)
>\,{}>\,{}>~arr.dtype
dtype('float32')
\end{lyxcode}
To convert an array to a different type, use the \texttt{astype()}
method, which returns a converted copy, e.g.,
\begin{lyxcode}
>\,{}>\,{}>~a~=~arr.astype(float64)
\end{lyxcode}
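The following sketch, with arbitrary sample values, confirms that \texttt{dtype} is an attribute, that \texttt{astype} returns a new array rather than modifying the original, and that converting floats to integers truncates toward zero rather than rounding:

```python
import numpy as np

arr = np.array([1.7, -2.2, 3.9], dtype=np.float32)
print(arr.dtype)                  # float32 (an attribute, not a method)

a = arr.astype(np.float64)        # a new array; arr itself is unchanged
print(a.dtype, arr.dtype)         # float64 float32

i = arr.astype(np.int16)          # float -> int truncates toward zero
print(i.tolist())                 # [1, -2, 3]

b = np.zeros(3, np.uint8)         # type passed positionally, as in zeros(3, int8)
print(b.dtype.itemsize)           # 1 byte per element
```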
\subsection{Printing arrays}
Interactively, there are two common ways to see the value of an array.
Like many Python objects, just typing the name of the variable itself
will print its contents (this only works in interactive mode). You
can also explicitly print it. The following illustrates both approaches:
\begin{lyxcode}
>\,{}>\,{}>~x~=~arange(10)
>\,{}>\,{}>~x
array({[}0,~1,~2,~3,~4,~5,~6,~7,~8,~9])
>\,{}>\,{}>~print(x)
{[}0~1~2~3~4~5~6~7~8~9]
\end{lyxcode}
By default numpy limits how much of a large array is
printed out (to spare you the effects of printing out millions of
values). For example:
\begin{lyxcode}
>\,{}>\,{}>~x~=~arange(1000000)
>\,{}>\,{}>~print(x)
{[}~~~~0~~~~~1~~~~~2~...,~999997~999998~999999]
\end{lyxcode}
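The summarization cutoff is a print option. The sketch below assumes NumPy 1.15 or later for the \texttt{np.printoptions} context manager; \texttt{threshold} is the number of elements above which output is abbreviated.

```python
import numpy as np

x = np.arange(1000000)
s = str(x)                    # what print(x) would show
print(len(s), '...' in s)     # a short string containing an ellipsis

# 'threshold' is the element count above which output is summarized;
# printoptions changes it only inside the with-block.
with np.printoptions(threshold=5):
    print(np.arange(10))      # summarized even though it is small
print(np.arange(10))          # full output again outside the block
```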
\subsection{Indexing 1-D arrays}
As with IDL and Matlab, there are many options for indexing arrays.
\begin{lyxcode}
>\,{}>\,{}>~x~=~arange(10)
>\,{}>\,{}>~x
array({[}0,~1,~2,~3,~4,~5,~6,~7,~8,~9])
\end{lyxcode}
Simple indexing:
\begin{lyxcode}
>\,{}>\,{}>~x{[}2]~\#~3rd~element
2
\end{lyxcode}
Indexing is 0-based: the first value in the array is \texttt{x{[}0]}.

Indexing from end:
\begin{lyxcode}
>\,{}>\,{}>~x{[}-2]~\#~-1~represents~the~last~element,~-2~next~to~last...
8
\end{lyxcode}
Slices select a subset of an array:
\begin{lyxcode}
>\,{}>\,{}>~x{[}2:5]
array({[}2,~3,~4])
\end{lyxcode}
Note that the upper limit of the slice is not included as part of
the subset! Newcomers often find this unexpected, even a defect,
but most find the behavior very useful once they get used to it (the
reasons won't be given here). Also important to understand is that
slices are views into the original array: a slice shares its data
with the original rather than copying it. The following demonstrates:
\begin{lyxcode}
>\,{}>\,{}>~y~=~x{[}2:5]
>\,{}>\,{}>~y{[}0]~=~99
>\,{}>\,{}>~y
array({[}99,~3,~4])
>\,{}>\,{}>~x
array({[}0,~1,~99,~3,~4,~5,~6,~7,~8,~9])
\end{lyxcode}
Changes to a slice will show up in the original. If a copy is needed,
use \texttt{x{[}2:5].copy()}.

Shorthand notation:
\begin{lyxcode}
>\,{}>\,{}>~x{[}:5]~\#~presumes~start~from~beginning
array({[}~0,~1,~99,~3,~4])
>\,{}>\,{}>~x{[}2:]~\#~presumes~goes~until~end
array({[}99,~3,~4,~5,~6,~7,~8,~9])
>\,{}>\,{}>~x{[}:]~\#~selects~whole~dimension
array({[}0,~1,~99,~3,~4,~5,~6,~7,~8,~9])
\end{lyxcode}
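The view-versus-copy behavior of slices described above can be checked directly. This sketch uses \texttt{np.shares\_memory} (available in NumPy 1.11 and later, an assumption about your installed version) to confirm which arrays share data:

```python
import numpy as np

x = np.arange(10)
y = x[2:5]                  # a slice: a view onto x's data
y[0] = 99
print(x[2])                 # 99: the change shows up in x

z = x[2:5].copy()           # an independent copy
z[0] = -1
print(x[2])                 # still 99

print(np.shares_memory(x, y), np.shares_memory(x, z))   # True False
```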
Strides:
\begin{lyxcode}
>\,{}>\,{}>~x{[}2:8:3]~\#~Stride~every~third~element
array({[}99,~5])
\end{lyxcode}
Index arrays:
\begin{lyxcode}
>\,{}>\,{}>~x{[}{[}4,2,4,1]]
array({[}4,~99,~4,~1])
\end{lyxcode}
Using the results of logical comparisons as indices:
\begin{lyxcode}
>\,{}>\,{}>~print(x~>~5)
{[}False~False~~True~False~False~False~~True~~True~~True~~True]
>\,{}>\,{}>~x{[}x>5]
array({[}99,~6,~7,~8,~9])
\end{lyxcode}
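One difference from slices is worth checking: indexing with an index array or a boolean mask produces a copy, not a view, although assigning through such an index does modify the original. A minimal sketch, reusing the values from the examples above:

```python
import numpy as np

x = np.arange(10)
x[2] = 99                        # same values as in the examples above

picked = x[[4, 2, 4, 1]]         # index array: returns a new array (a copy)
picked[0] = -1
print(x[4])                      # 4: x is unaffected

mask = x > 5                     # boolean array with the same shape as x
print(x[mask])                   # [99  6  7  8  9]

x[mask] = 0                      # assignment through a mask changes x itself
print(x)                         # [0 1 0 3 4 5 0 0 0 0]
```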
\subsection{Indexing multidimensional arrays}
Before describing this in detail it is very important to note an item
regarding multidimensional indexing that will certainly cause you
grief until you become accustomed to it: ARRAY INDICES USE THE OPPOSITE
CONVENTION FROM FORTRAN REGARDING THE ORDER OF INDICES FOR MULTIDIMENSIONAL
ARRAYS.
\begin{lyxcode}
>\,{}>\,{}>~im~=~arange(24)
>\,{}>\,{}>~im.shape~=~4,6
>\,{}>\,{}>~im
array({[}{[}~0,~~1,~~2,~~3,~~4,~~5],
~~~~~~~{[}~6,~~7,~~8,~~9,~10,~11],
~~~~~~~{[}12,~13,~14,~15,~16,~17],
~~~~~~~{[}18,~19,~20,~21,~22,~23]])
\end{lyxcode}
To emphasize the point made in the previous paragraph, the index that
represents the most rapidly varying dimension in memory is the last
(here, the 2nd) index, not the first.

Partial indexing:
\begin{lyxcode}
>\,{}>\,{}>~im{[}1]
array({[}6,~7,~8,~9,~10,~11])
\end{lyxcode}
If only some of the indices for a multidimensional array are specified,
then the result is an array with the shape of the {}``leftover''
dimensions, in this case, 1-dimensional. The 2nd row is selected,
and since there is no index for the column, the whole row is selected.
All of the indexing tools available for 1-D arrays apply to \emph{n}-dimensional
arrays as well (index arrays can also be combined with slices, though the
shape of the result then follows some subtle rules). To understand all the
indexing options in their full detail,
read sections 4.6, 4.7 and 6 of the numpy manual.
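To make the memory-order point concrete, the sketch below inspects \texttt{strides}, the number of bytes to step in memory along each axis. It fixes the type to \texttt{int32} (an assumption made so the byte counts are predictable) and compares the default row-major layout with a Fortran-ordered copy made by \texttt{np.asfortranarray}:

```python
import numpy as np

im = np.arange(24, dtype=np.int32).reshape(4, 6)

print(im[1])         # partial indexing: the whole 2nd row
print(im[1, 2])      # row 1, column 2: 8
print(im[:, 2])      # a whole column: [ 2  8 14 20]

# Default (C, row-major) order: moving along the last index steps one
# element (4 bytes); moving along the first index steps a whole row.
print(im.strides)    # (24, 4)

# A Fortran-ordered copy holds the same values but reverses the layout.
imf = np.asfortranarray(im)
print(imf.strides)                  # (4, 16)
print(np.array_equal(im, imf))      # True: indexing is unchanged
```

The memory layout differs, but the way you index the array does not; only performance-sensitive code or data exchange with Fortran libraries usually needs to care.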
\subsection{Compatibility of dimensions}
In operations involving combining (e.g., adding) arrays or assigning
them there are rules regarding the compatibility of the dimensions
involved. For example the following is permitted:
\begin{lyxcode}
>\,{}>\,{}>~x{[}:5]~=~0
\end{lyxcode}
since a single value is considered {}``broadcastable'' over a
5-element array. But this is not permitted:
\begin{lyxcode}
>\,{}>\,{}>~x{[}:5]~=~array({[}0,1,2,3])~
\end{lyxcode}
since a 4-element array does not match a 5-element array.

\emph{The following explanation can probably be skipped by most on
the first reading;} it is only important to know that the rules for combining
arrays of different shapes are quite general. It is hard to specify
the rules precisely without getting a bit confusing, but it doesn't
take long to get a good intuitive feeling for what is and isn't permitted.
Here's an attempt anyway: align the shapes of the two arrays at their
trailing (rightmost) dimensions; each pair of corresponding dimensions must
then either be equal or include a 1, and any dimensions missing from the
shorter shape are treated as 1. The following pairs of shapes
are compatible:
\begin{lyxcode}
(5,4):(4,)~->~(5,4)
(5,1):(4,)~->~(5,4)
(15,3,5):(15,1,5)~->~(15,3,5)
(15,3,5):(3,5)~->~(15,3,5)
(15,1,5):(3,1)~->~(15,3,5)
\end{lyxcode}
so that one can add arrays of these shapes or assign one to the other
(in which case the array being assigned must broadcast to the exact shape
of the target). For dimensions of length 1 that are matched
against a larger number, the values in that dimension are simply repeated.
For dimensions that are missing, the whole sub-array is simply repeated
along them. The following shapes are not compatible:
\begin{lyxcode}
(3,4):(4,3)
(1,3):(4,)
\end{lyxcode}
Examples:
\begin{lyxcode}
>\,{}>\,{}>~x~=~zeros((5,4))
>\,{}>\,{}>~x{[}:,:]~=~{[}2,3,2,3]
>\,{}>\,{}>~x
array({[}{[}2.,~3.,~2.,~3.],
~~~~~~~{[}2.,~3.,~2.,~3.],
~~~~~~~{[}2.,~3.,~2.,~3.],
~~~~~~~{[}2.,~3.,~2.,~3.],
~~~~~~~{[}2.,~3.,~2.,~3.]])
>\,{}>\,{}>~a~=~arange(3)
>\,{}>\,{}>~b~=~a{[}:]~\#~a~view:~a~new~array~object~sharing~a's~data
>\,{}>\,{}>~b.shape~=~(3,1)
>\,{}>\,{}>~b
array({[}{[}0],
~~~~~~~{[}1],
~~~~~~~{[}2]])
>\,{}>\,{}>~a{*}b~\#~outer~product
array({[}{[}0,~0,~0],
~~~~~~~{[}0,~1,~2],
~~~~~~~{[}0,~2,~4]])
\end{lyxcode}
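The compatibility rule stated above is simple enough to write down directly. The function \texttt{broadcast\_shape} below is a hypothetical helper written for illustration, not part of NumPy (recent NumPy versions provide \texttt{np.broadcast\_shapes} for this purpose). It pads the shorter shape with leading 1s, compares dimensions pairwise, and cross-checks each compatible pair from the text by actually adding NumPy arrays of those shapes.

```python
import numpy as np


def broadcast_shape(s1, s2):
    """Apply the trailing-dimension rule to two shapes (illustrative helper).

    Returns the broadcast result shape, or raises ValueError if the
    shapes are incompatible. Missing leading dimensions count as 1.
    """
    n = max(len(s1), len(s2))
    a = (1,) * (n - len(s1)) + tuple(s1)   # pad on the left with 1s
    b = (1,) * (n - len(s2)) + tuple(s2)
    result = []
    for d1, d2 in zip(a, b):
        if d1 == d2 or d2 == 1:
            result.append(d1)
        elif d1 == 1:
            result.append(d2)
        else:
            raise ValueError(f"incompatible shapes {s1} and {s2}")
    return tuple(result)


# The compatible pairs listed in the text.
pairs = [((5, 4), (4,)), ((5, 1), (4,)), ((15, 3, 5), (15, 1, 5)),
         ((15, 3, 5), (3, 5)), ((15, 1, 5), (3, 1))]
for s1, s2 in pairs:
    ours = broadcast_shape(s1, s2)
    theirs = (np.zeros(s1) + np.zeros(s2)).shape   # NumPy's own answer
    print(s1, s2, '->', ours, ours == theirs)

# The incompatible pairs raise an error.
for s1, s2 in [((3, 4), (4, 3)), ((1, 3), (4,))]:
    try:
        broadcast_shape(s1, s2)
    except ValueError as err:
        print(err)
```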
\subsection{ufuncs}
A ufunc (short for Universal Function) applies the same operation
or function to all the elements of an array independently. When two
arrays are added together, the \texttt{add} ufunc is used to perform
the array addition. There are ufuncs for all the common operations
and mathematical functions. More specialized ufuncs can be obtained
from add-on libraries. All the operators have corresponding ufuncs
that can be used by name (e.g., \texttt{add} for \texttt{+}). These
are all listed in the table below. Ufuncs for binary operations also have a few very handy methods
(\texttt{reduce}, \texttt{accumulate} and \texttt{outer}), whose use is demonstrated here:
\begin{lyxcode}
>\,{}>\,{}>~x~=~arange(9)
>\,{}>\,{}>~x.shape~=~(3,3)
>\,{}>\,{}>~x
array({[}{[}0,~1,~2],
~~~~~~~{[}3,~4,~5],
~~~~~~~{[}6,~7,~8]])
>\,{}>\,{}>~add.reduce(x)~\#~sums~along~the~first~index
array({[}9,~12,~15])
>\,{}>\,{}>~add.reduce(x,~axis=1)~\#~sums~along~the~2nd~index
array({[}3,~12,~21])
>\,{}>\,{}>~add.accumulate(x)~\#~cumulative~sum~along~the~first~index
array({[}{[}0,~~1,~~2],
~~~~~~~{[}3,~~5,~~7],
~~~~~~~{[}9,~12,~15]])
>\,{}>\,{}>~multiply.outer(arange(3),arange(3))
array({[}{[}0,~0,~0],
~~~~~~~{[}0,~1,~2],
~~~~~~~{[}0,~2,~4]])
\end{lyxcode}
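The ufunc methods shown above have more familiar function equivalents. This sketch checks a few of those correspondences and shows that \texttt{reduce} works with any binary ufunc, not just \texttt{add}:

```python
import numpy as np

x = np.arange(9).reshape(3, 3)

col_sums = np.add.reduce(x)            # same as x.sum(axis=0)
row_sums = np.add.reduce(x, axis=1)    # same as x.sum(axis=1)
running = np.add.accumulate(x)         # same as np.cumsum(x, axis=0)
table = np.multiply.outer(np.arange(3), np.arange(3))   # same as np.outer

# reduce works with any binary ufunc, e.g. the largest value in each column
col_max = np.maximum.reduce(x)

print(np.array_equal(col_sums, x.sum(axis=0)),
      np.array_equal(row_sums, x.sum(axis=1)),
      np.array_equal(running, np.cumsum(x, axis=0)),
      np.array_equal(table, np.outer(np.arange(3), np.arange(3))))
print(col_max)                          # [6 7 8]
```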
Standard ufuncs (with corresponding symbolic operators, when they
exist, shown in parentheses):

\begin{tabular}{lll}
add (+) & log & greater (>)\tabularnewline
subtract (-) & log10 & greater\_equal (>=)\tabularnewline
multiply ({*}) & cos & less (<)\tabularnewline
divide (/) & arccos & less\_equal (<=)\tabularnewline
remainder (\%) & sin & logical\_and\tabularnewline
absolute, abs & arcsin & logical\_or\tabularnewline
floor & tan & logical\_xor\tabularnewline
ceil & arctan & bitwise\_and (\&)\tabularnewline
fmod & cosh & bitwise\_or (|)\tabularnewline
conjugate & sinh & bitwise\_xor (\textasciicircum{})\tabularnewline
minimum & tanh & bitwise\_not (\textasciitilde{})\tabularnewline
maximum & sqrt & rshift (>\,{}>)\tabularnewline
power ({*}{*}) & equal (==) & lshift (<\,{}<)\tabularnewline
exp & not\_equal (!=) & \tabularnewline
\end{tabular}

\emph{Note that there are no corresponding Python operators for} \texttt{logical\_and}
\emph{and} \texttt{logical\_or}\emph{. The Python} \texttt{and} \emph{and}
\texttt{or} \emph{operators are NOT equivalent to these respective
ufuncs!}
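The warning about \texttt{and} and \texttt{or} can be seen directly. In the sketch below, \texttt{np.logical\_and} and the element-wise \texttt{\&} operator select the same elements. \texttt{\&} works here because both operands are boolean arrays, and the parentheses are required because \texttt{\&} binds more tightly than \texttt{>}. Python's \texttt{and}, by contrast, raises an error because it needs a single truth value for a whole array.

```python
import numpy as np

x = np.arange(10)

both = np.logical_and(x > 2, x < 6)   # element-wise "and"
print(x[both])                        # [3 4 5]
print(x[(x > 2) & (x < 6)])           # same selection with the & operator

# Python's 'and' needs one truth value for the whole array, which is
# ambiguous when there is more than one element, so NumPy raises an error.
try:
    (x > 2) and (x < 6)
except ValueError as err:
    print('ValueError:', err)
```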
\subsection{Array functions}
There are many array utility functions. The following lists the more
useful ones with a one line description. See the numpy manual for
details on how they are used. Arguments shown with argument=value
indicate what the default value is if called without a value for that
argument.
\begin{description}
\item [{\texttt{all}\textmd{(}\textmd{\emph{a}}\textmd{):}}] true if all elements
of the array are nonzero
\item [{\texttt{allclose}\textmd{(}\textmd{\emph{a1,~a2,~rtol=1.e-5,~atol=1.e-8}}\textmd{):}}] true
if all elements of the two arrays are equal within the given tolerances
\begin{spacing}{0.50}
\item [{\texttt{alltrue}\textmd{(}\textmd{\emph{a,~axis=0}}\textmd{):}}] true
where all elements along the specified axis are nonzero\end{spacing}
\item [{\texttt{any}\textmd{(}\textmd{\emph{a}}\textmd{):}}] true if any element
of the array is nonzero
\begin{spacing}{0.50}
\item [{\texttt{argmax}\textmd{(}\textmd{\emph{a,~axis=-1}}\textmd{),~argmin(}\textmd{\emph{a,~axis=-1}}\textmd{):}}] return
the indices of the max/min values along the selected axis
\item [{\texttt{argsort}\textmd{(}\textmd{\emph{a,~axis=-1}}\textmd{):}}] return
the indices that would sort the array
\item [{\texttt{choose}\textmd{(}\textmd{\emph{selector,~population,~clipmode=CLIP}}\textmd{):}}] fills
specified array by selecting corresponding values from a set of arrays
using integer selection array (population is a tuple of arrays; see
tutorial 2)
\item [{\texttt{clip}\textmd{(}\textmd{\emph{a,~amin,~amax}}\textmd{):}}] clip
values of array \emph{a} at values \emph{amin}, \emph{amax}
\item [{\texttt{dot}\textmd{(}\textmd{\emph{a1,~a2}}\textmd{):}}] dot
product of arrays \texttt{\emph{a1}} \& \texttt{\emph{a2}}
\item [{\texttt{compress}\textmd{(}\textmd{\emph{condition,~a,~axis=0}}\textmd{):}}] select
elements from array \emph{a} where the boolean array \emph{condition} is true
\item [{\texttt{concatenate}\textmd{(}\textmd{\emph{arrays,~axis=0}}\textmd{):}}] join
the arrays in the sequence \emph{arrays} along the specified axis
\item [{\texttt{cumproduct}\textmd{(}\textmd{\emph{a,~axis=0}}\textmd{):}}] cumulative
product along specified axis
\item [{\texttt{cumsum}\textmd{(}\textmd{\emph{a,~axis=0}}\textmd{):}}] cumulative
sum along specified axis
\item [{\texttt{diagonal}\textmd{(}\textmd{\emph{a,~offset=0,~axis1=0,~axis2=1}}\textmd{):}}] returns
diagonal of 2-d matrix with optional offsets. \end{spacing}
\item [{\texttt{fromfile}\textmd{(}\textmd{\emph{file,~type,~shape=None}}\textmd{):}}] Use
binary data in file to form new array of specified type.
\item [{\texttt{fromstring}\textmd{(}\textmd{\emph{datastring,~type,~shape=None}}\textmd{):}}] Use
binary data in \emph{datastring} to form new array of specified shape
and type
\item [{\texttt{identity}\textmd{(}\textmd{\emph{n,~type=None}}\textmd{):}}] returns
the $n\times n$ identity matrix.
\begin{spacing}{0.50}
\item [{\texttt{indices}\textmd{(}\textmd{\emph{shape,~type=None}}\textmd{):}}] generate
an array holding, for each axis, the index of every position along that axis
\item [{\texttt{innerproduct}\textmd{(}\textmd{\emph{a1,~a2}}\textmd{):}}] inner
product of \emph{a1} and \emph{a2} (sum of products over their last axes)
\item [{\texttt{matrixmultiply}\textmd{(}\textmd{\emph{a1,~a2}}\textmd{):}}] matrix
product of \emph{a1} and \emph{a2}
\item [{\texttt{outerproduct}\textmd{(}\textmd{\emph{a1,~a2}}\textmd{):}}] outer
product of \emph{a1} and \emph{a2}
\item [{\texttt{product}\textmd{(}\textmd{\emph{a,~axis=0}}\textmd{):}}] product
of elements along specified axis
\item [{\texttt{ravel}\textmd{(}\textmd{\emph{a}}\textmd{):}}] creates
a 1-d version of an array
\item [{\texttt{repeat}\textmd{(}\textmd{\emph{a,~repeats,~axis=0}}\textmd{):}}] generate
a new array in which each element of \emph{a} is repeated \emph{repeats} times
\item [{\texttt{resize}\textmd{(}\textmd{\emph{a,~shape}}\textmd{):}}] replicate
or truncate array to new shape
\item [{\texttt{searchsorted}\textmd{(}\textmd{\emph{bin,~a}}\textmd{):}}] return
the indices at which the values of \emph{a} would be inserted into the sorted
array \emph{bin} to keep it sorted
\item [{\texttt{sometrue}\textmd{(}\textmd{\emph{a,~axis=0}}\textmd{):}}] are
any elements along specified axis true
\item [{\texttt{sort}\textmd{(}\textmd{\emph{a,~axis=-1}}\textmd{):}}] sort
array elements along selected axis
\item [{\texttt{sum}\textmd{(}\textmd{\emph{a,~axis=0}}\textmd{):}}] sum
array along specified axis
\item [{\texttt{swapaxes}\textmd{(}\textmd{\emph{a,~axis1,~axis2}}\textmd{):}}] interchange
two axes of array (doesn't actually move data, just maps indices
differently)
\item [{\texttt{trace}\textmd{(}\textmd{\emph{a,~offset=0,~axis1=0,~axis2=1}}\textmd{):}}] compute
trace of matrix \emph{a} with optional offset.
\item [{\texttt{transpose}\textmd{(}\textmd{\emph{a,~axes=None}}\textmd{):}}] transpose
indices of array (doesn't actually move data, just maps indices differently)\end{spacing}
\item [{\texttt{where}\textmd{(}\textmd{\emph{a}}\textmd{):}}] find {}``true''
locations in array \emph{a}
\end{description}
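The listing above uses the Numeric/numarray names of its era. As a minimal sketch, assuming a current NumPy installation imported as \texttt{np}, the accumulation and reduction functions can be tried as follows. Current NumPy spells \texttt{cumproduct}, \texttt{product}, and \texttt{sometrue} as \texttt{cumprod}, \texttt{prod}, and \texttt{any}, and these functions default to \texttt{axis=None}, which reduces over every element rather than along axis 0, so the axis is passed explicitly here.

```python
import numpy as np

# A small 2x3 integer array to accumulate and reduce.
a = np.array([[1, 2, 3],
              [4, 5, 6]])

# cumsum / cumproduct: running results down the rows (axis=0).
running_sum = np.cumsum(a, axis=0)    # [[1, 2, 3], [5, 7, 9]]
running_prod = np.cumprod(a, axis=0)  # [[1, 2, 3], [4, 10, 18]]

# sum / product: collapse one axis entirely.
col_sums = np.sum(a, axis=0)          # [5, 7, 9]
row_prods = np.prod(a, axis=1)        # [6, 120]

# sometrue: is any element along the axis true?
any_big = np.any(a > 4, axis=1)       # [False, True]

# With no axis argument, NumPy reduces over all elements.
total = np.sum(a)                     # 21
```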
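A similar sketch for the product functions, again assuming current NumPy, in which \texttt{innerproduct}, \texttt{matrixmultiply}, and \texttt{outerproduct} correspond to \texttt{np.inner}, \texttt{np.dot}, and \texttt{np.outer}. For 2-d arrays \texttt{inner} sums over the last axis of \emph{both} arguments, so \texttt{inner(m, m)} equals \texttt{dot(m, m.T)} rather than \texttt{dot(m, m)}.

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])
w = np.array([4.0, 5.0, 6.0])
m = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# innerproduct: for vectors, the usual dot product.
ip = np.inner(v, w)                 # 1*4 + 2*5 + 3*6 = 32.0

# outerproduct: every pairing of elements, a 3x3 result.
op = np.outer(v, w)                 # op[i, j] == v[i] * w[j]

# matrixmultiply: ordinary matrix product (m @ m is equivalent).
mm = np.dot(m, m)                   # [[7, 10], [15, 22]]

# inner() on 2-d arrays contracts the last axis of both arguments.
inner_2d = np.inner(m, m)           # same as np.dot(m, m.T)

# identity, diagonal, and trace.
eye = np.identity(2)
diag = np.diagonal(m)               # [1.0, 4.0]
upper = np.diagonal(m, offset=1)    # [2.0], one step above the main diagonal
tr = np.trace(m)                    # 5.0
```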
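The reshaping functions can be explored the same way (current NumPy assumed). The sketch also checks the parenthetical remark about \texttt{transpose} and \texttt{swapaxes}: they return views, so writing through the result changes the original data, whereas \texttt{repeat} and \texttt{resize} build new arrays.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)      # [[0, 1, 2], [3, 4, 5]]

flat = np.ravel(a)                  # [0, 1, 2, 3, 4, 5]
rep = np.repeat(a, 2, axis=0)       # each row twice: shape (4, 3)
grown = np.resize(a, (3, 3))        # data recycled: last row is [0, 1, 2]
shrunk = np.resize(a, (1, 4))       # data truncated: [[0, 1, 2, 3]]

# swapaxes and transpose return views of the same data.
s = np.swapaxes(a, 0, 1)            # for 2-d arrays, the same as a.T
b = a.copy()                        # work on a copy so `a` stays intact
t = np.transpose(b)                 # shape (3, 2), no data moved
t[0, 1] = 99                        # writing through the view...
# ...changes b[1, 0], because t and b share memory.

# indices: one index grid per axis of a (2, 3) array.
rows, cols = np.indices((2, 3))     # rows[i, j] == i, cols[i, j] == j
```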
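For searching and sorting, here is a minimal sketch under the same NumPy assumption; the bin edges and values are made up for illustration. \texttt{np.searchsorted} takes the sorted array first, matching \texttt{searchsorted(bin, a)} above, and the one-argument form of \texttt{np.where} returns one index array per axis. Current NumPy also accepts a three-argument form, \texttt{where(condition, x, y)}, which selects elementwise from \texttt{x} or \texttt{y}.

```python
import numpy as np

a = np.array([[3, 1, 2],
              [9, 7, 8]])

# sort: each row sorted independently (axis=-1, the last axis).
sorted_rows = np.sort(a, axis=-1)           # [[1, 2, 3], [7, 8, 9]]

# searchsorted(bin, a): bin must already be sorted in increasing order.
bins = np.array([0, 10, 20, 30])
values = np.array([5, 10, 25])
slots = np.searchsorted(bins, values)       # [1, 1, 3]; ties go left by default

# where(condition): row and column indices of the true elements.
r, c = np.where(a > 2)                      # r = [0, 1, 1, 1], c = [0, 0, 1, 2]

# Three-argument form: keep a where the condition holds, else 0.
masked = np.where(a > 2, a, 0)              # [[3, 0, 0], [9, 7, 8]]
```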
\subsection{Array methods}
\begin{singlespace}
Arrays have several methods, which are called in the same way as the
methods of any other Python object. For example (using the array from
the previous example):
\end{singlespace}
\begin{lyxcode}
\begin{singlespace}
>\,{}>\,{}>~\#~sum~all~array~elements
>\,{}>\,{}>~x.sum()~\#~the~L~indicates~a~Python~Long~integer
36L~\end{singlespace}
\end{lyxcode}
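The earlier array is not reproduced in this section, so the sketch below assumes a hypothetical stand-in: the values 0 through 8 in a $3\times3$ array, which also sum to 36. It shows that a method call and the corresponding function give the same result and that methods accept the same \texttt{axis} argument (current NumPy assumed). Under Python 3 the trailing \texttt{L} disappears: there is a single integer type, and NumPy returns an integer scalar that prints as \texttt{36}.

```python
import numpy as np

# Hypothetical stand-in for the earlier example's array:
# the values 0..8 in a 3x3 array, whose sum is also 36.
x = np.arange(9).reshape(3, 3)

total = x.sum()                 # method form
same = np.sum(x)                # equivalent function form
col_sums = x.sum(axis=0)        # methods take the same axis argument: [9, 12, 15]
largest = x.max()               # 8

print(total)                    # Python 3 prints 36, with no trailing L
```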
\begin{singlespace}
The following lists all the array methods that exist for an array
object \texttt{a} (a number of them are equivalent to the array functions
listed above; these are shown without a summary description):
\end{singlespace}
\begin{description}
\begin{spacing}{0.50}
\item [{\texttt{\emph{a}}\texttt{.argmax}\textmd{(}\textmd{\emph{axis=-1}})}]~
\item [{\texttt{\emph{a}}\texttt{.argmin}\textmd{(}\textmd{\emph{axis=-1}})}]~
\item [{\texttt{\emph{a}}\texttt{.argsort}\textmd{(}\textmd{\emph{axis=-1}})}]~
\item [{\texttt{\emph{a}}\texttt{.astype}\textmd{(}\textmd{\emph{type}}\textmd{):}}] copy
array to specified numeric type
\item [{\texttt{\emph{a}}\texttt{.byteswap}\textmd{():}}] perform byteswap
on data in place
\item [{\texttt{\emph{a}}\texttt{.byteswapped}\textmd{():}}] return byteswapped
copy of array
\item [{\texttt{\emph{a}}\texttt{.conjugate}\textmd{():}}] complex conjugate
\item [{\texttt{\emph{a}}\texttt{.copy}\textmd{():}}] produce copied version
of array (instead of view)
\item [{\texttt{\emph{a}}\texttt{.diagonal}\textmd{()}}]~
\item [{\texttt{\emph{a}}\texttt{.info}\textmd{():}}] print info about
array
\item [{\texttt{\emph{a}}\texttt{.isaligned}\textmd{():}}] are data elements
guaranteed aligned with memory?
\item [{\texttt{\emph{a}}\texttt{.isbyteswapped}\textmd{():}}] are data
elements byteswapped, i.e.\ not in native processor order?
\item [{\texttt{\emph{a}}\texttt{.iscontiguous}\textmd{():}}] are data
elements contiguous in memory?
\item [{\texttt{\emph{a}}\texttt{.is\_c\_array}\textmd{():}}] are data
elements aligned, not byteswapped, and contiguous?
\item [{\texttt{\emph{a}}\texttt{.is\_fortran\_contiguous}\textmd{():}}] are
data elements contiguous in Fortran (column-major) order?
\item [{\texttt{\emph{a}}\texttt{.is\_f\_array}\textmd{():}}] are data
elements Fortran-contiguous, aligned, and not byteswapped?
\item [{\texttt{\emph{a}}\texttt{.itemsize}\textmd{():}}] size of data
element in bytes
\item [{\texttt{\emph{a}}\texttt{.max}\textmd{(type=None):}}] maximum value
in array
\item [{\texttt{\emph{a}}\texttt{.min}\textmd{():}}] minimum value in array
\item [{\texttt{\emph{a}}\texttt{.nelements}\textmd{():}}] total number
of elements in array
\item [{\texttt{\emph{a}}\texttt{.new}\textmd{():}}] returns new array
of same type and size (data uninitialized)
\item [{\texttt{\emph{a}}\texttt{.repeat}\textmd{(repeats,axis=0):}}]~
\item [{\texttt{\emph{a}}\texttt{.resize}\textmd{(shape):}}]~
\item [{\texttt{\emph{a}}\texttt{.size}\textmd{():}}] same as nelements
\item [{\texttt{\emph{a}}\texttt{.dtype}\textmd{():}}] returns type of array
\item [{\texttt{\emph{a}}\texttt{.tofile}\textmd{(}\textmd{\emph{file}}\textmd{):}}] write
binary data to file
\item [{\texttt{\emph{a}}\texttt{.tolist}\textmd{():}}] convert data to
Python list format
\item [{\texttt{\emph{a}}\texttt{.tostring}\textmd{():}}] copy binary data
to Python string
\item [{\texttt{\emph{a}}\texttt{.transpose}\textmd{(}\textmd{\emph{axes=-1}}\textmd{):}}] transpose
array
\item [{\texttt{\emph{a}}\texttt{.stddev}\textmd{():}}] standard deviation
\item [{\texttt{\emph{a}}\texttt{.sum}\textmd{():}}] sum of all elements
\item [{\texttt{\emph{a}}\texttt{.swapaxes}\textmd{(}\textmd{\emph{axis1,axis2}})}]~
\item [{\texttt{\emph{a}}\texttt{.togglebyteorder}\textmd{():}}] change
byteorder flag without changing actual data byteorder
\item [{\texttt{\emph{a}}\texttt{.trace}\textmd{()}}]~
\item [{\texttt{\emph{a}}\texttt{.view}\textmd{():}}] returns new array
object using view of same data\end{spacing}
\end{description}
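The copy/view distinction and type conversion are easiest to see by experiment. This sketch assumes current NumPy, where several numarray-era predicates such as \texttt{iscontiguous}, \texttt{is\_fortran\_contiguous}, and \texttt{isaligned} are replaced by entries in the \texttt{flags} attribute, and \texttt{itemsize} and \texttt{size} are attributes rather than methods.

```python
import numpy as np

a = np.arange(6, dtype=np.int32).reshape(2, 3)

as_float = a.astype(np.float64)   # astype: converted copy
c = a.copy()                      # copy: independent data
v = a.view()                      # view: new array object, same data

v[0, 0] = 100                     # visible through a, not through c or as_float
print(a[0, 0], c[0, 0], as_float[0, 0])   # 100 0 0.0

# numarray-style predicates expressed as NumPy flags.
print(a.flags['C_CONTIGUOUS'])    # True  (cf. iscontiguous)
print(a.T.flags['F_CONTIGUOUS'])  # True: the transpose is column-major
print(a.flags['ALIGNED'])         # cf. isaligned

# itemsize and size are attributes in NumPy, not methods.
print(a.itemsize, a.size)         # 4 6
```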
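Because \texttt{tofile} and \texttt{tostring} write raw bytes with no header, the reading side (\texttt{fromfile}, \texttt{fromstring}) must be told the type again. Below is a minimal round-trip sketch, assuming current NumPy, where \texttt{tostring}/\texttt{fromstring} are superseded by \texttt{tobytes}/\texttt{frombuffer}; the file name \texttt{data.bin} is an arbitrary placeholder inside a temporary directory. The last two statements mirror \texttt{byteswapped} and \texttt{togglebyteorder}: \texttt{byteswap()} returns a copy with the bytes reversed, and viewing that copy with the opposite byte-order dtype recovers the original values without touching the data.

```python
import os
import tempfile

import numpy as np

a = np.array([1.5, 2.5, 3.5], dtype=np.float64)

# tostring/fromstring analogue: raw bytes in memory, no header.
raw = a.tobytes()                                   # 3 elements x 8 bytes = 24 bytes
from_bytes = np.frombuffer(raw, dtype=np.float64)   # type must be given again

# tofile/fromfile: the same raw bytes on disk ("data.bin" is a placeholder name).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")
    a.tofile(path)
    from_disk = np.fromfile(path, dtype=np.float64)

# byteswapped analogue: a copy whose bytes are reversed; the dtype is unchanged,
# so the values read back as meaningless numbers.
swapped = a.byteswap()
# togglebyteorder analogue: reinterpret the same bytes with the opposite order.
restored = swapped.view(swapped.dtype.newbyteorder())
```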
\subsection{Array attributes}
\begin{description}
\begin{spacing}{0.50}
\item [{\texttt{a.shape:}}] returns shape of array
\item [{\texttt{a.flat:}}] returns view of array treating it as 1-dimensional.
Doesn't work if array is not contiguous
\item [{\texttt{a.real:}}] return real component of array (exists for all
types)
\item [{\texttt{a.imag,~a.imaginary:}}] return imaginary component (exists
only for complex types)\end{spacing}
\end{description}
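A short sketch of the attributes, assuming current NumPy. Two details differ from the descriptions above: NumPy's \texttt{flat} is an iterator that also works on non-contiguous arrays such as a column slice, and \texttt{imag} is defined for real arrays too (it returns zeros), while there is no \texttt{imaginary} alias.

```python
import numpy as np

z = np.array([[1 + 2j, 3 - 1j],
              [0 + 1j, 4 + 0j]])

shape = z.shape                 # (2, 2)
re = z.real                     # [[1, 3], [0, 4]]
im = z.imag                     # [[2, -1], [1, 0]]

# flat on a non-contiguous view: the second column of z.
col = z[:, 1]
col_values = list(col.flat)     # [(3-1j), (4+0j)]

# flat also supports assignment, writing into the underlying array.
w = z.copy()
w.flat[0] = 5                   # w[0, 0] is now (5+0j); z is unchanged

# imag on a real array returns zeros rather than raising an error.
zeros = np.arange(3).imag
```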
\section{Exercises}
\begin{xca}
Load the binary image shown in Figure~\ref{fig:array_hothead}. What
are the mean and the standard deviation of the pixel values? Sum over
the rows and make a bar plot of the summed intensity across rows. Do
the same for columns. Make a histogram of all the data in the image.
(Hint: see \texttt{nx.mlab.mean}, \texttt{nx.mlab.std},
\texttt{pylab.bar}, and \texttt{pylab.hist}.)
\end{xca}