1 - Introduction To Python
© ADVANCED RESOURCES AND RISK TECHNOLOGY This document can only be distributed within UFRGS
Motivation for
o The modularity of Python has led to developments in all scientific computation domains, including
machine learning
1 – Python Set-up
Python installation, trying the interpreter, setting up a text editor for Python
2 – Python syntax elements
Variables, operators, controls, conditions, classes, objects
3 – Non-numerical examples
String manipulations, file I/O and data management
4 – Numerical examples
Numpy and SciPy modules: using Python for scientific purposes
5 – Geostatistics with Python
Python in AR2Gems: geostatistics with Python, and more
6 – Resources and references
1. Open a DOS terminal: in the Windows Start menu, look for cmd in the Search program and files bar
2. In the prompt, type “python” + enter to fire up the Python interpreter. Notice the Python version
information.
3. The >>> prompt will appear. All commands should now be written in Python syntax.
Python is based on the vision that code should be easy to write and read
o One (preferably obvious) way to do things
o No optimization at the cost of clarity
o Design a small core of functions and make it easy for the community to create extensions
o Notepad++ (free, open-source) is our preferred text editor: tabs, syntax highlighting, auto-indent…
o Do not use Notepad or Wordpad: problems with line breaks, indentation, automatic case, no highlights
o New Document > Format (Line Ending) : Unix/OSX (more portable than Windows EOL)
o Tab Settings: [Default] Tab size : 2, check “replace by space” (no tabulation)
o Font: prefer monospaced (=fixed-width) fonts for readability : all characters and spaces have the same
width: Courier New (our default), Lucida Console, Consolas, Inconsolata, … (avoid fonts that
are optimized for text/headings like Arial, Times… and be wary of 1, l, i, 0, o and O looking alike).
o Syntax highlighting : highlight colors for keywords, comments and operators should be used
o Color scheme: Settings > Style Configurator (vim dark blue is easy on the eyes)
http://notepad-plus-plus.org/
http://hivelogic.com/articles/top-10-programming-fonts/
A useful action
Assigned to CTRL + R
o Official Python style guideline (PEP8): https://www.python.org/dev/peps/pep-0008 - exhaustive and detailed. A good
read in your spare time to pick up good syntax habits right from the start.
o End of a statement = end of line (no need for “;” like in C++ but Python tolerates it)
o Comments begin with a pound/hash sign # and run until the end of the line
o Indentation matters: blocks that go together must have the same number of leading whitespaces
o PEP8 recommends indentation with 4 spaces, Google uses 2 spaces, AR2Tech likes 2 spaces also (clear enough)
o Do not indent with tabs, tabs are not consistently ported across systems and text editors
o Maximum suggested code line length is 79 characters (comments and docstrings should not exceed 72 characters)
o Implied line continuation inside parentheses, brackets, braces: indent rest of expression such that it aligns with ( or [ or {
o A backslash at the end of a line also continues the statement (to be avoided whenever possible, for readability)
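The two continuation styles described above can be compared side by side. This is a minimal sketch with made-up variable names (total, thresholds, total_backslash), written to run under Python 2 or 3:

```python
# Implied continuation: the expression stays open until the closing
# parenthesis, and the wrapped line is aligned with the opening one.
total = (1 + 2 + 3 +
         4 + 5 + 6)

# The same holds inside brackets and braces.
thresholds = [0.1, 0.5,
              0.9]

# Explicit continuation with a backslash works, but is fragile:
# nothing, not even a space, may follow the backslash.
total_backslash = 1 + 2 + 3 + \
                  4 + 5 + 6

print(total == total_backslash)
```

The parenthesized form is usually preferred because it survives trailing whitespace and reformatting.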
import sys
def word_count_dict(filename):
  """Returns a word/count dict for this filename."""
  # Utility used by count() and Topcount().
  word_count = {} # Map each word to its count
  input_file = open(filename, 'r')
  for line in input_file:
    words = line.split()
    for word in words:
      word = word.lower()
      # Special case if we're seeing this word for the first time.
      if not word in word_count:
        word_count[word] = 1
      else:
        word_count[word] = word_count[word] + 1
  input_file.close() # Not strictly required, but good form.
  return word_count
Things to notice
indentation, comments, docstrings, the function definition syntax, the keywords import, for, if, else, in,
return, the use of : , the .dosomethingonprefix() syntax, and a usage of fairly complex objects (such as
maps here: every word is mapped to an integer, reflecting its count in filename)
o There is no variable declaration and no variable initialization in Python (dynamic language, all checks at run-time)
o Just assign identifiers directly using the equal sign = (the name now refers to the new value, regardless of what it referred to before)
o Python identifiers are case-sensitive : A and a are two different things
o The scope of identifiers is the interpreter session, in the session no names are private (caution)
o Because types are not specified, use identifier names judiciously (use plurals for lists of things, for instance)
o By convention, the “throwaway” identifier is 2 underscores: __ (when a function returns values you do not need); a single underscore _ is also widely used
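A short sketch of the assignment rules above. The helper min_and_max and all variable names are hypothetical, introduced only for illustration; the code runs under Python 2 or 3:

```python
value = 3            # value refers to an integer
value = 'three'      # the same name now refers to a string

A = 1
a = 2                # A and a are distinct identifiers

def min_and_max(values):
    """Return the smallest and largest items of values."""
    return min(values), max(values)

# Keep only the maximum; the minimum goes into the throwaway name.
__, largest = min_and_max([4, 8, 15])
print(largest)
```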
o Lists : assigned with [ , , , ]. Indexing starts at 0 and goes up to n-1; same slice [ : ], + and * operators as strings.
o Tuples : tuples are read-only lists : no element can be altered, removed or appended. Assigned with ( , , , ), same slice
operator [ : ], + and *. Very Pythonic!
This syntax is not allowed with tuples: mytuple[4] = 1000 because it alters one element. It would be allowed in lists.
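The difference between lists and tuples can be checked directly. A minimal sketch with illustrative names (mylist, mytuple), runnable under Python 2 or 3:

```python
mylist = [10, 20, 30, 40, 50]
mytuple = (10, 20, 30, 40, 50)

mylist[4] = 1000             # allowed: lists are mutable
try:
    mytuple[4] = 1000        # not allowed: tuples are read-only
except TypeError as error:
    tuple_error = str(error)

first_two = mytuple[0:2]     # slicing a tuple returns a new tuple
longer = mytuple + (60,)     # + builds a new tuple, mytuple is unchanged
repeated = (0, 1) * 2        # * repeats the elements
print(tuple_error)
```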
o Dictionaries : key-value pairs, assigned with { } braces. Elements can be accessed and added using [ ] square brackets
dict[key] = value
Any type of variable can be used as a key or value (although keys are generally numbers or strings).
http://www.tutorialspoint.com/python/python_variable_types.htm
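Dictionary access with square brackets might look like the sketch below. The keys and values are invented for illustration; the code runs under Python 2 or 3:

```python
counts = {}                            # empty dictionary
counts['gold'] = 3                     # add a key-value pair
counts['copper'] = 7
counts['gold'] = counts['gold'] + 1    # update an existing value

has_silver = 'silver' in counts        # membership is tested on keys
silver = counts.get('silver', 0)       # default when the key is missing

for key in sorted(counts):
    print(key + ' ' + str(counts[key]))
```

Reading a missing key with counts['silver'] would raise a KeyError, which is why get() with a default is often convenient.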
Lists are containers just like C++ arrays but can contain different types of objects
x = [12,3,4,"test"]
print "Length of the list :",len(x)
print "The element at position 0 is ",x[0]
print "The element at position 3 is ",x[3]
The square brackets [ ] are used to access an element in the list using an index value (integer)
The command range(n) generates a list of integers from 0 to n-1 :
print range(10)
x = range(10)
print "The element at position 5 is ",x[5]
Very useful! Allows users to get the value of the last element without needing to know how many elements the list contains.
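The remark about reaching the last element most likely refers to negative indexing; the code it annotated did not survive, so the sketch below is an assumption about what was shown. It runs under Python 2 or 3:

```python
x = list(range(10))            # list() keeps this identical under Python 3
last = x[-1]                   # last element, whatever the length
second_to_last = x[-2]
same_as_last = x[len(x) - 1]   # the longer equivalent
print(last)
```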
Python’s for statement iterates over the items of any sequence (a list or a string) in the
order that they appear in the sequence (rather than iterating with a step and halting
condition like in C):
words = ['we', 'are', 'ar2tech']
for w in words:
  print w, len(w)
RESULT:
we 2
are 3
ar2tech 7
sq = []
FOR
prime = [2,3,5,7,11,13,17,19]
FOR + IF
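Only the first line of each example (sq and prime) and the labels FOR and FOR + IF survive here. The loop bodies below are therefore assumptions, sketched to show the two patterns the labels name; the code runs under Python 2 or 3:

```python
# FOR: build a list of squares (assumed purpose of sq)
sq = []
for i in range(5):
    sq.append(i * i)

# FOR + IF: keep only the items that pass a test (assumed test: p > 10)
prime = [2, 3, 5, 7, 11, 13, 17, 19]
large_primes = []
for p in prime:
    if p > 10:
        large_primes.append(p)

print(sq)
print(large_primes)
```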
print 'The value you entered is...'
if n<0:
  print 'negative! so we changed it to 0.'
  n=0
elif n==0:
  print 'zero'
elif n<=4:
  print 'lesser than or equal to four'
else:
  print 'Larger than four'
print 'Thanks for checking.'
o Indentation delimits the scope of statements
o Use colon (:) after the logical test, it is common to forget it
o Use elif, not elseif (think elif and else as 4-letter words).
o No { } are used to delimit blocks, only indentation
o No ( ) are needed around the condition tests
o Boolean operators are spelled out completely : and or not
o Evaluate long conditions outside of the if-line
o Any of the following evaluate to False: 0, None, empty (string, list, dictionary), and of course False.
o Hint: empty means false, so do not test len(str)==0; just check if str (or if not str for the empty case).
o Never compare with True or False (bad form), just use if var.
o Useful:
o break : get out of the innermost for or while loop (it does not apply to if)
o pass: do nothing (useful if the syntax requires something to be present)
o continue: stop current iteration here and start the next iteration of the loop
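The truthiness rules and the three keywords above can be seen together in a small sketch. The function and class names are hypothetical; the code runs under Python 2 or 3:

```python
def first_negative(values):
    """Return the first negative number in values, or None if there is none."""
    found = None
    for v in values:
        if v is None:
            continue        # skip missing entries, go to the next item
        if v < 0:
            found = v
            break           # leave the for loop immediately
    return found

def describe(items):
    """Report emptiness by relying on truthiness rather than len()."""
    if not items:           # empty list, string or dict evaluates to False
        return 'empty'
    return 'not empty'

class Placeholder:
    pass                    # a body is required, but there is nothing to do yet

print(first_negative([3, None, -2, -5]))
print(describe([]))
```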
The docstring is used to generate the text seen when typing help(..) on a function or module:
help(module) prints the docstring located at the header of the module.py file
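As an illustration, a docstring is simply the first string literal in a function (or module) body. The function block_volume is a made-up example; the code runs under Python 2 or 3:

```python
def block_volume(dx, dy, dz):
    """Return the volume of a grid block with sides dx, dy and dz."""
    return dx * dy * dz

# help(block_volume) displays this same text in the help reader.
print(block_volume.__doc__)
```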
o Python is an object-oriented programming language
class IndicatorTransform :
  '''Indicator transform for Multidimensional array'''
  def __init__(self,threshold) :
    '''input the list of thresholds'''
    self.tresh = threshold
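To show how such a class is used, the sketch below adds a hypothetical apply method working on plain lists. The method, and the convention that an indicator is 1 when the value is less than or equal to the threshold, are assumptions and not part of the original class. It runs under Python 2 or 3:

```python
class IndicatorTransform:
    '''Indicator transform for a list of values'''
    def __init__(self, threshold):
        '''input the list of thresholds'''
        self.tresh = threshold      # attribute name kept as in the slide

    def apply(self, values):
        '''hypothetical method: one row of 0/1 indicators per threshold'''
        return [[1 if v <= t else 0 for v in values] for t in self.tresh]

transform = IndicatorTransform([0.5, 1.5])    # calls __init__
indicators = transform.apply([0.2, 1.0, 2.0])
print(indicators)
```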
Python Modules
A file of Python code is called a module. The file “alex.py" is also known as the module “alex".
A module contains variable definitions like, "x = 6" and "def foo()".
Suppose the file “alex.py" contains a "def foo()". The fully qualified name of that foo function is “alex.foo".
Various Python modules can name their functions and variables whatever they want, and the variable names will not conflict --
module1.foo is different from module2.foo.
The standard "sys" module contains some standard system facilities, like the argv list and the exit() function. The
statement "import sys" gives access to the definitions in the sys module and makes them available by their fully-qualified name, e.g.
sys.exit().
import sys
# Now can refer to sys.xxx facilities
sys.exit(0)
There is another import form that looks like this: "from sys import argv, exit". That makes argv and exit() available by their
short names (no prefix). However, we recommend the original form with the fully-qualified names because it makes it easier to
determine where a function or identifier came from.
There are many modules and packages which are bundled with a standard installation of the Python interpreter, so you don't have to do
anything extra to use them. These are collectively known as the "Python Standard Library." Commonly used modules/packages include:
You can find the documentation of all the Standard Library modules and packages at http://docs.python.org/library.
3. When no ambiguity exists, it is possible to import the whole module without its namespace using the wildcard *
from math import * # imports the entire module, all objects (like 1)
print sin(pi), log(e) # no namespace necessary but beware of name conflicts
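The import forms can be compared in a few lines, including the name conflict that short names make possible. This is a sketch using the standard math module; the variable names are illustrative, and the last import stands in for what a wildcard import would do to the name e. It runs under Python 2 or 3:

```python
import math                      # fully-qualified names: math.sqrt, math.pi
from math import sqrt, pi        # short names for selected objects only

qualified = math.sqrt(16.0)
short = sqrt(16.0)

# Importing a short name silently replaces a name you already defined.
e = 'my own variable'
from math import e               # same effect a wildcard import has on e
print(e)
```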
o To scroll through the help reader line by line : use <space> or <enter>,
o To skip lines, use s or p and specify the number of screens or positions (lines) to scroll down by,
o To quit the help and return to the command, use q or f.
Also useful is the interactive help. From anywhere in the Python command prompt, type help() and
follow the instructions. Just hit <enter> on an empty help> line to return to the Python prompt.
Each module is only imported once per interpreter session. Therefore, if you change your
modules, you must restart the interpreter – or, if it’s just one module you want to test
interactively, use reload(modulename).
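A minimal sketch of reloading, using the standard json module as a stand-in for your own module. In Python 2 reload() is a builtin; in Python 3.4 and later it lives in importlib, which the try block accounts for:

```python
try:
    from importlib import reload   # Python 3.4 and later
except ImportError:
    pass                           # Python 2: reload() is a builtin

import json                        # stand-in for your own module
json = reload(json)                # re-executes the module's source file
print(json.dumps([1]))
```

Note that objects created before the reload keep referring to the old definitions.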
o To write:
import csv
with open('eggs.csv', 'wb') as csvfile:
  spamwriter = csv.writer(csvfile, delimiter=' ',
                          quotechar='|', quoting=csv.QUOTE_MINIMAL)
  spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
  spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])
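The matching read step is not shown here, so the round trip below is a sketch under stated assumptions: it writes to a temporary directory, and the helper open_csv is hypothetical, added because Python 2 expects binary mode for CSV files while Python 3 expects text mode with newline=''.

```python
import csv
import os
import sys
import tempfile

def open_csv(path, mode):
    """Open a CSV file the way the csv module expects on this Python version."""
    if sys.version_info[0] == 2:
        return open(path, mode + 'b')        # Python 2: binary mode
    return open(path, mode, newline='')      # Python 3: text mode

path = os.path.join(tempfile.mkdtemp(), 'eggs.csv')

with open_csv(path, 'w') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=' ',
                            quotechar='|', quoting=csv.QUOTE_MINIMAL)
    spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
    spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])

# Read back with the same delimiter and quote character.
with open_csv(path, 'r') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
    rows = [row for row in spamreader]

print(rows)
```

Fields containing the delimiter (such as 'Baked Beans') are quoted on writing and restored on reading.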
NB: Also used for serialization is the simplejson module, with more custom options (thus
less portability):
https://github.com/simplejson/simplejson
Use the subprocess module (preferred over older modules such as os.system)
subprocess.call(args, shell=True)
Example of system call:
status = subprocess.call("mycmd" + " myarg", shell=True)
Try calling AR2Gems from the Python command line.
https://docs.python.org/2/library/subprocess.html
http://stackoverflow.com/questions/89228/calling-an-external-command-in-python
https://docs.python.org/2/library/subprocess.html#replacing-older-functions-with-the-subprocess-module
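A self-contained way to try subprocess.call is to launch a second Python interpreter as the external program. This sketch passes the arguments as a list instead of using shell=True, a swapped-in variant that avoids shell quoting issues. It runs under Python 2 or 3:

```python
import subprocess
import sys

# Run a second Python interpreter as the external program.
args = [sys.executable, '-c', 'import sys; sys.exit(0)']
status = subprocess.call(args)            # list form, no shell involved

# A command that fails reports a non-zero status.
failed = subprocess.call([sys.executable, '-c', 'import sys; sys.exit(3)'])
print(str(status) + ' ' + str(failed))
```

The return value is the exit status of the program, with 0 conventionally meaning success.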
These libraries are not mandatory, but are required for more advanced numerical
computing.
import scipy
SciPy is an open source library of scientific and engineering tools for Python. SciPy provides modules for:
o statistics
o optimization
o numerical integration
o linear algebra
o Fourier transforms
o signal processing
o image processing
o genetic algorithms
o ODE solvers
From http://www.scipy.org/more_about_SciPy
import scipy
data = scipy.empty( (100), dtype=float )
print data[0:5]
print 'Size of array: ',data.shape
A SciPy array must have a fixed number of elements, set when the array is created.
This is different from Python lists that can be extended anytime (no append() in SciPy arrays) .
The shape of an array can be retrieved with the attribute shape:
data = scipy.ones((100,3), dtype=int)
print data.shape
The shape of an array can be changed with the method reshape(...), which returns the reshaped array:
data = scipy.arange(50)
data = data.reshape((2,25))
print data.shape
Note that the total size of the array is unchanged 2*25 = 50.
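One detail worth checking interactively is that reshape returns a new array object and leaves the original shape untouched. The sketch below uses numpy directly, on the assumption that it is installed; the scipy.arange style aliases are deprecated in newer SciPy versions.

```python
import numpy

data = numpy.arange(50)
reshaped = data.reshape((2, 25))   # returns a new array object
print(data.shape)                  # the original keeps its shape
print(reshaped.shape)
```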
Submatrices
The : operator allows slicing an array to define a subblock within it.
Be careful: slicing is not copying. For instance, modifying y below also modifies x:
x = scipy.arange(50).reshape((2,25))
y = x[0,:]
y[0] = -99
print x
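The difference between a slice and an explicit copy can be verified as follows. This sketch assumes numpy is installed and uses it directly; the names view and independent are illustrative.

```python
import numpy

x = numpy.arange(50).reshape((2, 25))
view = x[0, :]                 # slice: shares memory with x
independent = x[0, :].copy()   # explicit copy: separate memory

view[0] = -99                  # also changes x[0, 0]
independent[1] = -1            # x is untouched
print(x[0, :3])
```

Use copy() whenever the subblock must be modified without affecting the original array.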
import sgems
execute(string_command)
Run in the Python interface any command that you would normally run from the command prompt. Use ‘single quotes’ to delimit the
command string. Returns the number of arguments received by the process. This is a very useful and versatile Python command.
get_property(string_gridname,string_property)
Returns a property vector
set_property(string_gridname,string_property,tuple_values)
Change or create a property of a grid
get_dims(string_gridname)
Get ni,nj,nk dimensions of a regular grid
get_grid_size(string_gridname)
Get the number of blocks of a grid
set_region(string_gridname,string_region, tuple_values)
Import a region to a grid
get_region(string_gridname,string_region)
Export a region from a grid
set_active_region(string_gridname,string_region)
Select an active region on a grid (NONE unselects the region)
nan()
Return the SGeMS value for NaN
get_property_list(string_gridname)
Return the list of property names in a grid
get_location(string_gridname,integer_nodeid)
Return the x,y,z location of a grid node from its nodeid
get_nodeid(string_gridname, float_x, float_y, float_z)
Return the nodeid from an x,y,z location
get_closest_nodeid(string_gridname, float_x, float_y, float_z)
Return the closest nodeid from a x,y,z location
set_categorical_property_int(string_gridname,string_property,tuple_values)
Set a categorical property from a list of integers
set_categorical_property_alpha(string_gridname,string_property,string_catdefinition,tuple_values)
Set a categorical property from a list of alphanumeric entries (string)
get_categorical_definition(string_gridname,string_property)
Get the categorical definition from a categorical property
get_properties_in_group(string_gridname,string_group)
Get the names of the member properties of a group
new_point_set(string_pointsetname, tuple_float_x, tuple_float_y, tuple_float_z)
Create a new point set given a set of x,y,z coordinates
The coordinates of all nodes in a grid can be obtained by getting the standard properties _X_, _Y_ or _Z_
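The get_property and set_property calls listed above could be combined as in the sketch below. The sgems module only exists inside AR2Gems, so the function receives the module as an argument and is exercised here with FakeSgems, a hypothetical stand-in that mimics just those two signatures. The grid and property names, the cutoff rule, and the absence of NaN handling are all assumptions.

```python
def indicator_property(sgems_module, grid, prop, new_prop, cutoff):
    """Create new_prop on grid: 1 where prop <= cutoff, else 0."""
    values = sgems_module.get_property(grid, prop)
    indicators = tuple(1 if v <= cutoff else 0 for v in values)
    sgems_module.set_property(grid, new_prop, indicators)
    return indicators

class FakeSgems:
    """Hypothetical stand-in so the sketch runs outside AR2Gems."""
    def __init__(self):
        self.grids = {'grid': {'grade': (0.2, 1.4, 0.7)}}

    def get_property(self, gridname, prop):
        return self.grids[gridname][prop]

    def set_property(self, gridname, prop, values):
        self.grids[gridname][prop] = values

fake = FakeSgems()
result = indicator_property(fake, 'grid', 'grade', 'grade_ind', 1.0)
print(result)
```

Inside AR2Gems, the same function would presumably be called with the real module after import sgems, in place of fake.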
https://developers.google.com/edu/python/
http://en.wikipedia.org/wiki/Python_%28programming_language%29
Unofficial Windows Binaries for Python Extension Packages
http://www.lfd.uci.edu/~gohlke/pythonlibs/
Data Analysis:
http://pandas.pydata.org/
Windows
o Do not use Notepad or Wordpad
o Free and Open-Source:
o Notepad++: http://notepad-plus-plus.org/
o Jedit: http://www.jedit.org/
Mac
o TextWrangler: http://www.textwrangler.com/products/textwrangler/ (free)
o Jedit
Unix
o Any text editor works (vi, pico, Kate, Nedit)